Re: Review Request 50934: HIVE-14233 Improve vectorization for ACID by eliminating row-by-row stitching
---
This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/50934/
---

(Updated Aug. 11, 2016, 4:36 p.m.)

Review request for hive and Eugene Koifman.

Repository: hive-git

Description
---

https://issues.apache.org/jira/browse/HIVE-14233

This JIRA proposes to improve vectorization for ACID by eliminating row-by-row stitching when reading back ACID files. In the current implementation, a vectorized row batch is created by populating the batch one row at a time before the batch is passed up the operator pipeline. Row-by-row stitching was needed because the ACID insert/update/delete events from the various delta files had to be merged before the current version of a given row could be determined. HIVE-14035 removed that limitation by splitting ACID update events into a combination of delete+insert; it also made it possible to create splits on delta files.

Building on HIVE-14035, this JIRA proposes to remove this bottleneck in the vectorized code path for ACID by reading row batches directly from the underlying ORC files and avoiding stitching altogether. Once a row batch is read from the split (which may be on a base or delta file), the deleted rows are found by cross-referencing the batch against a data structure that tracks only delete events (found in the delete_delta files). This should yield a large performance gain when reading ACID files in vectorized fashion, and it enables further optimizations on top.
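The delete cross-referencing described above can be sketched roughly as follows. This is a minimal illustration in plain Java, not the code in VectorizedOrcAcidRowBatchReader: the class `DeleteFilterSketch`, the `RowKey` type, and the `selectLiveRows` method are hypothetical names, and the sketch assumes the delete events fit in an in-memory set keyed by (original transaction id, bucket, row id).

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Objects;
import java.util.Set;

public class DeleteFilterSketch {

    /** Hypothetical row identifier: (original transaction id, bucket, row id). */
    static final class RowKey {
        final long originalTxn;
        final int bucket;
        final long rowId;

        RowKey(long originalTxn, int bucket, long rowId) {
            this.originalTxn = originalTxn;
            this.bucket = bucket;
            this.rowId = rowId;
        }

        @Override
        public boolean equals(Object o) {
            if (!(o instanceof RowKey)) {
                return false;
            }
            RowKey k = (RowKey) o;
            return originalTxn == k.originalTxn && bucket == k.bucket && rowId == k.rowId;
        }

        @Override
        public int hashCode() {
            return Objects.hash(originalTxn, bucket, rowId);
        }
    }

    /**
     * Takes one batch in columnar form and returns the indexes of the rows
     * that are not present in the set of delete events. No row is copied;
     * only a "selected" index array is produced.
     */
    static int[] selectLiveRows(long[] originalTxn, int[] bucket, long[] rowId,
                                int size, Set<RowKey> deleted) {
        int[] selected = new int[size];
        int live = 0;
        for (int i = 0; i < size; i++) {
            if (!deleted.contains(new RowKey(originalTxn[i], bucket[i], rowId[i]))) {
                selected[live++] = i;
            }
        }
        return Arrays.copyOf(selected, live);
    }

    public static void main(String[] args) {
        Set<RowKey> deleted = new HashSet<>();
        deleted.add(new RowKey(1L, 0, 1L)); // delete event for row id 1

        int[] live = selectLiveRows(
            new long[] {1, 1, 1}, new int[] {0, 0, 0}, new long[] {0, 1, 2}, 3, deleted);
        System.out.println(Arrays.toString(live)); // prints [0, 2]
    }
}
```

The point of the sketch is that the batch is filtered as a whole through an index array, rather than being rebuilt one row at a time.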
Diffs (updated)
---

ql/src/java/org/apache/hadoop/hive/ql/io/orc/OrcInputFormat.java 334cb31c5406f500c122a11eccef25b92d357cd4
ql/src/java/org/apache/hadoop/hive/ql/io/orc/RecordReaderImpl.java e46ca51eff9c230147166e9428d7f462d2f9e772
ql/src/java/org/apache/hadoop/hive/ql/io/orc/VectorizedOrcAcidRowBatchReader.java PRE-CREATION
ql/src/test/queries/clientpositive/acid_vectorization.q 832909bdb1bc79e01163373beed03eaaffcefd3d
ql/src/test/results/clientpositive/acid_vectorization.q.out 1792979156ec361c85882ac8b6968e93d42b5f31

Diff: https://reviews.apache.org/r/50934/diff/

Testing
---

Thanks,
Saket Saurabh
[jira] [Created] (HIVE-14523) ACID performance improvement patches
Saket Saurabh created HIVE-14523:

Summary: ACID performance improvement patches
Key: HIVE-14523
URL: https://issues.apache.org/jira/browse/HIVE-14523
Project: Hive
Issue Type: Test
Affects Versions: 2.2.0
Reporter: Saket Saurabh
Assignee: Saket Saurabh
Priority: Trivial
Attachments: HIVE-14035_14199_14233.01.patch

This is a trivial non-functional JIRA that combines the features introduced in HIVE-14035, HIVE-14199 and HIVE-14233 into a single patch.

-- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (HIVE-14514) OrcRecordUpdater should clone writerOptions when creating delete event writers
Saket Saurabh created HIVE-14514:

Summary: OrcRecordUpdater should clone writerOptions when creating delete event writers
Key: HIVE-14514
URL: https://issues.apache.org/jira/browse/HIVE-14514
Project: Hive
Issue Type: Improvement
Components: Transactions
Affects Versions: 2.2.0
Reporter: Saket Saurabh
Assignee: Eugene Koifman
Priority: Minor

When split-update is enabled for ACID, OrcRecordUpdater creates two sets of writers: one for the insert deltas and one for the delete deltas. The deleteEventWriter is initialized with the same writerOptions as the normal writer, except that it has a different callback handler. Because writerOptions has no copy constructor or clone() method, the same writerOptions object is mutated to specify a different callback for the delete case. Although this is harmless for now, it may become a source of confusion and possible errors in the future. The ideal fix would be to create a clone() method for writerOptions; however, this requires that the parent class of OrcFile.WriterOptions implement Cloneable or provide a copy constructor.
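The hazard described in this JIRA, and the copy-constructor fix it suggests, can be illustrated with a small stand-alone sketch. The `Options` class below is a hypothetical stand-in, not OrcFile.WriterOptions; its fields and the `callback(...)` setter are assumptions made only for the example.

```java
public class WriterOptionsCopySketch {

    /** Hypothetical stand-in for a mutable, builder-style options object. */
    static final class Options {
        private int bufferSize = 256 * 1024;
        private String callback = "none";

        Options() {
        }

        /** Copy constructor: the copy can be changed without touching the original. */
        Options(Options other) {
            this.bufferSize = other.bufferSize;
            this.callback = other.callback;
        }

        Options callback(String name) {
            this.callback = name;
            return this;
        }

        String getCallback() {
            return callback;
        }
    }

    public static void main(String[] args) {
        Options insertOptions = new Options().callback("insertCallback");

        // Without a copy, insertOptions.callback("deleteCallback") would silently
        // change the options that the insert writer was configured with as well.
        Options deleteOptions = new Options(insertOptions).callback("deleteCallback");

        System.out.println(insertOptions.getCallback()); // prints insertCallback
        System.out.println(deleteOptions.getCallback()); // prints deleteCallback
    }
}
```

With a copy in place, the order in which the two writers are created no longer matters, which is the source of confusion the JIRA is trying to avoid.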
Review Request 50934: HIVE-14233 Improve vectorization for ACID by eliminating row-by-row stitching
---
This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/50934/
---

Review request for hive and Eugene Koifman.

Repository: hive-git

Description
---

https://issues.apache.org/jira/browse/HIVE-14233

Diffs
---

ql/src/java/org/apache/hadoop/hive/ql/io/orc/OrcInputFormat.java 334cb31c5406f500c122a11eccef25b92d357cd4
ql/src/java/org/apache/hadoop/hive/ql/io/orc/RecordReaderImpl.java e46ca51eff9c230147166e9428d7f462d2f9e772
ql/src/java/org/apache/hadoop/hive/ql/io/orc/VectorizedOrcAcidRowBatchReader.java PRE-CREATION
ql/src/test/queries/clientpositive/acid_vectorization.q 832909bdb1bc79e01163373beed03eaaffcefd3d
ql/src/test/results/clientpositive/acid_vectorization.q.out 1792979156ec361c85882ac8b6968e93d42b5f31

Diff: https://reviews.apache.org/r/50934/diff/

Testing
---

This JIRA proposes to improve vectorization for ACID by eliminating row-by-row stitching when reading back ACID files. In the current implementation, a vectorized row batch is created by populating the batch one row at a time before the batch is passed up the operator pipeline. Row-by-row stitching was needed because the ACID insert/update/delete events from the various delta files had to be merged before the current version of a given row could be determined. HIVE-14035 removed that limitation by splitting ACID update events into a combination of delete+insert; it also made it possible to create splits on delta files.

Building on HIVE-14035, this JIRA proposes to remove this bottleneck in the vectorized code path for ACID by reading row batches directly from the underlying ORC files and avoiding stitching altogether. Once a row batch is read from the split (which may be on a base or delta file), the deleted rows are found by cross-referencing the batch against a data structure that tracks only delete events (found in the delete_delta files). This should yield a large performance gain when reading ACID files in vectorized fashion, and it enables further optimizations on top.

Thanks,
Saket Saurabh
Re: Review Request 49766: HIVE-14035 Enable predicate pushdown to delta files created by ACID Transactions
> On Aug. 4, 2016, 2:29 p.m., Sergey Shelukhin wrote:
> > ql/src/java/org/apache/hadoop/hive/ql/io/orc/OrcRecordUpdater.java, line 399
> > <https://reviews.apache.org/r/49766/diff/4/?file=1455576#file1455576line399>
> >
> > rowId is always -1 here. Intended?

@Sergey: Thanks for pointing this out. It was a bug and not intended.

- Saket

---
This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/49766/#review144816
---
Re: Review Request 49766: HIVE-14035 Enable predicate pushdown to delta files created by ACID Transactions
> On Aug. 4, 2016, 2:29 p.m., Sergey Shelukhin wrote:
> > ql/src/java/org/apache/hadoop/hive/ql/io/AcidUtils.java, line 588
> > <https://reviews.apache.org/r/49766/diff/4/?file=1455573#file1455573line588>
> >
> > this check is not done in the above "if" before adding the statementId
> > to "last". What does this check mean i.e. what do negative numbers mean?

@Sergey, this change is no longer part of this patch. It was refactored away when I merged the serializeDelta() function with serializeDeleteDelta(), although the check does exist in current master as well. A negative number means that we do not have a statement id for this delta. For example, a minor compaction produces a delta of the form delta_x_y, which has no statement id attached to it. In these cases, the statement id is -1.

- Saket

---
This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/49766/#review144816
---
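The -1 convention for deltas without a statement id can be sketched as a small directory-name parser. This is an illustration only, not the code in AcidUtils: the class `DeltaNameSketch` and the method `parseStatementId` are hypothetical, and the sketch assumes names of the form delta_x_y or delta_x_y_stmtId (delete deltas are not handled).

```java
public class DeltaNameSketch {

    /**
     * Returns the statement id encoded in a delta directory name, or -1 when
     * the name has no statement id (for example, a delta written by compaction).
     */
    static int parseStatementId(String dirName) {
        String[] parts = dirName.split("_");
        boolean validShape = parts.length == 3 || parts.length == 4;
        if (!validShape || !"delta".equals(parts[0])) {
            throw new IllegalArgumentException("Not a delta directory name: " + dirName);
        }
        // delta_x_y          -> 3 parts, no statement id
        // delta_x_y_stmtId   -> 4 parts, last part is the statement id
        return parts.length == 4 ? Integer.parseInt(parts[3]) : -1;
    }

    public static void main(String[] args) {
        System.out.println(parseStatementId("delta_0000005_0000010"));      // prints -1
        System.out.println(parseStatementId("delta_0000005_0000005_0001")); // prints 1
    }
}
```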
Re: Review Request 49766: HIVE-14035 Enable predicate pushdown to delta files created by ACID Transactions
---
This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/49766/
---

(Updated Aug. 8, 2016, 5:53 p.m.)

Review request for hive and Eugene Koifman.

Changes
---

Updated the patch to address comments/feedback from code review. The major changes are a refactoring of AcidUtils.getAcidState() and CompactorMR, fixes to the flush-length handling in OrcRecordUpdater, lazy initialization of deleteEventWriters, and other minor changes. The patch adds unit tests for schema evolution with ACID. It also adds a subclass of TestTxnCommands2 that replicates all the existing ACID test cases with split-update turned on.

Repository: hive-git

Description
---

https://issues.apache.org/jira/browse/HIVE-14035

In the current Hive version, delta files created by ACID transactions do not allow predicate pushdown if they contain any update/delete events. This restriction preserves correctness under the multi-version approach used during event collapsing, where an update event overwrites an existing insert event. This JIRA proposes to split an update event into a delete event followed by a new insert event, which enables predicate pushdown to all delta files without breaking correctness. To support backward compatibility, this JIRA also proposes adding a form of versioning to ACID so that different versions of ACID transactions can co-exist.
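The split of an update into a delete followed by an insert can be sketched as below. This is a simplified model, not the OrcRecordUpdater implementation: the `Event` class, the `Op` enum, and the `splitUpdate` method are hypothetical, and the sketch assumes the new version of the row is written under a fresh row id.

```java
import java.util.Arrays;
import java.util.List;

public class SplitUpdateSketch {

    enum Op { INSERT, DELETE }

    /** Hypothetical ACID event: an operation on a row id, with an optional payload. */
    static final class Event {
        final Op op;
        final long rowId;
        final String value;

        Event(Op op, long rowId, String value) {
            this.op = op;
            this.rowId = rowId;
            this.value = value;
        }
    }

    /**
     * Models an update as two events: a delete of the old row version and an
     * insert of the new one. With this split, insert deltas contain only
     * inserts, so a predicate can be evaluated against them directly.
     */
    static List<Event> splitUpdate(long oldRowId, long newRowId, String newValue) {
        return Arrays.asList(
            new Event(Op.DELETE, oldRowId, null),
            new Event(Op.INSERT, newRowId, newValue));
    }

    public static void main(String[] args) {
        for (Event e : splitUpdate(7L, 42L, "new value")) {
            System.out.println(e.op + " rowId=" + e.rowId + " value=" + e.value);
        }
    }
}
```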
Diffs (updated)
---

common/src/java/org/apache/hadoop/hive/conf/HiveConf.java 70816bd
hcatalog/core/src/main/java/org/apache/hive/hcatalog/mapreduce/FosterStorageHandler.java 14f7316
hcatalog/streaming/src/java/org/apache/hive/hcatalog/streaming/AbstractRecordWriter.java 974c6b8
itests/hive-unit/src/test/java/org/apache/hadoop/hive/ql/txn/compactor/TestCompactor.java 731caa8
metastore/if/hive_metastore.thrift a2e35b8
metastore/src/gen/thrift/gen-cpp/hive_metastore_constants.h ae14bd1
metastore/src/gen/thrift/gen-cpp/hive_metastore_constants.cpp f982bf2
metastore/src/gen/thrift/gen-javabean/org/apache/hadoop/hive/metastore/api/hive_metastoreConstants.java 5a666f2
metastore/src/gen/thrift/gen-php/metastore/Types.php d6f7f49
metastore/src/gen/thrift/gen-py/hive_metastore/constants.py d1c07a5
metastore/src/gen/thrift/gen-rb/hive_metastore_constants.rb eeccc84
metastore/src/java/org/apache/hadoop/hive/metastore/TransactionalValidationListener.java 3e74675
ql/src/java/org/apache/hadoop/hive/ql/exec/FetchTask.java db6848a
ql/src/java/org/apache/hadoop/hive/ql/exec/SMBMapJoinOperator.java 57b6c67
ql/src/java/org/apache/hadoop/hive/ql/exec/mr/MapredLocalTask.java 26e6443
ql/src/java/org/apache/hadoop/hive/ql/io/AcidOutputFormat.java dd90a95
ql/src/java/org/apache/hadoop/hive/ql/io/AcidUtils.java 449d889
ql/src/java/org/apache/hadoop/hive/ql/io/HiveInputFormat.java 945b828
ql/src/java/org/apache/hadoop/hive/ql/io/orc/OrcInputFormat.java 334cb31
ql/src/java/org/apache/hadoop/hive/ql/io/orc/OrcRawRecordMerger.java efde2db
ql/src/java/org/apache/hadoop/hive/ql/io/orc/OrcRecordUpdater.java 1a1af28
ql/src/java/org/apache/hadoop/hive/ql/metadata/Hive.java 8cb5e8a
ql/src/java/org/apache/hadoop/hive/ql/plan/TableScanDesc.java 8cf261d
ql/src/java/org/apache/hadoop/hive/ql/txn/compactor/CompactorMR.java 6caca98
ql/src/test/org/apache/hadoop/hive/ql/TestTxnCommands2.java af192fb
ql/src/test/org/apache/hadoop/hive/ql/TestTxnCommands3.java PRE-CREATION
ql/src/test/org/apache/hadoop/hive/ql/io/TestAcidUtils.java 556df18
ql/src/test/org/apache/hadoop/hive/ql/io/orc/TestInputOutputFormat.java 6648829

Diff: https://reviews.apache.org/r/49766/diff/

Testing
---

Tests for the feature are in ql/src/test/org/apache/hadoop/hive/ql/TestTxnCommands2.java. These are mostly integration tests that test end-to-end insert/update/delete scenarios followed by compaction and cleaning.

Thanks,
Saket Saurabh
[jira] [Created] (HIVE-14448) Queries with predicate fail when ETL split strategy is chosen for ACID tables
Saket Saurabh created HIVE-14448:

Summary: Queries with predicate fail when ETL split strategy is chosen for ACID tables
Key: HIVE-14448
URL: https://issues.apache.org/jira/browse/HIVE-14448
Project: Hive
Issue Type: Bug
Components: Transactions
Affects Versions: 2.2.0
Reporter: Saket Saurabh

When the ETL split strategy is applied to ACID tables with predicate pushdown (SARG enabled), split generation fails for ACID. This bug is usually exposed only when working with data at scale, because in most other cases only the BI split strategy is chosen. My guess is that this happens because the correct readerSchema is not being picked up when we try to extract SARG column names.

The quickest way to reproduce is to add the following unit test to ql/src/test/org/apache/hadoop/hive/ql/TestTxnCommands2.java:

{code:title=ql/src/test/org/apache/hadoop/hive/ql/TestTxnCommands2.java|borderStyle=solid}
@Test
public void testETLSplitStrategyForACID() throws Exception {
  // Force the ETL split strategy and enable predicate pushdown.
  hiveConf.setVar(HiveConf.ConfVars.HIVE_ORC_SPLIT_STRATEGY, "ETL");
  hiveConf.setBoolVar(HiveConf.ConfVars.HIVEOPTINDEXFILTER, true);
  runStatementOnDriver("insert into " + Table.ACIDTBL + " values(1,2)");
  runStatementOnDriver("alter table " + Table.ACIDTBL + " compact 'MAJOR'");
  runWorker(hiveConf);
  List<String> rs = runStatementOnDriver("select * from " + Table.ACIDTBL + " where a = 1");
  int[][] resultData = new int[][] {{1,2}};
  Assert.assertEquals(stringifyValues(resultData), rs);
}
{code}
[jira] [Created] (HIVE-14366) Conversion of a Non-ACID table to an ACID table produces non-unique primary keys
Saket Saurabh created HIVE-14366:

Summary: Conversion of a Non-ACID table to an ACID table produces non-unique primary keys
Key: HIVE-14366
URL: https://issues.apache.org/jira/browse/HIVE-14366
Project: Hive
Issue Type: Bug
Components: Transactions
Reporter: Saket Saurabh

When a Non-ACID table is converted to an ACID table, the primary key consisting of (original transaction id, bucket_id, row_id) is not generated uniquely. Currently, the row_id is set to 0 for most rows. This leads to correctness issues for such tables.

The quickest way to reproduce is to add the following unit test to ql/src/test/org/apache/hadoop/hive/ql/TestTxnCommands2.java:

{code:title=ql/src/test/org/apache/hadoop/hive/ql/TestTxnCommands2.java|borderStyle=solid}
@Test
public void testOriginalReader() throws Exception {
  FileSystem fs = FileSystem.get(hiveConf);
  FileStatus[] status;

  // 1. Insert five rows to Non-ACID table.
  runStatementOnDriver("insert into " + Table.NONACIDORCTBL + "(a,b) values(1,2),(3,4),(5,6),(7,8),(9,10)");

  // 2. Convert NONACIDORCTBL to ACID table.
  runStatementOnDriver("alter table " + Table.NONACIDORCTBL + " SET TBLPROPERTIES ('transactional'='true')");

  // 3. Perform a major compaction.
  runStatementOnDriver("alter table " + Table.NONACIDORCTBL + " compact 'MAJOR'");
  runWorker(hiveConf);

  // 4. Perform a delete.
  runStatementOnDriver("delete from " + Table.NONACIDORCTBL + " where a = 1");

  // A projection should now return (3,4),(5,6),(7,8),(9,10) only, since (1,2) has been deleted.
  List<String> rs = runStatementOnDriver("select a,b from " + Table.NONACIDORCTBL + " order by a,b");
  int[][] resultData = new int[][] {{3,4}, {5,6}, {7,8}, {9,10}};
  Assert.assertEquals(stringifyValues(resultData), rs);
}
{code}
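One way such a correctness issue could show up is sketched below: if a delete is matched by key and several rows share the same key, the delete hits all of them. This is an illustration under stated assumptions, not a trace of Hive's behavior; the class `RowKeyCollisionSketch` and its methods are hypothetical, and the transaction id and bucket are fixed to 0 for simplicity.

```java
public class RowKeyCollisionSketch {

    /** Builds a string form of the (original transaction id, bucket_id, row_id) key. */
    static String key(long originalTxn, int bucket, long rowId) {
        return originalTxn + ":" + bucket + ":" + rowId;
    }

    /** Counts how many rows a delete event targeting deletedRowId would match. */
    static int countMatches(long[] rowIds, long deletedRowId) {
        String deletedKey = key(0L, 0, deletedRowId);
        int matches = 0;
        for (long rowId : rowIds) {
            if (key(0L, 0, rowId).equals(deletedKey)) {
                matches++;
            }
        }
        return matches;
    }

    public static void main(String[] args) {
        // Unique row ids: the delete matches exactly one row.
        System.out.println(countMatches(new long[] {0, 1, 2, 3, 4}, 0L)); // prints 1

        // Every row has row_id 0: the same delete matches all five rows.
        System.out.println(countMatches(new long[] {0, 0, 0, 0, 0}, 0L)); // prints 5
    }
}
```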
Re: Review Request 49766: HIVE-14035 Enable predicate pushdown to delta files created by ACID Transactions
> On July 21, 2016, 12:42 a.m., Lefty Leverenz wrote:
> >

@Lefty, does the updated description for this config variable explain it better now?

- Saket

---
This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/49766/#review143059
---
Re: Review Request 49766: HIVE-14035 Enable predicate pushdown to delta files created by ACID Transactions
---
This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/49766/
---

(Updated July 27, 2016, 2:54 p.m.)

Review request for hive and Eugene Koifman.

Changes
---

Refactored the way delete event writers are created for the compaction case in favor of a better abstraction.

Repository: hive-git

Description
---

https://issues.apache.org/jira/browse/HIVE-14035

In the current Hive version, delta files created by ACID transactions do not allow predicate pushdown if they contain any update/delete events. This restriction preserves correctness under the multi-version approach used during event collapsing, where an update event overwrites an existing insert event. This JIRA proposes to split an update event into a delete event followed by a new insert event, which enables predicate pushdown to all delta files without breaking correctness. To support backward compatibility, this JIRA also proposes adding a form of versioning to ACID so that different versions of ACID transactions can co-exist.
Diffs (updated)
---

common/src/java/org/apache/hadoop/hive/conf/HiveConf.java e92466f
hcatalog/core/src/main/java/org/apache/hive/hcatalog/mapreduce/FosterStorageHandler.java 14f7316
hcatalog/streaming/src/java/org/apache/hive/hcatalog/streaming/AbstractRecordWriter.java 974c6b8
itests/hive-unit/src/test/java/org/apache/hadoop/hive/ql/txn/compactor/TestCompactor.java ca2a912
metastore/if/hive_metastore.thrift 4d92b73
metastore/src/gen/thrift/gen-cpp/hive_metastore_constants.h ae14bd1
metastore/src/gen/thrift/gen-cpp/hive_metastore_constants.cpp f982bf2
metastore/src/gen/thrift/gen-javabean/org/apache/hadoop/hive/metastore/api/hive_metastoreConstants.java 5a666f2
metastore/src/gen/thrift/gen-php/metastore/Types.php f505208
metastore/src/gen/thrift/gen-py/hive_metastore/constants.py d1c07a5
metastore/src/gen/thrift/gen-rb/hive_metastore_constants.rb eeccc84
metastore/src/java/org/apache/hadoop/hive/metastore/TransactionalValidationListener.java 3e74675
orc/src/java/org/apache/orc/impl/TreeReaderFactory.java c4a2093
ql/src/java/org/apache/hadoop/hive/ql/exec/FetchTask.java db6848a
ql/src/java/org/apache/hadoop/hive/ql/exec/SMBMapJoinOperator.java 57b6c67
ql/src/java/org/apache/hadoop/hive/ql/exec/mr/MapredLocalTask.java 23a13d6
ql/src/java/org/apache/hadoop/hive/ql/io/AcidOutputFormat.java dd90a95
ql/src/java/org/apache/hadoop/hive/ql/io/AcidUtils.java c150ec5
ql/src/java/org/apache/hadoop/hive/ql/io/HiveInputFormat.java 945b828
ql/src/java/org/apache/hadoop/hive/ql/io/orc/OrcInputFormat.java 63d02fb
ql/src/java/org/apache/hadoop/hive/ql/io/orc/OrcRecordUpdater.java 1a1af28
ql/src/java/org/apache/hadoop/hive/ql/metadata/Hive.java 9d927bd
ql/src/java/org/apache/hadoop/hive/ql/plan/TableScanDesc.java 8cf261d
ql/src/java/org/apache/hadoop/hive/ql/txn/compactor/CompactorMR.java 6caca98
ql/src/test/org/apache/hadoop/hive/ql/TestTxnCommands2.java d48e441
ql/src/test/org/apache/hadoop/hive/ql/io/TestAcidUtils.java b83cea4

Diff: https://reviews.apache.org/r/49766/diff/

Testing
---

Tests for the feature are in ql/src/test/org/apache/hadoop/hive/ql/TestTxnCommands2.java. These are mostly integration tests that test end-to-end insert/update/delete scenarios followed by compaction and cleaning.

Thanks,
Saket Saurabh
Re: Review Request 49766: HIVE-14035 Enable predicate pushdown to delta files created by ACID Transactions
---
This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/49766/
---

(Updated July 26, 2016, 11:30 a.m.)

Review request for hive and Eugene Koifman.

Changes
---

Added more unit tests that specifically test AcidUtils and various other compaction scenarios.

Repository: hive-git

Description
---

https://issues.apache.org/jira/browse/HIVE-14035

In the current Hive version, delta files created by ACID transactions do not allow predicate pushdown if they contain any update/delete events. This restriction preserves correctness under the multi-version approach used during event collapsing, where an update event overwrites an existing insert event. This JIRA proposes to split an update event into a delete event followed by a new insert event, which enables predicate pushdown to all delta files without breaking correctness. To support backward compatibility, this JIRA also proposes adding a form of versioning to ACID so that different versions of ACID transactions can co-exist.
Diffs (updated) - common/src/java/org/apache/hadoop/hive/conf/HiveConf.java e92466f hcatalog/core/src/main/java/org/apache/hive/hcatalog/mapreduce/FosterStorageHandler.java 14f7316 hcatalog/streaming/src/java/org/apache/hive/hcatalog/streaming/AbstractRecordWriter.java 974c6b8 itests/hive-unit/src/test/java/org/apache/hadoop/hive/ql/txn/compactor/TestCompactor.java ca2a912 metastore/if/hive_metastore.thrift 4d92b73 metastore/src/gen/thrift/gen-cpp/hive_metastore_constants.h ae14bd1 metastore/src/gen/thrift/gen-cpp/hive_metastore_constants.cpp f982bf2 metastore/src/gen/thrift/gen-javabean/org/apache/hadoop/hive/metastore/api/hive_metastoreConstants.java 5a666f2 metastore/src/gen/thrift/gen-php/metastore/Types.php f505208 metastore/src/gen/thrift/gen-py/hive_metastore/constants.py d1c07a5 metastore/src/gen/thrift/gen-rb/hive_metastore_constants.rb eeccc84 metastore/src/java/org/apache/hadoop/hive/metastore/TransactionalValidationListener.java 3e74675 orc/src/java/org/apache/orc/impl/TreeReaderFactory.java c4a2093 ql/src/java/org/apache/hadoop/hive/ql/exec/FetchTask.java db6848a ql/src/java/org/apache/hadoop/hive/ql/exec/SMBMapJoinOperator.java 57b6c67 ql/src/java/org/apache/hadoop/hive/ql/exec/mr/MapredLocalTask.java 23a13d6 ql/src/java/org/apache/hadoop/hive/ql/io/AcidUtils.java c150ec5 ql/src/java/org/apache/hadoop/hive/ql/io/HiveInputFormat.java 945b828 ql/src/java/org/apache/hadoop/hive/ql/io/orc/OrcInputFormat.java 63d02fb ql/src/java/org/apache/hadoop/hive/ql/io/orc/OrcOutputFormat.java b0f8c8b ql/src/java/org/apache/hadoop/hive/ql/io/orc/OrcRecordUpdater.java 1a1af28 ql/src/java/org/apache/hadoop/hive/ql/metadata/Hive.java 9d927bd ql/src/java/org/apache/hadoop/hive/ql/plan/TableScanDesc.java 8cf261d ql/src/java/org/apache/hadoop/hive/ql/txn/compactor/CompactorMR.java 6caca98 ql/src/test/org/apache/hadoop/hive/ql/TestTxnCommands2.java d48e441 ql/src/test/org/apache/hadoop/hive/ql/io/TestAcidUtils.java b83cea4 Diff: https://reviews.apache.org/r/49766/diff/ 
Testing
---
Tests for the feature are in ql/src/test/org/apache/hadoop/hive/ql/TestTxnCommands2.java. These are mostly integration tests that test end-to-end insert/update/delete scenarios followed by compaction and cleaning.

Thanks,

Saket Saurabh
Re: Review Request 49766: HIVE-14035 Enable predicate pushdown to delta files created by ACID Transactions
(Updated July 20, 2016, 3:55 p.m.)

Review request for hive and Eugene Koifman.

Changes
---
Updated the patch by rebasing on master. No additional code changes. Same as Patch #10 at https://issues.apache.org/jira/browse/HIVE-14035

Repository: hive-git

Description
---
https://issues.apache.org/jira/browse/HIVE-14035

In the current Hive version, delta files created by ACID transactions do not allow predicate pushdown if they contain any update/delete events. This restriction preserves correctness under the multi-version approach used during event collapsing, where an update event overwrites an existing insert event. This JIRA proposes to split an update event into a delete event followed by a new insert event, which enables predicate pushdown to all delta files without breaking correctness. For backward compatibility, this JIRA also proposes to add versioning to ACID so that different versions of ACID transactions can coexist.
Diffs (updated)
---
  common/src/java/org/apache/hadoop/hive/conf/HiveConf.java 66203a5
  hcatalog/core/src/main/java/org/apache/hive/hcatalog/mapreduce/FosterStorageHandler.java 14f7316
  hcatalog/streaming/src/java/org/apache/hive/hcatalog/streaming/AbstractRecordWriter.java 974c6b8
  metastore/if/hive_metastore.thrift 4d92b73
  metastore/src/gen/thrift/gen-cpp/hive_metastore_constants.h ae14bd1
  metastore/src/gen/thrift/gen-cpp/hive_metastore_constants.cpp f982bf2
  metastore/src/gen/thrift/gen-javabean/org/apache/hadoop/hive/metastore/api/hive_metastoreConstants.java 5a666f2
  metastore/src/gen/thrift/gen-php/metastore/Types.php f505208
  metastore/src/gen/thrift/gen-py/hive_metastore/constants.py d1c07a5
  metastore/src/gen/thrift/gen-rb/hive_metastore_constants.rb eeccc84
  metastore/src/java/org/apache/hadoop/hive/metastore/TransactionalValidationListener.java 3e74675
  orc/src/java/org/apache/orc/impl/TreeReaderFactory.java c4a2093
  ql/src/java/org/apache/hadoop/hive/ql/exec/FetchTask.java db6848a
  ql/src/java/org/apache/hadoop/hive/ql/exec/SMBMapJoinOperator.java 57b6c67
  ql/src/java/org/apache/hadoop/hive/ql/exec/mr/MapredLocalTask.java 23a13d6
  ql/src/java/org/apache/hadoop/hive/ql/io/AcidUtils.java c150ec5
  ql/src/java/org/apache/hadoop/hive/ql/io/HiveInputFormat.java 945b828
  ql/src/java/org/apache/hadoop/hive/ql/io/orc/OrcInputFormat.java 69d58d6
  ql/src/java/org/apache/hadoop/hive/ql/io/orc/OrcOutputFormat.java b0f8c8b
  ql/src/java/org/apache/hadoop/hive/ql/io/orc/OrcRecordUpdater.java e577961
  ql/src/java/org/apache/hadoop/hive/ql/metadata/Hive.java ef0bb3d
  ql/src/java/org/apache/hadoop/hive/ql/plan/TableScanDesc.java 8cf261d
  ql/src/java/org/apache/hadoop/hive/ql/txn/compactor/CompactorMR.java 6caca98
  ql/src/test/org/apache/hadoop/hive/ql/TestTxnCommands2.java d48e441
  ql/src/test/org/apache/hadoop/hive/ql/io/TestAcidUtils.java b83cea4

Diff: https://reviews.apache.org/r/49766/diff/

Testing
---
Tests for the feature are in ql/src/test/org/apache/hadoop/hive/ql/TestTxnCommands2.java. These are mostly integration tests that test end-to-end insert/update/delete scenarios followed by compaction and cleaning.

Thanks,

Saket Saurabh
[jira] [Created] (HIVE-14233) Improve vectorization for ACID by eliminating row-by-row stitching
Saket Saurabh created HIVE-14233:

Summary: Improve vectorization for ACID by eliminating row-by-row stitching
Key: HIVE-14233
URL: https://issues.apache.org/jira/browse/HIVE-14233
Project: Hive
Issue Type: New Feature
Components: Transactions, Vectorization
Reporter: Saket Saurabh
Assignee: Saket Saurabh

This JIRA proposes to improve vectorization for ACID by eliminating row-by-row stitching when reading back ACID files. In the current implementation, a vectorized row batch is populated one row at a time before it is passed up the operator pipeline. This row-by-row stitching was necessary because the ACID insert/update/delete events from the various delta files had to be merged before the current version of a given row could be determined.

HIVE-14035 has enabled us to break away from that limitation by splitting ACID update events into a combination of delete+insert. In fact, it has now enabled us to create splits on delta files. Building on top of HIVE-14035, this JIRA proposes to remove this bottleneck in the vectorized code path for ACID by reading row batches directly from the underlying ORC files and avoiding any stitching altogether. Once a row batch is read from the split (which may be on a base/delta file), the deleted rows will be found by cross-referencing them against a data structure that keeps track only of delete events (found in the deleted_delta files). This is expected to yield a large performance gain when reading ACID files in vectorized fashion, while enabling further optimizations on top of it in the future.

--
This message was sent by Atlassian JIRA (v6.3.4#6332)
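The cross-referencing step described in HIVE-14233 can be illustrated with a minimal Java sketch. It is not the patch's implementation: the class and method names are hypothetical, the row key of (original transaction, bucket, row id) is an assumption, and a plain hash set stands in for whatever structure ends up tracking delete events. The point is only that a whole batch is read first and deleted rows are then filtered out by position, with no per-row merging across files.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Sketch only: none of these names come from the Hive patch.
final class DeleteEventFilterSketch {

    // Hypothetical row key: (original transaction id, bucket, row id).
    static List<Long> key(long originalTxn, long bucket, long rowId) {
        return Arrays.asList(originalTxn, bucket, rowId);
    }

    // Returns the positions of rows in the batch that have no delete event.
    // The three arrays stand in for the ACID metadata columns of one batch.
    static int[] selectLiveRows(long[] txns, long[] buckets, long[] rowIds,
                                int size, Set<List<Long>> deleted) {
        int[] selected = new int[size];
        int live = 0;
        for (int i = 0; i < size; i++) {
            if (!deleted.contains(key(txns[i], buckets[i], rowIds[i]))) {
                selected[live++] = i;
            }
        }
        return Arrays.copyOf(selected, live);
    }

    public static void main(String[] args) {
        Set<List<Long>> deleted = new HashSet<>();
        deleted.add(key(1L, 0L, 1L)); // one delete event, loaded up front

        long[] txns = {1L, 1L, 2L};
        long[] buckets = {0L, 0L, 0L};
        long[] rowIds = {0L, 1L, 0L};
        int[] live = selectLiveRows(txns, buckets, rowIds, 3, deleted);
        System.out.println(Arrays.toString(live)); // prints [0, 2]
    }
}
```

Returning an array of surviving positions, rather than copying rows, keeps the batch's column data untouched; a reader could hand that array to downstream operators as a selection vector.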
[jira] [Created] (HIVE-14199) Enable Bucket Pruning for ACID tables
Saket Saurabh created HIVE-14199:

Summary: Enable Bucket Pruning for ACID tables
Key: HIVE-14199
URL: https://issues.apache.org/jira/browse/HIVE-14199
Project: Hive
Issue Type: Improvement
Components: Transactions
Reporter: Saket Saurabh
Assignee: Saket Saurabh

Currently, ACID tables do not benefit from the bucket pruning feature introduced in HIVE-11525. The reason is that bucket pruning happens at the split generation level, and for ACID the delta files were traditionally never split. Parallelism for ACID was therefore restricted to the number of buckets: there were as many splits as buckets, and each worker processing a split inevitably read all the delta files for that bucket, even when the query required only one of the buckets to be read.

However, HIVE-14035 now allows the delta files to be split as well. This means that enough information is now available at the split generation level to determine which buckets to process for the delta files. Unnecessary buckets can then be pruned for delta files efficiently, which should yield a good performance gain for the many selective queries on ACID tables.
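As a rough illustration of pruning at split generation, the Java sketch below keeps only the splits whose bucket matches the one implied by an equality predicate. Everything in it is an assumption made for the example: the `Split` class, the file paths, and the bucketing rule (non-negative hash of the key modulo the bucket count) are placeholders, not the code or naming used by Hive.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Sketch only: none of these names come from Hive.
final class BucketPruningSketch {

    // Hypothetical split descriptor: a file path plus the bucket it belongs to.
    static final class Split {
        final String path;
        final int bucket;
        Split(String path, int bucket) { this.path = path; this.bucket = bucket; }
    }

    // Assumed bucketing rule: non-negative hash of the key modulo bucket count.
    static int bucketFor(int key, int numBuckets) {
        return (Integer.hashCode(key) & Integer.MAX_VALUE) % numBuckets;
    }

    // Keeps only the splits, base or delta, that belong to the wanted bucket.
    static List<Split> prune(List<Split> splits, int wantedBucket) {
        List<Split> kept = new ArrayList<>();
        for (Split s : splits) {
            if (s.bucket == wantedBucket) {
                kept.add(s);
            }
        }
        return kept;
    }

    public static void main(String[] args) {
        List<Split> splits = Arrays.asList(
            new Split("base/bucket_0", 0),
            new Split("base/bucket_3", 3),
            new Split("delta_a/bucket_0", 0),
            new Split("delta_a/bucket_3", 3));
        int wanted = bucketFor(7, 4); // predicate: key = 7, table has 4 buckets
        for (Split s : prune(splits, wanted)) {
            System.out.println(s.path);
        }
    }
}
```

Because delta files appear in the split list alongside base files, the same filter applies to both, which is the property the JIRA relies on.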
Re: Review Request 49766: HIVE-14035 Enable predicate pushdown to delta files created by ACID Transactions
(Updated July 7, 2016, 10:26 a.m.)

Review request for hive and Eugene Koifman.

Repository: hive-git

Description
---
https://issues.apache.org/jira/browse/HIVE-14035

In the current Hive version, delta files created by ACID transactions do not allow predicate pushdown if they contain any update/delete events. This restriction preserves correctness under the multi-version approach used during event collapsing, where an update event overwrites an existing insert event. This JIRA proposes to split an update event into a delete event followed by a new insert event, which enables predicate pushdown to all delta files without breaking correctness. For backward compatibility, this JIRA also proposes to add versioning to ACID so that different versions of ACID transactions can coexist.

Diffs
---
  common/src/java/org/apache/hadoop/hive/conf/HiveConf.java b13fc65
  hcatalog/core/src/main/java/org/apache/hive/hcatalog/mapreduce/FosterStorageHandler.java 14f7316
  hcatalog/streaming/src/java/org/apache/hive/hcatalog/streaming/AbstractRecordWriter.java 0c6b9ea
  metastore/if/hive_metastore.thrift 4d92b73
  metastore/src/gen/thrift/gen-cpp/hive_metastore_constants.h ae14bd1
  metastore/src/gen/thrift/gen-cpp/hive_metastore_constants.cpp f982bf2
  metastore/src/gen/thrift/gen-javabean/org/apache/hadoop/hive/metastore/api/hive_metastoreConstants.java 5a666f2
  metastore/src/gen/thrift/gen-php/metastore/Types.php f505208
  metastore/src/gen/thrift/gen-py/hive_metastore/constants.py d1c07a5
  metastore/src/gen/thrift/gen-rb/hive_metastore_constants.rb eeccc84
  metastore/src/java/org/apache/hadoop/hive/metastore/TransactionalValidationListener.java 3e74675
  orc/src/java/org/apache/orc/impl/TreeReaderFactory.java c4a2093
  ql/src/java/org/apache/hadoop/hive/ql/exec/FetchTask.java dff1815
  ql/src/java/org/apache/hadoop/hive/ql/exec/SMBMapJoinOperator.java 57b6c67
  ql/src/java/org/apache/hadoop/hive/ql/exec/mr/MapredLocalTask.java 23a13d6
  ql/src/java/org/apache/hadoop/hive/ql/io/AcidUtils.java 36f38f6
  ql/src/java/org/apache/hadoop/hive/ql/io/HiveInputFormat.java 227a051
  ql/src/java/org/apache/hadoop/hive/ql/io/orc/OrcInputFormat.java 69d58d6
  ql/src/java/org/apache/hadoop/hive/ql/io/orc/OrcOutputFormat.java b0f8c8b
  ql/src/java/org/apache/hadoop/hive/ql/io/orc/OrcRecordUpdater.java e577961
  ql/src/java/org/apache/hadoop/hive/ql/metadata/Hive.java 82abd52
  ql/src/java/org/apache/hadoop/hive/ql/plan/TableScanDesc.java 8cf261d
  ql/src/java/org/apache/hadoop/hive/ql/txn/compactor/CompactorMR.java 6caca98
  ql/src/test/org/apache/hadoop/hive/ql/TestTxnCommands2.java e76c925
  ql/src/test/org/apache/hadoop/hive/ql/io/TestAcidUtils.java 5745dee

Diff: https://reviews.apache.org/r/49766/diff/

Testing (updated)
---
Tests for the feature are in ql/src/test/org/apache/hadoop/hive/ql/TestTxnCommands2.java. These are mostly integration tests that test end-to-end insert/update/delete scenarios followed by compaction and cleaning.

Thanks,

Saket Saurabh
Re: Review Request 49766: HIVE-14035 Enable predicate pushdown to delta files created by ACID Transactions
(Updated July 7, 2016, 10:09 a.m.)

Review request for hive and Eugene Koifman.

Repository: hive-git

Description
---
https://issues.apache.org/jira/browse/HIVE-14035

In the current Hive version, delta files created by ACID transactions do not allow predicate pushdown if they contain any update/delete events. This restriction preserves correctness under the multi-version approach used during event collapsing, where an update event overwrites an existing insert event. This JIRA proposes to split an update event into a delete event followed by a new insert event, which enables predicate pushdown to all delta files without breaking correctness. For backward compatibility, this JIRA also proposes to add versioning to ACID so that different versions of ACID transactions can coexist.

Diffs
---
  common/src/java/org/apache/hadoop/hive/conf/HiveConf.java b13fc65
  hcatalog/core/src/main/java/org/apache/hive/hcatalog/mapreduce/FosterStorageHandler.java 14f7316
  hcatalog/streaming/src/java/org/apache/hive/hcatalog/streaming/AbstractRecordWriter.java 0c6b9ea
  metastore/if/hive_metastore.thrift 4d92b73
  metastore/src/gen/thrift/gen-cpp/hive_metastore_constants.h ae14bd1
  metastore/src/gen/thrift/gen-cpp/hive_metastore_constants.cpp f982bf2
  metastore/src/gen/thrift/gen-javabean/org/apache/hadoop/hive/metastore/api/hive_metastoreConstants.java 5a666f2
  metastore/src/gen/thrift/gen-php/metastore/Types.php f505208
  metastore/src/gen/thrift/gen-py/hive_metastore/constants.py d1c07a5
  metastore/src/gen/thrift/gen-rb/hive_metastore_constants.rb eeccc84
  metastore/src/java/org/apache/hadoop/hive/metastore/TransactionalValidationListener.java 3e74675
  orc/src/java/org/apache/orc/impl/TreeReaderFactory.java c4a2093
  ql/src/java/org/apache/hadoop/hive/ql/exec/FetchTask.java dff1815
  ql/src/java/org/apache/hadoop/hive/ql/exec/SMBMapJoinOperator.java 57b6c67
  ql/src/java/org/apache/hadoop/hive/ql/exec/mr/MapredLocalTask.java 23a13d6
  ql/src/java/org/apache/hadoop/hive/ql/io/AcidUtils.java 36f38f6
  ql/src/java/org/apache/hadoop/hive/ql/io/HiveInputFormat.java 227a051
  ql/src/java/org/apache/hadoop/hive/ql/io/orc/OrcInputFormat.java 69d58d6
  ql/src/java/org/apache/hadoop/hive/ql/io/orc/OrcOutputFormat.java b0f8c8b
  ql/src/java/org/apache/hadoop/hive/ql/io/orc/OrcRecordUpdater.java e577961
  ql/src/java/org/apache/hadoop/hive/ql/metadata/Hive.java 82abd52
  ql/src/java/org/apache/hadoop/hive/ql/plan/TableScanDesc.java 8cf261d
  ql/src/java/org/apache/hadoop/hive/ql/txn/compactor/CompactorMR.java 6caca98
  ql/src/test/org/apache/hadoop/hive/ql/TestTxnCommands2.java e76c925
  ql/src/test/org/apache/hadoop/hive/ql/io/TestAcidUtils.java 5745dee

Diff: https://reviews.apache.org/r/49766/diff/

Testing
---

Thanks,

Saket Saurabh
[jira] [Created] (HIVE-14035) Enable predicate pushdown to delta files created by ACID Transactions
Saket Saurabh created HIVE-14035:

Summary: Enable predicate pushdown to delta files created by ACID Transactions
Key: HIVE-14035
URL: https://issues.apache.org/jira/browse/HIVE-14035
Project: Hive
Issue Type: New Feature
Components: Transactions
Reporter: Saket Saurabh
Priority: Minor

In the current Hive version, delta files created by ACID transactions do not allow predicate pushdown if they contain any update/delete events. This restriction preserves correctness under the multi-version approach used during event collapsing, where an update event overwrites an existing insert event. This JIRA proposes to split an update event into a delete event followed by a new insert event, which enables predicate pushdown to all delta files without breaking correctness. For backward compatibility, this JIRA also proposes to add versioning to ACID so that different versions of ACID transactions can coexist.
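The proposed split of an update into delete+insert can be sketched in Java as follows. This is an illustration under assumptions, not the patch: the `Event` class, its fields, and the idea that the new insert is keyed by the updating transaction and a fresh row id are all hypothetical choices made for the example.

```java
import java.util.Arrays;
import java.util.List;

// Sketch only: none of these names come from the Hive patch.
final class SplitUpdateSketch {

    enum Op { INSERT, DELETE }

    // Hypothetical event record: operation, identifying transaction and row id,
    // and the row payload (null for a delete, which carries no new data).
    static final class Event {
        final Op op;
        final long txn;
        final long rowId;
        final String row;
        Event(Op op, long txn, long rowId, String row) {
            this.op = op;
            this.txn = txn;
            this.rowId = rowId;
            this.row = row;
        }
        @Override
        public String toString() {
            return op + "(txn=" + txn + ", rowId=" + rowId + ", row=" + row + ")";
        }
    }

    // Rewrites "update row (originalTxn, rowId) to newRow" as two events:
    // a delete of the old row identity, then an insert under the current txn.
    static List<Event> splitUpdate(long originalTxn, long rowId,
                                   long currentTxn, long newRowId, String newRow) {
        return Arrays.asList(
            new Event(Op.DELETE, originalTxn, rowId, null),
            new Event(Op.INSERT, currentTxn, newRowId, newRow));
    }

    public static void main(String[] args) {
        for (Event e : splitUpdate(1L, 5L, 9L, 0L, "name=bob")) {
            System.out.println(e);
        }
    }
}
```

With updates expressed this way, every event that carries row data is an insert, so a filter pushed down to a delta file only ever sees complete rows; deletes are resolved separately by row identity.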