Re: Review Request 50934: HIVE-14233 Improve vectorization for ACID by eliminating row-by-row stitching

2016-08-11 Thread Saket Saurabh

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/50934/
---

(Updated Aug. 11, 2016, 4:36 p.m.)


Review request for hive and Eugene Koifman.


Repository: hive-git


Description
---

https://issues.apache.org/jira/browse/HIVE-14233
This JIRA proposes to improve vectorization for ACID by eliminating row-by-row
stitching when reading back ACID files. In the current implementation, a
vectorized row batch is populated one row at a time before it is passed up
the operator pipeline. Row-by-row stitching was needed because the ACID
insert/update/delete events from the various delta files had to be merged
before the current version of a given row could be determined. HIVE-14035
removes that limitation by splitting each ACID update event into a
delete+insert pair, which also makes it possible to create splits on delta
files.
Building on HIVE-14035, this JIRA proposes to remove this bottleneck in the
vectorized code path for ACID by reading row batches directly from the
underlying ORC files and avoiding stitching altogether. Once a row batch is
read from the split (which may be on a base or delta file), deleted rows are
identified by cross-referencing the batch against a data structure that
tracks only the delete events (found in the delete_delta files). This should
yield a large performance gain when reading ACID files in vectorized fashion
and opens the way for further optimizations on top of it.
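
The cross-referencing step described above can be sketched roughly as follows.
This is only an illustration under stated assumptions, not the code in the
patch: RowKey, selectLiveRows and the HashSet of delete events are hypothetical
names, and a real reader would work on ORC column vectors rather than on an
array of key objects.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Objects;
import java.util.Set;

// Illustrative sketch only; none of these names come from the Hive patch.
class DeleteFilterSketch {

    /** Hypothetical row key: (original transaction id, bucket, row id). */
    static final class RowKey {
        final long originalTxn;
        final int bucket;
        final long rowId;

        RowKey(long originalTxn, int bucket, long rowId) {
            this.originalTxn = originalTxn;
            this.bucket = bucket;
            this.rowId = rowId;
        }

        @Override
        public boolean equals(Object o) {
            if (this == o) return true;
            if (!(o instanceof RowKey)) return false;
            RowKey k = (RowKey) o;
            return originalTxn == k.originalTxn && bucket == k.bucket && rowId == k.rowId;
        }

        @Override
        public int hashCode() {
            return Objects.hash(originalTxn, bucket, rowId);
        }
    }

    /**
     * Returns the positions of the batch rows whose key is absent from the
     * set of delete events. Row data is never copied or stitched; only a
     * "selected" index array is produced for the whole batch.
     */
    static int[] selectLiveRows(RowKey[] batchKeys, Set<RowKey> deletedKeys) {
        int[] selected = new int[batchKeys.length];
        int live = 0;
        for (int i = 0; i < batchKeys.length; i++) {
            if (!deletedKeys.contains(batchKeys[i])) {
                selected[live++] = i;
            }
        }
        return Arrays.copyOf(selected, live);
    }

    public static void main(String[] args) {
        RowKey[] batch = {
            new RowKey(1, 0, 0), new RowKey(1, 0, 1), new RowKey(1, 0, 2), new RowKey(2, 0, 0)
        };
        Set<RowKey> deleted = new HashSet<>();
        deleted.add(new RowKey(1, 0, 1)); // one delete event
        System.out.println(Arrays.toString(selectLiveRows(batch, deleted))); // prints [0, 2, 3]
    }
}
```

The point of the sketch is that the batch is filtered as a whole against the
delete events, instead of being assembled one merged row at a time.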


Diffs (updated)
---

  ql/src/java/org/apache/hadoop/hive/ql/io/orc/OrcInputFormat.java 334cb31c5406f500c122a11eccef25b92d357cd4
  ql/src/java/org/apache/hadoop/hive/ql/io/orc/RecordReaderImpl.java e46ca51eff9c230147166e9428d7f462d2f9e772
  ql/src/java/org/apache/hadoop/hive/ql/io/orc/VectorizedOrcAcidRowBatchReader.java PRE-CREATION
  ql/src/test/queries/clientpositive/acid_vectorization.q 832909bdb1bc79e01163373beed03eaaffcefd3d
  ql/src/test/results/clientpositive/acid_vectorization.q.out 1792979156ec361c85882ac8b6968e93d42b5f31

Diff: https://reviews.apache.org/r/50934/diff/


Testing
---


Thanks,

Saket Saurabh



[jira] [Created] (HIVE-14523) ACID performance improvement patches

2016-08-11 Thread Saket Saurabh (JIRA)
Saket Saurabh created HIVE-14523:


 Summary: ACID performance improvement patches
 Key: HIVE-14523
 URL: https://issues.apache.org/jira/browse/HIVE-14523
 Project: Hive
  Issue Type: Test
Affects Versions: 2.2.0
Reporter: Saket Saurabh
Assignee: Saket Saurabh
Priority: Trivial
 Attachments: HIVE-14035_14199_14233.01.patch

This is a trivial non-functional JIRA that combines the features introduced in
HIVE-14035, HIVE-14199 and HIVE-14233 into a single patch.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (HIVE-14514) OrcRecordUpdater should clone writerOptions when creating delete event writers

2016-08-10 Thread Saket Saurabh (JIRA)
Saket Saurabh created HIVE-14514:


 Summary: OrcRecordUpdater should clone writerOptions when creating 
delete event writers
 Key: HIVE-14514
 URL: https://issues.apache.org/jira/browse/HIVE-14514
 Project: Hive
  Issue Type: Improvement
  Components: Transactions
Affects Versions: 2.2.0
Reporter: Saket Saurabh
Assignee: Eugene Koifman
Priority: Minor


When split-update is enabled for ACID, OrcRecordUpdater creates two sets of
writers: one for the insert deltas and one for the delete deltas. The
deleteEventWriter is initialized with the same writerOptions as the normal
writer, except that it uses a different callback handler. Because
writerOptions has no copy constructor or clone() method, the same
writerOptions object is mutated to set a different callback for the delete
case. Although this is harmless for now, it may become a source of confusion
and errors in the future. The ideal fix would be to add a clone() method to
writerOptions; however, this requires that the parent class of
OrcFile.WriterOptions implement Cloneable or provide a copy constructor.
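
The difference between mutating the shared options object and copying it first
can be sketched as below. This is a minimal illustration with hypothetical
stand-in classes (Options, Callback); it is not the real OrcFile.WriterOptions
API, which at the time of this JIRA offers no such copy constructor.

```java
// Illustrative sketch only; Options and Callback are hypothetical stand-ins,
// not the real OrcFile.WriterOptions or its callback interface.
class WriterOptionsCopySketch {

    /** Hypothetical callback handler. */
    interface Callback {
        String name();
    }

    /** Hypothetical, simplified writer options with a copy constructor. */
    static final class Options {
        private long stripeSize = 64L * 1024 * 1024;
        private Callback callback;

        Options() {
        }

        /** Copy constructor: the copy can be changed without touching the original. */
        Options(Options other) {
            this.stripeSize = other.stripeSize;
            this.callback = other.callback;
        }

        Options stripeSize(long value) {
            this.stripeSize = value;
            return this;
        }

        Options callback(Callback value) {
            this.callback = value;
            return this;
        }

        long getStripeSize() {
            return stripeSize;
        }

        Callback getCallback() {
            return callback;
        }
    }

    public static void main(String[] args) {
        Options insertOptions = new Options().stripeSize(1024).callback(() -> "insert");

        // Copy first, then override only the callback for the delete writer.
        Options deleteOptions = new Options(insertOptions).callback(() -> "delete");

        System.out.println(insertOptions.getCallback().name()); // prints insert
        System.out.println(deleteOptions.getCallback().name()); // prints delete
        System.out.println(deleteOptions.getStripeSize());      // prints 1024
    }
}
```

With a copy, the insert writer's options keep their original callback no matter
what is later configured for the delete writer.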





Review Request 50934: HIVE-14233 Improve vectorization for ACID by eliminating row-by-row stitching

2016-08-09 Thread Saket Saurabh

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/50934/
---

Review request for hive and Eugene Koifman.


Repository: hive-git


Description
---

https://issues.apache.org/jira/browse/HIVE-14233


Diffs
---

  ql/src/java/org/apache/hadoop/hive/ql/io/orc/OrcInputFormat.java 334cb31c5406f500c122a11eccef25b92d357cd4
  ql/src/java/org/apache/hadoop/hive/ql/io/orc/RecordReaderImpl.java e46ca51eff9c230147166e9428d7f462d2f9e772
  ql/src/java/org/apache/hadoop/hive/ql/io/orc/VectorizedOrcAcidRowBatchReader.java PRE-CREATION
  ql/src/test/queries/clientpositive/acid_vectorization.q 832909bdb1bc79e01163373beed03eaaffcefd3d
  ql/src/test/results/clientpositive/acid_vectorization.q.out 1792979156ec361c85882ac8b6968e93d42b5f31

Diff: https://reviews.apache.org/r/50934/diff/


Testing
---

This JIRA proposes to improve vectorization for ACID by eliminating row-by-row
stitching when reading back ACID files. In the current implementation, a
vectorized row batch is populated one row at a time before it is passed up
the operator pipeline. Row-by-row stitching was needed because the ACID
insert/update/delete events from the various delta files had to be merged
before the current version of a given row could be determined. HIVE-14035
removes that limitation by splitting each ACID update event into a
delete+insert pair, which also makes it possible to create splits on delta
files.
Building on HIVE-14035, this JIRA proposes to remove this bottleneck in the
vectorized code path for ACID by reading row batches directly from the
underlying ORC files and avoiding stitching altogether. Once a row batch is
read from the split (which may be on a base or delta file), deleted rows are
identified by cross-referencing the batch against a data structure that
tracks only the delete events (found in the delete_delta files). This should
yield a large performance gain when reading ACID files in vectorized fashion
and opens the way for further optimizations on top of it.


Thanks,

Saket Saurabh



Re: Review Request 49766: HIVE-14035 Enable predicate pushdown to delta files created by ACID Transactions

2016-08-08 Thread Saket Saurabh


> On Aug. 4, 2016, 2:29 p.m., Sergey Shelukhin wrote:
> > ql/src/java/org/apache/hadoop/hive/ql/io/orc/OrcRecordUpdater.java, line 399
> > <https://reviews.apache.org/r/49766/diff/4/?file=1455576#file1455576line399>
> >
> > rowId is always -1 here. Intended?

@Sergey: Thanks for pointing this out. It was a bug and not intended.


- Saket


---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/49766/#review144816
---


On Aug. 8, 2016, 5:53 p.m., Saket Saurabh wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/49766/
> ---
> 
> (Updated Aug. 8, 2016, 5:53 p.m.)
> 
> 
> Review request for hive and Eugene Koifman.
> 
> 
> Repository: hive-git
> 
> 
> Description
> ---
> 
> https://issues.apache.org/jira/browse/HIVE-14035
> 
> In current Hive version, delta files created by ACID transactions do not 
> allow predicate pushdown if they contain any update/delete events. This is 
> done to preserve correctness when following a multi-version approach during 
> event collapsing, where an update event overwrites an existing insert event. 
> This JIRA proposes to split an update event into a combination of a delete 
> event followed by a new insert event, that can enable predicate push down to 
> all delta files without breaking correctness. To support backward 
> compatibility for this feature, this JIRA also proposes to add some sort of 
> versioning to ACID that can allow different versions of ACID transactions to 
> co-exist together.
> 
> 
> Diffs
> -
> 
>   common/src/java/org/apache/hadoop/hive/conf/HiveConf.java 70816bd 
>   
> hcatalog/core/src/main/java/org/apache/hive/hcatalog/mapreduce/FosterStorageHandler.java
>  14f7316 
>   
> hcatalog/streaming/src/java/org/apache/hive/hcatalog/streaming/AbstractRecordWriter.java
>  974c6b8 
>   
> itests/hive-unit/src/test/java/org/apache/hadoop/hive/ql/txn/compactor/TestCompactor.java
>  731caa8 
>   metastore/if/hive_metastore.thrift a2e35b8 
>   metastore/src/gen/thrift/gen-cpp/hive_metastore_constants.h ae14bd1 
>   metastore/src/gen/thrift/gen-cpp/hive_metastore_constants.cpp f982bf2 
>   
> metastore/src/gen/thrift/gen-javabean/org/apache/hadoop/hive/metastore/api/hive_metastoreConstants.java
>  5a666f2 
>   metastore/src/gen/thrift/gen-php/metastore/Types.php d6f7f49 
>   metastore/src/gen/thrift/gen-py/hive_metastore/constants.py d1c07a5 
>   metastore/src/gen/thrift/gen-rb/hive_metastore_constants.rb eeccc84 
>   
> metastore/src/java/org/apache/hadoop/hive/metastore/TransactionalValidationListener.java
>  3e74675 
>   ql/src/java/org/apache/hadoop/hive/ql/exec/FetchTask.java db6848a 
>   ql/src/java/org/apache/hadoop/hive/ql/exec/SMBMapJoinOperator.java 57b6c67 
>   ql/src/java/org/apache/hadoop/hive/ql/exec/mr/MapredLocalTask.java 26e6443 
>   ql/src/java/org/apache/hadoop/hive/ql/io/AcidOutputFormat.java dd90a95 
>   ql/src/java/org/apache/hadoop/hive/ql/io/AcidUtils.java 449d889 
>   ql/src/java/org/apache/hadoop/hive/ql/io/HiveInputFormat.java 945b828 
>   ql/src/java/org/apache/hadoop/hive/ql/io/orc/OrcInputFormat.java 334cb31 
>   ql/src/java/org/apache/hadoop/hive/ql/io/orc/OrcRawRecordMerger.java 
> efde2db 
>   ql/src/java/org/apache/hadoop/hive/ql/io/orc/OrcRecordUpdater.java 1a1af28 
>   ql/src/java/org/apache/hadoop/hive/ql/metadata/Hive.java 8cb5e8a 
>   ql/src/java/org/apache/hadoop/hive/ql/plan/TableScanDesc.java 8cf261d 
>   ql/src/java/org/apache/hadoop/hive/ql/txn/compactor/CompactorMR.java 
> 6caca98 
>   ql/src/test/org/apache/hadoop/hive/ql/TestTxnCommands2.java af192fb 
>   ql/src/test/org/apache/hadoop/hive/ql/TestTxnCommands3.java PRE-CREATION 
>   ql/src/test/org/apache/hadoop/hive/ql/io/TestAcidUtils.java 556df18 
>   ql/src/test/org/apache/hadoop/hive/ql/io/orc/TestInputOutputFormat.java 
> 6648829 
> 
> Diff: https://reviews.apache.org/r/49766/diff/
> 
> 
> Testing
> ---
> 
> Tests for the feature are in 
> ql/src/test/org/apache/hadoop/hive/ql/TestTxnCommands2.java. These are mostly 
> integration tests that test end-to-end insert/update/delete scenarios 
> followed by compaction and cleaning.
> 
> 
> Thanks,
> 
> Saket Saurabh
> 
>



Re: Review Request 49766: HIVE-14035 Enable predicate pushdown to delta files created by ACID Transactions

2016-08-08 Thread Saket Saurabh


> On Aug. 4, 2016, 2:29 p.m., Sergey Shelukhin wrote:
> > ql/src/java/org/apache/hadoop/hive/ql/io/AcidUtils.java, line 588
> > <https://reviews.apache.org/r/49766/diff/4/?file=1455573#file1455573line588>
> >
> > this check is not done in the above "if" before adding the statementId 
> > to "last". What does this check mean i.e. what do negative numbers mean?

@Sergey, this change is no longer part of this patch. It was refactored away
when I merged the serializeDelta() function with serializeDeleteDelta(),
although the check still exists in current master. A negative number means
that there is no statement id for this delta. For example, a minor compaction
produces a delta of the form delta_x_y, which has no statement id attached to
it. In such cases the statement id is -1.
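
As a rough illustration of the convention described above, the sketch below
reads the statement id from a delta directory name and falls back to -1 when
the name has no statement id suffix. The helper statementIdOf and the exact
name layout (delta_x_y versus delta_x_y_stmt) are assumptions made for this
example; this is not the parsing code in AcidUtils.

```java
// Illustrative sketch only; statementIdOf is a hypothetical helper and does
// not reproduce the parsing logic in AcidUtils.
class DeltaNameSketch {

    /** Marker used when a delta has no statement id (e.g. after minor compaction). */
    static final int NO_STATEMENT_ID = -1;

    /**
     * Assumes names of the form delta_x_y or delta_x_y_stmt, where x and y
     * are the transaction id range and stmt is the statement id.
     */
    static int statementIdOf(String dirName) {
        String[] parts = dirName.split("_");
        if (parts.length < 3 || parts.length > 4 || !parts[0].equals("delta")) {
            throw new IllegalArgumentException("Not a delta directory name: " + dirName);
        }
        return parts.length == 4 ? Integer.parseInt(parts[3]) : NO_STATEMENT_ID;
    }

    public static void main(String[] args) {
        System.out.println(statementIdOf("delta_5_5_0")); // prints 0
        System.out.println(statementIdOf("delta_5_9"));   // prints -1
    }
}
```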


- Saket


---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/49766/#review144816
---


On Aug. 8, 2016, 5:53 p.m., Saket Saurabh wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/49766/
> ---
> 
> (Updated Aug. 8, 2016, 5:53 p.m.)
> 
> 
> Review request for hive and Eugene Koifman.
> 
> 
> Repository: hive-git
> 
> 
> Description
> ---
> 
> https://issues.apache.org/jira/browse/HIVE-14035
> 
> In current Hive version, delta files created by ACID transactions do not 
> allow predicate pushdown if they contain any update/delete events. This is 
> done to preserve correctness when following a multi-version approach during 
> event collapsing, where an update event overwrites an existing insert event. 
> This JIRA proposes to split an update event into a combination of a delete 
> event followed by a new insert event, that can enable predicate push down to 
> all delta files without breaking correctness. To support backward 
> compatibility for this feature, this JIRA also proposes to add some sort of 
> versioning to ACID that can allow different versions of ACID transactions to 
> co-exist together.
> 
> 
> Diffs
> -
> 
>   common/src/java/org/apache/hadoop/hive/conf/HiveConf.java 70816bd 
>   
> hcatalog/core/src/main/java/org/apache/hive/hcatalog/mapreduce/FosterStorageHandler.java
>  14f7316 
>   
> hcatalog/streaming/src/java/org/apache/hive/hcatalog/streaming/AbstractRecordWriter.java
>  974c6b8 
>   
> itests/hive-unit/src/test/java/org/apache/hadoop/hive/ql/txn/compactor/TestCompactor.java
>  731caa8 
>   metastore/if/hive_metastore.thrift a2e35b8 
>   metastore/src/gen/thrift/gen-cpp/hive_metastore_constants.h ae14bd1 
>   metastore/src/gen/thrift/gen-cpp/hive_metastore_constants.cpp f982bf2 
>   
> metastore/src/gen/thrift/gen-javabean/org/apache/hadoop/hive/metastore/api/hive_metastoreConstants.java
>  5a666f2 
>   metastore/src/gen/thrift/gen-php/metastore/Types.php d6f7f49 
>   metastore/src/gen/thrift/gen-py/hive_metastore/constants.py d1c07a5 
>   metastore/src/gen/thrift/gen-rb/hive_metastore_constants.rb eeccc84 
>   
> metastore/src/java/org/apache/hadoop/hive/metastore/TransactionalValidationListener.java
>  3e74675 
>   ql/src/java/org/apache/hadoop/hive/ql/exec/FetchTask.java db6848a 
>   ql/src/java/org/apache/hadoop/hive/ql/exec/SMBMapJoinOperator.java 57b6c67 
>   ql/src/java/org/apache/hadoop/hive/ql/exec/mr/MapredLocalTask.java 26e6443 
>   ql/src/java/org/apache/hadoop/hive/ql/io/AcidOutputFormat.java dd90a95 
>   ql/src/java/org/apache/hadoop/hive/ql/io/AcidUtils.java 449d889 
>   ql/src/java/org/apache/hadoop/hive/ql/io/HiveInputFormat.java 945b828 
>   ql/src/java/org/apache/hadoop/hive/ql/io/orc/OrcInputFormat.java 334cb31 
>   ql/src/java/org/apache/hadoop/hive/ql/io/orc/OrcRawRecordMerger.java 
> efde2db 
>   ql/src/java/org/apache/hadoop/hive/ql/io/orc/OrcRecordUpdater.java 1a1af28 
>   ql/src/java/org/apache/hadoop/hive/ql/metadata/Hive.java 8cb5e8a 
>   ql/src/java/org/apache/hadoop/hive/ql/plan/TableScanDesc.java 8cf261d 
>   ql/src/java/org/apache/hadoop/hive/ql/txn/compactor/CompactorMR.java 
> 6caca98 
>   ql/src/test/org/apache/hadoop/hive/ql/TestTxnCommands2.java af192fb 
>   ql/src/test/org/apache/hadoop/hive/ql/TestTxnCommands3.java PRE-CREATION 
>   ql/src/test/org/apache/hadoop/hive/ql/io/TestAcidUtils.java 556df18 
>   ql/src/test/org/apache/hadoop/hive/ql/io/orc/TestInputOutputFormat.java 
> 6648829 
> 
> Diff: https://reviews.apache.org/r/49766/diff/
> 
> 
> Testing
> ---
> 
> Tests for the feature are in 
> ql/src/test/org/apache/hadoop/hive/ql/TestTxnCommands2.java. These are mostly 
> integration tests that test end-to-end insert/update/delete scenarios 
> followed by compaction and cleaning.
> 
> 
> Thanks,
> 
> Saket Saurabh
> 
>



Re: Review Request 49766: HIVE-14035 Enable predicate pushdown to delta files created by ACID Transactions

2016-08-08 Thread Saket Saurabh

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/49766/
---

(Updated Aug. 8, 2016, 5:53 p.m.)


Review request for hive and Eugene Koifman.


Changes
---

Updated the patch with comments/feedback from code review.
Major changes are the refactoring of AcidUtils.getAcidState() and CompactorMR,
fixes to the flush-length-related changes in OrcRecordUpdater, lazy
initialization of deleteEventWriters, and other minor changes.
The patch adds unit tests for schema evolution with ACID. It also adds a
subclass of TestTxnCommands2 that replays all the existing ACID test cases
with split-update turned on.


Repository: hive-git


Description
---

https://issues.apache.org/jira/browse/HIVE-14035

In the current Hive version, delta files created by ACID transactions do not
allow predicate pushdown if they contain any update/delete events. This
restriction preserves correctness under the multi-version approach used during
event collapsing, where an update event overwrites an existing insert event.
This JIRA proposes to split an update event into a delete event followed by a
new insert event, which enables predicate pushdown to all delta files without
breaking correctness. To keep this feature backward compatible, this JIRA also
proposes to add versioning to ACID so that different versions of ACID
transactions can co-exist.
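
The split-update idea described above can be sketched as follows. This is only
an illustration: Event, Op and splitUpdate are hypothetical names, and the
assumption that the re-inserted row receives a fresh row id is made for this
example rather than taken from the patch.

```java
import java.util.Arrays;
import java.util.List;

// Illustrative sketch only; Event, Op and splitUpdate are hypothetical names.
class SplitUpdateSketch {

    enum Op { INSERT, DELETE }

    /** Simplified ACID event; value is null for delete events. */
    static final class Event {
        final Op op;
        final long rowId;
        final String value;

        Event(Op op, long rowId, String value) {
            this.op = op;
            this.rowId = rowId;
            this.value = value;
        }

        @Override
        public String toString() {
            return op + "(rowId=" + rowId + (value == null ? "" : ", value=" + value) + ")";
        }
    }

    /**
     * Rewrites an update as a delete of the old row followed by an insert of
     * the new row version. Assumption: the new version gets its own row id.
     */
    static List<Event> splitUpdate(long oldRowId, long newRowId, String newValue) {
        return Arrays.asList(
            new Event(Op.DELETE, oldRowId, null),
            new Event(Op.INSERT, newRowId, newValue));
    }

    public static void main(String[] args) {
        // Prints [DELETE(rowId=7), INSERT(rowId=42, value=b)]
        System.out.println(splitUpdate(7, 42, "b"));
    }
}
```

With updates expressed this way, a delta that holds row data contains only
insert events, so a predicate can be applied to it on its own; delete events
are kept apart and are never filtered by the predicate.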


Diffs (updated)
---

  common/src/java/org/apache/hadoop/hive/conf/HiveConf.java 70816bd
  hcatalog/core/src/main/java/org/apache/hive/hcatalog/mapreduce/FosterStorageHandler.java 14f7316
  hcatalog/streaming/src/java/org/apache/hive/hcatalog/streaming/AbstractRecordWriter.java 974c6b8
  itests/hive-unit/src/test/java/org/apache/hadoop/hive/ql/txn/compactor/TestCompactor.java 731caa8
  metastore/if/hive_metastore.thrift a2e35b8
  metastore/src/gen/thrift/gen-cpp/hive_metastore_constants.h ae14bd1
  metastore/src/gen/thrift/gen-cpp/hive_metastore_constants.cpp f982bf2
  metastore/src/gen/thrift/gen-javabean/org/apache/hadoop/hive/metastore/api/hive_metastoreConstants.java 5a666f2
  metastore/src/gen/thrift/gen-php/metastore/Types.php d6f7f49
  metastore/src/gen/thrift/gen-py/hive_metastore/constants.py d1c07a5
  metastore/src/gen/thrift/gen-rb/hive_metastore_constants.rb eeccc84
  metastore/src/java/org/apache/hadoop/hive/metastore/TransactionalValidationListener.java 3e74675
  ql/src/java/org/apache/hadoop/hive/ql/exec/FetchTask.java db6848a
  ql/src/java/org/apache/hadoop/hive/ql/exec/SMBMapJoinOperator.java 57b6c67
  ql/src/java/org/apache/hadoop/hive/ql/exec/mr/MapredLocalTask.java 26e6443
  ql/src/java/org/apache/hadoop/hive/ql/io/AcidOutputFormat.java dd90a95
  ql/src/java/org/apache/hadoop/hive/ql/io/AcidUtils.java 449d889
  ql/src/java/org/apache/hadoop/hive/ql/io/HiveInputFormat.java 945b828
  ql/src/java/org/apache/hadoop/hive/ql/io/orc/OrcInputFormat.java 334cb31
  ql/src/java/org/apache/hadoop/hive/ql/io/orc/OrcRawRecordMerger.java efde2db
  ql/src/java/org/apache/hadoop/hive/ql/io/orc/OrcRecordUpdater.java 1a1af28
  ql/src/java/org/apache/hadoop/hive/ql/metadata/Hive.java 8cb5e8a
  ql/src/java/org/apache/hadoop/hive/ql/plan/TableScanDesc.java 8cf261d
  ql/src/java/org/apache/hadoop/hive/ql/txn/compactor/CompactorMR.java 6caca98
  ql/src/test/org/apache/hadoop/hive/ql/TestTxnCommands2.java af192fb
  ql/src/test/org/apache/hadoop/hive/ql/TestTxnCommands3.java PRE-CREATION
  ql/src/test/org/apache/hadoop/hive/ql/io/TestAcidUtils.java 556df18
  ql/src/test/org/apache/hadoop/hive/ql/io/orc/TestInputOutputFormat.java 6648829

Diff: https://reviews.apache.org/r/49766/diff/


Testing
---

Tests for the feature are in 
ql/src/test/org/apache/hadoop/hive/ql/TestTxnCommands2.java. These are mostly 
integration tests that test end-to-end insert/update/delete scenarios followed 
by compaction and cleaning.


Thanks,

Saket Saurabh



[jira] [Created] (HIVE-14448) Queries with predicate fail when ETL split strategy is chosen for ACID tables

2016-08-05 Thread Saket Saurabh (JIRA)
Saket Saurabh created HIVE-14448:


 Summary: Queries with predicate fail when ETL split strategy is 
chosen for ACID tables
 Key: HIVE-14448
 URL: https://issues.apache.org/jira/browse/HIVE-14448
 Project: Hive
  Issue Type: Bug
  Components: Transactions
Affects Versions: 2.2.0
Reporter: Saket Saurabh


When the ETL split strategy is applied to ACID tables with predicate pushdown
(SARG enabled), split generation fails. This bug usually surfaces only when
working with data at scale, because in most other cases the BI split strategy
is chosen. My guess is that the correct readerSchema is not being picked up
when we try to extract the SARG column names.

The quickest way to reproduce is to add the following unit test to
ql/src/test/org/apache/hadoop/hive/ql/TestTxnCommands2.java

{code:title=ql/src/test/org/apache/hadoop/hive/ql/TestTxnCommands2.java|borderStyle=solid}
  @Test
  public void testETLSplitStrategyForACID() throws Exception {
    // Force the ETL split strategy and enable predicate pushdown (SARG).
    hiveConf.setVar(HiveConf.ConfVars.HIVE_ORC_SPLIT_STRATEGY, "ETL");
    hiveConf.setBoolVar(HiveConf.ConfVars.HIVEOPTINDEXFILTER, true);
    runStatementOnDriver("insert into " + Table.ACIDTBL + " values(1,2)");
    runStatementOnDriver("alter table " + Table.ACIDTBL + " compact 'MAJOR'");
    runWorker(hiveConf);
    // The query below carries a predicate, so split generation must handle a SARG.
    List<String> rs = runStatementOnDriver("select * from " + Table.ACIDTBL
        + " where a = 1");
    int[][] resultData = new int[][] {{1,2}};
    Assert.assertEquals(stringifyValues(resultData), rs);
  }
{code}





[jira] [Created] (HIVE-14366) Conversion of a Non-ACID table to an ACID table produces non-unique primary keys

2016-07-27 Thread Saket Saurabh (JIRA)
Saket Saurabh created HIVE-14366:


 Summary: Conversion of a Non-ACID table to an ACID table produces 
non-unique primary keys
 Key: HIVE-14366
 URL: https://issues.apache.org/jira/browse/HIVE-14366
 Project: Hive
  Issue Type: Bug
  Components: Transactions
Reporter: Saket Saurabh


When a Non-ACID table is converted to an ACID table, the primary key consisting
of (original transaction id, bucket_id, row_id) is not generated uniquely.
Currently, the row_id is set to 0 for most rows. This leads to correctness
issues for such tables.

The quickest way to reproduce is to add the following unit test to
ql/src/test/org/apache/hadoop/hive/ql/TestTxnCommands2.java

{code:title=ql/src/test/org/apache/hadoop/hive/ql/TestTxnCommands2.java|borderStyle=solid}
  @Test
  public void testOriginalReader() throws Exception {
    FileSystem fs = FileSystem.get(hiveConf);
    FileStatus[] status;

    // 1. Insert five rows into the Non-ACID table.
    runStatementOnDriver("insert into " + Table.NONACIDORCTBL
        + "(a,b) values(1,2),(3,4),(5,6),(7,8),(9,10)");

    // 2. Convert NONACIDORCTBL to an ACID table.
    runStatementOnDriver("alter table " + Table.NONACIDORCTBL
        + " SET TBLPROPERTIES ('transactional'='true')");

    // 3. Perform a major compaction.
    runStatementOnDriver("alter table " + Table.NONACIDORCTBL + " compact 'MAJOR'");
    runWorker(hiveConf);

    // 4. Perform a delete.
    runStatementOnDriver("delete from " + Table.NONACIDORCTBL + " where a = 1");

    // 5. The projection should return only (3,4),(5,6),(7,8),(9,10), since
    //    (1,2) has been deleted.
    List<String> rs = runStatementOnDriver("select a,b from "
        + Table.NONACIDORCTBL + " order by a,b");
    int[][] resultData = new int[][] {{3,4}, {5,6}, {7,8}, {9,10}};
    Assert.assertEquals(stringifyValues(resultData), rs);
  }
{code}





Re: Review Request 49766: HIVE-14035 Enable predicate pushdown to delta files created by ACID Transactions

2016-07-27 Thread Saket Saurabh


> On July 21, 2016, 12:42 a.m., Lefty Leverenz wrote:
> >

@Lefty, does the updated description of this config variable read better now?


- Saket


---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/49766/#review143059
---


On July 27, 2016, 2:54 p.m., Saket Saurabh wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/49766/
> ---
> 
> (Updated July 27, 2016, 2:54 p.m.)
> 
> 
> Review request for hive and Eugene Koifman.
> 
> 
> Repository: hive-git
> 
> 
> Description
> ---
> 
> https://issues.apache.org/jira/browse/HIVE-14035
> 
> In current Hive version, delta files created by ACID transactions do not 
> allow predicate pushdown if they contain any update/delete events. This is 
> done to preserve correctness when following a multi-version approach during 
> event collapsing, where an update event overwrites an existing insert event. 
> This JIRA proposes to split an update event into a combination of a delete 
> event followed by a new insert event, that can enable predicate push down to 
> all delta files without breaking correctness. To support backward 
> compatibility for this feature, this JIRA also proposes to add some sort of 
> versioning to ACID that can allow different versions of ACID transactions to 
> co-exist together.
> 
> 
> Diffs
> -
> 
>   common/src/java/org/apache/hadoop/hive/conf/HiveConf.java e92466f 
>   
> hcatalog/core/src/main/java/org/apache/hive/hcatalog/mapreduce/FosterStorageHandler.java
>  14f7316 
>   
> hcatalog/streaming/src/java/org/apache/hive/hcatalog/streaming/AbstractRecordWriter.java
>  974c6b8 
>   
> itests/hive-unit/src/test/java/org/apache/hadoop/hive/ql/txn/compactor/TestCompactor.java
>  ca2a912 
>   metastore/if/hive_metastore.thrift 4d92b73 
>   metastore/src/gen/thrift/gen-cpp/hive_metastore_constants.h ae14bd1 
>   metastore/src/gen/thrift/gen-cpp/hive_metastore_constants.cpp f982bf2 
>   
> metastore/src/gen/thrift/gen-javabean/org/apache/hadoop/hive/metastore/api/hive_metastoreConstants.java
>  5a666f2 
>   metastore/src/gen/thrift/gen-php/metastore/Types.php f505208 
>   metastore/src/gen/thrift/gen-py/hive_metastore/constants.py d1c07a5 
>   metastore/src/gen/thrift/gen-rb/hive_metastore_constants.rb eeccc84 
>   
> metastore/src/java/org/apache/hadoop/hive/metastore/TransactionalValidationListener.java
>  3e74675 
>   orc/src/java/org/apache/orc/impl/TreeReaderFactory.java c4a2093 
>   ql/src/java/org/apache/hadoop/hive/ql/exec/FetchTask.java db6848a 
>   ql/src/java/org/apache/hadoop/hive/ql/exec/SMBMapJoinOperator.java 57b6c67 
>   ql/src/java/org/apache/hadoop/hive/ql/exec/mr/MapredLocalTask.java 23a13d6 
>   ql/src/java/org/apache/hadoop/hive/ql/io/AcidOutputFormat.java dd90a95 
>   ql/src/java/org/apache/hadoop/hive/ql/io/AcidUtils.java c150ec5 
>   ql/src/java/org/apache/hadoop/hive/ql/io/HiveInputFormat.java 945b828 
>   ql/src/java/org/apache/hadoop/hive/ql/io/orc/OrcInputFormat.java 63d02fb 
>   ql/src/java/org/apache/hadoop/hive/ql/io/orc/OrcRecordUpdater.java 1a1af28 
>   ql/src/java/org/apache/hadoop/hive/ql/metadata/Hive.java 9d927bd 
>   ql/src/java/org/apache/hadoop/hive/ql/plan/TableScanDesc.java 8cf261d 
>   ql/src/java/org/apache/hadoop/hive/ql/txn/compactor/CompactorMR.java 
> 6caca98 
>   ql/src/test/org/apache/hadoop/hive/ql/TestTxnCommands2.java d48e441 
>   ql/src/test/org/apache/hadoop/hive/ql/io/TestAcidUtils.java b83cea4 
> 
> Diff: https://reviews.apache.org/r/49766/diff/
> 
> 
> Testing
> ---
> 
> Tests for the feature are in 
> ql/src/test/org/apache/hadoop/hive/ql/TestTxnCommands2.java. These are mostly 
> integration tests that test end-to-end insert/update/delete scenarios 
> followed by compaction and cleaning.
> 
> 
> Thanks,
> 
> Saket Saurabh
> 
>



Re: Review Request 49766: HIVE-14035 Enable predicate pushdown to delta files created by ACID Transactions

2016-07-27 Thread Saket Saurabh

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/49766/
---

(Updated July 27, 2016, 2:54 p.m.)


Review request for hive and Eugene Koifman.


Changes
---

Refactor the way delete event writers are created for compaction case in favor 
of a better abstraction.


Repository: hive-git


Description
---

https://issues.apache.org/jira/browse/HIVE-14035

In the current Hive version, delta files created by ACID transactions do not
allow predicate pushdown if they contain any update/delete events. This
restriction preserves correctness under the multi-version approach used during
event collapsing, where an update event overwrites an existing insert event.
This JIRA proposes to split an update event into a delete event followed by a
new insert event, which enables predicate pushdown to all delta files without
breaking correctness. To keep this feature backward compatible, this JIRA also
proposes to add versioning to ACID so that different versions of ACID
transactions can co-exist.


Diffs (updated)
---

  common/src/java/org/apache/hadoop/hive/conf/HiveConf.java e92466f
  hcatalog/core/src/main/java/org/apache/hive/hcatalog/mapreduce/FosterStorageHandler.java 14f7316
  hcatalog/streaming/src/java/org/apache/hive/hcatalog/streaming/AbstractRecordWriter.java 974c6b8
  itests/hive-unit/src/test/java/org/apache/hadoop/hive/ql/txn/compactor/TestCompactor.java ca2a912
  metastore/if/hive_metastore.thrift 4d92b73
  metastore/src/gen/thrift/gen-cpp/hive_metastore_constants.h ae14bd1
  metastore/src/gen/thrift/gen-cpp/hive_metastore_constants.cpp f982bf2
  metastore/src/gen/thrift/gen-javabean/org/apache/hadoop/hive/metastore/api/hive_metastoreConstants.java 5a666f2
  metastore/src/gen/thrift/gen-php/metastore/Types.php f505208
  metastore/src/gen/thrift/gen-py/hive_metastore/constants.py d1c07a5
  metastore/src/gen/thrift/gen-rb/hive_metastore_constants.rb eeccc84
  metastore/src/java/org/apache/hadoop/hive/metastore/TransactionalValidationListener.java 3e74675
  orc/src/java/org/apache/orc/impl/TreeReaderFactory.java c4a2093
  ql/src/java/org/apache/hadoop/hive/ql/exec/FetchTask.java db6848a
  ql/src/java/org/apache/hadoop/hive/ql/exec/SMBMapJoinOperator.java 57b6c67
  ql/src/java/org/apache/hadoop/hive/ql/exec/mr/MapredLocalTask.java 23a13d6
  ql/src/java/org/apache/hadoop/hive/ql/io/AcidOutputFormat.java dd90a95
  ql/src/java/org/apache/hadoop/hive/ql/io/AcidUtils.java c150ec5
  ql/src/java/org/apache/hadoop/hive/ql/io/HiveInputFormat.java 945b828
  ql/src/java/org/apache/hadoop/hive/ql/io/orc/OrcInputFormat.java 63d02fb
  ql/src/java/org/apache/hadoop/hive/ql/io/orc/OrcRecordUpdater.java 1a1af28
  ql/src/java/org/apache/hadoop/hive/ql/metadata/Hive.java 9d927bd
  ql/src/java/org/apache/hadoop/hive/ql/plan/TableScanDesc.java 8cf261d
  ql/src/java/org/apache/hadoop/hive/ql/txn/compactor/CompactorMR.java 6caca98
  ql/src/test/org/apache/hadoop/hive/ql/TestTxnCommands2.java d48e441
  ql/src/test/org/apache/hadoop/hive/ql/io/TestAcidUtils.java b83cea4

Diff: https://reviews.apache.org/r/49766/diff/


Testing
---

Tests for the feature are in 
ql/src/test/org/apache/hadoop/hive/ql/TestTxnCommands2.java. These are mostly 
integration tests that test end-to-end insert/update/delete scenarios followed 
by compaction and cleaning.


Thanks,

Saket Saurabh



Re: Review Request 49766: HIVE-14035 Enable predicate pushdown to delta files created by ACID Transactions

2016-07-26 Thread Saket Saurabh

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/49766/
---

(Updated July 26, 2016, 11:30 a.m.)


Review request for hive and Eugene Koifman.


Changes
---

Add more unit tests that specifically cover AcidUtils and various other
compaction scenarios.


Repository: hive-git


Description
---

https://issues.apache.org/jira/browse/HIVE-14035

In the current Hive version, delta files created by ACID transactions do not
allow predicate pushdown if they contain any update/delete events. This
restriction preserves correctness under the multi-version approach used during
event collapsing, where an update event overwrites an existing insert event.
This JIRA proposes to split an update event into a delete event followed by a
new insert event, which enables predicate pushdown to all delta files without
breaking correctness. To keep this feature backward compatible, this JIRA also
proposes to add versioning to ACID so that different versions of ACID
transactions can co-exist.


Diffs (updated)
-

  common/src/java/org/apache/hadoop/hive/conf/HiveConf.java e92466f 
  hcatalog/core/src/main/java/org/apache/hive/hcatalog/mapreduce/FosterStorageHandler.java 14f7316 
  hcatalog/streaming/src/java/org/apache/hive/hcatalog/streaming/AbstractRecordWriter.java 974c6b8 
  itests/hive-unit/src/test/java/org/apache/hadoop/hive/ql/txn/compactor/TestCompactor.java ca2a912 
  metastore/if/hive_metastore.thrift 4d92b73 
  metastore/src/gen/thrift/gen-cpp/hive_metastore_constants.h ae14bd1 
  metastore/src/gen/thrift/gen-cpp/hive_metastore_constants.cpp f982bf2 
  metastore/src/gen/thrift/gen-javabean/org/apache/hadoop/hive/metastore/api/hive_metastoreConstants.java 5a666f2 
  metastore/src/gen/thrift/gen-php/metastore/Types.php f505208 
  metastore/src/gen/thrift/gen-py/hive_metastore/constants.py d1c07a5 
  metastore/src/gen/thrift/gen-rb/hive_metastore_constants.rb eeccc84 
  metastore/src/java/org/apache/hadoop/hive/metastore/TransactionalValidationListener.java 3e74675 
  orc/src/java/org/apache/orc/impl/TreeReaderFactory.java c4a2093 
  ql/src/java/org/apache/hadoop/hive/ql/exec/FetchTask.java db6848a 
  ql/src/java/org/apache/hadoop/hive/ql/exec/SMBMapJoinOperator.java 57b6c67 
  ql/src/java/org/apache/hadoop/hive/ql/exec/mr/MapredLocalTask.java 23a13d6 
  ql/src/java/org/apache/hadoop/hive/ql/io/AcidUtils.java c150ec5 
  ql/src/java/org/apache/hadoop/hive/ql/io/HiveInputFormat.java 945b828 
  ql/src/java/org/apache/hadoop/hive/ql/io/orc/OrcInputFormat.java 63d02fb 
  ql/src/java/org/apache/hadoop/hive/ql/io/orc/OrcOutputFormat.java b0f8c8b 
  ql/src/java/org/apache/hadoop/hive/ql/io/orc/OrcRecordUpdater.java 1a1af28 
  ql/src/java/org/apache/hadoop/hive/ql/metadata/Hive.java 9d927bd 
  ql/src/java/org/apache/hadoop/hive/ql/plan/TableScanDesc.java 8cf261d 
  ql/src/java/org/apache/hadoop/hive/ql/txn/compactor/CompactorMR.java 6caca98 
  ql/src/test/org/apache/hadoop/hive/ql/TestTxnCommands2.java d48e441 
  ql/src/test/org/apache/hadoop/hive/ql/io/TestAcidUtils.java b83cea4 

Diff: https://reviews.apache.org/r/49766/diff/


Testing
---

Tests for the feature are in 
ql/src/test/org/apache/hadoop/hive/ql/TestTxnCommands2.java. These are mostly 
integration tests that test end-to-end insert/update/delete scenarios followed 
by compaction and cleaning.


Thanks,

Saket Saurabh



Re: Review Request 49766: HIVE-14035 Enable predicate pushdown to delta files created by ACID Transactions

2016-07-20 Thread Saket Saurabh

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/49766/
---

(Updated July 20, 2016, 3:55 p.m.)


Review request for hive and Eugene Koifman.


Changes
---

Updated the patch by rebasing with master. No additional code changes. Same as 
Patch #10 at https://issues.apache.org/jira/browse/HIVE-14035


Repository: hive-git


Description
---

https://issues.apache.org/jira/browse/HIVE-14035

In the current version of Hive, delta files created by ACID transactions do not 
allow predicate pushdown if they contain any update/delete events. This 
restriction preserves correctness under the multi-version approach used during 
event collapsing, where an update event overwrites an existing insert event. 
This JIRA proposes to split an update event into a delete event followed by a 
new insert event, which enables predicate pushdown to all delta files without 
breaking correctness. For backward compatibility, this JIRA also proposes to 
add versioning to ACID so that different versions of ACID transactions can 
coexist.


Diffs (updated)
-

  common/src/java/org/apache/hadoop/hive/conf/HiveConf.java 66203a5 
  hcatalog/core/src/main/java/org/apache/hive/hcatalog/mapreduce/FosterStorageHandler.java 14f7316 
  hcatalog/streaming/src/java/org/apache/hive/hcatalog/streaming/AbstractRecordWriter.java 974c6b8 
  metastore/if/hive_metastore.thrift 4d92b73 
  metastore/src/gen/thrift/gen-cpp/hive_metastore_constants.h ae14bd1 
  metastore/src/gen/thrift/gen-cpp/hive_metastore_constants.cpp f982bf2 
  metastore/src/gen/thrift/gen-javabean/org/apache/hadoop/hive/metastore/api/hive_metastoreConstants.java 5a666f2 
  metastore/src/gen/thrift/gen-php/metastore/Types.php f505208 
  metastore/src/gen/thrift/gen-py/hive_metastore/constants.py d1c07a5 
  metastore/src/gen/thrift/gen-rb/hive_metastore_constants.rb eeccc84 
  metastore/src/java/org/apache/hadoop/hive/metastore/TransactionalValidationListener.java 3e74675 
  orc/src/java/org/apache/orc/impl/TreeReaderFactory.java c4a2093 
  ql/src/java/org/apache/hadoop/hive/ql/exec/FetchTask.java db6848a 
  ql/src/java/org/apache/hadoop/hive/ql/exec/SMBMapJoinOperator.java 57b6c67 
  ql/src/java/org/apache/hadoop/hive/ql/exec/mr/MapredLocalTask.java 23a13d6 
  ql/src/java/org/apache/hadoop/hive/ql/io/AcidUtils.java c150ec5 
  ql/src/java/org/apache/hadoop/hive/ql/io/HiveInputFormat.java 945b828 
  ql/src/java/org/apache/hadoop/hive/ql/io/orc/OrcInputFormat.java 69d58d6 
  ql/src/java/org/apache/hadoop/hive/ql/io/orc/OrcOutputFormat.java b0f8c8b 
  ql/src/java/org/apache/hadoop/hive/ql/io/orc/OrcRecordUpdater.java e577961 
  ql/src/java/org/apache/hadoop/hive/ql/metadata/Hive.java ef0bb3d 
  ql/src/java/org/apache/hadoop/hive/ql/plan/TableScanDesc.java 8cf261d 
  ql/src/java/org/apache/hadoop/hive/ql/txn/compactor/CompactorMR.java 6caca98 
  ql/src/test/org/apache/hadoop/hive/ql/TestTxnCommands2.java d48e441 
  ql/src/test/org/apache/hadoop/hive/ql/io/TestAcidUtils.java b83cea4 

Diff: https://reviews.apache.org/r/49766/diff/


Testing
---

Tests for the feature are in 
ql/src/test/org/apache/hadoop/hive/ql/TestTxnCommands2.java. These are mostly 
integration tests that test end-to-end insert/update/delete scenarios followed 
by compaction and cleaning.


Thanks,

Saket Saurabh



[jira] [Created] (HIVE-14233) Improve vectorization for ACID by eliminating row-by-row stitching

2016-07-13 Thread Saket Saurabh (JIRA)
Saket Saurabh created HIVE-14233:


 Summary: Improve vectorization for ACID by eliminating row-by-row 
stitching
 Key: HIVE-14233
 URL: https://issues.apache.org/jira/browse/HIVE-14233
 Project: Hive
  Issue Type: New Feature
  Components: Transactions, Vectorization
Reporter: Saket Saurabh
Assignee: Saket Saurabh


This JIRA proposes to improve vectorization for ACID by eliminating row-by-row 
stitching when reading back ACID files. In the current implementation, a 
vectorized row batch is created by populating the batch one row at a time, 
before the batch is passed up the operator pipeline. Row-by-row stitching was 
required because the ACID insert/update/delete events from the various delta 
files had to be merged before the current version of a given row could be 
determined. HIVE-14035 removes that limitation by splitting ACID update events 
into a combination of delete+insert. It also makes it possible to create 
splits on delta files.
Building on HIVE-14035, this JIRA proposes to remove this bottleneck in the 
vectorized code path for ACID by reading row batches directly from the 
underlying ORC files and avoiding stitching altogether. Once a row batch is 
read from the split (which may be on a base/delta file), the deleted rows will 
be identified by cross-referencing the batch against a data structure that 
tracks only delete events (found in the deleted_delta files). This should 
yield a large performance gain when reading ACID files in vectorized fashion, 
and it enables further optimizations on top of it in the future.
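
The cross-referencing step described above can be sketched roughly as follows. 
This is a minimal illustration, not the patch itself: the class 
DeleteFilterSketch, the RowKey type and the method selectLiveRows are 
hypothetical names, and a plain HashSet stands in for whatever data structure 
ends up tracking the delete events. The returned index array plays a role 
loosely similar to the "selected" array of a vectorized row batch.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Objects;
import java.util.Set;

/** Illustrative sketch only; these are not Hive's actual classes. */
final class DeleteFilterSketch {

    /** Hypothetical row identifier: writing transaction, bucket, and row id. */
    static final class RowKey {
        final long originalTxn;
        final int bucket;
        final long rowId;

        RowKey(long originalTxn, int bucket, long rowId) {
            this.originalTxn = originalTxn;
            this.bucket = bucket;
            this.rowId = rowId;
        }

        @Override
        public boolean equals(Object o) {
            if (!(o instanceof RowKey)) {
                return false;
            }
            RowKey k = (RowKey) o;
            return originalTxn == k.originalTxn && bucket == k.bucket && rowId == k.rowId;
        }

        @Override
        public int hashCode() {
            return Objects.hash(originalTxn, bucket, rowId);
        }
    }

    /**
     * Returns the positions within the batch whose keys do not appear in the
     * set of delete events. The batch itself is never rebuilt row by row.
     */
    static int[] selectLiveRows(RowKey[] batchKeys, Set<RowKey> deleted) {
        int[] selected = new int[batchKeys.length];
        int live = 0;
        for (int i = 0; i < batchKeys.length; i++) {
            if (!deleted.contains(batchKeys[i])) {
                selected[live++] = i;
            }
        }
        return Arrays.copyOf(selected, live);
    }

    public static void main(String[] args) {
        RowKey[] batch = {
            new RowKey(1L, 0, 0L), new RowKey(1L, 0, 1L), new RowKey(1L, 0, 2L)
        };
        Set<RowKey> deleted = new HashSet<>();
        deleted.add(new RowKey(1L, 0, 1L)); // one delete event for the middle row
        System.out.println(Arrays.toString(selectLiveRows(batch, deleted))); // prints [0, 2]
    }
}
```

The point of the sketch is that the batch is read as-is and deleted rows are 
only masked out afterwards, so the per-row work is a single lookup instead of 
a merge across delta files.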



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (HIVE-14199) Enable Bucket Pruning for ACID tables

2016-07-08 Thread Saket Saurabh (JIRA)
Saket Saurabh created HIVE-14199:


 Summary: Enable Bucket Pruning for ACID tables
 Key: HIVE-14199
 URL: https://issues.apache.org/jira/browse/HIVE-14199
 Project: Hive
  Issue Type: Improvement
  Components: Transactions
Reporter: Saket Saurabh
Assignee: Saket Saurabh


Currently, ACID tables do not benefit from the bucket pruning feature 
introduced in HIVE-11525. The reason is that bucket pruning happens at split 
generation, and for ACID the delta files were traditionally never split. 
Parallelism for ACID was therefore restricted to the number of buckets: there 
would be as many splits as buckets, and each worker processing one split would 
inevitably read all the delta files for that bucket, even when the query 
required only one of the buckets to be read.
However, HIVE-14035 now allows the delta files to be split as well. This means 
that enough information is available at split generation to determine which 
buckets of the delta files need to be processed. Unnecessary buckets for delta 
files can then be pruned, which should give a good performance gain for a 
large number of selective queries on ACID tables.
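
As a rough illustration of pruning at split generation, the sketch below keeps 
only the splits whose bucket matches the bucket implied by an equality 
predicate on the bucketing column. Everything here is an assumption made for 
the example: the class BucketPruningSketch, the Split type, the file paths, and 
the hash-modulo bucket function are stand-ins and are not taken from the 
HIVE-11525 or HIVE-14035 patches.

```java
import java.util.ArrayList;
import java.util.List;

/** Illustrative sketch only; these are not Hive's actual classes. */
final class BucketPruningSketch {

    /** Hypothetical split descriptor: a file path plus the bucket it belongs to. */
    static final class Split {
        final String path;
        final int bucket;

        Split(String path, int bucket) {
            this.path = path;
            this.bucket = bucket;
        }
    }

    /** Assumed bucketing function: non-negative hash of the key modulo bucket count. */
    static int bucketFor(int key, int numBuckets) {
        return (Integer.hashCode(key) & Integer.MAX_VALUE) % numBuckets;
    }

    /** Keeps only the splits, base or delta, that belong to the wanted bucket. */
    static List<Split> prune(List<Split> splits, int wantedBucket) {
        List<Split> kept = new ArrayList<>();
        for (Split s : splits) {
            if (s.bucket == wantedBucket) {
                kept.add(s);
            }
        }
        return kept;
    }
}
```

With two buckets and a predicate such as key = 5, bucketFor(5, 2) evaluates to 
1, so only the base and delta splits for bucket 1 would be handed to workers.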





Re: Review Request 49766: HIVE-14035 Enable predicate pushdown to delta files created by ACID Transactions

2016-07-07 Thread Saket Saurabh

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/49766/
---

(Updated July 7, 2016, 10:26 a.m.)


Review request for hive and Eugene Koifman.


Repository: hive-git


Description
---

https://issues.apache.org/jira/browse/HIVE-14035

In the current version of Hive, delta files created by ACID transactions do not 
allow predicate pushdown if they contain any update/delete events. This 
restriction preserves correctness under the multi-version approach used during 
event collapsing, where an update event overwrites an existing insert event. 
This JIRA proposes to split an update event into a delete event followed by a 
new insert event, which enables predicate pushdown to all delta files without 
breaking correctness. For backward compatibility, this JIRA also proposes to 
add versioning to ACID so that different versions of ACID transactions can 
coexist.


Diffs
-

  common/src/java/org/apache/hadoop/hive/conf/HiveConf.java b13fc65 
  hcatalog/core/src/main/java/org/apache/hive/hcatalog/mapreduce/FosterStorageHandler.java 14f7316 
  hcatalog/streaming/src/java/org/apache/hive/hcatalog/streaming/AbstractRecordWriter.java 0c6b9ea 
  metastore/if/hive_metastore.thrift 4d92b73 
  metastore/src/gen/thrift/gen-cpp/hive_metastore_constants.h ae14bd1 
  metastore/src/gen/thrift/gen-cpp/hive_metastore_constants.cpp f982bf2 
  metastore/src/gen/thrift/gen-javabean/org/apache/hadoop/hive/metastore/api/hive_metastoreConstants.java 5a666f2 
  metastore/src/gen/thrift/gen-php/metastore/Types.php f505208 
  metastore/src/gen/thrift/gen-py/hive_metastore/constants.py d1c07a5 
  metastore/src/gen/thrift/gen-rb/hive_metastore_constants.rb eeccc84 
  metastore/src/java/org/apache/hadoop/hive/metastore/TransactionalValidationListener.java 3e74675 
  orc/src/java/org/apache/orc/impl/TreeReaderFactory.java c4a2093 
  ql/src/java/org/apache/hadoop/hive/ql/exec/FetchTask.java dff1815 
  ql/src/java/org/apache/hadoop/hive/ql/exec/SMBMapJoinOperator.java 57b6c67 
  ql/src/java/org/apache/hadoop/hive/ql/exec/mr/MapredLocalTask.java 23a13d6 
  ql/src/java/org/apache/hadoop/hive/ql/io/AcidUtils.java 36f38f6 
  ql/src/java/org/apache/hadoop/hive/ql/io/HiveInputFormat.java 227a051 
  ql/src/java/org/apache/hadoop/hive/ql/io/orc/OrcInputFormat.java 69d58d6 
  ql/src/java/org/apache/hadoop/hive/ql/io/orc/OrcOutputFormat.java b0f8c8b 
  ql/src/java/org/apache/hadoop/hive/ql/io/orc/OrcRecordUpdater.java e577961 
  ql/src/java/org/apache/hadoop/hive/ql/metadata/Hive.java 82abd52 
  ql/src/java/org/apache/hadoop/hive/ql/plan/TableScanDesc.java 8cf261d 
  ql/src/java/org/apache/hadoop/hive/ql/txn/compactor/CompactorMR.java 6caca98 
  ql/src/test/org/apache/hadoop/hive/ql/TestTxnCommands2.java e76c925 
  ql/src/test/org/apache/hadoop/hive/ql/io/TestAcidUtils.java 5745dee 

Diff: https://reviews.apache.org/r/49766/diff/


Testing (updated)
---

Tests for the feature are in 
ql/src/test/org/apache/hadoop/hive/ql/TestTxnCommands2.java. These are mostly 
integration tests that test end-to-end insert/update/delete scenarios followed 
by compaction and cleaning.


Thanks,

Saket Saurabh



Re: Review Request 49766: HIVE-14035 Enable predicate pushdown to delta files created by ACID Transactions

2016-07-07 Thread Saket Saurabh

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/49766/
---

(Updated July 7, 2016, 10:09 a.m.)


Review request for hive and Eugene Koifman.


Repository: hive-git


Description
---

https://issues.apache.org/jira/browse/HIVE-14035

In the current version of Hive, delta files created by ACID transactions do not 
allow predicate pushdown if they contain any update/delete events. This 
restriction preserves correctness under the multi-version approach used during 
event collapsing, where an update event overwrites an existing insert event. 
This JIRA proposes to split an update event into a delete event followed by a 
new insert event, which enables predicate pushdown to all delta files without 
breaking correctness. For backward compatibility, this JIRA also proposes to 
add versioning to ACID so that different versions of ACID transactions can 
coexist.


Diffs
-

  common/src/java/org/apache/hadoop/hive/conf/HiveConf.java b13fc65 
  hcatalog/core/src/main/java/org/apache/hive/hcatalog/mapreduce/FosterStorageHandler.java 14f7316 
  hcatalog/streaming/src/java/org/apache/hive/hcatalog/streaming/AbstractRecordWriter.java 0c6b9ea 
  metastore/if/hive_metastore.thrift 4d92b73 
  metastore/src/gen/thrift/gen-cpp/hive_metastore_constants.h ae14bd1 
  metastore/src/gen/thrift/gen-cpp/hive_metastore_constants.cpp f982bf2 
  metastore/src/gen/thrift/gen-javabean/org/apache/hadoop/hive/metastore/api/hive_metastoreConstants.java 5a666f2 
  metastore/src/gen/thrift/gen-php/metastore/Types.php f505208 
  metastore/src/gen/thrift/gen-py/hive_metastore/constants.py d1c07a5 
  metastore/src/gen/thrift/gen-rb/hive_metastore_constants.rb eeccc84 
  metastore/src/java/org/apache/hadoop/hive/metastore/TransactionalValidationListener.java 3e74675 
  orc/src/java/org/apache/orc/impl/TreeReaderFactory.java c4a2093 
  ql/src/java/org/apache/hadoop/hive/ql/exec/FetchTask.java dff1815 
  ql/src/java/org/apache/hadoop/hive/ql/exec/SMBMapJoinOperator.java 57b6c67 
  ql/src/java/org/apache/hadoop/hive/ql/exec/mr/MapredLocalTask.java 23a13d6 
  ql/src/java/org/apache/hadoop/hive/ql/io/AcidUtils.java 36f38f6 
  ql/src/java/org/apache/hadoop/hive/ql/io/HiveInputFormat.java 227a051 
  ql/src/java/org/apache/hadoop/hive/ql/io/orc/OrcInputFormat.java 69d58d6 
  ql/src/java/org/apache/hadoop/hive/ql/io/orc/OrcOutputFormat.java b0f8c8b 
  ql/src/java/org/apache/hadoop/hive/ql/io/orc/OrcRecordUpdater.java e577961 
  ql/src/java/org/apache/hadoop/hive/ql/metadata/Hive.java 82abd52 
  ql/src/java/org/apache/hadoop/hive/ql/plan/TableScanDesc.java 8cf261d 
  ql/src/java/org/apache/hadoop/hive/ql/txn/compactor/CompactorMR.java 6caca98 
  ql/src/test/org/apache/hadoop/hive/ql/TestTxnCommands2.java e76c925 
  ql/src/test/org/apache/hadoop/hive/ql/io/TestAcidUtils.java 5745dee 

Diff: https://reviews.apache.org/r/49766/diff/


Testing
---


Thanks,

Saket Saurabh



[jira] [Created] (HIVE-14035) Enable predicate pushdown to delta files created by ACID Transactions

2016-06-16 Thread Saket Saurabh (JIRA)
Saket Saurabh created HIVE-14035:


 Summary: Enable predicate pushdown to delta files created by ACID 
Transactions
 Key: HIVE-14035
 URL: https://issues.apache.org/jira/browse/HIVE-14035
 Project: Hive
  Issue Type: New Feature
  Components: Transactions
Reporter: Saket Saurabh
Priority: Minor


In the current version of Hive, delta files created by ACID transactions do not 
allow predicate pushdown if they contain any update/delete events. This 
restriction preserves correctness under the multi-version approach used during 
event collapsing, where an update event overwrites an existing insert event. 
This JIRA proposes to split an update event into a delete event followed by a 
new insert event, which enables predicate pushdown to all delta files without 
breaking correctness. For backward compatibility, this JIRA also proposes to 
add versioning to ACID so that different versions of ACID transactions can 
coexist.
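
The proposed split of an update into delete+insert can be illustrated with a 
small sketch. The class UpdateSplitSketch, the Event type and the rewrite 
method are hypothetical names chosen for this example, and a single String 
value stands in for the row's columns; none of this is taken from the patch.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

/** Illustrative sketch only; these are not Hive's actual classes. */
final class UpdateSplitSketch {

    enum Op { INSERT, UPDATE, DELETE }

    /** Hypothetical ACID event: operation, target row id, and row payload. */
    static final class Event {
        final Op op;
        final long rowId;
        final String value; // null for delete events, which carry no row data

        Event(Op op, long rowId, String value) {
            this.op = op;
            this.rowId = rowId;
            this.value = value;
        }
    }

    /**
     * Rewrites an UPDATE as a DELETE of the old row followed by an INSERT of
     * the new values under a fresh row id. Other events pass through unchanged.
     */
    static List<Event> rewrite(Event e, long newRowId) {
        if (e.op != Op.UPDATE) {
            return Collections.singletonList(e);
        }
        return Arrays.asList(
            new Event(Op.DELETE, e.rowId, null),
            new Event(Op.INSERT, newRowId, e.value));
    }
}
```

After such a rewrite, every event that carries row data is an insert, so a 
predicate can be evaluated against a delta file on its own, while delete 
events only need to identify the rows they remove.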


