from:"\"Saket Saurabh \\\(JIRA\\\)\""

[jira] [Commented] (HIVE-14523) ACID performance improvement patches

2016-08-31 Thread Saket Saurabh (JIRA)


[ 
https://issues.apache.org/jira/browse/HIVE-14523?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15454341#comment-15454341
 ] 

Saket Saurabh commented on HIVE-14523:
--

Thank you so much [~cartershanklin]. It has been a great experience for me too. 

> ACID performance improvement patches
> 
>
> Key: HIVE-14523
> URL: https://issues.apache.org/jira/browse/HIVE-14523
> Project: Hive
>  Issue Type: Test
>Affects Versions: 2.2.0
>Reporter: Saket Saurabh
>Assignee: Saket Saurabh
>Priority: Trivial
> Fix For: 2.2.0
>
> Attachments: HIVE-14523.01.patch, HIVE-14523.02.patch
>
>
> This is a trivial non-functional JIRA that combines the features introduced 
> HIVE-14035, HIVE-14199 and HIVE-14233 into a single patch.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

[jira] [Updated] (HIVE-14523) ACID performance improvement patches

2016-08-31 Thread Saket Saurabh (JIRA)


 [ 
https://issues.apache.org/jira/browse/HIVE-14523?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Saket Saurabh updated HIVE-14523:
-
Fix Version/s: 2.2.0

> ACID performance improvement patches
> 
>
> Key: HIVE-14523
> URL: https://issues.apache.org/jira/browse/HIVE-14523
> Project: Hive
>  Issue Type: Test
>Affects Versions: 2.2.0
>Reporter: Saket Saurabh
>Assignee: Saket Saurabh
>Priority: Trivial
> Fix For: 2.2.0
>
> Attachments: HIVE-14523.01.patch, HIVE-14523.02.patch
>
>
> This is a trivial non-functional JIRA that combines the features introduced 
> HIVE-14035, HIVE-14199 and HIVE-14233 into a single patch.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

[jira] [Updated] (HIVE-14523) ACID performance improvement patches

2016-08-31 Thread Saket Saurabh (JIRA)


 [ 
https://issues.apache.org/jira/browse/HIVE-14523?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Saket Saurabh updated HIVE-14523:
-
Resolution: Fixed
Status: Resolved  (was: Patch Available)

All the related patches have been committed to master.

> ACID performance improvement patches
> 
>
> Key: HIVE-14523
> URL: https://issues.apache.org/jira/browse/HIVE-14523
> Project: Hive
>  Issue Type: Test
>Affects Versions: 2.2.0
>Reporter: Saket Saurabh
>Assignee: Saket Saurabh
>Priority: Trivial
> Attachments: HIVE-14523.01.patch, HIVE-14523.02.patch
>
>
> This is a trivial non-functional JIRA that combines the features introduced 
> HIVE-14035, HIVE-14199 and HIVE-14233 into a single patch.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

[jira] [Commented] (HIVE-14233) Improve vectorization for ACID by eliminating row-by-row stitching

2016-08-30 Thread Saket Saurabh (JIRA)


[ 
https://issues.apache.org/jira/browse/HIVE-14233?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15451116#comment-15451116
 ] 

Saket Saurabh commented on HIVE-14233:
--

Thanks [~ekoifman] for committing the patch.

> Improve vectorization for ACID by eliminating row-by-row stitching
> --
>
> Key: HIVE-14233
> URL: https://issues.apache.org/jira/browse/HIVE-14233
> Project: Hive
>  Issue Type: New Feature
>  Components: Transactions, Vectorization
>Reporter: Saket Saurabh
>Assignee: Saket Saurabh
> Fix For: 2.2.0
>
> Attachments: HIVE-14233.01.patch, HIVE-14233.02.patch, 
> HIVE-14233.03.patch, HIVE-14233.04.patch, HIVE-14233.05.patch, 
> HIVE-14233.06.patch, HIVE-14233.07.patch, HIVE-14233.08.patch, 
> HIVE-14233.09.patch, HIVE-14233.10.patch, HIVE-14233.11.patch, 
> HIVE-14233.12.patch
>
>
> This JIRA proposes to improve vectorization for ACID by eliminating 
> row-by-row stitching when reading back ACID files. In the current 
> implementation, a vectorized row batch is created by populating the batch one 
> row at a time, before the vectorized batch is passed up along the operator 
> pipeline. This row-by-row stitching limitation was because of the fact that 
> the ACID insert/update/delete events from various delta files needed to be 
> merged together before the actual version of a given row was found out. 
> HIVE-14035 has enabled us to break away from that limitation by splitting 
> ACID update events into a combination of delete+insert. In fact, it has now 
> enabled us to create splits on delta files.
> Building on top of HIVE-14035, this JIRA proposes to solve this earlier 
> bottleneck in the vectorized code path for ACID by now directly reading row 
> batches from the underlying ORC files and avoiding any stitching altogether. 
> Once a row batch is read from the split (which may be on a base/delta file), 
> the deleted rows will be found by cross-referencing them against a data 
> structure that will just keep track of deleted events (found in the 
> deleted_delta files). This will lead to a large performance gain when reading 
> ACID files in vectorized fashion, while enabling further optimizations in 
> future that can be done on top of that.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

[jira] [Commented] (HIVE-14233) Improve vectorization for ACID by eliminating row-by-row stitching

2016-08-30 Thread Saket Saurabh (JIRA)


[ 
https://issues.apache.org/jira/browse/HIVE-14233?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15451113#comment-15451113
 ] 

Saket Saurabh commented on HIVE-14233:
--

Thanks [~ekoifman], and [~sershe] for reviewing the patch.

> Improve vectorization for ACID by eliminating row-by-row stitching
> --
>
> Key: HIVE-14233
> URL: https://issues.apache.org/jira/browse/HIVE-14233
> Project: Hive
>  Issue Type: New Feature
>  Components: Transactions, Vectorization
>Reporter: Saket Saurabh
>Assignee: Saket Saurabh
> Fix For: 2.2.0
>
> Attachments: HIVE-14233.01.patch, HIVE-14233.02.patch, 
> HIVE-14233.03.patch, HIVE-14233.04.patch, HIVE-14233.05.patch, 
> HIVE-14233.06.patch, HIVE-14233.07.patch, HIVE-14233.08.patch, 
> HIVE-14233.09.patch, HIVE-14233.10.patch, HIVE-14233.11.patch, 
> HIVE-14233.12.patch
>
>
> This JIRA proposes to improve vectorization for ACID by eliminating 
> row-by-row stitching when reading back ACID files. In the current 
> implementation, a vectorized row batch is created by populating the batch one 
> row at a time, before the vectorized batch is passed up along the operator 
> pipeline. This row-by-row stitching limitation was because of the fact that 
> the ACID insert/update/delete events from various delta files needed to be 
> merged together before the actual version of a given row was found out. 
> HIVE-14035 has enabled us to break away from that limitation by splitting 
> ACID update events into a combination of delete+insert. In fact, it has now 
> enabled us to create splits on delta files.
> Building on top of HIVE-14035, this JIRA proposes to solve this earlier 
> bottleneck in the vectorized code path for ACID by now directly reading row 
> batches from the underlying ORC files and avoiding any stitching altogether. 
> Once a row batch is read from the split (which may be on a base/delta file), 
> the deleted rows will be found by cross-referencing them against a data 
> structure that will just keep track of deleted events (found in the 
> deleted_delta files). This will lead to a large performance gain when reading 
> ACID files in vectorized fashion, while enabling further optimizations in 
> future that can be done on top of that.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

[jira] [Updated] (HIVE-14233) Improve vectorization for ACID by eliminating row-by-row stitching

2016-08-30 Thread Saket Saurabh (JIRA)


 [ 
https://issues.apache.org/jira/browse/HIVE-14233?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Saket Saurabh updated HIVE-14233:
-
Status: Patch Available  (was: Open)

> Improve vectorization for ACID by eliminating row-by-row stitching
> --
>
> Key: HIVE-14233
> URL: https://issues.apache.org/jira/browse/HIVE-14233
> Project: Hive
>  Issue Type: New Feature
>  Components: Transactions, Vectorization
>Reporter: Saket Saurabh
>Assignee: Saket Saurabh
> Attachments: HIVE-14233.01.patch, HIVE-14233.02.patch, 
> HIVE-14233.03.patch, HIVE-14233.04.patch, HIVE-14233.05.patch, 
> HIVE-14233.06.patch, HIVE-14233.07.patch, HIVE-14233.08.patch, 
> HIVE-14233.09.patch, HIVE-14233.10.patch, HIVE-14233.11.patch, 
> HIVE-14233.12.patch
>
>
> This JIRA proposes to improve vectorization for ACID by eliminating 
> row-by-row stitching when reading back ACID files. In the current 
> implementation, a vectorized row batch is created by populating the batch one 
> row at a time, before the vectorized batch is passed up along the operator 
> pipeline. This row-by-row stitching limitation was because of the fact that 
> the ACID insert/update/delete events from various delta files needed to be 
> merged together before the actual version of a given row was found out. 
> HIVE-14035 has enabled us to break away from that limitation by splitting 
> ACID update events into a combination of delete+insert. In fact, it has now 
> enabled us to create splits on delta files.
> Building on top of HIVE-14035, this JIRA proposes to solve this earlier 
> bottleneck in the vectorized code path for ACID by now directly reading row 
> batches from the underlying ORC files and avoiding any stitching altogether. 
> Once a row batch is read from the split (which may be on a base/delta file), 
> the deleted rows will be found by cross-referencing them against a data 
> structure that will just keep track of deleted events (found in the 
> deleted_delta files). This will lead to a large performance gain when reading 
> ACID files in vectorized fashion, while enabling further optimizations in 
> future that can be done on top of that.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

[jira] [Updated] (HIVE-14233) Improve vectorization for ACID by eliminating row-by-row stitching

2016-08-30 Thread Saket Saurabh (JIRA)


 [ 
https://issues.apache.org/jira/browse/HIVE-14233?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Saket Saurabh updated HIVE-14233:
-
Attachment: HIVE-14233.12.patch

Patch #12 - addressed comments at RB, minor code refactoring.

> Improve vectorization for ACID by eliminating row-by-row stitching
> --
>
> Key: HIVE-14233
> URL: https://issues.apache.org/jira/browse/HIVE-14233
> Project: Hive
>  Issue Type: New Feature
>  Components: Transactions, Vectorization
>Reporter: Saket Saurabh
>Assignee: Saket Saurabh
> Attachments: HIVE-14233.01.patch, HIVE-14233.02.patch, 
> HIVE-14233.03.patch, HIVE-14233.04.patch, HIVE-14233.05.patch, 
> HIVE-14233.06.patch, HIVE-14233.07.patch, HIVE-14233.08.patch, 
> HIVE-14233.09.patch, HIVE-14233.10.patch, HIVE-14233.11.patch, 
> HIVE-14233.12.patch
>
>
> This JIRA proposes to improve vectorization for ACID by eliminating 
> row-by-row stitching when reading back ACID files. In the current 
> implementation, a vectorized row batch is created by populating the batch one 
> row at a time, before the vectorized batch is passed up along the operator 
> pipeline. This row-by-row stitching limitation was because of the fact that 
> the ACID insert/update/delete events from various delta files needed to be 
> merged together before the actual version of a given row was found out. 
> HIVE-14035 has enabled us to break away from that limitation by splitting 
> ACID update events into a combination of delete+insert. In fact, it has now 
> enabled us to create splits on delta files.
> Building on top of HIVE-14035, this JIRA proposes to solve this earlier 
> bottleneck in the vectorized code path for ACID by now directly reading row 
> batches from the underlying ORC files and avoiding any stitching altogether. 
> Once a row batch is read from the split (which may be on a base/delta file), 
> the deleted rows will be found by cross-referencing them against a data 
> structure that will just keep track of deleted events (found in the 
> deleted_delta files). This will lead to a large performance gain when reading 
> ACID files in vectorized fashion, while enabling further optimizations in 
> future that can be done on top of that.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

[jira] [Updated] (HIVE-14233) Improve vectorization for ACID by eliminating row-by-row stitching

2016-08-30 Thread Saket Saurabh (JIRA)


 [ 
https://issues.apache.org/jira/browse/HIVE-14233?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Saket Saurabh updated HIVE-14233:
-
Status: Open  (was: Patch Available)

> Improve vectorization for ACID by eliminating row-by-row stitching
> --
>
> Key: HIVE-14233
> URL: https://issues.apache.org/jira/browse/HIVE-14233
> Project: Hive
>  Issue Type: New Feature
>  Components: Transactions, Vectorization
>Reporter: Saket Saurabh
>Assignee: Saket Saurabh
> Attachments: HIVE-14233.01.patch, HIVE-14233.02.patch, 
> HIVE-14233.03.patch, HIVE-14233.04.patch, HIVE-14233.05.patch, 
> HIVE-14233.06.patch, HIVE-14233.07.patch, HIVE-14233.08.patch, 
> HIVE-14233.09.patch, HIVE-14233.10.patch, HIVE-14233.11.patch
>
>
> This JIRA proposes to improve vectorization for ACID by eliminating 
> row-by-row stitching when reading back ACID files. In the current 
> implementation, a vectorized row batch is created by populating the batch one 
> row at a time, before the vectorized batch is passed up along the operator 
> pipeline. This row-by-row stitching limitation was because of the fact that 
> the ACID insert/update/delete events from various delta files needed to be 
> merged together before the actual version of a given row was found out. 
> HIVE-14035 has enabled us to break away from that limitation by splitting 
> ACID update events into a combination of delete+insert. In fact, it has now 
> enabled us to create splits on delta files.
> Building on top of HIVE-14035, this JIRA proposes to solve this earlier 
> bottleneck in the vectorized code path for ACID by now directly reading row 
> batches from the underlying ORC files and avoiding any stitching altogether. 
> Once a row batch is read from the split (which may be on a base/delta file), 
> the deleted rows will be found by cross-referencing them against a data 
> structure that will just keep track of deleted events (found in the 
> deleted_delta files). This will lead to a large performance gain when reading 
> ACID files in vectorized fashion, while enabling further optimizations in 
> future that can be done on top of that.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

[jira] [Updated] (HIVE-14233) Improve vectorization for ACID by eliminating row-by-row stitching

2016-08-30 Thread Saket Saurabh (JIRA)


 [ 
https://issues.apache.org/jira/browse/HIVE-14233?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Saket Saurabh updated HIVE-14233:
-
Attachment: HIVE-14233.11.patch

Addressed comments at RB, refactored some initialization code to OrcInputFormat.

> Improve vectorization for ACID by eliminating row-by-row stitching
> --
>
> Key: HIVE-14233
> URL: https://issues.apache.org/jira/browse/HIVE-14233
> Project: Hive
>  Issue Type: New Feature
>  Components: Transactions, Vectorization
>Reporter: Saket Saurabh
>Assignee: Saket Saurabh
> Attachments: HIVE-14233.01.patch, HIVE-14233.02.patch, 
> HIVE-14233.03.patch, HIVE-14233.04.patch, HIVE-14233.05.patch, 
> HIVE-14233.06.patch, HIVE-14233.07.patch, HIVE-14233.08.patch, 
> HIVE-14233.09.patch, HIVE-14233.10.patch, HIVE-14233.11.patch
>
>
> This JIRA proposes to improve vectorization for ACID by eliminating 
> row-by-row stitching when reading back ACID files. In the current 
> implementation, a vectorized row batch is created by populating the batch one 
> row at a time, before the vectorized batch is passed up along the operator 
> pipeline. This row-by-row stitching limitation was because of the fact that 
> the ACID insert/update/delete events from various delta files needed to be 
> merged together before the actual version of a given row was found out. 
> HIVE-14035 has enabled us to break away from that limitation by splitting 
> ACID update events into a combination of delete+insert. In fact, it has now 
> enabled us to create splits on delta files.
> Building on top of HIVE-14035, this JIRA proposes to solve this earlier 
> bottleneck in the vectorized code path for ACID by now directly reading row 
> batches from the underlying ORC files and avoiding any stitching altogether. 
> Once a row batch is read from the split (which may be on a base/delta file), 
> the deleted rows will be found by cross-referencing them against a data 
> structure that will just keep track of deleted events (found in the 
> deleted_delta files). This will lead to a large performance gain when reading 
> ACID files in vectorized fashion, while enabling further optimizations in 
> future that can be done on top of that.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

[jira] [Updated] (HIVE-14233) Improve vectorization for ACID by eliminating row-by-row stitching

2016-08-30 Thread Saket Saurabh (JIRA)


 [ 
https://issues.apache.org/jira/browse/HIVE-14233?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Saket Saurabh updated HIVE-14233:
-
Status: Patch Available  (was: Open)

> Improve vectorization for ACID by eliminating row-by-row stitching
> --
>
> Key: HIVE-14233
> URL: https://issues.apache.org/jira/browse/HIVE-14233
> Project: Hive
>  Issue Type: New Feature
>  Components: Transactions, Vectorization
>Reporter: Saket Saurabh
>Assignee: Saket Saurabh
> Attachments: HIVE-14233.01.patch, HIVE-14233.02.patch, 
> HIVE-14233.03.patch, HIVE-14233.04.patch, HIVE-14233.05.patch, 
> HIVE-14233.06.patch, HIVE-14233.07.patch, HIVE-14233.08.patch, 
> HIVE-14233.09.patch, HIVE-14233.10.patch, HIVE-14233.11.patch
>
>
> This JIRA proposes to improve vectorization for ACID by eliminating 
> row-by-row stitching when reading back ACID files. In the current 
> implementation, a vectorized row batch is created by populating the batch one 
> row at a time, before the vectorized batch is passed up along the operator 
> pipeline. This row-by-row stitching limitation was because of the fact that 
> the ACID insert/update/delete events from various delta files needed to be 
> merged together before the actual version of a given row was found out. 
> HIVE-14035 has enabled us to break away from that limitation by splitting 
> ACID update events into a combination of delete+insert. In fact, it has now 
> enabled us to create splits on delta files.
> Building on top of HIVE-14035, this JIRA proposes to solve this earlier 
> bottleneck in the vectorized code path for ACID by now directly reading row 
> batches from the underlying ORC files and avoiding any stitching altogether. 
> Once a row batch is read from the split (which may be on a base/delta file), 
> the deleted rows will be found by cross-referencing them against a data 
> structure that will just keep track of deleted events (found in the 
> deleted_delta files). This will lead to a large performance gain when reading 
> ACID files in vectorized fashion, while enabling further optimizations in 
> future that can be done on top of that.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

[jira] [Updated] (HIVE-14233) Improve vectorization for ACID by eliminating row-by-row stitching

2016-08-30 Thread Saket Saurabh (JIRA)


 [ 
https://issues.apache.org/jira/browse/HIVE-14233?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Saket Saurabh updated HIVE-14233:
-
Status: Open  (was: Patch Available)

> Improve vectorization for ACID by eliminating row-by-row stitching
> --
>
> Key: HIVE-14233
> URL: https://issues.apache.org/jira/browse/HIVE-14233
> Project: Hive
>  Issue Type: New Feature
>  Components: Transactions, Vectorization
>Reporter: Saket Saurabh
>Assignee: Saket Saurabh
> Attachments: HIVE-14233.01.patch, HIVE-14233.02.patch, 
> HIVE-14233.03.patch, HIVE-14233.04.patch, HIVE-14233.05.patch, 
> HIVE-14233.06.patch, HIVE-14233.07.patch, HIVE-14233.08.patch, 
> HIVE-14233.09.patch, HIVE-14233.10.patch
>
>
> This JIRA proposes to improve vectorization for ACID by eliminating 
> row-by-row stitching when reading back ACID files. In the current 
> implementation, a vectorized row batch is created by populating the batch one 
> row at a time, before the vectorized batch is passed up along the operator 
> pipeline. This row-by-row stitching limitation was because of the fact that 
> the ACID insert/update/delete events from various delta files needed to be 
> merged together before the actual version of a given row was found out. 
> HIVE-14035 has enabled us to break away from that limitation by splitting 
> ACID update events into a combination of delete+insert. In fact, it has now 
> enabled us to create splits on delta files.
> Building on top of HIVE-14035, this JIRA proposes to solve this earlier 
> bottleneck in the vectorized code path for ACID by now directly reading row 
> batches from the underlying ORC files and avoiding any stitching altogether. 
> Once a row batch is read from the split (which may be on a base/delta file), 
> the deleted rows will be found by cross-referencing them against a data 
> structure that will just keep track of deleted events (found in the 
> deleted_delta files). This will lead to a large performance gain when reading 
> ACID files in vectorized fashion, while enabling further optimizations in 
> future that can be done on top of that.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

[jira] [Commented] (HIVE-14233) Improve vectorization for ACID by eliminating row-by-row stitching

2016-08-29 Thread Saket Saurabh (JIRA)


[ 
https://issues.apache.org/jira/browse/HIVE-14233?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15446668#comment-15446668
 ] 

Saket Saurabh commented on HIVE-14233:
--

Oops, forgot to do that.. sure Eugene, done that now.

> Improve vectorization for ACID by eliminating row-by-row stitching
> --
>
> Key: HIVE-14233
> URL: https://issues.apache.org/jira/browse/HIVE-14233
> Project: Hive
>  Issue Type: New Feature
>  Components: Transactions, Vectorization
>Reporter: Saket Saurabh
>Assignee: Saket Saurabh
> Attachments: HIVE-14233.01.patch, HIVE-14233.02.patch, 
> HIVE-14233.03.patch, HIVE-14233.04.patch, HIVE-14233.05.patch, 
> HIVE-14233.06.patch, HIVE-14233.07.patch, HIVE-14233.08.patch, 
> HIVE-14233.09.patch, HIVE-14233.10.patch
>
>
> This JIRA proposes to improve vectorization for ACID by eliminating 
> row-by-row stitching when reading back ACID files. In the current 
> implementation, a vectorized row batch is created by populating the batch one 
> row at a time, before the vectorized batch is passed up along the operator 
> pipeline. This row-by-row stitching limitation was because of the fact that 
> the ACID insert/update/delete events from various delta files needed to be 
> merged together before the actual version of a given row was found out. 
> HIVE-14035 has enabled us to break away from that limitation by splitting 
> ACID update events into a combination of delete+insert. In fact, it has now 
> enabled us to create splits on delta files.
> Building on top of HIVE-14035, this JIRA proposes to solve this earlier 
> bottleneck in the vectorized code path for ACID by now directly reading row 
> batches from the underlying ORC files and avoiding any stitching altogether. 
> Once a row batch is read from the split (which may be on a base/delta file), 
> the deleted rows will be found by cross-referencing them against a data 
> structure that will just keep track of deleted events (found in the 
> deleted_delta files). This will lead to a large performance gain when reading 
> ACID files in vectorized fashion, while enabling further optimizations in 
> future that can be done on top of that.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

[jira] [Updated] (HIVE-14233) Improve vectorization for ACID by eliminating row-by-row stitching

2016-08-29 Thread Saket Saurabh (JIRA)


 [ 
https://issues.apache.org/jira/browse/HIVE-14233?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Saket Saurabh updated HIVE-14233:
-
Status: Open  (was: Patch Available)

> Improve vectorization for ACID by eliminating row-by-row stitching
> --
>
> Key: HIVE-14233
> URL: https://issues.apache.org/jira/browse/HIVE-14233
> Project: Hive
>  Issue Type: New Feature
>  Components: Transactions, Vectorization
>Reporter: Saket Saurabh
>Assignee: Saket Saurabh
> Attachments: HIVE-14233.01.patch, HIVE-14233.02.patch, 
> HIVE-14233.03.patch, HIVE-14233.04.patch, HIVE-14233.05.patch, 
> HIVE-14233.06.patch, HIVE-14233.07.patch, HIVE-14233.08.patch, 
> HIVE-14233.09.patch, HIVE-14233.10.patch
>
>
> This JIRA proposes to improve vectorization for ACID by eliminating 
> row-by-row stitching when reading back ACID files. In the current 
> implementation, a vectorized row batch is created by populating the batch one 
> row at a time, before the vectorized batch is passed up along the operator 
> pipeline. This row-by-row stitching limitation was because of the fact that 
> the ACID insert/update/delete events from various delta files needed to be 
> merged together before the actual version of a given row was found out. 
> HIVE-14035 has enabled us to break away from that limitation by splitting 
> ACID update events into a combination of delete+insert. In fact, it has now 
> enabled us to create splits on delta files.
> Building on top of HIVE-14035, this JIRA proposes to solve this earlier 
> bottleneck in the vectorized code path for ACID by now directly reading row 
> batches from the underlying ORC files and avoiding any stitching altogether. 
> Once a row batch is read from the split (which may be on a base/delta file), 
> the deleted rows will be found by cross-referencing them against a data 
> structure that will just keep track of deleted events (found in the 
> deleted_delta files). This will lead to a large performance gain when reading 
> ACID files in vectorized fashion, while enabling further optimizations in 
> future that can be done on top of that.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

[jira] [Updated] (HIVE-14233) Improve vectorization for ACID by eliminating row-by-row stitching

2016-08-29 Thread Saket Saurabh (JIRA)


 [ 
https://issues.apache.org/jira/browse/HIVE-14233?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Saket Saurabh updated HIVE-14233:
-
Status: Patch Available  (was: Open)

> Improve vectorization for ACID by eliminating row-by-row stitching
> --
>
> Key: HIVE-14233
> URL: https://issues.apache.org/jira/browse/HIVE-14233
> Project: Hive
>  Issue Type: New Feature
>  Components: Transactions, Vectorization
>Reporter: Saket Saurabh
>Assignee: Saket Saurabh
> Attachments: HIVE-14233.01.patch, HIVE-14233.02.patch, 
> HIVE-14233.03.patch, HIVE-14233.04.patch, HIVE-14233.05.patch, 
> HIVE-14233.06.patch, HIVE-14233.07.patch, HIVE-14233.08.patch, 
> HIVE-14233.09.patch, HIVE-14233.10.patch
>
>
> This JIRA proposes to improve vectorization for ACID by eliminating 
> row-by-row stitching when reading back ACID files. In the current 
> implementation, a vectorized row batch is created by populating the batch one 
> row at a time, before the vectorized batch is passed up along the operator 
> pipeline. This row-by-row stitching limitation was because of the fact that 
> the ACID insert/update/delete events from various delta files needed to be 
> merged together before the actual version of a given row was found out. 
> HIVE-14035 has enabled us to break away from that limitation by splitting 
> ACID update events into a combination of delete+insert. In fact, it has now 
> enabled us to create splits on delta files.
> Building on top of HIVE-14035, this JIRA proposes to solve this earlier 
> bottleneck in the vectorized code path for ACID by now directly reading row 
> batches from the underlying ORC files and avoiding any stitching altogether. 
> Once a row batch is read from the split (which may be on a base/delta file), 
> the deleted rows will be found by cross-referencing them against a data 
> structure that will just keep track of deleted events (found in the 
> deleted_delta files). This will lead to a large performance gain when reading 
> ACID files in vectorized fashion, while enabling further optimizations in 
> future that can be done on top of that.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

[jira] [Updated] (HIVE-14233) Improve vectorization for ACID by eliminating row-by-row stitching

2016-08-29 Thread Saket Saurabh (JIRA)


 [ 
https://issues.apache.org/jira/browse/HIVE-14233?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Saket Saurabh updated HIVE-14233:
-
Status: Patch Available  (was: Open)

> Improve vectorization for ACID by eliminating row-by-row stitching
> --
>
> Key: HIVE-14233
> URL: https://issues.apache.org/jira/browse/HIVE-14233
> Project: Hive
>  Issue Type: New Feature
>  Components: Transactions, Vectorization
>Reporter: Saket Saurabh
>Assignee: Saket Saurabh
> Attachments: HIVE-14233.01.patch, HIVE-14233.02.patch, 
> HIVE-14233.03.patch, HIVE-14233.04.patch, HIVE-14233.05.patch, 
> HIVE-14233.06.patch, HIVE-14233.07.patch, HIVE-14233.08.patch, 
> HIVE-14233.09.patch, HIVE-14233.10.patch
>
>
> This JIRA proposes to improve vectorization for ACID by eliminating 
> row-by-row stitching when reading back ACID files. In the current 
> implementation, a vectorized row batch is created by populating the batch one 
> row at a time, before the vectorized batch is passed up along the operator 
> pipeline. This row-by-row stitching limitation was because of the fact that 
> the ACID insert/update/delete events from various delta files needed to be 
> merged together before the actual version of a given row was found out. 
> HIVE-14035 has enabled us to break away from that limitation by splitting 
> ACID update events into a combination of delete+insert. In fact, it has now 
> enabled us to create splits on delta files.
> Building on top of HIVE-14035, this JIRA proposes to solve this earlier 
> bottleneck in the vectorized code path for ACID by now directly reading row 
> batches from the underlying ORC files and avoiding any stitching altogether. 
> Once a row batch is read from the split (which may be on a base/delta file), 
> the deleted rows will be found by cross-referencing them against a data 
> structure that will just keep track of deleted events (found in the 
> deleted_delta files). This will lead to a large performance gain when reading 
> ACID files in vectorized fashion, while enabling further optimizations in 
> future that can be done on top of that.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

[jira] [Updated] (HIVE-14233) Improve vectorization for ACID by eliminating row-by-row stitching

2016-08-29 Thread Saket Saurabh (JIRA)


 [ 
https://issues.apache.org/jira/browse/HIVE-14233?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Saket Saurabh updated HIVE-14233:
-
Attachment: HIVE-14233.10.patch

Addressed comments at RB & added TestTxnCommands2 subclass that runs e2e ACID 
tests with split-update and vectorization enabled.

> Improve vectorization for ACID by eliminating row-by-row stitching
> --
>
> Key: HIVE-14233
> URL: https://issues.apache.org/jira/browse/HIVE-14233
> Project: Hive
>  Issue Type: New Feature
>  Components: Transactions, Vectorization
>Reporter: Saket Saurabh
>Assignee: Saket Saurabh
> Attachments: HIVE-14233.01.patch, HIVE-14233.02.patch, 
> HIVE-14233.03.patch, HIVE-14233.04.patch, HIVE-14233.05.patch, 
> HIVE-14233.06.patch, HIVE-14233.07.patch, HIVE-14233.08.patch, 
> HIVE-14233.09.patch, HIVE-14233.10.patch
>
>
> This JIRA proposes to improve vectorization for ACID by eliminating 
> row-by-row stitching when reading back ACID files. In the current 
> implementation, a vectorized row batch is created by populating the batch one 
> row at a time, before the vectorized batch is passed up along the operator 
> pipeline. This row-by-row stitching limitation was because of the fact that 
> the ACID insert/update/delete events from various delta files needed to be 
> merged together before the actual version of a given row was found out. 
> HIVE-14035 has enabled us to break away from that limitation by splitting 
> ACID update events into a combination of delete+insert. In fact, it has now 
> enabled us to create splits on delta files.
> Building on top of HIVE-14035, this JIRA proposes to solve this earlier 
> bottleneck in the vectorized code path for ACID by now directly reading row 
> batches from the underlying ORC files and avoiding any stitching altogether. 
> Once a row batch is read from the split (which may be on a base/delta file), 
> the deleted rows will be found by cross-referencing them against a data 
> structure that will just keep track of deleted events (found in the 
> deleted_delta files). This will lead to a large performance gain when reading 
> ACID files in vectorized fashion, while enabling further optimizations in 
> future that can be done on top of that.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

[jira] [Updated] (HIVE-14233) Improve vectorization for ACID by eliminating row-by-row stitching

2016-08-29 Thread Saket Saurabh (JIRA)


 [ 
https://issues.apache.org/jira/browse/HIVE-14233?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Saket Saurabh updated HIVE-14233:
-
Status: Open  (was: Patch Available)

> Improve vectorization for ACID by eliminating row-by-row stitching
> --
>
> Key: HIVE-14233
> URL: https://issues.apache.org/jira/browse/HIVE-14233
> Project: Hive
>  Issue Type: New Feature
>  Components: Transactions, Vectorization
>Reporter: Saket Saurabh
>Assignee: Saket Saurabh
> Attachments: HIVE-14233.01.patch, HIVE-14233.02.patch, 
> HIVE-14233.03.patch, HIVE-14233.04.patch, HIVE-14233.05.patch, 
> HIVE-14233.06.patch, HIVE-14233.07.patch, HIVE-14233.08.patch, 
> HIVE-14233.09.patch
>
>
> This JIRA proposes to improve vectorization for ACID by eliminating 
> row-by-row stitching when reading back ACID files. In the current 
> implementation, a vectorized row batch is created by populating the batch one 
> row at a time, before the vectorized batch is passed up along the operator 
> pipeline. This row-by-row stitching limitation was because of the fact that 
> the ACID insert/update/delete events from various delta files needed to be 
> merged together before the actual version of a given row was found out. 
> HIVE-14035 has enabled us to break away from that limitation by splitting 
> ACID update events into a combination of delete+insert. In fact, it has now 
> enabled us to create splits on delta files.
> Building on top of HIVE-14035, this JIRA proposes to solve this earlier 
> bottleneck in the vectorized code path for ACID by now directly reading row 
> batches from the underlying ORC files and avoiding any stitching altogether. 
> Once a row batch is read from the split (which may be on a base/delta file), 
> the deleted rows will be found by cross-referencing them against a data 
> structure that will just keep track of deleted events (found in the 
> deleted_delta files). This will lead to a large performance gain when reading 
> ACID files in vectorized fashion, while enabling further optimizations in 
> future that can be done on top of that.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

[jira] [Commented] (HIVE-14233) Improve vectorization for ACID by eliminating row-by-row stitching

2016-08-24 Thread Saket Saurabh (JIRA)


[ 
https://issues.apache.org/jira/browse/HIVE-14233?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15436236#comment-15436236
 ] 

Saket Saurabh commented on HIVE-14233:
--

Thanks [~ekoifman] for the comments, working now on fixing them.

> Improve vectorization for ACID by eliminating row-by-row stitching
> --
>
> Key: HIVE-14233
> URL: https://issues.apache.org/jira/browse/HIVE-14233
> Project: Hive
>  Issue Type: New Feature
>  Components: Transactions, Vectorization
>Reporter: Saket Saurabh
>Assignee: Saket Saurabh
> Attachments: HIVE-14233.01.patch, HIVE-14233.02.patch, 
> HIVE-14233.03.patch, HIVE-14233.04.patch, HIVE-14233.05.patch, 
> HIVE-14233.06.patch, HIVE-14233.07.patch, HIVE-14233.08.patch, 
> HIVE-14233.09.patch
>
>
> This JIRA proposes to improve vectorization for ACID by eliminating 
> row-by-row stitching when reading back ACID files. In the current 
> implementation, a vectorized row batch is created by populating the batch one 
> row at a time, before the vectorized batch is passed up along the operator 
> pipeline. This row-by-row stitching limitation was because of the fact that 
> the ACID insert/update/delete events from various delta files needed to be 
> merged together before the actual version of a given row was found out. 
> HIVE-14035 has enabled us to break away from that limitation by splitting 
> ACID update events into a combination of delete+insert. In fact, it has now 
> enabled us to create splits on delta files.
> Building on top of HIVE-14035, this JIRA proposes to solve this earlier 
> bottleneck in the vectorized code path for ACID by now directly reading row 
> batches from the underlying ORC files and avoiding any stitching altogether. 
> Once a row batch is read from the split (which may be on a base/delta file), 
> the deleted rows will be found by cross-referencing them against a data 
> structure that will just keep track of deleted events (found in the 
> deleted_delta files). This will lead to a large performance gain when reading 
> ACID files in vectorized fashion, while enabling further optimizations in 
> future that can be done on top of that.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

[jira] [Commented] (HIVE-14199) Enable Bucket Pruning for ACID tables

2016-08-22 Thread Saket Saurabh (JIRA)


[ 
https://issues.apache.org/jira/browse/HIVE-14199?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15431994#comment-15431994
 ] 

Saket Saurabh commented on HIVE-14199:
--

Thanks [~ekoifman] for reviewing it.

> Enable Bucket Pruning for ACID tables
> -
>
> Key: HIVE-14199
> URL: https://issues.apache.org/jira/browse/HIVE-14199
> Project: Hive
>  Issue Type: Improvement
>  Components: Transactions
>Reporter: Saket Saurabh
>Assignee: Saket Saurabh
> Fix For: 2.2.0
>
> Attachments: HIVE-14199.01.patch, HIVE-14199.02.patch, 
> HIVE-14199.03.patch
>
>
> Currently, ACID tables do not benefit from the bucket pruning feature 
> introduced in HIVE-11525. The reason for this has been the fact that bucket 
> pruning happens at split generation level and for ACID, traditionally the 
> delta files were never split. The parallelism for ACID was then restricted to 
> the number of buckets. There would be as many splits as the number of buckets 
> and each worker processing one split would inevitably read all the delta 
> files for that bucket, even when the query may have originally required only 
> one of the buckets to be read.
> However, HIVE-14035 now enables even the delta files to be also split. What 
> this means is that now we have enough information at the split generation 
> level to determine appropriate buckets to process for the delta files. This 
> can efficiently allow us to prune unnecessary buckets for delta files and 
> will lead to good performance gain for a large number of selective queries on 
> ACID tables.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

[jira] [Updated] (HIVE-14233) Improve vectorization for ACID by eliminating row-by-row stitching

2016-08-22 Thread Saket Saurabh (JIRA)


 [ 
https://issues.apache.org/jira/browse/HIVE-14233?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Saket Saurabh updated HIVE-14233:
-
Status: Patch Available  (was: Open)

> Improve vectorization for ACID by eliminating row-by-row stitching
> --
>
> Key: HIVE-14233
> URL: https://issues.apache.org/jira/browse/HIVE-14233
> Project: Hive
>  Issue Type: New Feature
>  Components: Transactions, Vectorization
>Reporter: Saket Saurabh
>Assignee: Saket Saurabh
> Attachments: HIVE-14233.01.patch, HIVE-14233.02.patch, 
> HIVE-14233.03.patch, HIVE-14233.04.patch, HIVE-14233.05.patch, 
> HIVE-14233.06.patch, HIVE-14233.07.patch, HIVE-14233.08.patch, 
> HIVE-14233.09.patch
>
>
> This JIRA proposes to improve vectorization for ACID by eliminating 
> row-by-row stitching when reading back ACID files. In the current 
> implementation, a vectorized row batch is created by populating the batch one 
> row at a time, before the vectorized batch is passed up along the operator 
> pipeline. This row-by-row stitching limitation was because of the fact that 
> the ACID insert/update/delete events from various delta files needed to be 
> merged together before the actual version of a given row was found out. 
> HIVE-14035 has enabled us to break away from that limitation by splitting 
> ACID update events into a combination of delete+insert. In fact, it has now 
> enabled us to create splits on delta files.
> Building on top of HIVE-14035, this JIRA proposes to solve this earlier 
> bottleneck in the vectorized code path for ACID by now directly reading row 
> batches from the underlying ORC files and avoiding any stitching altogether. 
> Once a row batch is read from the split (which may be on a base/delta file), 
> the deleted rows will be found by cross-referencing them against a data 
> structure that will just keep track of deleted events (found in the 
> deleted_delta files). This will lead to a large performance gain when reading 
> ACID files in vectorized fashion, while enabling further optimizations in 
> future that can be done on top of that.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

[jira] [Updated] (HIVE-14233) Improve vectorization for ACID by eliminating row-by-row stitching

2016-08-22 Thread Saket Saurabh (JIRA)


 [ 
https://issues.apache.org/jira/browse/HIVE-14233?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Saket Saurabh updated HIVE-14233:
-
Status: Open  (was: Patch Available)

> Improve vectorization for ACID by eliminating row-by-row stitching
> --
>
> Key: HIVE-14233
> URL: https://issues.apache.org/jira/browse/HIVE-14233
> Project: Hive
>  Issue Type: New Feature
>  Components: Transactions, Vectorization
>Reporter: Saket Saurabh
>Assignee: Saket Saurabh
> Attachments: HIVE-14233.01.patch, HIVE-14233.02.patch, 
> HIVE-14233.03.patch, HIVE-14233.04.patch, HIVE-14233.05.patch, 
> HIVE-14233.06.patch, HIVE-14233.07.patch, HIVE-14233.08.patch, 
> HIVE-14233.09.patch
>
>
> This JIRA proposes to improve vectorization for ACID by eliminating 
> row-by-row stitching when reading back ACID files. In the current 
> implementation, a vectorized row batch is created by populating the batch one 
> row at a time, before the vectorized batch is passed up along the operator 
> pipeline. This row-by-row stitching limitation was because of the fact that 
> the ACID insert/update/delete events from various delta files needed to be 
> merged together before the actual version of a given row was found out. 
> HIVE-14035 has enabled us to break away from that limitation by splitting 
> ACID update events into a combination of delete+insert. In fact, it has now 
> enabled us to create splits on delta files.
> Building on top of HIVE-14035, this JIRA proposes to solve this earlier 
> bottleneck in the vectorized code path for ACID by now directly reading row 
> batches from the underlying ORC files and avoiding any stitching altogether. 
> Once a row batch is read from the split (which may be on a base/delta file), 
> the deleted rows will be found by cross-referencing them against a data 
> structure that will just keep track of deleted events (found in the 
> deleted_delta files). This will lead to a large performance gain when reading 
> ACID files in vectorized fashion, while enabling further optimizations in 
> future that can be done on top of that.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

[jira] [Comment Edited] (HIVE-14233) Improve vectorization for ACID by eliminating row-by-row stitching

2016-08-22 Thread Saket Saurabh (JIRA)


[ 
https://issues.apache.org/jira/browse/HIVE-14233?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15430788#comment-15430788
 ] 

Saket Saurabh edited comment on HIVE-14233 at 8/22/16 1:51 PM:
---

Addressed comments at RB and added more documentation & unit tests.


was (Author: saketj):
Addressed comments at RB and add more documentation & unit tests.

> Improve vectorization for ACID by eliminating row-by-row stitching
> --
>
> Key: HIVE-14233
> URL: https://issues.apache.org/jira/browse/HIVE-14233
> Project: Hive
>  Issue Type: New Feature
>  Components: Transactions, Vectorization
>Reporter: Saket Saurabh
>Assignee: Saket Saurabh
> Attachments: HIVE-14233.01.patch, HIVE-14233.02.patch, 
> HIVE-14233.03.patch, HIVE-14233.04.patch, HIVE-14233.05.patch, 
> HIVE-14233.06.patch, HIVE-14233.07.patch, HIVE-14233.08.patch, 
> HIVE-14233.09.patch
>
>
> This JIRA proposes to improve vectorization for ACID by eliminating 
> row-by-row stitching when reading back ACID files. In the current 
> implementation, a vectorized row batch is created by populating the batch one 
> row at a time, before the vectorized batch is passed up along the operator 
> pipeline. This row-by-row stitching limitation was because of the fact that 
> the ACID insert/update/delete events from various delta files needed to be 
> merged together before the actual version of a given row was found out. 
> HIVE-14035 has enabled us to break away from that limitation by splitting 
> ACID update events into a combination of delete+insert. In fact, it has now 
> enabled us to create splits on delta files.
> Building on top of HIVE-14035, this JIRA proposes to solve this earlier 
> bottleneck in the vectorized code path for ACID by now directly reading row 
> batches from the underlying ORC files and avoiding any stitching altogether. 
> Once a row batch is read from the split (which may be on a base/delta file), 
> the deleted rows will be found by cross-referencing them against a data 
> structure that will just keep track of deleted events (found in the 
> deleted_delta files). This will lead to a large performance gain when reading 
> ACID files in vectorized fashion, while enabling further optimizations in 
> future that can be done on top of that.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

[jira] [Comment Edited] (HIVE-14233) Improve vectorization for ACID by eliminating row-by-row stitching

2016-08-22 Thread Saket Saurabh (JIRA)


[ 
https://issues.apache.org/jira/browse/HIVE-14233?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15430788#comment-15430788
 ] 

Saket Saurabh edited comment on HIVE-14233 at 8/22/16 1:51 PM:
---

Addressed comments at RB and add more documentation & unit tests.


was (Author: saketj):
Address comments at RB and add more documentation & unit tests.

> Improve vectorization for ACID by eliminating row-by-row stitching
> --
>
> Key: HIVE-14233
> URL: https://issues.apache.org/jira/browse/HIVE-14233
> Project: Hive
>  Issue Type: New Feature
>  Components: Transactions, Vectorization
>Reporter: Saket Saurabh
>Assignee: Saket Saurabh
> Attachments: HIVE-14233.01.patch, HIVE-14233.02.patch, 
> HIVE-14233.03.patch, HIVE-14233.04.patch, HIVE-14233.05.patch, 
> HIVE-14233.06.patch, HIVE-14233.07.patch, HIVE-14233.08.patch, 
> HIVE-14233.09.patch
>
>
> This JIRA proposes to improve vectorization for ACID by eliminating 
> row-by-row stitching when reading back ACID files. In the current 
> implementation, a vectorized row batch is created by populating the batch one 
> row at a time, before the vectorized batch is passed up along the operator 
> pipeline. This row-by-row stitching limitation was because of the fact that 
> the ACID insert/update/delete events from various delta files needed to be 
> merged together before the actual version of a given row was found out. 
> HIVE-14035 has enabled us to break away from that limitation by splitting 
> ACID update events into a combination of delete+insert. In fact, it has now 
> enabled us to create splits on delta files.
> Building on top of HIVE-14035, this JIRA proposes to solve this earlier 
> bottleneck in the vectorized code path for ACID by now directly reading row 
> batches from the underlying ORC files and avoiding any stitching altogether. 
> Once a row batch is read from the split (which may be on a base/delta file), 
> the deleted rows will be found by cross-referencing them against a data 
> structure that will just keep track of deleted events (found in the 
> deleted_delta files). This will lead to a large performance gain when reading 
> ACID files in vectorized fashion, while enabling further optimizations in 
> future that can be done on top of that.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

[jira] [Updated] (HIVE-14233) Improve vectorization for ACID by eliminating row-by-row stitching

2016-08-22 Thread Saket Saurabh (JIRA)


 [ 
https://issues.apache.org/jira/browse/HIVE-14233?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Saket Saurabh updated HIVE-14233:
-
Status: Patch Available  (was: Open)

> Improve vectorization for ACID by eliminating row-by-row stitching
> --
>
> Key: HIVE-14233
> URL: https://issues.apache.org/jira/browse/HIVE-14233
> Project: Hive
>  Issue Type: New Feature
>  Components: Transactions, Vectorization
>Reporter: Saket Saurabh
>Assignee: Saket Saurabh
> Attachments: HIVE-14233.01.patch, HIVE-14233.02.patch, 
> HIVE-14233.03.patch, HIVE-14233.04.patch, HIVE-14233.05.patch, 
> HIVE-14233.06.patch, HIVE-14233.07.patch, HIVE-14233.08.patch, 
> HIVE-14233.09.patch
>
>
> This JIRA proposes to improve vectorization for ACID by eliminating 
> row-by-row stitching when reading back ACID files. In the current 
> implementation, a vectorized row batch is created by populating the batch one 
> row at a time, before the vectorized batch is passed up along the operator 
> pipeline. This row-by-row stitching limitation was because of the fact that 
> the ACID insert/update/delete events from various delta files needed to be 
> merged together before the actual version of a given row was found out. 
> HIVE-14035 has enabled us to break away from that limitation by splitting 
> ACID update events into a combination of delete+insert. In fact, it has now 
> enabled us to create splits on delta files.
> Building on top of HIVE-14035, this JIRA proposes to solve this earlier 
> bottleneck in the vectorized code path for ACID by now directly reading row 
> batches from the underlying ORC files and avoiding any stitching altogether. 
> Once a row batch is read from the split (which may be on a base/delta file), 
> the deleted rows will be found by cross-referencing them against a data 
> structure that will just keep track of deleted events (found in the 
> deleted_delta files). This will lead to a large performance gain when reading 
> ACID files in vectorized fashion, while enabling further optimizations in 
> future that can be done on top of that.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

[jira] [Updated] (HIVE-14233) Improve vectorization for ACID by eliminating row-by-row stitching

2016-08-22 Thread Saket Saurabh (JIRA)


 [ 
https://issues.apache.org/jira/browse/HIVE-14233?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Saket Saurabh updated HIVE-14233:
-
Attachment: HIVE-14233.09.patch

Address comments at RB and add more documentation & unit tests.

> Improve vectorization for ACID by eliminating row-by-row stitching
> --
>
> Key: HIVE-14233
> URL: https://issues.apache.org/jira/browse/HIVE-14233
> Project: Hive
>  Issue Type: New Feature
>  Components: Transactions, Vectorization
>Reporter: Saket Saurabh
>Assignee: Saket Saurabh
> Attachments: HIVE-14233.01.patch, HIVE-14233.02.patch, 
> HIVE-14233.03.patch, HIVE-14233.04.patch, HIVE-14233.05.patch, 
> HIVE-14233.06.patch, HIVE-14233.07.patch, HIVE-14233.08.patch, 
> HIVE-14233.09.patch
>
>
> This JIRA proposes to improve vectorization for ACID by eliminating 
> row-by-row stitching when reading back ACID files. In the current 
> implementation, a vectorized row batch is created by populating the batch one 
> row at a time, before the vectorized batch is passed up along the operator 
> pipeline. This row-by-row stitching limitation was because of the fact that 
> the ACID insert/update/delete events from various delta files needed to be 
> merged together before the actual version of a given row was found out. 
> HIVE-14035 has enabled us to break away from that limitation by splitting 
> ACID update events into a combination of delete+insert. In fact, it has now 
> enabled us to create splits on delta files.
> Building on top of HIVE-14035, this JIRA proposes to solve this earlier 
> bottleneck in the vectorized code path for ACID by now directly reading row 
> batches from the underlying ORC files and avoiding any stitching altogether. 
> Once a row batch is read from the split (which may be on a base/delta file), 
> the deleted rows will be found by cross-referencing them against a data 
> structure that will just keep track of deleted events (found in the 
> deleted_delta files). This will lead to a large performance gain when reading 
> ACID files in vectorized fashion, while enabling further optimizations in 
> future that can be done on top of that.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

[jira] [Commented] (HIVE-14035) Enable predicate pushdown to delta files created by ACID Transactions

2016-08-13 Thread Saket Saurabh (JIRA)


[ 
https://issues.apache.org/jira/browse/HIVE-14035?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15420007#comment-15420007
 ] 

Saket Saurabh commented on HIVE-14035:
--

Thanks [~leftylev]. Sure I will add it as a wiki page too.

> Enable predicate pushdown to delta files created by ACID Transactions
> -
>
> Key: HIVE-14035
> URL: https://issues.apache.org/jira/browse/HIVE-14035
> Project: Hive
>  Issue Type: New Feature
>  Components: Transactions
>Reporter: Saket Saurabh
>Assignee: Saket Saurabh
>  Labels: TODOC2.2
> Fix For: 2.2.0
>
> Attachments: Design.Document.Improving ACID performance in 
> Hive.01.docx, Design.Document.Improving ACID performance in Hive.02.docx, 
> HIVE-14035.02.patch, HIVE-14035.03.patch, HIVE-14035.04.patch, 
> HIVE-14035.05.patch, HIVE-14035.06.patch, HIVE-14035.07.patch, 
> HIVE-14035.08.patch, HIVE-14035.09.patch, HIVE-14035.10.patch, 
> HIVE-14035.11.patch, HIVE-14035.12.patch, HIVE-14035.13.patch, 
> HIVE-14035.14.patch, HIVE-14035.15.patch, HIVE-14035.16.patch, 
> HIVE-14035.17.patch, HIVE-14035.patch
>
>
> In current Hive version, delta files created by ACID transactions do not 
> allow predicate pushdown if they contain any update/delete events. This is 
> done to preserve correctness when following a multi-version approach during 
> event collapsing, where an update event overwrites an existing insert event. 
> This JIRA proposes to split an update event into a combination of a delete 
> event followed by a new insert event, that can enable predicate push down to 
> all delta files without breaking correctness. To support backward 
> compatibility for this feature, this JIRA also proposes to add some sort of 
> versioning to ACID that can allow different versions of ACID transactions to 
> co-exist together.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

[jira] [Commented] (HIVE-14233) Improve vectorization for ACID by eliminating row-by-row stitching

2016-08-12 Thread Saket Saurabh (JIRA)


[ 
https://issues.apache.org/jira/browse/HIVE-14233?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15419809#comment-15419809
 ] 

Saket Saurabh commented on HIVE-14233:
--

It is to be noted that this patch for improved vectorization process does not 
handle the case when the split is on an original file (a non-acid schema file). 
In such cases, it resorts to the older strategy of creating vectorized row 
batches using row-by-row stitching. However, this performance roadblock will 
happen only for the non-ACID to ACID converted tables and even then will only 
exist till the first major compaction on the table produces a base file.

> Improve vectorization for ACID by eliminating row-by-row stitching
> --
>
> Key: HIVE-14233
> URL: https://issues.apache.org/jira/browse/HIVE-14233
> Project: Hive
>  Issue Type: New Feature
>  Components: Transactions, Vectorization
>Reporter: Saket Saurabh
>Assignee: Saket Saurabh
> Attachments: HIVE-14233.01.patch, HIVE-14233.02.patch, 
> HIVE-14233.03.patch, HIVE-14233.04.patch, HIVE-14233.05.patch, 
> HIVE-14233.06.patch, HIVE-14233.07.patch, HIVE-14233.08.patch
>
>
> This JIRA proposes to improve vectorization for ACID by eliminating 
> row-by-row stitching when reading back ACID files. In the current 
> implementation, a vectorized row batch is created by populating the batch one 
> row at a time, before the vectorized batch is passed up along the operator 
> pipeline. This row-by-row stitching limitation was because of the fact that 
> the ACID insert/update/delete events from various delta files needed to be 
> merged together before the actual version of a given row was found out. 
> HIVE-14035 has enabled us to break away from that limitation by splitting 
> ACID update events into a combination of delete+insert. In fact, it has now 
> enabled us to create splits on delta files.
> Building on top of HIVE-14035, this JIRA proposes to solve this earlier 
> bottleneck in the vectorized code path for ACID by now directly reading row 
> batches from the underlying ORC files and avoiding any stitching altogether. 
> Once a row batch is read from the split (which may be on a base/delta file), 
> the deleted rows will be found by cross-referencing them against a data 
> structure that will just keep track of deleted events (found in the 
> deleted_delta files). This will lead to a large performance gain when reading 
> ACID files in vectorized fashion, while enabling further optimizations in 
> future that can be done on top of that.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

[jira] [Updated] (HIVE-14035) Enable predicate pushdown to delta files created by ACID Transactions

2016-08-12 Thread Saket Saurabh (JIRA)


 [ 
https://issues.apache.org/jira/browse/HIVE-14035?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Saket Saurabh updated HIVE-14035:
-
Attachment: Design.Document.Improving ACID performance in Hive.02.docx

Updated version of design document with few minor corrections & typo fix

> Enable predicate pushdown to delta files created by ACID Transactions
> -
>
> Key: HIVE-14035
> URL: https://issues.apache.org/jira/browse/HIVE-14035
> Project: Hive
>  Issue Type: New Feature
>  Components: Transactions
>Reporter: Saket Saurabh
>Assignee: Saket Saurabh
> Fix For: 2.2.0
>
> Attachments: Design.Document.Improving ACID performance in 
> Hive.01.docx, Design.Document.Improving ACID performance in Hive.02.docx, 
> HIVE-14035.02.patch, HIVE-14035.03.patch, HIVE-14035.04.patch, 
> HIVE-14035.05.patch, HIVE-14035.06.patch, HIVE-14035.07.patch, 
> HIVE-14035.08.patch, HIVE-14035.09.patch, HIVE-14035.10.patch, 
> HIVE-14035.11.patch, HIVE-14035.12.patch, HIVE-14035.13.patch, 
> HIVE-14035.14.patch, HIVE-14035.15.patch, HIVE-14035.16.patch, 
> HIVE-14035.17.patch, HIVE-14035.patch
>
>
> In current Hive version, delta files created by ACID transactions do not 
> allow predicate pushdown if they contain any update/delete events. This is 
> done to preserve correctness when following a multi-version approach during 
> event collapsing, where an update event overwrites an existing insert event. 
> This JIRA proposes to split an update event into a combination of a delete 
> event followed by a new insert event, that can enable predicate push down to 
> all delta files without breaking correctness. To support backward 
> compatibility for this feature, this JIRA also proposes to add some sort of 
> versioning to ACID that can allow different versions of ACID transactions to 
> co-exist together.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

[jira] [Updated] (HIVE-14233) Improve vectorization for ACID by eliminating row-by-row stitching

2016-08-12 Thread Saket Saurabh (JIRA)


 [ 
https://issues.apache.org/jira/browse/HIVE-14233?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Saket Saurabh updated HIVE-14233:
-
Attachment: HIVE-14233.08.patch

Rebase with master after HIVE-14035 got committed. Also fixes some comments at 
RB.

> Improve vectorization for ACID by eliminating row-by-row stitching
> --
>
> Key: HIVE-14233
> URL: https://issues.apache.org/jira/browse/HIVE-14233
> Project: Hive
>  Issue Type: New Feature
>  Components: Transactions, Vectorization
>Reporter: Saket Saurabh
>Assignee: Saket Saurabh
> Attachments: HIVE-14233.01.patch, HIVE-14233.02.patch, 
> HIVE-14233.03.patch, HIVE-14233.04.patch, HIVE-14233.05.patch, 
> HIVE-14233.06.patch, HIVE-14233.07.patch, HIVE-14233.08.patch
>
>
> This JIRA proposes to improve vectorization for ACID by eliminating 
> row-by-row stitching when reading back ACID files. In the current 
> implementation, a vectorized row batch is created by populating the batch one 
> row at a time, before the vectorized batch is passed up along the operator 
> pipeline. This row-by-row stitching limitation was because of the fact that 
> the ACID insert/update/delete events from various delta files needed to be 
> merged together before the actual version of a given row was found out. 
> HIVE-14035 has enabled us to break away from that limitation by splitting 
> ACID update events into a combination of delete+insert. In fact, it has now 
> enabled us to create splits on delta files.
> Building on top of HIVE-14035, this JIRA proposes to solve this earlier 
> bottleneck in the vectorized code path for ACID by now directly reading row 
> batches from the underlying ORC files and avoiding any stitching altogether. 
> Once a row batch is read from the split (which may be on a base/delta file), 
> the deleted rows will be found by cross-referencing them against a data 
> structure that will just keep track of deleted events (found in the 
> deleted_delta files). This will lead to a large performance gain when reading 
> ACID files in vectorized fashion, while enabling further optimizations in 
> future that can be done on top of that.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

[jira] [Updated] (HIVE-14199) Enable Bucket Pruning for ACID tables

2016-08-12 Thread Saket Saurabh (JIRA)


 [ 
https://issues.apache.org/jira/browse/HIVE-14199?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Saket Saurabh updated HIVE-14199:
-
Status: Patch Available  (was: Open)

> Enable Bucket Pruning for ACID tables
> -
>
> Key: HIVE-14199
> URL: https://issues.apache.org/jira/browse/HIVE-14199
> Project: Hive
>  Issue Type: Improvement
>  Components: Transactions
>Reporter: Saket Saurabh
>Assignee: Saket Saurabh
> Attachments: HIVE-14199.01.patch, HIVE-14199.02.patch, 
> HIVE-14199.03.patch
>
>
> Currently, ACID tables do not benefit from the bucket pruning feature 
> introduced in HIVE-11525. The reason for this has been the fact that bucket 
> pruning happens at split generation level and for ACID, traditionally the 
> delta files were never split. The parallelism for ACID was then restricted to 
> the number of buckets. There would be as many splits as the number of buckets 
> and each worker processing one split would inevitably read all the delta 
> files for that bucket, even when the query may have originally required only 
> one of the buckets to be read.
> However, HIVE-14035 now enables even the delta files to be also split. What 
> this means is that now we have enough information at the split generation 
> level to determine appropriate buckets to process for the delta files. This 
> can efficiently allow us to prune unnecessary buckets for delta files and 
> will lead to good performance gain for a large number of selective queries on 
> ACID tables.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

[jira] [Updated] (HIVE-14199) Enable Bucket Pruning for ACID tables

2016-08-12 Thread Saket Saurabh (JIRA)


 [ 
https://issues.apache.org/jira/browse/HIVE-14199?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Saket Saurabh updated HIVE-14199:
-
Attachment: HIVE-14199.03.patch

Rebased with master after HIVE-14035 got committed. Submitting for Ptest.

> Enable Bucket Pruning for ACID tables
> -
>
> Key: HIVE-14199
> URL: https://issues.apache.org/jira/browse/HIVE-14199
> Project: Hive
>  Issue Type: Improvement
>  Components: Transactions
>Reporter: Saket Saurabh
>Assignee: Saket Saurabh
> Attachments: HIVE-14199.01.patch, HIVE-14199.02.patch, 
> HIVE-14199.03.patch
>
>
> Currently, ACID tables do not benefit from the bucket pruning feature 
> introduced in HIVE-11525. The reason for this has been the fact that bucket 
> pruning happens at split generation level and for ACID, traditionally the 
> delta files were never split. The parallelism for ACID was then restricted to 
> the number of buckets. There would be as many splits as the number of buckets 
> and each worker processing one split would inevitably read all the delta 
> files for that bucket, even when the query may have originally required only 
> one of the buckets to be read.
> However, HIVE-14035 now enables even the delta files to be also split. What 
> this means is that now we have enough information at the split generation 
> level to determine appropriate buckets to process for the delta files. This 
> can efficiently allow us to prune unnecessary buckets for delta files and 
> will lead to good performance gain for a large number of selective queries on 
> ACID tables.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

[jira] [Updated] (HIVE-14035) Enable predicate pushdown to delta files created by ACID Transactions

2016-08-12 Thread Saket Saurabh (JIRA)


 [ 
https://issues.apache.org/jira/browse/HIVE-14035?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Saket Saurabh updated HIVE-14035:
-
Attachment: Design.Document.Improving ACID performance in Hive.01.docx

Initial version of the design document for reference that describes high level 
changes to ACID introduced by HIVE-14035, HIVE-14199 & HIVE-14233. 

> Enable predicate pushdown to delta files created by ACID Transactions
> -
>
> Key: HIVE-14035
> URL: https://issues.apache.org/jira/browse/HIVE-14035
> Project: Hive
>  Issue Type: New Feature
>  Components: Transactions
>Reporter: Saket Saurabh
>Assignee: Saket Saurabh
> Fix For: 2.2.0
>
> Attachments: Design.Document.Improving ACID performance in 
> Hive.01.docx, HIVE-14035.02.patch, HIVE-14035.03.patch, HIVE-14035.04.patch, 
> HIVE-14035.05.patch, HIVE-14035.06.patch, HIVE-14035.07.patch, 
> HIVE-14035.08.patch, HIVE-14035.09.patch, HIVE-14035.10.patch, 
> HIVE-14035.11.patch, HIVE-14035.12.patch, HIVE-14035.13.patch, 
> HIVE-14035.14.patch, HIVE-14035.15.patch, HIVE-14035.16.patch, 
> HIVE-14035.17.patch, HIVE-14035.patch
>
>
> In current Hive version, delta files created by ACID transactions do not 
> allow predicate pushdown if they contain any update/delete events. This is 
> done to preserve correctness when following a multi-version approach during 
> event collapsing, where an update event overwrites an existing insert event. 
> This JIRA proposes to split an update event into a combination of a delete 
> event followed by a new insert event, that can enable predicate push down to 
> all delta files without breaking correctness. To support backward 
> compatibility for this feature, this JIRA also proposes to add some sort of 
> versioning to ACID that can allow different versions of ACID transactions to 
> co-exist together.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

[jira] [Commented] (HIVE-14035) Enable predicate pushdown to delta files created by ACID Transactions

2016-08-12 Thread Saket Saurabh (JIRA)


[ 
https://issues.apache.org/jira/browse/HIVE-14035?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15419309#comment-15419309
 ] 

Saket Saurabh commented on HIVE-14035:
--

Thanks [~ekoifman] and [~sershe] for the review.

> Enable predicate pushdown to delta files created by ACID Transactions
> -
>
> Key: HIVE-14035
> URL: https://issues.apache.org/jira/browse/HIVE-14035
> Project: Hive
>  Issue Type: New Feature
>  Components: Transactions
>Reporter: Saket Saurabh
>Assignee: Saket Saurabh
> Fix For: 2.2.0
>
> Attachments: HIVE-14035.02.patch, HIVE-14035.03.patch, 
> HIVE-14035.04.patch, HIVE-14035.05.patch, HIVE-14035.06.patch, 
> HIVE-14035.07.patch, HIVE-14035.08.patch, HIVE-14035.09.patch, 
> HIVE-14035.10.patch, HIVE-14035.11.patch, HIVE-14035.12.patch, 
> HIVE-14035.13.patch, HIVE-14035.14.patch, HIVE-14035.15.patch, 
> HIVE-14035.16.patch, HIVE-14035.17.patch, HIVE-14035.patch
>
>
> In current Hive version, delta files created by ACID transactions do not 
> allow predicate pushdown if they contain any update/delete events. This is 
> done to preserve correctness when following a multi-version approach during 
> event collapsing, where an update event overwrites an existing insert event. 
> This JIRA proposes to split an update event into a combination of a delete 
> event followed by a new insert event, that can enable predicate push down to 
> all delta files without breaking correctness. To support backward 
> compatibility for this feature, this JIRA also proposes to add some sort of 
> versioning to ACID that can allow different versions of ACID transactions to 
> co-exist together.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

[jira] [Updated] (HIVE-14035) Enable predicate pushdown to delta files created by ACID Transactions

2016-08-12 Thread Saket Saurabh (JIRA)


 [ 
https://issues.apache.org/jira/browse/HIVE-14035?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Saket Saurabh updated HIVE-14035:
-
Attachment: HIVE-14035.17.patch

Same as patch #17- repost as the previous Jenkins PreCommit job got killed.

> Enable predicate pushdown to delta files created by ACID Transactions
> -
>
> Key: HIVE-14035
> URL: https://issues.apache.org/jira/browse/HIVE-14035
> Project: Hive
>  Issue Type: New Feature
>  Components: Transactions
>Reporter: Saket Saurabh
>Assignee: Saket Saurabh
> Attachments: HIVE-14035.02.patch, HIVE-14035.03.patch, 
> HIVE-14035.04.patch, HIVE-14035.05.patch, HIVE-14035.06.patch, 
> HIVE-14035.07.patch, HIVE-14035.08.patch, HIVE-14035.09.patch, 
> HIVE-14035.10.patch, HIVE-14035.11.patch, HIVE-14035.12.patch, 
> HIVE-14035.13.patch, HIVE-14035.14.patch, HIVE-14035.15.patch, 
> HIVE-14035.16.patch, HIVE-14035.17.patch, HIVE-14035.patch
>
>
> In current Hive version, delta files created by ACID transactions do not 
> allow predicate pushdown if they contain any update/delete events. This is 
> done to preserve correctness when following a multi-version approach during 
> event collapsing, where an update event overwrites an existing insert event. 
> This JIRA proposes to split an update event into a combination of a delete 
> event followed by a new insert event, that can enable predicate push down to 
> all delta files without breaking correctness. To support backward 
> compatibility for this feature, this JIRA also proposes to add some sort of 
> versioning to ACID that can allow different versions of ACID transactions to 
> co-exist together.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

[jira] [Updated] (HIVE-14035) Enable predicate pushdown to delta files created by ACID Transactions

2016-08-12 Thread Saket Saurabh (JIRA)


 [ 
https://issues.apache.org/jira/browse/HIVE-14035?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Saket Saurabh updated HIVE-14035:
-
Status: Open  (was: Patch Available)

> Enable predicate pushdown to delta files created by ACID Transactions
> -
>
> Key: HIVE-14035
> URL: https://issues.apache.org/jira/browse/HIVE-14035
> Project: Hive
>  Issue Type: New Feature
>  Components: Transactions
>Reporter: Saket Saurabh
>Assignee: Saket Saurabh
> Attachments: HIVE-14035.02.patch, HIVE-14035.03.patch, 
> HIVE-14035.04.patch, HIVE-14035.05.patch, HIVE-14035.06.patch, 
> HIVE-14035.07.patch, HIVE-14035.08.patch, HIVE-14035.09.patch, 
> HIVE-14035.10.patch, HIVE-14035.11.patch, HIVE-14035.12.patch, 
> HIVE-14035.13.patch, HIVE-14035.14.patch, HIVE-14035.15.patch, 
> HIVE-14035.16.patch, HIVE-14035.17.patch, HIVE-14035.patch
>
>
> In current Hive version, delta files created by ACID transactions do not 
> allow predicate pushdown if they contain any update/delete events. This is 
> done to preserve correctness when following a multi-version approach during 
> event collapsing, where an update event overwrites an existing insert event. 
> This JIRA proposes to split an update event into a combination of a delete 
> event followed by a new insert event, that can enable predicate push down to 
> all delta files without breaking correctness. To support backward 
> compatibility for this feature, this JIRA also proposes to add some sort of 
> versioning to ACID that can allow different versions of ACID transactions to 
> co-exist together.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

[jira] [Updated] (HIVE-14523) ACID performance improvement patches

2016-08-12 Thread Saket Saurabh (JIRA)


 [ 
https://issues.apache.org/jira/browse/HIVE-14523?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Saket Saurabh updated HIVE-14523:
-
Status: Patch Available  (was: Open)

> ACID performance improvement patches
> 
>
> Key: HIVE-14523
> URL: https://issues.apache.org/jira/browse/HIVE-14523
> Project: Hive
>  Issue Type: Test
>Affects Versions: 2.2.0
>Reporter: Saket Saurabh
>Assignee: Saket Saurabh
>Priority: Trivial
> Attachments: HIVE-14035_14199_14233.01.patch, HIVE-14523.01.patch, 
> HIVE-14523.02.patch
>
>
> This is a trivial non-functional JIRA that combines the features introduced 
> HIVE-14035, HIVE-14199 and HIVE-14233 into a single patch.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

[jira] [Updated] (HIVE-14523) ACID performance improvement patches

2016-08-12 Thread Saket Saurabh (JIRA)


 [ 
https://issues.apache.org/jira/browse/HIVE-14523?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Saket Saurabh updated HIVE-14523:
-
Attachment: (was: HIVE-14035_14199_14233.01.patch)

> ACID performance improvement patches
> 
>
> Key: HIVE-14523
> URL: https://issues.apache.org/jira/browse/HIVE-14523
> Project: Hive
>  Issue Type: Test
>Affects Versions: 2.2.0
>Reporter: Saket Saurabh
>Assignee: Saket Saurabh
>Priority: Trivial
> Attachments: HIVE-14523.01.patch, HIVE-14523.02.patch
>
>
> This is a trivial non-functional JIRA that combines the features introduced 
> HIVE-14035, HIVE-14199 and HIVE-14233 into a single patch.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

[jira] [Updated] (HIVE-14523) ACID performance improvement patches

2016-08-11 Thread Saket Saurabh (JIRA)


 [ 
https://issues.apache.org/jira/browse/HIVE-14523?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Saket Saurabh updated HIVE-14523:
-
Status: Open  (was: Patch Available)

> ACID performance improvement patches
> 
>
> Key: HIVE-14523
> URL: https://issues.apache.org/jira/browse/HIVE-14523
> Project: Hive
>  Issue Type: Test
>Affects Versions: 2.2.0
>Reporter: Saket Saurabh
>Assignee: Saket Saurabh
>Priority: Trivial
> Attachments: HIVE-14523.01.patch, HIVE-14523.02.patch
>
>
> This is a trivial non-functional JIRA that combines the features introduced 
> HIVE-14035, HIVE-14199 and HIVE-14233 into a single patch.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

[jira] [Updated] (HIVE-14523) ACID performance improvement patches

2016-08-11 Thread Saket Saurabh (JIRA)


 [ 
https://issues.apache.org/jira/browse/HIVE-14523?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Saket Saurabh updated HIVE-14523:
-
Status: Patch Available  (was: Open)

> ACID performance improvement patches
> 
>
> Key: HIVE-14523
> URL: https://issues.apache.org/jira/browse/HIVE-14523
> Project: Hive
>  Issue Type: Test
>Affects Versions: 2.2.0
>Reporter: Saket Saurabh
>Assignee: Saket Saurabh
>Priority: Trivial
> Attachments: HIVE-14523.01.patch, HIVE-14523.02.patch
>
>
> This is a trivial non-functional JIRA that combines the features introduced 
> HIVE-14035, HIVE-14199 and HIVE-14233 into a single patch.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

[jira] [Updated] (HIVE-14035) Enable predicate pushdown to delta files created by ACID Transactions

2016-08-11 Thread Saket Saurabh (JIRA)


 [ 
https://issues.apache.org/jira/browse/HIVE-14035?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Saket Saurabh updated HIVE-14035:
-
Status: Patch Available  (was: Open)

> Enable predicate pushdown to delta files created by ACID Transactions
> -
>
> Key: HIVE-14035
> URL: https://issues.apache.org/jira/browse/HIVE-14035
> Project: Hive
>  Issue Type: New Feature
>  Components: Transactions
>Reporter: Saket Saurabh
>Assignee: Saket Saurabh
> Attachments: HIVE-14035.02.patch, HIVE-14035.03.patch, 
> HIVE-14035.04.patch, HIVE-14035.05.patch, HIVE-14035.06.patch, 
> HIVE-14035.07.patch, HIVE-14035.08.patch, HIVE-14035.09.patch, 
> HIVE-14035.10.patch, HIVE-14035.11.patch, HIVE-14035.12.patch, 
> HIVE-14035.13.patch, HIVE-14035.14.patch, HIVE-14035.15.patch, 
> HIVE-14035.16.patch, HIVE-14035.17.patch, HIVE-14035.patch
>
>
> In current Hive version, delta files created by ACID transactions do not 
> allow predicate pushdown if they contain any update/delete events. This is 
> done to preserve correctness when following a multi-version approach during 
> event collapsing, where an update event overwrites an existing insert event. 
> This JIRA proposes to split an update event into a combination of a delete 
> event followed by a new insert event, that can enable predicate push down to 
> all delta files without breaking correctness. To support backward 
> compatibility for this feature, this JIRA also proposes to add some sort of 
> versioning to ACID that can allow different versions of ACID transactions to 
> co-exist together.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

[jira] [Updated] (HIVE-14523) ACID performance improvement patches

2016-08-11 Thread Saket Saurabh (JIRA)


 [ 
https://issues.apache.org/jira/browse/HIVE-14523?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Saket Saurabh updated HIVE-14523:
-
Status: Patch Available  (was: Open)

> ACID performance improvement patches
> 
>
> Key: HIVE-14523
> URL: https://issues.apache.org/jira/browse/HIVE-14523
> Project: Hive
>  Issue Type: Test
>Affects Versions: 2.2.0
>Reporter: Saket Saurabh
>Assignee: Saket Saurabh
>Priority: Trivial
> Attachments: HIVE-14523.01.patch, HIVE-14523.02.patch
>
>
> This is a trivial non-functional JIRA that combines the features introduced 
> HIVE-14035, HIVE-14199 and HIVE-14233 into a single patch.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

[jira] [Updated] (HIVE-14523) ACID performance improvement patches

2016-08-11 Thread Saket Saurabh (JIRA)


 [ 
https://issues.apache.org/jira/browse/HIVE-14523?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Saket Saurabh updated HIVE-14523:
-
Attachment: HIVE-14523.02.patch

Updated patch with fixes from HIVE-14233.

> ACID performance improvement patches
> 
>
> Key: HIVE-14523
> URL: https://issues.apache.org/jira/browse/HIVE-14523
> Project: Hive
>  Issue Type: Test
>Affects Versions: 2.2.0
>Reporter: Saket Saurabh
>Assignee: Saket Saurabh
>Priority: Trivial
> Attachments: HIVE-14035_14199_14233.01.patch, HIVE-14523.01.patch, 
> HIVE-14523.02.patch
>
>
> This is a trivial non-functional JIRA that combines the features introduced 
> HIVE-14035, HIVE-14199 and HIVE-14233 into a single patch.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

[jira] [Updated] (HIVE-14523) ACID performance improvement patches

2016-08-11 Thread Saket Saurabh (JIRA)


 [ 
https://issues.apache.org/jira/browse/HIVE-14523?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Saket Saurabh updated HIVE-14523:
-
Status: Open  (was: Patch Available)

> ACID performance improvement patches
> 
>
> Key: HIVE-14523
> URL: https://issues.apache.org/jira/browse/HIVE-14523
> Project: Hive
>  Issue Type: Test
>Affects Versions: 2.2.0
>Reporter: Saket Saurabh
>Assignee: Saket Saurabh
>Priority: Trivial
> Attachments: HIVE-14035_14199_14233.01.patch, HIVE-14523.01.patch, 
> HIVE-14523.02.patch
>
>
> This is a trivial non-functional JIRA that combines the features introduced 
> HIVE-14035, HIVE-14199 and HIVE-14233 into a single patch.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

[jira] [Updated] (HIVE-14233) Improve vectorization for ACID by eliminating row-by-row stitching

2016-08-11 Thread Saket Saurabh (JIRA)


 [ 
https://issues.apache.org/jira/browse/HIVE-14233?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Saket Saurabh updated HIVE-14233:
-
Attachment: HIVE-14233.07.patch

Fix a functional bug that was throwing up with ArrayOutOfBoundsExceptions as it 
was trying push SARGs for delete_deltas.

> Improve vectorization for ACID by eliminating row-by-row stitching
> --
>
> Key: HIVE-14233
> URL: https://issues.apache.org/jira/browse/HIVE-14233
> Project: Hive
>  Issue Type: New Feature
>  Components: Transactions, Vectorization
>Reporter: Saket Saurabh
>Assignee: Saket Saurabh
> Attachments: HIVE-14233.01.patch, HIVE-14233.02.patch, 
> HIVE-14233.03.patch, HIVE-14233.04.patch, HIVE-14233.05.patch, 
> HIVE-14233.06.patch, HIVE-14233.07.patch
>
>
> This JIRA proposes to improve vectorization for ACID by eliminating 
> row-by-row stitching when reading back ACID files. In the current 
> implementation, a vectorized row batch is created by populating the batch one 
> row at a time, before the vectorized batch is passed up along the operator 
> pipeline. This row-by-row stitching limitation was because of the fact that 
> the ACID insert/update/delete events from various delta files needed to be 
> merged together before the actual version of a given row was found out. 
> HIVE-14035 has enabled us to break away from that limitation by splitting 
> ACID update events into a combination of delete+insert. In fact, it has now 
> enabled us to create splits on delta files.
> Building on top of HIVE-14035, this JIRA proposes to solve this earlier 
> bottleneck in the vectorized code path for ACID by now directly reading row 
> batches from the underlying ORC files and avoiding any stitching altogether. 
> Once a row batch is read from the split (which may be on a base/delta file), 
> the deleted rows will be found by cross-referencing them against a data 
> structure that will just keep track of deleted events (found in the 
> deleted_delta files). This will lead to a large performance gain when reading 
> ACID files in vectorized fashion, while enabling further optimizations in 
> future that can be done on top of that.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

[jira] [Updated] (HIVE-14523) ACID performance improvement patches

2016-08-11 Thread Saket Saurabh (JIRA)


 [ 
https://issues.apache.org/jira/browse/HIVE-14523?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Saket Saurabh updated HIVE-14523:
-
Attachment: HIVE-14523.01.patch

Realized that the previous file naming convention doesn't trigger the Ptest.

> ACID performance improvement patches
> 
>
> Key: HIVE-14523
> URL: https://issues.apache.org/jira/browse/HIVE-14523
> Project: Hive
>  Issue Type: Test
>Affects Versions: 2.2.0
>Reporter: Saket Saurabh
>Assignee: Saket Saurabh
>Priority: Trivial
> Attachments: HIVE-14035_14199_14233.01.patch, HIVE-14523.01.patch
>
>
> This is a trivial non-functional JIRA that combines the features introduced 
> HIVE-14035, HIVE-14199 and HIVE-14233 into a single patch.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

[jira] [Updated] (HIVE-14523) ACID performance improvement patches

2016-08-11 Thread Saket Saurabh (JIRA)


 [ 
https://issues.apache.org/jira/browse/HIVE-14523?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Saket Saurabh updated HIVE-14523:
-
Status: Open  (was: Patch Available)

> ACID performance improvement patches
> 
>
> Key: HIVE-14523
> URL: https://issues.apache.org/jira/browse/HIVE-14523
> Project: Hive
>  Issue Type: Test
>Affects Versions: 2.2.0
>Reporter: Saket Saurabh
>Assignee: Saket Saurabh
>Priority: Trivial
> Attachments: HIVE-14035_14199_14233.01.patch, HIVE-14523.01.patch
>
>
> This is a trivial non-functional JIRA that combines the features introduced 
> HIVE-14035, HIVE-14199 and HIVE-14233 into a single patch.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

[jira] [Updated] (HIVE-14523) ACID performance improvement patches

2016-08-11 Thread Saket Saurabh (JIRA)


 [ 
https://issues.apache.org/jira/browse/HIVE-14523?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Saket Saurabh updated HIVE-14523:
-
Status: Patch Available  (was: Open)

> ACID performance improvement patches
> 
>
> Key: HIVE-14523
> URL: https://issues.apache.org/jira/browse/HIVE-14523
> Project: Hive
>  Issue Type: Test
>Affects Versions: 2.2.0
>Reporter: Saket Saurabh
>Assignee: Saket Saurabh
>Priority: Trivial
> Attachments: HIVE-14035_14199_14233.01.patch, HIVE-14523.01.patch
>
>
> This is a trivial non-functional JIRA that combines the features introduced 
> HIVE-14035, HIVE-14199 and HIVE-14233 into a single patch.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

[jira] [Updated] (HIVE-14523) ACID performance improvement patches

2016-08-11 Thread Saket Saurabh (JIRA)


 [ 
https://issues.apache.org/jira/browse/HIVE-14523?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Saket Saurabh updated HIVE-14523:
-
Status: Open  (was: Patch Available)

> ACID performance improvement patches
> 
>
> Key: HIVE-14523
> URL: https://issues.apache.org/jira/browse/HIVE-14523
> Project: Hive
>  Issue Type: Test
>Affects Versions: 2.2.0
>Reporter: Saket Saurabh
>Assignee: Saket Saurabh
>Priority: Trivial
> Attachments: HIVE-14035_14199_14233.01.patch
>
>
> This is a trivial non-functional JIRA that combines the features introduced 
> HIVE-14035, HIVE-14199 and HIVE-14233 into a single patch.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

[jira] [Updated] (HIVE-14523) ACID performance improvement patches

2016-08-11 Thread Saket Saurabh (JIRA)


 [ 
https://issues.apache.org/jira/browse/HIVE-14523?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Saket Saurabh updated HIVE-14523:
-
Status: Patch Available  (was: Open)

> ACID performance improvement patches
> 
>
> Key: HIVE-14523
> URL: https://issues.apache.org/jira/browse/HIVE-14523
> Project: Hive
>  Issue Type: Test
>Affects Versions: 2.2.0
>Reporter: Saket Saurabh
>Assignee: Saket Saurabh
>Priority: Trivial
> Attachments: HIVE-14035_14199_14233.01.patch
>
>
> This is a trivial non-functional JIRA that combines the features introduced 
> HIVE-14035, HIVE-14199 and HIVE-14233 into a single patch.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

[jira] [Updated] (HIVE-14523) ACID performance improvement patches

2016-08-11 Thread Saket Saurabh (JIRA)


 [ 
https://issues.apache.org/jira/browse/HIVE-14523?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Saket Saurabh updated HIVE-14523:
-
Attachment: HIVE-14035_14199_14233.01.patch

First version.

> ACID performance improvement patches
> 
>
> Key: HIVE-14523
> URL: https://issues.apache.org/jira/browse/HIVE-14523
> Project: Hive
>  Issue Type: Test
>Affects Versions: 2.2.0
>Reporter: Saket Saurabh
>Assignee: Saket Saurabh
>Priority: Trivial
> Attachments: HIVE-14035_14199_14233.01.patch
>
>
> This is a trivial non-functional JIRA that combines the features introduced 
> HIVE-14035, HIVE-14199 and HIVE-14233 into a single patch.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

[jira] [Updated] (HIVE-14035) Enable predicate pushdown to delta files created by ACID Transactions

2016-08-11 Thread Saket Saurabh (JIRA)


 [ 
https://issues.apache.org/jira/browse/HIVE-14035?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Saket Saurabh updated HIVE-14035:
-
Attachment: HIVE-14035.16.patch

Rebase with master and fix the two failing UTs due to a trivial case mismatch 
error.

> Enable predicate pushdown to delta files created by ACID Transactions
> -
>
> Key: HIVE-14035
> URL: https://issues.apache.org/jira/browse/HIVE-14035
> Project: Hive
>  Issue Type: New Feature
>  Components: Transactions
>Reporter: Saket Saurabh
>Assignee: Saket Saurabh
> Attachments: HIVE-14035.02.patch, HIVE-14035.03.patch, 
> HIVE-14035.04.patch, HIVE-14035.05.patch, HIVE-14035.06.patch, 
> HIVE-14035.07.patch, HIVE-14035.08.patch, HIVE-14035.09.patch, 
> HIVE-14035.10.patch, HIVE-14035.11.patch, HIVE-14035.12.patch, 
> HIVE-14035.13.patch, HIVE-14035.14.patch, HIVE-14035.15.patch, 
> HIVE-14035.16.patch, HIVE-14035.patch
>
>
> In current Hive version, delta files created by ACID transactions do not 
> allow predicate pushdown if they contain any update/delete events. This is 
> done to preserve correctness when following a multi-version approach during 
> event collapsing, where an update event overwrites an existing insert event. 
> This JIRA proposes to split an update event into a combination of a delete 
> event followed by a new insert event, that can enable predicate push down to 
> all delta files without breaking correctness. To support backward 
> compatibility for this feature, this JIRA also proposes to add some sort of 
> versioning to ACID that can allow different versions of ACID transactions to 
> co-exist together.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

[jira] [Updated] (HIVE-14035) Enable predicate pushdown to delta files created by ACID Transactions

2016-08-11 Thread Saket Saurabh (JIRA)


 [ 
https://issues.apache.org/jira/browse/HIVE-14035?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Saket Saurabh updated HIVE-14035:
-
Status: Patch Available  (was: Open)

> Enable predicate pushdown to delta files created by ACID Transactions
> -
>
> Key: HIVE-14035
> URL: https://issues.apache.org/jira/browse/HIVE-14035
> Project: Hive
>  Issue Type: New Feature
>  Components: Transactions
>Reporter: Saket Saurabh
>Assignee: Saket Saurabh
> Attachments: HIVE-14035.02.patch, HIVE-14035.03.patch, 
> HIVE-14035.04.patch, HIVE-14035.05.patch, HIVE-14035.06.patch, 
> HIVE-14035.07.patch, HIVE-14035.08.patch, HIVE-14035.09.patch, 
> HIVE-14035.10.patch, HIVE-14035.11.patch, HIVE-14035.12.patch, 
> HIVE-14035.13.patch, HIVE-14035.14.patch, HIVE-14035.15.patch, 
> HIVE-14035.16.patch, HIVE-14035.patch
>
>
> In current Hive version, delta files created by ACID transactions do not 
> allow predicate pushdown if they contain any update/delete events. This is 
> done to preserve correctness when following a multi-version approach during 
> event collapsing, where an update event overwrites an existing insert event. 
> This JIRA proposes to split an update event into a combination of a delete 
> event followed by a new insert event, that can enable predicate push down to 
> all delta files without breaking correctness. To support backward 
> compatibility for this feature, this JIRA also proposes to add some sort of 
> versioning to ACID that can allow different versions of ACID transactions to 
> co-exist together.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

[jira] [Updated] (HIVE-14035) Enable predicate pushdown to delta files created by ACID Transactions

2016-08-11 Thread Saket Saurabh (JIRA)


 [ 
https://issues.apache.org/jira/browse/HIVE-14035?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Saket Saurabh updated HIVE-14035:
-
Status: Open  (was: Patch Available)

> Enable predicate pushdown to delta files created by ACID Transactions
> -
>
> Key: HIVE-14035
> URL: https://issues.apache.org/jira/browse/HIVE-14035
> Project: Hive
>  Issue Type: New Feature
>  Components: Transactions
>Reporter: Saket Saurabh
>Assignee: Saket Saurabh
> Attachments: HIVE-14035.02.patch, HIVE-14035.03.patch, 
> HIVE-14035.04.patch, HIVE-14035.05.patch, HIVE-14035.06.patch, 
> HIVE-14035.07.patch, HIVE-14035.08.patch, HIVE-14035.09.patch, 
> HIVE-14035.10.patch, HIVE-14035.11.patch, HIVE-14035.12.patch, 
> HIVE-14035.13.patch, HIVE-14035.14.patch, HIVE-14035.15.patch, 
> HIVE-14035.patch
>
>
> In current Hive version, delta files created by ACID transactions do not 
> allow predicate pushdown if they contain any update/delete events. This is 
> done to preserve correctness when following a multi-version approach during 
> event collapsing, where an update event overwrites an existing insert event. 
> This JIRA proposes to split an update event into a combination of a delete 
> event followed by a new insert event, that can enable predicate push down to 
> all delta files without breaking correctness. To support backward 
> compatibility for this feature, this JIRA also proposes to add some sort of 
> versioning to ACID that can allow different versions of ACID transactions to 
> co-exist together.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

[jira] [Updated] (HIVE-14035) Enable predicate pushdown to delta files created by ACID Transactions

2016-08-11 Thread Saket Saurabh (JIRA)


 [ 
https://issues.apache.org/jira/browse/HIVE-14035?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Saket Saurabh updated HIVE-14035:
-
Status: Open  (was: Patch Available)

> Enable predicate pushdown to delta files created by ACID Transactions
> -
>
> Key: HIVE-14035
> URL: https://issues.apache.org/jira/browse/HIVE-14035
> Project: Hive
>  Issue Type: New Feature
>  Components: Transactions
>Reporter: Saket Saurabh
>Assignee: Saket Saurabh
> Attachments: HIVE-14035.02.patch, HIVE-14035.03.patch, 
> HIVE-14035.04.patch, HIVE-14035.05.patch, HIVE-14035.06.patch, 
> HIVE-14035.07.patch, HIVE-14035.08.patch, HIVE-14035.09.patch, 
> HIVE-14035.10.patch, HIVE-14035.11.patch, HIVE-14035.12.patch, 
> HIVE-14035.13.patch, HIVE-14035.14.patch, HIVE-14035.15.patch, 
> HIVE-14035.patch
>
>
> In current Hive version, delta files created by ACID transactions do not 
> allow predicate pushdown if they contain any update/delete events. This is 
> done to preserve correctness when following a multi-version approach during 
> event collapsing, where an update event overwrites an existing insert event. 
> This JIRA proposes to split an update event into a combination of a delete 
> event followed by a new insert event, that can enable predicate push down to 
> all delta files without breaking correctness. To support backward 
> compatibility for this feature, this JIRA also proposes to add some sort of 
> versioning to ACID that can allow different versions of ACID transactions to 
> co-exist together.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

[jira] [Updated] (HIVE-14035) Enable predicate pushdown to delta files created by ACID Transactions

2016-08-11 Thread Saket Saurabh (JIRA)


 [ 
https://issues.apache.org/jira/browse/HIVE-14035?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Saket Saurabh updated HIVE-14035:
-
Status: Patch Available  (was: Open)

> Enable predicate pushdown to delta files created by ACID Transactions
> -
>
> Key: HIVE-14035
> URL: https://issues.apache.org/jira/browse/HIVE-14035
> Project: Hive
>  Issue Type: New Feature
>  Components: Transactions
>Reporter: Saket Saurabh
>Assignee: Saket Saurabh
> Attachments: HIVE-14035.02.patch, HIVE-14035.03.patch, 
> HIVE-14035.04.patch, HIVE-14035.05.patch, HIVE-14035.06.patch, 
> HIVE-14035.07.patch, HIVE-14035.08.patch, HIVE-14035.09.patch, 
> HIVE-14035.10.patch, HIVE-14035.11.patch, HIVE-14035.12.patch, 
> HIVE-14035.13.patch, HIVE-14035.14.patch, HIVE-14035.15.patch, 
> HIVE-14035.patch
>
>
> In current Hive version, delta files created by ACID transactions do not 
> allow predicate pushdown if they contain any update/delete events. This is 
> done to preserve correctness when following a multi-version approach during 
> event collapsing, where an update event overwrites an existing insert event. 
> This JIRA proposes to split an update event into a combination of a delete 
> event followed by a new insert event, that can enable predicate push down to 
> all delta files without breaking correctness. To support backward 
> compatibility for this feature, this JIRA also proposes to add some sort of 
> versioning to ACID that can allow different versions of ACID transactions to 
> co-exist together.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

[jira] [Updated] (HIVE-14035) Enable predicate pushdown to delta files created by ACID Transactions

2016-08-11 Thread Saket Saurabh (JIRA)


 [ 
https://issues.apache.org/jira/browse/HIVE-14035?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Saket Saurabh updated HIVE-14035:
-
Status: Patch Available  (was: Open)

> Enable predicate pushdown to delta files created by ACID Transactions
> -
>
> Key: HIVE-14035
> URL: https://issues.apache.org/jira/browse/HIVE-14035
> Project: Hive
>  Issue Type: New Feature
>  Components: Transactions
>Reporter: Saket Saurabh
>Assignee: Saket Saurabh
> Attachments: HIVE-14035.02.patch, HIVE-14035.03.patch, 
> HIVE-14035.04.patch, HIVE-14035.05.patch, HIVE-14035.06.patch, 
> HIVE-14035.07.patch, HIVE-14035.08.patch, HIVE-14035.09.patch, 
> HIVE-14035.10.patch, HIVE-14035.11.patch, HIVE-14035.12.patch, 
> HIVE-14035.13.patch, HIVE-14035.14.patch, HIVE-14035.15.patch, 
> HIVE-14035.patch
>
>
> In current Hive version, delta files created by ACID transactions do not 
> allow predicate pushdown if they contain any update/delete events. This is 
> done to preserve correctness when following a multi-version approach during 
> event collapsing, where an update event overwrites an existing insert event. 
> This JIRA proposes to split an update event into a combination of a delete 
> event followed by a new insert event, that can enable predicate push down to 
> all delta files without breaking correctness. To support backward 
> compatibility for this feature, this JIRA also proposes to add some sort of 
> versioning to ACID that can allow different versions of ACID transactions to 
> co-exist together.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

[jira] [Updated] (HIVE-14035) Enable predicate pushdown to delta files created by ACID Transactions

2016-08-11 Thread Saket Saurabh (JIRA)


 [ 
https://issues.apache.org/jira/browse/HIVE-14035?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Saket Saurabh updated HIVE-14035:
-
Attachment: HIVE-14035.15.patch

Same as patch #14. Re-submit to make the Ptest run again- previous Jenkins 
Ptest build job failed to initialize with some unrelated checksum error.

> Enable predicate pushdown to delta files created by ACID Transactions
> -
>
> Key: HIVE-14035
> URL: https://issues.apache.org/jira/browse/HIVE-14035
> Project: Hive
>  Issue Type: New Feature
>  Components: Transactions
>Reporter: Saket Saurabh
>Assignee: Saket Saurabh
> Attachments: HIVE-14035.02.patch, HIVE-14035.03.patch, 
> HIVE-14035.04.patch, HIVE-14035.05.patch, HIVE-14035.06.patch, 
> HIVE-14035.07.patch, HIVE-14035.08.patch, HIVE-14035.09.patch, 
> HIVE-14035.10.patch, HIVE-14035.11.patch, HIVE-14035.12.patch, 
> HIVE-14035.13.patch, HIVE-14035.14.patch, HIVE-14035.15.patch, 
> HIVE-14035.patch
>
>
> In current Hive version, delta files created by ACID transactions do not 
> allow predicate pushdown if they contain any update/delete events. This is 
> done to preserve correctness when following a multi-version approach during 
> event collapsing, where an update event overwrites an existing insert event. 
> This JIRA proposes to split an update event into a combination of a delete 
> event followed by a new insert event, that can enable predicate push down to 
> all delta files without breaking correctness. To support backward 
> compatibility for this feature, this JIRA also proposes to add some sort of 
> versioning to ACID that can allow different versions of ACID transactions to 
> co-exist together.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

[jira] [Updated] (HIVE-14035) Enable predicate pushdown to delta files created by ACID Transactions

2016-08-11 Thread Saket Saurabh (JIRA)


 [ 
https://issues.apache.org/jira/browse/HIVE-14035?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Saket Saurabh updated HIVE-14035:
-
Status: Open  (was: Patch Available)

> Enable predicate pushdown to delta files created by ACID Transactions
> -
>
> Key: HIVE-14035
> URL: https://issues.apache.org/jira/browse/HIVE-14035
> Project: Hive
>  Issue Type: New Feature
>  Components: Transactions
>Reporter: Saket Saurabh
>Assignee: Saket Saurabh
> Attachments: HIVE-14035.02.patch, HIVE-14035.03.patch, 
> HIVE-14035.04.patch, HIVE-14035.05.patch, HIVE-14035.06.patch, 
> HIVE-14035.07.patch, HIVE-14035.08.patch, HIVE-14035.09.patch, 
> HIVE-14035.10.patch, HIVE-14035.11.patch, HIVE-14035.12.patch, 
> HIVE-14035.13.patch, HIVE-14035.14.patch, HIVE-14035.patch
>
>
> In current Hive version, delta files created by ACID transactions do not 
> allow predicate pushdown if they contain any update/delete events. This is 
> done to preserve correctness when following a multi-version approach during 
> event collapsing, where an update event overwrites an existing insert event. 
> This JIRA proposes to split an update event into a combination of a delete 
> event followed by a new insert event, that can enable predicate push down to 
> all delta files without breaking correctness. To support backward 
> compatibility for this feature, this JIRA also proposes to add some sort of 
> versioning to ACID that can allow different versions of ACID transactions to 
> co-exist together.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

[jira] [Updated] (HIVE-14035) Enable predicate pushdown to delta files created by ACID Transactions

2016-08-11 Thread Saket Saurabh (JIRA)


 [ 
https://issues.apache.org/jira/browse/HIVE-14035?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Saket Saurabh updated HIVE-14035:
-
Status: Open  (was: Patch Available)

> Enable predicate pushdown to delta files created by ACID Transactions
> -
>
> Key: HIVE-14035
> URL: https://issues.apache.org/jira/browse/HIVE-14035
> Project: Hive
>  Issue Type: New Feature
>  Components: Transactions
>Reporter: Saket Saurabh
>Assignee: Saket Saurabh
> Attachments: HIVE-14035.02.patch, HIVE-14035.03.patch, 
> HIVE-14035.04.patch, HIVE-14035.05.patch, HIVE-14035.06.patch, 
> HIVE-14035.07.patch, HIVE-14035.08.patch, HIVE-14035.09.patch, 
> HIVE-14035.10.patch, HIVE-14035.11.patch, HIVE-14035.12.patch, 
> HIVE-14035.13.patch, HIVE-14035.14.patch, HIVE-14035.patch
>
>
> In current Hive version, delta files created by ACID transactions do not 
> allow predicate pushdown if they contain any update/delete events. This is 
> done to preserve correctness when following a multi-version approach during 
> event collapsing, where an update event overwrites an existing insert event. 
> This JIRA proposes to split an update event into a combination of a delete 
> event followed by a new insert event, that can enable predicate push down to 
> all delta files without breaking correctness. To support backward 
> compatibility for this feature, this JIRA also proposes to add some sort of 
> versioning to ACID that can allow different versions of ACID transactions to 
> co-exist together.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

[jira] [Updated] (HIVE-14035) Enable predicate pushdown to delta files created by ACID Transactions

2016-08-11 Thread Saket Saurabh (JIRA)


 [ 
https://issues.apache.org/jira/browse/HIVE-14035?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Saket Saurabh updated HIVE-14035:
-
Status: Patch Available  (was: Open)

> Enable predicate pushdown to delta files created by ACID Transactions
> -
>
> Key: HIVE-14035
> URL: https://issues.apache.org/jira/browse/HIVE-14035
> Project: Hive
>  Issue Type: New Feature
>  Components: Transactions
>Reporter: Saket Saurabh
>Assignee: Saket Saurabh
> Attachments: HIVE-14035.02.patch, HIVE-14035.03.patch, 
> HIVE-14035.04.patch, HIVE-14035.05.patch, HIVE-14035.06.patch, 
> HIVE-14035.07.patch, HIVE-14035.08.patch, HIVE-14035.09.patch, 
> HIVE-14035.10.patch, HIVE-14035.11.patch, HIVE-14035.12.patch, 
> HIVE-14035.13.patch, HIVE-14035.14.patch, HIVE-14035.patch
>
>
> In current Hive version, delta files created by ACID transactions do not 
> allow predicate pushdown if they contain any update/delete events. This is 
> done to preserve correctness when following a multi-version approach during 
> event collapsing, where an update event overwrites an existing insert event. 
> This JIRA proposes to split an update event into a combination of a delete 
> event followed by a new insert event, that can enable predicate push down to 
> all delta files without breaking correctness. To support backward 
> compatibility for this feature, this JIRA also proposes to add some sort of 
> versioning to ACID that can allow different versions of ACID transactions to 
> co-exist together.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

[jira] [Updated] (HIVE-14035) Enable predicate pushdown to delta files created by ACID Transactions

2016-08-11 Thread Saket Saurabh (JIRA)


 [ 
https://issues.apache.org/jira/browse/HIVE-14035?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Saket Saurabh updated HIVE-14035:
-
Status: Patch Available  (was: Open)

> Enable predicate pushdown to delta files created by ACID Transactions
> -
>
> Key: HIVE-14035
> URL: https://issues.apache.org/jira/browse/HIVE-14035
> Project: Hive
>  Issue Type: New Feature
>  Components: Transactions
>Reporter: Saket Saurabh
>Assignee: Saket Saurabh
> Attachments: HIVE-14035.02.patch, HIVE-14035.03.patch, 
> HIVE-14035.04.patch, HIVE-14035.05.patch, HIVE-14035.06.patch, 
> HIVE-14035.07.patch, HIVE-14035.08.patch, HIVE-14035.09.patch, 
> HIVE-14035.10.patch, HIVE-14035.11.patch, HIVE-14035.12.patch, 
> HIVE-14035.13.patch, HIVE-14035.14.patch, HIVE-14035.patch
>
>
> In current Hive version, delta files created by ACID transactions do not 
> allow predicate pushdown if they contain any update/delete events. This is 
> done to preserve correctness when following a multi-version approach during 
> event collapsing, where an update event overwrites an existing insert event. 
> This JIRA proposes to split an update event into a combination of a delete 
> event followed by a new insert event, that can enable predicate push down to 
> all delta files without breaking correctness. To support backward 
> compatibility for this feature, this JIRA also proposes to add some sort of 
> versioning to ACID that can allow different versions of ACID transactions to 
> co-exist together.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

[jira] [Updated] (HIVE-14035) Enable predicate pushdown to delta files created by ACID Transactions

2016-08-11 Thread Saket Saurabh (JIRA)


 [ 
https://issues.apache.org/jira/browse/HIVE-14035?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Saket Saurabh updated HIVE-14035:
-
Attachment: HIVE-14035.14.patch

Patch #14 significantly refactors the way split strategies are chosen for ACID 
split-update case and now correctly sets the isOriginal flag on a per split 
basis. When split-update is enabled, a split on base file can be of three 
types: split on an original_base, split on an compacted_base, & split on an 
insert_delta. It is possible that we might end up with a set of OrcSplits that 
splits both original and insert_delta in same job. In such cases, it is very 
important that we set the isOriginal flag correctly, otherwise it will mess up 
the way split strategies are used to instantiate a number of things. This patch 
takes care of that.
Additionally, the patch now also optimizes for the case when we had to process 
uncovered buckets when the split had no base (possible previously when we had 
only deltas). Now when split-update is enabled, every split will have a base, 
because there is no point of having a split that is supposed to just read the 
delete_deltas. (Minor compaction is not a concern here because minor compaction 
always creates a single split and has a separate logic of doing that, and that 
has not been modified.) 
Tests for all these changes are added to TestInputOutputFormat for various 
scenarios. Also addresses comments at RB. 

> Enable predicate pushdown to delta files created by ACID Transactions
> -
>
> Key: HIVE-14035
> URL: https://issues.apache.org/jira/browse/HIVE-14035
> Project: Hive
>  Issue Type: New Feature
>  Components: Transactions
>Reporter: Saket Saurabh
>Assignee: Saket Saurabh
> Attachments: HIVE-14035.02.patch, HIVE-14035.03.patch, 
> HIVE-14035.04.patch, HIVE-14035.05.patch, HIVE-14035.06.patch, 
> HIVE-14035.07.patch, HIVE-14035.08.patch, HIVE-14035.09.patch, 
> HIVE-14035.10.patch, HIVE-14035.11.patch, HIVE-14035.12.patch, 
> HIVE-14035.13.patch, HIVE-14035.14.patch, HIVE-14035.patch
>
>
> In current Hive version, delta files created by ACID transactions do not 
> allow predicate pushdown if they contain any update/delete events. This is 
> done to preserve correctness when following a multi-version approach during 
> event collapsing, where an update event overwrites an existing insert event. 
> This JIRA proposes to split an update event into a combination of a delete 
> event followed by a new insert event, that can enable predicate push down to 
> all delta files without breaking correctness. To support backward 
> compatibility for this feature, this JIRA also proposes to add some sort of 
> versioning to ACID that can allow different versions of ACID transactions to 
> co-exist together.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

[jira] [Updated] (HIVE-14035) Enable predicate pushdown to delta files created by ACID Transactions

2016-08-11 Thread Saket Saurabh (JIRA)


 [ 
https://issues.apache.org/jira/browse/HIVE-14035?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Saket Saurabh updated HIVE-14035:
-
Status: Open  (was: Patch Available)

> Enable predicate pushdown to delta files created by ACID Transactions
> -
>
> Key: HIVE-14035
> URL: https://issues.apache.org/jira/browse/HIVE-14035
> Project: Hive
>  Issue Type: New Feature
>  Components: Transactions
>Reporter: Saket Saurabh
>Assignee: Saket Saurabh
> Attachments: HIVE-14035.02.patch, HIVE-14035.03.patch, 
> HIVE-14035.04.patch, HIVE-14035.05.patch, HIVE-14035.06.patch, 
> HIVE-14035.07.patch, HIVE-14035.08.patch, HIVE-14035.09.patch, 
> HIVE-14035.10.patch, HIVE-14035.11.patch, HIVE-14035.12.patch, 
> HIVE-14035.13.patch, HIVE-14035.patch
>
>
> In current Hive version, delta files created by ACID transactions do not 
> allow predicate pushdown if they contain any update/delete events. This is 
> done to preserve correctness when following a multi-version approach during 
> event collapsing, where an update event overwrites an existing insert event. 
> This JIRA proposes to split an update event into a combination of a delete 
> event followed by a new insert event, that can enable predicate push down to 
> all delta files without breaking correctness. To support backward 
> compatibility for this feature, this JIRA also proposes to add some sort of 
> versioning to ACID that can allow different versions of ACID transactions to 
> co-exist together.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

[jira] [Commented] (HIVE-14035) Enable predicate pushdown to delta files created by ACID Transactions

2016-08-09 Thread Saket Saurabh (JIRA)


[ 
https://issues.apache.org/jira/browse/HIVE-14035?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15414383#comment-15414383
 ] 

Saket Saurabh commented on HIVE-14035:
--

[~sershe] Thanks for the comments on RB. I am working on fixing those. No, the 
last run for patch 13 did not have split-update enabled by default. There are 
many tests that assert on number of files and directory layout that would 
anyway fail in PTest if we run those tests w/o modification. However, excluding 
those assert failures, when I ran these locally, the only other failures were 
NegativeArrayIndexException & IndexOutOfBoundException caused by HIVE-14448 and 
not related to this patch. However, I have created a subclass TestTxnCommands3 
that should ideally mimic this behavior with split-update enabled by default 
for a large number of ACID scenarios.

> Enable predicate pushdown to delta files created by ACID Transactions
> -
>
> Key: HIVE-14035
> URL: https://issues.apache.org/jira/browse/HIVE-14035
> Project: Hive
>  Issue Type: New Feature
>  Components: Transactions
>Reporter: Saket Saurabh
>Assignee: Saket Saurabh
> Attachments: HIVE-14035.02.patch, HIVE-14035.03.patch, 
> HIVE-14035.04.patch, HIVE-14035.05.patch, HIVE-14035.06.patch, 
> HIVE-14035.07.patch, HIVE-14035.08.patch, HIVE-14035.09.patch, 
> HIVE-14035.10.patch, HIVE-14035.11.patch, HIVE-14035.12.patch, 
> HIVE-14035.13.patch, HIVE-14035.patch
>
>
> In current Hive version, delta files created by ACID transactions do not 
> allow predicate pushdown if they contain any update/delete events. This is 
> done to preserve correctness when following a multi-version approach during 
> event collapsing, where an update event overwrites an existing insert event. 
> This JIRA proposes to split an update event into a combination of a delete 
> event followed by a new insert event, that can enable predicate push down to 
> all delta files without breaking correctness. To support backward 
> compatibility for this feature, this JIRA also proposes to add some sort of 
> versioning to ACID that can allow different versions of ACID transactions to 
> co-exist together.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

[jira] [Commented] (HIVE-14233) Improve vectorization for ACID by eliminating row-by-row stitching

2016-08-09 Thread Saket Saurabh (JIRA)


[ 
https://issues.apache.org/jira/browse/HIVE-14233?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15414224#comment-15414224
 ] 

Saket Saurabh commented on HIVE-14233:
--

Thanks [~sershe] for pointing that out. Have attached the link to review board 
for this JIRA. https://reviews.apache.org/r/50934/

> Improve vectorization for ACID by eliminating row-by-row stitching
> --
>
> Key: HIVE-14233
> URL: https://issues.apache.org/jira/browse/HIVE-14233
> Project: Hive
>  Issue Type: New Feature
>  Components: Transactions, Vectorization
>Reporter: Saket Saurabh
>Assignee: Saket Saurabh
> Attachments: HIVE-14233.01.patch, HIVE-14233.02.patch, 
> HIVE-14233.03.patch, HIVE-14233.04.patch, HIVE-14233.05.patch, 
> HIVE-14233.06.patch
>
>
> This JIRA proposes to improve vectorization for ACID by eliminating 
> row-by-row stitching when reading back ACID files. In the current 
> implementation, a vectorized row batch is created by populating the batch one 
> row at a time, before the vectorized batch is passed up along the operator 
> pipeline. This row-by-row stitching limitation was because of the fact that 
> the ACID insert/update/delete events from various delta files needed to be 
> merged together before the actual version of a given row was found out. 
> HIVE-14035 has enabled us to break away from that limitation by splitting 
> ACID update events into a combination of delete+insert. In fact, it has now 
> enabled us to create splits on delta files.
> Building on top of HIVE-14035, this JIRA proposes to solve this earlier 
> bottleneck in the vectorized code path for ACID by now directly reading row 
> batches from the underlying ORC files and avoiding any stitching altogether. 
> Once a row batch is read from the split (which may be on a base/delta file), 
> the deleted rows will be found by cross-referencing them against a data 
> structure that will just keep track of deleted events (found in the 
> deleted_delta files). This will lead to a large performance gain when reading 
> ACID files in vectorized fashion, while enabling further optimizations in 
> future that can be done on top of that.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

[jira] [Updated] (HIVE-14233) Improve vectorization for ACID by eliminating row-by-row stitching

2016-08-09 Thread Saket Saurabh (JIRA)


 [ 
https://issues.apache.org/jira/browse/HIVE-14233?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Saket Saurabh updated HIVE-14233:
-
Attachment: HIVE-14233.06.patch

This patch disallows VectorizedRowBatchReader creation on original files.

> Improve vectorization for ACID by eliminating row-by-row stitching
> --
>
> Key: HIVE-14233
> URL: https://issues.apache.org/jira/browse/HIVE-14233
> Project: Hive
>  Issue Type: New Feature
>  Components: Transactions, Vectorization
>Reporter: Saket Saurabh
>Assignee: Saket Saurabh
> Attachments: HIVE-14233.01.patch, HIVE-14233.02.patch, 
> HIVE-14233.03.patch, HIVE-14233.04.patch, HIVE-14233.05.patch, 
> HIVE-14233.06.patch
>
>
> This JIRA proposes to improve vectorization for ACID by eliminating 
> row-by-row stitching when reading back ACID files. In the current 
> implementation, a vectorized row batch is created by populating the batch one 
> row at a time, before the vectorized batch is passed up along the operator 
> pipeline. This row-by-row stitching limitation was because of the fact that 
> the ACID insert/update/delete events from various delta files needed to be 
> merged together before the actual version of a given row was found out. 
> HIVE-14035 has enabled us to break away from that limitation by splitting 
> ACID update events into a combination of delete+insert. In fact, it has now 
> enabled us to create splits on delta files.
> Building on top of HIVE-14035, this JIRA proposes to solve this earlier 
> bottleneck in the vectorized code path for ACID by now directly reading row 
> batches from the underlying ORC files and avoiding any stitching altogether. 
> Once a row batch is read from the split (which may be on a base/delta file), 
> the deleted rows will be found by cross-referencing them against a data 
> structure that will just keep track of deleted events (found in the 
> deleted_delta files). This will lead to a large performance gain when reading 
> ACID files in vectorized fashion, while enabling further optimizations in 
> future that can be done on top of that.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

[jira] [Commented] (HIVE-14448) Queries with predicate fail when ETL split strategy is chosen for ACID tables

2016-08-09 Thread Saket Saurabh (JIRA)


[ 
https://issues.apache.org/jira/browse/HIVE-14448?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15414106#comment-15414106
 ] 

Saket Saurabh commented on HIVE-14448:
--

While I was investigating test failures for some other scenarios, I realized 
that schema evolution also breaks when ETL strategy is chosen, which I believe 
might be related to this JIRA.  I think it would be good to add another test 
that evolves the schema, as [~prasanth_j] suggested.

Even with the current patch, the following schema evolution test still fails 
with IndexOutOfBoundsException:

{code:title=ql/src/test/org/apache/hadoop/hive/ql/TestTxnCommands2.java|borderStyle=solid}
@Test
  public void testAcidWithSchemaEvolution() throws Exception {
hiveConf.setVar(HiveConf.ConfVars.HIVE_ORC_SPLIT_STRATEGY, "ETL");
String tblName = "acidTblWithSchemaEvol";
runStatementOnDriver("drop table if exists " + tblName);
runStatementOnDriver("CREATE TABLE " + tblName + "(a INT, b STRING) " +
  " CLUSTERED BY(a) INTO 2 BUCKETS" + //currently ACID requires table to be 
bucketed
  " STORED AS ORC TBLPROPERTIES ('transactional'='true')");

runStatementOnDriver("INSERT INTO " + tblName + " VALUES (1, 'foo'), (2, 
'bar')");

// Major compact to create a base that has ACID schema.
runStatementOnDriver("ALTER TABLE " + tblName + " COMPACT 'MAJOR'");
runWorker(hiveConf);

// Alter table for perform schema evolution.
runStatementOnDriver("ALTER TABLE " + tblName + " ADD COLUMNS(c int)");

// Validate there is an added NULL for column c.
List rs = runStatementOnDriver("SELECT * FROM " + tblName + " ORDER 
BY a");
String[] expectedResult = { "1\tfoo\tNULL", "2\tbar\tNULL" };
Assert.assertEquals(Arrays.asList(expectedResult), rs);
  }
{code}

Here is the back-trace for the failed test:
{code}
exec.Task: Job Submission failed with exception 'java.lang.RuntimeException(ORC 
split generation failed with exception: java.lang.IndexOutOfBoundsException: 
Index: 9, Size: 9)'
java.lang.RuntimeException: ORC split generation failed with exception: 
java.lang.IndexOutOfBoundsException: Index: 9, Size: 9
at 
org.apache.hadoop.hive.ql.io.orc.OrcInputFormat.generateSplitsInfo(OrcInputFormat.java:1576)
at 
org.apache.hadoop.hive.ql.io.orc.OrcInputFormat.getSplits(OrcInputFormat.java:1662)
at 
org.apache.hadoop.hive.ql.io.HiveInputFormat.addSplitsForGroup(HiveInputFormat.java:370)
at 
org.apache.hadoop.hive.ql.io.HiveInputFormat.getSplits(HiveInputFormat.java:488)
at 
org.apache.hadoop.mapreduce.JobSubmitter.writeOldSplits(JobSubmitter.java:329)
at 
org.apache.hadoop.mapreduce.JobSubmitter.writeSplits(JobSubmitter.java:321)
at 
org.apache.hadoop.mapreduce.JobSubmitter.submitJobInternal(JobSubmitter.java:197)
at org.apache.hadoop.mapreduce.Job$10.run(Job.java:1297)
at org.apache.hadoop.mapreduce.Job$10.run(Job.java:1294)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:422)
at 
org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1656)
at org.apache.hadoop.mapreduce.Job.submit(Job.java:1294)
at org.apache.hadoop.mapred.JobClient$1.run(JobClient.java:562)
at org.apache.hadoop.mapred.JobClient$1.run(JobClient.java:557)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:422)
at 
org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1656)
at 
org.apache.hadoop.mapred.JobClient.submitJobInternal(JobClient.java:557)
at org.apache.hadoop.mapred.JobClient.submitJob(JobClient.java:548)
at 
org.apache.hadoop.hive.ql.exec.mr.ExecDriver.execute(ExecDriver.java:417)
at 
org.apache.hadoop.hive.ql.exec.mr.MapRedTask.execute(MapRedTask.java:141)
at org.apache.hadoop.hive.ql.exec.Task.executeTask(Task.java:197)
at 
org.apache.hadoop.hive.ql.exec.TaskRunner.runSequential(TaskRunner.java:100)
at org.apache.hadoop.hive.ql.Driver.launchTask(Driver.java:1983)
at org.apache.hadoop.hive.ql.Driver.execute(Driver.java:1674)
at org.apache.hadoop.hive.ql.Driver.runInternal(Driver.java:1410)
at org.apache.hadoop.hive.ql.Driver.run(Driver.java:1134)
at org.apache.hadoop.hive.ql.Driver.run(Driver.java:1122)
at 
org.apache.hadoop.hive.ql.TestTxnCommands2.runStatementOnDriver(TestTxnCommands2.java:1315)
at 
org.apache.hadoop.hive.ql.TestTxnCommands2.testAcidWithSchemaEvolution(TestTxnCommands2.java:177)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at 
sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at 
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorIm

[jira] [Updated] (HIVE-14035) Enable predicate pushdown to delta files created by ACID Transactions

2016-08-08 Thread Saket Saurabh (JIRA)


 [ 
https://issues.apache.org/jira/browse/HIVE-14035?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Saket Saurabh updated HIVE-14035:
-
Status: Patch Available  (was: Open)

> Enable predicate pushdown to delta files created by ACID Transactions
> -
>
> Key: HIVE-14035
> URL: https://issues.apache.org/jira/browse/HIVE-14035
> Project: Hive
>  Issue Type: New Feature
>  Components: Transactions
>Reporter: Saket Saurabh
>Assignee: Saket Saurabh
> Attachments: HIVE-14035.02.patch, HIVE-14035.03.patch, 
> HIVE-14035.04.patch, HIVE-14035.05.patch, HIVE-14035.06.patch, 
> HIVE-14035.07.patch, HIVE-14035.08.patch, HIVE-14035.09.patch, 
> HIVE-14035.10.patch, HIVE-14035.11.patch, HIVE-14035.12.patch, 
> HIVE-14035.13.patch, HIVE-14035.patch
>
>
> In current Hive version, delta files created by ACID transactions do not 
> allow predicate pushdown if they contain any update/delete events. This is 
> done to preserve correctness when following a multi-version approach during 
> event collapsing, where an update event overwrites an existing insert event. 
> This JIRA proposes to split an update event into a combination of a delete 
> event followed by a new insert event, that can enable predicate push down to 
> all delta files without breaking correctness. To support backward 
> compatibility for this feature, this JIRA also proposes to add some sort of 
> versioning to ACID that can allow different versions of ACID transactions to 
> co-exist together.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

[jira] [Updated] (HIVE-14035) Enable predicate pushdown to delta files created by ACID Transactions

2016-08-08 Thread Saket Saurabh (JIRA)


 [ 
https://issues.apache.org/jira/browse/HIVE-14035?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Saket Saurabh updated HIVE-14035:
-
Attachment: HIVE-14035.13.patch

Updated patch with comments/feedback from code review. 
Major changes are refactoring of AcidUtils.getAcidState(), CompactorMR, fixing 
of flush length related changes in OrcRecordUpdater, lazy initialization of 
deleteEventWriters and other minor changes. 
The patch adds additional unit tests to test schema evolution with ACID. It 
also adds a subclass to TestTxnCommands2 for replicating all the existing ACID 
test cases with split-update turned on.

> Enable predicate pushdown to delta files created by ACID Transactions
> -
>
> Key: HIVE-14035
> URL: https://issues.apache.org/jira/browse/HIVE-14035
> Project: Hive
>  Issue Type: New Feature
>  Components: Transactions
>Reporter: Saket Saurabh
>Assignee: Saket Saurabh
> Attachments: HIVE-14035.02.patch, HIVE-14035.03.patch, 
> HIVE-14035.04.patch, HIVE-14035.05.patch, HIVE-14035.06.patch, 
> HIVE-14035.07.patch, HIVE-14035.08.patch, HIVE-14035.09.patch, 
> HIVE-14035.10.patch, HIVE-14035.11.patch, HIVE-14035.12.patch, 
> HIVE-14035.13.patch, HIVE-14035.patch
>
>
> In current Hive version, delta files created by ACID transactions do not 
> allow predicate pushdown if they contain any update/delete events. This is 
> done to preserve correctness when following a multi-version approach during 
> event collapsing, where an update event overwrites an existing insert event. 
> This JIRA proposes to split an update event into a combination of a delete 
> event followed by a new insert event, that can enable predicate push down to 
> all delta files without breaking correctness. To support backward 
> compatibility for this feature, this JIRA also proposes to add some sort of 
> versioning to ACID that can allow different versions of ACID transactions to 
> co-exist together.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

[jira] [Updated] (HIVE-14035) Enable predicate pushdown to delta files created by ACID Transactions

2016-08-08 Thread Saket Saurabh (JIRA)


 [ 
https://issues.apache.org/jira/browse/HIVE-14035?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Saket Saurabh updated HIVE-14035:
-
Status: Open  (was: Patch Available)

> Enable predicate pushdown to delta files created by ACID Transactions
> -
>
> Key: HIVE-14035
> URL: https://issues.apache.org/jira/browse/HIVE-14035
> Project: Hive
>  Issue Type: New Feature
>  Components: Transactions
>Reporter: Saket Saurabh
>Assignee: Saket Saurabh
> Attachments: HIVE-14035.02.patch, HIVE-14035.03.patch, 
> HIVE-14035.04.patch, HIVE-14035.05.patch, HIVE-14035.06.patch, 
> HIVE-14035.07.patch, HIVE-14035.08.patch, HIVE-14035.09.patch, 
> HIVE-14035.10.patch, HIVE-14035.11.patch, HIVE-14035.12.patch, 
> HIVE-14035.13.patch, HIVE-14035.patch
>
>
> In current Hive version, delta files created by ACID transactions do not 
> allow predicate pushdown if they contain any update/delete events. This is 
> done to preserve correctness when following a multi-version approach during 
> event collapsing, where an update event overwrites an existing insert event. 
> This JIRA proposes to split an update event into a combination of a delete 
> event followed by a new insert event, that can enable predicate push down to 
> all delta files without breaking correctness. To support backward 
> compatibility for this feature, this JIRA also proposes to add some sort of 
> versioning to ACID that can allow different versions of ACID transactions to 
> co-exist together.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

[jira] [Updated] (HIVE-14448) Queries with predicate fail when ETL split strategy is chosen for ACID tables

2016-08-05 Thread Saket Saurabh (JIRA)


 [ 
https://issues.apache.org/jira/browse/HIVE-14448?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Saket Saurabh updated HIVE-14448:
-
Description: 
When ETL split strategy is applied to ACID tables with predicate pushdown (SARG 
enabled), split generation fails for ACID. This bug will be usually exposed 
when working with data at scale, because in most otherwise cases only BI split 
strategy is chosen. My guess is that this is happening because the correct 
readerSchema is not being picked up when we try to extract SARG column names.

Quickest way to reproduce is to add the following unit test to 
ql/src/test/org/apache/hadoop/hive/ql/TestTxnCommands2.java

{code:title=ql/src/test/org/apache/hadoop/hive/ql/TestTxnCommands2.java|borderStyle=solid}
 @Test
  public void testETLSplitStrategyForACID() throws Exception {
hiveConf.setVar(HiveConf.ConfVars.HIVE_ORC_SPLIT_STRATEGY, "ETL");
hiveConf.setBoolVar(HiveConf.ConfVars.HIVEOPTINDEXFILTER, true);
runStatementOnDriver("insert into " + Table.ACIDTBL + " values(1,2)");
runStatementOnDriver("alter table " + Table.ACIDTBL + " compact 'MAJOR'");
runWorker(hiveConf);
List rs = runStatementOnDriver("select * from " +  Table.ACIDTBL  + 
" where a = 1");
int[][] resultData = new int[][] {{1,2}};
Assert.assertEquals(stringifyValues(resultData), rs);
  }
{code}

Back-trace for this failed test is as follows:
{code}
exec.Task: Job Submission failed with exception 'java.lang.RuntimeException(ORC 
split generation failed with exception: java.lang.NegativeArraySizeException)'
java.lang.RuntimeException: ORC split generation failed with exception: 
java.lang.NegativeArraySizeException
at 
org.apache.hadoop.hive.ql.io.orc.OrcInputFormat.generateSplitsInfo(OrcInputFormat.java:1570)
at 
org.apache.hadoop.hive.ql.io.orc.OrcInputFormat.getSplits(OrcInputFormat.java:1656)
at 
org.apache.hadoop.hive.ql.io.HiveInputFormat.addSplitsForGroup(HiveInputFormat.java:370)
at 
org.apache.hadoop.hive.ql.io.HiveInputFormat.getSplits(HiveInputFormat.java:488)
at 
org.apache.hadoop.mapreduce.JobSubmitter.writeOldSplits(JobSubmitter.java:329)
at 
org.apache.hadoop.mapreduce.JobSubmitter.writeSplits(JobSubmitter.java:321)
at 
org.apache.hadoop.mapreduce.JobSubmitter.submitJobInternal(JobSubmitter.java:197)
at org.apache.hadoop.mapreduce.Job$10.run(Job.java:1297)
at org.apache.hadoop.mapreduce.Job$10.run(Job.java:1294)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:422)
at 
org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1656)
at org.apache.hadoop.mapreduce.Job.submit(Job.java:1294)
at org.apache.hadoop.mapred.JobClient$1.run(JobClient.java:562)
at org.apache.hadoop.mapred.JobClient$1.run(JobClient.java:557)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:422)
at 
org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1656)
at 
org.apache.hadoop.mapred.JobClient.submitJobInternal(JobClient.java:557)
at org.apache.hadoop.mapred.JobClient.submitJob(JobClient.java:548)
at 
org.apache.hadoop.hive.ql.exec.mr.ExecDriver.execute(ExecDriver.java:417)
at 
org.apache.hadoop.hive.ql.exec.mr.MapRedTask.execute(MapRedTask.java:141)
at org.apache.hadoop.hive.ql.exec.Task.executeTask(Task.java:197)
at 
org.apache.hadoop.hive.ql.exec.TaskRunner.runSequential(TaskRunner.java:100)
at org.apache.hadoop.hive.ql.Driver.launchTask(Driver.java:1962)
at org.apache.hadoop.hive.ql.Driver.execute(Driver.java:1653)
at org.apache.hadoop.hive.ql.Driver.runInternal(Driver.java:1389)
at org.apache.hadoop.hive.ql.Driver.run(Driver.java:1131)
at org.apache.hadoop.hive.ql.Driver.run(Driver.java:1119)
at 
org.apache.hadoop.hive.ql.TestTxnCommands2.runStatementOnDriver(TestTxnCommands2.java:1292)
at 
org.apache.hadoop.hive.ql.TestTxnCommands2.testETLSplitStrategyForACID(TestTxnCommands2.java:280)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at 
sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at 
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at 
org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:47)
at 
org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:12)
at 
org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:44)
at 
org.junit.internal.runners.statements.InvokeMethod.evaluate(InvokeMethod.java:17)
at 
org.junit.internal.r

[jira] [Commented] (HIVE-14035) Enable predicate pushdown to delta files created by ACID Transactions

2016-08-04 Thread Saket Saurabh (JIRA)


[ 
https://issues.apache.org/jira/browse/HIVE-14035?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15408659#comment-15408659
 ] 

Saket Saurabh commented on HIVE-14035:
--

Thanks [~sershe] for the comments and the feedback. I am working on addressing 
them. Sure, I will also put up a separate patch with default enabled for HiveQA 
run.

> Enable predicate pushdown to delta files created by ACID Transactions
> -
>
> Key: HIVE-14035
> URL: https://issues.apache.org/jira/browse/HIVE-14035
> Project: Hive
>  Issue Type: New Feature
>  Components: Transactions
>Reporter: Saket Saurabh
>Assignee: Saket Saurabh
> Attachments: HIVE-14035.02.patch, HIVE-14035.03.patch, 
> HIVE-14035.04.patch, HIVE-14035.05.patch, HIVE-14035.06.patch, 
> HIVE-14035.07.patch, HIVE-14035.08.patch, HIVE-14035.09.patch, 
> HIVE-14035.10.patch, HIVE-14035.11.patch, HIVE-14035.12.patch, 
> HIVE-14035.patch
>
>
> In current Hive version, delta files created by ACID transactions do not 
> allow predicate pushdown if they contain any update/delete events. This is 
> done to preserve correctness when following a multi-version approach during 
> event collapsing, where an update event overwrites an existing insert event. 
> This JIRA proposes to split an update event into a combination of a delete 
> event followed by a new insert event, that can enable predicate push down to 
> all delta files without breaking correctness. To support backward 
> compatibility for this feature, this JIRA also proposes to add some sort of 
> versioning to ACID that can allow different versions of ACID transactions to 
> co-exist together.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

[jira] [Updated] (HIVE-14366) Conversion of a Non-ACID table to an ACID table produces non-unique primary keys

2016-07-27 Thread Saket Saurabh (JIRA)


 [ 
https://issues.apache.org/jira/browse/HIVE-14366?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Saket Saurabh updated HIVE-14366:
-
Status: Patch Available  (was: Open)

> Conversion of a Non-ACID table to an ACID table produces non-unique primary 
> keys
> 
>
> Key: HIVE-14366
> URL: https://issues.apache.org/jira/browse/HIVE-14366
> Project: Hive
>  Issue Type: Bug
>  Components: Transactions
>Reporter: Saket Saurabh
>Assignee: Eugene Koifman
> Attachments: HIVE-14366.01.patch
>
>
> When a Non-ACID table is converted to an ACID table, the primary key 
> consisting of (original transaction id, bucket_id, row_id) is not generated 
> uniquely. Currently, the row_id is always set to 0 for most rows. This leads 
> to correctness issue for such tables.
> Quickest way to reproduce is to add the following unit test to 
> ql/src/test/org/apache/hadoop/hive/ql/TestTxnCommands2.java
> {code:title=ql/src/test/org/apache/hadoop/hive/ql/TestTxnCommands2.java|borderStyle=solid}
>   @Test
>   public void testOriginalReader() throws Exception {
> FileSystem fs = FileSystem.get(hiveConf);
> FileStatus[] status;
> // 1. Insert five rows to Non-ACID table.
> runStatementOnDriver("insert into " + Table.NONACIDORCTBL + "(a,b) 
> values(1,2),(3,4),(5,6),(7,8),(9,10)");
> // 2. Convert NONACIDORCTBL to ACID table.
> runStatementOnDriver("alter table " + Table.NONACIDORCTBL + " SET 
> TBLPROPERTIES ('transactional'='true')");
> // 3. Perform a major compaction.
> runStatementOnDriver("alter table "+ Table.NONACIDORCTBL + " compact 
> 'MAJOR'");
> runWorker(hiveConf);
> // 4. Perform a delete.
> runStatementOnDriver("delete from " + Table.NONACIDORCTBL + " where a = 
> 1");
> // 5. Now do a projection should have (3,4) (5,6),(7,8),(9,10) only since 
> (1,2) has been deleted.
> List rs = runStatementOnDriver("select a,b from " + 
> Table.NONACIDORCTBL + " order by a,b");
> int[][] resultData = new int[][] {{3,4}, {5,6}, {7,8}, {9,10}};
> Assert.assertEquals(stringifyValues(resultData), rs);
>   }
> {code}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

[jira] [Updated] (HIVE-14366) Conversion of a Non-ACID table to an ACID table produces non-unique primary keys

2016-07-27 Thread Saket Saurabh (JIRA)


 [ 
https://issues.apache.org/jira/browse/HIVE-14366?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Saket Saurabh updated HIVE-14366:
-
Attachment: HIVE-14366.01.patch

Initial patch to fix the bug by reusing the globally unique rowNumbers found in 
the orc files.

> Conversion of a Non-ACID table to an ACID table produces non-unique primary 
> keys
> 
>
> Key: HIVE-14366
> URL: https://issues.apache.org/jira/browse/HIVE-14366
> Project: Hive
>  Issue Type: Bug
>  Components: Transactions
>Reporter: Saket Saurabh
>Assignee: Eugene Koifman
> Attachments: HIVE-14366.01.patch
>
>
> When a Non-ACID table is converted to an ACID table, the primary key 
> consisting of (original transaction id, bucket_id, row_id) is not generated 
> uniquely. Currently, the row_id is always set to 0 for most rows. This leads 
> to correctness issue for such tables.
> Quickest way to reproduce is to add the following unit test to 
> ql/src/test/org/apache/hadoop/hive/ql/TestTxnCommands2.java
> {code:title=ql/src/test/org/apache/hadoop/hive/ql/TestTxnCommands2.java|borderStyle=solid}
>   @Test
>   public void testOriginalReader() throws Exception {
> FileSystem fs = FileSystem.get(hiveConf);
> FileStatus[] status;
> // 1. Insert five rows to Non-ACID table.
> runStatementOnDriver("insert into " + Table.NONACIDORCTBL + "(a,b) 
> values(1,2),(3,4),(5,6),(7,8),(9,10)");
> // 2. Convert NONACIDORCTBL to ACID table.
> runStatementOnDriver("alter table " + Table.NONACIDORCTBL + " SET 
> TBLPROPERTIES ('transactional'='true')");
> // 3. Perform a major compaction.
> runStatementOnDriver("alter table "+ Table.NONACIDORCTBL + " compact 
> 'MAJOR'");
> runWorker(hiveConf);
> // 4. Perform a delete.
> runStatementOnDriver("delete from " + Table.NONACIDORCTBL + " where a = 
> 1");
> // 5. Now do a projection should have (3,4) (5,6),(7,8),(9,10) only since 
> (1,2) has been deleted.
> List rs = runStatementOnDriver("select a,b from " + 
> Table.NONACIDORCTBL + " order by a,b");
> int[][] resultData = new int[][] {{3,4}, {5,6}, {7,8}, {9,10}};
> Assert.assertEquals(stringifyValues(resultData), rs);
>   }
> {code}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

[jira] [Updated] (HIVE-14366) Conversion of a Non-ACID table to an ACID table produces non-unique primary keys

2016-07-27 Thread Saket Saurabh (JIRA)


 [ 
https://issues.apache.org/jira/browse/HIVE-14366?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Saket Saurabh updated HIVE-14366:
-
Description: 
When a Non-ACID table is converted to an ACID table, the primary key consisting 
of (original transaction id, bucket_id, row_id) is not generated uniquely. 
Currently, the row_id is always set to 0 for most rows. This leads to 
correctness issue for such tables.

Quickest way to reproduce is to add the following unit test to 
ql/src/test/org/apache/hadoop/hive/ql/TestTxnCommands2.java

{code:title=ql/src/test/org/apache/hadoop/hive/ql/TestTxnCommands2.java|borderStyle=solid}
  @Test
  public void testOriginalReader() throws Exception {
FileSystem fs = FileSystem.get(hiveConf);
FileStatus[] status;

// 1. Insert five rows to Non-ACID table.
runStatementOnDriver("insert into " + Table.NONACIDORCTBL + "(a,b) 
values(1,2),(3,4),(5,6),(7,8),(9,10)");

// 2. Convert NONACIDORCTBL to ACID table.
runStatementOnDriver("alter table " + Table.NONACIDORCTBL + " SET 
TBLPROPERTIES ('transactional'='true')");

// 3. Perform a major compaction.
runStatementOnDriver("alter table "+ Table.NONACIDORCTBL + " compact 
'MAJOR'");
runWorker(hiveConf);

// 4. Perform a delete.
runStatementOnDriver("delete from " + Table.NONACIDORCTBL + " where a = 1");

// 5. Now do a projection should have (3,4) (5,6),(7,8),(9,10) only since 
(1,2) has been deleted.
List rs = runStatementOnDriver("select a,b from " + 
Table.NONACIDORCTBL + " order by a,b");
int[][] resultData = new int[][] {{3,4}, {5,6}, {7,8}, {9,10}};
Assert.assertEquals(stringifyValues(resultData), rs);
  }
{code}

  was:
When a Non-ACID table is converted to an ACID table, the primary key consisting 
of (original transaction id, bucket_id, row_id) is not generated uniquely. 
Currently, the row_id is always set to 0 for most rows. This leads to 
correctness issue for such tables.

Quickest way to reproduce is to add the following unit test to 
ql/src/test/org/apache/hadoop/hive/ql/TestTxnCommands2.java

{code:title=ql/src/test/org/apache/hadoop/hive/ql/TestTxnCommands2.java|borderStyle=solid}
  @Test
  public void testOriginalReader() throws Exception {
FileSystem fs = FileSystem.get(hiveConf);
FileStatus[] status;

// 1. Insert five rows to Non-ACID table.
runStatementOnDriver("insert into " + Table.NONACIDORCTBL + "(a,b) 
values(1,2),(3,4),(5,6),(7,8),(9,10)");

// 2. Convert NONACIDORCTBL to ACID table.
runStatementOnDriver("alter table " + Table.NONACIDORCTBL + " SET 
TBLPROPERTIES ('transactional'='true')");

// 3. Perform a major compaction.
runStatementOnDriver("alter table "+ Table.NONACIDORCTBL + " compact 
'MAJOR'");
runWorker(hiveConf);

// 3. Perform a delete.
runStatementOnDriver("delete from " + Table.NONACIDORCTBL + " where a = 1");

// Now do a projection should have (3,4) (5,6),(7,8),(9,10) only since 
(1,2) has been deleted.
List rs = runStatementOnDriver("select a,b from " + 
Table.NONACIDORCTBL + " order by a,b");
int[][] resultData = new int[][] {{3,4}, {5,6}, {7,8}, {9,10}};
Assert.assertEquals(stringifyValues(resultData), rs);
  }
{code}


> Conversion of a Non-ACID table to an ACID table produces non-unique primary 
> keys
> 
>
> Key: HIVE-14366
> URL: https://issues.apache.org/jira/browse/HIVE-14366
> Project: Hive
>  Issue Type: Bug
>  Components: Transactions
>Reporter: Saket Saurabh
>
> When a Non-ACID table is converted to an ACID table, the primary key 
> consisting of (original transaction id, bucket_id, row_id) is not generated 
> uniquely. Currently, the row_id is always set to 0 for most rows. This leads 
> to correctness issue for such tables.
> Quickest way to reproduce is to add the following unit test to 
> ql/src/test/org/apache/hadoop/hive/ql/TestTxnCommands2.java
> {code:title=ql/src/test/org/apache/hadoop/hive/ql/TestTxnCommands2.java|borderStyle=solid}
>   @Test
>   public void testOriginalReader() throws Exception {
> FileSystem fs = FileSystem.get(hiveConf);
> FileStatus[] status;
> // 1. Insert five rows to Non-ACID table.
> runStatementOnDriver("insert into " + Table.NONACIDORCTBL + "(a,b) 
> values(1,2),(3,4),(5,6),(7,8),(9,10)");
> // 2. Convert NONACIDORCTBL to ACID table.
> runStatementOnDriver("alter table " + Table.NONACIDORCTBL + " SET 
> TBLPROPERTIES ('transactional'='true')");
> // 3. Perform a major compaction.
> runStatementOnDriver("alter table "+ Table.NONACIDORCTBL + " compact 
> 'MAJOR'");
> runWorker(hiveConf);
> // 4. Perform a delete.
> runStatementOnDriver("delete from " + Table.NONACIDORCTBL + " where a = 
> 1");
> // 5. Now do a projection should have (3,4) (5,6),(7,8),(9,10) only si

[jira] [Updated] (HIVE-14035) Enable predicate pushdown to delta files created by ACID Transactions

2016-07-27 Thread Saket Saurabh (JIRA)


 [ 
https://issues.apache.org/jira/browse/HIVE-14035?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Saket Saurabh updated HIVE-14035:
-
Status: Patch Available  (was: Open)

> Enable predicate pushdown to delta files created by ACID Transactions
> -
>
> Key: HIVE-14035
> URL: https://issues.apache.org/jira/browse/HIVE-14035
> Project: Hive
>  Issue Type: New Feature
>  Components: Transactions
>Reporter: Saket Saurabh
>Assignee: Saket Saurabh
> Attachments: HIVE-14035.02.patch, HIVE-14035.03.patch, 
> HIVE-14035.04.patch, HIVE-14035.05.patch, HIVE-14035.06.patch, 
> HIVE-14035.07.patch, HIVE-14035.08.patch, HIVE-14035.09.patch, 
> HIVE-14035.10.patch, HIVE-14035.11.patch, HIVE-14035.12.patch, 
> HIVE-14035.patch
>
>
> In current Hive version, delta files created by ACID transactions do not 
> allow predicate pushdown if they contain any update/delete events. This is 
> done to preserve correctness when following a multi-version approach during 
> event collapsing, where an update event overwrites an existing insert event. 
> This JIRA proposes to split an update event into a combination of a delete 
> event followed by a new insert event, that can enable predicate push down to 
> all delta files without breaking correctness. To support backward 
> compatibility for this feature, this JIRA also proposes to add some sort of 
> versioning to ACID that can allow different versions of ACID transactions to 
> co-exist together.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

[jira] [Updated] (HIVE-14035) Enable predicate pushdown to delta files created by ACID Transactions

2016-07-27 Thread Saket Saurabh (JIRA)


 [ 
https://issues.apache.org/jira/browse/HIVE-14035?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Saket Saurabh updated HIVE-14035:
-
Attachment: HIVE-14035.12.patch

Refactor the way delete event writers are created for compaction case in favor 
of a better abstraction. Discovered that this design would be much simpler now, 
while working on improving compaction for ACID.

> Enable predicate pushdown to delta files created by ACID Transactions
> -
>
> Key: HIVE-14035
> URL: https://issues.apache.org/jira/browse/HIVE-14035
> Project: Hive
>  Issue Type: New Feature
>  Components: Transactions
>Reporter: Saket Saurabh
>Assignee: Saket Saurabh
> Attachments: HIVE-14035.02.patch, HIVE-14035.03.patch, 
> HIVE-14035.04.patch, HIVE-14035.05.patch, HIVE-14035.06.patch, 
> HIVE-14035.07.patch, HIVE-14035.08.patch, HIVE-14035.09.patch, 
> HIVE-14035.10.patch, HIVE-14035.11.patch, HIVE-14035.12.patch, 
> HIVE-14035.patch
>
>
> In current Hive version, delta files created by ACID transactions do not 
> allow predicate pushdown if they contain any update/delete events. This is 
> done to preserve correctness when following a multi-version approach during 
> event collapsing, where an update event overwrites an existing insert event. 
> This JIRA proposes to split an update event into a combination of a delete 
> event followed by a new insert event, that can enable predicate push down to 
> all delta files without breaking correctness. To support backward 
> compatibility for this feature, this JIRA also proposes to add some sort of 
> versioning to ACID that can allow different versions of ACID transactions to 
> co-exist together.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

[jira] [Updated] (HIVE-14035) Enable predicate pushdown to delta files created by ACID Transactions

2016-07-27 Thread Saket Saurabh (JIRA)


 [ 
https://issues.apache.org/jira/browse/HIVE-14035?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Saket Saurabh updated HIVE-14035:
-
Status: Open  (was: Patch Available)

> Enable predicate pushdown to delta files created by ACID Transactions
> -
>
> Key: HIVE-14035
> URL: https://issues.apache.org/jira/browse/HIVE-14035
> Project: Hive
>  Issue Type: New Feature
>  Components: Transactions
>Reporter: Saket Saurabh
>Assignee: Saket Saurabh
> Attachments: HIVE-14035.02.patch, HIVE-14035.03.patch, 
> HIVE-14035.04.patch, HIVE-14035.05.patch, HIVE-14035.06.patch, 
> HIVE-14035.07.patch, HIVE-14035.08.patch, HIVE-14035.09.patch, 
> HIVE-14035.10.patch, HIVE-14035.11.patch, HIVE-14035.12.patch, 
> HIVE-14035.patch
>
>
> In current Hive version, delta files created by ACID transactions do not 
> allow predicate pushdown if they contain any update/delete events. This is 
> done to preserve correctness when following a multi-version approach during 
> event collapsing, where an update event overwrites an existing insert event. 
> This JIRA proposes to split an update event into a combination of a delete 
> event followed by a new insert event, that can enable predicate push down to 
> all delta files without breaking correctness. To support backward 
> compatibility for this feature, this JIRA also proposes to add some sort of 
> versioning to ACID that can allow different versions of ACID transactions to 
> co-exist together.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

[jira] [Updated] (HIVE-14233) Improve vectorization for ACID by eliminating row-by-row stitching

2016-07-26 Thread Saket Saurabh (JIRA)


 [ 
https://issues.apache.org/jira/browse/HIVE-14233?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Saket Saurabh updated HIVE-14233:
-
Attachment: HIVE-14233.05.patch

Further optimize vectorized row batch creation/processing by eliminating 
setting it to null after the user payload columns are sent up the operator 
pipeline.

> Improve vectorization for ACID by eliminating row-by-row stitching
> --
>
> Key: HIVE-14233
> URL: https://issues.apache.org/jira/browse/HIVE-14233
> Project: Hive
>  Issue Type: New Feature
>  Components: Transactions, Vectorization
>Reporter: Saket Saurabh
>Assignee: Saket Saurabh
> Attachments: HIVE-14233.01.patch, HIVE-14233.02.patch, 
> HIVE-14233.03.patch, HIVE-14233.04.patch, HIVE-14233.05.patch
>
>
> This JIRA proposes to improve vectorization for ACID by eliminating 
> row-by-row stitching when reading back ACID files. In the current 
> implementation, a vectorized row batch is created by populating the batch one 
> row at a time, before the vectorized batch is passed up along the operator 
> pipeline. This row-by-row stitching limitation was because of the fact that 
> the ACID insert/update/delete events from various delta files needed to be 
> merged together before the actual version of a given row was found out. 
> HIVE-14035 has enabled us to break away from that limitation by splitting 
> ACID update events into a combination of delete+insert. In fact, it has now 
> enabled us to create splits on delta files.
> Building on top of HIVE-14035, this JIRA proposes to solve this earlier 
> bottleneck in the vectorized code path for ACID by now directly reading row 
> batches from the underlying ORC files and avoiding any stitching altogether. 
> Once a row batch is read from the split (which may be on a base/delta file), 
> the deleted rows will be found by cross-referencing them against a data 
> structure that will just keep track of deleted events (found in the 
> deleted_delta files). This will lead to a large performance gain when reading 
> ACID files in vectorized fashion, while enabling further optimizations in 
> future that can be done on top of that.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

[jira] [Updated] (HIVE-14233) Improve vectorization for ACID by eliminating row-by-row stitching

2016-07-26 Thread Saket Saurabh (JIRA)


 [ 
https://issues.apache.org/jira/browse/HIVE-14233?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Saket Saurabh updated HIVE-14233:
-
Attachment: HIVE-14233.04.patch

Fix a NullPointerException bug that was being thrown when vectorized row 
batches were being used across subsequent nextBatch() calls.

> Improve vectorization for ACID by eliminating row-by-row stitching
> --
>
> Key: HIVE-14233
> URL: https://issues.apache.org/jira/browse/HIVE-14233
> Project: Hive
>  Issue Type: New Feature
>  Components: Transactions, Vectorization
>Reporter: Saket Saurabh
>Assignee: Saket Saurabh
> Attachments: HIVE-14233.01.patch, HIVE-14233.02.patch, 
> HIVE-14233.03.patch, HIVE-14233.04.patch
>
>
> This JIRA proposes to improve vectorization for ACID by eliminating 
> row-by-row stitching when reading back ACID files. In the current 
> implementation, a vectorized row batch is created by populating the batch one 
> row at a time, before the vectorized batch is passed up along the operator 
> pipeline. This row-by-row stitching limitation was because of the fact that 
> the ACID insert/update/delete events from various delta files needed to be 
> merged together before the actual version of a given row was found out. 
> HIVE-14035 has enabled us to break away from that limitation by splitting 
> ACID update events into a combination of delete+insert. In fact, it has now 
> enabled us to create splits on delta files.
> Building on top of HIVE-14035, this JIRA proposes to solve this earlier 
> bottleneck in the vectorized code path for ACID by now directly reading row 
> batches from the underlying ORC files and avoiding any stitching altogether. 
> Once a row batch is read from the split (which may be on a base/delta file), 
> the deleted rows will be found by cross-referencing them against a data 
> structure that will just keep track of deleted events (found in the 
> deleted_delta files). This will lead to a large performance gain when reading 
> ACID files in vectorized fashion, while enabling further optimizations in 
> future that can be done on top of that.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

[jira] [Updated] (HIVE-14035) Enable predicate pushdown to delta files created by ACID Transactions

2016-07-26 Thread Saket Saurabh (JIRA)


 [ 
https://issues.apache.org/jira/browse/HIVE-14035?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Saket Saurabh updated HIVE-14035:
-
Status: Patch Available  (was: Open)

> Enable predicate pushdown to delta files created by ACID Transactions
> -
>
> Key: HIVE-14035
> URL: https://issues.apache.org/jira/browse/HIVE-14035
> Project: Hive
>  Issue Type: New Feature
>  Components: Transactions
>Reporter: Saket Saurabh
>Assignee: Saket Saurabh
> Attachments: HIVE-14035.02.patch, HIVE-14035.03.patch, 
> HIVE-14035.04.patch, HIVE-14035.05.patch, HIVE-14035.06.patch, 
> HIVE-14035.07.patch, HIVE-14035.08.patch, HIVE-14035.09.patch, 
> HIVE-14035.10.patch, HIVE-14035.11.patch, HIVE-14035.patch
>
>
> In current Hive version, delta files created by ACID transactions do not 
> allow predicate pushdown if they contain any update/delete events. This is 
> done to preserve correctness when following a multi-version approach during 
> event collapsing, where an update event overwrites an existing insert event. 
> This JIRA proposes to split an update event into a combination of a delete 
> event followed by a new insert event, that can enable predicate push down to 
> all delta files without breaking correctness. To support backward 
> compatibility for this feature, this JIRA also proposes to add some sort of 
> versioning to ACID that can allow different versions of ACID transactions to 
> co-exist together.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

[jira] [Updated] (HIVE-14035) Enable predicate pushdown to delta files created by ACID Transactions

2016-07-26 Thread Saket Saurabh (JIRA)


 [ 
https://issues.apache.org/jira/browse/HIVE-14035?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Saket Saurabh updated HIVE-14035:
-
Status: Open  (was: Patch Available)

> Enable predicate pushdown to delta files created by ACID Transactions
> -
>
> Key: HIVE-14035
> URL: https://issues.apache.org/jira/browse/HIVE-14035
> Project: Hive
>  Issue Type: New Feature
>  Components: Transactions
>Reporter: Saket Saurabh
>Assignee: Saket Saurabh
> Attachments: HIVE-14035.02.patch, HIVE-14035.03.patch, 
> HIVE-14035.04.patch, HIVE-14035.05.patch, HIVE-14035.06.patch, 
> HIVE-14035.07.patch, HIVE-14035.08.patch, HIVE-14035.09.patch, 
> HIVE-14035.10.patch, HIVE-14035.11.patch, HIVE-14035.patch
>
>
> In current Hive version, delta files created by ACID transactions do not 
> allow predicate pushdown if they contain any update/delete events. This is 
> done to preserve correctness when following a multi-version approach during 
> event collapsing, where an update event overwrites an existing insert event. 
> This JIRA proposes to split an update event into a combination of a delete 
> event followed by a new insert event, that can enable predicate push down to 
> all delta files without breaking correctness. To support backward 
> compatibility for this feature, this JIRA also proposes to add some sort of 
> versioning to ACID that can allow different versions of ACID transactions to 
> co-exist together.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

[jira] [Updated] (HIVE-14035) Enable predicate pushdown to delta files created by ACID Transactions

2016-07-26 Thread Saket Saurabh (JIRA)


 [ 
https://issues.apache.org/jira/browse/HIVE-14035?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Saket Saurabh updated HIVE-14035:
-
Attachment: HIVE-14035.11.patch

Add more UTs to specifically test AcidUtils and various compaction scenarios 
for split-update.

> Enable predicate pushdown to delta files created by ACID Transactions
> -
>
> Key: HIVE-14035
> URL: https://issues.apache.org/jira/browse/HIVE-14035
> Project: Hive
>  Issue Type: New Feature
>  Components: Transactions
>Reporter: Saket Saurabh
>Assignee: Saket Saurabh
> Attachments: HIVE-14035.02.patch, HIVE-14035.03.patch, 
> HIVE-14035.04.patch, HIVE-14035.05.patch, HIVE-14035.06.patch, 
> HIVE-14035.07.patch, HIVE-14035.08.patch, HIVE-14035.09.patch, 
> HIVE-14035.10.patch, HIVE-14035.11.patch, HIVE-14035.patch
>
>
> In current Hive version, delta files created by ACID transactions do not 
> allow predicate pushdown if they contain any update/delete events. This is 
> done to preserve correctness when following a multi-version approach during 
> event collapsing, where an update event overwrites an existing insert event. 
> This JIRA proposes to split an update event into a combination of a delete 
> event followed by a new insert event, that can enable predicate push down to 
> all delta files without breaking correctness. To support backward 
> compatibility for this feature, this JIRA also proposes to add some sort of 
> versioning to ACID that can allow different versions of ACID transactions to 
> co-exist together.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

[jira] [Updated] (HIVE-14233) Improve vectorization for ACID by eliminating row-by-row stitching

2016-07-25 Thread Saket Saurabh (JIRA)


 [ 
https://issues.apache.org/jira/browse/HIVE-14233?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Saket Saurabh updated HIVE-14233:
-
Attachment: HIVE-14233.03.patch

Fix vectorized row batch initial schema to be based on that of base reader 
schema so that projected columns are accounted for.

> Improve vectorization for ACID by eliminating row-by-row stitching
> --
>
> Key: HIVE-14233
> URL: https://issues.apache.org/jira/browse/HIVE-14233
> Project: Hive
>  Issue Type: New Feature
>  Components: Transactions, Vectorization
>Reporter: Saket Saurabh
>Assignee: Saket Saurabh
> Attachments: HIVE-14233.01.patch, HIVE-14233.02.patch, 
> HIVE-14233.03.patch
>
>
> This JIRA proposes to improve vectorization for ACID by eliminating 
> row-by-row stitching when reading back ACID files. In the current 
> implementation, a vectorized row batch is created by populating the batch one 
> row at a time, before the vectorized batch is passed up along the operator 
> pipeline. This row-by-row stitching limitation was because of the fact that 
> the ACID insert/update/delete events from various delta files needed to be 
> merged together before the actual version of a given row was found out. 
> HIVE-14035 has enabled us to break away from that limitation by splitting 
> ACID update events into a combination of delete+insert. In fact, it has now 
> enabled us to create splits on delta files.
> Building on top of HIVE-14035, this JIRA proposes to solve this earlier 
> bottleneck in the vectorized code path for ACID by now directly reading row 
> batches from the underlying ORC files and avoiding any stitching altogether. 
> Once a row batch is read from the split (which may be on a base/delta file), 
> the deleted rows will be found by cross-referencing them against a data 
> structure that will just keep track of deleted events (found in the 
> deleted_delta files). This will lead to a large performance gain when reading 
> ACID files in vectorized fashion, while enabling further optimizations in 
> future that can be done on top of that.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

[jira] [Updated] (HIVE-14233) Improve vectorization for ACID by eliminating row-by-row stitching

2016-07-21 Thread Saket Saurabh (JIRA)


 [ 
https://issues.apache.org/jira/browse/HIVE-14233?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Saket Saurabh updated HIVE-14233:
-
Attachment: HIVE-14233.02.patch

Second version of the patch with optimized code path to remove deleted rows 
from a given vectorized row batch. This is done by loading all the delete 
events into memory at once and using an optimized binary search algorithm.

> Improve vectorization for ACID by eliminating row-by-row stitching
> --
>
> Key: HIVE-14233
> URL: https://issues.apache.org/jira/browse/HIVE-14233
> Project: Hive
>  Issue Type: New Feature
>  Components: Transactions, Vectorization
>Reporter: Saket Saurabh
>Assignee: Saket Saurabh
> Attachments: HIVE-14233.01.patch, HIVE-14233.02.patch
>
>
> This JIRA proposes to improve vectorization for ACID by eliminating 
> row-by-row stitching when reading back ACID files. In the current 
> implementation, a vectorized row batch is created by populating the batch one 
> row at a time, before the vectorized batch is passed up along the operator 
> pipeline. This row-by-row stitching limitation was because of the fact that 
> the ACID insert/update/delete events from various delta files needed to be 
> merged together before the actual version of a given row was found out. 
> HIVE-14035 has enabled us to break away from that limitation by splitting 
> ACID update events into a combination of delete+insert. In fact, it has now 
> enabled us to create splits on delta files.
> Building on top of HIVE-14035, this JIRA proposes to solve this earlier 
> bottleneck in the vectorized code path for ACID by now directly reading row 
> batches from the underlying ORC files and avoiding any stitching altogether. 
> Once a row batch is read from the split (which may be on a base/delta file), 
> the deleted rows will be found by cross-referencing them against a data 
> structure that will just keep track of deleted events (found in the 
> deleted_delta files). This will lead to a large performance gain when reading 
> ACID files in vectorized fashion, while enabling further optimizations in 
> future that can be done on top of that.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

[jira] [Comment Edited] (HIVE-14035) Enable predicate pushdown to delta files created by ACID Transactions

2016-07-20 Thread Saket Saurabh (JIRA)


[ 
https://issues.apache.org/jira/browse/HIVE-14035?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15386761#comment-15386761
 ] 

Saket Saurabh edited comment on HIVE-14035 at 7/20/16 10:52 PM:


Updated the patch by rebasing with master. No additional code changes. Patch 
(#10)


was (Author: saketj):
Updated the patch by rebasing with master. No additional code changes.

> Enable predicate pushdown to delta files created by ACID Transactions
> -
>
> Key: HIVE-14035
> URL: https://issues.apache.org/jira/browse/HIVE-14035
> Project: Hive
>  Issue Type: New Feature
>  Components: Transactions
>Reporter: Saket Saurabh
>Assignee: Saket Saurabh
> Attachments: HIVE-14035.02.patch, HIVE-14035.03.patch, 
> HIVE-14035.04.patch, HIVE-14035.05.patch, HIVE-14035.06.patch, 
> HIVE-14035.07.patch, HIVE-14035.08.patch, HIVE-14035.09.patch, 
> HIVE-14035.10.patch, HIVE-14035.patch
>
>
> In current Hive version, delta files created by ACID transactions do not 
> allow predicate pushdown if they contain any update/delete events. This is 
> done to preserve correctness when following a multi-version approach during 
> event collapsing, where an update event overwrites an existing insert event. 
> This JIRA proposes to split an update event into a combination of a delete 
> event followed by a new insert event, that can enable predicate push down to 
> all delta files without breaking correctness. To support backward 
> compatibility for this feature, this JIRA also proposes to add some sort of 
> versioning to ACID that can allow different versions of ACID transactions to 
> co-exist together.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

[jira] [Updated] (HIVE-14035) Enable predicate pushdown to delta files created by ACID Transactions

2016-07-20 Thread Saket Saurabh (JIRA)


 [ 
https://issues.apache.org/jira/browse/HIVE-14035?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Saket Saurabh updated HIVE-14035:
-
Attachment: HIVE-14035.10.patch

Updated the patch by rebasing with master. No additional code changes.

> Enable predicate pushdown to delta files created by ACID Transactions
> -
>
> Key: HIVE-14035
> URL: https://issues.apache.org/jira/browse/HIVE-14035
> Project: Hive
>  Issue Type: New Feature
>  Components: Transactions
>Reporter: Saket Saurabh
>Assignee: Saket Saurabh
> Attachments: HIVE-14035.02.patch, HIVE-14035.03.patch, 
> HIVE-14035.04.patch, HIVE-14035.05.patch, HIVE-14035.06.patch, 
> HIVE-14035.07.patch, HIVE-14035.08.patch, HIVE-14035.09.patch, 
> HIVE-14035.10.patch, HIVE-14035.patch
>
>
> In current Hive version, delta files created by ACID transactions do not 
> allow predicate pushdown if they contain any update/delete events. This is 
> done to preserve correctness when following a multi-version approach during 
> event collapsing, where an update event overwrites an existing insert event. 
> This JIRA proposes to split an update event into a combination of a delete 
> event followed by a new insert event, that can enable predicate push down to 
> all delta files without breaking correctness. To support backward 
> compatibility for this feature, this JIRA also proposes to add some sort of 
> versioning to ACID that can allow different versions of ACID transactions to 
> co-exist together.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

[jira] [Updated] (HIVE-14035) Enable predicate pushdown to delta files created by ACID Transactions

2016-07-20 Thread Saket Saurabh (JIRA)


 [ 
https://issues.apache.org/jira/browse/HIVE-14035?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Saket Saurabh updated HIVE-14035:
-
Status: Open  (was: Patch Available)

> Enable predicate pushdown to delta files created by ACID Transactions
> -
>
> Key: HIVE-14035
> URL: https://issues.apache.org/jira/browse/HIVE-14035
> Project: Hive
>  Issue Type: New Feature
>  Components: Transactions
>Reporter: Saket Saurabh
>Assignee: Saket Saurabh
> Attachments: HIVE-14035.02.patch, HIVE-14035.03.patch, 
> HIVE-14035.04.patch, HIVE-14035.05.patch, HIVE-14035.06.patch, 
> HIVE-14035.07.patch, HIVE-14035.08.patch, HIVE-14035.09.patch, 
> HIVE-14035.10.patch, HIVE-14035.patch
>
>
> In current Hive version, delta files created by ACID transactions do not 
> allow predicate pushdown if they contain any update/delete events. This is 
> done to preserve correctness when following a multi-version approach during 
> event collapsing, where an update event overwrites an existing insert event. 
> This JIRA proposes to split an update event into a combination of a delete 
> event followed by a new insert event, that can enable predicate push down to 
> all delta files without breaking correctness. To support backward 
> compatibility for this feature, this JIRA also proposes to add some sort of 
> versioning to ACID that can allow different versions of ACID transactions to 
> co-exist together.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

[jira] [Updated] (HIVE-14035) Enable predicate pushdown to delta files created by ACID Transactions

2016-07-20 Thread Saket Saurabh (JIRA)


 [ 
https://issues.apache.org/jira/browse/HIVE-14035?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Saket Saurabh updated HIVE-14035:
-
Status: Patch Available  (was: Open)

> Enable predicate pushdown to delta files created by ACID Transactions
> -
>
> Key: HIVE-14035
> URL: https://issues.apache.org/jira/browse/HIVE-14035
> Project: Hive
>  Issue Type: New Feature
>  Components: Transactions
>Reporter: Saket Saurabh
>Assignee: Saket Saurabh
> Attachments: HIVE-14035.02.patch, HIVE-14035.03.patch, 
> HIVE-14035.04.patch, HIVE-14035.05.patch, HIVE-14035.06.patch, 
> HIVE-14035.07.patch, HIVE-14035.08.patch, HIVE-14035.09.patch, 
> HIVE-14035.10.patch, HIVE-14035.patch
>
>
> In current Hive version, delta files created by ACID transactions do not 
> allow predicate pushdown if they contain any update/delete events. This is 
> done to preserve correctness when following a multi-version approach during 
> event collapsing, where an update event overwrites an existing insert event. 
> This JIRA proposes to split an update event into a combination of a delete 
> event followed by a new insert event, that can enable predicate push down to 
> all delta files without breaking correctness. To support backward 
> compatibility for this feature, this JIRA also proposes to add some sort of 
> versioning to ACID that can allow different versions of ACID transactions to 
> co-exist together.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

[jira] [Updated] (HIVE-14233) Improve vectorization for ACID by eliminating row-by-row stitching

2016-07-14 Thread Saket Saurabh (JIRA)


 [ 
https://issues.apache.org/jira/browse/HIVE-14233?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Saket Saurabh updated HIVE-14233:
-
Attachment: HIVE-14233.01.patch

First version of the patch with a single high-level integration test. There is 
much scope to optimize the way delete events are being used to mark the 
selected vector inside the vectorized row batch. These optimizations and 
further finer unit tests will be part of the subsequent patches.

> Improve vectorization for ACID by eliminating row-by-row stitching
> --
>
> Key: HIVE-14233
> URL: https://issues.apache.org/jira/browse/HIVE-14233
> Project: Hive
>  Issue Type: New Feature
>  Components: Transactions, Vectorization
>Reporter: Saket Saurabh
>Assignee: Saket Saurabh
> Attachments: HIVE-14233.01.patch
>
>
> This JIRA proposes to improve vectorization for ACID by eliminating 
> row-by-row stitching when reading back ACID files. In the current 
> implementation, a vectorized row batch is created by populating the batch one 
> row at a time, before the vectorized batch is passed up along the operator 
> pipeline. This row-by-row stitching limitation was because of the fact that 
> the ACID insert/update/delete events from various delta files needed to be 
> merged together before the actual version of a given row was found out. 
> HIVE-14035 has enabled us to break away from that limitation by splitting 
> ACID update events into a combination of delete+insert. In fact, it has now 
> enabled us to create splits on delta files.
> Building on top of HIVE-14035, this JIRA proposes to solve this earlier 
> bottleneck in the vectorized code path for ACID by now directly reading row 
> batches from the underlying ORC files and avoiding any stitching altogether. 
> Once a row batch is read from the split (which may be on a base/delta file), 
> the deleted rows will be found by cross-referencing them against a data 
> structure that will just keep track of deleted events (found in the 
> deleted_delta files). This will lead to a large performance gain when reading 
> ACID files in vectorized fashion, while enabling further optimizations in 
> future that can be done on top of that.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

[jira] [Commented] (HIVE-14199) Enable Bucket Pruning for ACID tables

2016-07-08 Thread Saket Saurabh (JIRA)


[ 
https://issues.apache.org/jira/browse/HIVE-14199?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15368883#comment-15368883
 ] 

Saket Saurabh commented on HIVE-14199:
--

Thanks [~gopalv] for the comment. I have updated the patch with these changes. 
Currently, to disable the codepath for legacy layouts, I do not consider the 
case of matching the bucketName against the 
AcidUtils.LEGACY_BUCKET_DIGIT_PATTERN. So, I am thinking these legacy layouts 
will be ignored then. 

> Enable Bucket Pruning for ACID tables
> -
>
> Key: HIVE-14199
> URL: https://issues.apache.org/jira/browse/HIVE-14199
> Project: Hive
>  Issue Type: Improvement
>  Components: Transactions
>Reporter: Saket Saurabh
>Assignee: Saket Saurabh
> Attachments: HIVE-14199.01.patch, HIVE-14199.02.patch
>
>
> Currently, ACID tables do not benefit from the bucket pruning feature 
> introduced in HIVE-11525. The reason for this has been the fact that bucket 
> pruning happens at split generation level and for ACID, traditionally the 
> delta files were never split. The parallelism for ACID was then restricted to 
> the number of buckets. There would be as many splits as the number of buckets 
> and each worker processing one split would inevitably read all the delta 
> files for that bucket, even when the query may have originally required only 
> one of the buckets to be read.
> However, HIVE-14035 now enables even the delta files to be also split. What 
> this means is that now we have enough information at the split generation 
> level to determine appropriate buckets to process for the delta files. This 
> can efficiently allow us to prune unnecessary buckets for delta files and 
> will lead to good performance gain for a large number of selective queries on 
> ACID tables.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

[jira] [Updated] (HIVE-14199) Enable Bucket Pruning for ACID tables

2016-07-08 Thread Saket Saurabh (JIRA)


 [ 
https://issues.apache.org/jira/browse/HIVE-14199?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Saket Saurabh updated HIVE-14199:
-
Attachment: HIVE-14199.02.patch

> Enable Bucket Pruning for ACID tables
> -
>
> Key: HIVE-14199
> URL: https://issues.apache.org/jira/browse/HIVE-14199
> Project: Hive
>  Issue Type: Improvement
>  Components: Transactions
>Reporter: Saket Saurabh
>Assignee: Saket Saurabh
> Attachments: HIVE-14199.01.patch, HIVE-14199.02.patch
>
>
> Currently, ACID tables do not benefit from the bucket pruning feature 
> introduced in HIVE-11525. The reason for this has been the fact that bucket 
> pruning happens at split generation level and for ACID, traditionally the 
> delta files were never split. The parallelism for ACID was then restricted to 
> the number of buckets. There would be as many splits as the number of buckets 
> and each worker processing one split would inevitably read all the delta 
> files for that bucket, even when the query may have originally required only 
> one of the buckets to be read.
> However, HIVE-14035 now enables even the delta files to be also split. What 
> this means is that now we have enough information at the split generation 
> level to determine appropriate buckets to process for the delta files. This 
> can efficiently allow us to prune unnecessary buckets for delta files and 
> will lead to good performance gain for a large number of selective queries on 
> ACID tables.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

[jira] [Updated] (HIVE-14199) Enable Bucket Pruning for ACID tables

2016-07-08 Thread Saket Saurabh (JIRA)


 [ 
https://issues.apache.org/jira/browse/HIVE-14199?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Saket Saurabh updated HIVE-14199:
-
Attachment: HIVE-14199.01.patch

Initial commit for this feature. Please note it is dependent on HIVE-14035.

> Enable Bucket Pruning for ACID tables
> -
>
> Key: HIVE-14199
> URL: https://issues.apache.org/jira/browse/HIVE-14199
> Project: Hive
>  Issue Type: Improvement
>  Components: Transactions
>Reporter: Saket Saurabh
>Assignee: Saket Saurabh
> Attachments: HIVE-14199.01.patch
>
>
> Currently, ACID tables do not benefit from the bucket pruning feature 
> introduced in HIVE-11525. The reason for this has been the fact that bucket 
> pruning happens at split generation level and for ACID, traditionally the 
> delta files were never split. The parallelism for ACID was then restricted to 
> the number of buckets. There would be as many splits as the number of buckets 
> and each worker processing one split would inevitably read all the delta 
> files for that bucket, even when the query may have originally required only 
> one of the buckets to be read.
> However, HIVE-14035 now enables even the delta files to be also split. What 
> this means is that now we have enough information at the split generation 
> level to determine appropriate buckets to process for the delta files. This 
> can efficiently allow us to prune unnecessary buckets for delta files and 
> will lead to good performance gain for a large number of selective queries on 
> ACID tables.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

[jira] [Updated] (HIVE-14035) Enable predicate pushdown to delta files created by ACID Transactions

2016-07-07 Thread Saket Saurabh (JIRA)


 [ 
https://issues.apache.org/jira/browse/HIVE-14035?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Saket Saurabh updated HIVE-14035:
-
Attachment: HIVE-14035.09.patch

Patch (#09) same as the previous patch (#08) with no code modifications, except 
that it is rebased with master and has trailing whitespaces fixed. Mirrors the 
patch uploaded at review board.

> Enable predicate pushdown to delta files created by ACID Transactions
> -
>
> Key: HIVE-14035
> URL: https://issues.apache.org/jira/browse/HIVE-14035
> Project: Hive
>  Issue Type: New Feature
>  Components: Transactions
>Reporter: Saket Saurabh
>Assignee: Saket Saurabh
> Attachments: HIVE-14035.02.patch, HIVE-14035.03.patch, 
> HIVE-14035.04.patch, HIVE-14035.05.patch, HIVE-14035.06.patch, 
> HIVE-14035.07.patch, HIVE-14035.08.patch, HIVE-14035.09.patch, 
> HIVE-14035.patch
>
>
> In current Hive version, delta files created by ACID transactions do not 
> allow predicate pushdown if they contain any update/delete events. This is 
> done to preserve correctness when following a multi-version approach during 
> event collapsing, where an update event overwrites an existing insert event. 
> This JIRA proposes to split an update event into a combination of a delete 
> event followed by a new insert event, that can enable predicate push down to 
> all delta files without breaking correctness. To support backward 
> compatibility for this feature, this JIRA also proposes to add some sort of 
> versioning to ACID that can allow different versions of ACID transactions to 
> co-exist together.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

[jira] [Updated] (HIVE-14035) Enable predicate pushdown to delta files created by ACID Transactions

2016-07-07 Thread Saket Saurabh (JIRA)


 [ 
https://issues.apache.org/jira/browse/HIVE-14035?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Saket Saurabh updated HIVE-14035:
-
Status: Patch Available  (was: Open)

> Enable predicate pushdown to delta files created by ACID Transactions
> -
>
> Key: HIVE-14035
> URL: https://issues.apache.org/jira/browse/HIVE-14035
> Project: Hive
>  Issue Type: New Feature
>  Components: Transactions
>Reporter: Saket Saurabh
>Assignee: Saket Saurabh
> Attachments: HIVE-14035.02.patch, HIVE-14035.03.patch, 
> HIVE-14035.04.patch, HIVE-14035.05.patch, HIVE-14035.06.patch, 
> HIVE-14035.07.patch, HIVE-14035.08.patch, HIVE-14035.09.patch, 
> HIVE-14035.patch
>
>
> In current Hive version, delta files created by ACID transactions do not 
> allow predicate pushdown if they contain any update/delete events. This is 
> done to preserve correctness when following a multi-version approach during 
> event collapsing, where an update event overwrites an existing insert event. 
> This JIRA proposes to split an update event into a combination of a delete 
> event followed by a new insert event, that can enable predicate push down to 
> all delta files without breaking correctness. To support backward 
> compatibility for this feature, this JIRA also proposes to add some sort of 
> versioning to ACID that can allow different versions of ACID transactions to 
> co-exist together.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

[jira] [Updated] (HIVE-14035) Enable predicate pushdown to delta files created by ACID Transactions

2016-07-07 Thread Saket Saurabh (JIRA)


 [ 
https://issues.apache.org/jira/browse/HIVE-14035?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Saket Saurabh updated HIVE-14035:
-
Status: Open  (was: Patch Available)

> Enable predicate pushdown to delta files created by ACID Transactions
> -
>
> Key: HIVE-14035
> URL: https://issues.apache.org/jira/browse/HIVE-14035
> Project: Hive
>  Issue Type: New Feature
>  Components: Transactions
>Reporter: Saket Saurabh
>Assignee: Saket Saurabh
> Attachments: HIVE-14035.02.patch, HIVE-14035.03.patch, 
> HIVE-14035.04.patch, HIVE-14035.05.patch, HIVE-14035.06.patch, 
> HIVE-14035.07.patch, HIVE-14035.08.patch, HIVE-14035.patch
>
>
> In current Hive version, delta files created by ACID transactions do not 
> allow predicate pushdown if they contain any update/delete events. This is 
> done to preserve correctness when following a multi-version approach during 
> event collapsing, where an update event overwrites an existing insert event. 
> This JIRA proposes to split an update event into a combination of a delete 
> event followed by a new insert event, that can enable predicate push down to 
> all delta files without breaking correctness. To support backward 
> compatibility for this feature, this JIRA also proposes to add some sort of 
> versioning to ACID that can allow different versions of ACID transactions to 
> co-exist together.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

[jira] [Commented] (HIVE-14035) Enable predicate pushdown to delta files created by ACID Transactions

2016-07-07 Thread Saket Saurabh (JIRA)


[ 
https://issues.apache.org/jira/browse/HIVE-14035?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15366408#comment-15366408
 ] 

Saket Saurabh commented on HIVE-14035:
--

Link for review board for this patch: https://reviews.apache.org/r/49766/
Thanks

> Enable predicate pushdown to delta files created by ACID Transactions
> -
>
> Key: HIVE-14035
> URL: https://issues.apache.org/jira/browse/HIVE-14035
> Project: Hive
>  Issue Type: New Feature
>  Components: Transactions
>Reporter: Saket Saurabh
>Assignee: Saket Saurabh
> Attachments: HIVE-14035.02.patch, HIVE-14035.03.patch, 
> HIVE-14035.04.patch, HIVE-14035.05.patch, HIVE-14035.06.patch, 
> HIVE-14035.07.patch, HIVE-14035.08.patch, HIVE-14035.patch
>
>
> In current Hive version, delta files created by ACID transactions do not 
> allow predicate pushdown if they contain any update/delete events. This is 
> done to preserve correctness when following a multi-version approach during 
> event collapsing, where an update event overwrites an existing insert event. 
> This JIRA proposes to split an update event into a combination of a delete 
> event followed by a new insert event, that can enable predicate push down to 
> all delta files without breaking correctness. To support backward 
> compatibility for this feature, this JIRA also proposes to add some sort of 
> versioning to ACID that can allow different versions of ACID transactions to 
> co-exist together.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

[jira] [Updated] (HIVE-14035) Enable predicate pushdown to delta files created by ACID Transactions

2016-07-05 Thread Saket Saurabh (JIRA)


 [ 
https://issues.apache.org/jira/browse/HIVE-14035?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Saket Saurabh updated HIVE-14035:
-
Status: Patch Available  (was: Open)

> Enable predicate pushdown to delta files created by ACID Transactions
> -
>
> Key: HIVE-14035
> URL: https://issues.apache.org/jira/browse/HIVE-14035
> Project: Hive
>  Issue Type: New Feature
>  Components: Transactions
>Reporter: Saket Saurabh
>Assignee: Saket Saurabh
> Attachments: HIVE-14035.02.patch, HIVE-14035.03.patch, 
> HIVE-14035.04.patch, HIVE-14035.05.patch, HIVE-14035.06.patch, 
> HIVE-14035.07.patch, HIVE-14035.08.patch, HIVE-14035.patch
>
>
> In current Hive version, delta files created by ACID transactions do not 
> allow predicate pushdown if they contain any update/delete events. This is 
> done to preserve correctness when following a multi-version approach during 
> event collapsing, where an update event overwrites an existing insert event. 
> This JIRA proposes to split an update event into a combination of a delete 
> event followed by a new insert event, that can enable predicate push down to 
> all delta files without breaking correctness. To support backward 
> compatibility for this feature, this JIRA also proposes to add some sort of 
> versioning to ACID that can allow different versions of ACID transactions to 
> co-exist together.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

[jira] [Updated] (HIVE-14035) Enable predicate pushdown to delta files created by ACID Transactions

2016-07-05 Thread Saket Saurabh (JIRA)


 [ 
https://issues.apache.org/jira/browse/HIVE-14035?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Saket Saurabh updated HIVE-14035:
-
Attachment: HIVE-14035.08.patch

Allow transactional_properties to be set when converting non-acid tables to 
acid tables and add more unit test cases

> Enable predicate pushdown to delta files created by ACID Transactions
> -
>
> Key: HIVE-14035
> URL: https://issues.apache.org/jira/browse/HIVE-14035
> Project: Hive
>  Issue Type: New Feature
>  Components: Transactions
>Reporter: Saket Saurabh
>Assignee: Saket Saurabh
> Attachments: HIVE-14035.02.patch, HIVE-14035.03.patch, 
> HIVE-14035.04.patch, HIVE-14035.05.patch, HIVE-14035.06.patch, 
> HIVE-14035.07.patch, HIVE-14035.08.patch, HIVE-14035.patch
>
>
> In current Hive version, delta files created by ACID transactions do not 
> allow predicate pushdown if they contain any update/delete events. This is 
> done to preserve correctness when following a multi-version approach during 
> event collapsing, where an update event overwrites an existing insert event. 
> This JIRA proposes to split an update event into a combination of a delete 
> event followed by a new insert event, that can enable predicate push down to 
> all delta files without breaking correctness. To support backward 
> compatibility for this feature, this JIRA also proposes to add some sort of 
> versioning to ACID that can allow different versions of ACID transactions to 
> co-exist together.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

[jira] [Updated] (HIVE-14035) Enable predicate pushdown to delta files created by ACID Transactions

2016-07-05 Thread Saket Saurabh (JIRA)


 [ 
https://issues.apache.org/jira/browse/HIVE-14035?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Saket Saurabh updated HIVE-14035:
-
Status: Open  (was: Patch Available)

> Enable predicate pushdown to delta files created by ACID Transactions
> -
>
> Key: HIVE-14035
> URL: https://issues.apache.org/jira/browse/HIVE-14035
> Project: Hive
>  Issue Type: New Feature
>  Components: Transactions
>Reporter: Saket Saurabh
>Assignee: Saket Saurabh
> Attachments: HIVE-14035.02.patch, HIVE-14035.03.patch, 
> HIVE-14035.04.patch, HIVE-14035.05.patch, HIVE-14035.06.patch, 
> HIVE-14035.07.patch, HIVE-14035.patch
>
>
> In current Hive version, delta files created by ACID transactions do not 
> allow predicate pushdown if they contain any update/delete events. This is 
> done to preserve correctness when following a multi-version approach during 
> event collapsing, where an update event overwrites an existing insert event. 
> This JIRA proposes to split an update event into a combination of a delete 
> event followed by a new insert event, that can enable predicate push down to 
> all delta files without breaking correctness. To support backward 
> compatibility for this feature, this JIRA also proposes to add some sort of 
> versioning to ACID that can allow different versions of ACID transactions to 
> co-exist together.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

1 2 >

1 - 100 of 123 matches

Mail list logo