[jira] [Commented] (HIVE-14523) ACID performance improvement patches
[ https://issues.apache.org/jira/browse/HIVE-14523?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15454341#comment-15454341 ]

Saket Saurabh commented on HIVE-14523:
--------------------------------------

Thank you so much [~cartershanklin]. It has been a great experience for me too.

> ACID performance improvement patches
> ------------------------------------
>
> Key: HIVE-14523
> URL: https://issues.apache.org/jira/browse/HIVE-14523
> Project: Hive
> Issue Type: Test
> Affects Versions: 2.2.0
> Reporter: Saket Saurabh
> Assignee: Saket Saurabh
> Priority: Trivial
> Fix For: 2.2.0
>
> Attachments: HIVE-14523.01.patch, HIVE-14523.02.patch
>
> This is a trivial non-functional JIRA that combines the features introduced
> in HIVE-14035, HIVE-14199 and HIVE-14233 into a single patch.

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
[jira] [Updated] (HIVE-14523) ACID performance improvement patches
[ https://issues.apache.org/jira/browse/HIVE-14523?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Saket Saurabh updated HIVE-14523:
---------------------------------
    Fix Version/s: 2.2.0
[jira] [Updated] (HIVE-14523) ACID performance improvement patches
[ https://issues.apache.org/jira/browse/HIVE-14523?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Saket Saurabh updated HIVE-14523:
---------------------------------
    Resolution: Fixed
        Status: Resolved  (was: Patch Available)

All the related patches have been committed to master.
[jira] [Commented] (HIVE-14233) Improve vectorization for ACID by eliminating row-by-row stitching
[ https://issues.apache.org/jira/browse/HIVE-14233?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15451116#comment-15451116 ]

Saket Saurabh commented on HIVE-14233:
--------------------------------------

Thanks [~ekoifman] for committing the patch.

> Improve vectorization for ACID by eliminating row-by-row stitching
> ------------------------------------------------------------------
>
> Key: HIVE-14233
> URL: https://issues.apache.org/jira/browse/HIVE-14233
> Project: Hive
> Issue Type: New Feature
> Components: Transactions, Vectorization
> Reporter: Saket Saurabh
> Assignee: Saket Saurabh
> Fix For: 2.2.0
>
> Attachments: HIVE-14233.01.patch, HIVE-14233.02.patch,
> HIVE-14233.03.patch, HIVE-14233.04.patch, HIVE-14233.05.patch,
> HIVE-14233.06.patch, HIVE-14233.07.patch, HIVE-14233.08.patch,
> HIVE-14233.09.patch, HIVE-14233.10.patch, HIVE-14233.11.patch,
> HIVE-14233.12.patch
>
> This JIRA proposes to improve vectorization for ACID by eliminating
> row-by-row stitching when reading back ACID files. In the current
> implementation, a vectorized row batch is created by populating the batch
> one row at a time, before the batch is passed up the operator pipeline.
> Row-by-row stitching was needed because the ACID insert/update/delete
> events from the various delta files had to be merged before the current
> version of a given row could be determined. HIVE-14035 removed that
> limitation by splitting ACID update events into a combination of
> delete+insert, which also makes it possible to create splits on delta files.
>
> Building on HIVE-14035, this JIRA proposes to remove that bottleneck in the
> vectorized code path for ACID by reading row batches directly from the
> underlying ORC files and avoiding stitching altogether. Once a row batch is
> read from the split (which may be on a base/delta file), the deleted rows
> are found by cross-referencing the batch against a data structure that
> tracks only delete events (found in the deleted_delta files). This should
> yield a large performance gain when reading ACID files in vectorized
> fashion, while enabling further optimizations on top of it.
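
The read path described above can be illustrated with a small, self-contained sketch. It is not the code from the HIVE-14233 patches: the class, field, and method names below are hypothetical, and it assumes that a row is identified by the triple (original transaction, bucket, row id). The idea shown is only the one the description states: read a whole batch, then drop deleted rows by consulting a structure that holds delete events, without rebuilding the batch row by row.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical sketch: names are illustrative and are not taken from the Hive code base.
final class DeleteFilterSketch {

    // Assumed row identity: (original transaction, bucket, row id).
    static List<Long> key(long originalTxn, int bucket, long rowId) {
        return Arrays.asList(originalTxn, (long) bucket, rowId);
    }

    // Toy columnar batch: one array per column plus a selection vector.
    static final class Batch {
        final long[] originalTxn;
        final int[] bucket;
        final long[] rowId;
        final String[] payload;
        final int[] selected; // positions of rows that are still visible
        int size;             // number of valid entries in `selected`

        Batch(long[] originalTxn, int[] bucket, long[] rowId, String[] payload) {
            this.originalTxn = originalTxn;
            this.bucket = bucket;
            this.rowId = rowId;
            this.payload = payload;
            this.size = rowId.length;
            this.selected = new int[size];
            for (int i = 0; i < size; i++) {
                selected[i] = i; // initially every row is selected
            }
        }
    }

    // Drops deleted rows by compacting the selection vector in place.
    // Column data is never copied or reassembled row by row.
    static void applyDeletes(Batch batch, Set<List<Long>> deleted) {
        int kept = 0;
        for (int i = 0; i < batch.size; i++) {
            int row = batch.selected[i];
            List<Long> k = key(batch.originalTxn[row], batch.bucket[row], batch.rowId[row]);
            if (!deleted.contains(k)) {
                batch.selected[kept++] = row;
            }
        }
        batch.size = kept;
    }

    public static void main(String[] args) {
        Batch batch = new Batch(
            new long[] {5, 5, 5, 7},
            new int[] {0, 0, 0, 0},
            new long[] {0, 1, 2, 0},
            new String[] {"a", "b", "c", "d"});

        // Delete events, as they might be collected from the deleted_delta files.
        Set<List<Long>> deleted = new HashSet<>();
        deleted.add(key(5, 0, 1));
        deleted.add(key(7, 0, 0));

        applyDeletes(batch, deleted);
        for (int i = 0; i < batch.size; i++) {
            System.out.println(batch.payload[batch.selected[i]]);
        }
    }
}
```

In this sketch the hash set stands in for the "data structure that tracks only delete events"; the description does not say which structure the patch uses, so a sorted array or any other lookup structure would fit the same outline.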
[jira] [Commented] (HIVE-14233) Improve vectorization for ACID by eliminating row-by-row stitching
[ https://issues.apache.org/jira/browse/HIVE-14233?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15451113#comment-15451113 ]

Saket Saurabh commented on HIVE-14233:
--------------------------------------

Thanks [~ekoifman] and [~sershe] for reviewing the patch.
[jira] [Updated] (HIVE-14233) Improve vectorization for ACID by eliminating row-by-row stitching
[ https://issues.apache.org/jira/browse/HIVE-14233?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Saket Saurabh updated HIVE-14233:
---------------------------------
    Status: Patch Available  (was: Open)
[jira] [Updated] (HIVE-14233) Improve vectorization for ACID by eliminating row-by-row stitching
[ https://issues.apache.org/jira/browse/HIVE-14233?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Saket Saurabh updated HIVE-14233:
---------------------------------
    Attachment: HIVE-14233.12.patch

Patch #12 - addressed comments at RB, minor code refactoring.
[jira] [Updated] (HIVE-14233) Improve vectorization for ACID by eliminating row-by-row stitching
[ https://issues.apache.org/jira/browse/HIVE-14233?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Saket Saurabh updated HIVE-14233:
---------------------------------
    Status: Open  (was: Patch Available)
[jira] [Updated] (HIVE-14233) Improve vectorization for ACID by eliminating row-by-row stitching
[ https://issues.apache.org/jira/browse/HIVE-14233?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Saket Saurabh updated HIVE-14233:
---------------------------------
    Attachment: HIVE-14233.11.patch

Addressed comments at RB, refactored some initialization code to OrcInputFormat.
[jira] [Updated] (HIVE-14233) Improve vectorization for ACID by eliminating row-by-row stitching
[ https://issues.apache.org/jira/browse/HIVE-14233?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Saket Saurabh updated HIVE-14233:
---------------------------------
    Status: Patch Available  (was: Open)
[jira] [Updated] (HIVE-14233) Improve vectorization for ACID by eliminating row-by-row stitching
[ https://issues.apache.org/jira/browse/HIVE-14233?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Saket Saurabh updated HIVE-14233:
---------------------------------
    Status: Open  (was: Patch Available)
[jira] [Commented] (HIVE-14233) Improve vectorization for ACID by eliminating row-by-row stitching
[ https://issues.apache.org/jira/browse/HIVE-14233?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15446668#comment-15446668 ]

Saket Saurabh commented on HIVE-14233:
--------------------------------------

Oops, forgot to do that. Sure Eugene, done that now.
[jira] [Updated] (HIVE-14233) Improve vectorization for ACID by eliminating row-by-row stitching
[ https://issues.apache.org/jira/browse/HIVE-14233?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Saket Saurabh updated HIVE-14233:
---------------------------------
    Status: Open  (was: Patch Available)
[jira] [Updated] (HIVE-14233) Improve vectorization for ACID by eliminating row-by-row stitching
[ https://issues.apache.org/jira/browse/HIVE-14233?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Saket Saurabh updated HIVE-14233:
---------------------------------
    Status: Patch Available  (was: Open)
[jira] [Updated] (HIVE-14233) Improve vectorization for ACID by eliminating row-by-row stitching
[ https://issues.apache.org/jira/browse/HIVE-14233?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Saket Saurabh updated HIVE-14233:
---------------------------------
    Status: Patch Available  (was: Open)
[jira] [Updated] (HIVE-14233) Improve vectorization for ACID by eliminating row-by-row stitching
[ https://issues.apache.org/jira/browse/HIVE-14233?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Saket Saurabh updated HIVE-14233:
---------------------------------
    Attachment: HIVE-14233.10.patch

Addressed comments at RB and added a TestTxnCommands2 subclass that runs e2e ACID tests with split-update and vectorization enabled.
[jira] [Updated] (HIVE-14233) Improve vectorization for ACID by eliminating row-by-row stitching
[ https://issues.apache.org/jira/browse/HIVE-14233?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Saket Saurabh updated HIVE-14233:
    Status: Open (was: Patch Available)
[jira] [Commented] (HIVE-14233) Improve vectorization for ACID by eliminating row-by-row stitching
[ https://issues.apache.org/jira/browse/HIVE-14233?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15436236#comment-15436236 ]

Saket Saurabh commented on HIVE-14233:

Thanks [~ekoifman] for the comments, working now on fixing them.
[jira] [Commented] (HIVE-14199) Enable Bucket Pruning for ACID tables
[ https://issues.apache.org/jira/browse/HIVE-14199?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15431994#comment-15431994 ]

Saket Saurabh commented on HIVE-14199:

Thanks [~ekoifman] for reviewing it.

> Enable Bucket Pruning for ACID tables
>
> Key: HIVE-14199
> URL: https://issues.apache.org/jira/browse/HIVE-14199
> Project: Hive
> Issue Type: Improvement
> Components: Transactions
> Reporter: Saket Saurabh
> Assignee: Saket Saurabh
> Fix For: 2.2.0
> Attachments: HIVE-14199.01.patch, HIVE-14199.02.patch, HIVE-14199.03.patch
>
> Currently, ACID tables do not benefit from the bucket pruning feature introduced in HIVE-11525. The reason is that bucket pruning happens at the split generation level and, for ACID, the delta files were traditionally never split. The parallelism for ACID was therefore restricted to the number of buckets: there would be as many splits as buckets, and each worker processing one split would inevitably read all the delta files for that bucket, even when the query required only one of the buckets to be read.
>
> However, HIVE-14035 now allows the delta files to be split as well. This means there is enough information at the split generation level to determine which buckets to process for the delta files. Unnecessary buckets can then be pruned for delta files, which should give a good performance gain for a large number of selective queries on ACID tables.
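As a rough illustration of pruning at split generation, the sketch below filters base and delta splits by bucket before any file is read. Everything in it is assumed for the example: the Split class, the modulo bucketing function (Hive's actual hashing is not described in this thread), and the directory and file names.

```python
from dataclasses import dataclass
from typing import Iterable, List, Set


@dataclass(frozen=True)
class Split:
    path: str    # illustrative file path
    bucket: int  # bucket id the file belongs to


def bucket_for(key: int, num_buckets: int) -> int:
    """Assumed bucketing function: plain modulo on a non-negative integer key."""
    return key % num_buckets


def buckets_for_keys(keys: Iterable[int], num_buckets: int) -> Set[int]:
    """Buckets that can contain rows matching `key IN (keys)`."""
    return {bucket_for(k, num_buckets) for k in keys}


def prune_splits(splits: List[Split], wanted: Set[int]) -> List[Split]:
    """Keep only the splits (base or delta) whose bucket can match the predicate."""
    return [s for s in splits if s.bucket in wanted]


NUM_BUCKETS = 4
# One base directory and one delta directory, each with one file per bucket.
splits = [
    Split(f"{directory}/bucket_{b:05d}", b)
    for directory in ("base", "delta_1")
    for b in range(NUM_BUCKETS)
]

wanted = buckets_for_keys([5, 9], NUM_BUCKETS)  # both keys map to bucket 1
for s in prune_splits(splits, wanted):
    print(s.path)
```

With four buckets and a predicate on keys 5 and 9, only the bucket 1 files from the base and the delta directory survive, so six of the eight files are never opened. The point of the example is that the delta file is pruned in the same pass as the base file, which is what becomes possible once delta files produce their own splits.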
[jira] [Updated] (HIVE-14233) Improve vectorization for ACID by eliminating row-by-row stitching
[ https://issues.apache.org/jira/browse/HIVE-14233?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Saket Saurabh updated HIVE-14233:
    Status: Patch Available (was: Open)
[jira] [Updated] (HIVE-14233) Improve vectorization for ACID by eliminating row-by-row stitching
[ https://issues.apache.org/jira/browse/HIVE-14233?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Saket Saurabh updated HIVE-14233:
    Status: Open (was: Patch Available)
[jira] [Comment Edited] (HIVE-14233) Improve vectorization for ACID by eliminating row-by-row stitching
[ https://issues.apache.org/jira/browse/HIVE-14233?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15430788#comment-15430788 ]

Saket Saurabh edited comment on HIVE-14233 at 8/22/16 1:51 PM:

Addressed comments at RB and added more documentation & unit tests.

was (Author: saketj):
Addressed comments at RB and add more documentation & unit tests.
[jira] [Comment Edited] (HIVE-14233) Improve vectorization for ACID by eliminating row-by-row stitching
[ https://issues.apache.org/jira/browse/HIVE-14233?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15430788#comment-15430788 ]

Saket Saurabh edited comment on HIVE-14233 at 8/22/16 1:51 PM:

Addressed comments at RB and add more documentation & unit tests.

was (Author: saketj):
Address comments at RB and add more documentation & unit tests.
[jira] [Updated] (HIVE-14233) Improve vectorization for ACID by eliminating row-by-row stitching
[ https://issues.apache.org/jira/browse/HIVE-14233?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Saket Saurabh updated HIVE-14233:
    Status: Patch Available (was: Open)
[jira] [Updated] (HIVE-14233) Improve vectorization for ACID by eliminating row-by-row stitching
[ https://issues.apache.org/jira/browse/HIVE-14233?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Saket Saurabh updated HIVE-14233:
    Attachment: HIVE-14233.09.patch

Address comments at RB and add more documentation & unit tests.
[jira] [Commented] (HIVE-14035) Enable predicate pushdown to delta files created by ACID Transactions
[ https://issues.apache.org/jira/browse/HIVE-14035?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15420007#comment-15420007 ]

Saket Saurabh commented on HIVE-14035:

Thanks [~leftylev]. Sure, I will add it as a wiki page too.

> Enable predicate pushdown to delta files created by ACID Transactions
>
> Key: HIVE-14035
> URL: https://issues.apache.org/jira/browse/HIVE-14035
> Project: Hive
> Issue Type: New Feature
> Components: Transactions
> Reporter: Saket Saurabh
> Assignee: Saket Saurabh
> Labels: TODOC2.2
> Fix For: 2.2.0
> Attachments: Design.Document.Improving ACID performance in Hive.01.docx, Design.Document.Improving ACID performance in Hive.02.docx, HIVE-14035.02.patch, HIVE-14035.03.patch, HIVE-14035.04.patch, HIVE-14035.05.patch, HIVE-14035.06.patch, HIVE-14035.07.patch, HIVE-14035.08.patch, HIVE-14035.09.patch, HIVE-14035.10.patch, HIVE-14035.11.patch, HIVE-14035.12.patch, HIVE-14035.13.patch, HIVE-14035.14.patch, HIVE-14035.15.patch, HIVE-14035.16.patch, HIVE-14035.17.patch, HIVE-14035.patch
>
> In the current Hive version, delta files created by ACID transactions do not allow predicate pushdown if they contain any update/delete events. This is done to preserve correctness when following a multi-version approach during event collapsing, where an update event overwrites an existing insert event. This JIRA proposes to split an update event into a combination of a delete event followed by a new insert event, which enables predicate pushdown to all delta files without breaking correctness. To support backward compatibility for this feature, this JIRA also proposes to add some sort of versioning to ACID that allows different versions of ACID transactions to co-exist.
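A toy model may help show why splitting an update into delete+insert makes predicate pushdown safe. The event tuples, the function names split_update and read_with_pushdown, and the assumption that the re-inserted row receives a new row id are all illustrative and are not taken from the patch.

```python
from typing import Callable, List, Optional, Tuple

# Hypothetical event shape: (kind, row_id, value); value is None for deletes.
Event = Tuple[str, int, Optional[int]]


def split_update(old_row_id: int, new_row_id: int, new_value: int) -> List[Event]:
    """Represent an update as a delete of the old row plus an insert of a new row."""
    return [("delete", old_row_id, None), ("insert", new_row_id, new_value)]


def read_with_pushdown(
    events: List[Event], predicate: Callable[[int], bool]
) -> List[Tuple[int, int]]:
    """Filter insert events by the predicate, then drop rows that were deleted.

    Delete events carry no row data, so the predicate never filters them out.
    """
    deleted = {row_id for kind, row_id, _ in events if kind == "delete"}
    return [
        (row_id, value)
        for kind, row_id, value in events
        if kind == "insert"
        and value is not None
        and predicate(value)
        and row_id not in deleted
    ]


events: List[Event] = [("insert", 0, 10)]  # the original row
events += split_update(old_row_id=0, new_row_id=1, new_value=20)

print(read_with_pushdown(events, lambda v: v == 10))  # []
print(read_with_pushdown(events, lambda v: v == 20))  # [(1, 20)]
```

In this model the predicate is applied to each insert event independently, and delete events are always honored. The stale value 10 therefore cannot resurface under a filter that rejects the updated row. That is the failure mode the issue text attributes to pushing predicates into deltas where an update event overwrites an earlier insert.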
[jira] [Commented] (HIVE-14233) Improve vectorization for ACID by eliminating row-by-row stitching
[ https://issues.apache.org/jira/browse/HIVE-14233?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15419809#comment-15419809 ]

Saket Saurabh commented on HIVE-14233:

Note that this patch for the improved vectorization process does not handle the case where the split is on an original file (a non-ACID schema file). In such cases, it falls back to the older strategy of creating vectorized row batches using row-by-row stitching. However, this slower path applies only to tables converted from non-ACID to ACID, and only until the first major compaction on the table produces a base file.
[jira] [Updated] (HIVE-14035) Enable predicate pushdown to delta files created by ACID Transactions
[ https://issues.apache.org/jira/browse/HIVE-14035?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Saket Saurabh updated HIVE-14035:
    Attachment: Design.Document.Improving ACID performance in Hive.02.docx

Updated version of the design document with a few minor corrections and typo fixes.
[jira] [Updated] (HIVE-14233) Improve vectorization for ACID by eliminating row-by-row stitching
[ https://issues.apache.org/jira/browse/HIVE-14233?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Saket Saurabh updated HIVE-14233:
    Attachment: HIVE-14233.08.patch

Rebased on master after HIVE-14035 was committed. Also addresses some comments at RB.
[jira] [Updated] (HIVE-14199) Enable Bucket Pruning for ACID tables
[ https://issues.apache.org/jira/browse/HIVE-14199?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Saket Saurabh updated HIVE-14199:
    Status: Patch Available (was: Open)
[jira] [Updated] (HIVE-14199) Enable Bucket Pruning for ACID tables
[ https://issues.apache.org/jira/browse/HIVE-14199?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Saket Saurabh updated HIVE-14199:
    Attachment: HIVE-14199.03.patch

Rebased with master after HIVE-14035 got committed. Submitting for Ptest.
[jira] [Updated] (HIVE-14035) Enable predicate pushdown to delta files created by ACID Transactions
[ https://issues.apache.org/jira/browse/HIVE-14035?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Saket Saurabh updated HIVE-14035: - Attachment: Design.Document.Improving ACID performance in Hive.01.docx Initial version of the design document, for reference. It describes the high-level changes to ACID introduced by HIVE-14035, HIVE-14199 and HIVE-14233. > Enable predicate pushdown to delta files created by ACID Transactions > - > > Key: HIVE-14035 > URL: https://issues.apache.org/jira/browse/HIVE-14035 > Project: Hive > Issue Type: New Feature > Components: Transactions >Reporter: Saket Saurabh >Assignee: Saket Saurabh > Fix For: 2.2.0 > > Attachments: Design.Document.Improving ACID performance in > Hive.01.docx, HIVE-14035.02.patch, HIVE-14035.03.patch, HIVE-14035.04.patch, > HIVE-14035.05.patch, HIVE-14035.06.patch, HIVE-14035.07.patch, > HIVE-14035.08.patch, HIVE-14035.09.patch, HIVE-14035.10.patch, > HIVE-14035.11.patch, HIVE-14035.12.patch, HIVE-14035.13.patch, > HIVE-14035.14.patch, HIVE-14035.15.patch, HIVE-14035.16.patch, > HIVE-14035.17.patch, HIVE-14035.patch > > > In the current Hive version, delta files created by ACID transactions do not > allow predicate pushdown if they contain any update/delete events. This is > done to preserve correctness when following a multi-version approach during > event collapsing, where an update event overwrites an existing insert event. > This JIRA proposes to split an update event into a combination of a delete > event followed by a new insert event, which enables predicate pushdown to > all delta files without breaking correctness. To support backward > compatibility for this feature, this JIRA also proposes to add some sort of > versioning to ACID that allows different versions of ACID transactions to > coexist. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
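To make the correctness argument in HIVE-14035 concrete, here is a small sketch under simplifying assumptions. Events are modelled as `(kind, row_id, value)` tuples and row ids are plain integers; both are hypothetical simplifications and not the actual ACID record layout. The first function shows why filtering events before collapsing is unsafe when update events exist; the other two show the proposed delete + insert split, where the predicate is applied only to insert events and delete events are always honoured.

```python
def collapse_with_pushdown_old(events, predicate):
    # Old layout: an update overwrites the earlier insert during collapsing.
    # Filtering events *before* collapsing can drop the update and wrongly
    # resurrect the stale inserted value.
    survivors = [e for e in events if predicate(e[2])]
    latest = {}
    for _kind, row_id, value in survivors:
        latest[row_id] = value
    return latest


def split_updates(events, next_row_id):
    # Proposed layout: rewrite each update as a delete of the old row
    # followed by an insert of a new row. next_row_id is a hypothetical
    # source of fresh row ids.
    out = []
    for kind, row_id, value in events:
        if kind == "update":
            out.append(("delete", row_id, None))
            out.append(("insert", next_row_id, value))
            next_row_id += 1
        else:
            out.append((kind, row_id, value))
    return out


def read_with_pushdown_new(events, predicate):
    # Delete events are never filtered; the predicate only prunes inserts.
    deleted = {rid for kind, rid, _ in events if kind == "delete"}
    return {
        rid: value
        for kind, rid, value in events
        if kind == "insert" and predicate(value) and rid not in deleted
    }


old_events = [("insert", 1, 10), ("update", 1, 20)]
wants_ten = lambda v: v == 10

stale = collapse_with_pushdown_old(old_events, wants_ten)   # row 1 wrongly returned
new_events = split_updates(old_events, next_row_id=2)
correct = read_with_pushdown_new(new_events, wants_ten)     # no rows match
```

In the old layout the query for `value = 10` returns row 1 even though its current value is 20. After the split, the delete event removes row 1 regardless of the predicate, so the result is empty as expected.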
[jira] [Commented] (HIVE-14035) Enable predicate pushdown to delta files created by ACID Transactions
[ https://issues.apache.org/jira/browse/HIVE-14035?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15419309#comment-15419309 ] Saket Saurabh commented on HIVE-14035: -- Thanks [~ekoifman] and [~sershe] for the review. > Enable predicate pushdown to delta files created by ACID Transactions > - > > Key: HIVE-14035 > URL: https://issues.apache.org/jira/browse/HIVE-14035 > Project: Hive > Issue Type: New Feature > Components: Transactions >Reporter: Saket Saurabh >Assignee: Saket Saurabh > Fix For: 2.2.0 > > Attachments: HIVE-14035.02.patch, HIVE-14035.03.patch, > HIVE-14035.04.patch, HIVE-14035.05.patch, HIVE-14035.06.patch, > HIVE-14035.07.patch, HIVE-14035.08.patch, HIVE-14035.09.patch, > HIVE-14035.10.patch, HIVE-14035.11.patch, HIVE-14035.12.patch, > HIVE-14035.13.patch, HIVE-14035.14.patch, HIVE-14035.15.patch, > HIVE-14035.16.patch, HIVE-14035.17.patch, HIVE-14035.patch > > > In the current Hive version, delta files created by ACID transactions do not > allow predicate pushdown if they contain any update/delete events. This is > done to preserve correctness when following a multi-version approach during > event collapsing, where an update event overwrites an existing insert event. > This JIRA proposes to split an update event into a combination of a delete > event followed by a new insert event, which enables predicate pushdown to > all delta files without breaking correctness. To support backward > compatibility for this feature, this JIRA also proposes to add some sort of > versioning to ACID that allows different versions of ACID transactions to > coexist. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HIVE-14035) Enable predicate pushdown to delta files created by ACID Transactions
[ https://issues.apache.org/jira/browse/HIVE-14035?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Saket Saurabh updated HIVE-14035: - Attachment: HIVE-14035.17.patch Same as patch #17; reposted because the previous Jenkins PreCommit job got killed. > Enable predicate pushdown to delta files created by ACID Transactions > - > > Key: HIVE-14035 > URL: https://issues.apache.org/jira/browse/HIVE-14035 > Project: Hive > Issue Type: New Feature > Components: Transactions >Reporter: Saket Saurabh >Assignee: Saket Saurabh > Attachments: HIVE-14035.02.patch, HIVE-14035.03.patch, > HIVE-14035.04.patch, HIVE-14035.05.patch, HIVE-14035.06.patch, > HIVE-14035.07.patch, HIVE-14035.08.patch, HIVE-14035.09.patch, > HIVE-14035.10.patch, HIVE-14035.11.patch, HIVE-14035.12.patch, > HIVE-14035.13.patch, HIVE-14035.14.patch, HIVE-14035.15.patch, > HIVE-14035.16.patch, HIVE-14035.17.patch, HIVE-14035.patch > > > In the current Hive version, delta files created by ACID transactions do not > allow predicate pushdown if they contain any update/delete events. This is > done to preserve correctness when following a multi-version approach during > event collapsing, where an update event overwrites an existing insert event. > This JIRA proposes to split an update event into a combination of a delete > event followed by a new insert event, which enables predicate pushdown to > all delta files without breaking correctness. To support backward > compatibility for this feature, this JIRA also proposes to add some sort of > versioning to ACID that allows different versions of ACID transactions to > coexist. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HIVE-14035) Enable predicate pushdown to delta files created by ACID Transactions
[ https://issues.apache.org/jira/browse/HIVE-14035?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Saket Saurabh updated HIVE-14035: - Status: Open (was: Patch Available) > Enable predicate pushdown to delta files created by ACID Transactions > - > > Key: HIVE-14035 > URL: https://issues.apache.org/jira/browse/HIVE-14035 > Project: Hive > Issue Type: New Feature > Components: Transactions >Reporter: Saket Saurabh >Assignee: Saket Saurabh > Attachments: HIVE-14035.02.patch, HIVE-14035.03.patch, > HIVE-14035.04.patch, HIVE-14035.05.patch, HIVE-14035.06.patch, > HIVE-14035.07.patch, HIVE-14035.08.patch, HIVE-14035.09.patch, > HIVE-14035.10.patch, HIVE-14035.11.patch, HIVE-14035.12.patch, > HIVE-14035.13.patch, HIVE-14035.14.patch, HIVE-14035.15.patch, > HIVE-14035.16.patch, HIVE-14035.17.patch, HIVE-14035.patch > > > In the current Hive version, delta files created by ACID transactions do not > allow predicate pushdown if they contain any update/delete events. This is > done to preserve correctness when following a multi-version approach during > event collapsing, where an update event overwrites an existing insert event. > This JIRA proposes to split an update event into a combination of a delete > event followed by a new insert event, which enables predicate pushdown to > all delta files without breaking correctness. To support backward > compatibility for this feature, this JIRA also proposes to add some sort of > versioning to ACID that allows different versions of ACID transactions to > coexist. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HIVE-14523) ACID performance improvement patches
[ https://issues.apache.org/jira/browse/HIVE-14523?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Saket Saurabh updated HIVE-14523: - Status: Patch Available (was: Open) > ACID performance improvement patches > > > Key: HIVE-14523 > URL: https://issues.apache.org/jira/browse/HIVE-14523 > Project: Hive > Issue Type: Test >Affects Versions: 2.2.0 >Reporter: Saket Saurabh >Assignee: Saket Saurabh >Priority: Trivial > Attachments: HIVE-14035_14199_14233.01.patch, HIVE-14523.01.patch, > HIVE-14523.02.patch > > > This is a trivial non-functional JIRA that combines the features introduced in > HIVE-14035, HIVE-14199 and HIVE-14233 into a single patch. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HIVE-14523) ACID performance improvement patches
[ https://issues.apache.org/jira/browse/HIVE-14523?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Saket Saurabh updated HIVE-14523: - Attachment: (was: HIVE-14035_14199_14233.01.patch) > ACID performance improvement patches > > > Key: HIVE-14523 > URL: https://issues.apache.org/jira/browse/HIVE-14523 > Project: Hive > Issue Type: Test >Affects Versions: 2.2.0 >Reporter: Saket Saurabh >Assignee: Saket Saurabh >Priority: Trivial > Attachments: HIVE-14523.01.patch, HIVE-14523.02.patch > > > This is a trivial non-functional JIRA that combines the features introduced in > HIVE-14035, HIVE-14199 and HIVE-14233 into a single patch. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HIVE-14523) ACID performance improvement patches
[ https://issues.apache.org/jira/browse/HIVE-14523?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Saket Saurabh updated HIVE-14523: - Status: Open (was: Patch Available) > ACID performance improvement patches > > > Key: HIVE-14523 > URL: https://issues.apache.org/jira/browse/HIVE-14523 > Project: Hive > Issue Type: Test >Affects Versions: 2.2.0 >Reporter: Saket Saurabh >Assignee: Saket Saurabh >Priority: Trivial > Attachments: HIVE-14523.01.patch, HIVE-14523.02.patch > > > This is a trivial non-functional JIRA that combines the features introduced in > HIVE-14035, HIVE-14199 and HIVE-14233 into a single patch. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HIVE-14523) ACID performance improvement patches
[ https://issues.apache.org/jira/browse/HIVE-14523?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Saket Saurabh updated HIVE-14523: - Status: Patch Available (was: Open) > ACID performance improvement patches > > > Key: HIVE-14523 > URL: https://issues.apache.org/jira/browse/HIVE-14523 > Project: Hive > Issue Type: Test >Affects Versions: 2.2.0 >Reporter: Saket Saurabh >Assignee: Saket Saurabh >Priority: Trivial > Attachments: HIVE-14523.01.patch, HIVE-14523.02.patch > > > This is a trivial non-functional JIRA that combines the features introduced in > HIVE-14035, HIVE-14199 and HIVE-14233 into a single patch. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HIVE-14035) Enable predicate pushdown to delta files created by ACID Transactions
[ https://issues.apache.org/jira/browse/HIVE-14035?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Saket Saurabh updated HIVE-14035: - Status: Patch Available (was: Open) > Enable predicate pushdown to delta files created by ACID Transactions > - > > Key: HIVE-14035 > URL: https://issues.apache.org/jira/browse/HIVE-14035 > Project: Hive > Issue Type: New Feature > Components: Transactions >Reporter: Saket Saurabh >Assignee: Saket Saurabh > Attachments: HIVE-14035.02.patch, HIVE-14035.03.patch, > HIVE-14035.04.patch, HIVE-14035.05.patch, HIVE-14035.06.patch, > HIVE-14035.07.patch, HIVE-14035.08.patch, HIVE-14035.09.patch, > HIVE-14035.10.patch, HIVE-14035.11.patch, HIVE-14035.12.patch, > HIVE-14035.13.patch, HIVE-14035.14.patch, HIVE-14035.15.patch, > HIVE-14035.16.patch, HIVE-14035.17.patch, HIVE-14035.patch > > > In the current Hive version, delta files created by ACID transactions do not > allow predicate pushdown if they contain any update/delete events. This is > done to preserve correctness when following a multi-version approach during > event collapsing, where an update event overwrites an existing insert event. > This JIRA proposes to split an update event into a combination of a delete > event followed by a new insert event, which enables predicate pushdown to > all delta files without breaking correctness. To support backward > compatibility for this feature, this JIRA also proposes to add some sort of > versioning to ACID that allows different versions of ACID transactions to > coexist. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HIVE-14523) ACID performance improvement patches
[ https://issues.apache.org/jira/browse/HIVE-14523?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Saket Saurabh updated HIVE-14523: - Status: Patch Available (was: Open) > ACID performance improvement patches > > > Key: HIVE-14523 > URL: https://issues.apache.org/jira/browse/HIVE-14523 > Project: Hive > Issue Type: Test >Affects Versions: 2.2.0 >Reporter: Saket Saurabh >Assignee: Saket Saurabh >Priority: Trivial > Attachments: HIVE-14523.01.patch, HIVE-14523.02.patch > > > This is a trivial non-functional JIRA that combines the features introduced in > HIVE-14035, HIVE-14199 and HIVE-14233 into a single patch. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HIVE-14523) ACID performance improvement patches
[ https://issues.apache.org/jira/browse/HIVE-14523?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Saket Saurabh updated HIVE-14523: - Attachment: HIVE-14523.02.patch Updated patch with fixes from HIVE-14233. > ACID performance improvement patches > > > Key: HIVE-14523 > URL: https://issues.apache.org/jira/browse/HIVE-14523 > Project: Hive > Issue Type: Test >Affects Versions: 2.2.0 >Reporter: Saket Saurabh >Assignee: Saket Saurabh >Priority: Trivial > Attachments: HIVE-14035_14199_14233.01.patch, HIVE-14523.01.patch, > HIVE-14523.02.patch > > > This is a trivial non-functional JIRA that combines the features introduced in > HIVE-14035, HIVE-14199 and HIVE-14233 into a single patch. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HIVE-14523) ACID performance improvement patches
[ https://issues.apache.org/jira/browse/HIVE-14523?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Saket Saurabh updated HIVE-14523: - Status: Open (was: Patch Available) > ACID performance improvement patches > > > Key: HIVE-14523 > URL: https://issues.apache.org/jira/browse/HIVE-14523 > Project: Hive > Issue Type: Test >Affects Versions: 2.2.0 >Reporter: Saket Saurabh >Assignee: Saket Saurabh >Priority: Trivial > Attachments: HIVE-14035_14199_14233.01.patch, HIVE-14523.01.patch, > HIVE-14523.02.patch > > > This is a trivial non-functional JIRA that combines the features introduced in > HIVE-14035, HIVE-14199 and HIVE-14233 into a single patch. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HIVE-14233) Improve vectorization for ACID by eliminating row-by-row stitching
[ https://issues.apache.org/jira/browse/HIVE-14233?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Saket Saurabh updated HIVE-14233: - Attachment: HIVE-14233.07.patch Fix a functional bug that was throwing ArrayIndexOutOfBoundsExceptions while trying to push SARGs to delete_deltas. > Improve vectorization for ACID by eliminating row-by-row stitching > -- > > Key: HIVE-14233 > URL: https://issues.apache.org/jira/browse/HIVE-14233 > Project: Hive > Issue Type: New Feature > Components: Transactions, Vectorization >Reporter: Saket Saurabh >Assignee: Saket Saurabh > Attachments: HIVE-14233.01.patch, HIVE-14233.02.patch, > HIVE-14233.03.patch, HIVE-14233.04.patch, HIVE-14233.05.patch, > HIVE-14233.06.patch, HIVE-14233.07.patch > > > This JIRA proposes to improve vectorization for ACID by eliminating > row-by-row stitching when reading back ACID files. In the current > implementation, a vectorized row batch is created by populating the batch one > row at a time, before the vectorized batch is passed up along the operator > pipeline. This row-by-row stitching was needed because the ACID > insert/update/delete events from various delta files had to be merged > before the current version of a given row could be determined. > HIVE-14035 has enabled us to break away from that limitation by splitting > ACID update events into a combination of delete+insert. In fact, it has now > enabled us to create splits on delta files. > Building on top of HIVE-14035, this JIRA proposes to remove this > bottleneck in the vectorized code path for ACID by directly reading row > batches from the underlying ORC files and avoiding any stitching altogether. > Once a row batch is read from the split (which may be on a base/delta file), > the deleted rows will be found by cross-referencing them against a data > structure that keeps track of only the delete events (found in the > delete_delta files). 
This will lead to a large performance gain when reading > ACID files in vectorized fashion, while enabling further optimizations in > the future that can be built on top of it. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
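The batch-at-a-time read path proposed in HIVE-14233 can be sketched roughly as follows. This is an illustration under assumptions, not the patch itself: row keys are modelled as hypothetical `(transaction_id, bucket, row_id)` tuples, a batch is a plain list, and the delete-event structure is a Python set. The point is that a whole batch is read first and deleted rows are then masked out through a selection list, instead of stitching the batch together one row at a time.

```python
def load_delete_keys(delete_events):
    # Build the lookup structure that tracks only delete events.
    # A set is an assumption here; any membership structure would do.
    return set(delete_events)


def select_live_rows(batch, deleted):
    # Return the indices of rows in the batch that were not deleted,
    # playing the role of a selection vector over the batch.
    return [i for i, (key, _value) in enumerate(batch) if key not in deleted]


# Hypothetical row keys: (transaction_id, bucket, row_id).
batch = [
    ((1, 0, 0), "a"),
    ((1, 0, 1), "b"),
    ((2, 0, 0), "c"),
]
deleted = load_delete_keys([(1, 0, 1)])

selected = select_live_rows(batch, deleted)
live_values = [batch[i][1] for i in selected]
```

Because updates are already split into delete + insert, no merge across delta files is needed to find the current version of a row; a membership check per row is enough.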
[jira] [Updated] (HIVE-14523) ACID performance improvement patches
[ https://issues.apache.org/jira/browse/HIVE-14523?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Saket Saurabh updated HIVE-14523: - Attachment: HIVE-14523.01.patch Realized that the previous file naming convention doesn't trigger the Ptest. > ACID performance improvement patches > > > Key: HIVE-14523 > URL: https://issues.apache.org/jira/browse/HIVE-14523 > Project: Hive > Issue Type: Test >Affects Versions: 2.2.0 >Reporter: Saket Saurabh >Assignee: Saket Saurabh >Priority: Trivial > Attachments: HIVE-14035_14199_14233.01.patch, HIVE-14523.01.patch > > > This is a trivial non-functional JIRA that combines the features introduced in > HIVE-14035, HIVE-14199 and HIVE-14233 into a single patch. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HIVE-14523) ACID performance improvement patches
[ https://issues.apache.org/jira/browse/HIVE-14523?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Saket Saurabh updated HIVE-14523: - Status: Open (was: Patch Available) > ACID performance improvement patches > > > Key: HIVE-14523 > URL: https://issues.apache.org/jira/browse/HIVE-14523 > Project: Hive > Issue Type: Test >Affects Versions: 2.2.0 >Reporter: Saket Saurabh >Assignee: Saket Saurabh >Priority: Trivial > Attachments: HIVE-14035_14199_14233.01.patch, HIVE-14523.01.patch > > > This is a trivial non-functional JIRA that combines the features introduced in > HIVE-14035, HIVE-14199 and HIVE-14233 into a single patch. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HIVE-14523) ACID performance improvement patches
[ https://issues.apache.org/jira/browse/HIVE-14523?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Saket Saurabh updated HIVE-14523: - Status: Patch Available (was: Open) > ACID performance improvement patches > > > Key: HIVE-14523 > URL: https://issues.apache.org/jira/browse/HIVE-14523 > Project: Hive > Issue Type: Test >Affects Versions: 2.2.0 >Reporter: Saket Saurabh >Assignee: Saket Saurabh >Priority: Trivial > Attachments: HIVE-14035_14199_14233.01.patch, HIVE-14523.01.patch > > > This is a trivial non-functional JIRA that combines the features introduced in > HIVE-14035, HIVE-14199 and HIVE-14233 into a single patch. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HIVE-14523) ACID performance improvement patches
[ https://issues.apache.org/jira/browse/HIVE-14523?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Saket Saurabh updated HIVE-14523: - Status: Open (was: Patch Available) > ACID performance improvement patches > > > Key: HIVE-14523 > URL: https://issues.apache.org/jira/browse/HIVE-14523 > Project: Hive > Issue Type: Test >Affects Versions: 2.2.0 >Reporter: Saket Saurabh >Assignee: Saket Saurabh >Priority: Trivial > Attachments: HIVE-14035_14199_14233.01.patch > > > This is a trivial non-functional JIRA that combines the features introduced in > HIVE-14035, HIVE-14199 and HIVE-14233 into a single patch. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HIVE-14523) ACID performance improvement patches
[ https://issues.apache.org/jira/browse/HIVE-14523?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Saket Saurabh updated HIVE-14523: - Status: Patch Available (was: Open) > ACID performance improvement patches > > > Key: HIVE-14523 > URL: https://issues.apache.org/jira/browse/HIVE-14523 > Project: Hive > Issue Type: Test >Affects Versions: 2.2.0 >Reporter: Saket Saurabh >Assignee: Saket Saurabh >Priority: Trivial > Attachments: HIVE-14035_14199_14233.01.patch > > > This is a trivial non-functional JIRA that combines the features introduced in > HIVE-14035, HIVE-14199 and HIVE-14233 into a single patch. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HIVE-14523) ACID performance improvement patches
[ https://issues.apache.org/jira/browse/HIVE-14523?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Saket Saurabh updated HIVE-14523: - Attachment: HIVE-14035_14199_14233.01.patch First version. > ACID performance improvement patches > > > Key: HIVE-14523 > URL: https://issues.apache.org/jira/browse/HIVE-14523 > Project: Hive > Issue Type: Test >Affects Versions: 2.2.0 >Reporter: Saket Saurabh >Assignee: Saket Saurabh >Priority: Trivial > Attachments: HIVE-14035_14199_14233.01.patch > > > This is a trivial non-functional JIRA that combines the features introduced in > HIVE-14035, HIVE-14199 and HIVE-14233 into a single patch. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HIVE-14035) Enable predicate pushdown to delta files created by ACID Transactions
[ https://issues.apache.org/jira/browse/HIVE-14035?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Saket Saurabh updated HIVE-14035: - Attachment: HIVE-14035.16.patch Rebased with master and fixed the two UTs that were failing due to a trivial case-mismatch error. > Enable predicate pushdown to delta files created by ACID Transactions > - > > Key: HIVE-14035 > URL: https://issues.apache.org/jira/browse/HIVE-14035 > Project: Hive > Issue Type: New Feature > Components: Transactions >Reporter: Saket Saurabh >Assignee: Saket Saurabh > Attachments: HIVE-14035.02.patch, HIVE-14035.03.patch, > HIVE-14035.04.patch, HIVE-14035.05.patch, HIVE-14035.06.patch, > HIVE-14035.07.patch, HIVE-14035.08.patch, HIVE-14035.09.patch, > HIVE-14035.10.patch, HIVE-14035.11.patch, HIVE-14035.12.patch, > HIVE-14035.13.patch, HIVE-14035.14.patch, HIVE-14035.15.patch, > HIVE-14035.16.patch, HIVE-14035.patch > > > In the current Hive version, delta files created by ACID transactions do not > allow predicate pushdown if they contain any update/delete events. This is > done to preserve correctness when following a multi-version approach during > event collapsing, where an update event overwrites an existing insert event. > This JIRA proposes to split an update event into a combination of a delete > event followed by a new insert event, which enables predicate pushdown to > all delta files without breaking correctness. To support backward > compatibility for this feature, this JIRA also proposes to add some sort of > versioning to ACID that allows different versions of ACID transactions to > coexist. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HIVE-14035) Enable predicate pushdown to delta files created by ACID Transactions
[ https://issues.apache.org/jira/browse/HIVE-14035?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Saket Saurabh updated HIVE-14035: - Status: Patch Available (was: Open) > Enable predicate pushdown to delta files created by ACID Transactions > - > > Key: HIVE-14035 > URL: https://issues.apache.org/jira/browse/HIVE-14035 > Project: Hive > Issue Type: New Feature > Components: Transactions >Reporter: Saket Saurabh >Assignee: Saket Saurabh > Attachments: HIVE-14035.02.patch, HIVE-14035.03.patch, > HIVE-14035.04.patch, HIVE-14035.05.patch, HIVE-14035.06.patch, > HIVE-14035.07.patch, HIVE-14035.08.patch, HIVE-14035.09.patch, > HIVE-14035.10.patch, HIVE-14035.11.patch, HIVE-14035.12.patch, > HIVE-14035.13.patch, HIVE-14035.14.patch, HIVE-14035.15.patch, > HIVE-14035.16.patch, HIVE-14035.patch > > > In the current Hive version, delta files created by ACID transactions do not > allow predicate pushdown if they contain any update/delete events. This is > done to preserve correctness when following a multi-version approach during > event collapsing, where an update event overwrites an existing insert event. > This JIRA proposes to split an update event into a combination of a delete > event followed by a new insert event, which enables predicate pushdown to > all delta files without breaking correctness. To support backward > compatibility for this feature, this JIRA also proposes to add some sort of > versioning to ACID that allows different versions of ACID transactions to > coexist. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HIVE-14035) Enable predicate pushdown to delta files created by ACID Transactions
[ https://issues.apache.org/jira/browse/HIVE-14035?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Saket Saurabh updated HIVE-14035: - Status: Open (was: Patch Available) > Enable predicate pushdown to delta files created by ACID Transactions > - > > Key: HIVE-14035 > URL: https://issues.apache.org/jira/browse/HIVE-14035 > Project: Hive > Issue Type: New Feature > Components: Transactions >Reporter: Saket Saurabh >Assignee: Saket Saurabh > Attachments: HIVE-14035.02.patch, HIVE-14035.03.patch, > HIVE-14035.04.patch, HIVE-14035.05.patch, HIVE-14035.06.patch, > HIVE-14035.07.patch, HIVE-14035.08.patch, HIVE-14035.09.patch, > HIVE-14035.10.patch, HIVE-14035.11.patch, HIVE-14035.12.patch, > HIVE-14035.13.patch, HIVE-14035.14.patch, HIVE-14035.15.patch, > HIVE-14035.patch > > > In the current Hive version, delta files created by ACID transactions do not > allow predicate pushdown if they contain any update/delete events. This is > done to preserve correctness when following a multi-version approach during > event collapsing, where an update event overwrites an existing insert event. > This JIRA proposes to split an update event into a combination of a delete > event followed by a new insert event, which enables predicate pushdown to > all delta files without breaking correctness. To support backward > compatibility for this feature, this JIRA also proposes to add some sort of > versioning to ACID that allows different versions of ACID transactions to > coexist. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HIVE-14035) Enable predicate pushdown to delta files created by ACID Transactions
[ https://issues.apache.org/jira/browse/HIVE-14035?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Saket Saurabh updated HIVE-14035: - Status: Patch Available (was: Open) > Enable predicate pushdown to delta files created by ACID Transactions > - > > Key: HIVE-14035 > URL: https://issues.apache.org/jira/browse/HIVE-14035 > Project: Hive > Issue Type: New Feature > Components: Transactions >Reporter: Saket Saurabh >Assignee: Saket Saurabh > Attachments: HIVE-14035.02.patch, HIVE-14035.03.patch, > HIVE-14035.04.patch, HIVE-14035.05.patch, HIVE-14035.06.patch, > HIVE-14035.07.patch, HIVE-14035.08.patch, HIVE-14035.09.patch, > HIVE-14035.10.patch, HIVE-14035.11.patch, HIVE-14035.12.patch, > HIVE-14035.13.patch, HIVE-14035.14.patch, HIVE-14035.15.patch, > HIVE-14035.patch > > > In the current Hive version, delta files created by ACID transactions do not > allow predicate pushdown if they contain any update/delete events. This is > done to preserve correctness when following a multi-version approach during > event collapsing, where an update event overwrites an existing insert event. > This JIRA proposes to split an update event into a combination of a delete > event followed by a new insert event, which enables predicate pushdown to > all delta files without breaking correctness. To support backward > compatibility for this feature, this JIRA also proposes to add some sort of > versioning to ACID that allows different versions of ACID transactions to > coexist. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HIVE-14035) Enable predicate pushdown to delta files created by ACID Transactions
[ https://issues.apache.org/jira/browse/HIVE-14035?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Saket Saurabh updated HIVE-14035: - Attachment: HIVE-14035.15.patch Same as patch #14. Resubmitting to make the Ptest run again; the previous Jenkins Ptest build job failed to initialize due to an unrelated checksum error. > Enable predicate pushdown to delta files created by ACID Transactions > - > > Key: HIVE-14035 > URL: https://issues.apache.org/jira/browse/HIVE-14035 > Project: Hive > Issue Type: New Feature > Components: Transactions >Reporter: Saket Saurabh >Assignee: Saket Saurabh > Attachments: HIVE-14035.02.patch, HIVE-14035.03.patch, > HIVE-14035.04.patch, HIVE-14035.05.patch, HIVE-14035.06.patch, > HIVE-14035.07.patch, HIVE-14035.08.patch, HIVE-14035.09.patch, > HIVE-14035.10.patch, HIVE-14035.11.patch, HIVE-14035.12.patch, > HIVE-14035.13.patch, HIVE-14035.14.patch, HIVE-14035.15.patch, > HIVE-14035.patch > > > In the current Hive version, delta files created by ACID transactions do not > allow predicate pushdown if they contain any update/delete events. This is > done to preserve correctness when following a multi-version approach during > event collapsing, where an update event overwrites an existing insert event. > This JIRA proposes to split an update event into a combination of a delete > event followed by a new insert event, which enables predicate pushdown to > all delta files without breaking correctness. To support backward > compatibility for this feature, this JIRA also proposes to add some sort of > versioning to ACID that allows different versions of ACID transactions to > coexist. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HIVE-14035) Enable predicate pushdown to delta files created by ACID Transactions
[ https://issues.apache.org/jira/browse/HIVE-14035?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Saket Saurabh updated HIVE-14035: - Status: Open (was: Patch Available) > Enable predicate pushdown to delta files created by ACID Transactions > - > > Key: HIVE-14035 > URL: https://issues.apache.org/jira/browse/HIVE-14035 > Project: Hive > Issue Type: New Feature > Components: Transactions >Reporter: Saket Saurabh >Assignee: Saket Saurabh > Attachments: HIVE-14035.02.patch, HIVE-14035.03.patch, > HIVE-14035.04.patch, HIVE-14035.05.patch, HIVE-14035.06.patch, > HIVE-14035.07.patch, HIVE-14035.08.patch, HIVE-14035.09.patch, > HIVE-14035.10.patch, HIVE-14035.11.patch, HIVE-14035.12.patch, > HIVE-14035.13.patch, HIVE-14035.14.patch, HIVE-14035.patch > > > In current Hive version, delta files created by ACID transactions do not > allow predicate pushdown if they contain any update/delete events. This is > done to preserve correctness when following a multi-version approach during > event collapsing, where an update event overwrites an existing insert event. > This JIRA proposes to split an update event into a combination of a delete > event followed by a new insert event, that can enable predicate push down to > all delta files without breaking correctness. To support backward > compatibility for this feature, this JIRA also proposes to add some sort of > versioning to ACID that can allow different versions of ACID transactions to > co-exist together. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HIVE-14035) Enable predicate pushdown to delta files created by ACID Transactions
[ https://issues.apache.org/jira/browse/HIVE-14035?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Saket Saurabh updated HIVE-14035: - Status: Patch Available (was: Open) > Enable predicate pushdown to delta files created by ACID Transactions > - > > Key: HIVE-14035 > URL: https://issues.apache.org/jira/browse/HIVE-14035 > Project: Hive > Issue Type: New Feature > Components: Transactions >Reporter: Saket Saurabh >Assignee: Saket Saurabh > Attachments: HIVE-14035.02.patch, HIVE-14035.03.patch, > HIVE-14035.04.patch, HIVE-14035.05.patch, HIVE-14035.06.patch, > HIVE-14035.07.patch, HIVE-14035.08.patch, HIVE-14035.09.patch, > HIVE-14035.10.patch, HIVE-14035.11.patch, HIVE-14035.12.patch, > HIVE-14035.13.patch, HIVE-14035.14.patch, HIVE-14035.patch > > > In current Hive version, delta files created by ACID transactions do not > allow predicate pushdown if they contain any update/delete events. This is > done to preserve correctness when following a multi-version approach during > event collapsing, where an update event overwrites an existing insert event. > This JIRA proposes to split an update event into a combination of a delete > event followed by a new insert event, that can enable predicate push down to > all delta files without breaking correctness. To support backward > compatibility for this feature, this JIRA also proposes to add some sort of > versioning to ACID that can allow different versions of ACID transactions to > co-exist together. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HIVE-14035) Enable predicate pushdown to delta files created by ACID Transactions
[ https://issues.apache.org/jira/browse/HIVE-14035?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Saket Saurabh updated HIVE-14035: - Attachment: HIVE-14035.14.patch Patch #14 significantly refactors the way split strategies are chosen for the ACID split-update case and now correctly sets the isOriginal flag on a per-split basis. When split-update is enabled, a split on a base file can be one of three types: a split on an original_base, a split on a compacted_base, or a split on an insert_delta. We might end up with a set of OrcSplits that covers both original and insert_delta files in the same job. In such cases it is very important that the isOriginal flag is set correctly, otherwise it breaks the way split strategies are used to instantiate a number of things. This patch takes care of that. Additionally, the patch optimizes the case where we had to process uncovered buckets when the split had no base (previously possible when we had only deltas). Now, when split-update is enabled, every split has a base, because there is no point in having a split that only reads the delete_deltas. (Minor compaction is not a concern here because it always creates a single split and has separate logic for doing so, which has not been modified.) Tests for all these changes are added to TestInputOutputFormat for various scenarios. The patch also addresses the comments on RB. 
> Enable predicate pushdown to delta files created by ACID Transactions > - > > Key: HIVE-14035 > URL: https://issues.apache.org/jira/browse/HIVE-14035 > Project: Hive > Issue Type: New Feature > Components: Transactions >Reporter: Saket Saurabh >Assignee: Saket Saurabh > Attachments: HIVE-14035.02.patch, HIVE-14035.03.patch, > HIVE-14035.04.patch, HIVE-14035.05.patch, HIVE-14035.06.patch, > HIVE-14035.07.patch, HIVE-14035.08.patch, HIVE-14035.09.patch, > HIVE-14035.10.patch, HIVE-14035.11.patch, HIVE-14035.12.patch, > HIVE-14035.13.patch, HIVE-14035.14.patch, HIVE-14035.patch > > > In current Hive version, delta files created by ACID transactions do not > allow predicate pushdown if they contain any update/delete events. This is > done to preserve correctness when following a multi-version approach during > event collapsing, where an update event overwrites an existing insert event. > This JIRA proposes to split an update event into a combination of a delete > event followed by a new insert event, that can enable predicate push down to > all delta files without breaking correctness. To support backward > compatibility for this feature, this JIRA also proposes to add some sort of > versioning to ACID that can allow different versions of ACID transactions to > co-exist together. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
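The per-split isOriginal decision described in the patch #14 comment can be sketched as follows. This is only an illustration, not the code in OrcInputFormat: the `Kind` enum and the `classify` and `isOriginal` helpers are hypothetical, and the directory prefixes `base_`, `delta_` and `delete_delta_` are assumptions about the on-disk layout used to tell the split types apart.

```java
class SplitKindSketch {
    /** Split types named in the comment, plus delete deltas, which are never read as a base. */
    enum Kind { ORIGINAL_BASE, COMPACTED_BASE, INSERT_DELTA, DELETE_DELTA }

    /** Classifies a file by the name of its parent directory (assumed naming scheme). */
    static Kind classify(String parentDirName) {
        if (parentDirName.startsWith("delete_delta_")) return Kind.DELETE_DELTA;
        if (parentDirName.startsWith("delta_")) return Kind.INSERT_DELTA;
        if (parentDirName.startsWith("base_")) return Kind.COMPACTED_BASE;
        // Anything else is treated as a pre-ACID ("original") file in the table directory.
        return Kind.ORIGINAL_BASE;
    }

    /** The flag is decided for each split, not once for the whole job. */
    static boolean isOriginal(Kind kind) {
        return kind == Kind.ORIGINAL_BASE;
    }

    public static void main(String[] args) {
        // A single job may mix original files and insert deltas.
        String[] parents = { "acidtbl", "delta_0000005_0000005", "base_0000004" };
        for (String p : parents) {
            Kind k = classify(p);
            System.out.println(p + " -> " + k + ", isOriginal=" + isOriginal(k));
        }
    }
}
```

The point of the sketch is that a job-wide flag would be wrong as soon as one job contains both an original file and an insert_delta, which is the mixed case the comment calls out.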
[jira] [Updated] (HIVE-14035) Enable predicate pushdown to delta files created by ACID Transactions
[ https://issues.apache.org/jira/browse/HIVE-14035?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Saket Saurabh updated HIVE-14035: - Status: Open (was: Patch Available) > Enable predicate pushdown to delta files created by ACID Transactions > - > > Key: HIVE-14035 > URL: https://issues.apache.org/jira/browse/HIVE-14035 > Project: Hive > Issue Type: New Feature > Components: Transactions >Reporter: Saket Saurabh >Assignee: Saket Saurabh > Attachments: HIVE-14035.02.patch, HIVE-14035.03.patch, > HIVE-14035.04.patch, HIVE-14035.05.patch, HIVE-14035.06.patch, > HIVE-14035.07.patch, HIVE-14035.08.patch, HIVE-14035.09.patch, > HIVE-14035.10.patch, HIVE-14035.11.patch, HIVE-14035.12.patch, > HIVE-14035.13.patch, HIVE-14035.patch > > > In current Hive version, delta files created by ACID transactions do not > allow predicate pushdown if they contain any update/delete events. This is > done to preserve correctness when following a multi-version approach during > event collapsing, where an update event overwrites an existing insert event. > This JIRA proposes to split an update event into a combination of a delete > event followed by a new insert event, that can enable predicate push down to > all delta files without breaking correctness. To support backward > compatibility for this feature, this JIRA also proposes to add some sort of > versioning to ACID that can allow different versions of ACID transactions to > co-exist together. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-14035) Enable predicate pushdown to delta files created by ACID Transactions
[ https://issues.apache.org/jira/browse/HIVE-14035?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15414383#comment-15414383 ] Saket Saurabh commented on HIVE-14035: -- [~sershe] Thanks for the comments on RB. I am working on fixing those. No, the last run for patch 13 did not have split-update enabled by default. Many tests assert on the number of files and the directory layout, and those would fail in PTest anyway if run without modification. Excluding those assertion failures, when I ran the tests locally the only other failures were NegativeArraySizeException and IndexOutOfBoundsException, both caused by HIVE-14448 and unrelated to this patch. I have also created a subclass, TestTxnCommands3, that should mimic this behavior with split-update enabled by default for a large number of ACID scenarios. > Enable predicate pushdown to delta files created by ACID Transactions > - > > Key: HIVE-14035 > URL: https://issues.apache.org/jira/browse/HIVE-14035 > Project: Hive > Issue Type: New Feature > Components: Transactions >Reporter: Saket Saurabh >Assignee: Saket Saurabh > Attachments: HIVE-14035.02.patch, HIVE-14035.03.patch, > HIVE-14035.04.patch, HIVE-14035.05.patch, HIVE-14035.06.patch, > HIVE-14035.07.patch, HIVE-14035.08.patch, HIVE-14035.09.patch, > HIVE-14035.10.patch, HIVE-14035.11.patch, HIVE-14035.12.patch, > HIVE-14035.13.patch, HIVE-14035.patch > > > In current Hive version, delta files created by ACID transactions do not > allow predicate pushdown if they contain any update/delete events. This is > done to preserve correctness when following a multi-version approach during > event collapsing, where an update event overwrites an existing insert event. > This JIRA proposes to split an update event into a combination of a delete > event followed by a new insert event, that can enable predicate push down to > all delta files without breaking correctness. 
To support backward > compatibility for this feature, this JIRA also proposes to add some sort of > versioning to ACID that can allow different versions of ACID transactions to > co-exist together. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-14233) Improve vectorization for ACID by eliminating row-by-row stitching
[ https://issues.apache.org/jira/browse/HIVE-14233?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15414224#comment-15414224 ] Saket Saurabh commented on HIVE-14233: -- Thanks [~sershe] for pointing that out. Have attached the link to review board for this JIRA. https://reviews.apache.org/r/50934/ > Improve vectorization for ACID by eliminating row-by-row stitching > -- > > Key: HIVE-14233 > URL: https://issues.apache.org/jira/browse/HIVE-14233 > Project: Hive > Issue Type: New Feature > Components: Transactions, Vectorization >Reporter: Saket Saurabh >Assignee: Saket Saurabh > Attachments: HIVE-14233.01.patch, HIVE-14233.02.patch, > HIVE-14233.03.patch, HIVE-14233.04.patch, HIVE-14233.05.patch, > HIVE-14233.06.patch > > > This JIRA proposes to improve vectorization for ACID by eliminating > row-by-row stitching when reading back ACID files. In the current > implementation, a vectorized row batch is created by populating the batch one > row at a time, before the vectorized batch is passed up along the operator > pipeline. This row-by-row stitching limitation was because of the fact that > the ACID insert/update/delete events from various delta files needed to be > merged together before the actual version of a given row was found out. > HIVE-14035 has enabled us to break away from that limitation by splitting > ACID update events into a combination of delete+insert. In fact, it has now > enabled us to create splits on delta files. > Building on top of HIVE-14035, this JIRA proposes to solve this earlier > bottleneck in the vectorized code path for ACID by now directly reading row > batches from the underlying ORC files and avoiding any stitching altogether. > Once a row batch is read from the split (which may be on a base/delta file), > the deleted rows will be found by cross-referencing them against a data > structure that will just keep track of deleted events (found in the > deleted_delta files). 
This will lead to a large performance gain when reading > ACID files in vectorized fashion, while enabling further optimizations in > future that can be done on top of that. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
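The batch-at-a-time read proposed above can be sketched roughly as follows. This is an illustration under stated assumptions, not the reader from the patch: `selectLive` is a hypothetical helper, row identity is reduced to a single `long` key, and the set of delete events is modelled as a sorted array probed with binary search. The batch itself is read as-is, and deleted rows are excluded afterwards through a selection vector rather than by rebuilding the batch row by row.

```java
import java.util.Arrays;

class DeleteFilterSketch {
    /**
     * Returns the indexes of the batch rows that survive, in batch order.
     * sortedDeletedKeys must be sorted ascending for binarySearch to be valid.
     */
    static int[] selectLive(long[] batchRowKeys, int size, long[] sortedDeletedKeys) {
        int[] selected = new int[size];
        int live = 0;
        for (int i = 0; i < size; i++) {
            // A negative result means the key has no matching delete event.
            if (Arrays.binarySearch(sortedDeletedKeys, batchRowKeys[i]) < 0) {
                selected[live++] = i;
            }
        }
        return Arrays.copyOf(selected, live);
    }

    public static void main(String[] args) {
        long[] batch = { 100L, 101L, 102L, 103L };
        long[] deleted = { 101L, 103L };
        System.out.println(Arrays.toString(selectLive(batch, batch.length, deleted))); // prints [0, 2]
    }
}
```

The column data is never copied or merged here; only the list of surviving positions changes, which is what removes the per-row stitching cost.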
[jira] [Updated] (HIVE-14233) Improve vectorization for ACID by eliminating row-by-row stitching
[ https://issues.apache.org/jira/browse/HIVE-14233?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Saket Saurabh updated HIVE-14233: - Attachment: HIVE-14233.06.patch This patch disallows VectorizedRowBatchReader creation on original files. > Improve vectorization for ACID by eliminating row-by-row stitching > -- > > Key: HIVE-14233 > URL: https://issues.apache.org/jira/browse/HIVE-14233 > Project: Hive > Issue Type: New Feature > Components: Transactions, Vectorization >Reporter: Saket Saurabh >Assignee: Saket Saurabh > Attachments: HIVE-14233.01.patch, HIVE-14233.02.patch, > HIVE-14233.03.patch, HIVE-14233.04.patch, HIVE-14233.05.patch, > HIVE-14233.06.patch > > > This JIRA proposes to improve vectorization for ACID by eliminating > row-by-row stitching when reading back ACID files. In the current > implementation, a vectorized row batch is created by populating the batch one > row at a time, before the vectorized batch is passed up along the operator > pipeline. This row-by-row stitching limitation was because of the fact that > the ACID insert/update/delete events from various delta files needed to be > merged together before the actual version of a given row was found out. > HIVE-14035 has enabled us to break away from that limitation by splitting > ACID update events into a combination of delete+insert. In fact, it has now > enabled us to create splits on delta files. > Building on top of HIVE-14035, this JIRA proposes to solve this earlier > bottleneck in the vectorized code path for ACID by now directly reading row > batches from the underlying ORC files and avoiding any stitching altogether. > Once a row batch is read from the split (which may be on a base/delta file), > the deleted rows will be found by cross-referencing them against a data > structure that will just keep track of deleted events (found in the > deleted_delta files). 
This will lead to a large performance gain when reading > ACID files in vectorized fashion, while enabling further optimizations in > future that can be done on top of that. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-14448) Queries with predicate fail when ETL split strategy is chosen for ACID tables
[ https://issues.apache.org/jira/browse/HIVE-14448?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15414106#comment-15414106 ] Saket Saurabh commented on HIVE-14448: -- While investigating test failures for some other scenarios, I realized that schema evolution also breaks when the ETL strategy is chosen, which I believe might be related to this JIRA. I think it would be good to add another test that evolves the schema, as [~prasanth_j] suggested. Even with the current patch, the following schema evolution test still fails with IndexOutOfBoundsException:
{code:title=ql/src/test/org/apache/hadoop/hive/ql/TestTxnCommands2.java|borderStyle=solid}
@Test
public void testAcidWithSchemaEvolution() throws Exception {
  hiveConf.setVar(HiveConf.ConfVars.HIVE_ORC_SPLIT_STRATEGY, "ETL");
  String tblName = "acidTblWithSchemaEvol";
  runStatementOnDriver("drop table if exists " + tblName);
  runStatementOnDriver("CREATE TABLE " + tblName + "(a INT, b STRING) " +
      " CLUSTERED BY(a) INTO 2 BUCKETS" + // currently ACID requires table to be bucketed
      " STORED AS ORC TBLPROPERTIES ('transactional'='true')");
  runStatementOnDriver("INSERT INTO " + tblName + " VALUES (1, 'foo'), (2, 'bar')");
  // Major compact to create a base that has ACID schema.
  runStatementOnDriver("ALTER TABLE " + tblName + " COMPACT 'MAJOR'");
  runWorker(hiveConf);
  // Alter table to perform schema evolution.
  runStatementOnDriver("ALTER TABLE " + tblName + " ADD COLUMNS(c int)");
  // Validate there is an added NULL for column c.
  List<String> rs = runStatementOnDriver("SELECT * FROM " + tblName + " ORDER BY a");
  String[] expectedResult = { "1\tfoo\tNULL", "2\tbar\tNULL" };
  Assert.assertEquals(Arrays.asList(expectedResult), rs);
}
{code}
Here is the back-trace for the failed test:
{code}
exec.Task: Job Submission failed with exception 'java.lang.RuntimeException(ORC split generation failed with exception: java.lang.IndexOutOfBoundsException: Index: 9, Size: 9)' java.lang.RuntimeException: ORC split generation failed with exception: java.lang.IndexOutOfBoundsException: Index: 9, Size: 9 at org.apache.hadoop.hive.ql.io.orc.OrcInputFormat.generateSplitsInfo(OrcInputFormat.java:1576) at org.apache.hadoop.hive.ql.io.orc.OrcInputFormat.getSplits(OrcInputFormat.java:1662) at org.apache.hadoop.hive.ql.io.HiveInputFormat.addSplitsForGroup(HiveInputFormat.java:370) at org.apache.hadoop.hive.ql.io.HiveInputFormat.getSplits(HiveInputFormat.java:488) at org.apache.hadoop.mapreduce.JobSubmitter.writeOldSplits(JobSubmitter.java:329) at org.apache.hadoop.mapreduce.JobSubmitter.writeSplits(JobSubmitter.java:321) at org.apache.hadoop.mapreduce.JobSubmitter.submitJobInternal(JobSubmitter.java:197) at org.apache.hadoop.mapreduce.Job$10.run(Job.java:1297) at org.apache.hadoop.mapreduce.Job$10.run(Job.java:1294) at java.security.AccessController.doPrivileged(Native Method) at javax.security.auth.Subject.doAs(Subject.java:422) at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1656) at org.apache.hadoop.mapreduce.Job.submit(Job.java:1294) at org.apache.hadoop.mapred.JobClient$1.run(JobClient.java:562) at org.apache.hadoop.mapred.JobClient$1.run(JobClient.java:557) at java.security.AccessController.doPrivileged(Native Method) at javax.security.auth.Subject.doAs(Subject.java:422) at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1656) at org.apache.hadoop.mapred.JobClient.submitJobInternal(JobClient.java:557) at 
org.apache.hadoop.mapred.JobClient.submitJob(JobClient.java:548) at org.apache.hadoop.hive.ql.exec.mr.ExecDriver.execute(ExecDriver.java:417) at org.apache.hadoop.hive.ql.exec.mr.MapRedTask.execute(MapRedTask.java:141) at org.apache.hadoop.hive.ql.exec.Task.executeTask(Task.java:197) at org.apache.hadoop.hive.ql.exec.TaskRunner.runSequential(TaskRunner.java:100) at org.apache.hadoop.hive.ql.Driver.launchTask(Driver.java:1983) at org.apache.hadoop.hive.ql.Driver.execute(Driver.java:1674) at org.apache.hadoop.hive.ql.Driver.runInternal(Driver.java:1410) at org.apache.hadoop.hive.ql.Driver.run(Driver.java:1134) at org.apache.hadoop.hive.ql.Driver.run(Driver.java:1122) at org.apache.hadoop.hive.ql.TestTxnCommands2.runStatementOnDriver(TestTxnCommands2.java:1315) at org.apache.hadoop.hive.ql.TestTxnCommands2.testAcidWithSchemaEvolution(TestTxnCommands2.java:177) at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorIm
[jira] [Updated] (HIVE-14035) Enable predicate pushdown to delta files created by ACID Transactions
[ https://issues.apache.org/jira/browse/HIVE-14035?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Saket Saurabh updated HIVE-14035: - Status: Patch Available (was: Open) > Enable predicate pushdown to delta files created by ACID Transactions > - > > Key: HIVE-14035 > URL: https://issues.apache.org/jira/browse/HIVE-14035 > Project: Hive > Issue Type: New Feature > Components: Transactions >Reporter: Saket Saurabh >Assignee: Saket Saurabh > Attachments: HIVE-14035.02.patch, HIVE-14035.03.patch, > HIVE-14035.04.patch, HIVE-14035.05.patch, HIVE-14035.06.patch, > HIVE-14035.07.patch, HIVE-14035.08.patch, HIVE-14035.09.patch, > HIVE-14035.10.patch, HIVE-14035.11.patch, HIVE-14035.12.patch, > HIVE-14035.13.patch, HIVE-14035.patch > > > In current Hive version, delta files created by ACID transactions do not > allow predicate pushdown if they contain any update/delete events. This is > done to preserve correctness when following a multi-version approach during > event collapsing, where an update event overwrites an existing insert event. > This JIRA proposes to split an update event into a combination of a delete > event followed by a new insert event, that can enable predicate push down to > all delta files without breaking correctness. To support backward > compatibility for this feature, this JIRA also proposes to add some sort of > versioning to ACID that can allow different versions of ACID transactions to > co-exist together. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HIVE-14035) Enable predicate pushdown to delta files created by ACID Transactions
[ https://issues.apache.org/jira/browse/HIVE-14035?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Saket Saurabh updated HIVE-14035: - Attachment: HIVE-14035.13.patch Updated patch with comments/feedback from code review. The major changes are a refactoring of AcidUtils.getAcidState() and CompactorMR, a fix for the flush-length handling in OrcRecordUpdater, lazy initialization of deleteEventWriters, and other minor changes. The patch adds unit tests for schema evolution with ACID. It also adds a subclass of TestTxnCommands2 that replicates all the existing ACID test cases with split-update turned on. > Enable predicate pushdown to delta files created by ACID Transactions > - > > Key: HIVE-14035 > URL: https://issues.apache.org/jira/browse/HIVE-14035 > Project: Hive > Issue Type: New Feature > Components: Transactions >Reporter: Saket Saurabh >Assignee: Saket Saurabh > Attachments: HIVE-14035.02.patch, HIVE-14035.03.patch, > HIVE-14035.04.patch, HIVE-14035.05.patch, HIVE-14035.06.patch, > HIVE-14035.07.patch, HIVE-14035.08.patch, HIVE-14035.09.patch, > HIVE-14035.10.patch, HIVE-14035.11.patch, HIVE-14035.12.patch, > HIVE-14035.13.patch, HIVE-14035.patch > > > In current Hive version, delta files created by ACID transactions do not > allow predicate pushdown if they contain any update/delete events. This is > done to preserve correctness when following a multi-version approach during > event collapsing, where an update event overwrites an existing insert event. > This JIRA proposes to split an update event into a combination of a delete > event followed by a new insert event, that can enable predicate push down to > all delta files without breaking correctness. To support backward > compatibility for this feature, this JIRA also proposes to add some sort of > versioning to ACID that can allow different versions of ACID transactions to > co-exist together. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HIVE-14035) Enable predicate pushdown to delta files created by ACID Transactions
[ https://issues.apache.org/jira/browse/HIVE-14035?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Saket Saurabh updated HIVE-14035: - Status: Open (was: Patch Available) > Enable predicate pushdown to delta files created by ACID Transactions > - > > Key: HIVE-14035 > URL: https://issues.apache.org/jira/browse/HIVE-14035 > Project: Hive > Issue Type: New Feature > Components: Transactions >Reporter: Saket Saurabh >Assignee: Saket Saurabh > Attachments: HIVE-14035.02.patch, HIVE-14035.03.patch, > HIVE-14035.04.patch, HIVE-14035.05.patch, HIVE-14035.06.patch, > HIVE-14035.07.patch, HIVE-14035.08.patch, HIVE-14035.09.patch, > HIVE-14035.10.patch, HIVE-14035.11.patch, HIVE-14035.12.patch, > HIVE-14035.13.patch, HIVE-14035.patch > > > In current Hive version, delta files created by ACID transactions do not > allow predicate pushdown if they contain any update/delete events. This is > done to preserve correctness when following a multi-version approach during > event collapsing, where an update event overwrites an existing insert event. > This JIRA proposes to split an update event into a combination of a delete > event followed by a new insert event, that can enable predicate push down to > all delta files without breaking correctness. To support backward > compatibility for this feature, this JIRA also proposes to add some sort of > versioning to ACID that can allow different versions of ACID transactions to > co-exist together. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HIVE-14448) Queries with predicate fail when ETL split strategy is chosen for ACID tables
[ https://issues.apache.org/jira/browse/HIVE-14448?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Saket Saurabh updated HIVE-14448: - Description: When the ETL split strategy is applied to ACID tables with predicate pushdown (SARG enabled), split generation fails for ACID. This bug will usually be exposed when working with data at scale, because in most other cases only the BI split strategy is chosen. My guess is that this happens because the correct readerSchema is not picked up when we try to extract SARG column names. The quickest way to reproduce is to add the following unit test to ql/src/test/org/apache/hadoop/hive/ql/TestTxnCommands2.java
{code:title=ql/src/test/org/apache/hadoop/hive/ql/TestTxnCommands2.java|borderStyle=solid}
@Test
public void testETLSplitStrategyForACID() throws Exception {
  hiveConf.setVar(HiveConf.ConfVars.HIVE_ORC_SPLIT_STRATEGY, "ETL");
  hiveConf.setBoolVar(HiveConf.ConfVars.HIVEOPTINDEXFILTER, true);
  runStatementOnDriver("insert into " + Table.ACIDTBL + " values(1,2)");
  runStatementOnDriver("alter table " + Table.ACIDTBL + " compact 'MAJOR'");
  runWorker(hiveConf);
  List<String> rs = runStatementOnDriver("select * from " + Table.ACIDTBL + " where a = 1");
  int[][] resultData = new int[][] {{1,2}};
  Assert.assertEquals(stringifyValues(resultData), rs);
}
{code}
Back-trace for this failed test is as follows:
{code}
exec.Task: Job Submission failed with exception 'java.lang.RuntimeException(ORC split generation failed with exception: java.lang.NegativeArraySizeException)' java.lang.RuntimeException: ORC split generation failed with exception: java.lang.NegativeArraySizeException at org.apache.hadoop.hive.ql.io.orc.OrcInputFormat.generateSplitsInfo(OrcInputFormat.java:1570) at org.apache.hadoop.hive.ql.io.orc.OrcInputFormat.getSplits(OrcInputFormat.java:1656) at org.apache.hadoop.hive.ql.io.HiveInputFormat.addSplitsForGroup(HiveInputFormat.java:370) at 
org.apache.hadoop.hive.ql.io.HiveInputFormat.getSplits(HiveInputFormat.java:488) at org.apache.hadoop.mapreduce.JobSubmitter.writeOldSplits(JobSubmitter.java:329) at org.apache.hadoop.mapreduce.JobSubmitter.writeSplits(JobSubmitter.java:321) at org.apache.hadoop.mapreduce.JobSubmitter.submitJobInternal(JobSubmitter.java:197) at org.apache.hadoop.mapreduce.Job$10.run(Job.java:1297) at org.apache.hadoop.mapreduce.Job$10.run(Job.java:1294) at java.security.AccessController.doPrivileged(Native Method) at javax.security.auth.Subject.doAs(Subject.java:422) at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1656) at org.apache.hadoop.mapreduce.Job.submit(Job.java:1294) at org.apache.hadoop.mapred.JobClient$1.run(JobClient.java:562) at org.apache.hadoop.mapred.JobClient$1.run(JobClient.java:557) at java.security.AccessController.doPrivileged(Native Method) at javax.security.auth.Subject.doAs(Subject.java:422) at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1656) at org.apache.hadoop.mapred.JobClient.submitJobInternal(JobClient.java:557) at org.apache.hadoop.mapred.JobClient.submitJob(JobClient.java:548) at org.apache.hadoop.hive.ql.exec.mr.ExecDriver.execute(ExecDriver.java:417) at org.apache.hadoop.hive.ql.exec.mr.MapRedTask.execute(MapRedTask.java:141) at org.apache.hadoop.hive.ql.exec.Task.executeTask(Task.java:197) at org.apache.hadoop.hive.ql.exec.TaskRunner.runSequential(TaskRunner.java:100) at org.apache.hadoop.hive.ql.Driver.launchTask(Driver.java:1962) at org.apache.hadoop.hive.ql.Driver.execute(Driver.java:1653) at org.apache.hadoop.hive.ql.Driver.runInternal(Driver.java:1389) at org.apache.hadoop.hive.ql.Driver.run(Driver.java:1131) at org.apache.hadoop.hive.ql.Driver.run(Driver.java:1119) at org.apache.hadoop.hive.ql.TestTxnCommands2.runStatementOnDriver(TestTxnCommands2.java:1292) at org.apache.hadoop.hive.ql.TestTxnCommands2.testETLSplitStrategyForACID(TestTxnCommands2.java:280) at 
sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) at java.lang.reflect.Method.invoke(Method.java:498) at org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:47) at org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:12) at org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:44) at org.junit.internal.runners.statements.InvokeMethod.evaluate(InvokeMethod.java:17) at org.junit.internal.r
[jira] [Commented] (HIVE-14035) Enable predicate pushdown to delta files created by ACID Transactions
[ https://issues.apache.org/jira/browse/HIVE-14035?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15408659#comment-15408659 ]

Saket Saurabh commented on HIVE-14035:
--------------------------------------

Thanks [~sershe] for the comments and the feedback. I am working on addressing them. Sure, I will also put up a separate patch with the default enabled for the HiveQA run.

> Enable predicate pushdown to delta files created by ACID Transactions
> ---------------------------------------------------------------------
>
>                 Key: HIVE-14035
>                 URL: https://issues.apache.org/jira/browse/HIVE-14035
>             Project: Hive
>          Issue Type: New Feature
>          Components: Transactions
>            Reporter: Saket Saurabh
>            Assignee: Saket Saurabh
>         Attachments: HIVE-14035.02.patch, HIVE-14035.03.patch,
> HIVE-14035.04.patch, HIVE-14035.05.patch, HIVE-14035.06.patch,
> HIVE-14035.07.patch, HIVE-14035.08.patch, HIVE-14035.09.patch,
> HIVE-14035.10.patch, HIVE-14035.11.patch, HIVE-14035.12.patch,
> HIVE-14035.patch
>
>
> In the current Hive version, delta files created by ACID transactions do not
> allow predicate pushdown if they contain any update/delete events. This is
> done to preserve correctness when following a multi-version approach during
> event collapsing, where an update event overwrites an existing insert event.
> This JIRA proposes to split an update event into a combination of a delete
> event followed by a new insert event, which enables predicate pushdown to
> all delta files without breaking correctness. To support backward
> compatibility for this feature, this JIRA also proposes to add some sort of
> versioning to ACID that allows different versions of ACID transactions to
> coexist.

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
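The reasoning in the issue description above can be illustrated with a minimal sketch. This is not Hive's reader code: `SplitUpdateSketch`, `Event`, `rowKey` and both scan methods are hypothetical names invented for illustration, and a row is reduced to a single int value. Under those assumptions, the sketch shows why a filter applied before events are collapsed can return a stale row in the multi-version layout, and why it stays correct once an update is written as a delete plus a new insert.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.function.IntPredicate;

// Hypothetical model for illustration only; not a Hive class.
class SplitUpdateSketch {
    enum Kind { INSERT, UPDATE, DELETE }

    /** One ACID event. rowKey is an assumed stand-in for the row's identity. */
    static final class Event {
        final Kind kind;
        final long rowKey;
        final int value; // ignored for DELETE events

        Event(Kind kind, long rowKey, int value) {
            this.kind = kind;
            this.rowKey = rowKey;
            this.value = value;
        }
    }

    /** Multi-version layout: the last surviving event per row wins. */
    static List<Integer> scanMultiVersion(List<Event> events, IntPredicate filter) {
        Map<Long, Event> latest = new LinkedHashMap<>();
        for (Event e : events) {
            // Pushdown: an event whose value fails the filter never reaches the merge.
            if (e.kind != Kind.DELETE && !filter.test(e.value)) {
                continue;
            }
            latest.put(e.rowKey, e);
        }
        List<Integer> out = new ArrayList<>();
        for (Event e : latest.values()) {
            if (e.kind != Kind.DELETE) {
                out.add(e.value);
            }
        }
        return out;
    }

    /** Split-update layout: the filter touches inserts only; deletes are read unfiltered. */
    static List<Integer> scanSplitUpdate(List<Event> events, IntPredicate filter) {
        Set<Long> deleted = new HashSet<>();
        for (Event e : events) {
            if (e.kind == Kind.DELETE) {
                deleted.add(e.rowKey);
            }
        }
        List<Integer> out = new ArrayList<>();
        for (Event e : events) {
            if (e.kind == Kind.INSERT && filter.test(e.value) && !deleted.contains(e.rowKey)) {
                out.add(e.value);
            }
        }
        return out;
    }

    /** Row 1 inserted with value 1, then updated in place to 5. */
    static List<Event> multiVersionEvents() {
        return Arrays.asList(new Event(Kind.INSERT, 1L, 1), new Event(Kind.UPDATE, 1L, 5));
    }

    /** Same history, with the update written as delete(row 1) + insert(row 2, value 5). */
    static List<Event> splitUpdateEvents() {
        return Arrays.asList(new Event(Kind.INSERT, 1L, 1),
                             new Event(Kind.DELETE, 1L, 0),
                             new Event(Kind.INSERT, 2L, 5));
    }

    public static void main(String[] args) {
        IntPredicate isOne = v -> v == 1;
        System.out.println(scanMultiVersion(multiVersionEvents(), isOne)); // [1], a stale row
        System.out.println(scanSplitUpdate(splitUpdateEvents(), isOne));   // []
    }
}
```

In the first scan the update event is filtered away, so the superseded insert is all the merge sees. In the second, the delete event carries only the row identity and is never subject to the filter, so the old version is still suppressed.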
[jira] [Updated] (HIVE-14366) Conversion of a Non-ACID table to an ACID table produces non-unique primary keys
[ https://issues.apache.org/jira/browse/HIVE-14366?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Saket Saurabh updated HIVE-14366:
---------------------------------
    Status: Patch Available  (was: Open)

> Conversion of a Non-ACID table to an ACID table produces non-unique primary keys
> --------------------------------------------------------------------------------
>
>                 Key: HIVE-14366
>                 URL: https://issues.apache.org/jira/browse/HIVE-14366
>             Project: Hive
>          Issue Type: Bug
>          Components: Transactions
>            Reporter: Saket Saurabh
>            Assignee: Eugene Koifman
>         Attachments: HIVE-14366.01.patch
>
>
> When a Non-ACID table is converted to an ACID table, the primary key
> consisting of (original transaction id, bucket_id, row_id) is not generated
> uniquely. Currently, the row_id is set to 0 for most rows. This leads
> to correctness issues for such tables.
> The quickest way to reproduce is to add the following unit test to
> ql/src/test/org/apache/hadoop/hive/ql/TestTxnCommands2.java
> {code:title=ql/src/test/org/apache/hadoop/hive/ql/TestTxnCommands2.java|borderStyle=solid}
>   @Test
>   public void testOriginalReader() throws Exception {
>     FileSystem fs = FileSystem.get(hiveConf);
>     FileStatus[] status;
>     // 1. Insert five rows to Non-ACID table.
>     runStatementOnDriver("insert into " + Table.NONACIDORCTBL + "(a,b) values(1,2),(3,4),(5,6),(7,8),(9,10)");
>     // 2. Convert NONACIDORCTBL to ACID table.
>     runStatementOnDriver("alter table " + Table.NONACIDORCTBL + " SET TBLPROPERTIES ('transactional'='true')");
>     // 3. Perform a major compaction.
>     runStatementOnDriver("alter table "+ Table.NONACIDORCTBL + " compact 'MAJOR'");
>     runWorker(hiveConf);
>     // 4. Perform a delete.
>     runStatementOnDriver("delete from " + Table.NONACIDORCTBL + " where a = 1");
>     // 5. The projection should now return only (3,4), (5,6), (7,8), (9,10), since (1,2) has been deleted.
>     List<String> rs = runStatementOnDriver("select a,b from " + Table.NONACIDORCTBL + " order by a,b");
>     int[][] resultData = new int[][] {{3,4}, {5,6}, {7,8}, {9,10}};
>     Assert.assertEquals(stringifyValues(resultData), rs);
>   }
> {code}

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
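To make the failure mode concrete, here is a small sketch of why a non-unique (original transaction id, bucket_id, row_id) key is a correctness problem for deletes. It is an illustration under stated assumptions, not Hive code: `RowKeySketch`, `RowKey` and `survivorsAfterDelete` are hypothetical names, and the non-unique case simply assigns row_id 0 to every row, which is a simplification of "0 for most rows".

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Objects;
import java.util.Set;

// Hypothetical model for illustration only; not a Hive class.
class RowKeySketch {
    /** Assumed shape of the ACID primary key named in the issue description. */
    static final class RowKey {
        final long originalTxnId;
        final int bucketId;
        final long rowId;

        RowKey(long originalTxnId, int bucketId, long rowId) {
            this.originalTxnId = originalTxnId;
            this.bucketId = bucketId;
            this.rowId = rowId;
        }

        @Override
        public boolean equals(Object o) {
            if (!(o instanceof RowKey)) {
                return false;
            }
            RowKey k = (RowKey) o;
            return originalTxnId == k.originalTxnId && bucketId == k.bucketId && rowId == k.rowId;
        }

        @Override
        public int hashCode() {
            return Objects.hash(originalTxnId, bucketId, rowId);
        }
    }

    /** Returns the values of column a that survive "delete ... where a = 1". */
    static List<Integer> survivorsAfterDelete(int[] aValues, boolean uniqueRowIds) {
        List<RowKey> keys = new ArrayList<>();
        for (int i = 0; i < aValues.length; i++) {
            // Non-unique case: every converted row ends up with row_id 0.
            long rowId = uniqueRowIds ? i : 0L;
            keys.add(new RowKey(0L, 0, rowId));
        }
        // A delete is recorded by key, not by value.
        Set<RowKey> deleted = new HashSet<>();
        for (int i = 0; i < aValues.length; i++) {
            if (aValues[i] == 1) {
                deleted.add(keys.get(i));
            }
        }
        List<Integer> out = new ArrayList<>();
        for (int i = 0; i < aValues.length; i++) {
            if (!deleted.contains(keys.get(i))) {
                out.add(aValues[i]);
            }
        }
        return out;
    }

    public static void main(String[] args) {
        int[] a = {1, 3, 5, 7, 9};
        System.out.println(survivorsAfterDelete(a, true));  // [3, 5, 7, 9]
        System.out.println(survivorsAfterDelete(a, false)); // []
    }
}
```

With colliding keys, the single delete event matches every row that shares the key, so rows that were never deleted disappear from the result.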
[jira] [Updated] (HIVE-14366) Conversion of a Non-ACID table to an ACID table produces non-unique primary keys
[ https://issues.apache.org/jira/browse/HIVE-14366?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Saket Saurabh updated HIVE-14366:
---------------------------------
    Attachment: HIVE-14366.01.patch

Initial patch to fix the bug by reusing the globally unique rowNumbers found in the ORC files.

> Conversion of a Non-ACID table to an ACID table produces non-unique primary keys
> --------------------------------------------------------------------------------
>
>                 Key: HIVE-14366
>                 URL: https://issues.apache.org/jira/browse/HIVE-14366
>             Project: Hive
>          Issue Type: Bug
>          Components: Transactions
>            Reporter: Saket Saurabh
>            Assignee: Eugene Koifman
>         Attachments: HIVE-14366.01.patch
[jira] [Updated] (HIVE-14366) Conversion of a Non-ACID table to an ACID table produces non-unique primary keys
[ https://issues.apache.org/jira/browse/HIVE-14366?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Saket Saurabh updated HIVE-14366:
---------------------------------
    Description: 
When a Non-ACID table is converted to an ACID table, the primary key consisting of (original transaction id, bucket_id, row_id) is not generated uniquely. Currently, the row_id is always set to 0 for most rows. This leads to correctness issue for such tables.

Quickest way to reproduce is to add the following unit test to ql/src/test/org/apache/hadoop/hive/ql/TestTxnCommands2.java

{code:title=ql/src/test/org/apache/hadoop/hive/ql/TestTxnCommands2.java|borderStyle=solid}
  @Test
  public void testOriginalReader() throws Exception {
    FileSystem fs = FileSystem.get(hiveConf);
    FileStatus[] status;
    // 1. Insert five rows to Non-ACID table.
    runStatementOnDriver("insert into " + Table.NONACIDORCTBL + "(a,b) values(1,2),(3,4),(5,6),(7,8),(9,10)");
    // 2. Convert NONACIDORCTBL to ACID table.
    runStatementOnDriver("alter table " + Table.NONACIDORCTBL + " SET TBLPROPERTIES ('transactional'='true')");
    // 3. Perform a major compaction.
    runStatementOnDriver("alter table "+ Table.NONACIDORCTBL + " compact 'MAJOR'");
    runWorker(hiveConf);
    // 4. Perform a delete.
    runStatementOnDriver("delete from " + Table.NONACIDORCTBL + " where a = 1");
    // 5. Now do a projection should have (3,4) (5,6),(7,8),(9,10) only since (1,2) has been deleted.
    List<String> rs = runStatementOnDriver("select a,b from " + Table.NONACIDORCTBL + " order by a,b");
    int[][] resultData = new int[][] {{3,4}, {5,6}, {7,8}, {9,10}};
    Assert.assertEquals(stringifyValues(resultData), rs);
  }
{code}

  was:
When a Non-ACID table is converted to an ACID table, the primary key consisting of (original transaction id, bucket_id, row_id) is not generated uniquely. Currently, the row_id is always set to 0 for most rows. This leads to correctness issue for such tables.

Quickest way to reproduce is to add the following unit test to ql/src/test/org/apache/hadoop/hive/ql/TestTxnCommands2.java

{code:title=ql/src/test/org/apache/hadoop/hive/ql/TestTxnCommands2.java|borderStyle=solid}
  @Test
  public void testOriginalReader() throws Exception {
    FileSystem fs = FileSystem.get(hiveConf);
    FileStatus[] status;
    // 1. Insert five rows to Non-ACID table.
    runStatementOnDriver("insert into " + Table.NONACIDORCTBL + "(a,b) values(1,2),(3,4),(5,6),(7,8),(9,10)");
    // 2. Convert NONACIDORCTBL to ACID table.
    runStatementOnDriver("alter table " + Table.NONACIDORCTBL + " SET TBLPROPERTIES ('transactional'='true')");
    // 3. Perform a major compaction.
    runStatementOnDriver("alter table "+ Table.NONACIDORCTBL + " compact 'MAJOR'");
    runWorker(hiveConf);
    // 3. Perform a delete.
    runStatementOnDriver("delete from " + Table.NONACIDORCTBL + " where a = 1");
    // Now do a projection should have (3,4) (5,6),(7,8),(9,10) only since (1,2) has been deleted.
    List<String> rs = runStatementOnDriver("select a,b from " + Table.NONACIDORCTBL + " order by a,b");
    int[][] resultData = new int[][] {{3,4}, {5,6}, {7,8}, {9,10}};
    Assert.assertEquals(stringifyValues(resultData), rs);
  }
{code}

> Conversion of a Non-ACID table to an ACID table produces non-unique primary keys
> --------------------------------------------------------------------------------
>
>                 Key: HIVE-14366
>                 URL: https://issues.apache.org/jira/browse/HIVE-14366
>             Project: Hive
>          Issue Type: Bug
>          Components: Transactions
>            Reporter: Saket Saurabh
[jira] [Updated] (HIVE-14035) Enable predicate pushdown to delta files created by ACID Transactions
[ https://issues.apache.org/jira/browse/HIVE-14035?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Saket Saurabh updated HIVE-14035:
---------------------------------
    Status: Patch Available  (was: Open)

> Enable predicate pushdown to delta files created by ACID Transactions
> ---------------------------------------------------------------------
>
>                 Key: HIVE-14035
>                 URL: https://issues.apache.org/jira/browse/HIVE-14035
>             Project: Hive
>          Issue Type: New Feature
>          Components: Transactions
>            Reporter: Saket Saurabh
>            Assignee: Saket Saurabh
>         Attachments: HIVE-14035.02.patch, HIVE-14035.03.patch,
> HIVE-14035.04.patch, HIVE-14035.05.patch, HIVE-14035.06.patch,
> HIVE-14035.07.patch, HIVE-14035.08.patch, HIVE-14035.09.patch,
> HIVE-14035.10.patch, HIVE-14035.11.patch, HIVE-14035.12.patch,
> HIVE-14035.patch
[jira] [Updated] (HIVE-14035) Enable predicate pushdown to delta files created by ACID Transactions
[ https://issues.apache.org/jira/browse/HIVE-14035?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Saket Saurabh updated HIVE-14035:
---------------------------------
    Attachment: HIVE-14035.12.patch

Refactor the way delete event writers are created for the compaction case in favor of a better abstraction. While working on improving compaction for ACID, I discovered that this design would now be much simpler.

> Enable predicate pushdown to delta files created by ACID Transactions
> ---------------------------------------------------------------------
>
>                 Key: HIVE-14035
>                 URL: https://issues.apache.org/jira/browse/HIVE-14035
>             Project: Hive
>          Issue Type: New Feature
>          Components: Transactions
>            Reporter: Saket Saurabh
>            Assignee: Saket Saurabh
>         Attachments: HIVE-14035.02.patch, HIVE-14035.03.patch,
> HIVE-14035.04.patch, HIVE-14035.05.patch, HIVE-14035.06.patch,
> HIVE-14035.07.patch, HIVE-14035.08.patch, HIVE-14035.09.patch,
> HIVE-14035.10.patch, HIVE-14035.11.patch, HIVE-14035.12.patch,
> HIVE-14035.patch
[jira] [Updated] (HIVE-14035) Enable predicate pushdown to delta files created by ACID Transactions
[ https://issues.apache.org/jira/browse/HIVE-14035?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Saket Saurabh updated HIVE-14035:
---------------------------------
    Status: Open  (was: Patch Available)

> Enable predicate pushdown to delta files created by ACID Transactions
> ---------------------------------------------------------------------
>
>                 Key: HIVE-14035
>                 URL: https://issues.apache.org/jira/browse/HIVE-14035
>             Project: Hive
>          Issue Type: New Feature
>          Components: Transactions
>            Reporter: Saket Saurabh
>            Assignee: Saket Saurabh
>         Attachments: HIVE-14035.02.patch, HIVE-14035.03.patch,
> HIVE-14035.04.patch, HIVE-14035.05.patch, HIVE-14035.06.patch,
> HIVE-14035.07.patch, HIVE-14035.08.patch, HIVE-14035.09.patch,
> HIVE-14035.10.patch, HIVE-14035.11.patch, HIVE-14035.12.patch,
> HIVE-14035.patch
[jira] [Updated] (HIVE-14233) Improve vectorization for ACID by eliminating row-by-row stitching
[ https://issues.apache.org/jira/browse/HIVE-14233?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Saket Saurabh updated HIVE-14233:
---------------------------------
    Attachment: HIVE-14233.05.patch

Further optimize vectorized row batch creation/processing by no longer setting it to null after the user payload columns are sent up the operator pipeline.

> Improve vectorization for ACID by eliminating row-by-row stitching
> ------------------------------------------------------------------
>
>                 Key: HIVE-14233
>                 URL: https://issues.apache.org/jira/browse/HIVE-14233
>             Project: Hive
>          Issue Type: New Feature
>          Components: Transactions, Vectorization
>            Reporter: Saket Saurabh
>            Assignee: Saket Saurabh
>         Attachments: HIVE-14233.01.patch, HIVE-14233.02.patch,
> HIVE-14233.03.patch, HIVE-14233.04.patch, HIVE-14233.05.patch
>
>
> This JIRA proposes to improve vectorization for ACID by eliminating
> row-by-row stitching when reading back ACID files. In the current
> implementation, a vectorized row batch is created by populating the batch one
> row at a time, before the vectorized batch is passed up along the operator
> pipeline. This row-by-row stitching was needed because the ACID
> insert/update/delete events from various delta files had to be merged
> together before the actual version of a given row could be determined.
> HIVE-14035 has enabled us to break away from that limitation by splitting
> ACID update events into a combination of delete+insert. In fact, it has now
> enabled us to create splits on delta files.
> Building on top of HIVE-14035, this JIRA proposes to solve this earlier
> bottleneck in the vectorized code path for ACID by directly reading row
> batches from the underlying ORC files and avoiding any stitching altogether.
> Once a row batch is read from the split (which may be on a base/delta file),
> the deleted rows will be found by cross-referencing them against a data
> structure that keeps track only of delete events (found in the
> deleted_delta files). This will lead to a large performance gain when reading
> ACID files in vectorized fashion, while enabling further optimizations on
> top of it in the future.

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
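The cross-referencing step described above can be sketched as follows. This is a rough illustration, not the patch: `DeleteFilterSketch`, `Batch` and `dropDeleted` are hypothetical, `Batch` is a tiny stand-in that only mimics the selection-vector idea of a vectorized row batch, and each row's identity is assumed to fit in a single long. The point is that a batch read straight from a file can have deleted rows masked out without rebuilding it row by row.

```java
import java.util.HashSet;
import java.util.Set;

// Hypothetical model for illustration only; not a Hive class.
class DeleteFilterSketch {
    /** Minimal stand-in for a row batch with a selection vector. */
    static final class Batch {
        final long[] rowKeys;  // assumed per-row identity, one entry per row read
        final int[] selected;  // indexes of the rows that are still live
        int size;              // number of live rows
        boolean selectedInUse; // true once 'selected' must be consulted

        Batch(long[] rowKeys) {
            this.rowKeys = rowKeys;
            this.selected = new int[rowKeys.length];
            this.size = rowKeys.length;
        }
    }

    /** Masks out rows whose key appears in the set of delete events. */
    static void dropDeleted(Batch batch, Set<Long> deletedKeys) {
        int live = 0;
        for (int i = 0; i < batch.rowKeys.length; i++) {
            if (!deletedKeys.contains(batch.rowKeys[i])) {
                batch.selected[live++] = i; // keep the row by index; column data is not copied
            }
        }
        batch.size = live;
        batch.selectedInUse = true;
    }

    public static void main(String[] args) {
        Batch batch = new Batch(new long[] {10L, 11L, 12L, 13L});
        Set<Long> deleted = new HashSet<>();
        deleted.add(11L);
        deleted.add(13L);
        dropDeleted(batch, deleted);
        System.out.println(batch.size); // 2
    }
}
```

Downstream operators would then iterate over `selected[0..size)` instead of over every row, so the column data itself is never touched.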
[jira] [Updated] (HIVE-14233) Improve vectorization for ACID by eliminating row-by-row stitching
[ https://issues.apache.org/jira/browse/HIVE-14233?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Saket Saurabh updated HIVE-14233:
---------------------------------
    Attachment: HIVE-14233.04.patch

Fix a NullPointerException that was thrown when vectorized row batches were used across subsequent nextBatch() calls.

> Improve vectorization for ACID by eliminating row-by-row stitching
> ------------------------------------------------------------------
>
>                 Key: HIVE-14233
>                 URL: https://issues.apache.org/jira/browse/HIVE-14233
>             Project: Hive
>          Issue Type: New Feature
>          Components: Transactions, Vectorization
>            Reporter: Saket Saurabh
>            Assignee: Saket Saurabh
>         Attachments: HIVE-14233.01.patch, HIVE-14233.02.patch,
> HIVE-14233.03.patch, HIVE-14233.04.patch
[jira] [Updated] (HIVE-14035) Enable predicate pushdown to delta files created by ACID Transactions
[ https://issues.apache.org/jira/browse/HIVE-14035?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Saket Saurabh updated HIVE-14035:
---------------------------------
    Status: Patch Available  (was: Open)

> Enable predicate pushdown to delta files created by ACID Transactions
> ---------------------------------------------------------------------
>
>                 Key: HIVE-14035
>                 URL: https://issues.apache.org/jira/browse/HIVE-14035
>             Project: Hive
>          Issue Type: New Feature
>          Components: Transactions
>            Reporter: Saket Saurabh
>            Assignee: Saket Saurabh
>         Attachments: HIVE-14035.02.patch, HIVE-14035.03.patch,
> HIVE-14035.04.patch, HIVE-14035.05.patch, HIVE-14035.06.patch,
> HIVE-14035.07.patch, HIVE-14035.08.patch, HIVE-14035.09.patch,
> HIVE-14035.10.patch, HIVE-14035.11.patch, HIVE-14035.patch
[jira] [Updated] (HIVE-14035) Enable predicate pushdown to delta files created by ACID Transactions
[ https://issues.apache.org/jira/browse/HIVE-14035?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Saket Saurabh updated HIVE-14035:
---------------------------------
    Status: Open  (was: Patch Available)

> Enable predicate pushdown to delta files created by ACID Transactions
> ---------------------------------------------------------------------
>
>                 Key: HIVE-14035
>                 URL: https://issues.apache.org/jira/browse/HIVE-14035
>             Project: Hive
>          Issue Type: New Feature
>          Components: Transactions
>            Reporter: Saket Saurabh
>            Assignee: Saket Saurabh
>         Attachments: HIVE-14035.02.patch, HIVE-14035.03.patch,
> HIVE-14035.04.patch, HIVE-14035.05.patch, HIVE-14035.06.patch,
> HIVE-14035.07.patch, HIVE-14035.08.patch, HIVE-14035.09.patch,
> HIVE-14035.10.patch, HIVE-14035.11.patch, HIVE-14035.patch
[jira] [Updated] (HIVE-14035) Enable predicate pushdown to delta files created by ACID Transactions
[ https://issues.apache.org/jira/browse/HIVE-14035?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Saket Saurabh updated HIVE-14035:
---------------------------------
    Attachment: HIVE-14035.11.patch

Add more UTs to specifically test AcidUtils and various compaction scenarios for split-update.

> Enable predicate pushdown to delta files created by ACID Transactions
> ---------------------------------------------------------------------
>
>                 Key: HIVE-14035
>                 URL: https://issues.apache.org/jira/browse/HIVE-14035
>             Project: Hive
>          Issue Type: New Feature
>          Components: Transactions
>            Reporter: Saket Saurabh
>            Assignee: Saket Saurabh
>         Attachments: HIVE-14035.02.patch, HIVE-14035.03.patch,
> HIVE-14035.04.patch, HIVE-14035.05.patch, HIVE-14035.06.patch,
> HIVE-14035.07.patch, HIVE-14035.08.patch, HIVE-14035.09.patch,
> HIVE-14035.10.patch, HIVE-14035.11.patch, HIVE-14035.patch
[jira] [Updated] (HIVE-14233) Improve vectorization for ACID by eliminating row-by-row stitching
[ https://issues.apache.org/jira/browse/HIVE-14233?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Saket Saurabh updated HIVE-14233:
---------------------------------
    Attachment: HIVE-14233.03.patch

Fix the initial schema of the vectorized row batch to be based on the base reader's schema, so that projected columns are accounted for.

> Improve vectorization for ACID by eliminating row-by-row stitching
> ------------------------------------------------------------------
>
>                 Key: HIVE-14233
>                 URL: https://issues.apache.org/jira/browse/HIVE-14233
>             Project: Hive
>          Issue Type: New Feature
>          Components: Transactions, Vectorization
>            Reporter: Saket Saurabh
>            Assignee: Saket Saurabh
>         Attachments: HIVE-14233.01.patch, HIVE-14233.02.patch,
> HIVE-14233.03.patch
[jira] [Updated] (HIVE-14233) Improve vectorization for ACID by eliminating row-by-row stitching
[ https://issues.apache.org/jira/browse/HIVE-14233?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Saket Saurabh updated HIVE-14233:
---------------------------------
    Attachment: HIVE-14233.02.patch

Second version of the patch, with an optimized code path to remove deleted rows from a given vectorized row batch. This is done by loading all the delete events into memory at once and using an optimized binary search algorithm.

> Improve vectorization for ACID by eliminating row-by-row stitching
> ------------------------------------------------------------------
>
>                 Key: HIVE-14233
>                 URL: https://issues.apache.org/jira/browse/HIVE-14233
>             Project: Hive
>          Issue Type: New Feature
>          Components: Transactions, Vectorization
>            Reporter: Saket Saurabh
>            Assignee: Saket Saurabh
>         Attachments: HIVE-14233.01.patch, HIVE-14233.02.patch
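The comment on HIVE-14233.02.patch mentions holding all delete events in memory and locating them with a binary search. The patch itself is not reproduced here, so the following is only a sketch of that general technique under stated assumptions: `SortedDeleteIndexSketch` is a hypothetical class, delete events are assumed to be encodable as single long keys, and the range check before the search is one plausible shortcut rather than the optimization the patch actually uses.

```java
import java.util.Arrays;

// Hypothetical model for illustration only; not a Hive class.
class SortedDeleteIndexSketch {
    private final long[] sortedKeys;

    /** Loads every delete key at once and sorts them so lookups can use binary search. */
    SortedDeleteIndexSketch(long[] deleteKeys) {
        this.sortedKeys = deleteKeys.clone();
        Arrays.sort(this.sortedKeys);
    }

    /** Returns true if a delete event exists for the given key. */
    boolean isDeleted(long key) {
        int n = sortedKeys.length;
        // Cheap rejection: keys outside [min, max] cannot match any delete event.
        if (n == 0 || key < sortedKeys[0] || key > sortedKeys[n - 1]) {
            return false;
        }
        // Arrays.binarySearch returns a non-negative index only when the key is present.
        return Arrays.binarySearch(sortedKeys, key) >= 0;
    }

    public static void main(String[] args) {
        SortedDeleteIndexSketch index = new SortedDeleteIndexSketch(new long[] {42L, 7L, 19L});
        System.out.println(index.isDeleted(19L));  // true
        System.out.println(index.isDeleted(20L));  // false
        System.out.println(index.isDeleted(100L)); // false
    }
}
```

A sorted primitive array keeps memory overhead per delete event low compared with a hash set of boxed keys, at the cost of O(log n) rather than O(1) lookups.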
[jira] [Comment Edited] (HIVE-14035) Enable predicate pushdown to delta files created by ACID Transactions
[ https://issues.apache.org/jira/browse/HIVE-14035?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15386761#comment-15386761 ]

Saket Saurabh edited comment on HIVE-14035 at 7/20/16 10:52 PM:
----------------------------------------------------------------

Updated the patch by rebasing with master. No additional code changes. Patch (#10)

was (Author: saketj):
Updated the patch by rebasing with master. No additional code changes.

> Enable predicate pushdown to delta files created by ACID Transactions
> ---------------------------------------------------------------------
>
>                 Key: HIVE-14035
>                 URL: https://issues.apache.org/jira/browse/HIVE-14035
>             Project: Hive
>          Issue Type: New Feature
>          Components: Transactions
>            Reporter: Saket Saurabh
>            Assignee: Saket Saurabh
>         Attachments: HIVE-14035.02.patch, HIVE-14035.03.patch,
> HIVE-14035.04.patch, HIVE-14035.05.patch, HIVE-14035.06.patch,
> HIVE-14035.07.patch, HIVE-14035.08.patch, HIVE-14035.09.patch,
> HIVE-14035.10.patch, HIVE-14035.patch
[jira] [Updated] (HIVE-14035) Enable predicate pushdown to delta files created by ACID Transactions
[ https://issues.apache.org/jira/browse/HIVE-14035?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Saket Saurabh updated HIVE-14035:
---------------------------------
    Attachment: HIVE-14035.10.patch

Updated the patch by rebasing with master. No additional code changes.

> Enable predicate pushdown to delta files created by ACID Transactions
> ---------------------------------------------------------------------
>
>                 Key: HIVE-14035
>                 URL: https://issues.apache.org/jira/browse/HIVE-14035
>             Project: Hive
>          Issue Type: New Feature
>          Components: Transactions
>            Reporter: Saket Saurabh
>            Assignee: Saket Saurabh
>         Attachments: HIVE-14035.02.patch, HIVE-14035.03.patch,
> HIVE-14035.04.patch, HIVE-14035.05.patch, HIVE-14035.06.patch,
> HIVE-14035.07.patch, HIVE-14035.08.patch, HIVE-14035.09.patch,
> HIVE-14035.10.patch, HIVE-14035.patch
[jira] [Updated] (HIVE-14035) Enable predicate pushdown to delta files created by ACID Transactions
[ https://issues.apache.org/jira/browse/HIVE-14035?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Saket Saurabh updated HIVE-14035: - Status: Open (was: Patch Available)
[jira] [Updated] (HIVE-14035) Enable predicate pushdown to delta files created by ACID Transactions
[ https://issues.apache.org/jira/browse/HIVE-14035?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Saket Saurabh updated HIVE-14035: - Status: Patch Available (was: Open)
[jira] [Updated] (HIVE-14233) Improve vectorization for ACID by eliminating row-by-row stitching
[ https://issues.apache.org/jira/browse/HIVE-14233?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Saket Saurabh updated HIVE-14233:

Attachment: HIVE-14233.01.patch

First version of the patch with a single high-level integration test. There is considerable scope to optimize how delete events are used to mark the selected vector inside the vectorized row batch. These optimizations and finer-grained unit tests will be part of subsequent patches.

> Improve vectorization for ACID by eliminating row-by-row stitching
>
> Key: HIVE-14233
> URL: https://issues.apache.org/jira/browse/HIVE-14233
> Project: Hive
> Issue Type: New Feature
> Components: Transactions, Vectorization
> Reporter: Saket Saurabh
> Assignee: Saket Saurabh
> Attachments: HIVE-14233.01.patch
>
> This JIRA proposes to improve vectorization for ACID by eliminating row-by-row stitching when reading back ACID files. In the current implementation, a vectorized row batch is created by populating the batch one row at a time, before the batch is passed up the operator pipeline. This row-by-row stitching was necessary because the ACID insert/update/delete events from various delta files had to be merged before the current version of a given row could be determined. HIVE-14035 removes that limitation by splitting ACID update events into a combination of delete+insert. In fact, it also makes it possible to create splits on delta files.
>
> Building on HIVE-14035, this JIRA proposes to remove this bottleneck in the vectorized code path for ACID by reading row batches directly from the underlying ORC files and avoiding stitching altogether. Once a row batch is read from the split (which may be on a base/delta file), the deleted rows are found by cross-referencing them against a data structure that keeps track only of delete events (found in the deleted_delta files). This should lead to a large performance gain when reading ACID files in vectorized fashion, and enables further optimizations on top of it.

-- This message was sent by Atlassian JIRA (v6.3.4#6332)
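The cross-referencing step described in HIVE-14233 can be sketched roughly as follows. This is an illustration only, not the patch's implementation: `RowBatch`, `mark_deleted` and `materialize` are hypothetical names, the row key is assumed to be an (original transaction, bucket, row id) tuple, and the delete-event data structure is assumed to be a plain in-memory set.

```python
from dataclasses import dataclass, field


@dataclass
class RowBatch:
    """Hypothetical columnar batch read directly from a base or delta file."""
    row_keys: list                  # one (original_txn, bucket, row_id) per row
    columns: dict                   # column name -> list of values
    selected: list = field(default_factory=list)  # indices of live rows


def mark_deleted(batch, deleted_keys):
    """Fill the selected vector with the rows that have no delete event.

    The batch is never rebuilt row by row; only the index list changes.
    """
    batch.selected = [i for i, key in enumerate(batch.row_keys)
                      if key not in deleted_keys]
    return batch


def materialize(batch, column):
    """Read one column through the selected vector."""
    values = batch.columns[column]
    return [values[i] for i in batch.selected]


# Delete events collected up front (assumed to fit in memory in this sketch).
deleted = {(1, 0, 1), (1, 0, 3)}
batch = RowBatch(
    row_keys=[(1, 0, 0), (1, 0, 1), (1, 0, 2), (1, 0, 3)],
    columns={"name": ["a", "b", "c", "d"]},
)
mark_deleted(batch, deleted)
```

The point of the design is that column data stays in the layout it was read in, and deletes are applied by shrinking the selected vector rather than by copying surviving rows into a new batch.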
[jira] [Commented] (HIVE-14199) Enable Bucket Pruning for ACID tables
[ https://issues.apache.org/jira/browse/HIVE-14199?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15368883#comment-15368883 ]

Saket Saurabh commented on HIVE-14199:

Thanks [~gopalv] for the comment. I have updated the patch with these changes. Currently, to disable the code path for legacy layouts, I do not match the bucketName against AcidUtils.LEGACY_BUCKET_DIGIT_PATTERN, so I expect these legacy layouts to be ignored.

> Enable Bucket Pruning for ACID tables
>
> Key: HIVE-14199
> URL: https://issues.apache.org/jira/browse/HIVE-14199
> Project: Hive
> Issue Type: Improvement
> Components: Transactions
> Reporter: Saket Saurabh
> Assignee: Saket Saurabh
> Attachments: HIVE-14199.01.patch, HIVE-14199.02.patch
>
> Currently, ACID tables do not benefit from the bucket pruning feature introduced in HIVE-11525. The reason is that bucket pruning happens at the split generation level, and for ACID the delta files were traditionally never split. The parallelism for ACID was therefore restricted to the number of buckets: there were as many splits as buckets, and each worker processing one split would inevitably read all the delta files for that bucket, even when the query required only one of the buckets to be read.
>
> However, HIVE-14035 now enables the delta files to be split as well. This means we now have enough information at the split generation level to determine the appropriate buckets to process for the delta files. This allows us to prune unnecessary buckets for delta files and should lead to a good performance gain for a large number of selective queries on ACID tables.

-- This message was sent by Atlassian JIRA (v6.3.4#6332)
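As a rough illustration of pruning at split generation, the sketch below filters candidate file paths by bucket id before any split is created. It is not taken from the patch: `prune_splits` is a hypothetical helper, the `bucket_<N>` file naming is an assumption about the layout, the set of wanted buckets is taken as given rather than computed from the query predicate, and keeping files whose names do not match the pattern (for example legacy layouts) is a conservative choice made for this sketch only.

```python
import re

# Assumed naming for bucket files inside base/delta directories.
BUCKET_FILE = re.compile(r"bucket_(\d+)$")


def prune_splits(paths, wanted_buckets):
    """Keep only the files whose bucket id the query can actually touch.

    Files that do not follow the assumed naming are kept, because this
    sketch cannot tell which bucket they belong to.
    """
    kept = []
    for path in paths:
        match = BUCKET_FILE.search(path)
        if match is None or int(match.group(1)) in wanted_buckets:
            kept.append(path)
    return kept


candidates = [
    "base_0000004/bucket_00000",
    "base_0000004/bucket_00001",
    "delta_0000005_0000005/bucket_00000",
    "delta_0000005_0000005/bucket_00001",
    "000000_0",  # legacy-style name, not matched by the pattern
]
pruned = prune_splits(candidates, wanted_buckets={1})
```

With delta files eligible for splitting, the same filter applies to base and delta files alike, so a selective query no longer has to open the delta files of buckets it does not need.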
[jira] [Updated] (HIVE-14199) Enable Bucket Pruning for ACID tables
[ https://issues.apache.org/jira/browse/HIVE-14199?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Saket Saurabh updated HIVE-14199: - Attachment: HIVE-14199.02.patch
[jira] [Updated] (HIVE-14199) Enable Bucket Pruning for ACID tables
[ https://issues.apache.org/jira/browse/HIVE-14199?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Saket Saurabh updated HIVE-14199: - Attachment: HIVE-14199.01.patch Initial commit for this feature. Please note it is dependent on HIVE-14035.
[jira] [Updated] (HIVE-14035) Enable predicate pushdown to delta files created by ACID Transactions
[ https://issues.apache.org/jira/browse/HIVE-14035?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Saket Saurabh updated HIVE-14035: - Attachment: HIVE-14035.09.patch Patch (#09) same as the previous patch (#08) with no code modifications, except that it is rebased with master and has trailing whitespaces fixed. Mirrors the patch uploaded at review board.
[jira] [Updated] (HIVE-14035) Enable predicate pushdown to delta files created by ACID Transactions
[ https://issues.apache.org/jira/browse/HIVE-14035?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Saket Saurabh updated HIVE-14035: - Status: Patch Available (was: Open)
[jira] [Updated] (HIVE-14035) Enable predicate pushdown to delta files created by ACID Transactions
[ https://issues.apache.org/jira/browse/HIVE-14035?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Saket Saurabh updated HIVE-14035: - Status: Open (was: Patch Available)
[jira] [Commented] (HIVE-14035) Enable predicate pushdown to delta files created by ACID Transactions
[ https://issues.apache.org/jira/browse/HIVE-14035?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15366408#comment-15366408 ] Saket Saurabh commented on HIVE-14035: -- Link for review board for this patch: https://reviews.apache.org/r/49766/ Thanks
[jira] [Updated] (HIVE-14035) Enable predicate pushdown to delta files created by ACID Transactions
[ https://issues.apache.org/jira/browse/HIVE-14035?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Saket Saurabh updated HIVE-14035: - Status: Patch Available (was: Open)
[jira] [Updated] (HIVE-14035) Enable predicate pushdown to delta files created by ACID Transactions
[ https://issues.apache.org/jira/browse/HIVE-14035?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Saket Saurabh updated HIVE-14035: - Attachment: HIVE-14035.08.patch Allow transactional_properties to be set when converting non-acid tables to acid tables and add more unit test cases
[jira] [Updated] (HIVE-14035) Enable predicate pushdown to delta files created by ACID Transactions
[ https://issues.apache.org/jira/browse/HIVE-14035?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Saket Saurabh updated HIVE-14035: - Status: Open (was: Patch Available)