[ https://issues.apache.org/jira/browse/HIVE-14233?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15450642#comment-15450642 ]
Hive QA commented on HIVE-14233:
--------------------------------

Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12826257/HIVE-14233.12.patch

{color:green}SUCCESS:{color} +1 due to 3 test(s) being added or modified.

{color:red}ERROR:{color} -1 due to 4 failed/errored test(s), 10502 tests executed
*Failed tests:*
{noformat}
org.apache.hadoop.hive.cli.TestCliDriver.org.apache.hadoop.hive.cli.TestCliDriver
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[vector_join_part_col_char]
org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver[explainuser_3]
org.apache.hadoop.hive.ql.TestMTQueries.testMTQueries1
{noformat}

Test results: https://builds.apache.org/job/PreCommit-HIVE-MASTER-Build/1047/testReport
Console output: https://builds.apache.org/job/PreCommit-HIVE-MASTER-Build/1047/console
Test logs: http://ec2-204-236-174-241.us-west-1.compute.amazonaws.com/logs/PreCommit-HIVE-MASTER-Build-1047/

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.TestCheckPhase
Executing org.apache.hive.ptest.execution.PrepPhase
Executing org.apache.hive.ptest.execution.ExecutionPhase
Executing org.apache.hive.ptest.execution.ReportingPhase
Tests exited with: TestsFailedException: 4 tests failed
{noformat}

This message is automatically generated.
ATTACHMENT ID: 12826257 - PreCommit-HIVE-MASTER-Build

> Improve vectorization for ACID by eliminating row-by-row stitching
> ------------------------------------------------------------------
>
>                 Key: HIVE-14233
>                 URL: https://issues.apache.org/jira/browse/HIVE-14233
>             Project: Hive
>          Issue Type: New Feature
>          Components: Transactions, Vectorization
>            Reporter: Saket Saurabh
>            Assignee: Saket Saurabh
>         Attachments: HIVE-14233.01.patch, HIVE-14233.02.patch,
> HIVE-14233.03.patch, HIVE-14233.04.patch, HIVE-14233.05.patch,
> HIVE-14233.06.patch, HIVE-14233.07.patch, HIVE-14233.08.patch,
> HIVE-14233.09.patch, HIVE-14233.10.patch, HIVE-14233.11.patch,
> HIVE-14233.12.patch
>
>
> This JIRA proposes to improve vectorization for ACID by eliminating
> row-by-row stitching when reading back ACID files. In the current
> implementation, a vectorized row batch is populated one row at a time
> before it is passed up the operator pipeline. This row-by-row stitching
> was necessary because the ACID insert/update/delete events from the
> various delta files had to be merged before the current version of a
> given row could be determined.
> HIVE-14035 has made it possible to remove that limitation by splitting
> each ACID update event into a delete event plus an insert event. It has
> also made it possible to create splits on delta files.
> Building on HIVE-14035, this JIRA proposes to remove this bottleneck from
> the vectorized ACID code path by reading row batches directly from the
> underlying ORC files, avoiding stitching altogether. Once a row batch is
> read from the split (which may be on a base or delta file), the deleted
> rows will be identified by cross-referencing them against a data structure
> that tracks only delete events (found in the delete_delta files).
> This will lead to a large performance gain when reading ACID files in
> vectorized fashion, and will enable further optimizations to be built on
> top of it.
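The delete-event cross-referencing described above can be sketched in Java. This is a hypothetical, simplified model and not Hive's actual reader: the names `AcidBatchFilterSketch`, `RowId`, `Batch`, `DeleteEventRegistry` and `liveValues` are invented for illustration, the (original transaction, bucket, row id) key is an assumed row identifier, and the selection vector only loosely mirrors the `selected`/`size` fields of Hive's `VectorizedRowBatch`. It shows an update written as a delete plus an insert, after which each batch (base or delta) is filtered independently against the delete events instead of being merged row by row across files.

```java
import java.util.Arrays;
import java.util.TreeSet;

// Hypothetical sketch: these classes are NOT Hive's reader API.
public class AcidBatchFilterSketch {

    // Assumed row key: (original transaction, bucket, row id), ordered lexicographically.
    record RowId(long originalTxn, int bucket, long rowId) implements Comparable<RowId> {
        public int compareTo(RowId o) {
            int c = Long.compare(originalTxn, o.originalTxn);
            if (c != 0) return c;
            c = Integer.compare(bucket, o.bucket);
            if (c != 0) return c;
            return Long.compare(rowId, o.rowId);
        }
    }

    // Minimal stand-in for a row batch read directly from one file:
    // row keys, one payload column, and a selection vector of surviving rows.
    static final class Batch {
        final RowId[] ids;
        final long[] values;
        final int[] selected;
        int size;

        Batch(RowId[] ids, long[] values) {
            this.ids = ids;
            this.values = values;
            this.selected = new int[ids.length];
            this.size = ids.length;
            for (int i = 0; i < size; i++) selected[i] = i;
        }
    }

    // Holds only delete events (assumed to be loaded from delete_delta files).
    static final class DeleteEventRegistry {
        private final TreeSet<RowId> deleted = new TreeSet<>();

        void addDelete(RowId id) { deleted.add(id); }

        // Cross-reference each selected row against the delete events and
        // compact the selection vector in place (kept <= i, so this is safe).
        void filter(Batch batch) {
            int kept = 0;
            for (int i = 0; i < batch.size; i++) {
                int row = batch.selected[i];
                if (!deleted.contains(batch.ids[row])) {
                    batch.selected[kept++] = row;
                }
            }
            batch.size = kept;
        }
    }

    // Payload values of the rows that survived filtering, in batch order.
    static long[] liveValues(Batch b) {
        long[] out = new long[b.size];
        for (int i = 0; i < b.size; i++) out[i] = b.values[b.selected[i]];
        return out;
    }

    public static void main(String[] args) {
        // Base file: three rows written by transaction 1.
        Batch base = new Batch(
            new RowId[] {new RowId(1, 0, 0), new RowId(1, 0, 1), new RowId(1, 0, 2)},
            new long[] {10, 20, 30});
        // Transaction 2 updates row (1, 0, 1): the new version is an insert in a delta file...
        Batch delta = new Batch(new RowId[] {new RowId(2, 0, 0)}, new long[] {21});
        // ...and the old version is recorded as a delete event.
        DeleteEventRegistry registry = new DeleteEventRegistry();
        registry.addDelete(new RowId(1, 0, 1));

        // Each batch is filtered on its own; no merge across files is needed.
        registry.filter(base);
        registry.filter(delta);
        System.out.println(Arrays.toString(liveValues(base)));   // [10, 30]
        System.out.println(Arrays.toString(liveValues(delta)));  // [21]
    }
}
```

Using a sorted set keeps the sketch short; a real reader could instead exploit the sort order of delete events and row batches to do a merge-style scan, but that choice is an assumption here, not something the issue specifies.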