[ https://issues.apache.org/jira/browse/HIVE-4160?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13786080#comment-13786080 ]
Lars Francke commented on HIVE-4160:
------------------------------------

This is a huge patch, and it's hard to see whether it changes anything for the end user. As we'd like to keep the Wiki up to date, it'd be great if someone could comment on whether there are any configuration options besides {{hive.vectorized.execution.enabled}}, or anything else that should be documented.

Thanks!

> Vectorized Query Execution in Hive
> ----------------------------------
>
>                 Key: HIVE-4160
>                 URL: https://issues.apache.org/jira/browse/HIVE-4160
>             Project: Hive
>          Issue Type: New Feature
>            Reporter: Jitendra Nath Pandey
>            Assignee: Jitendra Nath Pandey
>         Attachments: Hive-Vectorized-Query-Execution-Design.docx,
> Hive-Vectorized-Query-Execution-Design-rev10.docx,
> Hive-Vectorized-Query-Execution-Design-rev10.docx,
> Hive-Vectorized-Query-Execution-Design-rev10.pdf,
> Hive-Vectorized-Query-Execution-Design-rev11.docx,
> Hive-Vectorized-Query-Execution-Design-rev11.pdf,
> Hive-Vectorized-Query-Execution-Design-rev2.docx,
> Hive-Vectorized-Query-Execution-Design-rev3.docx,
> Hive-Vectorized-Query-Execution-Design-rev3.docx,
> Hive-Vectorized-Query-Execution-Design-rev3.pdf,
> Hive-Vectorized-Query-Execution-Design-rev4.docx,
> Hive-Vectorized-Query-Execution-Design-rev4.pdf,
> Hive-Vectorized-Query-Execution-Design-rev5.docx,
> Hive-Vectorized-Query-Execution-Design-rev5.pdf,
> Hive-Vectorized-Query-Execution-Design-rev6.docx,
> Hive-Vectorized-Query-Execution-Design-rev6.pdf,
> Hive-Vectorized-Query-Execution-Design-rev7.docx,
> Hive-Vectorized-Query-Execution-Design-rev8.docx,
> Hive-Vectorized-Query-Execution-Design-rev8.pdf,
> Hive-Vectorized-Query-Execution-Design-rev9.docx,
> Hive-Vectorized-Query-Execution-Design-rev9.pdf
>
>
> The Hive query execution engine currently processes one row at a time. A
> single row of data goes through all the operators before the next row can be
> processed. This mode of processing is very inefficient in terms of CPU usage.
> Research has demonstrated that this yields very low instructions per cycle
> [MonetDB X100]. Also, Hive currently relies heavily on lazy deserialization,
> and data columns go through a layer of object inspectors that identify the
> column type, deserialize the data, and determine the appropriate expression
> routines in the inner loop. These layers of virtual method calls slow
> processing down further.
> This work will add support for vectorized query execution to Hive, in which
> batches of about a thousand rows, rather than individual rows, are processed
> at a time. Each column in the batch is represented as a vector of a primitive
> data type. The inner loop of execution scans these vectors very fast,
> avoiding method calls, deserialization, unnecessary if-then-else branches,
> etc. This substantially reduces CPU time and yields high instructions per
> cycle (i.e., better processor pipeline utilization). See the attached design
> specification for more details.

-- 
This message was sent by Atlassian JIRA
(v6.1#6144)
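The contrast the description draws between row-at-a-time processing and column batches can be illustrated with a small, self-contained Java sketch. This is not Hive's implementation: the class and method names (`VectorizedSketch`, `Row`, `LongColumn`, `ColumnBatch`, `sumRowAtATime`, `sumBatch`), the batch size of 1024, and the "sum values above a threshold" operation are all hypothetical, chosen only to show a per-row interface call with boxing and type checks versus a tight loop over a primitive `long[]` column vector.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical illustration only; these are not Hive's actual classes or API.
public class VectorizedSketch {

    // Assumed batch size ("about a thousand rows"); hypothetical choice.
    static final int BATCH_SIZE = 1024;

    // Row-at-a-time style: each value is reached through an interface call,
    // loosely standing in for the object-inspector layer.
    interface Row {
        Object get(int column);
    }

    static long sumRowAtATime(Iterable<Row> rows, int column, long threshold) {
        long sum = 0;
        for (Row row : rows) {
            // Virtual call, type check, and unboxing on every single row.
            Object v = row.get(column);
            if (v instanceof Long && (Long) v > threshold) {
                sum += (Long) v;
            }
        }
        return sum;
    }

    // Column-batch style: each column of the batch is a primitive long[] vector.
    static final class LongColumn {
        final long[] vector = new long[BATCH_SIZE];
    }

    static final class ColumnBatch {
        final LongColumn[] columns;
        int size; // number of valid rows currently in the batch

        ColumnBatch(int numColumns) {
            columns = new LongColumn[numColumns];
            for (int i = 0; i < numColumns; i++) {
                columns[i] = new LongColumn();
            }
        }
    }

    // The inner loop scans a primitive array: no per-row method calls,
    // no boxing, no deserialization.
    static long sumBatch(ColumnBatch batch, int column, long threshold) {
        long[] v = batch.columns[column].vector;
        long sum = 0;
        for (int i = 0; i < batch.size; i++) {
            if (v[i] > threshold) {
                sum += v[i];
            }
        }
        return sum;
    }

    public static void main(String[] args) {
        long[] data = {3, 10, 7, 1, 12};

        // Same data as individual rows...
        List<Row> rows = new ArrayList<>();
        for (long x : data) {
            rows.add(c -> x); // single-column row; the column index is ignored
        }

        // ...and as one column batch.
        ColumnBatch batch = new ColumnBatch(1);
        System.arraycopy(data, 0, batch.columns[0].vector, 0, data.length);
        batch.size = data.length;

        System.out.println(sumRowAtATime(rows, 0, 5)); // prints 29
        System.out.println(sumBatch(batch, 0, 5));     // prints 29
    }
}
```

Both paths compute the same result (10 + 7 + 12 = 29); the difference the description is concerned with is how much per-row overhead the CPU pays to get there. A fuller version would also need per-value null tracking and filtering support, which this sketch deliberately omits.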