[ https://issues.apache.org/jira/browse/HIVE-4160?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13653430#comment-13653430 ]
Eric Hanson commented on HIVE-4160:
-----------------------------------

The code for this work is currently in the "vectorization" branch of the public Hive repo.

> Vectorized Query Execution in Hive
> ----------------------------------
>
>                 Key: HIVE-4160
>                 URL: https://issues.apache.org/jira/browse/HIVE-4160
>             Project: Hive
>          Issue Type: New Feature
>            Reporter: Jitendra Nath Pandey
>            Assignee: Jitendra Nath Pandey
>         Attachments: Hive-Vectorized-Query-Execution-Design.docx,
> Hive-Vectorized-Query-Execution-Design-rev2.docx,
> Hive-Vectorized-Query-Execution-Design-rev3.docx,
> Hive-Vectorized-Query-Execution-Design-rev3.docx,
> Hive-Vectorized-Query-Execution-Design-rev3.pdf,
> Hive-Vectorized-Query-Execution-Design-rev4.docx,
> Hive-Vectorized-Query-Execution-Design-rev4.pdf,
> Hive-Vectorized-Query-Execution-Design-rev5.docx,
> Hive-Vectorized-Query-Execution-Design-rev5.pdf,
> Hive-Vectorized-Query-Execution-Design-rev6.docx,
> Hive-Vectorized-Query-Execution-Design-rev6.pdf
>
>
> The Hive query execution engine currently processes one row at a time. A
> single row of data goes through all the operators before the next row can be
> processed. This mode of processing is very inefficient in terms of CPU usage.
> Research has demonstrated that this yields very low instructions per cycle
> [MonetDB X100]. Also currently Hive heavily relies on lazy deserialization
> and data columns go through a layer of object inspectors that identify column
> type, deserialize data and determine appropriate expression routines in the
> inner loop. These layers of virtual method calls further slow down the
> processing.
>
> This work will add support for vectorized query execution to Hive, where,
> instead of individual rows, batches of about a thousand rows at a time are
> processed. Each column in the batch is represented as a vector of a primitive
> data type. The inner loop of execution scans these vectors very fast,
> avoiding method calls, deserialization, unnecessary if-then-else, etc. This
> substantially reduces CPU time used, and gives excellent instructions per
> cycle (i.e. improved processor pipeline utilization). See the attached design
> specification for more details.
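
To make the contrast in the issue description concrete, here is a minimal Java sketch of the two processing styles. It is only an illustration under stated assumptions: the class and method names (`LongColumnBatch`, `VectorizedSketch`, `addScalar`, `sumRowAtATime`) are hypothetical and are not taken from the vectorization branch or the design document, and the batch size of 1024 is just one reading of "about a thousand rows".

```java
// Hypothetical illustration only; these are not Hive's actual classes.

// One column of a batch, stored as a vector of a primitive type.
final class LongColumnBatch {
    // Assumed default; the description only says "about a thousand rows".
    static final int DEFAULT_SIZE = 1024;

    final long[] values; // primitive vector, no per-row objects
    int size;            // number of valid rows in this batch

    LongColumnBatch(int capacity) {
        this.values = new long[capacity];
    }
}

final class VectorizedSketch {
    // Row-at-a-time style: every row costs an interface call plus boxing.
    interface RowExpression {
        Object evaluate(Object row);
    }

    static long sumRowAtATime(Object[] rows, RowExpression expr) {
        long total = 0;
        for (Object row : rows) {
            total += (Long) expr.evaluate(row); // call + unboxing per row
        }
        return total;
    }

    // Vectorized style: one call per batch, then a tight loop over a
    // primitive array with no method calls or type checks inside it.
    static void addScalar(LongColumnBatch in, long scalar, LongColumnBatch out) {
        long[] src = in.values;
        long[] dst = out.values;
        int n = in.size;
        for (int i = 0; i < n; i++) {
            dst[i] = src[i] + scalar;
        }
        out.size = n;
    }

    static long sum(LongColumnBatch batch) {
        long total = 0;
        for (int i = 0; i < batch.size; i++) {
            total += batch.values[i];
        }
        return total;
    }
}
```

In the vectorized version the per-row work is a plain array read, an add, and an array write, which is the kind of inner loop the description says avoids method calls and deserialization. Whether a given JVM turns that loop into SIMD instructions is not something this sketch guarantees.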