[ https://issues.apache.org/jira/browse/HIVE-5538?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14205942#comment-14205942 ]

Gopal V commented on HIVE-5538:
-------------------------------

|| TPC-H Query 1 (1000 scale ORC) || Tez + vectorization || Tez + row-mode || MR + vectorization || MR + row-mode ||
| Time Taken (seconds) | 43.821 | 142.014 | 183.386 | 273.885 |

Not sure if you tested against ORC, but Vectorization (once you cut out all the unnecessary memory copies) makes a huge difference to performance.

Essentially for an integer/float column, the data fitting inside an L1 cache and the operators handling isRepeating flags is huge.

Even for something like a sub-string UDF for instance, we get huge speedups because it doesn't allocate any new strings for that (merely changes the len column in-place).

> Turn on vectorization by default.
> ---------------------------------
>
>                 Key: HIVE-5538
>                 URL: https://issues.apache.org/jira/browse/HIVE-5538
>             Project: Hive
>          Issue Type: Sub-task
>            Reporter: Jitendra Nath Pandey
>            Assignee: Matt McCline
>         Attachments: HIVE-5538.1.patch, HIVE-5538.2.patch, HIVE-5538.3.patch,
> HIVE-5538.4.patch, HIVE-5538.5.patch, HIVE-5538.5.patch, HIVE-5538.6.patch,
> HIVE-5538.61.patch
>
>
>   Vectorization should be turned on by default, so that users don't have to
> specifically enable vectorization.
>   Vectorization code validates and ensures that a query falls back to row
> mode if it is not supported on vectorized code path.

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)