[ https://issues.apache.org/jira/browse/HIVE-5538?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14205942#comment-14205942 ]
Gopal V commented on HIVE-5538:
-------------------------------
|| TPC-H Query 1 (1000 scale ORC) || Tez + vectorization || Tez + row-mode || MR + vectorization || MR + row-mode ||
| Time taken (seconds) | 43.821 | 142.014 | 183.386 | 273.885 |
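To make the ratios in that table explicit, here is a trivial sketch that divides the row-mode time by the vectorized time for each engine. The numbers are copied from the table; the class and method names are just for illustration.

```java
public class VectorizationSpeedup {
    // Ratio of row-mode time to vectorized time: > 1 means vectorization is faster.
    static double speedup(double rowModeSeconds, double vectorizedSeconds) {
        return rowModeSeconds / vectorizedSeconds;
    }

    public static void main(String[] args) {
        // Times in seconds, from the TPC-H Query 1 (1000 scale ORC) table.
        double tezVectorized = 43.821, tezRowMode = 142.014;
        double mrVectorized = 183.386, mrRowMode = 273.885;

        System.out.printf("Tez: %.2fx%n", speedup(tezRowMode, tezVectorized));
        System.out.printf("MR:  %.2fx%n", speedup(mrRowMode, mrVectorized));
    }
}
```

That works out to roughly a 3.2x speedup from vectorization on Tez and about 1.5x on MR for this query.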
Not sure if you tested against ORC, but vectorization (once you cut out all the
unnecessary memory copies) makes a huge difference to performance.
For an integer or float column, the big wins are that a batch of values fits
inside the L1 cache and that operators take a shortcut when the isRepeating flag
is set. Even for something like a substring UDF we get large speedups, because it
doesn't allocate any new strings; it merely changes the length column in place.
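A minimal sketch of the two effects described above, assuming simplified column types loosely modeled on the idea behind Hive's LongColumnVector and BytesColumnVector (primitive arrays per batch, an isRepeating flag, strings as byte-array slices with start/length entries). These are hypothetical stand-ins, not the real Hive classes, and the prefix-only substring assumes single-byte (ASCII) characters.

```java
import java.nio.charset.StandardCharsets;

public class VectorizedSketch {
    // Hypothetical, simplified long column: one primitive array per batch.
    static final class LongColumn {
        final long[] vector;
        boolean isRepeating; // true => every row in the batch has the value vector[0]
        LongColumn(int size) { vector = new long[size]; }
    }

    // Hypothetical, simplified string column: each row is a (bytes, start, length) slice.
    static final class BytesColumn {
        final byte[][] vector;
        final int[] start;
        final int[] length;
        BytesColumn(int size) {
            vector = new byte[size][];
            start = new int[size];
            length = new int[size];
        }
        String get(int row) {
            return new String(vector[row], start[row], length[row], StandardCharsets.US_ASCII);
        }
    }

    // Sum over a batch of n rows. If isRepeating is set, one multiply replaces the loop;
    // otherwise it is a tight loop over a primitive array (a 1024-long batch is 8 KB).
    static long sum(LongColumn col, int n) {
        if (col.isRepeating) {
            return col.vector[0] * n;
        }
        long total = 0;
        for (int i = 0; i < n; i++) {
            total += col.vector[i];
        }
        return total;
    }

    // substr(s, 1, len) over a batch: only the length entries are rewritten in place;
    // no new byte arrays or String objects are allocated. Assumes ASCII data.
    static void prefixInPlace(BytesColumn col, int n, int len) {
        for (int i = 0; i < n; i++) {
            col.length[i] = Math.min(col.length[i], len);
        }
    }

    public static void main(String[] args) {
        LongColumn qty = new LongColumn(1024);
        qty.isRepeating = true;
        qty.vector[0] = 5;
        System.out.println(sum(qty, 1024));

        // Two rows sharing one underlying buffer: "RETURNED" and "SHIPPED".
        byte[] buf = "RETURNEDSHIPPED".getBytes(StandardCharsets.US_ASCII);
        BytesColumn flags = new BytesColumn(2);
        flags.vector[0] = buf; flags.start[0] = 0; flags.length[0] = 8;
        flags.vector[1] = buf; flags.start[1] = 8; flags.length[1] = 7;
        prefixInPlace(flags, 2, 1);
        System.out.println(flags.get(0) + " " + flags.get(1));
    }
}
```

Running it prints 5120 for the repeating sum and "R S" for the truncated strings, with both rows still pointing at the original buffer.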
> Turn on vectorization by default.
> ---------------------------------
>
> Key: HIVE-5538
> URL: https://issues.apache.org/jira/browse/HIVE-5538
> Project: Hive
> Issue Type: Sub-task
> Reporter: Jitendra Nath Pandey
> Assignee: Matt McCline
> Attachments: HIVE-5538.1.patch, HIVE-5538.2.patch, HIVE-5538.3.patch,
> HIVE-5538.4.patch, HIVE-5538.5.patch, HIVE-5538.5.patch, HIVE-5538.6.patch,
> HIVE-5538.61.patch
>
>
> Vectorization should be turned on by default, so that users don't have to
> specifically enable vectorization.
> Vectorization code validates and ensures that a query falls back to row
> mode if it is not supported on vectorized code path.
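The validate-then-fall-back behaviour described in the issue can be sketched roughly as below. Everything here is hypothetical: the SUPPORTED whitelist, the operator names, and the choice to validate the whole plan at once are simplifying assumptions, not Hive's actual Vectorizer logic. The boolean parameter stands in for the setting that turns vectorization on (hive.vectorized.execution.enabled), which this issue proposes enabling by default. Requires Java 9+ for Set.of/List.of.

```java
import java.util.List;
import java.util.Set;

public class VectorizationFallback {
    enum Mode { VECTORIZED, ROW }

    // Hypothetical whitelist of operators/expressions the vectorized path supports.
    static final Set<String> SUPPORTED =
            Set.of("TableScan", "Filter", "Select", "GroupBy", "substr");

    // Validate every operator in the plan; a single unsupported one sends the
    // query down the row-mode path instead of failing it.
    static Mode chooseMode(List<String> planOperators, boolean vectorizationEnabled) {
        if (!vectorizationEnabled) {
            return Mode.ROW;
        }
        for (String op : planOperators) {
            if (!SUPPORTED.contains(op)) {
                return Mode.ROW;
            }
        }
        return Mode.VECTORIZED;
    }

    public static void main(String[] args) {
        System.out.println(chooseMode(List.of("TableScan", "Filter", "GroupBy"), true));
        // "SomeCustomUDF" is a made-up operator name that is not on the whitelist.
        System.out.println(chooseMode(List.of("TableScan", "SomeCustomUDF"), true));
    }
}
```

The point of the design is that turning the flag on by default is safe: a query the vectorized path cannot handle still runs, just in row mode.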
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)