[ 
https://issues.apache.org/jira/browse/HIVE-5538?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14205942#comment-14205942
 ] 

Gopal V commented on HIVE-5538:
-------------------------------

|| TPC-H Query 1 (1000 scale ORC) || Tez + vectorization || Tez + row-mode || MR + vectorization || MR + row-mode ||
| Time Taken (seconds) | 43.821 | 142.014 | 183.386 | 273.885 |

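Not sure if you tested against ORC, but vectorization (once you cut out all the 
unnecessary memory copies) makes a huge difference to performance: in the 
numbers above it is roughly 3.2x faster than row-mode on Tez (43.821 s vs 
142.014 s) and roughly 1.5x faster on MR (183.386 s vs 273.885 s).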

Essentially, for an integer/float column, most of the gain comes from the data 
fitting inside the L1 cache and from the operators handling the isRepeating 
flags. Even for something like a substring UDF, we get huge speedups because it 
doesn't allocate any new strings (it merely changes the len column in place).
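
To make those two effects concrete, here is a minimal sketch in Java. The 
classes below (LongColumn, BytesColumn, VectorSketch) are simplified, 
hypothetical stand-ins written for this comment, not Hive's actual column 
vector classes, and the prefix operation is byte-based, so it assumes 
single-byte characters. It only illustrates the idea: a repeating column is 
processed once per batch, and a substring only adjusts the len entry.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical, simplified stand-ins for the columns of a vectorized batch.
// These are NOT Hive's classes; all names and fields are assumptions.
final class LongColumn {
    final long[] vector;   // primitive array: dense and cache-friendly
    boolean isRepeating;   // true: every row in the batch equals vector[0]

    LongColumn(int size) { vector = new long[size]; }
}

final class BytesColumn {
    final byte[][] buffer; // backing bytes per row, never copied here
    final int[] start;     // offset of each value inside its buffer
    final int[] len;       // length of each value in bytes

    BytesColumn(int size) {
        buffer = new byte[size][];
        start = new int[size];
        len = new int[size];
    }

    // Materializes one row as a String (only for inspection in this sketch).
    String get(int row) {
        return new String(buffer[row], start[row], len[row], StandardCharsets.UTF_8);
    }
}

final class VectorSketch {
    // Adds a constant to the first n rows of a long column.
    static void addScalar(LongColumn in, long scalar, LongColumn out, int n) {
        if (in.isRepeating) {
            // One addition covers the whole batch.
            out.vector[0] = in.vector[0] + scalar;
            out.isRepeating = true;
            return;
        }
        out.isRepeating = false;
        for (int i = 0; i < n; i++) {
            out.vector[i] = in.vector[i] + scalar;
        }
    }

    // Keeps at most maxLen leading bytes of each row without allocating:
    // only the len entry changes, the backing bytes stay where they are.
    static void prefixInPlace(BytesColumn col, int maxLen, int n) {
        for (int i = 0; i < n; i++) {
            col.len[i] = Math.min(col.len[i], maxLen);
        }
    }

    public static void main(String[] args) {
        LongColumn in = new LongColumn(1024);
        in.vector[0] = 7;
        in.isRepeating = true;
        LongColumn out = new LongColumn(1024);
        addScalar(in, 3, out, 1024);
        System.out.println(out.vector[0] + " repeating=" + out.isRepeating);

        BytesColumn names = new BytesColumn(1);
        names.buffer[0] = "vectorized".getBytes(StandardCharsets.UTF_8);
        names.len[0] = names.buffer[0].length;
        prefixInPlace(names, 6, 1);
        System.out.println(names.get(0));
    }
}
```

A row-mode operator would instead touch one row object at a time and build a 
new string for every substring result, which is where the extra allocation and 
copying come from.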

> Turn on vectorization by default.
> ---------------------------------
>
>                 Key: HIVE-5538
>                 URL: https://issues.apache.org/jira/browse/HIVE-5538
>             Project: Hive
>          Issue Type: Sub-task
>            Reporter: Jitendra Nath Pandey
>            Assignee: Matt McCline
>         Attachments: HIVE-5538.1.patch, HIVE-5538.2.patch, HIVE-5538.3.patch, 
> HIVE-5538.4.patch, HIVE-5538.5.patch, HIVE-5538.5.patch, HIVE-5538.6.patch, 
> HIVE-5538.61.patch
>
>
>   Vectorization should be turned on by default, so that users don't have to 
> specifically enable vectorization. 
>   Vectorization code validates and ensures that a query falls back to row 
> mode if it is not supported on vectorized code path. 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
