[jira] [Commented] (HIVE-16198) Vectorize GenericUDFIndex for ARRAY

Colin Ma (JIRA) Sun, 22 Oct 2017 23:44:00 -0700

    [ 
https://issues.apache.org/jira/browse/HIVE-16198?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16214720#comment-16214720
 ]


Colin Ma commented on HIVE-16198:
---------------------------------

Hi [~teddy.choi], [~mmccline], because of HIVE-17133, I rebased the patch 
onto Hive 2.3.0 with some minor changes. To evaluate the performance 
improvement, I used the following table:
{code}
hive> describe temperature_orc_5g;
           t_date                      string
           city                        string
           temperatures                array<double>
hive> show tblproperties temperature_orc_5g;
           COLUMN_STATS_ACCURATE           {"BASIC_STATS":"true"}
           numFiles   20
           numRows 100000000
           rawDataSize           24100000000
           totalSize   1793960785
{code}
I tested with Hive on Spark using the query {color:#59afe1}select city, 
avg(temperatures\[0\]), avg(temperatures\[5\]) from temperature_orc_5g where 
temperatures\[2\] > 20 group by city limit 10{color}. The results are as 
follows:
|| ||Disable vectorization||Enable vectorization||
|execution time|{color:#d04437}34s{color}|{color:#14892c}26s{color}|
Specifically, the detailed time cost for the same task, which processes 
15154763 rows, is shown in the following table:
|| ||Disable vectorization||Enable vectorization||
|Time with RecordReader|{color:#d04437}8.9s{color}|{color:#14892c}5.9s{color}|
|Time with filter operator|{color:#d04437}3.1s{color}|{color:#14892c}0.1s{color}|
|Time with groupBy and followup operators|10.8s|11.5s|
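For reference, a minimal sketch of how the two configurations could be toggled in a Hive session. It assumes the standard {{hive.execution.engine}} and {{hive.vectorized.execution.enabled}} properties; the exact session settings used for the runs above are not listed in this comment.
{code}
-- Assumption: Hive on Spark is selected through the standard engine property.
set hive.execution.engine=spark;

-- Run 1: vectorization disabled (row-at-a-time execution).
set hive.vectorized.execution.enabled=false;
select city, avg(temperatures[0]), avg(temperatures[5])
from temperature_orc_5g
where temperatures[2] > 20
group by city limit 10;

-- Run 2: vectorization enabled, same query.
set hive.vectorized.execution.enabled=true;
select city, avg(temperatures[0]), avg(temperatures[5])
from temperature_orc_5g
where temperatures[2] > 20
group by city limit 10;
{code}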
I think the improvement is obvious. Do you know why the patch hasn't been 
committed yet? Thanks.

> Vectorize GenericUDFIndex for ARRAY
> -----------------------------------
>
>                 Key: HIVE-16198
>                 URL: https://issues.apache.org/jira/browse/HIVE-16198
>             Project: Hive
>          Issue Type: Sub-task
>          Components: UDF, Vectorization
>            Reporter: Teddy Choi
>            Assignee: Teddy Choi
>         Attachments: HIVE-16198.1.patch, HIVE-16198.2.patch, 
> HIVE-16198.3.patch
>
>
> Vectorize GenericUDFIndex for array data type.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)
