[jira] [Commented] (HIVE-7166) Vectorization with UDFs returns incorrect results

Hari Sankar Sivarama Subramaniyan (JIRA) Wed, 04 Jun 2014 23:50:26 -0700

    [ https://issues.apache.org/jira/browse/HIVE-7166?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14018536#comment-14018536 ]


Hari Sankar Sivarama Subramaniyan commented on HIVE-7166:
---------------------------------------------------------

I looked at this issue. The example query cannot be vectorized trivially 
because vectorization currently supports constant folding only for unary 
expressions, and the BETWEEN bounds here are binary expressions over a UDF 
result. Once HIVE-5771 is committed, this query can be vectorized. The current 
fix is to disable vectorization in such a scenario so that we fall back to 
row mode.
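
For anyone hitting this before the patch lands, a possible session-level 
workaround is to force row mode by hand. This is only a sketch that reuses the 
setting and query from the reproduction steps quoted below; it has not been 
verified beyond what the report itself shows:

```sql
-- Disable vectorized execution for this session so the query runs in row mode.
set hive.vectorized.execution.enabled=false;

-- Query from the report; it returned rows 1, 2 and 3 in row mode.
select first from testTabOrc
where first between ten_thousand()-10000 and ten_thousand()-9995;
```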

cc-ing [~jnp] and [~ehans] for reviewing the patch.

> Vectorization with UDFs returns incorrect results
> -------------------------------------------------
>
>                 Key: HIVE-7166
>                 URL: https://issues.apache.org/jira/browse/HIVE-7166
>             Project: Hive
>          Issue Type: Bug
>          Components: HiveServer2, UDF, Vectorization
>    Affects Versions: 0.13.0
>         Environment: Hive 0.13 with Hadoop 2.4 on a 3 node cluster 
>            Reporter: Benjamin Bowman
>            Assignee: Hari Sankar Sivarama Subramaniyan
>            Priority: Minor
>
> Using BETWEEN, a custom UDF, and vectorized query execution yields incorrect 
> query results. 
> Example Query:  SELECT column_1 FROM table_1 WHERE column_1 BETWEEN (UDF_1 - 
> X) and UDF_1
> The following test scenario will reproduce the problem:
> TEST UDF (SIMPLE FUNCTION THAT TAKES NO ARGUMENTS AND RETURNS 10000):  
> package com.test;
> import org.apache.hadoop.hive.ql.exec.UDF;
> import org.apache.hadoop.io.LongWritable;
> // Zero-argument UDF that always returns 10000.
> public class tenThousand extends UDF {
>   // Reused across calls to avoid allocating a new writable per row.
>   private final LongWritable result = new LongWritable();
>   public LongWritable evaluate() {
>     result.set(10000);
>     return result;
>   }
> }
> TEST DATA (test.input):
> 1|CBCABC|12
> 2|DBCABC|13
> 3|EBCABC|14
> 40000|ABCABC|15
> 50000|BBCABC|16
> 60000|CBCABC|17
> CREATING ORC TABLE:
> 0: jdbc:hive2://server:10002/db> create table testTabOrc (first bigint, 
> second varchar(20), third int) partitioned by (range int) clustered by 
> (first) sorted by (first) into 8 buckets stored as orc tblproperties 
> ("orc.compress" = "SNAPPY", "orc.index" = "true");
> CREATE LOADING TABLE:
> 0: jdbc:hive2://server:10002/db> create table loadingDir (first bigint, 
> second varchar(20), third int) partitioned by (range int) row format 
> delimited fields terminated by '|' stored as textfile;
> COPY IN DATA:
> [root@server]#  hadoop fs -copyFromLocal /tmp/test.input /db/loading/.
> ORC DATA:
> [root@server]#  beeline -u jdbc:hive2://server:10002/db -n root --hiveconf 
> hive.exec.dynamic.partition.mode=nonstrict --hiveconf 
> hive.enforce.sorting=true -e "insert into table testTabOrc partition(range) 
> select * from loadingDir;"
> LOAD TEST FUNCTION:
> 0: jdbc:hive2://server:10002/db>  add jar /opt/hadoop/lib/testFunction.jar;
> 0: jdbc:hive2://server:10002/db>  create temporary function ten_thousand as 
> 'com.test.tenThousand';
> TURN OFF VECTORIZATION:
> 0: jdbc:hive2://server:10002/db>  set hive.vectorized.execution.enabled=false;
> QUERY (RESULTS AS EXPECTED):
> 0: jdbc:hive2://server:10002/db> select first from testTabOrc where first 
> between ten_thousand()-10000 and ten_thousand()-9995;
> +--------+
> | first  |
> +--------+
> | 1      |
> | 2      |
> | 3      |
> +--------+
> 3 rows selected (15.286 seconds)
> TURN ON VECTORIZATION:
> 0: jdbc:hive2://server:10002/db>  set hive.vectorized.execution.enabled=true;
> QUERY AGAIN (WRONG RESULTS):
> 0: jdbc:hive2://server:10002/db> select first from testTabOrc where first 
> between ten_thousand()-10000 and ten_thousand()-9995;
> +--------+
> | first  |
> +--------+
> +--------+
> No rows selected (17.763 seconds)



--
This message was sent by Atlassian JIRA
(v6.2#6252)
