[ https://issues.apache.org/jira/browse/HIVE-7166?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Hari Sankar Sivarama Subramaniyan updated HIVE-7166:
----------------------------------------------------

    Status: Patch Available  (was: Open)

> Vectorization with UDFs returns incorrect results
> -------------------------------------------------
>
>                 Key: HIVE-7166
>                 URL: https://issues.apache.org/jira/browse/HIVE-7166
>             Project: Hive
>          Issue Type: Bug
>          Components: Vectorization
>    Affects Versions: 0.13.0
>         Environment: Hive 0.13 with Hadoop 2.4 on a 3 node cluster
>            Reporter: Benjamin Bowman
>            Assignee: Hari Sankar Sivarama Subramaniyan
>            Priority: Minor
>         Attachments: HIVE-7166.1.patch, HIVE-7166.2.patch
>
>
> Using BETWEEN, a custom UDF, and vectorized query execution yields incorrect
> query results.
>
> Example Query: SELECT column_1 FROM table_1 WHERE column_1 BETWEEN (UDF_1 - X) and UDF_1
>
> The following test scenario will reproduce the problem:
>
> TEST UDF (SIMPLE FUNCTION THAT TAKES NO ARGUMENTS AND RETURNS 10000):
>
> package com.test;
>
> import org.apache.hadoop.hive.ql.exec.Description;
> import org.apache.hadoop.hive.ql.exec.UDF;
> import org.apache.hadoop.io.LongWritable;
> import org.apache.hadoop.io.Text;
> import java.lang.String;
> import java.lang.*;
>
> public class tenThousand extends UDF {
>
>   private final LongWritable result = new LongWritable();
>
>   public LongWritable evaluate() {
>     result.set(10000);
>     return result;
>   }
> }
>
> TEST DATA (test.input):
>
> 1|CBCABC|12
> 2|DBCABC|13
> 3|EBCABC|14
> 40000|ABCABC|15
> 50000|BBCABC|16
> 60000|CBCABC|17
>
> CREATING ORC TABLE:
>
> 0: jdbc:hive2://server:10002/db> create table testTabOrc (first bigint,
> second varchar(20), third int) partitioned by (range int) clustered by
> (first) sorted by (first) into 8 buckets stored as orc tblproperties
> ("orc.compress" = "SNAPPY", "orc.index" = "true");
>
> CREATE LOADING TABLE:
>
> 0: jdbc:hive2://server:10002/db> create table loadingDir (first bigint,
> second varchar(20), third int) partitioned by (range int) row format
> delimited fields terminated by '|' stored as textfile;
>
> COPY IN DATA:
>
> [root@server]# hadoop fs -copyFromLocal /tmp/test.input /db/loading/.
>
> ORC DATA:
>
> [root@server]# beeline -u jdbc:hive2://server:10002/db -n root --hiveconf
> hive.exec.dynamic.partition.mode=nonstrict --hiveconf
> hive.enforce.sorting=true -e "insert into table testTabOrc partition(range)
> select * from loadingDir;"
>
> LOAD TEST FUNCTION:
>
> 0: jdbc:hive2://server:10002/db> add jar /opt/hadoop/lib/testFunction.jar
> 0: jdbc:hive2://server:10002/db> create temporary function ten_thousand as
> 'com.test.tenThousand';
>
> TURN OFF VECTORIZATION:
>
> 0: jdbc:hive2://server:10002/db> set hive.vectorized.execution.enabled=false;
>
> QUERY (RESULTS AS EXPECTED):
>
> 0: jdbc:hive2://server:10002/db> select first from testTabOrc where first
> between ten_thousand()-10000 and ten_thousand()-9995;
> +--------+
> | first  |
> +--------+
> | 1      |
> | 2      |
> | 3      |
> +--------+
> 3 rows selected (15.286 seconds)
>
> TURN ON VECTORIZATION:
>
> 0: jdbc:hive2://server:10002/db> set hive.vectorized.execution.enabled=true;
>
> QUERY AGAIN (WRONG RESULTS):
>
> 0: jdbc:hive2://server:10002/db> select first from testTabOrc where first
> between ten_thousand()-10000 and ten_thousand()-9995;
> +--------+
> | first  |
> +--------+
> +--------+
> No rows selected (17.763 seconds)

--
This message was sent by Atlassian JIRA
(v6.2#6252)
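
As a cross-check of the expected result in the report, the predicate can be worked through by hand: ten_thousand()-10000 is 0 and ten_thousand()-9995 is 5, so BETWEEN should keep exactly the rows whose `first` value lies in [0, 5]. The sketch below is plain Java with no Hive dependency; the class name `BetweenCheck` and its helper methods are hypothetical, introduced only for illustration, and the code models ordinary row-at-a-time evaluation rather than Hive's vectorized code path.

```java
import java.util.ArrayList;
import java.util.List;

public class BetweenCheck {

    // Stand-in for the ten_thousand() UDF in the report: always returns 10000.
    static long tenThousand() {
        return 10000L;
    }

    // Row-at-a-time evaluation of:
    //   first BETWEEN ten_thousand()-10000 AND ten_thousand()-9995
    // BETWEEN is inclusive on both ends.
    static List<Long> filter(long[] firstColumn) {
        long lower = tenThousand() - 10000; // 0
        long upper = tenThousand() - 9995;  // 5
        List<Long> kept = new ArrayList<>();
        for (long value : firstColumn) {
            if (value >= lower && value <= upper) {
                kept.add(value);
            }
        }
        return kept;
    }

    public static void main(String[] args) {
        // Values of the "first" column from test.input in the report.
        long[] first = {1, 2, 3, 40000, 50000, 60000};
        System.out.println(filter(first)); // prints [1, 2, 3]
    }
}
```

This agrees with the three rows returned when hive.vectorized.execution.enabled=false, which suggests that the empty result with vectorization enabled is the incorrect one.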