[jira] [Commented] (HIVE-7166) Vectorization with UDFs returns incorrect results

Hive QA (JIRA) Thu, 05 Jun 2014 03:28:24 -0700

    [ 
https://issues.apache.org/jira/browse/HIVE-7166?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14018651#comment-14018651
 ]


Hive QA commented on HIVE-7166:
-------------------------------



{color:red}Overall{color}: -1 at least one test failed

Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12648433/HIVE-7166.1.patch

{color:red}ERROR:{color} -1 due to 16 failed/errored test(s), 5585 tests executed
*Failed tests:*
{noformat}
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_vector_between_in
org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver_dynpart_sort_optimization
org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver_insert1
org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver_metadata_only_queries
org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver_ptf
org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver_scriptfile1
org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver_tez_schema_evolution
org.apache.hadoop.hive.cli.TestMinimrCliDriver.testCliDriver_bucketmapjoin6
org.apache.hadoop.hive.cli.TestMinimrCliDriver.testCliDriver_root_dir_external_table
org.apache.hadoop.hive.cli.TestNegativeCliDriver.testNegativeCliDriver_authorization_ctas
org.apache.hadoop.hive.ql.exec.tez.TestTezTask.testSubmit
org.apache.hadoop.hive.ql.exec.vector.TestVectorizationContext.testBetweenFilters
org.apache.hive.hcatalog.pig.TestOrcHCatPigStorer.testWriteDecimal
org.apache.hive.hcatalog.pig.TestOrcHCatPigStorer.testWriteDecimalX
org.apache.hive.hcatalog.pig.TestOrcHCatPigStorer.testWriteDecimalXY
org.apache.hive.jdbc.miniHS2.TestHiveServer2.testConnection
{noformat}

Test results: 
http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-Build/392/testReport
Console output: 
http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-Build/392/console
Test logs: 
http://ec2-174-129-184-35.compute-1.amazonaws.com/logs/PreCommit-HIVE-Build-392/

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.PrepPhase
Executing org.apache.hive.ptest.execution.ExecutionPhase
Executing org.apache.hive.ptest.execution.ReportingPhase
Tests exited with: TestsFailedException: 16 tests failed
{noformat}

This message is automatically generated.

ATTACHMENT ID: 12648433

> Vectorization with UDFs returns incorrect results
> -------------------------------------------------
>
>                 Key: HIVE-7166
>                 URL: https://issues.apache.org/jira/browse/HIVE-7166
>             Project: Hive
>          Issue Type: Bug
>          Components: Vectorization
>    Affects Versions: 0.13.0
>         Environment: Hive 0.13 with Hadoop 2.4 on a 3 node cluster 
>            Reporter: Benjamin Bowman
>            Assignee: Hari Sankar Sivarama Subramaniyan
>            Priority: Minor
>         Attachments: HIVE-7166.1.patch
>
>
> Using BETWEEN, a custom UDF, and vectorized query execution yields incorrect 
> query results. 
> Example Query:  SELECT column_1 FROM table_1 WHERE column_1 BETWEEN (UDF_1 - 
> X) and UDF_1
> The following test scenario will reproduce the problem:
> TEST UDF (SIMPLE FUNCTION THAT TAKES NO ARGUMENTS AND RETURNS 10000):  
> package com.test;
> import org.apache.hadoop.hive.ql.exec.UDF;
> import org.apache.hadoop.io.LongWritable;
> // Test UDF: takes no arguments and always returns 10000.
> public class tenThousand extends UDF {
>   // Reused across calls so no new writable is allocated per row.
>   private final LongWritable result = new LongWritable();
>   public LongWritable evaluate() {
>     result.set(10000);
>     return result;
>   }
> }
> TEST DATA (test.input):
> 1|CBCABC|12
> 2|DBCABC|13
> 3|EBCABC|14
> 40000|ABCABC|15
> 50000|BBCABC|16
> 60000|CBCABC|17
> CREATING ORC TABLE:
> 0: jdbc:hive2://server:10002/db> create table testTabOrc (first bigint, 
> second varchar(20), third int) partitioned by (range int) clustered by 
> (first) sorted by (first) into 8 buckets stored as orc tblproperties 
> ("orc.compress" = "SNAPPY", "orc.index" = "true");
> CREATE LOADING TABLE:
> 0: jdbc:hive2://server:10002/db> create table loadingDir (first bigint, 
> second varchar(20), third int) partitioned by (range int) row format 
> delimited fields terminated by '|' stored as textfile;
> COPY IN DATA:
> [root@server]#  hadoop fs -copyFromLocal /tmp/test.input /db/loading/.
> ORC DATA:
> [root@server]#  beeline -u jdbc:hive2://server:10002/db -n root --hiveconf 
> hive.exec.dynamic.partition.mode=nonstrict --hiveconf 
> hive.enforce.sorting=true -e "insert into table testTabOrc partition(range) 
> select * from loadingDir;"
> LOAD TEST FUNCTION:
> 0: jdbc:hive2://server:10002/db>  add jar /opt/hadoop/lib/testFunction.jar
> 0: jdbc:hive2://server:10002/db>  create temporary function ten_thousand as 
> 'com.test.tenThousand';
> TURN OFF VECTORIZATION:
> 0: jdbc:hive2://server:10002/db>  set hive.vectorized.execution.enabled=false;
> QUERY (RESULTS AS EXPECTED):
> 0: jdbc:hive2://server:10002/db> select first from testTabOrc where first 
> between ten_thousand()-10000 and ten_thousand()-9995;
> +--------+
> | first  |
> +--------+
> | 1      |
> | 2      |
> | 3      |
> +--------+
> 3 rows selected (15.286 seconds)
> TURN ON VECTORIZATION:
> 0: jdbc:hive2://server:10002/db>  set hive.vectorized.execution.enabled=true;
> QUERY AGAIN (WRONG RESULTS):
> 0: jdbc:hive2://server:10002/db> select first from testTabOrc where first 
> between ten_thousand()-10000 and ten_thousand()-9995;
> +--------+
> | first  |
> +--------+
> +--------+
> No rows selected (17.763 seconds)
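
For reference, the expected result of the query in the report can be checked by hand with a minimal plain-Java sketch. It does not use Hive at all: BetweenCheck and filterBetween are hypothetical names introduced only for illustration, tenThousand() stands in for the ten_thousand() UDF, and the input array is the first column of test.input. It assumes only that BETWEEN is inclusive on both ends.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper class, not part of Hive: models the predicate in plain Java.
public class BetweenCheck {
    // Stand-in for the ten_thousand() UDF from the report.
    static long tenThousand() {
        return 10000L;
    }

    // Keeps values v with lower <= v <= upper (BETWEEN is inclusive on both ends).
    static List<Long> filterBetween(long[] values, long lower, long upper) {
        List<Long> kept = new ArrayList<>();
        for (long v : values) {
            if (v >= lower && v <= upper) {
                kept.add(v);
            }
        }
        return kept;
    }

    public static void main(String[] args) {
        long[] first = {1, 2, 3, 40000, 50000, 60000}; // first column of test.input
        long lower = tenThousand() - 10000; // 0
        long upper = tenThousand() - 9995;  // 5
        System.out.println(filterBetween(first, lower, upper)); // prints [1, 2, 3]
    }
}
```

The bounds evaluate to 0 and 5, so rows 1, 2 and 3 qualify. That matches the non-vectorized output in the report, while the vectorized run returns no rows.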



--
This message was sent by Atlassian JIRA
(v6.2#6252)
