shabnam perween created IMPALA-7278:
---------------------------------------

             Summary: distinct clause is not working as expected with custom 
UDFs
                 Key: IMPALA-7278
                 URL: https://issues.apache.org/jira/browse/IMPALA-7278
             Project: IMPALA
          Issue Type: Bug
          Components: Frontend
    Affects Versions: Impala 2.8.0
            Reporter: shabnam perween


Distinct clause when executed with custom UDF returns unexpected results.

Custom UDF Definition:

udf.h file:

==========

#ifndef IMPALA_UDF_SAMPLE_UDF_H
#define IMPALA_UDF_SAMPLE_UDF_H

#include "udf.h"

using namespace impala_udf;

#ifdef __cplusplus
extern "C"
{
#endif

 

StringVal udf_clear(FunctionContext* context, StringVal& sInput);
#ifdef __cplusplus
}
#endif
#endif

udf.cpp:

========

#include "clear.h"

StringVal udf_clear(
 FunctionContext* context,
 StringVal& sInput /* String to encrypt */
 )
{
 unsigned char* pReturnData = context->Allocate( 100 );
 memset( pReturnData, NULL, 100);
 memcpy(pReturnData, sInput.ptr, sInput.len );
 StringVal sResult( pReturnData );
 sResult.len = sInput.len;
 context->Free( (uint8_t*)pReturnData );
 return sResult;
}

CMakeLists.txt:

===============

project (clear)
 ADD_LIBRARY (clear2.8_RHEL SHARED clear.cpp )
 TARGET_LINK_LIBRARIES (clear2.8_RHEL libImpalaUdf.a )
 SET_TARGET_PROPERTIES (clear2.8_RHEL PROPERTIES SUFFIX ".so")
 SET_TARGET_PROPERTIES (clear2.8_RHEL PROPERTIES PREFIX "")
 INSTALL ( TARGETS clear2.8_RHEL DESTINATION . )

Query Syntax:

CREATE TABLE clear (c1 STRING, c2 STRING) row format delimited fields 
terminated by ',' stored as textfile;
LOAD DATA INPATH '/user/clear.csv' OVERWRITE INTO TABLE clear;

Query: describe clear
+------+--------+---------+
| name | type | comment |
+------+--------+---------+
| c1 | string | |
| c2 | string | |
+------+--------+---------+
Fetched 2 row(s) in 0.04s

select * from clear;
+---------+---------+
| c1 | c2 |
+---------+---------+
| 1111111 | 1111111 |
| 1111111 | 1111111 |
| 222222 | 222222 |
| 444444 | 444444 |
| 222222 | 222222 |
| 3333333 | 3333333 |
| 3333333 | 3333333 |
+---------+---------+
Fetched 7 row(s) in 0.14s

select distinct udf_clear(c1),c2 from clear;
+-----------------------+---------+
| default.udf_clear(c1) | c2 |
+-----------------------+---------+
| {color:#d04437}*222222* {color}| 444444 |   <== this should be *444444* 
| 222222 | 222222 |
| 3333333 | 3333333 |
| 1111111 | 1111111 |
+-----------------------+---------+
Fetched 4 row(s) in 0.24s

 

Expected result:

select distinct c1,c2 from clear;

+---------+---------+
| c1 | c2 |
+---------+---------+
| 444444 | 444444 |
| 222222 | 222222 |
| 3333333 | 3333333 |
| 1111111 | 1111111 |
+---------+---------+
Fetched 4 row(s) in 0.25s

 



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

Reply via email to