[ https://issues.apache.org/jira/browse/IMPALA-7278?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Tim Armstrong resolved IMPALA-7278.
-----------------------------------
    Resolution: Not A Bug

> distinct clause is not working as expected with custom UDFs
> -----------------------------------------------------------
>
>                 Key: IMPALA-7278
>                 URL: https://issues.apache.org/jira/browse/IMPALA-7278
>             Project: IMPALA
>          Issue Type: Bug
>          Components: Frontend
>    Affects Versions: Impala 2.8.0
>            Reporter: shabnam perween
>            Priority: Critical
>
> Distinct clause when executed with custom UDF returns unexpected results.
> Custom UDF Definition:
> udf.h file:
> {code}
> #ifndef IMPALA_UDF_SAMPLE_UDF_H
> #define IMPALA_UDF_SAMPLE_UDF_H
> #include "udf.h"
> using namespace impala_udf;
> #ifdef __cplusplus
> extern "C"
> {
> #endif
> StringVal udf_clear(FunctionContext* context, StringVal& sInput);
> #ifdef __cplusplus
> }
> #endif
> #endif
> {code}
> udf.cpp:
> {code}
> #include "clear.h"
> StringVal udf_clear(
>     FunctionContext* context,
>     StringVal& sInput /* String to encrypt */
> )
> {
>     unsigned char* pReturnData = context->Allocate( 100 );
>     memset( pReturnData, NULL, 100);
>     memcpy(pReturnData, sInput.ptr, sInput.len );
>     StringVal sResult( pReturnData );
>     sResult.len = sInput.len;
>     context->Free( (uint8_t*)pReturnData );
>     return sResult;
> }
> {code}
> CMakeLists.txt:
> {code}
> project (clear)
> ADD_LIBRARY (clear2.8_RHEL SHARED clear.cpp )
> TARGET_LINK_LIBRARIES (clear2.8_RHEL libImpalaUdf.a )
> SET_TARGET_PROPERTIES (clear2.8_RHEL PROPERTIES SUFFIX ".so")
> SET_TARGET_PROPERTIES (clear2.8_RHEL PROPERTIES PREFIX "")
> INSTALL ( TARGETS clear2.8_RHEL DESTINATION . )
> Query Syntax:
> CREATE TABLE clear (c1 STRING, c2 STRING) row format delimited fields
> terminated by ',' stored as textfile;
> LOAD DATA INPATH '/user/clear.csv' OVERWRITE INTO TABLE clear;
> Query: describe clear
> +------+--------+---------+
> | name | type   | comment |
> +------+--------+---------+
> | c1   | string |         |
> | c2   | string |         |
> +------+--------+---------+
> Fetched 2 row(s) in 0.04s
> select * from clear;
> +---------+---------+
> | c1      | c2      |
> +---------+---------+
> | 1111111 | 1111111 |
> | 1111111 | 1111111 |
> | 222222  | 222222  |
> | 444444  | 444444  |
> | 222222  | 222222  |
> | 3333333 | 3333333 |
> | 3333333 | 3333333 |
> +---------+---------+
> Fetched 7 row(s) in 0.14s
> select distinct udf_clear(c1),c2 from clear;
> +-----------------------+---------+
> | default.udf_clear(c1) | c2      |
> +-----------------------+---------+
> | {color:#d04437}*222222* {color}| 444444 | <== this should be *444444*
> | 222222                | 222222  |
> | 3333333               | 3333333 |
> | 1111111               | 1111111 |
> +-----------------------+---------+
> Fetched 4 row(s) in 0.24s
> {code}
>
> Expected result:
> {code}
> select distinct c1,c2 from clear;
> +---------+---------+
> | c1      | c2      |
> +---------+---------+
> | 444444  | 444444  |
> | 222222  | 222222  |
> | 3333333 | 3333333 |
> | 1111111 | 1111111 |
> +---------+---------+
> Fetched 4 row(s) in 0.25s
> {code}



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
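
The resolution above does not state a reason. One observation from the code as posted: udf_clear calls context->Free() on its buffer and then returns a StringVal that still points at that buffer, so the returned value refers to freed memory that a later call may reuse. That would be consistent with one output row showing another row's value, though it is not confirmed in this message. A minimal sketch of a variant that avoids the early free follows, assuming the udf.h constructor StringVal(FunctionContext*, int), which allocates result memory managed by Impala. The FunctionContext and StringVal definitions below are simplified stand-ins written only so the sketch compiles on its own; they are not the real Impala headers.

```cpp
#include <cstdint>
#include <cstring>
#include <memory>
#include <vector>

// Simplified stand-ins for the types declared in Impala's udf.h (assumption:
// written for illustration only). In a real UDF, drop these two structs,
// #include "udf.h" and use namespace impala_udf instead.
struct FunctionContext {
  // Result buffers are owned by the context, not by the UDF.
  std::vector<std::unique_ptr<uint8_t[]>> result_buffers;
};

struct StringVal {
  bool is_null;
  int len;
  uint8_t* ptr;

  StringVal() : is_null(true), len(0), ptr(nullptr) {}

  // Stand-in for the udf.h constructor that allocates a 'len'-byte result
  // buffer whose lifetime is managed by the context.
  StringVal(FunctionContext* context, int len)
      : is_null(false), len(len), ptr(nullptr) {
    context->result_buffers.emplace_back(new uint8_t[len > 0 ? len : 1]);
    ptr = context->result_buffers.back().get();
  }

  static StringVal null() { return StringVal(); }
};

// Returns a copy of the input string. The UDF never frees the result buffer
// itself, so the returned StringVal stays valid after the function returns.
StringVal udf_clear(FunctionContext* context, const StringVal& input) {
  if (input.is_null) return StringVal::null();
  StringVal result(context, input.len);
  if (input.len > 0) std::memcpy(result.ptr, input.ptr, input.len);
  return result;
}
```

Compared with the posted version, this sizes the buffer from the input length rather than a fixed 100 bytes, takes the argument by const reference, and passes NULL inputs through unchanged.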