Quanlong Huang has uploaded this change for review.
( http://gerrit.cloudera.org:8080/17580 )


Change subject: IMPALA-2019(Part-2): Provide UTF-8 support in instr() and 
locate()
......................................................................

IMPALA-2019(Part-2): Provide UTF-8 support in instr() and locate()

Similar to the previous patch, this patch adds UTF-8 support to the
instr() and locate() builtin functions so that they behave consistently
with Hive. Both string functions take an optional position argument:
INSTR(STRING str, STRING substr[, BIGINT position[, BIGINT occurrence]])
LOCATE(STRING substr, STRING str[, INT pos])
Both return the position of the matched substring.

In UTF-8 mode (enabled with set UTF8_MODE=true), these positions are
counted in UTF-8 characters instead of bytes.
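
To illustrate the difference between byte positions and character
positions, here is a minimal standalone sketch. It is not the code in this
patch: the helper names (CountUtf8Chars, InstrBytes, InstrUtf8) are
hypothetical, the inputs are assumed to be valid UTF-8, and the optional
position and occurrence arguments are left out.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Hypothetical helper (not from the patch): number of UTF-8 characters in
// the first `len` bytes of `s`. Every byte that is not a continuation byte
// (bit pattern 10xxxxxx) starts a new character.
std::size_t CountUtf8Chars(const std::string& s, std::size_t len) {
  assert(len <= s.size());
  std::size_t n = 0;
  for (std::size_t i = 0; i < len; ++i) {
    if ((static_cast<unsigned char>(s[i]) & 0xC0) != 0x80) ++n;
  }
  return n;
}

// Byte-based result: 1-based byte offset of the first match, 0 if none.
std::size_t InstrBytes(const std::string& str, const std::string& substr) {
  const std::size_t pos = str.find(substr);
  return pos == std::string::npos ? 0 : pos + 1;
}

// Character-based result: 1-based character offset of the first match,
// 0 if none. The byte-level search is reused (assumes valid UTF-8 on both
// sides, so a match starts on a character boundary); only the offset is
// converted from bytes to characters.
std::size_t InstrUtf8(const std::string& str, const std::string& substr) {
  const std::size_t pos = str.find(substr);
  return pos == std::string::npos ? 0 : CountUtf8Chars(str, pos) + 1;
}
```

With a string of four 3-byte characters and a substring made of the last
two of them, InstrBytes() yields 7 while InstrUtf8() yields 3.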

Tests:
 - Add BE unit tests and e2e tests

Change-Id: Ic13c3d04649c1aea56c1aaa464799b5e4674f662
---
M be/src/exprs/expr-test.cc
M be/src/exprs/string-functions-ir.cc
M be/src/util/CMakeLists.txt
M be/src/util/bit-util.h
M be/src/util/string-util-test.cc
M be/src/util/string-util.cc
M be/src/util/string-util.h
M testdata/workloads/functional-query/queries/QueryTest/utf8-string-functions.test
8 files changed, 235 insertions(+), 24 deletions(-)



  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/80/17580/1
--
To view, visit http://gerrit.cloudera.org:8080/17580
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newchange
Gerrit-Change-Id: Ic13c3d04649c1aea56c1aaa464799b5e4674f662
Gerrit-Change-Number: 17580
Gerrit-PatchSet: 1
Gerrit-Owner: Quanlong Huang <huangquanl...@gmail.com>
