Tianyi Wang has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/8900 )

Change subject: IMPALA-3282: Adds regexp_escape built-in function
......................................................................


Patch Set 2:

(5 comments)

http://gerrit.cloudera.org:8080/#/c/8900/2//COMMIT_MSG
Commit Message:

http://gerrit.cloudera.org:8080/#/c/8900/2//COMMIT_MSG@10
PS2, Line 10: ".*\\+?^[](){}$!=:-#\n\r\t\v "
Where does this list come from? Impala uses RE2 syntax, which does not treat 
'#', '!', etc. as special characters. https://github.com/google/re2/wiki/Syntax


http://gerrit.cloudera.org:8080/#/c/8900/2/be/src/exprs/expr-test.cc
File be/src/exprs/expr-test.cc:

http://gerrit.cloudera.org:8080/#/c/8900/2/be/src/exprs/expr-test.cc@4159
PS2, Line 4159:   TestStringValue("regexp_escape('Hello\\\\world')", 
"Hello\\\\world");
It seems that the argument to regexp_escape is unescaped once before 
execution, so the function sees only a single backslash ('\\' in C++ 
notation). I haven't found the code that does this, but we should add a 
comment here explaining it.
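
To illustrate the escaping levels, here is a minimal sketch. It assumes the 
SQL layer performs one unescaping pass in which a backslash makes the next 
character literal; UnescapeOnceSketch is a hypothetical model of that pass, 
not Impala code.

```cpp
#include <cstddef>
#include <string>

// Hypothetical model of a single unescaping pass: a backslash is dropped and
// the character that follows it is kept as-is.
std::string UnescapeOnceSketch(const std::string& in) {
  std::string out;
  for (std::size_t i = 0; i < in.size(); ++i) {
    // Skip the escaping backslash, then copy the escaped character.
    if (in[i] == '\\' && i + 1 < in.size()) ++i;
    out.push_back(in[i]);
  }
  return out;
}
```

Under that assumption, the C++ literal "Hello\\\\world" carries two 
backslashes in the SQL text, the function receives one, and escaping that one 
yields two again, which matches the expected value in the test.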


http://gerrit.cloudera.org:8080/#/c/8900/2/be/src/exprs/string-functions-ir.cc
File be/src/exprs/string-functions-ir.cc:

http://gerrit.cloudera.org:8080/#/c/8900/2/be/src/exprs/string-functions-ir.cc@624
PS2, Line 624:   const string input = AnyValUtil::ToString(str);
We can iterate directly with the pointer here, e.g. for (char* c = str.ptr; c 
< str.ptr + str.len; ++c). That saves us a copy and a malloc.
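
A rough sketch of what iterating in place could look like. StringValSketch is 
a hypothetical stand-in for the real StringVal (a pointer plus a length), and 
the escaping rule is reduced to a single character for brevity. Note that the 
end pointer is exclusive.

```cpp
#include <cstdint>
#include <string>

// Hypothetical stand-in for the UDF string type: a byte pointer and a length.
struct StringValSketch {
  const uint8_t* ptr;
  int len;
};

// Walks the input bytes in place, without copying them into a std::string.
// Only '.' is escaped here; the real character list is out of scope.
std::string EscapeDotsSketch(const StringValSketch& str) {
  std::string out;
  out.reserve(static_cast<std::size_t>(str.len) * 2);
  const uint8_t* end = str.ptr + str.len;  // one past the last byte
  for (const uint8_t* c = str.ptr; c < end; ++c) {
    if (*c == '.') out.push_back('\\');
    out.push_back(static_cast<char>(*c));
  }
  return out;
}
```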


http://gerrit.cloudera.org:8080/#/c/8900/2/be/src/exprs/string-functions-ir.cc@627
PS2, Line 627:     const bool need_escape = special_character_set.find(c) != 
special_character_set.end();
I think using std::find on the string literal might be faster than on a 
set<char> here.
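
Something along these lines, as a sketch only. The character list is a 
hypothetical placeholder rather than the list from the patch, and it is 
declared as a char array so that no trailing NUL becomes part of the search 
range.

```cpp
#include <algorithm>
#include <iterator>

// Hypothetical list for illustration; not the patch's actual character set.
static const char kSpecialSketch[] = {'.', '*', '+', '?', '^', '$', '\\'};

// Linear scan over a small contiguous array instead of a set<char> lookup.
bool NeedsEscapeSketch(char c) {
  return std::find(std::begin(kSpecialSketch), std::end(kSpecialSketch), c) !=
         std::end(kSpecialSketch);
}
```

Whether this is actually faster would need to be measured; for a list this 
short the contiguous scan avoids the pointer chasing of a tree-based set.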


http://gerrit.cloudera.org:8080/#/c/8900/2/be/src/exprs/string-functions-ir.cc@638
PS2, Line 638:       default: ss << "\\" << c; break;
Use the char literal '\\' instead of the string literal "\\".



--
To view, visit http://gerrit.cloudera.org:8080/8900
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I84c3e0ded26f6eb20794c38b75be9b25cd111e4b
Gerrit-Change-Number: 8900
Gerrit-PatchSet: 2
Gerrit-Owner: Kim Jin Chul <jinc...@gmail.com>
Gerrit-Reviewer: Tianyi Wang <tw...@cloudera.com>
Gerrit-Reviewer: Tim Armstrong <tarmstr...@cloudera.com>
Gerrit-Comment-Date: Fri, 05 Jan 2018 00:13:31 +0000
Gerrit-HasComments: Yes
