Tianyi Wang has posted comments on this change. ( http://gerrit.cloudera.org:8080/8900 )
Change subject: IMPALA-3282: Adds regexp_escape built-in function
......................................................................


Patch Set 2:

(5 comments)

http://gerrit.cloudera.org:8080/#/c/8900/2//COMMIT_MSG
Commit Message:

http://gerrit.cloudera.org:8080/#/c/8900/2//COMMIT_MSG@10
PS2, Line 10: ".*\\+?^[](){}$!=:-#\n\r\t\v "
Where does this list come from? Impala uses RE2 syntax, which does not treat '#', '!', etc. as special characters, so they do not need escaping.
https://github.com/google/re2/wiki/Syntax


http://gerrit.cloudera.org:8080/#/c/8900/2/be/src/exprs/expr-test.cc
File be/src/exprs/expr-test.cc:

http://gerrit.cloudera.org:8080/#/c/8900/2/be/src/exprs/expr-test.cc@4159
PS2, Line 4159: TestStringValue("regexp_escape('Hello\\\\world')", "Hello\\\\world");
It seems that the escape sequence in the argument to regexp_escape is processed once before execution, so the function sees only one backslash, '\\'. I haven't found the code that does this, but we should add a comment here explaining it.


http://gerrit.cloudera.org:8080/#/c/8900/2/be/src/exprs/string-functions-ir.cc
File be/src/exprs/string-functions-ir.cc:

http://gerrit.cloudera.org:8080/#/c/8900/2/be/src/exprs/string-functions-ir.cc@624
PS2, Line 624: const string input = AnyValUtil::ToString(str);
We can iterate directly over the pointer here, e.g. for (char* c = str.ptr; c < str.ptr + str.len; ++c). That saves a copy and a malloc.


http://gerrit.cloudera.org:8080/#/c/8900/2/be/src/exprs/string-functions-ir.cc@627
PS2, Line 627: const bool need_escape = special_character_set.find(c) != special_character_set.end();
I think using std::find on the string literal might be faster than looking up in a set<char> here.


http://gerrit.cloudera.org:8080/#/c/8900/2/be/src/exprs/string-functions-ir.cc@638
PS2, Line 638: default: ss << "\\" << c; break;
Use the char literal '\\' instead of the string literal "\\".
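
To make the three string-functions-ir.cc suggestions concrete, here is a minimal standalone sketch, not the patch itself. It assumes a plain (pointer, length) pair in place of Impala's string value type, and the names RegexpEscapeSketch and kSpecialChars are hypothetical. The metacharacter list is only illustrative, since which characters belong in it is the open question from the first comment.

```cpp
#include <algorithm>
#include <cstddef>
#include <cstdint>
#include <iterator>
#include <string>

// Hypothetical metacharacter list, for illustration only. The set the patch
// should actually use is what the commit-message comment asks about.
static const char kSpecialChars[] = ".*\\+?^[](){}$|";

// Escapes every byte of [ptr, ptr + len) that appears in kSpecialChars.
// The (ptr, len) pair is a stand-in for a string value, not Impala's type.
std::string RegexpEscapeSketch(const uint8_t* ptr, size_t len) {
  // Exclude the literal's trailing NUL from the search range.
  const char* const special_end = std::end(kSpecialChars) - 1;
  std::string result;
  result.reserve(len * 2);  // Worst case: every byte gets a backslash.
  // Iterate over the input bytes directly; the bound is exclusive.
  for (const uint8_t* c = ptr; c < ptr + len; ++c) {
    const char ch = static_cast<char>(*c);
    // Linear std::find over the small literal instead of a set<char> lookup.
    if (std::find(std::begin(kSpecialChars), special_end, ch) != special_end) {
      result += '\\';  // Char literal, not the string literal "\\".
    }
    result += ch;
  }
  return result;
}

// Convenience overload so the sketch can be called with a std::string.
std::string RegexpEscapeSketch(const std::string& s) {
  return RegexpEscapeSketch(reinterpret_cast<const uint8_t*>(s.data()), s.size());
}
```

With this sketch, an input containing a single backslash, such as Hello\world, comes back with the backslash doubled, which is consistent with the reading of the expr-test.cc case in the second comment. Whether the linear std::find is actually faster than the set lookup would still need measuring.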
-- 
To view, visit http://gerrit.cloudera.org:8080/8900
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I84c3e0ded26f6eb20794c38b75be9b25cd111e4b
Gerrit-Change-Number: 8900
Gerrit-PatchSet: 2
Gerrit-Owner: Kim Jin Chul <jinc...@gmail.com>
Gerrit-Reviewer: Tianyi Wang <tw...@cloudera.com>
Gerrit-Reviewer: Tim Armstrong <tarmstr...@cloudera.com>
Gerrit-Comment-Date: Fri, 05 Jan 2018 00:13:31 +0000
Gerrit-HasComments: Yes