>From Dmitry Lychagin <[email protected]>: Dmitry Lychagin has posted comments on this change. ( https://asterix-gerrit.ics.uci.edu/c/asterixdb/+/4243 )
Change subject: [ASTERIXDB-2680][FUN] Add support to regexp_matches() and regexp_split() ...................................................................... Patch Set 3: (4 comments) https://asterix-gerrit.ics.uci.edu/c/asterixdb/+/4243/3/asterixdb/asterix-app/src/test/resources/runtimets/queries_sqlpp/string/regexp_split/003/regexp_split.000.query.sqlpp File asterixdb/asterix-app/src/test/resources/runtimets/queries_sqlpp/string/regexp_split/003/regexp_split.000.query.sqlpp: https://asterix-gerrit.ics.uci.edu/c/asterixdb/+/4243/3/asterixdb/asterix-app/src/test/resources/runtimets/queries_sqlpp/string/regexp_split/003/regexp_split.000.query.sqlpp@21 PS3, Line 21: regexp_split("C:\\Program Files\\couchbase\\server\\bin", "[\\\\]"), replace "couchbase" with something else in this and next line. (AsterixDB, for example) https://asterix-gerrit.ics.uci.edu/c/asterixdb/+/4243/3/asterixdb/asterix-runtime/src/main/java/org/apache/asterix/runtime/evaluators/functions/StringRegExpMatchesDescriptor.java File asterixdb/asterix-runtime/src/main/java/org/apache/asterix/runtime/evaluators/functions/StringRegExpMatchesDescriptor.java: https://asterix-gerrit.ics.uci.edu/c/asterixdb/+/4243/3/asterixdb/asterix-runtime/src/main/java/org/apache/asterix/runtime/evaluators/functions/StringRegExpMatchesDescriptor.java@82 PS3, Line 82: for (String match : matches) { This is the find()/group() loop, so we can just do: while (matcher.find()) { String match = matcher.group(); ... } https://asterix-gerrit.ics.uci.edu/c/asterixdb/+/4243/3/asterixdb/asterix-runtime/src/main/java/org/apache/asterix/runtime/evaluators/functions/StringRegExpSplitDescriptor.java File asterixdb/asterix-runtime/src/main/java/org/apache/asterix/runtime/evaluators/functions/StringRegExpSplitDescriptor.java: https://asterix-gerrit.ics.uci.edu/c/asterixdb/+/4243/3/asterixdb/asterix-runtime/src/main/java/org/apache/asterix/runtime/evaluators/functions/StringRegExpSplitDescriptor.java@71 PS3, Line 71: String[] splits = srcPtr.toString().split(patternPtr.toString()); At some point we should reimplement it using RegExpMatcher. String.split() compiles regex pattern and creates a return array for each invocation. RegExpMatcher reuses the Matcher as long as the pattern remains the same. We could save on object construction if we do our own implementation. Please file a bug for tracking this optimization you'd rather do it in a separate change. https://asterix-gerrit.ics.uci.edu/c/asterixdb/+/4243/3/asterixdb/asterix-runtime/src/main/java/org/apache/asterix/runtime/evaluators/functions/utils/RegExpMatcher.java File asterixdb/asterix-runtime/src/main/java/org/apache/asterix/runtime/evaluators/functions/utils/RegExpMatcher.java: https://asterix-gerrit.ics.uci.edu/c/asterixdb/+/4243/3/asterixdb/asterix-runtime/src/main/java/org/apache/asterix/runtime/evaluators/functions/utils/RegExpMatcher.java@213 PS3, Line 213: public List<String> getMatches() { we already have a public find() method, let's just introduce group() method that'd return matcher.group(). then the caller can do find()/group() loop.Then we can also remove allMatches field -- To view, visit https://asterix-gerrit.ics.uci.edu/c/asterixdb/+/4243 To unsubscribe, or for help writing mail filters, visit https://asterix-gerrit.ics.uci.edu/settings Gerrit-Project: asterixdb Gerrit-Branch: master Gerrit-Change-Id: Iccf5ba14f5c8b8cf4bcd6dd6e412bb515d68dd74 Gerrit-Change-Number: 4243 Gerrit-PatchSet: 3 Gerrit-Owner: Hussain Towaileb <[email protected]> Gerrit-Reviewer: Ali Alsuliman <[email protected]> Gerrit-Reviewer: Anon. E. Moose #1000171 Gerrit-Reviewer: Dmitry Lychagin <[email protected]> Gerrit-Reviewer: Hussain Towaileb <[email protected]> Gerrit-Reviewer: Jenkins <[email protected]> Gerrit-Reviewer: Till Westmann <[email protected]> Gerrit-Comment-Date: Tue, 03 Dec 2019 23:38:04 +0000 Gerrit-HasComments: Yes Gerrit-Has-Labels: No Gerrit-MessageType: comment
