[
https://issues.apache.org/jira/browse/DRILL-8402?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17692464#comment-17692464
]
ASF GitHub Bot commented on DRILL-8402:
---------------------------------------
cgivre commented on code in PR #2762:
URL: https://github.com/apache/drill/pull/2762#discussion_r1115197954
##########
exec/java-exec/src/main/java/org/apache/drill/exec/expr/fn/impl/StringFunctions.java:
##########
@@ -293,6 +293,115 @@ public void eval() {
}
}
+ /*
+ * This function returns the capturing groups from a regex.
+ */
+ @FunctionTemplate(name = "regexp_extract", scope = FunctionScope.SIMPLE,
+ outputWidthCalculatorType =
OutputWidthCalculatorType.CUSTOM_FIXED_WIDTH_DEFAULT)
+ public static class RegexpExtract implements DrillSimpleFunc {
+
+ @Param VarCharHolder input;
+ @Param(constant=true) VarCharHolder pattern;
+ @Inject
+ DrillBuf buffer;
+ @Workspace
+ java.util.regex.Matcher matcher;
+ @Workspace
+ org.apache.drill.exec.expr.fn.impl.CharSequenceWrapper charSequenceWrapper;
+ @Output
+ ComplexWriter out;
+
+ @Override
+ public void setup() {
+ matcher =
java.util.regex.Pattern.compile(org.apache.drill.exec.expr.fn.impl.StringFunctionHelpers.toStringFromUTF8(pattern.start,
pattern.end, pattern.buffer)).matcher("");
+ charSequenceWrapper = new
org.apache.drill.exec.expr.fn.impl.CharSequenceWrapper();
+ matcher.reset(charSequenceWrapper);
+ }
+
+ @Override
+ public void eval() {
+ charSequenceWrapper.setBuffer(input.start, input.end, input.buffer);
+
+ // Reusing same charSequenceWrapper, no need to pass it in.
+ matcher.reset();
+ boolean result = matcher.find();
+
+ // Start the list here. If there are no matches, we return an empty
list.
+ org.apache.drill.exec.vector.complex.writer.BaseWriter.ListWriter
listWriter = out.rootAsList();
+ listWriter.startList();
+
+ if (result) {
+ org.apache.drill.exec.vector.complex.writer.VarCharWriter
varCharWriter = listWriter.varChar();
+ String extractedResult;
+ for(int i = 1; i <= matcher.groupCount(); i++) {
+ extractedResult = matcher.group(i);
Review Comment:
Fixed.
> Add REGEXP_EXTRACT Function
> ---------------------------
>
> Key: DRILL-8402
> URL: https://issues.apache.org/jira/browse/DRILL-8402
> Project: Apache Drill
> Issue Type: Improvement
> Components: Functions - Drill
> Affects Versions: 1.21.0
> Reporter: Charles Givre
> Assignee: Charles Givre
> Priority: Major
> Fix For: 1.21.1
>
>
> This PR adds two UDFs to Drill:
> regexp_extract(<text>, <pattern>) which returns an array of strings which
> were captured by capturing groups in the regex.
> regexp_extract(<text>, <pattern>, <index>) returns the text captured by a
> specific capturing group.
--
This message was sent by Atlassian Jira
(v8.20.10#820010)