Ramin Gharib created FLINK-39651:
------------------------------------
Summary: REGEXP predicate does not validate literal regex patterns
at plan time and logs errors on the hot path
Key: FLINK-39651
URL: https://issues.apache.org/jira/browse/FLINK-39651
Project: Flink
Issue Type: Sub-task
Components: Table SQL / API, Table SQL / Planner, Table SQL / Runtime
Reporter: Ramin Gharib
{{SqlFunctionUtils.regExp}} at
{{flink-table-runtime/.../SqlFunctionUtils.java:1017}}:
{code:java}
public static Boolean regExp(String s, String regex) {
    if (regex.length() == 0) {
        return false;
    }
    try {
        return (REGEXP_PATTERN_CACHE.get(regex)).matcher(s).find(0);
    } catch (Exception e) {
        LOG.error("Exception when compile and match regex:" + regex + " on: " + s, e);
        return false;
    }
}
{code}
Cached compilation is already in place. The remaining problem is the
{{LOG.error}} on the hot path: a bad literal regex still produces one stack
trace per record.
h2. Reproducer
{code:sql}
SELECT * FROM src WHERE payload REGEXP '(';
{code}
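For context, the pattern {{'('}} is an unclosed group, which {{java.util.regex}} rejects at compile time. A minimal standalone sketch of that failure, outside Flink (the class and method names here are hypothetical and only for illustration):
{code:java}
import java.util.regex.Pattern;
import java.util.regex.PatternSyntaxException;

// Hypothetical demo class; not part of Flink.
class UnclosedGroupDemo {
    /** Returns the compile error description for a pattern, or null if it compiles. */
    static String compileError(String regex) {
        try {
            Pattern.compile(regex);
            return null;
        } catch (PatternSyntaxException e) {
            return e.getDescription();
        }
    }

    public static void main(String[] args) {
        // The reproducer's pattern does not compile.
        System.out.println("'(' -> " + compileError("("));
        // A well-formed pattern compiles, so no error is reported.
        System.out.println("'a.*b' -> " + compileError("a.*b"));
    }
}
{code}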
h2. Fix
# Add {{RegexpPredicateInputTypeStrategy}}, with the same shape as the
{{REGEXP_EXTRACT}} strategy, so that literal patterns are validated at plan time.
# Route {{BuiltInFunctionDefinitions.REGEXP}} through it.
# Drop the {{LOG.error}} and silently return {{false}} on
{{PatternSyntaxException}}, so that nothing is logged on the hot path.
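The steps above could look roughly like the following plain-JDK sketch. It is not the proposed Flink code: {{RegexpSketch}}, {{validateLiteral}}, the {{ConcurrentHashMap}} cache and the null handling are assumptions made for illustration, and in the actual fix the plan-time check would live in the input type strategy rather than in a static helper.
{code:java}
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.regex.Pattern;
import java.util.regex.PatternSyntaxException;

// Hypothetical sketch; names and cache type are assumptions, not Flink APIs.
class RegexpSketch {
    // Stand-in for REGEXP_PATTERN_CACHE, whose real type is not shown in this issue.
    private static final Map<String, Pattern> CACHE = new ConcurrentHashMap<>();

    /** Plan-time check: reject an invalid literal pattern before any record is processed. */
    static void validateLiteral(String regex) {
        try {
            Pattern.compile(regex);
        } catch (PatternSyntaxException e) {
            throw new IllegalArgumentException(
                    "Invalid regular expression for REGEXP: " + e.getMessage(), e);
        }
    }

    /** Runtime path: nothing is logged; an invalid non-literal pattern yields false. */
    static Boolean regExp(String s, String regex) {
        // Null handling is an assumption of this sketch.
        if (s == null || regex == null || regex.isEmpty()) {
            return false;
        }
        try {
            return CACHE.computeIfAbsent(regex, Pattern::compile).matcher(s).find(0);
        } catch (PatternSyntaxException e) {
            // Only reachable for patterns that were not validated as literals at plan time.
            return false;
        }
    }

    public static void main(String[] args) {
        System.out.println(regExp("abc", "b"));
        System.out.println(regExp("abc", "("));
        try {
            validateLiteral("(");
        } catch (IllegalArgumentException e) {
            System.out.println("rejected at plan time: " + e.getMessage());
        }
    }
}
{code}
With a check of this kind, the reproducer query would fail once during planning, and the per-record path would only see invalid patterns that arrive as non-literal arguments.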
h2. Tests
* {{RegexpPredicateInputTypeStrategyTest}}.
* Regression coverage in the predicate IT case ({{ScalarFunctionsTest}} or
equivalent).