[
https://issues.apache.org/jira/browse/FLINK-39650?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Ramin Gharib updated FLINK-39650:
---------------------------------
Description:
{{SqlFunctionUtils.regexpReplace}} at
{{flink-table-runtime/.../SqlFunctionUtils.java:426}}:
{code:java}
public static String regexpReplace(String str, String regex, String replacement) {
    ...
    try {
        return str.replaceAll(regex, Matcher.quoteReplacement(replacement));
    } catch (Exception e) {
        LOG.error(
                String.format(
                        "Exception in regexpReplace('%s', '%s', '%s')",
                        str, regex, replacement),
                e);
        return null;
    }
}
{code}
{{String.replaceAll}} calls {{Pattern.compile(regex)}} internally on every
invocation. This causes two problems on the hot path:
* The pattern is recompiled for every record, even when the regex never changes.
* For an invalid regex, {{PatternSyntaxException}} is caught and logged at {{ERROR}} for every record, and the function returns {{null}}.
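As a rough illustration of the first problem, the sketch below compiles each distinct regex once and reuses the result. It is not the Flink implementation: the class {{CachedRegexpReplace}}, the {{CACHE}} map, and the {{compileCount}} counter are hypothetical names, and the null checks are an assumption about what the elided {{...}} does.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Optional;
import java.util.regex.Matcher;
import java.util.regex.Pattern;
import java.util.regex.PatternSyntaxException;

/** Hypothetical helper, not Flink code: compiles each distinct regex at most once. */
class CachedRegexpReplace {
    // Unbounded and not thread-safe; a real operator would need to bound or scope this.
    private static final Map<String, Optional<Pattern>> CACHE = new HashMap<>();

    // Counts compile attempts, only so the effect of caching can be observed.
    static int compileCount = 0;

    static String regexpReplace(String str, String regex, String replacement) {
        // Assumption: null inputs yield null.
        if (str == null || regex == null || replacement == null) {
            return null;
        }
        // Valid and invalid regexes are both cached, so a bad pattern
        // is rejected once instead of once per record.
        Optional<Pattern> compiled = CACHE.computeIfAbsent(regex, r -> {
            compileCount++;
            try {
                return Optional.of(Pattern.compile(r));
            } catch (PatternSyntaxException e) {
                return Optional.empty();
            }
        });
        if (!compiled.isPresent()) {
            return null;
        }
        // quoteReplacement keeps '$' and '\' in the replacement literal, as in the original code.
        return compiled.get().matcher(str).replaceAll(Matcher.quoteReplacement(replacement));
    }

    public static void main(String[] args) {
        System.out.println(regexpReplace("a-b-c", "-", "+")); // prints a+b+c
        System.out.println(regexpReplace("abc", "(", "X"));   // prints null
    }
}
```

Caching the failure as well as the success means an invalid regex costs one compile attempt in total, which also gives a natural place to log once rather than per record.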
h2. Reproducer
{code:sql}
SELECT REGEXP_REPLACE(payload, '(', 'X') FROM src;
{code}
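The literal {{'('}} in the reproducer is an unclosed group, so {{Pattern.compile}} rejects it. The sketch below shows how a literal pattern could be checked once, before any record is processed. It assumes nothing about Flink's planner APIs: {{LiteralRegexValidator}} and {{validateLiteral}} are hypothetical names used only for illustration.

```java
import java.util.Optional;
import java.util.regex.Pattern;
import java.util.regex.PatternSyntaxException;

/** Hypothetical helper, not Flink code: checks a literal regex once, up front. */
class LiteralRegexValidator {

    /** Returns an error message if the literal does not compile, otherwise empty. */
    static Optional<String> validateLiteral(String regexLiteral) {
        try {
            Pattern.compile(regexLiteral);
            return Optional.empty();
        } catch (PatternSyntaxException e) {
            return Optional.of(
                    "Invalid regular expression '" + regexLiteral + "': " + e.getDescription());
        }
    }

    public static void main(String[] args) {
        // The reproducer's pattern fails validation; the escaped form passes.
        System.out.println(validateLiteral("(").isPresent());   // prints true
        System.out.println(validateLiteral("\\(").isPresent()); // prints false
    }
}
```

A check of this kind only applies when the regex argument is a literal; a regex read from a column would still need handling at runtime.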
h2. Fix
# Add {{RegexpReplaceInputTypeStrategy}}. Same shape as the
{{REGEXP_EXTRACT}} strategy.
# Route BuiltInFunctionDefinitions.REGEXP_REP
> REGEXP_REPLACE does not validate literal regex patterns at plan time and logs
> errors on the hot path
> ----------------------------------------------------------------------------------------------------
>
> Key: FLINK-39650
> URL: https://issues.apache.org/jira/browse/FLINK-39650
> Project: Flink
> Issue Type: Sub-task
> Reporter: Ramin Gharib
> Priority: Major