[ 
https://issues.apache.org/jira/browse/FLINK-39650?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ramin Gharib updated FLINK-39650:
---------------------------------
    Description: 
{{SqlFunctionUtils.regexpReplace}} at {{flink-table-runtime/.../SqlFunctionUtils.java:426}}:
{code:java}
public static String regexpReplace(String str, String regex, String replacement) {
    ...
    try {
        return str.replaceAll(regex, Matcher.quoteReplacement(replacement));
    } catch (Exception e) {
        LOG.error(
                String.format(
                        "Exception in regexpReplace('%s', '%s', '%s')",
                        str, regex, replacement),
                e);
        return null;
    }
}
{code}
{{String.replaceAll}} calls {{Pattern.compile(regex)}} internally on every invocation. Two problems on the hot path:
 * The pattern is recompiled for every record, even when the regex never changes.
 * For an invalid pattern, {{PatternSyntaxException}} is thrown, caught, and logged at {{ERROR}} for every record.
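
As a rough illustration of the first problem, the sketch below compiles each distinct regex once and reuses the compiled {{Pattern}}. The class {{CachedRegexpReplace}}, its unbounded map, and the null handling are assumptions made for this example; they are not Flink code, and the elided part of the original method may behave differently.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.regex.Matcher;
import java.util.regex.Pattern;
import java.util.regex.PatternSyntaxException;

// Hypothetical helper for illustration only; not part of Flink.
final class CachedRegexpReplace {
    // Assumption: an unbounded cache keyed by the regex string. A real
    // implementation would likely bound this or hold one Pattern per call site.
    private static final Map<String, Pattern> CACHE = new ConcurrentHashMap<>();

    static String replace(String str, String regex, String replacement) {
        // Assumption: null inputs yield null.
        if (str == null || regex == null || replacement == null) {
            return null;
        }
        Pattern pattern;
        try {
            // Compiles only on the first use of a given valid regex.
            pattern = CACHE.computeIfAbsent(regex, Pattern::compile);
        } catch (PatternSyntaxException e) {
            // Keeps the original "return null" result, but without per-record logging.
            return null;
        }
        // Same semantics as the original: the replacement is treated literally.
        return pattern.matcher(str).replaceAll(Matcher.quoteReplacement(replacement));
    }
}
```

An invalid regex is never cached here, so it would still be recompiled and rejected on every record. That is why the cache alone does not address the second problem.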

h2. Reproducer
{code:sql}
SELECT REGEXP_REPLACE(payload, '(', 'X') FROM src;
{code}
h2. Fix
 # Add {{RegexpReplaceInputTypeStrategy}}. Same shape as the {{REGEXP_EXTRACT}} strategy.
 # Route {{BuiltInFunctionDefinitions.REGEXP_REPLACE}} through it.
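
The core check such a strategy might perform can be sketched in plain Java: if the regex argument is a literal, compile it once up front and report a syntax error before any record is processed. The class {{LiteralRegexCheck}} and its {{validate}} method are hypothetical names for this example. The sketch does not reproduce Flink's type-inference interfaces or the actual {{REGEXP_EXTRACT}} strategy.

```java
import java.util.Optional;
import java.util.regex.Pattern;
import java.util.regex.PatternSyntaxException;

// Hypothetical stand-in for the validation step; not the Flink strategy itself.
final class LiteralRegexCheck {
    /**
     * @param literalRegex the regex if the argument is a literal, empty otherwise
     *                     (assumption: the caller knows whether the argument is a literal)
     * @return an error message for an invalid literal pattern, empty otherwise
     */
    static Optional<String> validate(Optional<String> literalRegex) {
        if (!literalRegex.isPresent()) {
            // Non-literal patterns can only be checked at runtime.
            return Optional.empty();
        }
        try {
            Pattern.compile(literalRegex.get());
            return Optional.empty();
        } catch (PatternSyntaxException e) {
            return Optional.of(
                    String.format(
                            "Invalid regular expression '%s': %s",
                            literalRegex.get(), e.getDescription()));
        }
    }
}
```

With the reproducer's pattern {{'('}}, {{validate}} returns an error message, since an unclosed group is rejected by {{Pattern.compile}}. A valid literal or a non-literal argument yields an empty result.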

> REGEXP_REPLACE does not validate literal regex patterns at plan time and logs 
> errors on the hot path
> ----------------------------------------------------------------------------------------------------
>
>                 Key: FLINK-39650
>                 URL: https://issues.apache.org/jira/browse/FLINK-39650
>             Project: Flink
>          Issue Type: Sub-task
>            Reporter: Ramin Gharib
>            Priority: Major



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
