Ramin Gharib created FLINK-39649:
------------------------------------
Summary: REGEXP_EXTRACT plan-time validation and hot-path log cleanup
Key: FLINK-39649
URL: https://issues.apache.org/jira/browse/FLINK-39649
Project: Flink
Issue Type: Sub-task
Components: Table SQL / API, Table SQL / Planner, Table SQL / Runtime
Reporter: Ramin Gharib
SqlFunctionUtils.regexpExtract compiles the regex for every record and calls
LOG.error each time compilation throws PatternSyntaxException. When the pattern
is a string literal, it is already known at planning time and can be validated
there.
h3. Reproducer
{code:sql}
SELECT REGEXP_EXTRACT(payload, '(', 1) FROM src;
{code}
'(' is an unbalanced group. The job plans successfully and the runtime emits
one stack trace per record processed.
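
The compile failure itself can be reproduced outside Flink with plain java.util.regex. The sketch below is not Flink code; UnbalancedGroupDemo and compileError are hypothetical names used only to illustrate that '(' is rejected when the pattern is compiled, so compiling per record repeats the same failure for every row.

```java
import java.util.regex.Pattern;
import java.util.regex.PatternSyntaxException;

// Hypothetical helper, not part of Flink: reports why a regex fails to compile.
final class UnbalancedGroupDemo {

    // Returns the syntax error description, or null when the regex compiles.
    static String compileError(String regex) {
        try {
            Pattern.compile(regex);
            return null;
        } catch (PatternSyntaxException e) {
            return e.getDescription();
        }
    }

    public static void main(String[] args) {
        // The pattern from the reproducer fails; a balanced group compiles.
        System.out.println("'('   -> " + compileError("("));
        System.out.println("'(a)' -> " + compileError("(a)"));
    }
}
```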
h3. Fix
# Add RegexpExtractInputTypeStrategy, which compiles a literal regex during
inferInputTypes and fails via callContext.fail(...) if it is invalid.
# Route BuiltInFunctionDefinitions.REGEXP_EXTRACT through it.
# Update SqlFunctionUtils.regexpExtract to use REGEXP_PATTERN_CACHE and
silently return null on compile failure, with no LOG.error on the hot path.
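
The steps above could look roughly like the following plain-Java sketch. It is an illustration under stated assumptions, not the proposed patch: RegexpExtractSketch, validateLiteral, and the map-based CACHE are hypothetical stand-ins for RegexpExtractInputTypeStrategy, the callContext.fail(...) path, and REGEXP_PATTERN_CACHE. Returning null for a missing match or an out-of-range group index is also an assumption.

```java
import java.util.Map;
import java.util.Optional;
import java.util.concurrent.ConcurrentHashMap;
import java.util.regex.Matcher;
import java.util.regex.Pattern;
import java.util.regex.PatternSyntaxException;

// Hypothetical stand-in for the proposed change; none of these names are Flink APIs.
final class RegexpExtractSketch {

    // Stand-in for REGEXP_PATTERN_CACHE. Optional.empty() records a regex that
    // failed to compile, so the failure is not retried for every record.
    // A real cache would be bounded; this one is unbounded for brevity.
    private static final Map<String, Optional<Pattern>> CACHE = new ConcurrentHashMap<>();

    // Plan-time check for a literal pattern: returns an error message to report
    // to the user, or Optional.empty() when the literal compiles.
    static Optional<String> validateLiteral(String literalRegex) {
        try {
            Pattern.compile(literalRegex);
            return Optional.empty();
        } catch (PatternSyntaxException e) {
            return Optional.of(
                    "Invalid regular expression '" + literalRegex + "': " + e.getDescription());
        }
    }

    private static Optional<Pattern> compileOrEmpty(String regex) {
        try {
            return Optional.of(Pattern.compile(regex));
        } catch (PatternSyntaxException e) {
            return Optional.empty(); // no logging on the hot path
        }
    }

    // Runtime path: compile once per distinct regex, return null on failure.
    static String regexpExtract(String str, String regex, int groupIndex) {
        if (str == null || regex == null || groupIndex < 0) {
            return null;
        }
        Optional<Pattern> pattern =
                CACHE.computeIfAbsent(regex, RegexpExtractSketch::compileOrEmpty);
        if (!pattern.isPresent()) {
            return null;
        }
        Matcher matcher = pattern.get().matcher(str);
        if (matcher.find() && groupIndex <= matcher.groupCount()) {
            return matcher.group(groupIndex);
        }
        return null;
    }
}
```

In this sketch, validateLiteral covers only literal patterns, so the runtime path still needs the null fallback for patterns that come from a column.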