[ 
https://issues.apache.org/jira/browse/CALCITE-5910?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17752889#comment-17752889
 ] 

Julian Hyde edited comment on CALCITE-5910 at 8/10/23 5:39 PM:
---------------------------------------------------------------

We need to decide whether to use re2 and, if so, whether to use it only for 
BigQuery-specific functions or for all functions.

In my opinion, we should continue to use Java regexp for all functions, for 
now. The differences in semantics (for correct regular expressions) are small 
or zero. This is the best path because it is the quickest and simplest.

If there is a compelling reason to move to re2 (either for semantics, or for 
re2's better defense against denial-of-service expressions) then we can 
revisit, by logging a Jira case. The change might cover all functions, or just 
BigQuery functions, or be enabled using some kind of configuration property. 
This might mean changing the semantics of existing functions but, as I 
mentioned above, we believe that those differences would be small.


was (Author: julianhyde):
We need to make a decision whether or not to use re2, and whether to use it 
only for BigQuery-specific functions or for all functions.

In my opinion, we should continue to use java regexp for all functions, for 
now. The differences in semantics (for correct regular expressions) are small 
or zero. This is the best path because it is the quickest and simplest.

If there is a compelling reason to move to re2 (either for semantics, or for 
re2's better defense against denial-of-service expressions) then we can 
revisit. The change might cover all functions, or just BigQuery functions, or 
be enabled using some kind of configuration property. This might mean changing 
the semantics of existing functions but, as I mentioned above, we believe that 
those differences would be small.

> Add REGEXP_EXTRACT and REGEXP_SUBSTR functions (enabled in BigQuery library)
> ----------------------------------------------------------------------------
>
>                 Key: CALCITE-5910
>                 URL: https://issues.apache.org/jira/browse/CALCITE-5910
>             Project: Calcite
>          Issue Type: Task
>            Reporter: Jerin John
>            Assignee: Jerin John
>            Priority: Major
>              Labels: pull-request-available
>
> Add support for 
> [REGEXP_EXTRACT|https://cloud.google.com/bigquery/docs/reference/standard-sql/string_functions#regexp_extract]
>  and 
> [REGEXP_SUBSTR|https://cloud.google.com/bigquery/docs/reference/standard-sql/string_functions#regexp_substr]
>  functions from BigQuery.
> *{{REGEXP_EXTRACT(value, regexp[, position[, occurrence]])}}*
> Returns the substring in {{value}} that matches the regular expression 
> {{regexp}}. Returns {{NULL}} if there is no match.
>  * If the regular expression contains a capturing group ({{(...)}}), and 
> there is a match for that capturing group, that match is returned. If there 
> are multiple matches for a capturing group, the last match is returned.
>  * If {{position}} is specified, the search starts at this position in 
> {{value}}, otherwise it starts at the beginning of {{value}}.
>  * If {{occurrence}} is specified, the search returns a specific occurrence 
> of the {{regexp}} in {{value}}, otherwise returns the first match.
>  
> *{{REGEXP_SUBSTR(value, regexp[, position[, occurrence]])}}*
> Synonym for {{REGEXP_EXTRACT}}.
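
For illustration only, here is a minimal sketch of the semantics described above, written against java.util.regex (the engine the comment proposes to keep using for now). It is not the code from the pull request. The class and method names are hypothetical, and the sketch assumes that position and occurrence are 1-based, that both must be positive, and that a pattern with more than one capturing group is rejected.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical sketch of REGEXP_EXTRACT semantics; not Calcite's implementation. */
class RegexpExtractSketch {

  /** Two-argument form: search from the start and return the first match. */
  static String regexpExtract(String value, String regexp) {
    return regexpExtract(value, regexp, 1, 1);
  }

  /**
   * Returns the occurrence-th match of regexp in value, searching from the
   * 1-based position, or null if there is no such match.
   * Assumption: position and occurrence are 1-based and must be positive.
   */
  static String regexpExtract(String value, String regexp, int position, int occurrence) {
    if (position < 1 || occurrence < 1) {
      throw new IllegalArgumentException("position and occurrence must be >= 1");
    }
    Matcher matcher = Pattern.compile(regexp).matcher(value);
    // Assumption: more than one capturing group is treated as an error.
    if (matcher.groupCount() > 1) {
      throw new IllegalArgumentException("at most one capturing group is allowed");
    }
    if (position - 1 > value.length()) {
      return null; // the start position lies beyond the end of the value
    }
    // Restrict the search to the part of value starting at the given position.
    matcher.region(position - 1, value.length());
    // Advance to the requested occurrence; give up if the matches run out.
    for (int i = 0; i < occurrence; i++) {
      if (!matcher.find()) {
        return null;
      }
    }
    // With one capturing group, return that group (the last capture if the
    // group repeated); with none, group(0) is the whole match.
    return matcher.group(matcher.groupCount());
  }

  public static void main(String[] args) {
    System.out.println(regexpExtract("foo@example.com", "^[a-zA-Z0-9_.+-]+"));
    System.out.println(regexpExtract("foo@example.com", "@([a-z.]+)"));
    System.out.println(regexpExtract("Hello Helloo and Hellooo", "H?ello+", 3, 2));
  }
}
```

Under these assumptions REGEXP_SUBSTR would simply delegate to the same method. Note that Matcher.region leaves anchoring bounds enabled, so ^ matches at the start position; whether that agrees with BigQuery is not checked here.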



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
