[ 
https://issues.apache.org/jira/browse/CALCITE-5644?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17711504#comment-17711504
 ] 

Julian Hyde commented on CALCITE-5644:
--------------------------------------

The example makes it look a little more complicated than it really is - as if 
{{CONTAINS_SUBSTR}} can accept tables or rows as input. Imagine there were a 
{{CONTAINS(expr, seek)}} function that operates on two string arguments and 
expands to {{POSITION(seek IN expr) > 0}}. I think of {{CONTAINS_SUBSTR}} as a 
slight generalization of that function to accept any data type (including 
{{ARRAY}}, {{ROW}} and {{STRUCT}}).

So {{CONTAINS_SUBSTR}} is still a scalar function; much simpler to think about 
than a table function.

I'll note that the implementation of {{CONTAINS_SUBSTR}} may involve using 
full-text indexes, and therefore the query plan will have a very different 
shape than if it were not present. The function is certainly not evaluated for 
every row. But those implementation details are hidden from the user (except 
that the plan runs faster and takes fewer $ and kWh).

> Implement BigQuery CONTAINS_SUBSTR
> ----------------------------------
>
>                 Key: CALCITE-5644
>                 URL: https://issues.apache.org/jira/browse/CALCITE-5644
>             Project: Calcite
>          Issue Type: Task
>            Reporter: Tanner Clary
>            Assignee: Tanner Clary
>            Priority: Major
>
> BigQuery offers the 
> [CONTAINS_SUBSTR|https://cloud.google.com/bigquery/docs/reference/standard-sql/string_functions#contains_substr]
>  function that returns {{TRUE}} if a substring is present in an expression 
> and {{FALSE}} if it is not. A basic example of this may be seen in [1]. 
> The expression can take many forms (more info in the linked doc) which makes 
> its implementation more complex than other string functions that only accept 
> arguments of type {{STRING}}. For instance, the expression to be searched can 
> be a column or table reference. 
> The function also has an optional third argument called {{json_scope}} where 
> the user can indicate the scope of JSON data (keys, values, or both) to be 
> searched for the substring.
>  I am curious if anyone has thoughts on how the search of rows or tables 
> could be implemented. I have a basic implementation (that supports 
> expressions of type {{STRING}} and nothing else) that I will open a draft PR 
> for as a starting point. To me, the challenge is implementing the additional 
> features like the {{JSON_SCOPE}} argument (seen in [2]) and performing a 
> cross field search as seen in [3]. 
> [1] {{SELECT CONTAINS_SUBSTR("hello", "he");}} would return {{TRUE}}. 
> [2] SELECT CONTAINS_SUBSTR(JSON '("lunch":"soup")', "lunch",
>          json_scope=>"JSON_VALUES") AS result;
> would return {{FALSE}}.
> [3] 
> {{SELECT *
> FROM Recipes
> WHERE CONTAINS_SUBSTR(
>   (SELECT AS STRUCT Recipes.* EXCEPT (Lunch, Dinner)),
>   'potato'
> );}} would return: 
> ||Breakfast||Lunch||Dinner||
> |Potato pancakes|Toasted cheese sandwich|Beef stroganoff|



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

Reply via email to