[jira] [Commented] (FLINK-9990) Add regex_extract supported in TableAPI and SQL
[ https://issues.apache.org/jira/browse/FLINK-9990?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16593121#comment-16593121 ]

ASF GitHub Bot commented on FLINK-9990:
---

yanghua commented on issue #6448: [FLINK-9990] [table] Add regex_extract supported in TableAPI and SQL
URL: https://github.com/apache/flink/pull/6448#issuecomment-416095640

@juhoautio Yes, you are right, agree!

This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at: us...@infra.apache.org

> Add regex_extract supported in TableAPI and SQL
> ---
>
> Key: FLINK-9990
> URL: https://issues.apache.org/jira/browse/FLINK-9990
> Project: Flink
> Issue Type: Sub-task
> Components: Table API & SQL
> Reporter: vinoyang
> Assignee: vinoyang
> Priority: Minor
> Labels: pull-request-available
>
> regex_extract is a very useful function. It returns a string based on a regex pattern and an index.
> For example:
> {code:java}
> regexp_extract('foothebar', 'foo(.*?)(bar)', 2) // returns 'bar'
> {code}
> It is provided as a UDF in Hive; for more details, please see [1].
> [1]: https://cwiki.apache.org/confluence/display/Hive/LanguageManual+UDF

--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
[jira] [Commented] (FLINK-9990) Add regex_extract supported in TableAPI and SQL
[ https://issues.apache.org/jira/browse/FLINK-9990?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16592606#comment-16592606 ]

ASF GitHub Bot commented on FLINK-9990:
---

juhoautio commented on issue #6448: [FLINK-9990] [table] Add regex_extract supported in TableAPI and SQL
URL: https://github.com/apache/flink/pull/6448#issuecomment-415972833

How about prefixing such functions with `regexp_` instead of `regex_`? For example, Postgres and Hive follow that convention.
[jira] [Commented] (FLINK-9990) Add regex_extract supported in TableAPI and SQL
[ https://issues.apache.org/jira/browse/FLINK-9990?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16569224#comment-16569224 ]

ASF GitHub Bot commented on FLINK-9990:
---

walterddr commented on a change in pull request #6448: [FLINK-9990] [table] Add regex_extract supported in TableAPI and SQL
URL: https://github.com/apache/flink/pull/6448#discussion_r207713222

## File path: docs/dev/table/sql.md ##
@@ -1842,6 +1842,16 @@ RPAD(text string, len integer, pad string)
 {% highlight text %}
+REGEX_EXTRACT(str string, regex string, extractIndex integer)
+{% endhighlight %}
+
+
+Returns the string str extracted using specified regex pattern and index. If str or regex is null, returns null. E.g. REGEX_EXTRACT('foothebar', 'foo(.*?)(bar)', 2) returns bar.

Review comment:
I think putting the following in the doc should work:
```
See [java.util.regex.Matcher](https://docs.oracle.com/javase/8/docs/api/java/util/regex/Matcher.html) for more information.
```
[jira] [Commented] (FLINK-9990) Add regex_extract supported in TableAPI and SQL
[ https://issues.apache.org/jira/browse/FLINK-9990?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16569097#comment-16569097 ]

ASF GitHub Bot commented on FLINK-9990:
---

yanghua commented on a change in pull request #6448: [FLINK-9990] [table] Add regex_extract supported in TableAPI and SQL
URL: https://github.com/apache/flink/pull/6448#discussion_r207701539

## File path: docs/dev/table/sql.md ##
@@ -1842,6 +1842,16 @@ RPAD(text string, len integer, pad string)
 {% highlight text %}
+REGEX_EXTRACT(str string, regex string, extractIndex integer)
+{% endhighlight %}
+
+
+Returns the string str extracted using specified regex pattern and index. If str or regex is null, returns null. E.g. REGEX_EXTRACT('foothebar', 'foo(.*?)(bar)', 2) returns bar.

Review comment:
I agree with this idea, but how would a reader access the path `docs/api/java/util/regex/Matcher.html`? It is not a complete URL.
[jira] [Commented] (FLINK-9990) Add regex_extract supported in TableAPI and SQL
[ https://issues.apache.org/jira/browse/FLINK-9990?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16569077#comment-16569077 ]

ASF GitHub Bot commented on FLINK-9990:
---

walterddr commented on a change in pull request #6448: [FLINK-9990] [table] Add regex_extract supported in TableAPI and SQL
URL: https://github.com/apache/flink/pull/6448#discussion_r207700798

## File path: docs/dev/table/sql.md ##
@@ -1842,6 +1842,16 @@ RPAD(text string, len integer, pad string)
 {% highlight text %}
+REGEX_EXTRACT(str string, regex string, extractIndex integer)
+{% endhighlight %}
+
+
+Returns the string str extracted using specified regex pattern and index. If str or regex is null, returns null. E.g. REGEX_EXTRACT('foothebar', 'foo(.*?)(bar)', 2) returns bar.

Review comment:
Thanks for the explanation. Sorry, I was a bit confused by the wording in the doc. Maybe, similar to Hive, we can add this line to the doc:
```
See docs/api/java/util/regex/Matcher.html for more information
```
That way it is immediately clear. What do you think?
[jira] [Commented] (FLINK-9990) Add regex_extract supported in TableAPI and SQL
[ https://issues.apache.org/jira/browse/FLINK-9990?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16569049#comment-16569049 ]

ASF GitHub Bot commented on FLINK-9990:
---

yanghua commented on a change in pull request #6448: [FLINK-9990] [table] Add regex_extract supported in TableAPI and SQL
URL: https://github.com/apache/flink/pull/6448#discussion_r207699629

## File path: docs/dev/table/sql.md ##
@@ -1842,6 +1842,16 @@ RPAD(text string, len integer, pad string)
 {% highlight text %}
+REGEX_EXTRACT(str string, regex string, extractIndex integer)
+{% endhighlight %}
+
+
+Returns the string str extracted using specified regex pattern and index. If str or regex is null, returns null. E.g. REGEX_EXTRACT('foothebar', 'foo(.*?)(bar)', 2) returns bar.

Review comment:
It just returns the extractIndex-th value of the match group array: a single element, not an array. See https://cwiki.apache.org/confluence/display/Hive/LanguageManual+UDF (please search 'regex_extract').
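
The "one element, not an array" behavior described in this comment can be sketched with plain `java.util.regex`. This is only an illustration under stated assumptions, not the code from the pull request: the class `RegexExtractSketch` and its `regexExtract` helper are hypothetical names, the null handling follows the proposed doc text, and the empty-string result for "no match" is an assumption based on the `""` result reported elsewhere in this thread.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class RegexExtractSketch {

    // Hypothetical helper (not Flink's implementation): returns the text of a
    // single capturing group from the first match of regex in str.
    public static String regexExtract(String str, String regex, int extractIndex) {
        if (str == null || regex == null) {
            return null; // per the proposed doc: null input gives null
        }
        Matcher m = Pattern.compile(regex).matcher(str);
        if (!m.find()) {
            return ""; // assumption: no match yields an empty string
        }
        // Matcher.group(int) returns exactly one String, never an array.
        return m.group(extractIndex);
    }

    public static void main(String[] args) {
        System.out.println(regexExtract("foothebar", "foo(.*?)(bar)", 1)); // the
        System.out.println(regexExtract("foothebar", "foo(.*?)(bar)", 2)); // bar
    }
}
```

Note that `Matcher.group(int)` throws `IndexOutOfBoundsException` when the index exceeds the number of capturing groups; the sketch does not guess how the pull request handles that case.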
[jira] [Commented] (FLINK-9990) Add regex_extract supported in TableAPI and SQL
[ https://issues.apache.org/jira/browse/FLINK-9990?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16569043#comment-16569043 ]

ASF GitHub Bot commented on FLINK-9990:
---

walterddr commented on a change in pull request #6448: [FLINK-9990] [table] Add regex_extract supported in TableAPI and SQL
URL: https://github.com/apache/flink/pull/6448#discussion_r207699504

## File path: docs/dev/table/sql.md ##
@@ -1842,6 +1842,16 @@ RPAD(text string, len integer, pad string)
 {% highlight text %}
+REGEX_EXTRACT(str string, regex string, extractIndex integer)
+{% endhighlight %}
+
+
+Returns the string str extracted using specified regex pattern and index. If str or regex is null, returns null. E.g. REGEX_EXTRACT('foothebar', 'foo(.*?)(bar)', 2) returns bar.

Review comment:
Good point. So another question is: does `REGEX_EXTRACT` return an array of strings, similar to what Pattern/Matcher in Java gives you when extracting all capturing groups, or is the result concatenated? If so, what's the delimiter? (In the code it seems only a `String` type is returned.)
[jira] [Commented] (FLINK-9990) Add regex_extract supported in TableAPI and SQL
[ https://issues.apache.org/jira/browse/FLINK-9990?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16569039#comment-16569039 ]

ASF GitHub Bot commented on FLINK-9990:
---

yanghua commented on a change in pull request #6448: [FLINK-9990] [table] Add regex_extract supported in TableAPI and SQL
URL: https://github.com/apache/flink/pull/6448#discussion_r207699091

## File path: flink-libraries/flink-table/src/test/scala/org/apache/flink/table/expressions/ScalarFunctionsTest.scala ##
@@ -450,6 +450,40 @@ class ScalarFunctionsTest extends ScalarTypesTestBase {
       "")
   }

+  @Test
+  def testRegexExtract(): Unit = {

Review comment:
Good point. There is a problem here; I wrote this case to test:
```scala
testAllApis(
  "foothebar".regexExtract("foo([\\w]+)", 1), // OK, the method got 'foo([\w]+)'
  "'foothebar'.regexExtract('foo([w]+)', 1)", // failed, the method got 'foo([\\w]+)' and returns "", but passing 'foo([\\w]+)' gives a compile error
  "REGEX_EXTRACT('foothebar', 'foo([w]+)', 1)", // OK, the method got 'foo([\w]+)' but must pass four '\'
  "thebar"
)
```
It seems Flink pre-processes a regex that contains `\xxx`. A few days ago we also hit this issue when testing `similar to` against a regex that contains `\d`.

cc @twalthr
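
To make the escaping problem above concrete, here is a small Java sketch of what the regex engine sees when a backslash is lost or doubled on the way to the function. It uses only `java.util.regex`; the class `EscapeLayersSketch` and its `firstGroup` helper are hypothetical names, and the sketch says nothing about where in Flink's expression or SQL parsing the extra layer of escaping is actually applied.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class EscapeLayersSketch {

    // Hypothetical helper: group 1 of the first match, or "" when nothing matches.
    public static String firstGroup(String input, String regex) {
        Matcher m = Pattern.compile(regex).matcher(input);
        return m.find() ? m.group(1) : "";
    }

    public static void main(String[] args) {
        // The Java literal "foo([\\w]+)" is the regex foo([\w]+): word characters.
        System.out.println(firstGroup("foothebar", "foo([\\w]+)"));   // thebar

        // Backslash lost: foo([w]+) requires a literal 'w' right after "foo".
        System.out.println(firstGroup("foothebar", "foo([w]+)"));     // empty line

        // Backslash doubled: the regex foo([\\w]+) matches a literal '\' or 'w'.
        System.out.println(firstGroup("foothebar", "foo([\\\\w]+)")); // empty line
    }
}
```

Both broken variants fail to match `foothebar`, which is consistent with the empty result reported in the failing case above.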
[jira] [Commented] (FLINK-9990) Add regex_extract supported in TableAPI and SQL
[ https://issues.apache.org/jira/browse/FLINK-9990?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16569038#comment-16569038 ]

ASF GitHub Bot commented on FLINK-9990:
---

yanghua commented on a change in pull request #6448: [FLINK-9990] [table] Add regex_extract supported in TableAPI and SQL
URL: https://github.com/apache/flink/pull/6448#discussion_r207699117

## File path: flink-libraries/flink-table/src/test/scala/org/apache/flink/table/expressions/ScalarFunctionsTest.scala ##
@@ -450,6 +450,40 @@ class ScalarFunctionsTest extends ScalarTypesTestBase {
       "")
   }

+  @Test
+  def testRegexExtract(): Unit = {

Review comment:
This test case passes:
```scala
testSqlApi("REGEX_EXTRACT('foothebar', 'foo([w]+)', 1)", "thebar")
```
[jira] [Commented] (FLINK-9990) Add regex_extract supported in TableAPI and SQL
[ https://issues.apache.org/jira/browse/FLINK-9990?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16569036#comment-16569036 ]

ASF GitHub Bot commented on FLINK-9990:
---

yanghua commented on a change in pull request #6448: [FLINK-9990] [table] Add regex_extract supported in TableAPI and SQL
URL: https://github.com/apache/flink/pull/6448#discussion_r207699091

## File path: flink-libraries/flink-table/src/test/scala/org/apache/flink/table/expressions/ScalarFunctionsTest.scala ##
@@ -450,6 +450,40 @@ class ScalarFunctionsTest extends ScalarTypesTestBase {
       "")
   }

+  @Test
+  def testRegexExtract(): Unit = {

Review comment:
Good point. There is a problem here; I wrote this case to test:
```scala
testAllApis(
  "foothebar".regexExtract("foo([\\w]+)", 1), // OK
  "'foothebar'.regexExtract('foo([w]+)', 1)", // failed, the method got 'foo([\\w]+)' and returns "", but passing 'foo([\\w]+)' gives a compile error
  "REGEX_EXTRACT('foothebar', 'foo([w]+)', 1)", // OK, but must pass four '\'
  "thebar"
)
```
It seems Flink pre-processes a regex that contains `\xxx`. A few days ago we also hit this issue when testing `similar to` against a regex that contains `\d`.

cc @twalthr
[jira] [Commented] (FLINK-9990) Add regex_extract supported in TableAPI and SQL
[ https://issues.apache.org/jira/browse/FLINK-9990?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16569034#comment-16569034 ]

ASF GitHub Bot commented on FLINK-9990:
---

yanghua commented on a change in pull request #6448: [FLINK-9990] [table] Add regex_extract supported in TableAPI and SQL
URL: https://github.com/apache/flink/pull/6448#discussion_r207699091

## File path: flink-libraries/flink-table/src/test/scala/org/apache/flink/table/expressions/ScalarFunctionsTest.scala ##
@@ -450,6 +450,40 @@ class ScalarFunctionsTest extends ScalarTypesTestBase {
       "")
   }

+  @Test
+  def testRegexExtract(): Unit = {

Review comment:
Good point. There is a problem here; I wrote this case to test:
```scala
testAllApis(
  "foothebar".regexExtract("foo([\\w]+)", 1), // OK
  "'foothebar'.regexExtract('foo([w]+)', 1)", // failed, got 'foo([\\w]+)' and returns "", but passing 'foo([\\w]+)' gives a compile error
  "REGEX_EXTRACT('foothebar', 'foo([w]+)', 1)", // OK, but must pass four '\'
  "thebar"
)
```
It seems Flink pre-processes a regex that contains `\xxx`. A few days ago we also hit this issue when testing `similar to` against a regex that contains `\d`.

cc @twalthr
[jira] [Commented] (FLINK-9990) Add regex_extract supported in TableAPI and SQL
[ https://issues.apache.org/jira/browse/FLINK-9990?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16569032#comment-16569032 ]

ASF GitHub Bot commented on FLINK-9990:
---

yanghua commented on a change in pull request #6448: [FLINK-9990] [table] Add regex_extract supported in TableAPI and SQL
URL: https://github.com/apache/flink/pull/6448#discussion_r207698953

## File path: docs/dev/table/sql.md ##
@@ -1842,6 +1842,16 @@ RPAD(text string, len integer, pad string)
 {% highlight text %}
+REGEX_EXTRACT(str string, regex string, extractIndex integer)
+{% endhighlight %}
+
+
+Returns the string str extracted using specified regex pattern and index. If str or regex is null, returns null. E.g. REGEX_EXTRACT('foothebar', 'foo(.*?)(bar)', 2) returns bar.

Review comment:
For `REGEX_EXTRACT`, you can pass 0, which means extract all.
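
One way to read "0 means extract all" is through the group numbering of `java.util.regex`, where group 0 is the entire matched substring and groups 1 and up are the capturing groups. The sketch below lists every group for one match so the numbering is visible. `GroupNumberingSketch` and `allGroups` are hypothetical names, and whether the pull request maps index 0 to `Matcher.group(0)` is an assumption, not something the thread confirms.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class GroupNumberingSketch {

    // Hypothetical helper: every group of the first match, indexed as in
    // java.util.regex (0 = whole match, 1..groupCount = capturing groups).
    public static List<String> allGroups(String input, String regex) {
        List<String> groups = new ArrayList<>();
        Matcher m = Pattern.compile(regex).matcher(input);
        if (m.find()) {
            for (int i = 0; i <= m.groupCount(); i++) {
                groups.add(m.group(i));
            }
        }
        return groups;
    }

    public static void main(String[] args) {
        // Surrounding "xx" shows that group 0 is the match, not the whole input.
        System.out.println(allGroups("xxfoothebarxx", "foo(.*?)(bar)"));
        // prints [foothebar, the, bar]
    }
}
```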
[jira] [Commented] (FLINK-9990) Add regex_extract supported in TableAPI and SQL
[ https://issues.apache.org/jira/browse/FLINK-9990?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16568964#comment-16568964 ]

ASF GitHub Bot commented on FLINK-9990:
---

walterddr commented on a change in pull request #6448: [FLINK-9990] [table] Add regex_extract supported in TableAPI and SQL
URL: https://github.com/apache/flink/pull/6448#discussion_r207694127

## File path: docs/dev/table/sql.md ##
@@ -1842,6 +1842,16 @@ RPAD(text string, len integer, pad string)
 {% highlight text %}
+REGEX_EXTRACT(str string, regex string, extractIndex integer)
+{% endhighlight %}
+
+
+Returns the string str extracted using specified regex pattern and index. If str or regex is null, returns null. E.g. REGEX_EXTRACT('foothebar', 'foo(.*?)(bar)', 2) returns bar.

Review comment:
This might be more informative:
```
Returns the string extracted from the `extractIndex` capturing group using the specified `regex` pattern on input string `str`.
```
[jira] [Commented] (FLINK-9990) Add regex_extract supported in TableAPI and SQL
[ https://issues.apache.org/jira/browse/FLINK-9990?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16568963#comment-16568963 ]

ASF GitHub Bot commented on FLINK-9990:
---

walterddr commented on a change in pull request #6448: [FLINK-9990] [table] Add regex_extract supported in TableAPI and SQL
URL: https://github.com/apache/flink/pull/6448#discussion_r207694331

## File path: flink-libraries/flink-table/src/test/scala/org/apache/flink/table/expressions/ScalarFunctionsTest.scala ##
@@ -450,6 +450,40 @@ class ScalarFunctionsTest extends ScalarTypesTestBase {
       "")
   }

+  @Test
+  def testRegexExtract(): Unit = {

Review comment:
Can we add a test that specifically covers the backslash escape case? For example:
```
"foothebar".regexExtract("foo([\\w]+)", 1) // should return "thebar"
```
[jira] [Commented] (FLINK-9990) Add regex_extract supported in TableAPI and SQL
[ https://issues.apache.org/jira/browse/FLINK-9990?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16568965#comment-16568965 ]

ASF GitHub Bot commented on FLINK-9990:
---

walterddr commented on a change in pull request #6448: [FLINK-9990] [table] Add regex_extract supported in TableAPI and SQL
URL: https://github.com/apache/flink/pull/6448#discussion_r207694231

## File path: docs/dev/table/sql.md ##
@@ -1842,6 +1842,16 @@ RPAD(text string, len integer, pad string)
 {% highlight text %}
+REGEX_EXTRACT(str string, regex string, extractIndex integer)
+{% endhighlight %}
+
+
+Returns the string str extracted using specified regex pattern and index. If str or regex is null, returns null. E.g. REGEX_EXTRACT('foothebar', 'foo(.*?)(bar)', 2) returns bar.

Review comment:
Also, shouldn't we add the escape instruction?
```
If regex has a backslash ('`\`'), then need to specify with '`\\`'.
```
This was actually included in the Scala doc.
[jira] [Commented] (FLINK-9990) Add regex_extract supported in TableAPI and SQL
[ https://issues.apache.org/jira/browse/FLINK-9990?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16568966#comment-16568966 ]

ASF GitHub Bot commented on FLINK-9990:
---

walterddr commented on a change in pull request #6448: [FLINK-9990] [table] Add regex_extract supported in TableAPI and SQL
URL: https://github.com/apache/flink/pull/6448#discussion_r207694473

## File path: docs/dev/table/sql.md ##
@@ -1842,6 +1842,16 @@ RPAD(text string, len integer, pad string)
 {% highlight text %}
+REGEX_EXTRACT(str string, regex string, extractIndex integer)
+{% endhighlight %}
+
+
+Returns the string str extracted using specified regex pattern and index. If str or regex is null, returns null. E.g. REGEX_EXTRACT('foothebar', 'foo(.*?)(bar)', 2) returns bar.

Review comment:
Also, as with SQL arrays, it might be good to point out that the index starts at 1.
[jira] [Commented] (FLINK-9990) Add regex_extract supported in TableAPI and SQL
[ https://issues.apache.org/jira/browse/FLINK-9990?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16563242#comment-16563242 ]

ASF GitHub Bot commented on FLINK-9990:
---

yanghua commented on issue #6448: [FLINK-9990] Add regex_extract supported in TableAPI and SQL
URL: https://github.com/apache/flink/pull/6448#issuecomment-409123419

@sihuazhou @suez1224 @twalthr can someone review this PR? Thanks.
[jira] [Commented] (FLINK-9990) Add regex_extract supported in TableAPI and SQL
[ https://issues.apache.org/jira/browse/FLINK-9990?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16561049#comment-16561049 ]

ASF GitHub Bot commented on FLINK-9990:
---

yanghua opened a new pull request #6448: [FLINK-9990] Add regex_extract supported in TableAPI and SQL
URL: https://github.com/apache/flink/pull/6448

## What is the purpose of the change

*This pull request adds regex_extract support to the Table API and SQL.*

## Brief change log

  - *Add regex_extract support to the Table API and SQL*

## Verifying this change

This change is already covered by existing tests, such as *ScalarFunctionsTest#testRegexExtract*.

## Does this pull request potentially affect one of the following parts:

  - Dependencies (does it add or upgrade a dependency): (yes / **no**)
  - The public API, i.e., is any changed class annotated with `@Public(Evolving)`: (yes / **no**)
  - The serializers: (yes / **no** / don't know)
  - The runtime per-record code paths (performance sensitive): (yes / **no** / don't know)
  - Anything that affects deployment or recovery: JobManager (and its components), Checkpointing, Yarn/Mesos, ZooKeeper: (yes / **no** / don't know)
  - The S3 file system connector: (yes / **no** / don't know)

## Documentation

  - Does this pull request introduce a new feature? (**yes** / no)
  - If yes, how is the feature documented? (not applicable / **docs** / JavaDocs / not documented)