[jira] [Commented] (FLINK-9990) Add regex_extract supported in TableAPI and SQL

2018-08-26 Thread ASF GitHub Bot (JIRA)


[ 
https://issues.apache.org/jira/browse/FLINK-9990?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16593121#comment-16593121
 ] 

ASF GitHub Bot commented on FLINK-9990:
---

yanghua commented on issue #6448: [FLINK-9990] [table] Add regex_extract 
supported in TableAPI and SQL
URL: https://github.com/apache/flink/pull/6448#issuecomment-416095640
 
 
   @juhoautio Yes, you are right, agree!


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


> Add regex_extract supported in TableAPI and SQL
> ---
>
> Key: FLINK-9990
> URL: https://issues.apache.org/jira/browse/FLINK-9990
> Project: Flink
> Issue Type: Sub-task
> Components: Table API & SQL
> Reporter: vinoyang
> Assignee: vinoyang
> Priority: Minor
> Labels: pull-request-available
>
> regex_extract is a very useful function: it returns a string based on a regex 
> pattern and an index.
> For example:
> {code:java}
> regexp_extract('foothebar', 'foo(.*?)(bar)', 2) // returns 'bar'
> {code}
> It is provided as a UDF in Hive; for more details, see [1].
> [1]: https://cwiki.apache.org/confluence/display/Hive/LanguageManual+UDF



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (FLINK-9990) Add regex_extract supported in TableAPI and SQL

2018-08-25 Thread ASF GitHub Bot (JIRA)


[ 
https://issues.apache.org/jira/browse/FLINK-9990?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16592606#comment-16592606
 ] 

ASF GitHub Bot commented on FLINK-9990:
---

juhoautio commented on issue #6448: [FLINK-9990] [table] Add regex_extract 
supported in TableAPI and SQL
URL: https://github.com/apache/flink/pull/6448#issuecomment-415972833
 
 
   How about prefixing such functions with `regexp_` instead of `regex_`?
   
   For example, PostgreSQL and Hive follow that convention.




[jira] [Commented] (FLINK-9990) Add regex_extract supported in TableAPI and SQL

2018-08-04 Thread ASF GitHub Bot (JIRA)


[ 
https://issues.apache.org/jira/browse/FLINK-9990?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16569224#comment-16569224
 ] 

ASF GitHub Bot commented on FLINK-9990:
---

walterddr commented on a change in pull request #6448: [FLINK-9990] [table] Add 
regex_extract supported in TableAPI and SQL
URL: https://github.com/apache/flink/pull/6448#discussion_r207713222
 
 

 ##
 File path: docs/dev/table/sql.md
 ##
 @@ -1842,6 +1842,16 @@ RPAD(text string, len integer, pad string)
 
   
 {% highlight text %}
+REGEX_EXTRACT(str string, regex string, extractIndex integer)
+{% endhighlight %}
+  
+  
+Returns the string str extracted using specified regex pattern and index. If str or regex is null, returns null. E.g. REGEX_EXTRACT('foothebar', 'foo(.*?)(bar)', 2) returns bar.
 
 Review comment:
   I think putting the following in the doc should work:
   ```
   See [java.util.regex.Matcher](https://docs.oracle.com/javase/8/docs/api/java/util/regex/Matcher.html) for more information.
   ```




[jira] [Commented] (FLINK-9990) Add regex_extract supported in TableAPI and SQL

2018-08-03 Thread ASF GitHub Bot (JIRA)


[ 
https://issues.apache.org/jira/browse/FLINK-9990?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16569097#comment-16569097
 ] 

ASF GitHub Bot commented on FLINK-9990:
---

yanghua commented on a change in pull request #6448: [FLINK-9990] [table] Add 
regex_extract supported in TableAPI and SQL
URL: https://github.com/apache/flink/pull/6448#discussion_r207701539
 
 

 ##
 File path: docs/dev/table/sql.md
 ##
 @@ -1842,6 +1842,16 @@ RPAD(text string, len integer, pad string)
 
   
 {% highlight text %}
+REGEX_EXTRACT(str string, regex string, extractIndex integer)
+{% endhighlight %}
+  
+  
+Returns the string str extracted using specified regex pattern and index. If str or regex is null, returns null. E.g. REGEX_EXTRACT('foothebar', 'foo(.*?)(bar)', 2) returns bar.
 
 Review comment:
   I agree with this idea, but how can the path 
`docs/api/java/util/regex/Matcher.html` be accessed? It's not a complete URL.




[jira] [Commented] (FLINK-9990) Add regex_extract supported in TableAPI and SQL

2018-08-03 Thread ASF GitHub Bot (JIRA)


[ 
https://issues.apache.org/jira/browse/FLINK-9990?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16569077#comment-16569077
 ] 

ASF GitHub Bot commented on FLINK-9990:
---

walterddr commented on a change in pull request #6448: [FLINK-9990] [table] Add 
regex_extract supported in TableAPI and SQL
URL: https://github.com/apache/flink/pull/6448#discussion_r207700798
 
 

 ##
 File path: docs/dev/table/sql.md
 ##
 @@ -1842,6 +1842,16 @@ RPAD(text string, len integer, pad string)
 
   
 {% highlight text %}
+REGEX_EXTRACT(str string, regex string, extractIndex integer)
+{% endhighlight %}
+  
+  
+Returns the string str extracted using specified regex pattern and index. If str or regex is null, returns null. E.g. REGEX_EXTRACT('foothebar', 'foo(.*?)(bar)', 2) returns bar.
 
 Review comment:
   Thanks for the explanation. Sorry, I was a bit confused by the wording in the 
doc. Maybe, similar to Hive, we can add this line to the doc: 
   ```
   See docs/api/java/util/regex/Matcher.html for more information
   ```
   This way it is immediately clear. What do you think?




[jira] [Commented] (FLINK-9990) Add regex_extract supported in TableAPI and SQL

2018-08-03 Thread ASF GitHub Bot (JIRA)


[ 
https://issues.apache.org/jira/browse/FLINK-9990?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16569049#comment-16569049
 ] 

ASF GitHub Bot commented on FLINK-9990:
---

yanghua commented on a change in pull request #6448: [FLINK-9990] [table] Add 
regex_extract supported in TableAPI and SQL
URL: https://github.com/apache/flink/pull/6448#discussion_r207699629
 
 

 ##
 File path: docs/dev/table/sql.md
 ##
 @@ -1842,6 +1842,16 @@ RPAD(text string, len integer, pad string)
 
   
 {% highlight text %}
+REGEX_EXTRACT(str string, regex string, extractIndex integer)
+{% endhighlight %}
+  
+  
+Returns the string str extracted using specified regex pattern and index. If str or regex is null, returns null. E.g. REGEX_EXTRACT('foothebar', 'foo(.*?)(bar)', 2) returns bar.
 
 Review comment:
   It just returns the extractIndex-th element of the match group array: a single 
element, not an array. See 
https://cwiki.apache.org/confluence/display/Hive/LanguageManual+UDF (search for 
'regexp_extract').
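
The semantics described here (one capturing group selected by index, returned as a single string) can be sketched with plain `java.util.regex`. This is a minimal illustration, not the PR's implementation: the class and method names are hypothetical, the null handling follows the proposed doc text, and returning an empty string for a missing match or an out-of-range index is an assumption.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class RegexExtractSketch {

    /**
     * Returns capturing group {@code extractIndex} of the first match of
     * {@code regex} in {@code str}. Hypothetical helper for illustration only.
     */
    static String regexExtract(String str, String regex, int extractIndex) {
        if (str == null || regex == null) {
            return null; // null in, null out, as the proposed doc text states
        }
        Matcher m = Pattern.compile(regex).matcher(str);
        // ASSUMPTION: no match or an out-of-range index yields "".
        if (!m.find() || extractIndex < 0 || extractIndex > m.groupCount()) {
            return "";
        }
        // A single group is returned, never an array of groups.
        return m.group(extractIndex);
    }

    public static void main(String[] args) {
        // Group 2 of 'foo(.*?)(bar)' on 'foothebar' is "bar".
        System.out.println(regexExtract("foothebar", "foo(.*?)(bar)", 2));
    }
}
```

Group numbering follows `java.util.regex.Matcher`: group 1 is the first parenthesised group and group 0 is the whole match.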




[jira] [Commented] (FLINK-9990) Add regex_extract supported in TableAPI and SQL

2018-08-03 Thread ASF GitHub Bot (JIRA)


[ 
https://issues.apache.org/jira/browse/FLINK-9990?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16569043#comment-16569043
 ] 

ASF GitHub Bot commented on FLINK-9990:
---

walterddr commented on a change in pull request #6448: [FLINK-9990] [table] Add 
regex_extract supported in TableAPI and SQL
URL: https://github.com/apache/flink/pull/6448#discussion_r207699504
 
 

 ##
 File path: docs/dev/table/sql.md
 ##
 @@ -1842,6 +1842,16 @@ RPAD(text string, len integer, pad string)
 
   
 {% highlight text %}
+REGEX_EXTRACT(str string, regex string, extractIndex integer)
+{% endhighlight %}
+  
+  
+Returns the string str extracted using specified regex pattern and index. If str or regex is null, returns null. E.g. REGEX_EXTRACT('foothebar', 'foo(.*?)(bar)', 2) returns bar.
 
 Review comment:
   Good point. So another question is: does `REGEX_EXTRACT` return an array of 
Strings, similar to how Pattern/Matcher in Java does it when extracting all 
capturing groups? Or is the result concatenated? If so, what's the delimiter? (In 
the code it seems only a `String` type is returned.)
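
For comparison, `java.util.regex.Matcher` itself does not return an array; it exposes groups one at a time through `group(int)` and `groupCount()`. A caller who wants every group has to collect them, roughly as below. The helper name `allGroups` is hypothetical and not part of the PR.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class AllGroupsSketch {

    /** Collects group 0 (the whole match) and every capturing group of the first match. */
    static List<String> allGroups(String str, String regex) {
        List<String> groups = new ArrayList<>();
        Matcher m = Pattern.compile(regex).matcher(str);
        if (m.find()) {
            // groupCount() excludes group 0, hence the inclusive upper bound.
            for (int i = 0; i <= m.groupCount(); i++) {
                groups.add(m.group(i));
            }
        }
        return groups;
    }

    public static void main(String[] args) {
        // prints [foothebar, the, bar]
        System.out.println(allGroups("foothebar", "foo(.*?)(bar)"));
    }
}
```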




[jira] [Commented] (FLINK-9990) Add regex_extract supported in TableAPI and SQL

2018-08-03 Thread ASF GitHub Bot (JIRA)


[ 
https://issues.apache.org/jira/browse/FLINK-9990?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16569039#comment-16569039
 ] 

ASF GitHub Bot commented on FLINK-9990:
---

yanghua commented on a change in pull request #6448: [FLINK-9990] [table] Add 
regex_extract supported in TableAPI and SQL
URL: https://github.com/apache/flink/pull/6448#discussion_r207699091
 
 

 ##
 File path: 
flink-libraries/flink-table/src/test/scala/org/apache/flink/table/expressions/ScalarFunctionsTest.scala
 ##
 @@ -450,6 +450,40 @@ class ScalarFunctionsTest extends ScalarTypesTestBase {
   "")
   }
 
+  @Test
+  def testRegexExtract(): Unit = {
 
 Review comment:
   Good point. There is a problem here; I wrote this case to test it:
   
   ```scala
   testAllApis(
     // OK: the method got 'foo([\w]+)'
     "foothebar".regexExtract("foo([\\w]+)", 1),
     // Failed: the method got 'foo([\\w]+)' and returned "", but passing
     // 'foo([\\w]+)' would give a compile error.
     "'foothebar'.regexExtract('foo([w]+)', 1)",
     // OK: the method got 'foo([\w]+)', but four '\' must be passed
     "REGEX_EXTRACT('foothebar', 'foo([w]+)', 1)",
     "thebar"
   )
   ```
   
   It seems Flink pre-processes regexes that contain `\xxx`. A few days ago, 
we also hit this issue when testing `similar to` against a regex that contains 
`\d`.
   
   cc @twalthr 
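
The effect of an extra unescaping pass can be reproduced in plain Java. This is a sketch under assumptions: `unescapeOnce` is a hypothetical stand-in for whatever pre-processing an expression or SQL parser applies, and it is not taken from Flink's code. It only illustrates why one layer needs two backslashes in source and two layers need four.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class EscapeLayersSketch {

    /** Hypothetical parser step: collapses each pair of backslashes into one. */
    static String unescapeOnce(String s) {
        return s.replace("\\\\", "\\");
    }

    /** Returns group 1 of the first match, or "" when nothing matches. */
    static String firstGroup(String str, String regex) {
        Matcher m = Pattern.compile(regex).matcher(str);
        return m.find() ? m.group(1) : "";
    }

    public static void main(String[] args) {
        // One layer (the Java compiler): two backslashes in source, regex is foo([\w]+).
        String direct = "foo([\\w]+)";
        // Two layers: four backslashes in source, two remain after compilation.
        String doubled = "foo([\\\\w]+)";

        System.out.println(firstGroup("foothebar", direct));                // thebar
        // Without the second unescape, [\\w] means "a backslash or the letter w".
        System.out.println(firstGroup("foothebar", doubled).isEmpty());     // true
        System.out.println(firstGroup("foothebar", unescapeOnce(doubled))); // thebar
    }
}
```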




[jira] [Commented] (FLINK-9990) Add regex_extract supported in TableAPI and SQL

2018-08-03 Thread ASF GitHub Bot (JIRA)


[ 
https://issues.apache.org/jira/browse/FLINK-9990?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16569038#comment-16569038
 ] 

ASF GitHub Bot commented on FLINK-9990:
---

yanghua commented on a change in pull request #6448: [FLINK-9990] [table] Add 
regex_extract supported in TableAPI and SQL
URL: https://github.com/apache/flink/pull/6448#discussion_r207699117
 
 

 ##
 File path: 
flink-libraries/flink-table/src/test/scala/org/apache/flink/table/expressions/ScalarFunctionsTest.scala
 ##
 @@ -450,6 +450,40 @@ class ScalarFunctionsTest extends ScalarTypesTestBase {
   "")
   }
 
+  @Test
+  def testRegexExtract(): Unit = {
 
 Review comment:
   This test case passes:
   
   ```scala
   testSqlApi("REGEX_EXTRACT('foothebar', 'foo([w]+)', 1)", "thebar")
   ```




[jira] [Commented] (FLINK-9990) Add regex_extract supported in TableAPI and SQL

2018-08-03 Thread ASF GitHub Bot (JIRA)


[ 
https://issues.apache.org/jira/browse/FLINK-9990?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16569036#comment-16569036
 ] 

ASF GitHub Bot commented on FLINK-9990:
---

yanghua commented on a change in pull request #6448: [FLINK-9990] [table] Add 
regex_extract supported in TableAPI and SQL
URL: https://github.com/apache/flink/pull/6448#discussion_r207699091
 
 

 ##
 File path: 
flink-libraries/flink-table/src/test/scala/org/apache/flink/table/expressions/ScalarFunctionsTest.scala
 ##
 @@ -450,6 +450,40 @@ class ScalarFunctionsTest extends ScalarTypesTestBase {
   "")
   }
 
+  @Test
+  def testRegexExtract(): Unit = {
 
 Review comment:
   Good point. There is a problem here; I wrote this case to test it:
   
   ```scala
   testAllApis(
     // OK
     "foothebar".regexExtract("foo([\\w]+)", 1),
     // Failed: the method got 'foo([\\w]+)' and returned "", but passing
     // 'foo([\\w]+)' would give a compile error.
     "'foothebar'.regexExtract('foo([w]+)', 1)",
     // OK, but four '\' must be passed
     "REGEX_EXTRACT('foothebar', 'foo([w]+)', 1)",
     "thebar"
   )
   ```
   
   It seems Flink pre-processes regexes that contain `\xxx`. A few days ago, 
we also hit this issue when testing `similar to` against a regex that contains 
`\d`.
   
   cc @twalthr 




[jira] [Commented] (FLINK-9990) Add regex_extract supported in TableAPI and SQL

2018-08-03 Thread ASF GitHub Bot (JIRA)


[ 
https://issues.apache.org/jira/browse/FLINK-9990?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16569035#comment-16569035
 ] 

ASF GitHub Bot commented on FLINK-9990:
---

yanghua commented on a change in pull request #6448: [FLINK-9990] [table] Add 
regex_extract supported in TableAPI and SQL
URL: https://github.com/apache/flink/pull/6448#discussion_r207699091
 
 

 ##
 File path: 
flink-libraries/flink-table/src/test/scala/org/apache/flink/table/expressions/ScalarFunctionsTest.scala
 ##
 @@ -450,6 +450,40 @@ class ScalarFunctionsTest extends ScalarTypesTestBase {
   "")
   }
 
+  @Test
+  def testRegexExtract(): Unit = {
 
 Review comment:
   Good point. There is a problem here; I wrote this case to test it:
   
   ```scala
   testAllApis(
     // OK
     "foothebar".regexExtract("foo([\\w]+)", 1),
     // Failed: the method got 'foo([\\w]+)' and returned "", but passing
     // 'foo([\\w]+)' would give a compile error.
     "'foothebar'.regexExtract('foo([w]+)', 1)",
     // OK, but four '\' must be passed
     "REGEX_EXTRACT('foothebar', 'foo([w]+)', 1)",
     "thebar"
   )
   ```
   
   It seems Flink pre-processes regexes that contain `\xxx`. A few days ago, 
we also hit this issue when testing `similar to` against a regex that contains 
`\d`.
   
   cc @twalthr 




[jira] [Commented] (FLINK-9990) Add regex_extract supported in TableAPI and SQL

2018-08-03 Thread ASF GitHub Bot (JIRA)


[ 
https://issues.apache.org/jira/browse/FLINK-9990?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16569034#comment-16569034
 ] 

ASF GitHub Bot commented on FLINK-9990:
---

yanghua commented on a change in pull request #6448: [FLINK-9990] [table] Add 
regex_extract supported in TableAPI and SQL
URL: https://github.com/apache/flink/pull/6448#discussion_r207699091
 
 

 ##
 File path: 
flink-libraries/flink-table/src/test/scala/org/apache/flink/table/expressions/ScalarFunctionsTest.scala
 ##
 @@ -450,6 +450,40 @@ class ScalarFunctionsTest extends ScalarTypesTestBase {
   "")
   }
 
+  @Test
+  def testRegexExtract(): Unit = {
 
 Review comment:
   Good point. There is a problem here; I wrote this case to test it:
   
   ```scala
   testAllApis(
     // OK
     "foothebar".regexExtract("foo([\\w]+)", 1),
     // Failed: got 'foo([\\w]+)' and returned "", but passing
     // 'foo([\\w]+)' would give a compile error.
     "'foothebar'.regexExtract('foo([w]+)', 1)",
     // OK, but four '\' must be passed
     "REGEX_EXTRACT('foothebar', 'foo([w]+)', 1)",
     "thebar"
   )
   ```
   
   It seems Flink pre-processes regexes that contain `\xxx`. A few days ago, 
we also hit this issue when testing `similar to` against a regex that contains 
`\d`.
   
   cc @twalthr 




[jira] [Commented] (FLINK-9990) Add regex_extract supported in TableAPI and SQL

2018-08-03 Thread ASF GitHub Bot (JIRA)


[ 
https://issues.apache.org/jira/browse/FLINK-9990?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16569032#comment-16569032
 ] 

ASF GitHub Bot commented on FLINK-9990:
---

yanghua commented on a change in pull request #6448: [FLINK-9990] [table] Add 
regex_extract supported in TableAPI and SQL
URL: https://github.com/apache/flink/pull/6448#discussion_r207698953
 
 

 ##
 File path: docs/dev/table/sql.md
 ##
 @@ -1842,6 +1842,16 @@ RPAD(text string, len integer, pad string)
 
   
 {% highlight text %}
+REGEX_EXTRACT(str string, regex string, extractIndex integer)
+{% endhighlight %}
+  
+  
+Returns the string str extracted using specified regex pattern and index. If str or regex is null, returns null. E.g. REGEX_EXTRACT('foothebar', 'foo(.*?)(bar)', 2) returns bar.
 
 Review comment:
   For `REGEX_EXTRACT`, 0 can be passed, which means extract all.




[jira] [Commented] (FLINK-9990) Add regex_extract supported in TableAPI and SQL

2018-08-03 Thread ASF GitHub Bot (JIRA)


[ 
https://issues.apache.org/jira/browse/FLINK-9990?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16568964#comment-16568964
 ] 

ASF GitHub Bot commented on FLINK-9990:
---

walterddr commented on a change in pull request #6448: [FLINK-9990] [table] Add 
regex_extract supported in TableAPI and SQL
URL: https://github.com/apache/flink/pull/6448#discussion_r207694127
 
 

 ##
 File path: docs/dev/table/sql.md
 ##
 @@ -1842,6 +1842,16 @@ RPAD(text string, len integer, pad string)
 
   
 {% highlight text %}
+REGEX_EXTRACT(str string, regex string, extractIndex integer)
+{% endhighlight %}
+  
+  
+Returns the string str extracted using specified regex pattern and index. If str or regex is null, returns null. E.g. REGEX_EXTRACT('foothebar', 'foo(.*?)(bar)', 2) returns bar.
 
 Review comment:
   This might be more informative:
   ```
   Return the string extracted from the `extractIndex`-th capturing group using 
the specified `regex` pattern on the input string `str`.
   ```




[jira] [Commented] (FLINK-9990) Add regex_extract supported in TableAPI and SQL

2018-08-03 Thread ASF GitHub Bot (JIRA)


[ 
https://issues.apache.org/jira/browse/FLINK-9990?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16568963#comment-16568963
 ] 

ASF GitHub Bot commented on FLINK-9990:
---

walterddr commented on a change in pull request #6448: [FLINK-9990] [table] Add 
regex_extract supported in TableAPI and SQL
URL: https://github.com/apache/flink/pull/6448#discussion_r207694331
 
 

 ##
 File path: 
flink-libraries/flink-table/src/test/scala/org/apache/flink/table/expressions/ScalarFunctionsTest.scala
 ##
 @@ -450,6 +450,40 @@ class ScalarFunctionsTest extends ScalarTypesTestBase {
   "")
   }
 
+  @Test
+  def testRegexExtract(): Unit = {
 
 Review comment:
   Can we add a test that specifically covers the backslash-escape case? For example: 
   ```
   "foothebar".regexExtract("foo([\\w]+)", 1) // should return "thebar"
   ```




[jira] [Commented] (FLINK-9990) Add regex_extract supported in TableAPI and SQL

2018-08-03 Thread ASF GitHub Bot (JIRA)


[ 
https://issues.apache.org/jira/browse/FLINK-9990?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16568965#comment-16568965
 ] 

ASF GitHub Bot commented on FLINK-9990:
---

walterddr commented on a change in pull request #6448: [FLINK-9990] [table] Add 
regex_extract supported in TableAPI and SQL
URL: https://github.com/apache/flink/pull/6448#discussion_r207694231
 
 

 ##
 File path: docs/dev/table/sql.md
 ##
 @@ -1842,6 +1842,16 @@ RPAD(text string, len integer, pad string)
 
   
 {% highlight text %}
+REGEX_EXTRACT(str string, regex string, extractIndex integer)
+{% endhighlight %}
+  
+  
+Returns the string str extracted using specified regex pattern and 
index. If str or regex is null, returns null. E.g. 
REGEX_EXTRACT('foothebar', 'foo(.*?)(bar)', 2) returns 
bar.
 
 Review comment:
   Also, shouldn't we add the escape instruction?
   ```
   If the regex contains a backslash ('`\`'), it must be written as '`\\`'.
   ```
   This was actually included in the Scala doc.
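
   To illustrate the rule for Java or Scala source code: each backslash meant for the regex engine is written twice in the string literal. SQL string literals may follow different rules, and this thread does not settle that. The sketch below uses only `java.util.regex`; the class name `BackslashLiteralSketch` and its members are hypothetical.

```java
import java.util.regex.Pattern;

// Illustrative only: shows what the regex engine receives from a source literal.
class BackslashLiteralSketch {

    // The source literal "\\d+" holds three characters at runtime: \ d +
    static final String DIGITS = "\\d+";

    // To match one literal backslash, the regex must be \\ and the source literal "\\\\".
    static final String ONE_BACKSLASH = "\\\\";

    // True if the whole input matches the regex.
    static boolean matches(String regex, String input) {
        return Pattern.matches(regex, input);
    }

    public static void main(String[] args) {
        System.out.println(DIGITS.length());              // prints 3
        System.out.println(matches(DIGITS, "2018"));      // prints true
        System.out.println(matches(ONE_BACKSLASH, "\\")); // prints true
    }
}
```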




[jira] [Commented] (FLINK-9990) Add regex_extract supported in TableAPI and SQL

2018-08-03 Thread ASF GitHub Bot (JIRA)


[ 
https://issues.apache.org/jira/browse/FLINK-9990?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16568966#comment-16568966
 ] 

ASF GitHub Bot commented on FLINK-9990:
---

walterddr commented on a change in pull request #6448: [FLINK-9990] [table] Add 
regex_extract supported in TableAPI and SQL
URL: https://github.com/apache/flink/pull/6448#discussion_r207694473
 
 

 ##
 File path: docs/dev/table/sql.md
 ##
 @@ -1842,6 +1842,16 @@ RPAD(text string, len integer, pad string)
 
   
 {% highlight text %}
+REGEX_EXTRACT(str string, regex string, extractIndex integer)
+{% endhighlight %}
+  
+  
+Returns the string str extracted using specified regex pattern and 
index. If str or regex is null, returns null. E.g. 
REGEX_EXTRACT('foothebar', 'foo(.*?)(bar)', 2) returns 
bar.
 
 Review comment:
   Also, as with SQL arrays, it might be good to point out that the index starts at 1.
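
   For reference, here is a sketch of how group indexes behave in `java.util.regex`. It assumes `extractIndex` is passed straight to `Matcher.group`, which is not confirmed by the code quoted in this thread. Under that assumption, capture groups are numbered from 1 and index 0 is the whole match. The class `GroupIndexSketch` and the method `group` are hypothetical names.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Illustrative only: group numbering in java.util.regex, not Flink's implementation.
class GroupIndexSketch {

    // Returns the group at `index` of the first match, or null if absent (assumed behavior).
    static String group(String str, String regex, int index) {
        Matcher m = Pattern.compile(regex).matcher(str);
        if (!m.find() || index < 0 || index > m.groupCount()) {
            return null;
        }
        return m.group(index);
    }

    public static void main(String[] args) {
        String str = "foothebar";
        String regex = "foo(.*?)(bar)";
        System.out.println(group(str, regex, 0)); // prints foothebar (whole match)
        System.out.println(group(str, regex, 1)); // prints the (first capture group)
        System.out.println(group(str, regex, 2)); // prints bar (second capture group)
    }
}
```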




[jira] [Commented] (FLINK-9990) Add regex_extract supported in TableAPI and SQL

2018-07-31 Thread ASF GitHub Bot (JIRA)


[ 
https://issues.apache.org/jira/browse/FLINK-9990?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16563242#comment-16563242
 ] 

ASF GitHub Bot commented on FLINK-9990:
---

yanghua commented on issue #6448: [FLINK-9990] Add regex_extract supported in 
TableAPI and SQL
URL: https://github.com/apache/flink/pull/6448#issuecomment-409123419
 
 
@sihuazhou @suez1224 @twalthr can someone review this PR? Thanks.




[jira] [Commented] (FLINK-9990) Add regex_extract supported in TableAPI and SQL

2018-07-29 Thread ASF GitHub Bot (JIRA)


[ 
https://issues.apache.org/jira/browse/FLINK-9990?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16561049#comment-16561049
 ] 

ASF GitHub Bot commented on FLINK-9990:
---

yanghua opened a new pull request #6448: [FLINK-9990] Add regex_extract 
supported in TableAPI and SQL
URL: https://github.com/apache/flink/pull/6448
 
 
   ## What is the purpose of the change
   
   *This pull request adds regex_extract support to the Table API and SQL*
   
   ## Brief change log
   
 - *Add regex_extract support to the Table API and SQL*
   
   ## Verifying this change
   
   This change is already covered by existing tests, such as *ScalarFunctionsTest#testRegexExtract*.
   
   ## Does this pull request potentially affect one of the following parts:
   
 - Dependencies (does it add or upgrade a dependency): (yes / **no**)
 - The public API, i.e., is any changed class annotated with `@Public(Evolving)`: (yes / **no**)
 - The serializers: (yes / **no** / don't know)
 - The runtime per-record code paths (performance sensitive): (yes / **no** / don't know)
 - Anything that affects deployment or recovery: JobManager (and its components), Checkpointing, Yarn/Mesos, ZooKeeper: (yes / **no** / don't know)
 - The S3 file system connector: (yes / **no** / don't know)
   
   ## Documentation
   
 - Does this pull request introduce a new feature? (**yes** / no)
 - If yes, how is the feature documented? (not applicable / **docs** / JavaDocs / not documented)
   

