[GitHub] spark pull request #18477: [SPARK-21261][DOCS]SQL Regex document fix

2018-07-18 Thread asfgit
Github user asfgit closed the pull request at:

https://github.com/apache/spark/pull/18477


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #18477: [SPARK-21261][DOCS]SQL Regex document fix

2017-10-24 Thread gatorsmile
Github user gatorsmile commented on a diff in the pull request:

https://github.com/apache/spark/pull/18477#discussion_r146653604
  
--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/regexpExpressions.scala ---
@@ -268,7 +268,10 @@ case class StringSplit(str: Expression, pattern: Expression)
   usage = "_FUNC_(str, regexp, rep) - Replaces all substrings of `str` that match `regexp` with `rep`.",
   extended = """
 Examples:
-  > SELECT _FUNC_('100-200', '(\d+)', 'num');
+  spark-sql> SELECT _FUNC_('100-200', '(\\d+)', 'num');
+   num-num
+
+  scala> SELECT _FUNC_('100-200', '(\\\\d+)', 'num');
   num-num
--- End diff --

> scala> spark.sql("SELECT regexp_replace('100-200', '(\\\\d+)', 'num')").collect()
> Array([num-num])
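The outcome being verified here can be reproduced outside Spark with Python's `re` module (an illustrative stand-in, not the JVM regex engine Spark actually uses):

```python
import re

# After all quoting layers are stripped, the pattern Spark compiles is (\d+):
# every digit run in '100-200' is replaced, giving num-num.
print(re.sub(r"(\d+)", "num", "100-200"))  # num-num

# If the backslash is lost along the way, (d+) matches runs of the literal
# character 'd', of which '100-200' has none, so the input is unchanged.
print(re.sub(r"(d+)", "num", "100-200"))   # 100-200
```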



---




[GitHub] spark pull request #18477: [SPARK-21261][DOCS]SQL Regex document fix

2017-10-24 Thread gatorsmile
Github user gatorsmile commented on a diff in the pull request:

https://github.com/apache/spark/pull/18477#discussion_r146653358
  
--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/regexpExpressions.scala ---
@@ -375,7 +378,10 @@ case class RegExpReplace(subject: Expression, regexp: Expression, rep: Expressio
   usage = "_FUNC_(str, regexp[, idx]) - Extracts a group that matches `regexp`.",
   extended = """
 Examples:
-  > SELECT _FUNC_('100-200', '(\d+)-(\d+)', 1);
+  spark-sql> SELECT _FUNC_('100-200', '(\\d+)-(\\d+)', 1);
+   100
+
+  scala> SELECT _FUNC_('100-200', '(\\\\d+)-(\\\\d+)', 1);
   100
--- End diff --

> Array([100])


---




[GitHub] spark pull request #18477: [SPARK-21261][DOCS]SQL Regex document fix

2017-10-24 Thread gatorsmile
Github user gatorsmile commented on a diff in the pull request:

https://github.com/apache/spark/pull/18477#discussion_r146653269
  
--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/regexpExpressions.scala ---
@@ -375,7 +378,10 @@ case class RegExpReplace(subject: Expression, regexp: Expression, rep: Expressio
   usage = "_FUNC_(str, regexp[, idx]) - Extracts a group that matches `regexp`.",
   extended = """
 Examples:
-  > SELECT _FUNC_('100-200', '(\d+)-(\d+)', 1);
+  spark-sql> SELECT _FUNC_('100-200', '(\\d+)-(\\d+)', 1);
+   100
+
+  scala> SELECT _FUNC_('100-200', '(\\\\d+)-(\\\\d+)', 1);
--- End diff --

> scala> spark.sql("SELECT regexp_extract('100-200', '(\\\\d+)-(\\\\d+)', 1)").collect()
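The group-extraction semantics being checked can be sketched with Python's `re` (a simplified, hypothetical stand-in for Spark's `regexp_extract`, ignoring its null-handling details):

```python
import re

def regexp_extract(s: str, pattern: str, idx: int) -> str:
    """Return capture group `idx` of the first match of `pattern` in `s`,
    or an empty string when there is no match (simplified semantics)."""
    m = re.search(pattern, s)
    return m.group(idx) if m else ""

print(regexp_extract("100-200", r"(\d+)-(\d+)", 1))  # 100
print(regexp_extract("100-200", r"(\d+)-(\d+)", 2))  # 200
print(regexp_extract("100-200", r"(d+)-(d+)", 1))    # empty: no literal 'd'
```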


---




[GitHub] spark pull request #18477: [SPARK-21261][DOCS]SQL Regex document fix

2017-10-24 Thread gatorsmile
Github user gatorsmile commented on a diff in the pull request:

https://github.com/apache/spark/pull/18477#discussion_r146652140
  
--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/regexpExpressions.scala ---
@@ -375,7 +378,10 @@ case class RegExpReplace(subject: Expression, regexp: Expression, rep: Expressio
   usage = "_FUNC_(str, regexp[, idx]) - Extracts a group that matches `regexp`.",
   extended = """
 Examples:
-  > SELECT _FUNC_('100-200', '(\d+)-(\d+)', 1);
+  spark-sql> SELECT _FUNC_('100-200', '(\\d+)-(\\d+)', 1);
+   100
+
+  scala> SELECT _FUNC_('100-200', '(\\\\d+)-(\\\\d+)', 1);
--- End diff --

Have you tried this?


---




[GitHub] spark pull request #18477: [SPARK-21261][DOCS]SQL Regex document fix

2017-10-23 Thread visaxin
Github user visaxin commented on a diff in the pull request:

https://github.com/apache/spark/pull/18477#discussion_r146457833
  
--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/regexpExpressions.scala ---
@@ -268,7 +268,7 @@ case class StringSplit(str: Expression, pattern: Expression)
   usage = "_FUNC_(str, regexp, rep) - Replaces all substrings of `str` that match `regexp` with `rep`.",
   extended = """
 Examples:
-  > SELECT _FUNC_('100-200', '(\d+)', 'num');
+  > SELECT _FUNC_('100-200', '(\\d+)', 'num');
--- End diff --

I added spark-sql and scala prompts to make it clear.


---




[GitHub] spark pull request #18477: [SPARK-21261][DOCS]SQL Regex document fix

2017-07-07 Thread viirya
Github user viirya commented on a diff in the pull request:

https://github.com/apache/spark/pull/18477#discussion_r126091684
  
--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/regexpExpressions.scala ---
@@ -268,7 +268,7 @@ case class StringSplit(str: Expression, pattern: Expression)
   usage = "_FUNC_(str, regexp, rep) - Replaces all substrings of `str` that match `regexp` with `rep`.",
   extended = """
 Examples:
-  > SELECT _FUNC_('100-200', '(\d+)', 'num');
+  > SELECT _FUNC_('100-200', '(\\d+)', 'num');
--- End diff --

Yeah, if we can.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---




[GitHub] spark pull request #18477: [SPARK-21261][DOCS]SQL Regex document fix

2017-07-07 Thread srowen
Github user srowen commented on a diff in the pull request:

https://github.com/apache/spark/pull/18477#discussion_r126087264
  
--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/regexpExpressions.scala ---
@@ -268,7 +268,7 @@ case class StringSplit(str: Expression, pattern: Expression)
   usage = "_FUNC_(str, regexp, rep) - Replaces all substrings of `str` that match `regexp` with `rep`.",
   extended = """
 Examples:
-  > SELECT _FUNC_('100-200', '(\d+)', 'num');
+  > SELECT _FUNC_('100-200', '(\\d+)', 'num');
--- End diff --

Is the better fix to make it clear that this example uses unescaped style, @viirya?


---



[GitHub] spark pull request #18477: [SPARK-21261][DOCS]SQL Regex document fix

2017-07-03 Thread viirya
Github user viirya commented on a diff in the pull request:

https://github.com/apache/spark/pull/18477#discussion_r125298719
  
--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/regexpExpressions.scala ---
@@ -268,7 +268,7 @@ case class StringSplit(str: Expression, pattern: Expression)
   usage = "_FUNC_(str, regexp, rep) - Replaces all substrings of `str` that match `regexp` with `rep`.",
   extended = """
 Examples:
-  > SELECT _FUNC_('100-200', '(\d+)', 'num');
+  > SELECT _FUNC_('100-200', '(\\d+)', 'num');
--- End diff --

Hmm, when I wrote the docs on line 160, it was suggested to use unescaped characters.

> Since Spark 2.0, string literals (including regex patterns) are unescaped in our SQL parser. For example, to match "\abc", a regular expression for `regexp` can be "^\\abc$".

Actually, you need to write like this in spark-shell:

scala> sql("SELECT like('\\\\abc', '\\\\\\\\abc')").show
+---------------+
|\abc LIKE \\abc|
+---------------+
|           true|
+---------------+

scala> sql("SELECT regexp_replace('100-200', '(\\\\d+)', 'num')").show
+-----------------------------------+
|regexp_replace(100-200, (\d+), num)|
+-----------------------------------+
|                            num-num|
+-----------------------------------+

When parsing SQL string literals, Spark 2 reads `\\abc` as `\abc` and `(\\d+)` as `(\d+)` in spark-shell.

But in spark-sql, you write the queries like this:

spark-sql> SELECT like('\\abc', '\\\\abc');
true
Time taken: 0.061 seconds, Fetched 1 row(s)

spark-sql> SELECT regexp_replace('100-200', '(\\d+)', 'num');
num-num
Time taken: 0.117 seconds, Fetched 1 row(s)

So depending on how the shell environment processes string escaping, the same query looks different. In the docs, it seems to me that writing in the unescaped style can avoid this confusion?
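The layering described above can be simulated without Spark (a minimal sketch; `unescape_sql_literal` is a hypothetical helper approximating the parser's default rule that `\\` collapses to `\` and any other `\x` drops the backslash):

```python
import re

def unescape_sql_literal(s: str) -> str:
    """Approximate Spark's default SQL string-literal unescaping:
    '\\\\' becomes '\\'; any other backslash escape drops the backslash."""
    out, i = [], 0
    while i < len(s):
        if s[i] == "\\" and i + 1 < len(s):
            nxt = s[i + 1]
            out.append("\\" if nxt == "\\" else nxt)
            i += 2
        else:
            out.append(s[i])
            i += 1
    return "".join(out)

# spark-shell: the Scala source "(\\\\d+)" reaches the SQL parser as (\\d+),
# which unescapes to the regex (\d+).
regex = unescape_sql_literal(r"(\\d+)")
print(regex)                            # (\d+)
print(re.sub(regex, "num", "100-200"))  # num-num

# A single backslash at the SQL level is swallowed, leaving (d+), which
# no longer matches any digits.
print(unescape_sql_literal(r"(\d+)"))   # (d+)
```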




---



[GitHub] spark pull request #18477: [SPARK-21261][DOCS]SQL Regex document fix

2017-07-03 Thread srowen
Github user srowen commented on a diff in the pull request:

https://github.com/apache/spark/pull/18477#discussion_r125268837
  
--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/regexpExpressions.scala ---
@@ -268,7 +268,7 @@ case class StringSplit(str: Expression, pattern: Expression)
   usage = "_FUNC_(str, regexp, rep) - Replaces all substrings of `str` that match `regexp` with `rep`.",
   extended = """
 Examples:
-  > SELECT _FUNC_('100-200', '(\d+)', 'num');
+  > SELECT _FUNC_('100-200', '(\\d+)', 'num');
--- End diff --

@viirya I'm not an expert here, but reading the docs on line 160, I think this needs to be escaped in order to be consistent with Spark 2's default behavior? My assumption was that this was just never updated.


---



[GitHub] spark pull request #18477: [SPARK-21261][DOCS]SQL Regex document fix

2017-07-02 Thread viirya
Github user viirya commented on a diff in the pull request:

https://github.com/apache/spark/pull/18477#discussion_r125204558
  
--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/regexpExpressions.scala ---
@@ -268,7 +268,7 @@ case class StringSplit(str: Expression, pattern: Expression)
   usage = "_FUNC_(str, regexp, rep) - Replaces all substrings of `str` that match `regexp` with `rep`.",
   extended = """
 Examples:
-  > SELECT _FUNC_('100-200', '(\d+)', 'num');
+  > SELECT _FUNC_('100-200', '(\\d+)', 'num');
--- End diff --

Do we need to fix this? I remember that in the docs we use unescaped characters.


---



[GitHub] spark pull request #18477: [SPARK-21261][DOCS]SQL Regex document fix

2017-06-30 Thread srowen
Github user srowen commented on a diff in the pull request:

https://github.com/apache/spark/pull/18477#discussion_r125014855
  
--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/regexpExpressions.scala ---
@@ -268,7 +268,7 @@ case class StringSplit(str: Expression, pattern: Expression)
   usage = "_FUNC_(str, regexp, rep) - Replaces all substrings of `str` that match `regexp` with `rep`.",
   extended = """
 Examples:
-  > SELECT _FUNC_('100-200', '(\d+)', 'num');
+  > SELECT _FUNC_('100-200', '(\\d+)', 'num');
--- End diff --

There is another example that needs the same change near the end of the file too.



---



[GitHub] spark pull request #18477: [SPARK-21261][DOCS]SQL Regex document fix

2017-06-29 Thread visaxin
GitHub user visaxin opened a pull request:

https://github.com/apache/spark/pull/18477

[SPARK-21261][DOCS]SQL Regex document fix

SQL regex fix change:
SELECT _FUNC_('100-200', '(\d+)', 'num') => SELECT _FUNC_('100-200', '(\\d+)', 'num')

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/visaxin/spark FixSQLDocuments

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/spark/pull/18477.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #18477


commit 1850d873ac4cf2b2cd66ede07958f17b4849c829
Author: jason.zhang 
Date:   2017-06-30T03:27:08Z

regex document fix




---