[GitHub] spark pull request #18477: [SPARK-21261][DOCS]SQL Regex document fix
Github user asfgit closed the pull request at:

    https://github.com/apache/spark/pull/18477

---
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org
Github user gatorsmile commented on a diff in the pull request:

    https://github.com/apache/spark/pull/18477#discussion_r146653604

--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/regexpExpressions.scala ---
@@ -268,7 +268,10 @@ case class StringSplit(str: Expression, pattern: Expression)
   usage = "_FUNC_(str, regexp, rep) - Replaces all substrings of `str` that match `regexp` with `rep`.",
   extended = """
     Examples:
-      > SELECT _FUNC_('100-200', '(\d+)', 'num');
+      spark-sql> SELECT _FUNC_('100-200', '(\\d+)', 'num');
+       num-num
+
+      scala> SELECT _FUNC_('100-200', '(d+)', 'num');
        num-num
--- End diff --

> scala> spark.sql("SELECT regexp_replace('100-200', '(d+)', 'num')").collect()
> Array([num-num])
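As a plain-JVM sanity check of the pattern being debated (no Spark involved; this uses java.util.regex directly, only as an approximation of what the regex itself matches, not of Spark's SQL-literal parsing): a pattern of `(d+)` matches runs of the literal character 'd', while `(\d+)` matches runs of digits, so only the escaped form rewrites '100-200'.

```scala
// Sanity check: what do the two candidate patterns actually match?
object PatternCheck {
  def main(args: Array[String]): Unit = {
    // "(d+)" matches literal 'd' characters; '100-200' has none,
    // so the input comes back unchanged.
    println("100-200".replaceAll("(d+)", "num"))

    // "(\\d+)" in Scala source is the regex (\d+), which matches digits.
    println("100-200".replaceAll("(\\d+)", "num"))
  }
}
```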
Github user gatorsmile commented on a diff in the pull request:

    https://github.com/apache/spark/pull/18477#discussion_r146653358

--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/regexpExpressions.scala ---
@@ -375,7 +378,10 @@ case class RegExpReplace(subject: Expression, regexp: Expression, rep: Expressio
   usage = "_FUNC_(str, regexp[, idx]) - Extracts a group that matches `regexp`.",
   extended = """
     Examples:
-      > SELECT _FUNC_('100-200', '(\d+)-(\d+)', 1);
+      spark-sql> SELECT _FUNC_('100-200', '(\\d+)-(\\d+)', 1);
+       100
+
+      scala> SELECT _FUNC_('100-200', '(d+)-(d+)', 1);
        100
--- End diff --

> Array([100])
Github user gatorsmile commented on a diff in the pull request:

    https://github.com/apache/spark/pull/18477#discussion_r146653269

--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/regexpExpressions.scala ---
@@ -375,7 +378,10 @@ case class RegExpReplace(subject: Expression, regexp: Expression, rep: Expressio
   usage = "_FUNC_(str, regexp[, idx]) - Extracts a group that matches `regexp`.",
   extended = """
     Examples:
-      > SELECT _FUNC_('100-200', '(\d+)-(\d+)', 1);
+      spark-sql> SELECT _FUNC_('100-200', '(\\d+)-(\\d+)', 1);
+       100
+
+      scala> SELECT _FUNC_('100-200', '(d+)-(d+)', 1);
--- End diff --

> scala> spark.sql("SELECT regexp_extract('100-200', '(d+)-(d+)', 1)").collect()
Github user gatorsmile commented on a diff in the pull request:

    https://github.com/apache/spark/pull/18477#discussion_r146652140

--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/regexpExpressions.scala ---
@@ -375,7 +378,10 @@ case class RegExpReplace(subject: Expression, regexp: Expression, rep: Expressio
   usage = "_FUNC_(str, regexp[, idx]) - Extracts a group that matches `regexp`.",
   extended = """
     Examples:
-      > SELECT _FUNC_('100-200', '(\d+)-(\d+)', 1);
+      spark-sql> SELECT _FUNC_('100-200', '(\\d+)-(\\d+)', 1);
+       100
+
+      scala> SELECT _FUNC_('100-200', '(d+)-(d+)', 1);
--- End diff --

Have you tried this?
Github user visaxin commented on a diff in the pull request:

    https://github.com/apache/spark/pull/18477#discussion_r146457833

--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/regexpExpressions.scala ---
@@ -268,7 +268,7 @@ case class StringSplit(str: Expression, pattern: Expression)
   usage = "_FUNC_(str, regexp, rep) - Replaces all substrings of `str` that match `regexp` with `rep`.",
   extended = """
     Examples:
-      > SELECT _FUNC_('100-200', '(\d+)', 'num');
+      > SELECT _FUNC_('100-200', '(\\d+)', 'num');
--- End diff --

I added spark-sql and scala examples to make it clear.
Github user viirya commented on a diff in the pull request:

    https://github.com/apache/spark/pull/18477#discussion_r126091684

--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/regexpExpressions.scala ---
@@ -268,7 +268,7 @@ case class StringSplit(str: Expression, pattern: Expression)
   usage = "_FUNC_(str, regexp, rep) - Replaces all substrings of `str` that match `regexp` with `rep`.",
   extended = """
     Examples:
-      > SELECT _FUNC_('100-200', '(\d+)', 'num');
+      > SELECT _FUNC_('100-200', '(\\d+)', 'num');
--- End diff --

Yeah, if we can.

---
If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA.
Github user srowen commented on a diff in the pull request:

    https://github.com/apache/spark/pull/18477#discussion_r126087264

--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/regexpExpressions.scala ---
@@ -268,7 +268,7 @@ case class StringSplit(str: Expression, pattern: Expression)
   usage = "_FUNC_(str, regexp, rep) - Replaces all substrings of `str` that match `regexp` with `rep`.",
   extended = """
     Examples:
-      > SELECT _FUNC_('100-200', '(\d+)', 'num');
+      > SELECT _FUNC_('100-200', '(\\d+)', 'num');
--- End diff --

Is the better fix to make it clear that this example uses unescaped style @viirya ?
Github user viirya commented on a diff in the pull request:

    https://github.com/apache/spark/pull/18477#discussion_r125298719

--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/regexpExpressions.scala ---
@@ -268,7 +268,7 @@ case class StringSplit(str: Expression, pattern: Expression)
   usage = "_FUNC_(str, regexp, rep) - Replaces all substrings of `str` that match `regexp` with `rep`.",
   extended = """
     Examples:
-      > SELECT _FUNC_('100-200', '(\d+)', 'num');
+      > SELECT _FUNC_('100-200', '(\\d+)', 'num');
--- End diff --

Hmm, when I wrote the docs on line 160, I was suggested to use unescaped characters.

> Since Spark 2.0, string literals (including regex patterns) are unescaped in our SQL parser. For example, to match "\abc", a regular expression for `regexp` can be "^\\abc$".

Actually, you need to write it like this in spark-shell:

    scala> sql("SELECT like('\\\\abc', '\\\\\\\\abc')").show
    +---------------+
    |\abc LIKE \\abc|
    +---------------+
    |           true|
    +---------------+

    scala> sql("SELECT regexp_replace('100-200', '(\\\\d+)', 'num')").show
    +-----------------------------------+
    |regexp_replace(100-200, (\d+), num)|
    +-----------------------------------+
    |                            num-num|
    +-----------------------------------+

When parsing a SQL string literal, Spark 2 reads `\\abc` as `\abc` and `(\\d+)` as `(\d+)`, and in spark-shell Scala's own string escaping doubles the backslashes again. But in spark-sql, you write the queries like this:

    spark-sql> SELECT like('\\abc', '\\\\abc');
    true
    Time taken: 0.061 seconds, Fetched 1 row(s)
    spark-sql> SELECT regexp_replace('100-200', '(\\d+)', 'num');
    num-num
    Time taken: 0.117 seconds, Fetched 1 row(s)

So depending on how the shell environment processes string escaping, the query looks different. In the docs, it seems to me that writing in unescaped style can avoid this confusion?
Github user srowen commented on a diff in the pull request:

    https://github.com/apache/spark/pull/18477#discussion_r125268837

--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/regexpExpressions.scala ---
@@ -268,7 +268,7 @@ case class StringSplit(str: Expression, pattern: Expression)
   usage = "_FUNC_(str, regexp, rep) - Replaces all substrings of `str` that match `regexp` with `rep`.",
   extended = """
     Examples:
-      > SELECT _FUNC_('100-200', '(\d+)', 'num');
+      > SELECT _FUNC_('100-200', '(\\d+)', 'num');
--- End diff --

@viirya I'm not an expert here, but reading the docs on line 160, I think this needs to be escaped in order to be consistent with Spark 2 default behavior? My assumption was that this was just never updated.
Github user viirya commented on a diff in the pull request:

    https://github.com/apache/spark/pull/18477#discussion_r125204558

--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/regexpExpressions.scala ---
@@ -268,7 +268,7 @@ case class StringSplit(str: Expression, pattern: Expression)
   usage = "_FUNC_(str, regexp, rep) - Replaces all substrings of `str` that match `regexp` with `rep`.",
   extended = """
     Examples:
-      > SELECT _FUNC_('100-200', '(\d+)', 'num');
+      > SELECT _FUNC_('100-200', '(\\d+)', 'num');
--- End diff --

Do we need to fix this? I remember in the doc, we use unescaped characters.
Github user srowen commented on a diff in the pull request:

    https://github.com/apache/spark/pull/18477#discussion_r125014855

--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/regexpExpressions.scala ---
@@ -268,7 +268,7 @@ case class StringSplit(str: Expression, pattern: Expression)
   usage = "_FUNC_(str, regexp, rep) - Replaces all substrings of `str` that match `regexp` with `rep`.",
   extended = """
     Examples:
-      > SELECT _FUNC_('100-200', '(\d+)', 'num');
+      > SELECT _FUNC_('100-200', '(\\d+)', 'num');
--- End diff --

There is another example that needs the same change near the end of the file too.
GitHub user visaxin opened a pull request:

    https://github.com/apache/spark/pull/18477

[SPARK-21261][DOCS] SQL Regex document fix

SQL regex doc change:

    SELECT _FUNC_('100-200', '(\d+)', 'num')
      => SELECT _FUNC_('100-200', '(\\d+)', 'num')

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/visaxin/spark FixSQLDocuments

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/spark/pull/18477.patch

To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message:

    This closes #18477

commit 1850d873ac4cf2b2cd66ede07958f17b4849c829
Author: jason.zhang
Date: 2017-06-30T03:27:08Z

    regex document fix
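For reference, the behavior the corrected doc examples describe can be approximated with plain JVM regexes. This is only an illustrative sketch (the helper names `regexpReplace`/`regexpExtract` are made up here; Spark's real implementations live in `regexpExpressions.scala` and differ in caching and error handling), showing why `(\d+)` is the regex the documented results assume.

```scala
// Illustrative stand-ins for the two functions whose docs the PR fixes,
// built on java.util.regex (an approximation, not Spark's implementation).
import java.util.regex.Pattern

object RegexDocDemo {
  // Roughly regexp_replace(str, regexp, rep): replace every match.
  def regexpReplace(str: String, regexp: String, rep: String): String =
    str.replaceAll(regexp, rep)

  // Roughly regexp_extract(str, regexp, idx): first match's group idx,
  // or "" when nothing matches (assumed fallback for this sketch).
  def regexpExtract(str: String, regexp: String, idx: Int): String = {
    val m = Pattern.compile(regexp).matcher(str)
    if (m.find()) m.group(idx) else ""
  }

  def main(args: Array[String]): Unit = {
    println(regexpReplace("100-200", "(\\d+)", "num"))     // num-num
    println(regexpExtract("100-200", "(\\d+)-(\\d+)", 1))  // 100
  }
}
```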