spark git commit: [SPARK-20399][SQL] Add a config to fallback string literal parsing consistent with old sql parser behavior

2017-05-11 Thread wenchen
Repository: spark
Updated Branches:
  refs/heads/master 04901dd03 -> 609ba5f2b


[SPARK-20399][SQL] Add a config to fallback string literal parsing consistent 
with old sql parser behavior

## What changes were proposed in this pull request?

Spark 2.0 introduced a new SQL parser, which unescapes all string literals. 
This causes an issue with regex pattern strings.

The following code reproduces it:

val data = Seq("\u0020\u0021\u0023", "abc")
val df = data.toDF()

// 1st usage: works in 1.6
// Let parser parse pattern string
val rlike1 = df.filter("value rlike '^\\x20[\\x20-\\x23]+$'")
// 2nd usage: works in 1.6, 2.x
// Call Column.rlike so the pattern string is a literal that doesn't go through the parser
val rlike2 = df.filter($"value".rlike("^\\x20[\\x20-\\x23]+$"))

// In 2.x, we need to add backslashes so the regex pattern is parsed correctly
val rlike3 = df.filter("value rlike '^\\\\x20[\\\\x20-\\\\x23]+$'")

Following the discussion in #17736, this patch adds a config to fall back to 
1.6 string literal parsing and mitigate the migration issue.
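
To illustrate why the pattern behaves differently once the literal has been unescaped, here is a minimal sketch in plain Scala. It is not Spark's parser code: `unescape` and `rlike` are hypothetical helpers written for this example, and `unescape` assumes a deliberately simplified rule (a backslash followed by any character yields just that character), whereas the real parser also interprets specific escape sequences.

```scala
import java.util.regex.Pattern

object StringLiteralParsingSketch {
  // Hypothetical, simplified stand-in for 2.x-style unescaping:
  // a backslash followed by any character yields just that character.
  // The real parser also interprets specific escape sequences.
  def unescape(literal: String): String = {
    val sb = new StringBuilder
    var i = 0
    while (i < literal.length) {
      val c = literal.charAt(i)
      if (c == '\\' && i + 1 < literal.length) {
        sb.append(literal.charAt(i + 1)) // drop the backslash, keep the next char
        i += 2
      } else {
        sb.append(c)
        i += 1
      }
    }
    sb.toString
  }

  // Hypothetical helper: true if the regex is found anywhere in the value.
  def rlike(value: String, regex: String): Boolean =
    Pattern.compile(regex).matcher(value).find()

  def main(args: Array[String]): Unit = {
    val value = "\u0020\u0021\u0023"
    val singleEscaped = "^\\x20[\\x20-\\x23]+$"         // pattern with single backslashes
    val doubleEscaped = "^\\\\x20[\\\\x20-\\\\x23]+$"   // pattern with doubled backslashes

    println(rlike(value, singleEscaped))            // literal used as is (1.6-style)
    println(rlike(value, unescape(singleEscaped)))  // unescaped: backslashes are lost
    println(rlike(value, unescape(doubleEscaped)))  // unescaped, doubled backslashes survive
  }
}
```

Under these assumptions the three calls print `true`, `false` and `true`, which mirrors the three usages above: the single-escaped pattern only matches when the literal is left untouched, and doubling the backslashes restores the match after unescaping.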

## How was this patch tested?

Jenkins tests.

Please review http://spark.apache.org/contributing.html before opening a pull 
request.

Author: Liang-Chi Hsieh 

Closes #17887 from viirya/add-config-fallback-string-parsing.


Project: http://git-wip-us.apache.org/repos/asf/spark/repo
Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/609ba5f2
Tree: http://git-wip-us.apache.org/repos/asf/spark/tree/609ba5f2
Diff: http://git-wip-us.apache.org/repos/asf/spark/diff/609ba5f2

Branch: refs/heads/master
Commit: 609ba5f2b9fd89b1b9971d08f7cc680d202dbc7c
Parents: 04901dd
Author: Liang-Chi Hsieh 
Authored: Fri May 12 11:15:10 2017 +0800
Committer: Wenchen Fan 
Committed: Fri May 12 11:15:10 2017 +0800

--
 .../sql/catalyst/catalog/SessionCatalog.scala   |   2 +-
 .../expressions/regexpExpressions.scala |  33 -
 .../spark/sql/catalyst/parser/AstBuilder.scala  |  11 +-
 .../spark/sql/catalyst/parser/ParseDriver.scala |   8 +-
 .../spark/sql/catalyst/parser/ParserUtils.scala |   6 +
 .../org/apache/spark/sql/internal/SQLConf.scala |  10 ++
 .../catalyst/parser/ExpressionParserSuite.scala | 128 +--
 .../spark/sql/execution/SparkSqlParser.scala|   2 +-
 .../org/apache/spark/sql/DatasetSuite.scala |  13 ++
 9 files changed, 171 insertions(+), 42 deletions(-)
--


http://git-wip-us.apache.org/repos/asf/spark/blob/609ba5f2/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/catalog/SessionCatalog.scala
--
diff --git a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/catalog/SessionCatalog.scala b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/catalog/SessionCatalog.scala
index 18e5146..f6653d3 100644
--- a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/catalog/SessionCatalog.scala
+++ b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/catalog/SessionCatalog.scala
@@ -73,7 +73,7 @@ class SessionCatalog(
   functionRegistry,
   conf,
   new Configuration(),
-  CatalystSqlParser,
+  new CatalystSqlParser(conf),
   DummyFunctionResourceLoader)
   }
 

http://git-wip-us.apache.org/repos/asf/spark/blob/609ba5f2/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/regexpExpressions.scala
--
diff --git a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/regexpExpressions.scala b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/regexpExpressions.scala
index 3fa8458..aa5a1b5 100644
--- a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/regexpExpressions.scala
+++ b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/regexpExpressions.scala
@@ -86,6 +86,13 @@ abstract class StringRegexExpression extends BinaryExpression
 escape character, the following character is matched literally. It is invalid to escape
 any other character.
 
+Since Spark 2.0, string literals are unescaped in our SQL parser. For example, in order
+to match "\abc", the pattern should be "\\abc".
+
+When SQL config 'spark.sql.parser.escapedStringLiterals' is enabled, it fallbacks
+to Spark 1.6 behavior regarding string literal parsing. For example, if the config is
+enabled, the pattern to match "\abc" should be "\abc".
+
 Examples:
   > SELECT '%SystemDrive%\Users\John' _FUNC_ '\%SystemDrive\%\\Users%'
   true
@@ -144,7 +151,31 @@ case class Like(left: Expression, right: Expression) extends StringRegexExpressi
 }
 
 @ExpressionDescription(
-  

spark git commit: [SPARK-20399][SQL] Add a config to fallback string literal parsing consistent with old sql parser behavior

2017-05-11 Thread wenchen
Repository: spark
Updated Branches:
  refs/heads/branch-2.2 5844151bc -> 3d1908fd5


[SPARK-20399][SQL] Add a config to fallback string literal parsing consistent 
with old sql parser behavior

## What changes were proposed in this pull request?

Spark 2.0 introduced a new SQL parser, which unescapes all string literals. 
This causes an issue with regex pattern strings.

The following code reproduces it:

val data = Seq("\u0020\u0021\u0023", "abc")
val df = data.toDF()

// 1st usage: works in 1.6
// Let parser parse pattern string
val rlike1 = df.filter("value rlike '^\\x20[\\x20-\\x23]+$'")
// 2nd usage: works in 1.6, 2.x
// Call Column.rlike so the pattern string is a literal that doesn't go through the parser
val rlike2 = df.filter($"value".rlike("^\\x20[\\x20-\\x23]+$"))

// In 2.x, we need to add backslashes so the regex pattern is parsed correctly
val rlike3 = df.filter("value rlike '^\\\\x20[\\\\x20-\\\\x23]+$'")

Following the discussion in #17736, this patch adds a config to fall back to 
1.6 string literal parsing and mitigate the migration issue.

## How was this patch tested?

Jenkins tests.

Please review http://spark.apache.org/contributing.html before opening a pull 
request.

Author: Liang-Chi Hsieh 

Closes #17887 from viirya/add-config-fallback-string-parsing.

(cherry picked from commit 609ba5f2b9fd89b1b9971d08f7cc680d202dbc7c)
Signed-off-by: Wenchen Fan 


Project: http://git-wip-us.apache.org/repos/asf/spark/repo
Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/3d1908fd
Tree: http://git-wip-us.apache.org/repos/asf/spark/tree/3d1908fd
Diff: http://git-wip-us.apache.org/repos/asf/spark/diff/3d1908fd

Branch: refs/heads/branch-2.2
Commit: 3d1908fd58fd9b1970cbffebdb731bfe4c776ad9
Parents: 5844151
Author: Liang-Chi Hsieh 
Authored: Fri May 12 11:15:10 2017 +0800
Committer: Wenchen Fan 
Committed: Fri May 12 11:15:26 2017 +0800

--
 .../sql/catalyst/catalog/SessionCatalog.scala   |   2 +-
 .../expressions/regexpExpressions.scala |  33 -
 .../spark/sql/catalyst/parser/AstBuilder.scala  |  11 +-
 .../spark/sql/catalyst/parser/ParseDriver.scala |   8 +-
 .../spark/sql/catalyst/parser/ParserUtils.scala |   6 +
 .../org/apache/spark/sql/internal/SQLConf.scala |  10 ++
 .../catalyst/parser/ExpressionParserSuite.scala | 128 +--
 .../spark/sql/execution/SparkSqlParser.scala|   2 +-
 .../org/apache/spark/sql/DatasetSuite.scala |  13 ++
 9 files changed, 171 insertions(+), 42 deletions(-)
--


http://git-wip-us.apache.org/repos/asf/spark/blob/3d1908fd/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/catalog/SessionCatalog.scala
--
diff --git a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/catalog/SessionCatalog.scala b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/catalog/SessionCatalog.scala
index 18e5146..f6653d3 100644
--- a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/catalog/SessionCatalog.scala
+++ b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/catalog/SessionCatalog.scala
@@ -73,7 +73,7 @@ class SessionCatalog(
   functionRegistry,
   conf,
   new Configuration(),
-  CatalystSqlParser,
+  new CatalystSqlParser(conf),
   DummyFunctionResourceLoader)
   }
 

http://git-wip-us.apache.org/repos/asf/spark/blob/3d1908fd/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/regexpExpressions.scala
--
diff --git a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/regexpExpressions.scala b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/regexpExpressions.scala
index 3fa8458..aa5a1b5 100644
--- a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/regexpExpressions.scala
+++ b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/regexpExpressions.scala
@@ -86,6 +86,13 @@ abstract class StringRegexExpression extends BinaryExpression
 escape character, the following character is matched literally. It is invalid to escape
 any other character.
 
+Since Spark 2.0, string literals are unescaped in our SQL parser. For example, in order
+to match "\abc", the pattern should be "\\abc".
+
+When SQL config 'spark.sql.parser.escapedStringLiterals' is enabled, it fallbacks
+to Spark 1.6 behavior regarding string literal parsing. For example, if the config is
+enabled, the pattern to match "\abc" should be "\abc".
+
 Examples:
   > SELECT '%SystemDrive%\Users\John' _FUNC_ '\%SystemDrive\%\\Users%'
   true
@@