[GitHub] spark pull request #15398: [SPARK-17647][SQL] Fix backslash escaping in 'LIK...

2017-04-17 Thread asfgit
Github user asfgit closed the pull request at:

https://github.com/apache/spark/pull/15398


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #15398: [SPARK-17647][SQL] Fix backslash escaping in 'LIK...

2017-04-16 Thread viirya
Github user viirya commented on a diff in the pull request:

https://github.com/apache/spark/pull/15398#discussion_r111704017
  
--- Diff: 
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/regexpExpressions.scala
 ---
@@ -68,7 +68,30 @@ trait StringRegexExpression extends 
ImplicitCastInputTypes {
  * Simple RegEx pattern matching function
  */
 @ExpressionDescription(
-  usage = "str _FUNC_ pattern - Returns true if `str` matches `pattern`, 
or false otherwise.")
+  usage = "str _FUNC_ pattern - Returns true if str matches pattern, " +
+"null if any arguments are null, false otherwise.",
+  extended = """
+Arguments:
+  str - a string expression
+  pattern - a string expression. The pattern is a string which is 
matched literally, with
+exception to the following special symbols:
+
+  _ matches any one character in the input (similar to . in posix 
regular expressions)
+
+  % matches zero ore more characters in the input (similar to .* 
in posix regular
--- End diff --

ore -> or?





[GitHub] spark pull request #15398: [SPARK-17647][SQL] Fix backslash escaping in 'LIK...

2016-10-22 Thread jodersky
Github user jodersky commented on a diff in the pull request:

https://github.com/apache/spark/pull/15398#discussion_r84582746
  
--- Diff: 
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/regexpExpressions.scala
 ---
@@ -68,7 +68,20 @@ trait StringRegexExpression extends 
ImplicitCastInputTypes {
  * Simple RegEx pattern matching function
  */
 @ExpressionDescription(
-  usage = "str _FUNC_ pattern - Returns true if str matches pattern and 
false otherwise.")
+  usage = "str _FUNC_ pattern - Returns true if str matches pattern, " +
+"null if any arguments are null, false otherwise.",
+  extended =
+"The pattern is a string which is matched literally, with exception to 
the following " +
+"special symbols:\n\n" +
+"_ matches any one character in the input (similar to . in posix " 
+
+"regular expressions)\n\n" +
+"% matches zero ore more characters in the input (similar to .* in 
" +
+"posix regular expressions)\n\n" +
+"The escape character is '\\'. If an escape character precedes a 
special symbol or " +
+"another escape character, the following character is matched 
literally, For example, " +
+"the expression ` like \\%SystemDrive\\%Users%` will match 
any `` that " +
+"starts with '%SystemDrive%\\Users'. It is invalid to escape any other 
character.\n\n" +
+"Use RLIKE to match with standard regular expressions.")
--- End diff --

ack





[GitHub] spark pull request #15398: [SPARK-17647][SQL] Fix backslash escaping in 'LIK...

2016-10-22 Thread jodersky
Github user jodersky commented on a diff in the pull request:

https://github.com/apache/spark/pull/15398#discussion_r84582662
  
--- Diff: 
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/util/StringUtils.scala
 ---
@@ -19,32 +19,34 @@ package org.apache.spark.sql.catalyst.util
 
 import java.util.regex.{Pattern, PatternSyntaxException}
 
+import org.apache.spark.sql.AnalysisException
 import org.apache.spark.unsafe.types.UTF8String
 
 object StringUtils {
 
-  // replace the _ with .{1} exactly match 1 time of any character
-  // replace the % with .*, match 0 or more times with any character
-  def escapeLikeRegex(v: String): String = {
-if (!v.isEmpty) {
-  "(?s)" + (' ' +: v.init).zip(v).flatMap {
-case (prev, '\\') => ""
-case ('\\', c) =>
-  c match {
-case '_' => "_"
-case '%' => "%"
-case _ => Pattern.quote("\\" + c)
-  }
-case (prev, c) =>
+  /** Convert 'like' pattern to Java regex. */
+  def escapeLikeRegex(str: String): String = {
+val in = str.toIterator
+val out = new StringBuilder()
+
+def fail(message: String) = throw new AnalysisException(
+  s"the pattern '$str' is invalid, $message")
--- End diff --

>  org.apache.spark.sql.AnalysisException: the pattern '\a' is invalid, the 
escape character is not allowed to precede 'a';

or 

> org.apache.spark.sql.AnalysisException: The pattern '\a' is invalid. 
Reason: the escape character is not allowed to precede 'a';
 
Isn't this just a matter of preference? I preferred the first one because 
it avoids sentences that are expected to end in a period, given that a 
semicolon is always appended after an analysis exception. It's a really, really 
superficial thing though, so I'm happy to change it either way.





[GitHub] spark pull request #15398: [SPARK-17647][SQL] Fix backslash escaping in 'LIK...

2016-10-22 Thread gatorsmile
Github user gatorsmile commented on a diff in the pull request:

https://github.com/apache/spark/pull/15398#discussion_r84579527
  
--- Diff: 
sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/expressions/RegexpExpressionsSuite.scala
 ---
@@ -53,6 +57,40 @@ class RegexpExpressionsSuite extends SparkFunSuite with 
ExpressionEvalHelper {
 checkEvaluation("a\nb" like "a_b", true)
 checkEvaluation("ab" like "a%b", true)
 checkEvaluation("a\nb" like "a%b", true)
+
+// empty input
+checkEvaluation("" like "", true)
+checkEvaluation("a" like "", false)
+checkEvaluation("" like "a", false)
+
+// SI-17647 double-escaping backslash
+checkEvaluation("""""" like """%\\%""", true) // triple quotes to 
avoid java string escaping
+checkEvaluation("""%%""" like """%%""", true)
+checkEvaluation("""\__""" like """\\\__""", true)
+checkEvaluation("""\\\__""" like """%\\%\%""", false)
+checkEvaluation("""_\\\%""" like """%\\""", false)
+
+// unicode
+// scalastyle:off nonascii
+checkEvaluation("a\u20ACa" like "_\u20AC_", true)
+checkEvaluation("a€a" like "_€_", true)
+checkEvaluation("a€a" like "_\u20AC_", true)
+checkEvaluation("a\u20ACa" like "_€_", true)
+// scalastyle:on nonascii
+
+// invalid escaping
+intercept[AnalysisException] {
+  evaluate("""a""" like """\a""")
+}
+intercept[AnalysisException] {
+  evaluate("""a""" like """a\""")
+}
--- End diff --

For these two invalid cases, you also need to check the error messages. For 
example,
```Scala
val e = intercept[AnalysisException] {
  evaluate("""a""" like """\a""")
}.getMessage
assert(e.contains("xyz"))
```







[GitHub] spark pull request #15398: [SPARK-17647][SQL] Fix backslash escaping in 'LIK...

2016-10-22 Thread gatorsmile
Github user gatorsmile commented on a diff in the pull request:

https://github.com/apache/spark/pull/15398#discussion_r84578560
  
--- Diff: 
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/util/StringUtils.scala
 ---
@@ -19,32 +19,34 @@ package org.apache.spark.sql.catalyst.util
 
 import java.util.regex.{Pattern, PatternSyntaxException}
 
+import org.apache.spark.sql.AnalysisException
 import org.apache.spark.unsafe.types.UTF8String
 
 object StringUtils {
 
-  // replace the _ with .{1} exactly match 1 time of any character
-  // replace the % with .*, match 0 or more times with any character
-  def escapeLikeRegex(v: String): String = {
-if (!v.isEmpty) {
-  "(?s)" + (' ' +: v.init).zip(v).flatMap {
-case (prev, '\\') => ""
-case ('\\', c) =>
-  c match {
-case '_' => "_"
-case '%' => "%"
-case _ => Pattern.quote("\\" + c)
-  }
-case (prev, c) =>
+  /** Convert 'like' pattern to Java regex. */
+  def escapeLikeRegex(str: String): String = {
+val in = str.toIterator
+val out = new StringBuilder()
+
+def fail(message: String) = throw new AnalysisException(
+  s"the pattern '$str' is invalid, $message")
--- End diff --

-> 
```Scala
s"The pattern '$str' is invalid. Reason: $message"
```





[GitHub] spark pull request #15398: [SPARK-17647][SQL] Fix backslash escaping in 'LIK...

2016-10-19 Thread jodersky
Github user jodersky commented on a diff in the pull request:

https://github.com/apache/spark/pull/15398#discussion_r84143867
  
--- Diff: 
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/regexpExpressions.scala
 ---
@@ -68,7 +68,20 @@ trait StringRegexExpression extends 
ImplicitCastInputTypes {
  * Simple RegEx pattern matching function
  */
 @ExpressionDescription(
-  usage = "str _FUNC_ pattern - Returns true if str matches pattern and 
false otherwise.")
+  usage = "str _FUNC_ pattern - Returns true if str matches pattern, " +
+"null if any arguments are null, false otherwise.",
+  extended =
+"The pattern is a string which is matched literally, with exception to 
the " +
+"following special symbols:\n\n" +
+"_ matches any one character in the input (similar to . in posix " 
+
+"regular expressions)\n\n" +
+"% matches zero ore more characters in the input (similar to .* in 
" +
+"posix regular expressions\n\n" +
+"The default escape character is '\\'. If an escape character precedes 
a special symbol or " +
+"another escape character, the following character is matched 
literally, otherwise the " +
+ "escape character is treated literally. I.e. '\\%' would match '%', 
whereas '\\a' matches " +
+ "'\\a'.\n\n" +
--- End diff --

good idea





[GitHub] spark pull request #15398: [SPARK-17647][SQL] Fix backslash escaping in 'LIK...

2016-10-19 Thread mengxr
Github user mengxr commented on a diff in the pull request:

https://github.com/apache/spark/pull/15398#discussion_r84007236
  
--- Diff: 
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/regexpExpressions.scala
 ---
@@ -68,7 +68,20 @@ trait StringRegexExpression extends 
ImplicitCastInputTypes {
  * Simple RegEx pattern matching function
  */
 @ExpressionDescription(
-  usage = "str _FUNC_ pattern - Returns true if str matches pattern and 
false otherwise.")
+  usage = "str _FUNC_ pattern - Returns true if str matches pattern, " +
+"null if any arguments are null, false otherwise.",
+  extended =
+"The pattern is a string which is matched literally, with exception to 
the " +
--- End diff --

nit: The doc might read better with triple quotes.





[GitHub] spark pull request #15398: [SPARK-17647][SQL] Fix backslash escaping in 'LIK...

2016-10-19 Thread mengxr
Github user mengxr commented on a diff in the pull request:

https://github.com/apache/spark/pull/15398#discussion_r84007087
  
--- Diff: 
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/regexpExpressions.scala
 ---
@@ -68,7 +68,20 @@ trait StringRegexExpression extends 
ImplicitCastInputTypes {
  * Simple RegEx pattern matching function
  */
 @ExpressionDescription(
-  usage = "str _FUNC_ pattern - Returns true if str matches pattern and 
false otherwise.")
+  usage = "str _FUNC_ pattern - Returns true if str matches pattern, " +
+"null if any arguments are null, false otherwise.",
+  extended =
+"The pattern is a string which is matched literally, with exception to 
the " +
+"following special symbols:\n\n" +
+"_ matches any one character in the input (similar to . in posix " 
+
+"regular expressions)\n\n" +
+"% matches zero ore more characters in the input (similar to .* in 
" +
+"posix regular expressions\n\n" +
+"The default escape character is '\\'. If an escape character precedes 
a special symbol or " +
--- End diff --

I would remove `default`, which gives users a false impression that we 
support other escape characters. We can add it back after we implement the 
support.





[GitHub] spark pull request #15398: [SPARK-17647][SQL] Fix backslash escaping in 'LIK...

2016-10-19 Thread mengxr
Github user mengxr commented on a diff in the pull request:

https://github.com/apache/spark/pull/15398#discussion_r84008009
  
--- Diff: 
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/regexpExpressions.scala
 ---
@@ -68,7 +68,20 @@ trait StringRegexExpression extends 
ImplicitCastInputTypes {
  * Simple RegEx pattern matching function
  */
 @ExpressionDescription(
-  usage = "str _FUNC_ pattern - Returns true if str matches pattern and 
false otherwise.")
+  usage = "str _FUNC_ pattern - Returns true if str matches pattern, " +
+"null if any arguments are null, false otherwise.",
+  extended =
+"The pattern is a string which is matched literally, with exception to 
the " +
+"following special symbols:\n\n" +
+"_ matches any one character in the input (similar to . in posix " 
+
+"regular expressions)\n\n" +
+"% matches zero ore more characters in the input (similar to .* in 
" +
+"posix regular expressions\n\n" +
+"The default escape character is '\\'. If an escape character precedes 
a special symbol or " +
+"another escape character, the following character is matched 
literally, otherwise the " +
+ "escape character is treated literally. I.e. '\\%' would match '%', 
whereas '\\a' matches " +
+ "'\\a'.\n\n" +
--- End diff --

Providing an example might help users understand the behavior. One 
example I have is matching paths that start with `%SystemDrive%\Users` (a common 
Windows path). It would be nice to have examples in Scala as well as in SQL. 
See my comment about inconsistency between SQL and Scala.
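
A minimal Scala sketch of the kind of example requested here, assuming strict escaping (only `_`, `%` and `\` may follow the escape character). `LikeSketch`, `likeToRegex` and `like` are hypothetical names, and `IllegalArgumentException` stands in for Spark's `AnalysisException` so that the snippet is self-contained; it is not the code in `StringUtils.escapeLikeRegex`.

```scala
import java.util.regex.Pattern

// Hypothetical stand-alone helper for illustration only;
// this is NOT Spark's StringUtils.escapeLikeRegex.
object LikeSketch {

  /** Translate a LIKE pattern into a Java regex, rejecting invalid escapes. */
  def likeToRegex(pattern: String): String = {
    val in = pattern.iterator
    // (?s) enables DOTALL so that '_' and '%' also match newlines.
    val out = new StringBuilder("(?s)")

    // Assumption: IllegalArgumentException replaces AnalysisException here.
    def fail(reason: String): Nothing =
      throw new IllegalArgumentException(s"the pattern '$pattern' is invalid, $reason")

    while (in.hasNext) {
      in.next() match {
        case '\\' if in.hasNext =>
          in.next() match {
            // An escaped special symbol or escape character is matched literally.
            case c @ ('_' | '%' | '\\') => out ++= Pattern.quote(c.toString)
            case c => fail(s"the escape character is not allowed to precede '$c'")
          }
        case '\\' => fail("it is not allowed to end with the escape character")
        case '_' => out ++= "."   // any one character
        case '%' => out ++= ".*"  // zero or more characters
        case c => out ++= Pattern.quote(c.toString)
      }
    }
    out.toString
  }

  /** True if the whole of `str` matches the LIKE pattern. */
  def like(str: String, pattern: String): Boolean =
    Pattern.matches(likeToRegex(pattern), str)
}
```

With these definitions, `LikeSketch.like("""%SystemDrive%\Users\John""", """\%SystemDrive\%\\Users%""")` should return true, while a pattern such as `\a` or one ending in a lone `\` is rejected with an exception.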





[GitHub] spark pull request #15398: [SPARK-17647][SQL] Fix backslash escaping in 'LIK...

2016-10-11 Thread viirya
Github user viirya commented on a diff in the pull request:

https://github.com/apache/spark/pull/15398#discussion_r82931395
  
--- Diff: 
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/util/StringUtils.scala
 ---
@@ -25,26 +25,25 @@ object StringUtils {
 
   // replace the _ with .{1} exactly match 1 time of any character
   // replace the % with .*, match 0 or more times with any character
-  def escapeLikeRegex(v: String): String = {
-if (!v.isEmpty) {
-  "(?s)" + (' ' +: v.init).zip(v).flatMap {
-case (prev, '\\') => ""
-case ('\\', c) =>
-  c match {
-case '_' => "_"
-case '%' => "%"
-case _ => Pattern.quote("\\" + c)
-  }
-case (prev, c) =>
-  c match {
-case '_' => "."
-case '%' => ".*"
-case _ => Pattern.quote(Character.toString(c))
-  }
-  }.mkString
-} else {
-  v
+  def escapeLikeRegex(str: String): String = {
+val builder = new StringBuilder()
+var escaping = false
+for (next <- str) {
+  if (escaping) {
+builder ++= Pattern.quote(Character.toString(next))
--- End diff --

`\Q\\E\Qa\E` is correct. But doesn't it become `\Qa\E` with this change?

For `\\a`, the leading `\\` goes to the next branch and enables 
`escaping`. Then the next char, `a`, is quoted here, so the result is `\Qa\E`. 
BTW, before this change it was `\Q\a\E`. 
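
To make the three regexes in this exchange concrete, here is a small Scala sketch of what each one matches. It only uses `java.util.regex.Pattern`; the object and value names are hypothetical, and the snippet does not take a position on which LIKE pattern each reviewer had in mind.

```scala
import java.util.regex.Pattern

// Hypothetical names; the snippet only shows what each regex from the
// discussion matches when handed to java.util.regex.
object QuoteSketch {
  // Pattern.quote wraps its argument in \Q...\E so it is matched literally.
  val quotedA: String = Pattern.quote("a")              // the regex \Qa\E
  val quotedBackslashA: String = Pattern.quote("\\a")   // the regex \Q\a\E
  val quotedSeparately: String =
    Pattern.quote("\\") + Pattern.quote("a")            // the regex \Q\\E\Qa\E

  val backslashA = "\\a" // two characters: a backslash followed by 'a'

  // \Qa\E matches only the single character 'a'; the backslash is lost.
  val loneAMatchesA: Boolean = Pattern.matches(quotedA, "a")
  val loneAMatchesBackslashA: Boolean = Pattern.matches(quotedA, backslashA)

  // \Q\a\E and \Q\\E\Qa\E both match a backslash followed by 'a'.
  val jointMatches: Boolean = Pattern.matches(quotedBackslashA, backslashA)
  val separateMatches: Boolean = Pattern.matches(quotedSeparately, backslashA)
}
```

In other words, `\Q\a\E` and `\Q\\E\Qa\E` accept the same input, whereas `\Qa\E` silently drops the backslash, which is the difference being questioned above.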






[GitHub] spark pull request #15398: [SPARK-17647][SQL] Fix backslash escaping in 'LIK...

2016-10-11 Thread jodersky
Github user jodersky commented on a diff in the pull request:

https://github.com/apache/spark/pull/15398#discussion_r82849408
  
--- Diff: 
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/util/StringUtils.scala
 ---
@@ -25,26 +25,25 @@ object StringUtils {
 
   // replace the _ with .{1} exactly match 1 time of any character
   // replace the % with .*, match 0 or more times with any character
-  def escapeLikeRegex(v: String): String = {
-if (!v.isEmpty) {
-  "(?s)" + (' ' +: v.init).zip(v).flatMap {
-case (prev, '\\') => ""
-case ('\\', c) =>
-  c match {
-case '_' => "_"
-case '%' => "%"
-case _ => Pattern.quote("\\" + c)
-  }
-case (prev, c) =>
-  c match {
-case '_' => "."
-case '%' => ".*"
-case _ => Pattern.quote(Character.toString(c))
-  }
-  }.mkString
-} else {
-  v
+  def escapeLikeRegex(str: String): String = {
+val builder = new StringBuilder()
+var escaping = false
+for (next <- str) {
+  if (escaping) {
+builder ++= Pattern.quote(Character.toString(next))
--- End diff --

Every character after a backslash is quoted, so `\\a` becomes `\Q\\E\Qa\E`. 
Is this not the intended behaviour?





[GitHub] spark pull request #15398: [SPARK-17647][SQL] Fix backslash escaping in 'LIK...

2016-10-10 Thread viirya
Github user viirya commented on a diff in the pull request:

https://github.com/apache/spark/pull/15398#discussion_r82722525
  
--- Diff: 
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/util/StringUtils.scala
 ---
@@ -25,26 +25,25 @@ object StringUtils {
 
   // replace the _ with .{1} exactly match 1 time of any character
   // replace the % with .*, match 0 or more times with any character
-  def escapeLikeRegex(v: String): String = {
-if (!v.isEmpty) {
-  "(?s)" + (' ' +: v.init).zip(v).flatMap {
-case (prev, '\\') => ""
-case ('\\', c) =>
-  c match {
-case '_' => "_"
-case '%' => "%"
-case _ => Pattern.quote("\\" + c)
-  }
-case (prev, c) =>
-  c match {
-case '_' => "."
-case '%' => ".*"
-case _ => Pattern.quote(Character.toString(c))
-  }
-  }.mkString
-} else {
-  v
+  def escapeLikeRegex(str: String): String = {
+val builder = new StringBuilder()
+var escaping = false
+for (next <- str) {
+  if (escaping) {
+builder ++= Pattern.quote(Character.toString(next))
--- End diff --

How about `"\\a"`? Previously it was `\Q\a\E`; now it seems to become `\Qa\E`.





[GitHub] spark pull request #15398: [SPARK-17647][SQL] Fix backslash escaping in 'LIK...

2016-10-07 Thread jodersky
Github user jodersky commented on a diff in the pull request:

https://github.com/apache/spark/pull/15398#discussion_r82492398
  
--- Diff: 
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/util/StringUtils.scala
 ---
@@ -25,26 +25,24 @@ object StringUtils {
 
   // replace the _ with .{1} exactly match 1 time of any character
   // replace the % with .*, match 0 or more times with any character
-  def escapeLikeRegex(v: String): String = {
-if (!v.isEmpty) {
-  "(?s)" + (' ' +: v.init).zip(v).flatMap {
-case (prev, '\\') => ""
-case ('\\', c) =>
-  c match {
-case '_' => "_"
-case '%' => "%"
-case _ => Pattern.quote("\\" + c)
-  }
-case (prev, c) =>
-  c match {
-case '_' => "."
-case '%' => ".*"
-case _ => Pattern.quote(Character.toString(c))
-  }
-  }.mkString
-} else {
-  v
+  def escapeLikeRegex(str: String): String = {
+val builder = new StringBuilder()
+str.foldLeft(false) { case (escaping, next) =>
--- End diff --

updated in latest commit





[GitHub] spark pull request #15398: [SPARK-17647][SQL] Fix backslash escaping in 'LIK...

2016-10-07 Thread rxin
Github user rxin commented on a diff in the pull request:

https://github.com/apache/spark/pull/15398#discussion_r82490834
  
--- Diff: 
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/util/StringUtils.scala
 ---
@@ -25,26 +25,24 @@ object StringUtils {
 
   // replace the _ with .{1} exactly match 1 time of any character
   // replace the % with .*, match 0 or more times with any character
-  def escapeLikeRegex(v: String): String = {
-if (!v.isEmpty) {
-  "(?s)" + (' ' +: v.init).zip(v).flatMap {
-case (prev, '\\') => ""
-case ('\\', c) =>
-  c match {
-case '_' => "_"
-case '%' => "%"
-case _ => Pattern.quote("\\" + c)
-  }
-case (prev, c) =>
-  c match {
-case '_' => "."
-case '%' => ".*"
-case _ => Pattern.quote(Character.toString(c))
-  }
-  }.mkString
-} else {
-  v
+  def escapeLikeRegex(str: String): String = {
+val builder = new StringBuilder()
+str.foldLeft(false) { case (escaping, next) =>
--- End diff --

Can we get rid of the fold here? The fold makes the code more difficult to parse 
(e.g. what's the return type, what's escaping, what's next).
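
As an illustration of the readability point, here is a hedged Scala sketch of the same escape-tracking state machine written once with `foldLeft` and once with an explicit loop. The body of the fold in the PR is not shown in the diff above, so the function below (`unescape`, which merely strips escape backslashes) is a hypothetical stand-in rather than the PR's code.

```scala
// Hypothetical example: both functions drop each escaping backslash and
// keep the character that follows it. Neither is taken from the PR.
object FoldVsLoop {

  // Fold version: the Boolean accumulator is the "escaping" flag, which the
  // reader has to infer from the initial value and the case pattern.
  def unescapeFold(s: String): String = {
    val out = new StringBuilder
    s.foldLeft(false) { case (escaping, next) =>
      if (escaping) { out += next; false }
      else if (next == '\\') true
      else { out += next; false }
    }
    out.toString
  }

  // Loop version: the state is a named variable that is updated explicitly.
  def unescapeLoop(s: String): String = {
    val out = new StringBuilder
    var escaping = false
    for (next <- s) {
      if (escaping) {
        out += next
        escaping = false
      } else if (next == '\\') {
        escaping = true
      } else {
        out += next
      }
    }
    out.toString
  }
}
```

Both versions behave identically; the loop simply makes the state and its transitions visible by name, which is the property being asked for here.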





[GitHub] spark pull request #15398: [SPARK-17647][SQL] Fix backslash escaping in 'LIK...

2016-10-07 Thread jodersky
GitHub user jodersky opened a pull request:

https://github.com/apache/spark/pull/15398

[SPARK-17647][SQL] Fix backslash escaping in 'LIKE' patterns.

## What changes were proposed in this pull request?
Modify SQL-to-Java regex escaping to correctly handle cases of double 
backslashes followed by another special character. This fixes cases such as 
'' not matching '%\\%'.

## How was this patch tested?
Extra case in regex unit tests.


You can merge this pull request into a Git repository by running:

$ git pull https://github.com/jodersky/spark SPARK-17647

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/spark/pull/15398.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #15398


commit 42c27180efc2691d3dd117c429077d888b9fe12d
Author: Jakob Odersky 
Date:   2016-10-08T00:24:22Z

Fix backslash escaping in 'LIKE' patterns.



