[GitHub] spark pull request #17266: [SPARK-19912][SQL] String literals should be esca...

2017-03-20 Thread asfgit
Github user asfgit closed the pull request at:

https://github.com/apache/spark/pull/17266


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #17266: [SPARK-19912][SQL] String literals should be esca...

2017-03-12 Thread gatorsmile
Github user gatorsmile commented on a diff in the pull request:

https://github.com/apache/spark/pull/17266#discussion_r105583152
  
--- Diff: 
sql/hive/src/main/scala/org/apache/spark/sql/hive/client/HiveShim.scala ---
@@ -566,13 +566,24 @@ private[client] class Shim_v0_13 extends Shim_v0_12 {
 s"$v ${op.symbol} ${a.name}"
   case op @ BinaryComparison(a: Attribute, Literal(v, _: StringType))
   if !varcharKeys.contains(a.name) =>
-s"""${a.name} ${op.symbol} "$v
+s"""${a.name} ${op.symbol} ${quoteStringLiteral(v.toString)}"""
   case op @ BinaryComparison(Literal(v, _: StringType), a: Attribute)
   if !varcharKeys.contains(a.name) =>
-s$v" ${op.symbol} ${a.name}"""
+s"""${quoteStringLiteral(v.toString)} ${op.symbol} ${a.name}"""
 }.mkString(" and ")
   }
 
+  private def quoteStringLiteral(str: String): String = {
+if (!str.contains("\"")) {
+  s$str
+} else if (!str.contains("'")) {
+  s"""'$str'"""
+} else {
+  throw new UnsupportedOperationException(
+"""Partition filter cannot have both `"` and `'` characters""")
--- End diff --

We got the exception from Hive, because they are not escaped.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #17266: [SPARK-19912][SQL] String literals should be esca...

2017-03-12 Thread dongjoon-hyun
Github user dongjoon-hyun commented on a diff in the pull request:

https://github.com/apache/spark/pull/17266#discussion_r105572873
  
--- Diff: 
sql/hive/src/main/scala/org/apache/spark/sql/hive/client/HiveShim.scala ---
@@ -566,13 +566,24 @@ private[client] class Shim_v0_13 extends Shim_v0_12 {
 s"$v ${op.symbol} ${a.name}"
   case op @ BinaryComparison(a: Attribute, Literal(v, _: StringType))
   if !varcharKeys.contains(a.name) =>
-s"""${a.name} ${op.symbol} "$v
+s"""${a.name} ${op.symbol} ${quoteStringLiteral(v.toString)}"""
   case op @ BinaryComparison(Literal(v, _: StringType), a: Attribute)
   if !varcharKeys.contains(a.name) =>
-s$v" ${op.symbol} ${a.name}"""
+s"""${quoteStringLiteral(v.toString)} ${op.symbol} ${a.name}"""
 }.mkString(" and ")
   }
 
+  private def quoteStringLiteral(str: String): String = {
+if (!str.contains("\"")) {
+  s$str
+} else if (!str.contains("'")) {
+  s"""'$str'"""
+} else {
+  throw new UnsupportedOperationException(
+"""Partition filter cannot have both `"` and `'` characters""")
--- End diff --

From the current master branch, Hive does not accept that.

```scala
scala> Seq((1, "a\"'b")).toDF("a", 
"p").write.partitionBy("p").saveAsTable("t1")

scala> spark.table("t1").filter($"p" === """a"'b""").select($"a").show
java.lang.RuntimeException:
```


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #17266: [SPARK-19912][SQL] String literals should be esca...

2017-03-12 Thread dongjoon-hyun
Github user dongjoon-hyun commented on a diff in the pull request:

https://github.com/apache/spark/pull/17266#discussion_r105572627
  
--- Diff: 
sql/hive/src/main/scala/org/apache/spark/sql/hive/client/HiveShim.scala ---
@@ -566,13 +566,24 @@ private[client] class Shim_v0_13 extends Shim_v0_12 {
 s"$v ${op.symbol} ${a.name}"
   case op @ BinaryComparison(a: Attribute, Literal(v, _: StringType))
   if !varcharKeys.contains(a.name) =>
-s"""${a.name} ${op.symbol} "$v
+s"""${a.name} ${op.symbol} ${quoteStringLiteral(v.toString)}"""
   case op @ BinaryComparison(Literal(v, _: StringType), a: Attribute)
   if !varcharKeys.contains(a.name) =>
-s$v" ${op.symbol} ${a.name}"""
+s"""${quoteStringLiteral(v.toString)} ${op.symbol} ${a.name}"""
 }.mkString(" and ")
   }
 
+  private def quoteStringLiteral(str: String): String = {
+if (!str.contains("\"")) {
+  s$str
+} else if (!str.contains("'")) {
+  s"""'$str'"""
+} else {
+  throw new UnsupportedOperationException(
+"""Partition filter cannot have both `"` and `'` characters""")
--- End diff --

Does that return the correct result?


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #17266: [SPARK-19912][SQL] String literals should be esca...

2017-03-12 Thread gatorsmile
Github user gatorsmile commented on a diff in the pull request:

https://github.com/apache/spark/pull/17266#discussion_r105572354
  
--- Diff: 
sql/hive/src/main/scala/org/apache/spark/sql/hive/client/HiveShim.scala ---
@@ -566,13 +566,24 @@ private[client] class Shim_v0_13 extends Shim_v0_12 {
 s"$v ${op.symbol} ${a.name}"
   case op @ BinaryComparison(a: Attribute, Literal(v, _: StringType))
   if !varcharKeys.contains(a.name) =>
-s"""${a.name} ${op.symbol} "$v
+s"""${a.name} ${op.symbol} ${quoteStringLiteral(v.toString)}"""
   case op @ BinaryComparison(Literal(v, _: StringType), a: Attribute)
   if !varcharKeys.contains(a.name) =>
-s$v" ${op.symbol} ${a.name}"""
+s"""${quoteStringLiteral(v.toString)} ${op.symbol} ${a.name}"""
 }.mkString(" and ")
   }
 
+  private def quoteStringLiteral(str: String): String = {
+if (!str.contains("\"")) {
+  s$str
+} else if (!str.contains("'")) {
+  s"""'$str'"""
+} else {
+  throw new UnsupportedOperationException(
+"""Partition filter cannot have both `"` and `'` characters""")
--- End diff --

```Scala
table.filter($"p" === """a"'b""").select($"a")
```


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #17266: [SPARK-19912][SQL] String literals should be esca...

2017-03-12 Thread gatorsmile
Github user gatorsmile commented on a diff in the pull request:

https://github.com/apache/spark/pull/17266#discussion_r105572081
  
--- Diff: 
sql/hive/src/main/scala/org/apache/spark/sql/hive/client/HiveShim.scala ---
@@ -566,13 +566,24 @@ private[client] class Shim_v0_13 extends Shim_v0_12 {
 s"$v ${op.symbol} ${a.name}"
   case op @ BinaryComparison(a: Attribute, Literal(v, _: StringType))
   if !varcharKeys.contains(a.name) =>
-s"""${a.name} ${op.symbol} "$v
+s"""${a.name} ${op.symbol} ${quoteStringLiteral(v.toString)}"""
   case op @ BinaryComparison(Literal(v, _: StringType), a: Attribute)
   if !varcharKeys.contains(a.name) =>
-s$v" ${op.symbol} ${a.name}"""
+s"""${quoteStringLiteral(v.toString)} ${op.symbol} ${a.name}"""
 }.mkString(" and ")
   }
 
+  private def quoteStringLiteral(str: String): String = {
+if (!str.contains("\"")) {
+  s$str
+} else if (!str.contains("'")) {
+  s"""'$str'"""
+} else {
+  throw new UnsupportedOperationException(
+"""Partition filter cannot have both `"` and `'` characters""")
--- End diff --

Test case?


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #17266: [SPARK-19912][SQL] String literals should be esca...

2017-03-12 Thread dongjoon-hyun
Github user dongjoon-hyun commented on a diff in the pull request:

https://github.com/apache/spark/pull/17266#discussion_r105553664
  
--- Diff: 
sql/hive/src/main/scala/org/apache/spark/sql/hive/client/HiveShim.scala ---
@@ -566,13 +566,24 @@ private[client] class Shim_v0_13 extends Shim_v0_12 {
 s"$v ${op.symbol} ${a.name}"
   case op @ BinaryComparison(a: Attribute, Literal(v, _: StringType))
   if !varcharKeys.contains(a.name) =>
-s"""${a.name} ${op.symbol} "$v
+s"""${a.name} ${op.symbol} ${quoteStringLiteral(v.toString)}"""
   case op @ BinaryComparison(Literal(v, _: StringType), a: Attribute)
   if !varcharKeys.contains(a.name) =>
-s$v" ${op.symbol} ${a.name}"""
+s"""${quoteStringLiteral(v.toString)} ${op.symbol} ${a.name}"""
 }.mkString(" and ")
   }
 
+  private def quoteStringLiteral(str: String): String = {
+if (!str.contains("\"")) {
+  s$str
+} else if (!str.contains("'")) {
+  s"""'$str'"""
+} else {
+  throw new UnsupportedOperationException(
+"""Partition filter cannot have both `"` and `'` characters""")
--- End diff --

The current master also raise exception with this mixed case.
```scala
scala> spark.table("t1").filter($"p" === "'\"").select($"a").show
java.lang.RuntimeException: Caught Hive MetaException attempting to get 
partition metadata by filter from ...
...
Caused by: java.lang.reflect.InvocationTargetException: 
org.apache.hadoop.hive.metastore.api.MetaException: Error parsing partition 
filter : line 1:8 mismatched character '' expecting '"'
```


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #17266: [SPARK-19912][SQL] String literals should be esca...

2017-03-12 Thread dongjoon-hyun
GitHub user dongjoon-hyun opened a pull request:

https://github.com/apache/spark/pull/17266

[SPARK-19912][SQL] String literals should be escaped for Hive metastore 
partition pruning

## What changes were proposed in this pull request?

Currently, HiveShim's `convertFilters` does not escape the string literals. 
This PR fixes the following cases.

**BEFORE**
```
scala> Seq((1, "p1", "q1"), (2, "p1\" and q=\"q1", "q2")).toDF("a", "p", 
"q").write.partitionBy("p", "q").saveAsTable("t1")


scala> spark.table("t1").filter($"p" === "p1\" and 
q=\"q1").select($"a").show
+---+
|  a|
+---+
+---+
```

**AFTER**
```
scala> Seq((1, "p1", "q1"), (2, "p1\" and q=\"q1", "q2")).toDF("a", "p", 
"q").write.partitionBy("p", "q").saveAsTable("t1")


scala> spark.table("t1").filter($"p" === "p1\" and 
q=\"q1").select($"a").show
+---+
|  a|
+---+
|  2|
+---+
```

## How was this patch tested?

Pass the Jenkins test with new test cases.

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/dongjoon-hyun/spark SPARK-19912

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/spark/pull/17266.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #17266


commit 37bff5dfbe7dd8c2b3f650915107bdb84a4b7fac
Author: Dongjoon Hyun 
Date:   2017-03-12T09:48:35Z

[SPARK-19912][SQL] String literals should be escaped for Hive metastore 
partition pruning




---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org