[ 
https://issues.apache.org/jira/browse/SPARK-19912?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Cheng Lian updated SPARK-19912:
-------------------------------
    Description: 
{{Shim_v0_13.convertFilters()}} doesn't escape string literals when generating Hive-style partition predicates.

The following SQL-injection-like test case illustrates this issue:
{code}
  test("SPARK-19912") {
    withTable("spark_19912") {
      Seq(
        (1, "p1", "q1"),
        (2, "p1\" and q = \"q1", "q2")
      ).toDF("a", "p", "q").write.partitionBy("p", "q").saveAsTable("spark_19912")

      checkAnswer(
        spark.table("spark_19912").filter($"p" === "p1\" and q = \"q1").select($"a"),
        Row(2)
      )
    }
  }
{code}
The above test case fails like this:
{noformat}
[info] - spark_19912 *** FAILED *** (13 seconds, 74 milliseconds)
[info]   Results do not match for query:
[info]   Timezone: 
sun.util.calendar.ZoneInfo[id="America/Los_Angeles",offset=-28800000,dstSavings=3600000,useDaylight=true,transitions=185,lastRule=java.util.SimpleTimeZone[id=America/Los_Angeles,offset=-28800000,dstSavings=3600000,useDaylight=true,startYear=0,startMode=3,startMonth=2,startDay=8,startDayOfWeek=1,startTime=7200000,startTimeMode=0,endMode=3,endMonth=10,endDay=1,endDayOfWeek=1,endTime=7200000,endTimeMode=0]]
[info]   Timezone Env:
[info]
[info]   == Parsed Logical Plan ==
[info]   'Project [unresolvedalias('a, None)]
[info]   +- Filter (p#27 = p1" and q = "q1)
[info]      +- SubqueryAlias spark_19912
[info]         +- Relation[a#26,p#27,q#28] parquet
[info]
[info]   == Analyzed Logical Plan ==
[info]   a: int
[info]   Project [a#26]
[info]   +- Filter (p#27 = p1" and q = "q1)
[info]      +- SubqueryAlias spark_19912
[info]         +- Relation[a#26,p#27,q#28] parquet
[info]
[info]   == Optimized Logical Plan ==
[info]   Project [a#26]
[info]   +- Filter (isnotnull(p#27) && (p#27 = p1" and q = "q1))
[info]      +- Relation[a#26,p#27,q#28] parquet
[info]
[info]   == Physical Plan ==
[info]   *Project [a#26]
[info]   +- *FileScan parquet default.spark_19912[a#26,p#27,q#28] Batched: true, Format: Parquet, Location: PrunedInMemoryFileIndex[], PartitionCount: 0, PartitionFilters: [isnotnull(p#27), (p#27 = p1" and q = "q1)], PushedFilters: [], ReadSchema: struct<a:int>
[info]   == Results ==
[info]
[info]   == Results ==
[info]   !== Correct Answer - 1 ==   == Spark Answer - 0 ==
[info]    struct<>                   struct<>
[info]   ![2]
{noformat}
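For context, a minimal sketch of the failure mode (hypothetical helper names, not the actual {{Shim_v0_13}} code): embedding the partition value verbatim lets its quote character terminate the generated Hive literal early and inject an extra conjunct, while escaping the value first keeps it a single predicate.
{code}
// Hypothetical simplification of how a comparison against a partition
// column could be rendered as a Hive metastore pruning predicate.
def toHivePredicate(attr: String, value: String): String =
  s"""$attr = "$value""""  // BUG: value is embedded without escaping

// Escaping backslashes and embedded quotes keeps the literal intact.
def toEscapedHivePredicate(attr: String, value: String): String = {
  val escaped = value.replace("\\", "\\\\").replace("\"", "\\\"")
  s"""$attr = "$escaped""""
}

// The partition value from the test case above:
val v = "p1\" and q = \"q1"

// Unescaped: the embedded quote closes the literal early, injecting
// an extra `and q = "q1"` conjunct into the pruning predicate.
assert(toHivePredicate("p", v) == "p = \"p1\" and q = \"q1\"")

// Escaped: one predicate comparing p against the full literal.
assert(toEscapedHivePredicate("p", v) == "p = \"p1\\\" and q = \\\"q1\"")
{code}
The unescaped output matches the {{PartitionFilters}} entry in the plan above, which is why the metastore prunes every partition ({{PartitionCount: 0}}) and the query returns no rows.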

  was:
{{Shim_v0_13.convertFilters()}} doesn't escape string literals while generating 
Hive style partition predicates.

The following SQL-injection-like test case illustrates this issue:
{code}
  test("foo") {
    withTable("foo") {
      Seq(
        (1, "p1", "q1"),
        (2, "p1\" and q=\"q1", "q2")
      ).toDF("a", "p", "q").write.partitionBy("p", "q").saveAsTable("foo")

      checkAnswer(
        spark.table("foo").filter($"p" === "p1\" and q = \"q1").select($"a"),
        Row(2)
      )
    }
  }
{code}


> String literals are not escaped while performing partition pruning at Hive metastore level
> ------------------------------------------------------------------------------------------
>
>                 Key: SPARK-19912
>                 URL: https://issues.apache.org/jira/browse/SPARK-19912
>             Project: Spark
>          Issue Type: Bug
>          Components: SQL
>    Affects Versions: 2.1.1, 2.2.0
>            Reporter: Cheng Lian
>              Labels: correctness
>
> {{Shim_v0_13.convertFilters()}} doesn't escape string literals when generating Hive-style partition predicates.
> The following SQL-injection-like test case illustrates this issue:
> {code}
>   test("SPARK-19912") {
>     withTable("spark_19912") {
>       Seq(
>         (1, "p1", "q1"),
>         (2, "p1\" and q = \"q1", "q2")
>       ).toDF("a", "p", "q").write.partitionBy("p", "q").saveAsTable("spark_19912")
>       checkAnswer(
>         spark.table("spark_19912").filter($"p" === "p1\" and q = \"q1").select($"a"),
>         Row(2)
>       )
>     }
>   }
> {code}
> The above test case fails like this:
> {noformat}
> [info] - spark_19912 *** FAILED *** (13 seconds, 74 milliseconds)
> [info]   Results do not match for query:
> [info]   Timezone: 
> sun.util.calendar.ZoneInfo[id="America/Los_Angeles",offset=-28800000,dstSavings=3600000,useDaylight=true,transitions=185,lastRule=java.util.SimpleTimeZone[id=America/Los_Angeles,offset=-28800000,dstSavings=3600000,useDaylight=true,startYear=0,startMode=3,startMonth=2,startDay=8,startDayOfWeek=1,startTime=7200000,startTimeMode=0,endMode=3,endMonth=10,endDay=1,endDayOfWeek=1,endTime=7200000,endTimeMode=0]]
> [info]   Timezone Env:
> [info]
> [info]   == Parsed Logical Plan ==
> [info]   'Project [unresolvedalias('a, None)]
> [info]   +- Filter (p#27 = p1" and q = "q1)
> [info]      +- SubqueryAlias spark_19912
> [info]         +- Relation[a#26,p#27,q#28] parquet
> [info]
> [info]   == Analyzed Logical Plan ==
> [info]   a: int
> [info]   Project [a#26]
> [info]   +- Filter (p#27 = p1" and q = "q1)
> [info]      +- SubqueryAlias spark_19912
> [info]         +- Relation[a#26,p#27,q#28] parquet
> [info]
> [info]   == Optimized Logical Plan ==
> [info]   Project [a#26]
> [info]   +- Filter (isnotnull(p#27) && (p#27 = p1" and q = "q1))
> [info]      +- Relation[a#26,p#27,q#28] parquet
> [info]
> [info]   == Physical Plan ==
> [info]   *Project [a#26]
> [info]   +- *FileScan parquet default.spark_19912[a#26,p#27,q#28] Batched: true, Format: Parquet, Location: PrunedInMemoryFileIndex[], PartitionCount: 0, PartitionFilters: [isnotnull(p#27), (p#27 = p1" and q = "q1)], PushedFilters: [], ReadSchema: struct<a:int>
> [info]   == Results ==
> [info]
> [info]   == Results ==
> [info]   !== Correct Answer - 1 ==   == Spark Answer - 0 ==
> [info]    struct<>                   struct<>
> [info]   ![2]
> {noformat}



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)

---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org
