[ 
https://issues.apache.org/jira/browse/SPARK-20837?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

bing huang updated SPARK-20837:
-------------------------------
    Description: 
1. If we run the code below against Spark 1.6.x, we get the error message
"Exception in thread "main" java.lang.RuntimeException: [1.44] failure: ``)''
expected but "york" found".
2. If we run the same code against Spark 2.x.x, it succeeds but returns
(1,2,3,4,5). However, per the SQL standard, doubling a single quote inside a
string literal simply escapes it, so the filter should match rows 1-5 and the
result should be (6,7,8,9,10). This can be verified by running the same SQL
over the same data in MySQL Workbench or SQL Server.


The code snippet I used to demonstrate the issue is below:

    import org.apache.spark.{SparkConf, SparkContext}
    import org.apache.spark.sql.{Row, SQLContext}
    import org.apache.spark.sql.types.{DataTypes, StructField, StructType}

    val conf = new SparkConf().setAppName("appName").setMaster("local[3]")
    val sc = new SparkContext(conf)
    val sqlContext = new SQLContext(sc)

    // create test dataset: rows 1-5 contain the quoted 'york', rows 6-10 do not
    val data = (1 to 10).map {
      case t if t <= 5 =>
        Row("New 'york' city", t.toString, "2015-01-01 13:59:59.123",
          2147483647.0, Double.PositiveInfinity)
      case t =>
        Row("New york city", t.toString, "2015-01-02 23:59:59.456",
          1.0, Double.PositiveInfinity)
    }

    // create schema of the test dataset
    val schema = StructType(Array(
      StructField("A1", DataTypes.StringType),
      StructField("A2", DataTypes.StringType),
      StructField("A3", DataTypes.StringType),
      StructField("A4", DataTypes.DoubleType),
      StructField("A5", DataTypes.DoubleType)
    ))
    val rdd = sc.parallelize(data)
    val df = sqlContext.createDataFrame(rdd, schema)
    df.registerTempTable("test")

    // '' inside the literal should escape the single quote, per the SQL standard
    val sqlString = "select A2 from test where A1 not in ('New ''york'' city')"

    sqlContext.sql(sqlString).show(false)
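As an independent cross-check of the SQL-standard behavior the report expects, the same table and query can be run against SQLite (a neutral engine that follows the standard '' escaping rule). This is only a sketch to confirm the expected semantics; the table and column names mirror the Spark example above.

```python
import sqlite3

# In-memory SQLite table mirroring columns A1/A2 of the Spark example.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE test (A1 TEXT, A2 TEXT)")
rows = [("New 'york' city", str(t)) for t in range(1, 6)] + \
       [("New york city", str(t)) for t in range(6, 11)]
conn.executemany("INSERT INTO test VALUES (?, ?)", rows)

# Per the SQL standard, '' inside a string literal denotes a literal ',
# so the NOT IN list contains the single value: New 'york' city.
result = [r[0] for r in conn.execute(
    "SELECT A2 FROM test WHERE A1 NOT IN ('New ''york'' city')")]
print(result)  # rows 6..10, matching the expected (6,7,8,9,10)
```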


> Spark SQL doesn't support escape of single/double quote as SQL standard.
> ------------------------------------------------------------------------
>
>                 Key: SPARK-20837
>                 URL: https://issues.apache.org/jira/browse/SPARK-20837
>             Project: Spark
>          Issue Type: Bug
>          Components: SQL
>    Affects Versions: 1.6.1, 1.6.2, 1.6.3, 2.0.0, 2.0.1, 2.0.2, 2.1.0, 2.1.1
>            Reporter: bing huang
>



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)
