Github user HyukjinKwon commented on a diff in the pull request:

    https://github.com/apache/spark/pull/20849#discussion_r175283491
  
    --- Diff: 
sql/core/src/test/scala/org/apache/spark/sql/execution/datasources/json/JsonSuite.scala
 ---
    @@ -2063,4 +2063,178 @@ class JsonSuite extends QueryTest with 
SharedSQLContext with TestJsonData {
           )
         }
       }
    +
    +  def testFile(fileName: String): String = {
    +    
Thread.currentThread().getContextClassLoader.getResource(fileName).toString
    +  }
    +
    +  test("json in UTF-16 with BOM") {
    +    val fileName = "json-tests/utf16WithBOM.json"
    +    val schema = new StructType().add("firstName", 
StringType).add("lastName", StringType)
    +    val jsonDF = spark.read.schema(schema)
    +      // The mode filters null rows produced because new line delimiter
    +      // for UTF-8 is used by default.
    --- End diff --
    
    Also, this is where we need a decision, right? It already does not work 
correctly. Another option for a minimal fix that follows RFC 7159 is to 
document that we don't support other encodings for now, to be clear.
    
    I approved https://github.com/apache/spark/pull/20614 only on the 
assumption that the bug causes an actual issue at some sites and that the 
release was close (which is true now, I guess).
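
    For context, the null rows mentioned in the diff comment come from splitting a non-UTF-8 file on the single-byte UTF-8 newline. A minimal standalone sketch (plain Python, not Spark; the field names are taken from the quoted test, everything else is illustrative) of why that breaks for UTF-16:

    ```python
    import json
    
    # Two JSON records, one per line, as a line-delimited JSON reader expects.
    text = '{"firstName": "A", "lastName": "B"}\n{"firstName": "C", "lastName": "D"}\n'
    
    # Encoded as UTF-16 (with BOM); each '\n' becomes the two bytes 0x0A 0x00.
    data = text.encode("utf-16")
    
    # Splitting on the single UTF-8 newline byte b"\n" cuts each record in the
    # middle of a two-byte code unit: the BOM and stray 0x00 bytes leak into
    # the chunks, so none of them parses as JSON.
    chunks = data.split(b"\n")
    parsed = []
    for chunk in chunks:
        try:
            parsed.append(json.loads(chunk.decode("utf-8")))
        except (UnicodeDecodeError, json.JSONDecodeError):
            parsed.append(None)  # what surfaces as a null row
    
    print(parsed)  # every chunk fails to parse
    ```

    Every entry in `parsed` comes back as `None`, which is the per-record failure mode the test's comment is filtering out.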


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org
