Github user HyukjinKwon commented on a diff in the pull request:

    https://github.com/apache/spark/pull/20355#discussion_r163434524
  
    --- Diff: 
sql/core/src/test/scala/org/apache/spark/sql/FileBasedDataSourceSuite.scala ---
    @@ -78,4 +78,20 @@ class FileBasedDataSourceSuite extends QueryTest with 
SharedSQLContext {
           }
         }
       }
    +
    +  // Separate test case for text-based formats that support multiLine as 
an option.
    +  Seq("json", "csv", "text").foreach { format =>
    --- End diff --
    
    Actually "text" doesn't support `multiLine` but `wholetext` which runs 
another code path. Let's take this out.
    
    How about leaving out the tests for `SPARK-22146` as was, and just adding a 
test dedicated for `multiline` here like:
    
    ```scala
        Seq("json", "csv").foreach { format =>
          test("SPARK-23148 read files containing special characters " +
            s"using $format multiline enabled") {
            withTempDir { dir =>
              val tmpFile = s"$dir/$nameWithSpecialChars"
              spark.createDataset(Seq("a", 
"b")).write.format(format).save(tmpFile)
              val reader = spark.read.format(format).option("multiLine", true)
              val fileContent = reader.load(tmpFile)
              checkAnswer(fileContent, Seq(Row("a"), Row("b")))
            }
          }
        }
    ```
    
    This PR changes really fix the code paths for both CSV / JSON when 
`multiLine` is enabled only .. I am not sure why you don't like this suggestion 
.. one test less invasive and more targeted for one JIRA ... 


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

Reply via email to