Github user HyukjinKwon commented on a diff in the pull request: https://github.com/apache/spark/pull/20355#discussion_r163434524 --- Diff: sql/core/src/test/scala/org/apache/spark/sql/FileBasedDataSourceSuite.scala --- @@ -78,4 +78,20 @@ class FileBasedDataSourceSuite extends QueryTest with SharedSQLContext { } } } + + // Separate test case for text-based formats that support multiLine as an option. + Seq("json", "csv", "text").foreach { format => --- End diff -- Actually "text" doesn't support `multiLine` but `wholetext` which runs another code path. Let's take this out. How about leaving out the tests for `SPARK-22146` as was, and just adding a test dedicated for `multiline` here like: ```scala Seq("json", "csv").foreach { format => test("SPARK-23148 read files containing special characters " + s"using $format multiline enabled") { withTempDir { dir => val tmpFile = s"$dir/$nameWithSpecialChars" spark.createDataset(Seq("a", "b")).write.format(format).save(tmpFile) val reader = spark.read.format(format).option("multiLine", true) val fileContent = reader.load(tmpFile) checkAnswer(fileContent, Seq(Row("a"), Row("b"))) } } } ``` This PR changes really fix the code paths for both CSV / JSON when `multiLine` is enabled only .. I am not sure why you don't like this suggestion .. one test less invasive and more targeted for one JIRA ...
--- --------------------------------------------------------------------- To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org