[GitHub] spark pull request #22503: [SPARK-25493][SQL] Use auto-detection for CRLF in...
Github user asfgit closed the pull request at: https://github.com/apache/spark/pull/22503 --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #22503: [SPARK-25493][SQL] Use auto-detection for CRLF in...
Github user HyukjinKwon commented on a diff in the pull request: https://github.com/apache/spark/pull/22503#discussion_r226524439 --- Diff: sql/core/src/test/scala/org/apache/spark/sql/execution/datasources/csv/CSVSuite.scala --- @@ -220,6 +221,17 @@ class CSVSuite extends QueryTest with SharedSQLContext with SQLTestUtils with Te } } + test("crlf line separators in multiline mode") { --- End diff -- nit: -> `SPARK-25493: crlf line separators in multiline mode` when a PR fixes a specific problem, let's add the jira prefix in the test name next time. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #22503: [SPARK-25493][SQL] Use auto-detection for CRLF in...
Github user justinuang commented on a diff in the pull request: https://github.com/apache/spark/pull/22503#discussion_r226386187 --- Diff: sql/core/src/test/resources/test-data/cars-crlf.csv --- @@ -0,0 +1,7 @@ + +year,make,model,comment,blank +"2012","Tesla","S","No comment", + +1997,Ford,E350,"Go get one now they are going fast", +2015,Chevy,Volt + --- End diff -- Yea, if you open it with `vi -b sql/core/src/test/resources/test-data/cars-crlf.csv`, you can see the `^M` characters which according to [this](https://stackoverflow.com/questions/3860519/see-line-breaks-and-carriage-returns-in-editor) represents a CLRF. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #22503: [SPARK-25493][SQL] Use auto-detection for CRLF in...
Github user HyukjinKwon commented on a diff in the pull request: https://github.com/apache/spark/pull/22503#discussion_r226149299 --- Diff: sql/core/src/test/resources/test-data/cars-crlf.csv --- @@ -0,0 +1,7 @@ + +year,make,model,comment,blank +"2012","Tesla","S","No comment", + +1997,Ford,E350,"Go get one now they are going fast", +2015,Chevy,Volt + --- End diff -- just for clarification, this file has newline variant combinations, right? --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #22503: [SPARK-25493][SQL] Use auto-detection for CRLF in...
Github user justinuang commented on a diff in the pull request: https://github.com/apache/spark/pull/22503#discussion_r222053706 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/csv/CSVOptions.scala --- @@ -212,6 +212,8 @@ class CSVOptions( settings.setEmptyValue(emptyValueInRead) settings.setMaxCharsPerColumn(maxCharsPerColumn) settings.setUnescapedQuoteHandling(UnescapedQuoteHandling.STOP_AT_DELIMITER) +settings.setLineSeparatorDetectionEnabled(multiLine) --- End diff -- done --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #22503: [SPARK-25493][SQL] Use auto-detection for CRLF in...
Github user HyukjinKwon commented on a diff in the pull request: https://github.com/apache/spark/pull/22503#discussion_r220410193 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/csv/CSVOptions.scala --- @@ -212,6 +212,8 @@ class CSVOptions( settings.setEmptyValue(emptyValueInRead) settings.setMaxCharsPerColumn(maxCharsPerColumn) settings.setUnescapedQuoteHandling(UnescapedQuoteHandling.STOP_AT_DELIMITER) +settings.setLineSeparatorDetectionEnabled(multiLine) --- End diff -- I would `multiLine == true`. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #22503: [SPARK-25493][SQL] Use auto-detection for CRLF in...
Github user MaxGekk commented on a diff in the pull request: https://github.com/apache/spark/pull/22503#discussion_r220251783 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/csv/CSVOptions.scala --- @@ -212,6 +212,11 @@ class CSVOptions( settings.setEmptyValue(emptyValueInRead) settings.setMaxCharsPerColumn(maxCharsPerColumn) settings.setUnescapedQuoteHandling(UnescapedQuoteHandling.STOP_AT_DELIMITER) + +if (multiLine) { + settings.setLineSeparatorDetectionEnabled(true) --- End diff -- Would be simple just `settings.setLineSeparatorDetectionEnabled(multiLine)` or `settings.setLineSeparatorDetectionEnabled(multiLine == true)`? --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org