[GitHub] spark pull request #22503: [SPARK-25493][SQL] Use auto-detection for CRLF in...

2018-10-18 Thread asfgit
Github user asfgit closed the pull request at:

https://github.com/apache/spark/pull/22503


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #22503: [SPARK-25493][SQL] Use auto-detection for CRLF in...

2018-10-18 Thread HyukjinKwon
Github user HyukjinKwon commented on a diff in the pull request:

https://github.com/apache/spark/pull/22503#discussion_r226524439
  
--- Diff: 
sql/core/src/test/scala/org/apache/spark/sql/execution/datasources/csv/CSVSuite.scala
 ---
@@ -220,6 +221,17 @@ class CSVSuite extends QueryTest with SharedSQLContext 
with SQLTestUtils with Te
 }
   }
 
+  test("crlf line separators in multiline mode") {
--- End diff --

nit: -> `SPARK-25493: crlf line separators in multiline mode` 

when a PR fixes a specific problem, let's add the jira prefix in the test 
name next time.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #22503: [SPARK-25493][SQL] Use auto-detection for CRLF in...

2018-10-18 Thread justinuang
Github user justinuang commented on a diff in the pull request:

https://github.com/apache/spark/pull/22503#discussion_r226386187
  
--- Diff: sql/core/src/test/resources/test-data/cars-crlf.csv ---
@@ -0,0 +1,7 @@
+
+year,make,model,comment,blank
+"2012","Tesla","S","No comment",
+
+1997,Ford,E350,"Go get one now they are going fast",
+2015,Chevy,Volt
+
--- End diff --

Yea, if you open it with `vi -b 
sql/core/src/test/resources/test-data/cars-crlf.csv`, you can see the `^M` 
characters which according to 
[this](https://stackoverflow.com/questions/3860519/see-line-breaks-and-carriage-returns-in-editor)
 represents a CLRF.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #22503: [SPARK-25493][SQL] Use auto-detection for CRLF in...

2018-10-17 Thread HyukjinKwon
Github user HyukjinKwon commented on a diff in the pull request:

https://github.com/apache/spark/pull/22503#discussion_r226149299
  
--- Diff: sql/core/src/test/resources/test-data/cars-crlf.csv ---
@@ -0,0 +1,7 @@
+
+year,make,model,comment,blank
+"2012","Tesla","S","No comment",
+
+1997,Ford,E350,"Go get one now they are going fast",
+2015,Chevy,Volt
+
--- End diff --

just for clarification, this file has newline variant combinations, right?


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #22503: [SPARK-25493][SQL] Use auto-detection for CRLF in...

2018-10-02 Thread justinuang
Github user justinuang commented on a diff in the pull request:

https://github.com/apache/spark/pull/22503#discussion_r222053706
  
--- Diff: 
sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/csv/CSVOptions.scala
 ---
@@ -212,6 +212,8 @@ class CSVOptions(
 settings.setEmptyValue(emptyValueInRead)
 settings.setMaxCharsPerColumn(maxCharsPerColumn)
 
settings.setUnescapedQuoteHandling(UnescapedQuoteHandling.STOP_AT_DELIMITER)
+settings.setLineSeparatorDetectionEnabled(multiLine)
--- End diff --

done


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #22503: [SPARK-25493][SQL] Use auto-detection for CRLF in...

2018-09-25 Thread HyukjinKwon
Github user HyukjinKwon commented on a diff in the pull request:

https://github.com/apache/spark/pull/22503#discussion_r220410193
  
--- Diff: 
sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/csv/CSVOptions.scala
 ---
@@ -212,6 +212,8 @@ class CSVOptions(
 settings.setEmptyValue(emptyValueInRead)
 settings.setMaxCharsPerColumn(maxCharsPerColumn)
 
settings.setUnescapedQuoteHandling(UnescapedQuoteHandling.STOP_AT_DELIMITER)
+settings.setLineSeparatorDetectionEnabled(multiLine)
--- End diff --

I would `multiLine == true`.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #22503: [SPARK-25493][SQL] Use auto-detection for CRLF in...

2018-09-25 Thread MaxGekk
Github user MaxGekk commented on a diff in the pull request:

https://github.com/apache/spark/pull/22503#discussion_r220251783
  
--- Diff: 
sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/csv/CSVOptions.scala
 ---
@@ -212,6 +212,11 @@ class CSVOptions(
 settings.setEmptyValue(emptyValueInRead)
 settings.setMaxCharsPerColumn(maxCharsPerColumn)
 
settings.setUnescapedQuoteHandling(UnescapedQuoteHandling.STOP_AT_DELIMITER)
+
+if (multiLine) {
+  settings.setLineSeparatorDetectionEnabled(true)
--- End diff --

Would be simple just `settings.setLineSeparatorDetectionEnabled(multiLine)` 
or `settings.setLineSeparatorDetectionEnabled(multiLine == true)`?


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org