[GitHub] spark issue #22503: [SPARK-25493][SQL] Use auto-detection for CRLF in CSV da...
Github user HyukjinKwon commented on the issue: https://github.com/apache/spark/pull/22503 @justinuang, this might affect existing users' applications. Although this matches the behaviour of non-multiline mode, can we explicitly mention it in the migration guide? cc @cloud-fan and @gatorsmile --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #22503: [SPARK-25493][SQL] Use auto-detection for CRLF in CSV da...
Github user HyukjinKwon commented on the issue: https://github.com/apache/spark/pull/22503 Merged to master.
[GitHub] spark issue #22503: [SPARK-25493][SQL] Use auto-detection for CRLF in CSV da...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/22503 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/97496/ Test PASSed.
[GitHub] spark issue #22503: [SPARK-25493][SQL] Use auto-detection for CRLF in CSV da...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/22503 Merged build finished. Test PASSed.
[GitHub] spark issue #22503: [SPARK-25493][SQL] Use auto-detection for CRLF in CSV da...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/22503 **[Test build #97496 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/97496/testReport)** for PR 22503 at commit [`040047b`](https://github.com/apache/spark/commit/040047b696c58496ea3da274fa2c58166d31b100). * This patch passes all tests. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark issue #22503: [SPARK-25493][SQL] Use auto-detection for CRLF in CSV da...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/22503 **[Test build #97496 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/97496/testReport)** for PR 22503 at commit [`040047b`](https://github.com/apache/spark/commit/040047b696c58496ea3da274fa2c58166d31b100).
[GitHub] spark issue #22503: [SPARK-25493][SQL] Use auto-detection for CRLF in CSV da...
Github user justinuang commented on the issue: https://github.com/apache/spark/pull/22503 done!
[GitHub] spark issue #22503: [SPARK-25493][SQL] Use auto-detection for CRLF in CSV da...
Github user HyukjinKwon commented on the issue: https://github.com/apache/spark/pull/22503 @justinuang, okay. Mind rebasing this please?
[GitHub] spark issue #22503: [SPARK-25493][SQL] Use auto-detection for CRLF in CSV da...
Github user justinuang commented on the issue: https://github.com/apache/spark/pull/22503 It looks like Hadoop's LineReader handles CR, LF, and CRLF: https://github.com/apache/hadoop/blob/f90c64e6242facf38c2baedeeda42e4a8293e642/hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/util/LineReader.java#L36 Univocity also handles CR, LF, and CRLF (the logic is a bit convoluted, but it appears to behave the same way: if it sees a CR, it looks for an LF next): https://github.com/uniVocity/univocity-parsers/blob/master/src/main/java/com/univocity/parsers/common/input/LineSeparatorDetector.java I do agree we should expose the `setLineSeparator` option, but regardless of that, the default handling of CR, LF, and CRLF should be the same in single-line and multiline mode.
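The behaviour described above (treat CR, LF, and CRLF each as a single line terminator, and after a CR look ahead for an LF) can be sketched roughly as follows. This is an illustrative stand-alone sketch, not the code from Hadoop's LineReader or Univocity's LineSeparatorDetector; the function name `split_lines` is hypothetical.

```python
from typing import List


def split_lines(text: str) -> List[str]:
    """Split text into lines, treating CR, LF, and CRLF each as one terminator.

    Hypothetical helper for illustration only; it is not taken from Hadoop
    or Univocity.
    """
    lines = []
    start = 0  # index where the current line begins
    i = 0
    n = len(text)
    while i < n:
        ch = text[i]
        if ch == "\n" or ch == "\r":
            lines.append(text[start:i])
            # After a CR, look ahead for an LF so CRLF counts as one terminator.
            if ch == "\r" and i + 1 < n and text[i + 1] == "\n":
                i += 1
            start = i + 1
        i += 1
    # Keep a trailing line that has no terminator.
    if start < n:
        lines.append(text[start:])
    return lines


if __name__ == "__main__":
    # Mixed terminators in one input: CRLF, then CR, then LF.
    print(split_lines("a,b\r\nc,d\re,f\ng,h"))
```

With this rule, a file written with Windows line endings and a file written with Unix line endings yield the same records, which is the consistency between single-line and multiline mode argued for in the comment.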
[GitHub] spark issue #22503: [SPARK-25493][SQL] Use auto-detection for CRLF in CSV da...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/22503 Merged build finished. Test PASSed.
[GitHub] spark issue #22503: [SPARK-25493][SQL] Use auto-detection for CRLF in CSV da...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/22503 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/97359/ Test PASSed.
[GitHub] spark issue #22503: [SPARK-25493][SQL] Use auto-detection for CRLF in CSV da...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/22503 **[Test build #97359 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/97359/testReport)** for PR 22503 at commit [`695f676`](https://github.com/apache/spark/commit/695f6760e239a781ad8fb0b1e428944e73f79563). * This patch passes all tests. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark issue #22503: [SPARK-25493][SQL] Use auto-detection for CRLF in CSV da...
Github user HyukjinKwon commented on the issue: https://github.com/apache/spark/pull/22503 I haven't yet checked exactly what `setLineSeparatorDetectionEnabled` does in the Univocity parser. Is this exactly the same behaviour as when we read via Hadoop's `LineRecordReader`? Also, how does it interact with `setLineSeparator`? Essentially, we should expose that option too.
[GitHub] spark issue #22503: [SPARK-25493][SQL] Use auto-detection for CRLF in CSV da...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/22503 **[Test build #97359 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/97359/testReport)** for PR 22503 at commit [`695f676`](https://github.com/apache/spark/commit/695f6760e239a781ad8fb0b1e428944e73f79563).
[GitHub] spark issue #22503: [SPARK-25493][SQL] Use auto-detection for CRLF in CSV da...
Github user HyukjinKwon commented on the issue: https://github.com/apache/spark/pull/22503 ok to test
[GitHub] spark issue #22503: [SPARK-25493][SQL] Use auto-detection for CRLF in CSV da...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/22503 Can one of the admins verify this patch?
[GitHub] spark issue #22503: [SPARK-25493][SQL] Use auto-detection for CRLF in CSV da...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/22503 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/96865/ Test PASSed.
[GitHub] spark issue #22503: [SPARK-25493][SQL] Use auto-detection for CRLF in CSV da...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/22503 Merged build finished. Test PASSed.
[GitHub] spark issue #22503: [SPARK-25493][SQL] Use auto-detection for CRLF in CSV da...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/22503 **[Test build #96865 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/96865/testReport)** for PR 22503 at commit [`695f676`](https://github.com/apache/spark/commit/695f6760e239a781ad8fb0b1e428944e73f79563). * This patch passes all tests. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark issue #22503: [SPARK-25493][SQL] Use auto-detection for CRLF in CSV da...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/22503 **[Test build #96865 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/96865/testReport)** for PR 22503 at commit [`695f676`](https://github.com/apache/spark/commit/695f6760e239a781ad8fb0b1e428944e73f79563).
[GitHub] spark issue #22503: [SPARK-25493][SQL] Use auto-detection for CRLF in CSV da...
Github user mccheah commented on the issue: https://github.com/apache/spark/pull/22503 @HyukjinKwon is this ready to be merged in, or is there more feedback to be addressed?
[GitHub] spark issue #22503: [SPARK-25493][SQL] Use auto-detection for CRLF in CSV da...
Github user justinuang commented on the issue: https://github.com/apache/spark/pull/22503 What does it take to get this to be merged in?
[GitHub] spark issue #22503: [SPARK-25493][SQL] Use auto-detection for CRLF in CSV da...
Github user justinuang commented on the issue: https://github.com/apache/spark/pull/22503 Sounds good, thanks guys =)
[GitHub] spark issue #22503: [SPARK-25493][SQL] Use auto-detection for CRLF in CSV da...
Github user HyukjinKwon commented on the issue: https://github.com/apache/spark/pull/22503 Seems fine, but I or someone else should take a closer look before getting this in.
[GitHub] spark issue #22503: [SPARK-25493][SQL] Use auto-detection for CRLF in CSV da...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/22503 Merged build finished. Test PASSed.
[GitHub] spark issue #22503: [SPARK-25493][SQL] Use auto-detection for CRLF in CSV da...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/22503 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/96559/ Test PASSed.
[GitHub] spark issue #22503: [SPARK-25493][SQL] Use auto-detection for CRLF in CSV da...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/22503 **[Test build #96559 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/96559/testReport)** for PR 22503 at commit [`812e4c5`](https://github.com/apache/spark/commit/812e4c58adfb73c6f90b08600dc43e4a0bd88921). * This patch passes all tests. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark issue #22503: [SPARK-25493][SQL] Use auto-detection for CRLF in CSV da...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/22503 Merged build finished. Test PASSed.
[GitHub] spark issue #22503: [SPARK-25493][SQL] Use auto-detection for CRLF in CSV da...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/22503 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/96556/ Test PASSed.
[GitHub] spark issue #22503: [SPARK-25493][SQL] Use auto-detection for CRLF in CSV da...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/22503 **[Test build #96556 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/96556/testReport)** for PR 22503 at commit [`67d11f1`](https://github.com/apache/spark/commit/67d11f11201461b7d8f6389c7176c7e50f3fdd7b). * This patch passes all tests. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark issue #22503: [SPARK-25493][SQL] Use auto-detection for CRLF in CSV da...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/22503 **[Test build #96559 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/96559/testReport)** for PR 22503 at commit [`812e4c5`](https://github.com/apache/spark/commit/812e4c58adfb73c6f90b08600dc43e4a0bd88921).
[GitHub] spark issue #22503: [SPARK-25493][SQL] Use auto-detection for CRLF in CSV da...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/22503 **[Test build #96556 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/96556/testReport)** for PR 22503 at commit [`67d11f1`](https://github.com/apache/spark/commit/67d11f11201461b7d8f6389c7176c7e50f3fdd7b).
[GitHub] spark issue #22503: [SPARK-25493][SQL] Use auto-detection for CRLF in CSV da...
Github user HyukjinKwon commented on the issue: https://github.com/apache/spark/pull/22503 Mind explaining what `setLineSeparatorDetectionEnabled` does in the PR description as well?
[GitHub] spark issue #22503: [SPARK-25493][SQL] Use auto-detection for CRLF in CSV da...
Github user HyukjinKwon commented on the issue: https://github.com/apache/spark/pull/22503 retest this please
[GitHub] spark issue #22503: [SPARK-25493][SQL] Use auto-detection for CRLF in CSV da...
Github user justinuang commented on the issue: https://github.com/apache/spark/pull/22503 It looks like a flake? Can someone retrigger it? https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/96511/console
[GitHub] spark issue #22503: [SPARK-25493][SQL] Use auto-detection for CRLF in CSV da...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/22503 **[Test build #96511 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/96511/testReport)** for PR 22503 at commit [`67d11f1`](https://github.com/apache/spark/commit/67d11f11201461b7d8f6389c7176c7e50f3fdd7b).