[GitHub] spark issue #16428: [SPARK-19018][SQL] ADD csv write charset param
Github user cjuexuan commented on the issue: https://github.com/apache/spark/pull/16428 @HyukjinKwon I have updated it in this commit; please review it, thanks. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
Github user HyukjinKwon commented on the issue: https://github.com/apache/spark/pull/16428 Ah, I meant to add a test there in this PR.
Github user cjuexuan commented on the issue: https://github.com/apache/spark/pull/16428 @HyukjinKwon, I already ran `CSVSuite`, and all tests passed.
Github user cjuexuan commented on the issue: https://github.com/apache/spark/pull/16428 @HyukjinKwon, I see. Because my version is `2.0.2`, we use `ByteArrayOutputStream` and call its `toString` method, which uses `Charset.defaultCharset()` and is therefore bound to the environment. In the master branch this is already fixed, so I agree with @srowen: we should just stop hard-coding UTF-8, and users can set the encoding for their writer.
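The pitfall described above — serializing text without an explicit charset, so the result silently depends on the platform default — can be illustrated outside the JVM. This is a minimal Python sketch of the general charset-mismatch problem, not the Spark 2.0.2 code path itself (which goes through Java's `ByteArrayOutputStream.toString()`):

```python
# Encoding Chinese text with an explicit charset vs. relying on an
# implicit/default one. The byte sequences for the same string differ:
text = "中文"
utf8_bytes = text.encode("utf-8")    # b'\xe4\xb8\xad\xe6\x96\x87'
gb_bytes = text.encode("gb18030")    # b'\xd6\xd0\xce\xc4'

# Decoding with the wrong charset does not raise here -- GB18030 can
# decode these bytes -- it just silently produces mojibake:
mojibake = utf8_bytes.decode("gb18030")
assert mojibake != text  # the text is garbled, not recovered
```

This is why a writer whose charset is bound to the environment produces files that look correct on one machine and garbled on another.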
Github user HyukjinKwon commented on the issue: https://github.com/apache/spark/pull/16428 BTW, the reason I asked that in https://github.com/apache/spark/pull/16428#issuecomment-269635303 is that I remember checking the reading/writing paths related to encodings before, and the encoding had to be set on the line record reader. I just double-checked that newlines were `\n` for each batch due to [`TextOutputFormat`'s record writer](https://github.com/apache/hadoop/blob/f67237cbe7bc48a1b9088e990800b37529f1db2a/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-core/src/main/java/org/apache/hadoop/mapreduce/lib/output/TextOutputFormat.java#L48-L49), but it seems this was changed in [a recent commit](https://github.com/apache/spark/pull/16089/files#diff-6a14f6bb643b1474139027d72a17f41aL203). So now it seems the newlines depend on the univocity library. We should definitely add some tests for this in `CSVSuite` to verify the behaviour and prevent regressions. As a small side note, we don't currently support non-ASCII-compatible encodings in the reading path, if I haven't missed any changes there.
Github user srowen commented on the issue: https://github.com/apache/spark/pull/16428 Let's start by just not hard-coding UTF-8. If you're saying the output is correctly rendered as UTF-8 and MS Office won't open it, I'd be really surprised. That would be an Office bug, though.
Github user cjuexuan commented on the issue: https://github.com/apache/spark/pull/16428 @srowen Microsoft Office can't open a UTF-8-encoded CSV file correctly when it contains Chinese text.
Github user cjuexuan commented on the issue: https://github.com/apache/spark/pull/16428 Because if the writer can't set the encoding, we must convert from UTF-8 to GB18030 ourselves for Chinese text, so I think we should provide a setting for it.
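The workaround described above — re-encoding the UTF-8 CSV that Spark writes into GB18030 before handing it to Excel — could be sketched as a small post-processing helper. `reencode_csv` and its file names are illustrative, not part of Spark or of this PR:

```python
# Hypothetical post-processing step: re-encode a UTF-8 CSV to GB18030
# so that Excel on a Chinese-locale Windows machine opens it correctly.
def reencode_csv(src_path: str, dst_path: str,
                 src_enc: str = "utf-8", dst_enc: str = "gb18030") -> None:
    # Read text with the source encoding, write it back out with the
    # destination encoding; newline="" preserves line endings as-is.
    with open(src_path, encoding=src_enc) as src, \
         open(dst_path, "w", encoding=dst_enc, newline="") as dst:
        for line in src:
            dst.write(line)
```

Having the writer accept an encoding option directly, as this PR proposes, would make this extra pass over the output unnecessary.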
Github user srowen commented on the issue: https://github.com/apache/spark/pull/16428 I can understand not hard-coding UTF-8 as the output encoding -- that's the core problem, right? But how about just using the existing encoding parameter to control this? It's conceivable, but pretty obscure, that someone would want to output a different encoding.
Github user HyukjinKwon commented on the issue: https://github.com/apache/spark/pull/16428 Do you mind if I ask whether it writes the line separator correctly in the encoding specified in the option?
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/16428 Can one of the admins verify this patch?