[GitHub] spark issue #16428: [SPARK-19018][SQL] ADD csv write charset param

2016-12-31 Thread cjuexuan
Github user cjuexuan commented on the issue:

https://github.com/apache/spark/pull/16428
  
@HyukjinKwon I have made the changes in this commit; please review it. Thanks.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #16428: [SPARK-19018][SQL] ADD csv write charset param

2016-12-30 Thread HyukjinKwon
Github user HyukjinKwon commented on the issue:

https://github.com/apache/spark/pull/16428
  
Ah, I meant to add a test there in this PR.



[GitHub] spark issue #16428: [SPARK-19018][SQL] ADD csv write charset param

2016-12-30 Thread cjuexuan
Github user cjuexuan commented on the issue:

https://github.com/apache/spark/pull/16428
  
@HyukjinKwon, I have already run `CSVSuite`, and all tests passed.



[GitHub] spark issue #16428: [SPARK-19018][SQL] ADD csv write charset param

2016-12-30 Thread cjuexuan
Github user cjuexuan commented on the issue:

https://github.com/apache/spark/pull/16428
  
@HyukjinKwon, I see. Because my version is `2.0.2`, we use 
`ByteArrayOutputStream` and call its `toString` method, which uses 
`Charset.defaultCharset()` and therefore depends on the environment. This is 
already fixed in the master branch, so I agree with @srowen: we should simply 
stop hard-coding UTF-8, and users can then set the writer's encoding themselves.
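To illustrate the environment dependence described above, here is a minimal Java sketch (not Spark's code; the class and method names are hypothetical) contrasting `ByteArrayOutputStream.toString()`, which decodes with the platform default charset, with an explicit decode using the charset the bytes were written in.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.OutputStreamWriter;
import java.io.UncheckedIOException;
import java.io.Writer;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

// Hypothetical demo class; this is not Spark's code.
class DefaultCharsetDemo {

    // Writes text into an in-memory buffer using an explicit charset.
    static ByteArrayOutputStream writeWith(String text, Charset charset) {
        ByteArrayOutputStream buffer = new ByteArrayOutputStream();
        try (Writer writer = new OutputStreamWriter(buffer, charset)) {
            writer.write(text);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return buffer;
    }

    // Environment-dependent: ByteArrayOutputStream.toString() decodes with
    // the platform default charset (Charset.defaultCharset()).
    static String decodeWithDefault(ByteArrayOutputStream buffer) {
        return buffer.toString();
    }

    // Deterministic: decodes with the charset the bytes were written in.
    static String decodeExplicit(ByteArrayOutputStream buffer, Charset charset) {
        return new String(buffer.toByteArray(), charset);
    }

    public static void main(String[] args) {
        // Two Chinese characters, U+4E2D and U+6587, written as Unicode escapes.
        String row = "id,name\n1,\u4e2d\u6587\n";
        ByteArrayOutputStream buffer = writeWith(row, StandardCharsets.UTF_8);
        System.out.println("default charset: " + Charset.defaultCharset());
        // True only when the default charset decodes these bytes the same way as UTF-8.
        System.out.println("default decode matches: " + decodeWithDefault(buffer).equals(row));
        // True regardless of the JVM's default charset.
        System.out.println("explicit decode matches: "
                + decodeExplicit(buffer, StandardCharsets.UTF_8).equals(row));
    }
}
```

On JDK 18 and later the default charset is UTF-8 unless overridden, so the mismatch is most likely to appear on older JVMs or when `file.encoding` is set to something else.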



[GitHub] spark issue #16428: [SPARK-19018][SQL] ADD csv write charset param

2016-12-30 Thread HyukjinKwon
Github user HyukjinKwon commented on the issue:

https://github.com/apache/spark/pull/16428
  
BTW, the reason I asked that in 
https://github.com/apache/spark/pull/16428#issuecomment-269635303 is that I 
remember checking the encoding-related reading/writing paths before, and the 
encoding should be set on the line record reader.

I just double-checked that newlines were `\n` for each batch due to 
[`TextOutputFormat`'s record 
writer](https://github.com/apache/hadoop/blob/f67237cbe7bc48a1b9088e990800b37529f1db2a/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-core/src/main/java/org/apache/hadoop/mapreduce/lib/output/TextOutputFormat.java#L48-L49), 
but it seems this was changed in [the recent 
commit](https://github.com/apache/spark/pull/16089/files#diff-6a14f6bb643b1474139027d72a17f41aL203). 
So now the newlines seem to depend on the univocity library.

We should definitely add some tests for this in `CSVSuite` to verify this 
behaviour and prevent regressions.

As a small side note, unless I have missed some changes there, we don't 
currently support non-ASCII-compatible encodings in the reading path.
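The note about non-ASCII-compatible encodings can be made concrete with a small heuristic. In this Java sketch (a hypothetical helper, not a Spark or JDK API), a charset counts as ASCII-compatible if it encodes `\n`, `\r` and the printable ASCII range to exactly the same bytes as US-ASCII; a byte-oriented line reader that splits on `0x0A` can only work for such charsets.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

// Hypothetical helper; a heuristic check, not an exhaustive compatibility test.
class AsciiCompatibility {

    // Builds a sample containing \n, \r and every printable ASCII character.
    private static String asciiSample() {
        StringBuilder sb = new StringBuilder("\n\r");
        for (char c = 0x20; c < 0x7F; c++) {
            sb.append(c);
        }
        return sb.toString();
    }

    // True if the charset encodes the sample byte-for-byte like US-ASCII.
    static boolean isAsciiCompatible(Charset charset) {
        String sample = asciiSample();
        byte[] expected = sample.getBytes(StandardCharsets.US_ASCII);
        byte[] actual = sample.getBytes(charset);
        return Arrays.equals(expected, actual);
    }

    public static void main(String[] args) {
        String[] names = {"UTF-8", "GB18030", "ISO-8859-1", "UTF-16LE", "UTF-16"};
        for (String name : names) {
            if (!Charset.isSupported(name)) {
                System.out.println(name + ": not supported by this JVM, skipped");
                continue;
            }
            System.out.println(name + " ASCII-compatible: "
                    + isAsciiCompatible(Charset.forName(name)));
        }
    }
}
```

Under this check UTF-8 and GB18030 pass, while UTF-16 variants fail because every ASCII character (including the newline) becomes two bytes.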



[GitHub] spark issue #16428: [SPARK-19018][SQL] ADD csv write charset param

2016-12-30 Thread srowen
Github user srowen commented on the issue:

https://github.com/apache/spark/pull/16428
  
Let's start by just not hard-coding UTF-8. If you're saying that the 
output is correctly written as UTF-8 and MS Office doesn't open it, I'd be 
really surprised. That would be an Office bug, though.



[GitHub] spark issue #16428: [SPARK-19018][SQL] ADD csv write charset param

2016-12-30 Thread cjuexuan
Github user cjuexuan commented on the issue:

https://github.com/apache/spark/pull/16428
  
@srowen Microsoft Office can't open a UTF-8-encoded CSV file correctly 
when it contains Chinese text.



[GitHub] spark issue #16428: [SPARK-19018][SQL] ADD csv write charset param

2016-12-30 Thread cjuexuan
Github user cjuexuan commented on the issue:

https://github.com/apache/spark/pull/16428
  
Because if the writer can't set the encoding, we have to convert UTF-8 to 
GB18030 manually for Chinese text, so I think we should provide a setting for it.
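The manual conversion described above amounts to decoding the bytes as UTF-8 and re-encoding the text as GB18030. A minimal Java sketch, assuming the running JDK supports the `GB18030` charset (check with `Charset.isSupported("GB18030")`); the class name is hypothetical:

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

// Hypothetical helper illustrating a manual UTF-8 -> GB18030 conversion.
class Gb18030Transcode {

    // Decodes UTF-8 bytes into a String, then re-encodes it as GB18030.
    static byte[] utf8ToGb18030(byte[] utf8Bytes) {
        String text = new String(utf8Bytes, StandardCharsets.UTF_8);
        return text.getBytes(Charset.forName("GB18030"));
    }

    // Formats bytes as space-separated uppercase hex for inspection.
    static String hex(byte[] bytes) {
        StringBuilder sb = new StringBuilder();
        for (byte b : bytes) {
            if (sb.length() > 0) {
                sb.append(' ');
            }
            sb.append(String.format("%02X", b & 0xFF));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // Two Chinese characters, U+4E2D and U+6587, written as Unicode escapes.
        String chinese = "\u4e2d\u6587";
        byte[] utf8 = chinese.getBytes(StandardCharsets.UTF_8);
        System.out.println("UTF-8:   " + hex(utf8));                // E4 B8 AD E6 96 87
        System.out.println("GB18030: " + hex(utf8ToGb18030(utf8))); // D6 D0 CE C4
    }
}
```

The same two characters take six bytes in UTF-8 and four in GB18030, which is why a consumer expecting one encoding misreads the other.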



[GitHub] spark issue #16428: [SPARK-19018][SQL] ADD csv write charset param

2016-12-30 Thread srowen
Github user srowen commented on the issue:

https://github.com/apache/spark/pull/16428
  
I can understand not hard-coding UTF-8 as the output encoding; that's 
the core problem, right? But how about just using the existing encoding 
parameter to control this? It's conceivable, but pretty obscure, that someone 
would want to output a different encoding.
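A minimal Java sketch of the suggestion above: resolve the output charset from a single `encoding` option that defaults to UTF-8, and encode both the rows and the line separator with it. This is not Spark's CSV writer; the class, the plain `Map` of options and the lookup logic are assumptions for illustration.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.OutputStreamWriter;
import java.io.UncheckedIOException;
import java.io.Writer;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;
import java.util.Map;

// Hypothetical sketch; not Spark's CSV writer or its options class.
class EncodingOptionWriter {

    // Resolves the output charset from an "encoding" option, defaulting to UTF-8
    // rather than hard-coding it. The key name is an assumption for illustration.
    static Charset resolveCharset(Map<String, String> options) {
        return Charset.forName(options.getOrDefault("encoding", StandardCharsets.UTF_8.name()));
    }

    // Writes pre-formatted CSV rows, encoding both the rows and the '\n'
    // separator in the resolved charset.
    static byte[] writeRows(List<String> rows, Map<String, String> options) {
        ByteArrayOutputStream buffer = new ByteArrayOutputStream();
        try (Writer writer = new OutputStreamWriter(buffer, resolveCharset(options))) {
            for (String row : rows) {
                writer.write(row);
                writer.write('\n');
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return buffer.toByteArray();
    }

    public static void main(String[] args) {
        List<String> rows = Arrays.asList("id,name", "1,\u4e2d\u6587");
        Map<String, String> options = Collections.singletonMap("encoding", "UTF-16LE");
        byte[] bytes = writeRows(rows, options);
        // Round trip: decoding with the same charset recovers the original text.
        String decoded = new String(bytes, resolveCharset(options));
        System.out.println("round trip ok: " + decoded.equals("id,name\n1,\u4e2d\u6587\n"));
    }
}
```

A write-then-read round trip with a non-default encoding, as in `main`, is the shape of regression check discussed for `CSVSuite`, though a real test there would go through the DataFrame reader and writer rather than a hand-rolled writer.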



[GitHub] spark issue #16428: [SPARK-19018][SQL] ADD csv write charset param

2016-12-29 Thread HyukjinKwon
Github user HyukjinKwon commented on the issue:

https://github.com/apache/spark/pull/16428
  
Do you mind if I ask whether it writes the line separator correctly in the 
encoding specified in the option?
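One way to check the line-separator question is to compare `\n` encoded in the target charset with a hard-coded `0x0A` byte, such as a UTF-8 record writer would emit. In this hypothetical Java sketch they match for UTF-8 but not for UTF-16, so a writer that always emits a raw `0x0A` would corrupt UTF-16 output.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

// Hypothetical check: does "\n" encoded in the target charset equal a raw 0x0A byte?
class LineSeparatorCheck {

    static final byte[] RAW_LF = {0x0A};

    // Encodes the line separator in the given charset.
    static byte[] separatorBytes(Charset charset) {
        return "\n".getBytes(charset);
    }

    // True if writing a hard-coded 0x0A byte yields the correct separator.
    static boolean rawLfIsCorrect(Charset charset) {
        return Arrays.equals(separatorBytes(charset), RAW_LF);
    }

    public static void main(String[] args) {
        Charset[] charsets = {
            StandardCharsets.UTF_8, StandardCharsets.UTF_16LE, StandardCharsets.UTF_16BE
        };
        for (Charset cs : charsets) {
            System.out.println(cs + " -> " + Arrays.toString(separatorBytes(cs))
                    + ", raw LF ok: " + rawLfIsCorrect(cs));
        }
    }
}
```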



[GitHub] spark issue #16428: [SPARK-19018][SQL] ADD csv write charset param

2016-12-28 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/16428
  
Can one of the admins verify this patch?

