[jira] [Commented] (ARROW-16000) [C++][Dataset] Support Latin-1 encoding

Weston Pace (Jira) Mon, 18 Jul 2022 10:17:07 -0700


    [ 
https://issues.apache.org/jira/browse/ARROW-16000?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17568127#comment-17568127
 ]


Weston Pace commented on ARROW-16000:
-------------------------------------

I agree with Antoine's suggestion of {{CsvFragmentReadOptions}}.  This is 
essentially the same problem we have with compression too, right?  We can 
auto-detect compression based on the file extension, but if the compression 
doesn't match the file extension (or the file extension doesn't indicate 
compression at all), we have no way of wrapping the stream with a 
decompression transform.  It sounds like this solution might solve both 
problems.
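
To make the "wrap the stream" idea concrete, here is a rough sketch in plain 
Python (standard library only, not Arrow's API).  The helper name 
open_csv_stream and its encoding/compression parameters are hypothetical; 
they stand in for per-fragment options that the caller supplies explicitly 
instead of relying on the file extension.

```python
import codecs
import gzip
import io


def open_csv_stream(raw, encoding="utf-8", compression=None):
    """Wrap a binary stream so a UTF-8-only CSV reader can consume it.

    Hypothetical helper: `encoding` and `compression` are explicit,
    caller-supplied options, not values guessed from a file name.
    """
    stream = raw
    if compression == "gzip":
        # Decompress because the caller said so, whatever the extension is.
        stream = gzip.GzipFile(fileobj=stream)
    elif compression is not None:
        raise ValueError(f"unsupported compression: {compression}")

    if codecs.lookup(encoding).name != "utf-8":
        # Bytes read from `stream` are decoded from `encoding` and
        # re-encoded as UTF-8 before being handed to the caller.
        stream = codecs.EncodedFile(
            stream, data_encoding="utf-8", file_encoding=encoding
        )
    return stream


# A gzip-compressed Latin-1 payload with no ".gz" hint in its name.
payload = gzip.compress("name,n\ncafé,1\n".encode("latin-1"))
utf8_bytes = open_csv_stream(
    io.BytesIO(payload), encoding="latin-1", compression="gzip"
).read()
```

The point of the sketch is only that both cases reduce to the same thing: an 
optional chain of stream wrappers applied before the CSV parser sees the 
bytes, driven by options rather than by the file extension.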

> [C++][Dataset] Support Latin-1 encoding
> ---------------------------------------
>
>                 Key: ARROW-16000
>                 URL: https://issues.apache.org/jira/browse/ARROW-16000
>             Project: Apache Arrow
>          Issue Type: Improvement
>          Components: C++
>            Reporter: Nicola Crane
>            Priority: Major
>
> In ARROW-15992 a user is reporting issues with trying to read in files with 
> Latin-1 encoding.  I had a look through the docs for the Dataset API and I 
> don't think this is currently supported.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
