[jira] [Commented] (MAPREDUCE-232) TextInputFormat should support character encoding settings

Luke Lu (Commented) (JIRA) Tue, 10 Jan 2012 13:40:06 -0800

    [ https://issues.apache.org/jira/browse/MAPREDUCE-232?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13183601#comment-13183601 ]


Luke Lu commented on MAPREDUCE-232:
-----------------------------------

Upon closer look, it seems that this won't work for input in
non-ASCII-compatible encodings (UTF-16/32 etc.), as LineReader assumes that
CR/LF is a single byte. This would work with most CJK charsets/encodings
(*JIS, GB* etc.), but that is probably not sufficient for the patch.
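
To illustrate the point about single-byte line terminators, here is a minimal
sketch in plain Java. It does not use Hadoop's LineReader; splitOnLfByte is a
hypothetical stand-in that, like a byte-oriented reader, treats the single byte
0x0A as the end of a record. The class and method names are assumptions made
for this example only.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

class ByteLineSplitDemo {

    // Hypothetical stand-in for a byte-oriented line reader:
    // every 0x0A byte ends the current record, regardless of encoding.
    static List<byte[]> splitOnLfByte(byte[] data) {
        List<byte[]> records = new ArrayList<>();
        int start = 0;
        for (int i = 0; i < data.length; i++) {
            if (data[i] == 0x0A) {
                records.add(Arrays.copyOfRange(data, start, i));
                start = i + 1;
            }
        }
        if (start < data.length) {
            records.add(Arrays.copyOfRange(data, start, data.length));
        }
        return records;
    }

    // Encode the text, split it on the raw LF byte, then decode each record
    // with the same charset. Malformed byte sequences decode to U+FFFD.
    static List<String> readLines(String text, Charset cs) {
        List<String> lines = new ArrayList<>();
        for (byte[] record : splitOnLfByte(text.getBytes(cs))) {
            lines.add(new String(record, cs));
        }
        return lines;
    }

    public static void main(String[] args) {
        String text = "ab\ncd";
        // ASCII-compatible encoding: LF is the single byte 0x0A, so this splits cleanly.
        System.out.println(readLines(text, StandardCharsets.UTF_8));
        // UTF-16LE: LF is the byte pair 0A 00, so the second record starts one byte off.
        System.out.println(readLines(text, StandardCharsets.UTF_16LE));
        // UTF-16BE: U+010A is encoded as 01 0A, so a record boundary is
        // detected even though the text contains no newline at all.
        System.out.println(readLines("x\u010Ay", StandardCharsets.UTF_16BE).size());
    }
}
```

With UTF-8 the two records come back intact. With UTF-16 the split either
leaves a stray byte at a record boundary or cuts a character in half, so the
decoded records no longer match the input.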
                
> TextInputFormat should support character encoding settings
> ----------------------------------------------------------
>
>                 Key: MAPREDUCE-232
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-232
>             Project: Hadoop Map/Reduce
>          Issue Type: Improvement
>         Environment: Windows XP SP3
>            Reporter: NOMURA Yoshihide
>         Attachments: Hadoop-3481.patch
>
>
> I need to read text files in character encodings other than UTF-8,
> but I think TextInputFormat doesn't support such encodings.
> I suggest that TextInputFormat support an encoding setting like this:
>   conf.set("io.file.defaultEncoding", "MS932");
> I will submit a patch candidate.
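
As a rough sketch of what the proposed setting implies, the plain-Java example
below resolves a charset name from a configuration map and uses it to decode a
record's raw bytes. It is not the attached patch and does not use Hadoop
classes: the Map stands in for the job configuration, the class and method
names are hypothetical, and io.file.defaultEncoding is the property name
proposed in the description, not an existing Hadoop setting. ISO-8859-1 is used
instead of MS932 because every Java runtime is required to provide it.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.util.HashMap;
import java.util.Map;

class EncodingSettingSketch {

    // Property name taken from the proposal; assumed here, not an existing setting.
    static final String ENCODING_KEY = "io.file.defaultEncoding";

    // Resolve the configured charset, falling back to UTF-8 when the key is absent.
    // Charset.forName throws UnsupportedCharsetException for unknown names.
    static Charset resolveCharset(Map<String, String> conf) {
        String name = conf.get(ENCODING_KEY);
        return name == null ? StandardCharsets.UTF_8 : Charset.forName(name);
    }

    // Decode one record's raw bytes using the configured charset.
    static String decodeRecord(byte[] raw, Map<String, String> conf) {
        return new String(raw, resolveCharset(conf));
    }

    public static void main(String[] args) {
        Map<String, String> conf = new HashMap<>();
        conf.put(ENCODING_KEY, "ISO-8859-1");
        // The word "cafe" with an acute accent on the last letter, in ISO-8859-1.
        byte[] raw = {(byte) 0x63, (byte) 0x61, (byte) 0x66, (byte) 0xE9};
        System.out.println(decodeRecord(raw, conf));
    }
}
```

Decoding each record after it has been split only helps when the encoding
represents line terminators as the single bytes the reader looks for.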

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira
