[
https://issues.apache.org/jira/browse/MAPREDUCE-232?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13183601#comment-13183601
]
Luke Lu commented on MAPREDUCE-232:
-----------------------------------
On closer inspection, this won't work for input in encodings that are not
ASCII-compatible (UTF-16, UTF-32, etc.), because LineReader assumes that CR
and LF are each a single byte. It would work with most CJK charsets/encodings
(*JIS, GB*, etc.), but that is probably not sufficient for the patch.
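
To illustrate the single-byte assumption, here is a minimal sketch in plain
Java. The class LfByteScan and the method splitOnLfByte are hypothetical names
made up for this example; the method is a simplified stand-in for a
byte-oriented line scanner, not Hadoop's actual LineReader code. It shows that
splitting on a raw 0x0A byte cuts UTF-16 input in the middle of a code unit,
and can even report a line break where the text has no newline at all.

```java
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class LfByteScan {
    // Hypothetical helper: splits the input on every raw 0x0A byte and drops
    // the delimiter byte, the way a scanner that assumes a single-byte LF would.
    static List<byte[]> splitOnLfByte(byte[] data) {
        List<byte[]> parts = new ArrayList<>();
        int start = 0;
        for (int i = 0; i < data.length; i++) {
            if (data[i] == 0x0A) {
                parts.add(Arrays.copyOfRange(data, start, i));
                start = i + 1;
            }
        }
        parts.add(Arrays.copyOfRange(data, start, data.length));
        return parts;
    }

    public static void main(String[] args) {
        // UTF-8 is ASCII-compatible: LF is the single byte 0x0A.
        byte[] utf8 = "a\nb".getBytes(StandardCharsets.UTF_8);
        // UTF-16BE encodes LF as the two bytes 0x00 0x0A.
        byte[] utf16 = "a\nb".getBytes(StandardCharsets.UTF_16BE);
        // U+010A contains no newline, but encodes as 0x01 0x0A in UTF-16BE.
        byte[] noNewline = "\u010A".getBytes(StandardCharsets.UTF_16BE);

        System.out.println("UTF-8 pieces: " + splitOnLfByte(utf8).size());
        // An odd byte count means the piece ends in the middle of a code unit.
        System.out.println("UTF-16BE first piece length: "
                + splitOnLfByte(utf16).get(0).length);
        System.out.println("Pieces for text without a newline: "
                + splitOnLfByte(noNewline).size());
    }
}
```

In the UTF-16BE case the first piece keeps the 0x00 high byte that belongs to
the LF code unit, so it can no longer be decoded cleanly. Handling such
encodings would presumably require matching the encoded byte sequence of the
line terminator rather than a single byte.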
> TextInputFormat should support character encoding settings
> ----------------------------------------------------------
>
> Key: MAPREDUCE-232
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-232
> Project: Hadoop Map/Reduce
> Issue Type: Improvement
> Environment: Windows XP SP3
> Reporter: NOMURA Yoshihide
> Attachments: Hadoop-3481.patch
>
>
> I need to read text files in character encodings other than UTF-8,
> but I think TextInputFormat doesn't support such encodings.
> I suggest that TextInputFormat support an encoding setting like this:
> conf.set("io.file.defaultEncoding", "MS932");
> I will submit a patch candidate.