[ https://issues.apache.org/jira/browse/MAPREDUCE-232?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13183601#comment-13183601 ]
Luke Lu commented on MAPREDUCE-232:
-----------------------------------

Upon closer look, it seems that this won't work for input in non-ASCII-compatible encodings (UTF-16/32 etc.), as LineReader assumes that CR and LF are each a single byte. This would work with most CJK charsets/encodings (*JIS, GB* etc.), but that is probably not sufficient for the patch.

> TextInputFormat should support character encoding settings
> ----------------------------------------------------------
>
>                 Key: MAPREDUCE-232
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-232
>             Project: Hadoop Map/Reduce
>          Issue Type: Improvement
>         Environment: Windows XP SP3
>            Reporter: NOMURA Yoshihide
>         Attachments: Hadoop-3481.patch
>
>
> I need to read text files in a character encoding other than UTF-8,
> but I think TextInputFormat doesn't support such encodings.
> I suggest that TextInputFormat support an encoding setting like this:
>   conf.set("io.file.defaultEncoding", "MS932");
> I will submit a patch candidate.
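
To illustrate the single-byte CR/LF concern from the comment, here is a minimal sketch. It is not Hadoop's LineReader; the class ByteNewlineDemo and the methods splitAtLfByte and readLines are hypothetical names invented for this example. It assumes a reader that cuts records at every raw 0x0A byte and only afterwards decodes each record with the configured charset.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical stand-in for a byte-oriented line reader (not Hadoop code).
class ByteNewlineDemo {

    // Cut the raw bytes at every 0x0A byte, without looking at the encoding.
    static List<byte[]> splitAtLfByte(byte[] data) {
        List<byte[]> records = new ArrayList<>();
        int start = 0;
        for (int i = 0; i < data.length; i++) {
            if (data[i] == 0x0A) {
                records.add(Arrays.copyOfRange(data, start, i));
                start = i + 1;
            }
        }
        records.add(Arrays.copyOfRange(data, start, data.length));
        return records;
    }

    // Encode the text, split it at the byte level, then decode each record.
    static List<String> readLines(String text, Charset cs) {
        List<String> lines = new ArrayList<>();
        for (byte[] record : splitAtLfByte(text.getBytes(cs))) {
            lines.add(new String(record, cs));
        }
        return lines;
    }

    public static void main(String[] args) {
        String text = "ab\ncd";

        // ASCII-compatible encoding: LF is the single byte 0x0A.
        System.out.println(readLines(text, StandardCharsets.UTF_8)); // [ab, cd]

        // UTF-16LE encodes LF as 0x0A 0x00, so the cut lands mid-character
        // and the second record is decoded out of alignment.
        System.out.println(readLines(text, StandardCharsets.UTF_16LE));

        // U+0A05 is the bytes 0x0A 0x05 in UTF-16BE: a false line break
        // even though the text contains no newline at all.
        System.out.println(readLines("\u0A05", StandardCharsets.UTF_16BE).size());
    }
}
```

In Shift_JIS-family encodings such as MS932, neither the lead nor the trail byte of a double-byte character can be 0x0A or 0x0D, which is consistent with the remark that a byte-level scan still works for most CJK encodings.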