[jira] Updated: (HADOOP-3481) TextInputFormat should support character encoding settings

NOMURA Yoshihide (JIRA) Thu, 16 Oct 2008 21:55:38 -0700

     [ https://issues.apache.org/jira/browse/HADOOP-3481?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]


NOMURA Yoshihide updated HADOOP-3481:
-------------------------------------

    Attachment: Hadoop-3481.patch

This is an updated patch.

In this patch, the LineReader class reads the encoding setting,
and the Text class decodes the input using the specified charset.
A simple test class is also added.
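
As an illustration only (this is not the code in the attached patch), the
decoding step could be sketched roughly as below. The class and method names
are hypothetical, and the sketch uses only java.nio rather than any Hadoop
class. It assumes the charset name has already been read from the job
configuration, and it converts the line to UTF-8 because that is the encoding
the Text class stores internally.

```java
import java.nio.ByteBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.Charset;
import java.nio.charset.CodingErrorAction;
import java.nio.charset.StandardCharsets;

// Hypothetical helper for illustration; not part of Hadoop or of the patch.
class EncodedLineSketch {

    // Decode raw line bytes with the named charset, rejecting malformed input
    // instead of silently substituting replacement characters.
    static String decode(byte[] raw, String charsetName) {
        Charset cs = Charset.forName(charsetName);
        try {
            return cs.newDecoder()
                    .onMalformedInput(CodingErrorAction.REPORT)
                    .onUnmappableCharacter(CodingErrorAction.REPORT)
                    .decode(ByteBuffer.wrap(raw))
                    .toString();
        } catch (CharacterCodingException e) {
            throw new IllegalArgumentException(
                    "Input is not valid " + charsetName, e);
        }
    }

    // Re-encode the decoded line as UTF-8 bytes.
    static byte[] toUtf8(byte[] raw, String charsetName) {
        return decode(raw, charsetName).getBytes(StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        // In a job the name would come from the configuration (e.g. "MS932").
        // ISO-8859-1 is used here because every JVM is required to support it.
        byte[] raw = {0x63, 0x61, 0x66, (byte) 0xE9}; // "caf" + e-acute
        System.out.println(decode(raw, "ISO-8859-1"));
        System.out.println(toUtf8(raw, "ISO-8859-1").length + " UTF-8 bytes");
    }
}
```

Reporting malformed input is a design choice in this sketch; a real input
format might prefer to replace bad bytes so that one corrupt line does not
fail the whole task.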

I think the LineReader constructor and the readLine() method are always
called on the same thread.
Is that right?


> TextInputFormat should support character encoding settings
> ----------------------------------------------------------
>
>                 Key: HADOOP-3481
>                 URL: https://issues.apache.org/jira/browse/HADOOP-3481
>             Project: Hadoop Core
>          Issue Type: Improvement
>          Components: mapred
>    Affects Versions: 0.17.0
>         Environment: Windows XP SP3
>            Reporter: NOMURA Yoshihide
>         Attachments: Hadoop-3481.patch
>
>
> I need to read text files in character encodings other than UTF-8,
> but I think TextInputFormat doesn't support such encodings.
> I suggest that TextInputFormat support an encoding setting like this:
>   conf.set("io.file.defaultEncoding", "MS932");
> I will submit a patch candidate.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
