[ 
https://issues.apache.org/jira/browse/HDFS-1169?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Johnny Boy updated HDFS-1169:
-----------------------------

    Attachment: thriftfs.jar

Attached a HadoopThriftServer build with the encoding changed for reading and writing.

> Can't read binary data off HDFS via thrift API
> ----------------------------------------------
>
>                 Key: HDFS-1169
>                 URL: https://issues.apache.org/jira/browse/HDFS-1169
>             Project: Hadoop HDFS
>          Issue Type: Bug
>    Affects Versions: 0.20.2
>            Reporter: Erik Forsberg
>         Attachments: HadoopThriftServer.java, hadoopfs.thrift, thriftfs.jar
>
>
> When I access binary data stored in HDFS (in my case, TypedByte files
> generated by Dumbo) via Thrift, talking to
> org.apache.hadoop.thriftfs.HadoopThriftServer, the data I get back is
> mangled. For example, when I read a file that contains the byte 0xa2, it
> comes back as 0xef 0xbf 0xbd, the UTF-8 encoding of the Unicode replacement
> character (U+FFFD).
> I think this is because the read method in HadoopThriftServer.java converts
> the bytes read from HDFS into a string via the String() constructor, which
> treats them as UTF-8 text.
> This essentially makes the HDFS Thrift API useless for me :-(.
> I am not an expert on Thrift, but would it be possible to modify the API so
> that it uses the binary type listed on
> http://wiki.apache.org/thrift/ThriftTypes?
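
The mangling described in the issue can be reproduced outside Hadoop with a minimal Java sketch. It assumes the server builds a String from the raw bytes using UTF-8, as the reporter suspects. The class and method names below are hypothetical and are not taken from HadoopThriftServer.java. The ISO-8859-1 variant is shown only for contrast; whether the attached jar uses that encoding is not stated in this message.

```java
import java.nio.charset.StandardCharsets;

public class BinaryRoundTrip {
    // Hypothetical helper: decode raw bytes as UTF-8 text, then encode them
    // back. Invalid UTF-8 sequences are replaced with U+FFFD on decoding,
    // so arbitrary binary data does not survive.
    static byte[] viaUtf8String(byte[] raw) {
        String s = new String(raw, StandardCharsets.UTF_8);
        return s.getBytes(StandardCharsets.UTF_8);
    }

    // Hypothetical helper: the same round trip through ISO-8859-1, which
    // maps every byte value 0x00..0xff to exactly one character and back.
    static byte[] viaLatin1String(byte[] raw) {
        String s = new String(raw, StandardCharsets.ISO_8859_1);
        return s.getBytes(StandardCharsets.ISO_8859_1);
    }

    // Format bytes as space-separated hex, e.g. "0xef 0xbf 0xbd".
    static String hex(byte[] data) {
        StringBuilder sb = new StringBuilder();
        for (byte b : data) {
            if (sb.length() > 0) {
                sb.append(' ');
            }
            sb.append(String.format("0x%02x", b & 0xff));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        byte[] raw = { (byte) 0xa2 };
        System.out.println("original:       " + hex(raw));
        System.out.println("via UTF-8:      " + hex(viaUtf8String(raw)));
        System.out.println("via ISO-8859-1: " + hex(viaLatin1String(raw)));
    }
}
```

A lone 0xa2 is not a valid UTF-8 sequence, so the UTF-8 round trip yields 0xef 0xbf 0xbd, while the ISO-8859-1 round trip returns 0xa2 unchanged. A single-byte charset is only a workaround; carrying the data as raw bytes, for instance with Thrift's binary type as the reporter suggests, avoids the string conversion altogether.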

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira
