[ 
https://issues.apache.org/jira/browse/HDFS-1169?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12903118#action_12903118
 ] 

Stuart Smith commented on HDFS-1169:
------------------------------------

I think I know enough to make this change and do some unit testing, but I need 
a little Java guidance (on building everything).

Mainly, I need help on compiling the hadoopthriftapi.jar file from the gen-java 
files.

I actually really need this for my own uses.

My first take, outlined below, starts by just converting the existing 
read/write methods to use binary (vs. adding new methods). This way I don't 
have to worry about making sure the correct read/write methods are called in 
the initial version.

I re-generated the Thrift Java files with a new Thrift interface that 
reads/writes in binary.

- Note that binary data is converted to UTF-8 on write as well as on read, so 
if you just update the Thrift client to write binary, the server will add 
Unicode escape characters before the data is even saved to HDFS.
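
To make the mangling concrete, here is a minimal standalone sketch (not the 
actual HadoopThriftServer code) that pushes the byte 0xa2 from the issue 
description through a String decoded as UTF-8 and back. The class and method 
names are made up for illustration, and it assumes the server does something 
equivalent to new String(bytes, "UTF-8") on the way through.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

// Hypothetical demo class; none of these names exist in Hadoop.
class Utf8RoundTripDemo {
    // Lossy path: raw bytes -> String (decoded as UTF-8) -> bytes again.
    // Bytes that are not valid UTF-8 become U+FFFD during decoding.
    static byte[] viaString(byte[] raw) {
        String decoded = new String(raw, StandardCharsets.UTF_8);
        return decoded.getBytes(StandardCharsets.UTF_8);
    }

    // Binary-safe path: the bytes are copied through untouched.
    static byte[] viaBytes(byte[] raw) {
        return Arrays.copyOf(raw, raw.length);
    }

    // Formats a byte array as space-separated lowercase hex.
    static String hex(byte[] data) {
        StringBuilder sb = new StringBuilder();
        for (byte b : data) {
            sb.append(String.format("%02x ", b & 0xff));
        }
        return sb.toString().trim();
    }

    public static void main(String[] args) {
        byte[] raw = { (byte) 0xa2 };
        System.out.println("via String: " + hex(viaString(raw))); // ef bf bd
        System.out.println("via bytes:  " + hex(viaBytes(raw)));  // a2
    }
}
```

Since the conversion is lossy in both directions, both the read and the write 
path have to carry raw bytes for binary files to survive.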

The code in:

hadoop-0.20.2/src/contrib/thriftfs/src/java/org/apache/hadoop/thriftfs/HadoopThriftServer.java

is straightforward as well.

However, this implements the interface defined in:

org.apache.hadoop.thriftfs.api.ThriftHadoopFileSystem.Iface

And even though I update the source in:

hadoop-0.20.2/src/contrib/thriftfs/gen-java

I get an error about overriding the read/write methods incorrectly, so it 
appears to be pulling the definition of

org.apache.hadoop.thriftfs.api.ThriftHadoopFileSystem.Iface

from hadoopthriftapi.jar (which makes sense).
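
One way to confirm where a class definition is coming from is to ask the JVM 
for its code source. The following is only a sketch using standard JDK calls; 
the class name WhereIsClass is invented for this example, and it reports what 
a running JVM loads from its classpath, which is an assumption about (not a 
guarantee of) what the compiler in the build sees.

```java
import java.security.CodeSource;

// Hypothetical helper; not part of Hadoop or Thrift.
class WhereIsClass {
    // Returns the jar or directory a class was loaded from, or a placeholder
    // when the JVM does not report one (e.g. for core JDK classes).
    static String locationOf(Class<?> type) {
        CodeSource source = type.getProtectionDomain().getCodeSource();
        if (source == null || source.getLocation() == null) {
            return "(no code source reported)";
        }
        return source.getLocation().toString();
    }

    public static void main(String[] args) throws ClassNotFoundException {
        // Pass a fully qualified class name; defaults to a JDK class.
        String name = args.length > 0 ? args[0] : "java.lang.String";
        System.out.println(name + " -> " + locationOf(Class.forName(name)));
    }
}
```

Run with the same classpath the build uses and pass the binary name of the 
nested interface (with a $ before Iface) as the argument; if the printed 
location is hadoopthriftapi.jar, the regenerated gen-java sources are not the 
ones being picked up.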

However, I don't know how to rebuild hadoopthriftapi.jar.

I'll attach the thrift file and the HadoopThriftServer.java file a little 
later, but I just wanted to get this comment up - maybe someone can give me 
simple instructions on how to build hadoopthriftapi.jar from the gen-java files?


> Can't read binary data off HDFS via thrift API
> ----------------------------------------------
>
>                 Key: HDFS-1169
>                 URL: https://issues.apache.org/jira/browse/HDFS-1169
>             Project: Hadoop HDFS
>          Issue Type: Bug
>          Components: contrib/thriftfs
>    Affects Versions: 0.20.2
>            Reporter: Erik Forsberg
>
> Trying to access binary data stored in HDFS (in my case, TypedByte files 
> generated by Dumbo) via thrift talking to 
> org.apache.hadoop.thriftfs.HadoopThriftServer, the data I get back is 
> mangled. For example, when I read a file which contains the value 0xa2, it's 
> coming back as 0xef 0xbf 0xbd, also known as the Unicode replacement 
> character.
> I think this is because the read method in HadoopThriftServer.java is trying 
> to convert the data read from HDFS into UTF-8 via the String() constructor. 
> This essentially makes the HDFS thrift API useless for me :-(.
> Not being an expert on Thrift, but would it be possible to modify the API so 
> that it uses the binary type listed on 
> http://wiki.apache.org/thrift/ThriftTypes?

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
