[ https://issues.apache.org/jira/browse/HDFS-1169?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12903178#action_12903178 ]
Stuart Smith commented on HDFS-1169:
------------------------------------
Oy, I didn't format the code, sorry:
{noformat}
/**
 * Write to a file. The payload arrives Base64-encoded so that binary
 * data survives being passed as a Thrift string.
 */
public boolean write(ThriftHandle tout, String encodedData)
    throws ThriftIOException {
  try {
    now = now();
    HadoopThriftHandler.LOG.debug("write: " + tout.id);
    FSDataOutputStream out = (FSDataOutputStream) lookup(tout.id);
    // Base64 text is plain ASCII, so getBytes("UTF-8") is lossless here.
    byte[] tmp = new Base64().decode(encodedData.getBytes("UTF-8"));
    out.write(tmp, 0, tmp.length);
    HadoopThriftHandler.LOG.debug("wrote: " + tout.id);
    return true;
  } catch (IOException e) {
    throw new ThriftIOException(e.getMessage());
  }
}
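/**
 * Read from a file. The bytes are returned Base64-encoded so that
 * binary data is not mangled by a byte-to-String conversion.
 */
public String read(ThriftHandle tout, long offset, int length)
    throws ThriftIOException {
  try {
    now = now();
    HadoopThriftHandler.LOG.debug("read: " + tout.id +
                                  " offset: " + offset +
                                  " length: " + length);
    FSDataInputStream in = (FSDataInputStream) lookup(tout.id);
    if (in.getPos() != offset) {
      in.seek(offset);
    }
    byte[] tmp = new byte[length];
    int numbytes = in.read(offset, tmp, 0, length);
    HadoopThriftHandler.LOG.debug("read done: " + tout.id);
    if (numbytes <= 0) {
      // End of file (or a zero-length request): nothing to encode.
      return "";
    }
    // Encode only the bytes actually read, not the whole buffer;
    // a short read would otherwise append trailing zero bytes.
    byte[] data = new byte[numbytes];
    System.arraycopy(tmp, 0, data, 0, numbytes);
    // encode(byte[]) throws no checked exception, so there is no need
    // to catch EncoderException (or to call System.exit on failure).
    return new String(new Base64().encode(data), "UTF-8");
  } catch (IOException e) {
    throw new ThriftIOException(e.getMessage());
  }
}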
{noformat}
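
For anyone who wants to try the idea outside the Thrift server, here is a minimal standalone sketch of the same Base64 round trip. It is not the patched server code: it assumes Java 8+ and uses java.util.Base64 instead of the commons-codec Base64 class, and the class and method names (Base64RoundTrip, encodeForThrift, decodeFromThrift) are made up for illustration.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;
import java.util.Base64;

// Hypothetical helper, not part of HadoopThriftServer.
final class Base64RoundTrip {

    // Encode the first numBytes bytes of raw as Base64 text that is
    // safe to carry in a Thrift string field.
    static String encodeForThrift(byte[] raw, int numBytes) {
        if (numBytes <= 0) {
            return "";
        }
        byte[] data = Arrays.copyOf(raw, numBytes);
        return new String(Base64.getEncoder().encode(data), StandardCharsets.UTF_8);
    }

    // Decode Base64 text received over Thrift back into the raw bytes.
    static byte[] decodeFromThrift(String encoded) {
        return Base64.getDecoder().decode(encoded.getBytes(StandardCharsets.UTF_8));
    }

    public static void main(String[] args) {
        // 0xa2 is the byte from the bug report that came back mangled.
        byte[] raw = {(byte) 0xa2, 0x00, (byte) 0xff};
        String encoded = encodeForThrift(raw, raw.length);
        byte[] decoded = decodeFromThrift(encoded);
        System.out.println("encoded: " + encoded);
        System.out.println("round trip ok: " + Arrays.equals(raw, decoded));
    }
}
```

The client has to apply the matching decode after read and encode before write, so this changes what the string fields mean on the wire even though the Thrift interface itself stays the same.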
> Can't read binary data off HDFS via thrift API
> ----------------------------------------------
>
> Key: HDFS-1169
> URL: https://issues.apache.org/jira/browse/HDFS-1169
> Project: Hadoop HDFS
> Issue Type: Bug
> Components: contrib/thriftfs
> Affects Versions: 0.20.2
> Reporter: Erik Forsberg
> Attachments: hadoopfs.thrift, HadoopThriftServer.java
>
>
> Trying to access binary data stored in HDFS (in my case, TypedByte files
> generated by Dumbo) via thrift talking to
> org.apache.hadoop.thriftfs.HadoopThriftServer, the data I get back is
> mangled. For example, when I read a file which contains the value 0xa2, it's
> coming back as 0xef 0xbf 0xbd, also known as the Unicode replacement
> character.
> I think this is because the read method in HadoopThriftServer.java is trying
> to convert the data read from HDFS into UTF-8 via the String() constructor.
> This essentially makes the HDFS thrift API useless for me :-(.
> I'm not an expert on Thrift, but would it be possible to modify the API so
> that it uses the binary type listed on
> http://wiki.apache.org/thrift/ThriftTypes?
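
The mangling described in the issue can be reproduced in isolation. This is a small sketch, assuming the bytes are decoded and re-encoded as UTF-8 (the server's String constructor uses the platform default charset, which may differ). The class and method names (ManglingDemo, viaString, hex) are made up for illustration.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical demo class, not part of HadoopThriftServer.
final class ManglingDemo {

    // Push raw bytes through a String and back, as happens when binary
    // data is returned in a Thrift string field.
    static byte[] viaString(byte[] raw) {
        String asText = new String(raw, StandardCharsets.UTF_8);
        return asText.getBytes(StandardCharsets.UTF_8);
    }

    // Format bytes as space-separated lowercase hex.
    static String hex(byte[] bytes) {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < bytes.length; i++) {
            if (i > 0) {
                sb.append(' ');
            }
            sb.append(String.format("%02x", bytes[i] & 0xff));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // A lone 0xa2 is not valid UTF-8, so the decoder substitutes
        // U+FFFD, which encodes back to the three bytes ef bf bd.
        byte[] raw = {(byte) 0xa2};
        System.out.println(hex(raw) + " -> " + hex(viaString(raw)));
        // prints "a2 -> ef bf bd"
    }
}
```

Once a byte has been replaced by U+FFFD the original value is gone, so the client cannot undo the damage. The data has to be sent either as a Thrift binary field or in a text-safe encoding such as Base64.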