[ https://issues.apache.org/jira/browse/HDFS-1169?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12903178#action_12903178 ]

Stuart Smith commented on HDFS-1169:
------------------------------------

oy. didn't format the code, sorry:

{noformat}
  /**
   * write to a file
   */
  public boolean write(ThriftHandle tout, String encodedData) throws ThriftIOException {
    try {
      now = now();
      HadoopThriftHandler.LOG.debug("write: " + tout.id);
      FSDataOutputStream out = (FSDataOutputStream)lookup(tout.id);
      Base64 base64 = new Base64();
      byte[] tmp = null;
      tmp = (byte[])base64.decode( (byte[]) encodedData.getBytes("UTF-8") );
      out.write(tmp, 0, tmp.length);
      HadoopThriftHandler.LOG.debug("wrote: " + tout.id);
      return true;
    } catch (IOException e) {
      throw new ThriftIOException(e.getMessage());
    }
  }

  /**
   * read from a file
   */
  public String read(ThriftHandle tout, long offset, int length) throws ThriftIOException {
    try {
      now = now();
      HadoopThriftHandler.LOG.debug("read: " + tout.id +
                                    " offset: " + offset +
                                    " length: " + length);
      FSDataInputStream in = (FSDataInputStream)lookup(tout.id);
      if (in.getPos() != offset) {
        in.seek(offset);
      }
      byte[] tmp = new byte[length];
      int numbytes = in.read(offset, tmp, 0, length);
      HadoopThriftHandler.LOG.debug("read done: " + tout.id);
      try {
        Base64 base64 = new Base64();
        return new String( (byte[])base64.encode( (Object)tmp ), "UTF-8");
      } catch( EncoderException e ) {
        e.printStackTrace();
        System.exit(0);
        return "";
      }
    } catch (IOException e) {
      throw new ThriftIOException(e.getMessage());
    }
  }
{noformat}

> Can't read binary data off HDFS via thrift API
> ----------------------------------------------
>
>                 Key: HDFS-1169
>                 URL: https://issues.apache.org/jira/browse/HDFS-1169
>             Project: Hadoop HDFS
>          Issue Type: Bug
>          Components: contrib/thriftfs
>    Affects Versions: 0.20.2
>            Reporter: Erik Forsberg
>         Attachments: hadoopfs.thrift, HadoopThriftServer.java
>
>
> Trying to access binary data stored in HDFS (in my case, TypedByte files
> generated by Dumbo) via thrift talking to
> org.apache.hadoop.thriftfs.HadoopThriftServer, the data I get back is
> mangled.
> For example, when I read a file which contains the value 0xa2, it's
> coming back as 0xef 0xbf 0xbd, also known as the Unicode replacement
> character.
> I think this is because the read method in HadoopThriftServer.java is trying
> to convert the data read from HDFS into UTF-8 via the String() constructor.
> This essentially makes the HDFS thrift API useless for me :-(.
> Not being an expert on Thrift, but would it be possible to modify the API so
> that it uses the binary type listed on
> http://wiki.apache.org/thrift/ThriftTypes?
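
The mangling described in the report can be reproduced outside of HDFS and Thrift. The following is only a minimal sketch, not code from the attachments: the class and method names are hypothetical, and it uses the JDK's java.util.Base64 (Java 8+) as a stand-in for the commons-codec Base64 class in the comment above. It shows that decoding the byte 0xa2 as UTF-8 and encoding it again yields 0xef 0xbf 0xbd, while a Base64 round trip returns the original byte.

```java
import java.nio.charset.StandardCharsets;
import java.util.Base64;

// Hypothetical demo class; not part of HadoopThriftServer.
class BinaryOverStringDemo {

  // Treats raw bytes as UTF-8 text and converts back, which is roughly
  // what happens when binary data travels through a string-typed field.
  static byte[] roundTripAsUtf8(byte[] raw) {
    String asText = new String(raw, StandardCharsets.UTF_8);
    return asText.getBytes(StandardCharsets.UTF_8);
  }

  // Wraps the bytes in Base64 (pure ASCII) before turning them into a String.
  static byte[] roundTripAsBase64(byte[] raw) {
    String encoded = Base64.getEncoder().encodeToString(raw);
    return Base64.getDecoder().decode(encoded);
  }

  // Formats bytes as space-separated lowercase hex, e.g. "ef bf bd".
  static String hex(byte[] bytes) {
    StringBuilder sb = new StringBuilder();
    for (byte b : bytes) {
      if (sb.length() > 0) {
        sb.append(' ');
      }
      sb.append(String.format("%02x", b & 0xff));
    }
    return sb.toString();
  }

  public static void main(String[] args) {
    byte[] raw = {(byte) 0xa2};  // a lone continuation byte, invalid as UTF-8
    System.out.println(hex(roundTripAsUtf8(raw)));    // prints "ef bf bd"
    System.out.println(hex(roundTripAsBase64(raw)));  // prints "a2"
  }
}
```

Base64 keeps the existing string-typed API usable at the cost of roughly a third more data on the wire; switching the Thrift definition to the binary type, as the report suggests, would avoid the encoding step altogether.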