[ https://issues.apache.org/jira/browse/HADOOP-6883?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12892861#action_12892861 ]
Gordon Sommers commented on HADOOP-6883:
----------------------------------------

Thanks for the quick feedback. I'm not sure I understand why calling it like that would make a difference: if the bytes we want to decode amount to the entire string, that should have the same effect as b64.decode(val.getBytes()), no? I looked at the API here: http://www.docjar.com/docs/api/org/apache/hadoop/io/Text.html#getBytes and it didn't specify anything unusual. Is Text encoding the bytes differently from String?

In either case, I'm glad to hear that you're working on improving the serialization framework. I think a lot of people, myself included, will really appreciate that when it gets released. Thanks!

> Text.toString violates its abstraction
> --------------------------------------
>
>                 Key: HADOOP-6883
>                 URL: https://issues.apache.org/jira/browse/HADOOP-6883
>             Project: Hadoop Common
>          Issue Type: Bug
>          Components: io
>    Affects Versions: 0.20.1
>         Environment: Linux
>            Reporter: Gordon Sommers
>
> I stumbled upon this when encoding a Google protocol buffer in base64 and
> storing it in a Text object for serialization. Compare the following two
> lines:
>
>   byte [] decoded = b64.decode(val.getBytes());
>   // this does not return the same bytes as below, and the result, after
>   // decoding the base64 successfully, is a very mangled protocol buffer
>   byte [] decoded = b64.decode(val.toString().getBytes());
>   // YES, toString() FIXES IT
>
> Elsewhere in my code I also have:
>
>   Text curline = new Text(values.next().toString());
>   byte [] raw = base64.decode(curline.getBytes());
>   // This does work.
>
> It looks like the Text object must be toString'd (just once, somewhere, even
> if it's later repacked in a Text) before it will have the proper byte
> representation. I would classify this as a leaky abstraction and ask that the
> reason please be isolated and the API fixed somehow, so that other developers
> don't have to spend 3 days figuring out when Text.getBytes isn't returning the
> right bytes even though Text.toString prints exactly the right string
> representation and Text.toString.getBytes does return the right bytes.
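
A possible explanation for the behaviour described above, offered as a sketch rather than a confirmed diagnosis: Text.getBytes() returns the object's backing array, and only the first getLength() bytes of that array are valid. If the same object previously held a longer value, the tail of the array still contains stale bytes. The example below does not use Hadoop at all. ReusableBuffer and validBytes are hypothetical stand-ins written for illustration, assuming a buffer that grows but never shrinks, and the base64 strings are made-up sample data.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;
import java.util.Base64;

public class BackingArrayDemo {

    /** Hypothetical stand-in for a reusable byte buffer; NOT Hadoop's Text. */
    static final class ReusableBuffer {
        private byte[] bytes = new byte[0];
        private int length = 0;

        /** Copies the UTF-8 bytes of s into the buffer, growing it if needed. */
        void set(String s) {
            byte[] src = s.getBytes(StandardCharsets.UTF_8);
            if (bytes.length < src.length) {
                bytes = new byte[src.length]; // grows, never shrinks
            }
            System.arraycopy(src, 0, bytes, 0, src.length);
            length = src.length;
        }

        /** Returns the whole backing array, including any stale tail. */
        byte[] getBytes() { return bytes; }

        /** Number of valid bytes at the start of the backing array. */
        int getLength() { return length; }

        /** Decodes only the valid prefix, so the stale tail is ignored. */
        @Override
        public String toString() {
            return new String(bytes, 0, length, StandardCharsets.UTF_8);
        }
    }

    /** Copies only the valid prefix of the backing array. */
    static byte[] validBytes(ReusableBuffer b) {
        return Arrays.copyOf(b.getBytes(), b.getLength());
    }

    /** Base64-decodes raw bytes and returns the result as a UTF-8 string. */
    static String decode(byte[] raw) {
        return new String(Base64.getDecoder().decode(raw), StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        ReusableBuffer val = new ReusableBuffer();
        val.set("QUJDREVGR0g="); // base64 of "ABCDEFGH", 12 bytes
        val.set("eHl6");         // base64 of "xyz", 4 bytes, same array reused

        System.out.println(val);                       // eHl6
        System.out.println(val.getBytes().length);     // 12
        System.out.println(decode(val.getBytes()));    // xyzDEFGH (stale tail)
        System.out.println(decode(validBytes(val)));   // xyz
        System.out.println(decode(val.toString().getBytes(StandardCharsets.UTF_8))); // xyz
    }
}
```

Note that the stale tail still happens to be valid base64 here, so decoding succeeds but yields the wrong payload, which matches the "decodes successfully but mangled" symptom. With the real class, the equivalent of validBytes would be to honour getLength() when reading the array returned by getBytes().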