[ 
https://issues.apache.org/jira/browse/HADOOP-6883?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12892861#action_12892861
 ] 

Gordon Sommers commented on HADOOP-6883:
----------------------------------------

Thanks for the quick feedback. I'm not sure I understand why calling it like 
that would make a difference. If the bytes we want to decode amount to the 
entire string, that should have the same effect as b64.decode(val.getBytes()), 
no? I looked at the API here: 
http://www.docjar.com/docs/api/org/apache/hadoop/io/Text.html#getBytes and it 
didn't specify anything unusual. Is Text encoding the bytes differently from 
String?
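
One way the two calls could differ, offered only as a guess: if getBytes() 
hands back a reusable backing array, only the first getLength() bytes would 
belong to the current value, and anything after that would be left over from 
an earlier, longer value. The sketch below illustrates that idea in Java with 
a hypothetical stand-in class. ReusableBuffer, reusedBuffer, decodeRaw, and 
decodeValid are made-up names for illustration, not Hadoop's Text or its API, 
and java.util.Base64 stands in for the b64 object from the issue.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;
import java.util.Base64;

class BackingArrayDemo {

    /** Hypothetical stand-in for a reusable byte container (NOT Hadoop's Text). */
    static final class ReusableBuffer {
        private byte[] bytes = new byte[0];
        private int length = 0;

        /** Overwrites the contents, reusing the old array when it is large enough. */
        void set(byte[] src) {
            if (src.length > bytes.length) {
                bytes = new byte[src.length];
            }
            System.arraycopy(src, 0, bytes, 0, src.length);
            length = src.length;
        }

        /** Returns the raw backing array; only the first getLength() bytes are valid. */
        byte[] getBytes() {
            return bytes;
        }

        int getLength() {
            return length;
        }

        /** Builds the string from the valid bytes only, so it always looks right. */
        @Override
        public String toString() {
            return new String(bytes, 0, length, StandardCharsets.UTF_8);
        }
    }

    /** Stores a longer base64 value, then a shorter one, in the same buffer. */
    static ReusableBuffer reusedBuffer() {
        Base64.Encoder enc = Base64.getEncoder();
        ReusableBuffer buf = new ReusableBuffer();
        buf.set(enc.encode("123456".getBytes(StandardCharsets.UTF_8)));
        buf.set(enc.encode("abc".getBytes(StandardCharsets.UTF_8)));
        return buf;
    }

    /** Decodes the whole backing array, stale tail included. */
    static byte[] decodeRaw(ReusableBuffer buf) {
        return Base64.getDecoder().decode(buf.getBytes());
    }

    /** Decodes only the bytes that belong to the current value. */
    static byte[] decodeValid(ReusableBuffer buf) {
        byte[] valid = Arrays.copyOfRange(buf.getBytes(), 0, buf.getLength());
        return Base64.getDecoder().decode(valid);
    }

    public static void main(String[] args) {
        ReusableBuffer buf = reusedBuffer();
        System.out.println("toString():      " + buf);
        System.out.println("raw length:      " + buf.getBytes().length);
        System.out.println("valid length:    " + buf.getLength());
        System.out.println("decoded (raw):   "
                + new String(decodeRaw(buf), StandardCharsets.UTF_8));
        System.out.println("decoded (valid): "
                + new String(decodeValid(buf), StandardCharsets.UTF_8));
    }
}
```

In this sketch toString() looks correct while decoding the raw array picks up 
extra bytes from the earlier value. That would match the symptom in the issue 
only if Text behaves like the stand-in, which this example does not establish.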

In either case, I'm glad to hear that you're working on improving the 
serialization framework -- I think a lot of people, myself included, will 
really appreciate that when it gets released. Thanks!

> Text.toString violates its abstraction
> --------------------------------------
>
>                 Key: HADOOP-6883
>                 URL: https://issues.apache.org/jira/browse/HADOOP-6883
>             Project: Hadoop Common
>          Issue Type: Bug
>          Components: io
>    Affects Versions: 0.20.1
>         Environment: Linux
>            Reporter: Gordon Sommers
>
> I stumbled upon this when encoding a Google protocol buffer in base64, and 
> storing it in a Text object for serialization. Compare the following two 
> lines:
> byte [] decoded = b64.decode(val.getBytes());
> // This does not return the same bytes as the line below, and the result, 
> // after decoding the base64 successfully, is a very mangled protocol buffer.
> byte [] decoded = b64.decode(val.toString().getBytes());
> // YES, toString() FIXES IT
> Elsewhere in my code I also have: 
> Text curline = new Text(values.next().toString());
> byte [] raw = base64.decode(curline.getBytes());
> //This does work.
> It looks like the Text object must be toString'd (just once, somewhere, even 
> if it's later repacked in a Text) before it will have the proper byte 
> representation. I would classify this as a leaky abstraction and ask that the 
> reason please be isolated and the API fixed somehow so that other developers 
> don't have to spend 3 days figuring out why Text.getBytes() isn't returning 
> the right bytes even though Text.toString() prints exactly the right string 
> representation and Text.toString().getBytes() does return the right bytes.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
