[ 
https://issues.apache.org/jira/browse/AVRO-36?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12721350#action_12721350
 ] 

Doug Cutting commented on AVRO-36:
----------------------------------

> How do you encode a default value of 0xFFFF - two bytes?

We'd map 1 code point to 1 byte.  So the two byte sequence [FF, FF] would be 
encoded in JSON as  "\u00FF\u00FF".

> Do the strings "\uFFFF" and "\u00FF\u00FF" represent the same binary data?

No.  We'd only use code points 0-255.  So "\uFFFF" would be illegal.

I'd much prefer we avoid encodings that render text unreadable, since binary 
values often include text.  So that rules out base64, hex, etc., leaving us 
with a choice between URL encoding and the bytes-as-codepoints encoding.  URL 
encoding is more compact in some cases, but transforms many textual characters, 
like turning spaces to pluses.  So I am currently leaning towards the codepoint 
encoding.  It seems the most natural in JSON.  In particular, it is the 
simplest to implement: a JSON library is already required to implement 
AVRO, so one need merely construct a string whose code points are the bytes, 
and the JSON library will then handle the encoding and decoding.
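As a sketch of the codepoint encoding described above (a minimal illustration in Python, not Avro's actual implementation; the helper names are hypothetical), each byte maps to the code point of equal value, and the JSON library takes care of any escaping:

```python
import json

def bytes_to_json_default(data: bytes) -> str:
    # Map each byte to the code point of equal value (0-255),
    # then let the JSON library handle escaping.  Bytes that are
    # printable ASCII stay readable in the result.
    return json.dumps("".join(chr(b) for b in data))

def json_default_to_bytes(text: str) -> bytes:
    # Reverse mapping: every code point must be in 0-255; anything
    # higher (e.g. "\uFFFF") is illegal under this scheme.
    s = json.loads(text)
    if any(ord(c) > 0xFF for c in s):
        raise ValueError("code point above 255 is not a valid byte")
    return bytes(ord(c) for c in s)

# The two-byte sequence [FF, FF] round-trips through JSON.
encoded = bytes_to_json_default(b"\xff\xff")
assert json_default_to_bytes(encoded) == b"\xff\xff"
```

Under this sketch, `b"\xff\xff"` encodes to the JSON string `"\u00ff\u00ff"` (two code points, one per byte), whereas a single `"\uffff"` would be rejected on decode.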

> binary default values do not decode base64
> ------------------------------------------
>
>                 Key: AVRO-36
>                 URL: https://issues.apache.org/jira/browse/AVRO-36
>             Project: Avro
>          Issue Type: Bug
>          Components: java
>            Reporter: Doug Cutting
>            Assignee: Doug Cutting
>
> The specification says that default values for binary data are base64-encoded 
> text, but the Java implementation uses the raw bytes of the textual value 
> and does not perform base64 decoding as specified.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.