[
https://issues.apache.org/jira/browse/AVRO-36?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12715590#action_12715590
]
Doug Cutting commented on AVRO-36:
----------------------------------
> I like the spec the way it is i.e. length + actual bytes
The question is not how to encode binary values in Avro, but rather how to
encode default values for binary fields in JSON-based schemas, since JSON has
no support for binary values, only UTF-8 strings.
It is possible to encode arbitrary binary values in UTF-8 by encoding each
byte as a code point. The encoded length will differ from that of the raw
binary, as each byte between 128 and 255 must be encoded as two bytes. This has
the advantage of rendering ASCII portions of binary data in a readable manner,
but, in pathological cases, it can double the data size. Base64 is more opaque,
but guarantees a data size of about 4/3 (roughly 1.33 times) the number of bytes.
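
To make the size trade-off concrete, here is a minimal Python sketch (not
Avro code) that measures both encodings on an all-ASCII input and on an input
made entirely of bytes >= 128. The helper names codepoint_utf8 and
encoded_sizes are hypothetical, and mapping bytes to code points via Latin-1
is my assumption about what "encoding each byte as a code point" means.

```python
import base64


def codepoint_utf8(data: bytes) -> bytes:
    """Treat each byte as the code point of the same value, then emit UTF-8.

    Latin-1 decoding maps byte N to code point U+00NN, which is the
    byte-as-code-point scheme assumed here (hypothetical helper).
    """
    return data.decode("latin-1").encode("utf-8")


def encoded_sizes(data: bytes) -> dict:
    """Return the length in bytes of each textual encoding of `data`."""
    return {
        "raw": len(data),
        "codepoint_utf8": len(codepoint_utf8(data)),
        "base64": len(base64.b64encode(data)),
    }


ascii_sample = b"hello world!"        # 12 bytes, all below 128
high_sample = bytes(range(128, 140))  # 12 bytes, all 128 or above

print(encoded_sizes(ascii_sample))
print(encoded_sizes(high_sample))
```

The ASCII input keeps its length under the code-point scheme, while the
high-byte input doubles; the base64 length depends only on the input length.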
For default values I'm not worried about the size, but base64 is a more
standard way of encoding binary values in text than perverting Unicode. In
particular, base64 is designed to survive email and text editors, which makes
it easier to process as source code, as schemas will sometimes be.
Ideally we'd use an encoding that was both text-editor/email friendly and
transparent. URL encoding might thus be a better choice than base64 or raw
UTF-8. It's also readily available on most platforms. How would folks feel
about using URL encoding for default values of binary fields in JSON schemas?
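
As an illustration of what URL-encoded defaults could look like, here is a
small Python sketch using the standard library's urllib.parse. The helper
names encode_default and decode_default are hypothetical, and this is only a
sketch of the proposal, not how any Avro implementation handles defaults.

```python
from urllib.parse import quote_from_bytes, unquote_to_bytes


def encode_default(data: bytes) -> str:
    """Percent-encode raw bytes into an ASCII string (hypothetical helper).

    safe="" means no reserved characters are exempt; only letters, digits
    and "_.-~" pass through unescaped.
    """
    return quote_from_bytes(data, safe="")


def decode_default(text: str) -> bytes:
    """Recover the raw bytes from a percent-encoded string."""
    return unquote_to_bytes(text)


sample = b"abc\x00\xff"
encoded = encode_default(sample)
print(encoded)                            # abc%00%FF
print(decode_default(encoded) == sample)  # True
```

The ASCII portion stays readable, and the result is plain ASCII text that can
sit inside a JSON string.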
> binary default values do not decode base64
> ------------------------------------------
>
> Key: AVRO-36
> URL: https://issues.apache.org/jira/browse/AVRO-36
> Project: Avro
> Issue Type: Bug
> Components: java
> Reporter: Doug Cutting
> Assignee: Doug Cutting
>
> The specification says that default values for binary data are base64 encoded
> text, but the Java implementation uses the raw bytes of the textual value,
> and does not perform base64 decoding as specified.