[
https://issues.apache.org/jira/browse/AVRO-1456?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13899426#comment-13899426
]
Doug Cutting commented on AVRO-1456:
------------------------------------
I'm not sure that it is a bug for AvroAsTextInputFormat to use the toString()
JSON encoding rather than the Avro encoding. Generally AvroAsTextInputFormat
is used to supply Avro to non-Avro-aware tools, where folks generally seem to
prefer to represent unions as simply different types in the JSON data.
Perhaps we could include an option to use the Avro JSON encoding here too.
Would that be of use to you?
> AvroAsTextInputFormat is inconsistent with the Avro JSON Encoding described
> in the Avro Specification
> -----------------------------------------------------------------------------------------------------
>
> Key: AVRO-1456
> URL: https://issues.apache.org/jira/browse/AVRO-1456
> Project: Avro
> Issue Type: Bug
> Components: java
> Affects Versions: 1.7.6
> Reporter: Jamie Olson
>
> org.apache.avro.mapred.AvroAsTextInputFormat relies on the toString() method
> rather than using org.apache.avro.generic.GenericDatumWriter.write() and
> org.apache.avro.io.JsonEncoder as in org.apache.avro.tool.DataFileReadTool.
> This results in a serialization of the data element without the fully
> qualified name required by the Avro Specification's JSON Encoding section:
> http://avro.apache.org/docs/1.7.6/spec.html#json_encoding
> The specification indicates that for a union type: ["null","string","Foo"],
> data should be serialized with:
> * null as null;
> * the string "a" as {"string": "a"}; and
> * a Foo instance as {"Foo": {...}}, where {...} indicates the JSON encoding
> of a Foo instance.
> Instead, AvroAsTextInputFormat is serializing these values as
> * null as null;
> * the string "a" as "a"; and
> * a Foo instance as {...}, where {...} indicates the JSON encoding of a Foo
> instance.
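
As a rough illustration of the two behaviours described above, here is a
minimal, dependency-free sketch. It is not Avro's implementation: the class
UnionJsonSketch, both method names, and the Foo field "x" are hypothetical,
and the inner value is assumed to be already encoded as JSON. It only shows
how the spec's union encoding wraps a non-null value in an object keyed by
the branch name, while the toString()-style output emits the value bare.

```java
public final class UnionJsonSketch {

    private UnionJsonSketch() {}

    /**
     * Union value as described in the spec's JSON Encoding section:
     * null stays null, any other branch becomes {"<branch>": <value>}.
     * valueJson is assumed to be the already-encoded JSON of the value.
     */
    static String specUnionEncoding(String branchName, String valueJson) {
        if ("null".equals(branchName)) {
            return "null";
        }
        return "{\"" + branchName + "\": " + valueJson + "}";
    }

    /**
     * Union value as the issue reports AvroAsTextInputFormat emitting it:
     * the bare value, with no branch name.
     */
    static String bareUnionEncoding(String branchName, String valueJson) {
        if ("null".equals(branchName)) {
            return "null";
        }
        return valueJson;
    }

    public static void main(String[] args) {
        // Hypothetical Foo record with a single made-up field "x".
        String fooJson = "{\"x\": 1}";

        System.out.println(specUnionEncoding("null", null));
        System.out.println(specUnionEncoding("string", "\"a\""));
        System.out.println(specUnionEncoding("Foo", fooJson));

        System.out.println(bareUnionEncoding("null", null));
        System.out.println(bareUnionEncoding("string", "\"a\""));
        System.out.println(bareUnionEncoding("Foo", fooJson));
    }
}
```

A consumer that knows the union schema can tell the branches apart in the
first form by the wrapper key; in the second form it has to infer the branch
from the shape of the value.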