[ https://issues.apache.org/jira/browse/AVRO-1811?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Sean Busbey updated AVRO-1811:
------------------------------
    Priority: Critical  (was: Major)

> SpecificData.deepCopy() cannot be used if schema compiler generated Java
> objects with Strings instead of UTF8
> -------------------------------------------------------------------------------------------------------------
>
>                 Key: AVRO-1811
>                 URL: https://issues.apache.org/jira/browse/AVRO-1811
>             Project: Avro
>          Issue Type: Bug
>          Components: java
>    Affects Versions: 1.8.0
>            Reporter: Ryon Day
>            Priority: Critical
>
> {panel:title=Description|titleBGColor=#3FA|bgColor=#DDD}
> When the Avro compiler creates Java objects, you have the option to have it
> generate fields of type {{string}} as the standard Java {{String}} type,
> for wide interoperability with existing Java applications and APIs.
> By default, however, the compiler outputs these fields as the Avro-specific
> {{Utf8}} type, requiring frequent calls to {{toString()}} before the default
> domain objects can be used with the majority of Java libraries.
> There are two ways to get around this. The first is to annotate every
> {{string}} field in a schema like so:
> {code}
>     {
>       "name": "some_string",
>       "doc": "a field that is guaranteed to compile to java.lang.String",
>       "type": [
>         "null",
>         {
>           "type": "string",
>           "avro.java.string": "String"
>         }
>       ]
>     },
> {code}
> Unfortunately, long schemas containing many string fields can be dominated
> by this annotation by volume. Teams using heterogeneous clients may want to
> avoid Java-specific annotations in their schema files, or may not think to
> use them unless Java consumers of the schema exist at the time the schema is
> proposed and written.
> The other solution to the problem is to compile the schema into Java objects
> using the {{SpecificCompiler}}'s string type selection. This option actually
> alters the schema carried by the object's {{SCHEMA$}} field to include the
> above annotation, ensuring that the String type is used whenever the object
> goes through the Java API.
> Unfortunately, this method is not interoperable with GenericRecords created
> by libraries that use the _original_ schema.
> {panel}
> {panel:title=Steps To Reproduce|titleBGColor=#8DB|bgColor=#DDD}
> # Create a schema with several {{string}} fields.
> # Parse the schema using the standard Avro schema parser.
> # Create Java domain objects for that schema, ensuring usage of the
> {{java.lang.String}} string type.
> # Create a message of some sort that ends up as a {{GenericRecord}} of the
> original schema.
> # Attempt to use {{SpecificData.deepCopy()}} to make a {{SpecificRecord}} out
> of the {{GenericRecord}}.
> There is a unit test that demonstrates this
> [here|https://github.com/ryonday/avroDecodingHelp/blob/master/1.8.0/src/test/java/com/ryonday/avro/test/v180/AvroDeepCopyTest.java]
> {panel}
> {panel:title=Expected Results|titleBGColor=#AD3|bgColor=#DDD}
> As the schemas are literally identical aside from string type, the conversion
> should work (and does work for schemas that are exactly identical).
> {panel}
> {panel:title=Actual Results|titleBGColor=#D55|bgColor=#DDD}
> {{ClassCastException}} with the message {{org.apache.avro.util.Utf8 cannot be
> cast to java.lang.String}}
> {panel}

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
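
The failure mode reported above can be illustrated without Avro on the classpath. The following is only a sketch under stated assumptions: {{FakeUtf8}} is a hypothetical stand-in for {{org.apache.avro.util.Utf8}} (a {{CharSequence}} that is not a {{String}}), {{unsafeAssign}} is assumed to resemble the cast a String-typed generated class performs when a field is set, and {{safeAssign}} is one possible workaround, not the fix adopted by the project.

```java
import java.nio.charset.StandardCharsets;

public class Utf8CastSketch {

    /** Hypothetical stand-in for org.apache.avro.util.Utf8: a CharSequence that is not a String. */
    static final class FakeUtf8 implements CharSequence {
        private final byte[] bytes;

        FakeUtf8(String s) {
            this.bytes = s.getBytes(StandardCharsets.UTF_8);
        }

        @Override public int length() { return toString().length(); }
        @Override public char charAt(int index) { return toString().charAt(index); }
        @Override public CharSequence subSequence(int start, int end) {
            return toString().subSequence(start, end);
        }
        @Override public String toString() {
            return new String(bytes, StandardCharsets.UTF_8);
        }
    }

    /** Assumed shape of a String-typed field assignment: a plain cast. */
    static String unsafeAssign(Object value) {
        return (String) value; // ClassCastException if value is a FakeUtf8
    }

    /** Workaround sketch: normalize any CharSequence to java.lang.String first. */
    static String safeAssign(Object value) {
        if (value == null) {
            return null;
        }
        if (value instanceof CharSequence) {
            return value.toString();
        }
        throw new IllegalArgumentException("not a string value: " + value.getClass());
    }

    /** Returns true if the plain cast rejects the given value. */
    static boolean castFails(Object value) {
        try {
            unsafeAssign(value);
            return false;
        } catch (ClassCastException e) {
            return true;
        }
    }

    public static void main(String[] args) {
        // A value as it might arrive from a record built against the original schema.
        Object fromGenericRecord = new FakeUtf8("hello");
        System.out.println("plain cast fails: " + castFails(fromGenericRecord));
        System.out.println("normalized value: " + safeAssign(fromGenericRecord));
    }
}
```

In a real application the same normalization would have to be applied to every string field (including those nested in records, arrays and maps) before copying into the String-typed class, which is why a fix inside {{deepCopy()}} itself is the more attractive option.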