[ 
https://issues.apache.org/jira/browse/AVRO-1811?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sean Busbey updated AVRO-1811:
------------------------------
    Priority: Critical  (was: Major)

> SpecificData.deepCopy() cannot be used if schema compiler generated Java 
> objects with Strings instead of UTF8
> -------------------------------------------------------------------------------------------------------------
>
>                 Key: AVRO-1811
>                 URL: https://issues.apache.org/jira/browse/AVRO-1811
>             Project: Avro
>          Issue Type: Bug
>          Components: java
>    Affects Versions: 1.8.0
>            Reporter: Ryon Day
>            Priority: Critical
>
> {panel:title=Description|titleBGColor=#3FA|bgColor=#DDD}
> When the Avro compiler creates Java objects, you have the option to have them 
> generate fields of type {{string}} with the Java standard {{String}} type, 
> for wide interoperability with existing Java applications and APIs.
> By default, however, the compiler outputs these fields in the Avro-specific 
> {{Utf8}} type, requiring frequent usage of the {{toString()}} method in order 
> for default domain objects to be used with the majority of Java libraries.
> There are two ways to get around this. The first is to annotate every 
> {{string}} field in a schema like so:
> {code}
>     {
>       "name": "some_string",
>       "doc": "a field that is guaranteed to compile to java.lang.String",
>       "type": [
>         "null",
>         {
>           "type": "string",
>           "avro.java.string": "String"
>         }
>       ]
>     },
> {code}
> Unfortunately, long schemas containing many string fields can be dominated by 
> this annotation by volume. Teams with heterogeneous clients may also want to 
> avoid Java-specific annotations in their schema files, or may not think to 
> use them unless Java consumers of the schema exist at the time the schema is 
> proposed and written.
> The other solution to the problem is to compile the schema into Java objects  
> using the {{SpecificCompiler}}'s string type selection. This option actually 
> alters the schema carried by the object's {{SCHEMA$}} field to have the above 
> annotation in it, ensuring that when used by the Java API, the String type 
> will be used. 
> Unfortunately, this method is not interoperable with {{GenericRecord}} 
> instances created by libraries that use the _original_ schema.
> {panel}
> {panel:title=Steps To Reproduce|titleBGColor=#8DB|bgColor=#DDD}
> # Create a schema with several {{string}} fields.
> # Parse the schema using the standard Avro schema parser.
> # Create Java domain objects for that schema, ensuring usage of the 
> {{java.lang.String}} string type.
> # Create a message of some sort that ends up as a {{GenericRecord}} of the 
> original schema.
> # Attempt to use {{SpecificData.deepCopy()}} to make a {{SpecificRecord}} out 
> of the {{GenericRecord}}.
> There is a unit test that demonstrates this 
> [here|https://github.com/ryonday/avroDecodingHelp/blob/master/1.8.0/src/test/java/com/ryonday/avro/test/v180/AvroDeepCopyTest.java]
> {panel}
> {panel:title=Expected Results|titleBGColor=#AD3|bgColor=#DDD}
> As the schemas are literally identical aside from string type, the conversion 
> should work (and does work for schemas that are exactly identical).
> {panel}
> {panel:title=Actual Results|titleBGColor=#D55|bgColor=#DDD}
> A {{ClassCastException}} is thrown with the message 
> {{org.apache.avro.util.Utf8 cannot be cast to java.lang.String}}.
> {panel}
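
The failure mode reported above can be illustrated without Avro on the classpath. The following is only a minimal sketch under stated assumptions: {{FakeUtf8}} is a hypothetical stand-in for {{org.apache.avro.util.Utf8}} (a {{CharSequence}} that is not a {{String}}), and {{naiveCast}} and {{copyWithStrings}} are illustrative helpers, not part of the Avro API and not how {{SpecificData.deepCopy()}} is implemented. It shows why a blind cast fails and how a copy that normalizes every {{CharSequence}} to {{String}} would avoid the exception.

```java
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical stand-in for Avro's Utf8: a CharSequence that is NOT a java.lang.String. */
final class FakeUtf8 implements CharSequence {
    private final byte[] bytes;

    FakeUtf8(String s) { this.bytes = s.getBytes(StandardCharsets.UTF_8); }

    @Override public int length() { return toString().length(); }
    @Override public char charAt(int index) { return toString().charAt(index); }
    @Override public CharSequence subSequence(int start, int end) {
        return toString().subSequence(start, end);
    }
    @Override public String toString() { return new String(bytes, StandardCharsets.UTF_8); }
}

/** Illustrative helpers only; none of this is Avro API. */
final class Utf8CastSketch {

    /** Mimics code that assumes a string field always holds a java.lang.String. */
    static String naiveCast(Object value) {
        return (String) value; // ClassCastException when value is a FakeUtf8
    }

    /** Recursively copies a value, converting every CharSequence to a String. */
    static Object copyWithStrings(Object value) {
        if (value instanceof CharSequence) {
            return value.toString();
        }
        if (value instanceof Map) {
            Map<Object, Object> out = new LinkedHashMap<>();
            for (Map.Entry<?, ?> e : ((Map<?, ?>) value).entrySet()) {
                out.put(copyWithStrings(e.getKey()), copyWithStrings(e.getValue()));
            }
            return out;
        }
        if (value instanceof List) {
            List<Object> out = new ArrayList<>();
            for (Object item : (List<?>) value) {
                out.add(copyWithStrings(item));
            }
            return out;
        }
        return value; // null and other values are passed through unchanged
    }

    public static void main(String[] args) {
        Object fromGenericRecord = new FakeUtf8("some_string value");
        try {
            naiveCast(fromGenericRecord);
        } catch (ClassCastException e) {
            System.out.println("Blind cast failed: " + e.getMessage());
        }
        // Normalizing first makes the same cast safe.
        String ok = naiveCast(copyWithStrings(fromGenericRecord));
        System.out.println("After normalizing: " + ok);
    }
}
```

A copy routine along these lines trades a little extra allocation for tolerance of either string representation; whether the real fix belongs in {{deepCopy()}} itself or in the generated classes is left open here.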



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
