[ 
https://issues.apache.org/jira/browse/AVRO-2438?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sebastian J. updated AVRO-2438:
-------------------------------
    Description: 
A schema fragment like this:
{code:java}
{
"name": "ownerId",
"type": [
  "null",
  {
    "type": "string",
    "java-class": "java.net.URI"
  }
],
"default": null
}{code}
can be deserialized without problems into a generated POJO with
{code:java}
@org.apache.avro.specific.AvroGenerated
public class MyAvroDataObject extends org.apache.avro.specific.SpecificRecordBase
    implements org.apache.avro.specific.SpecificRecord {
...
@Deprecated public java.net.URI ownerId;{code}
because {{GenericDatumReader.readString(Object, Schema, Decoder)}} looks up the
target class in the {{stringClassCache}}, which holds the entry
{code:java}
{"type":"string","java-class":"java.net.URI"}=class java.net.URI{code}
and then uses the {{URI}} class itself to rehydrate the value via
{{newInstanceFromString}}.
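
The rehydration step described above can be sketched without Avro on the classpath. This is only an illustration, assuming the target class has a public single-argument String constructor (as java.net.URI does). The class StringRehydrationSketch, its cache, and the method fromString are hypothetical names and are not Avro's actual implementation.

```java
import java.lang.reflect.Constructor;
import java.net.URI;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

public class StringRehydrationSketch {
  // Hypothetical stand-in for a cache keyed by the "java-class" property value.
  private static final Map<String, Class<?>> CLASS_CACHE = new ConcurrentHashMap<>();

  // Builds an instance of the named class from its string form.
  static Object fromString(String javaClassName, String text) {
    try {
      Class<?> target = CLASS_CACHE.get(javaClassName);
      if (target == null) {
        target = Class.forName(javaClassName);
        CLASS_CACHE.put(javaClassName, target);
      }
      // Assumption: the class exposes a public constructor taking one String.
      Constructor<?> ctor = target.getConstructor(String.class);
      return ctor.newInstance(text);
    } catch (ReflectiveOperationException e) {
      throw new IllegalStateException("Cannot build " + javaClassName + " from string", e);
    }
  }

  public static void main(String[] args) {
    Object value = fromString("java.net.URI", "urn:example:owner:42");
    System.out.println(value.getClass().getName() + " -> " + value);
  }
}
```

The point of the sketch is that the reader side picks the Java type from the schema property, not from the schema type alone.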

 

On the other hand, {{deepCopy}} only considers the schema type of the field. In
{{org.apache.avro.generic.GenericData.deepCopy(Schema, T)}}, the {{STRING}} case
turns the {{URI}} value into an {{org.apache.avro.util.Utf8}}, which then causes
a {{ClassCastException}} when the copied value is set on the record:
{noformat}
java.lang.ClassCastException: org.apache.avro.util.Utf8 cannot be cast to java.net.URI
  at com.example.MyAvroDataObject.put(MyAvroDataObject.java:104)
  at org.apache.avro.generic.GenericData.setField(GenericData.java:660)
  at org.apache.avro.generic.GenericData.setField(GenericData.java:677)
  at org.apache.avro.generic.GenericData.deepCopy(GenericData.java:1082)
  at org.apache.avro.generic.GenericData.deepCopy(GenericData.java:1102)
  at org.apache.avro.generic.GenericData.deepCopy(GenericData.java:1080){noformat}
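
The failure mechanism in the stack trace above can be imitated in plain Java. This is a sketch under stated assumptions: OwnerRecord is a hypothetical stand-in for the generated class (whose put() presumably casts the incoming value to URI), and a StringBuilder stands in for Utf8 as "some CharSequence that is not a URI". No Avro classes are involved.

```java
import java.net.URI;

public class DeepCopyCastSketch {
  // Hypothetical stand-in for a generated record with one URI-typed field.
  static class OwnerRecord {
    URI ownerId;

    void put(int field, Object value) {
      if (field == 0) {
        ownerId = (URI) value; // fails if the copy is not a URI
      } else {
        throw new IndexOutOfBoundsException("no field " + field);
      }
    }
  }

  // Stand-in for a copy routine that only knows the schema type is "string"
  // and therefore returns a generic character sequence.
  static Object copyAsGenericString(Object value) {
    return new StringBuilder(value.toString());
  }

  // Returns true if setting the copied value on a fresh record fails with a cast error.
  static boolean copyFails(URI original) {
    OwnerRecord copy = new OwnerRecord();
    try {
      copy.put(0, copyAsGenericString(original));
      return false;
    } catch (ClassCastException e) {
      return true;
    }
  }

  public static void main(String[] args) {
    System.out.println("copy fails: " + copyFails(URI.create("urn:example:owner:42")));
  }
}
```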
 

The following dirty hack seems to avoid the issue, but it does not consult the 
{{stringClassCache}}, which it should:
{code:java}
case STRING:
  // Strings are immutable
  if (value instanceof String) {
    return (T)value;
  }


  // Dirty Harry 9 3/4 start
  // URIs are immutable and are probably modeled as a URI itself
  // TODO: Check with stringClassCache & the schema
  else if ((value instanceof URI)
      && URI.class.getName().equals(schema.getProp("java-class"))) {
    return (T)value;
  }
  // Dirty Harry 9 3/4 end


  // Some CharSequence subclasses are mutable, so we still need to make
  // a copy
  else if (value instanceof Utf8) {
    // Utf8 copy constructor is more efficient than converting
    // to string and then back to Utf8
    return (T)new Utf8((Utf8)value);
  }
  return (T)new Utf8(value.toString());
{code}
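
One possible generalization of the hack above, sketched without Avro types: instead of hard-coding URI, compare the value's class with the schema's java-class property and rebuild the value through its String constructor, which also avoids assuming the class is immutable. Everything here is hypothetical: the parameter javaClassName stands for the result of schema.getProp("java-class"), a StringBuilder stands in for Utf8, and this is not the fix adopted by the project.

```java
import java.net.URI;

public class StringableCopySketch {
  // Copies a string-typed field value while preserving its Java class
  // when that class matches the schema's "java-class" property.
  static Object copyStringValue(Object value, String javaClassName) {
    if (value instanceof String) {
      return value; // Strings are immutable
    }
    if (javaClassName != null && value.getClass().getName().equals(javaClassName)) {
      try {
        // Assumption: the class has a public String constructor, like java.net.URI.
        return value.getClass().getConstructor(String.class).newInstance(value.toString());
      } catch (ReflectiveOperationException e) {
        throw new IllegalStateException("Cannot copy " + javaClassName, e);
      }
    }
    // Fallback: generic character sequence (stand-in for a Utf8 copy).
    return new StringBuilder(value.toString());
  }

  public static void main(String[] args) {
    URI original = URI.create("urn:example:owner:42");
    Object copy = copyStringValue(original, "java.net.URI");
    System.out.println(copy.getClass().getName() + " equals original: " + copy.equals(original));
  }
}
```

Rebuilding through the string form costs a parse per copy; returning the same instance, as the hack does, is cheaper but only safe for classes known to be immutable.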



> SpecificData.deepCopy() cannot be used with URI fields
> ------------------------------------------------------
>
>                 Key: AVRO-2438
>                 URL: https://issues.apache.org/jira/browse/AVRO-2438
>             Project: Apache Avro
>          Issue Type: Bug
>          Components: java
>    Affects Versions: 1.8.2
>            Reporter: Sebastian J.
>            Priority: Major



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
