[ 
https://issues.apache.org/jira/browse/AVRO-1827?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15629209#comment-15629209
 ] 

Karel Fuka commented on AVRO-1827:
----------------------------------

Hi

I worked on the original patch with Jakub, and because this is becoming 
critical for us I am moving forward with it. [~tinxu], thank you for your 
valuable feedback. You are right that there was an issue with the original 
patch, though it is a bit more subtle and not entirely related to a nested 
union. Let me explain:
1. If I run the following, it is all fine:
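{code:title=sample}
    // Needs: java.util.Arrays, org.apache.avro.Schema, org.apache.avro.reflect.ReflectData
    public enum E { A, B }

    public void test1() {
        // Union of null and the reflected schema of the Java enum E
        Schema s3 = Schema.createUnion(Arrays.asList(
                Schema.create(Schema.Type.NULL),
                ReflectData.get().getSchema(E.class)));

        // Resolves without throwing
        int ind = ReflectData.get().resolveUnion(s3, E.A);
    }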
{code}
2. When I generate a Protobuf class from the following definition and try to 
convert it to Avro, resolveUnion() does throw an exception:
{code:title=proto}
message EMessage {
  enum ET {
    A = 1;
    B = 2;
  }
  optional ET et = 3;
}
{code}
Note that the schema for 'et' is equivalent to what you see above. 

The real problem lies in the 'ProtobufData' class, which extends 'GenericData' 
but does not override some methods (such as 'getEnumSchema'), so the 
GenericData implementation gets called instead (and fails). Note that, for 
example, the comment on 'getEnumSchema' reads _"May be overridden for 
alternate enum"_. 
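
To make the dispatch problem concrete, here is a minimal, self-contained 
sketch. It does not use Avro at all: 'BaseData', 'BrokenProtoData', 
'FixedProtoData', 'SchemaCarrier' and 'getEnumSchemaName' are hypothetical 
stand-ins made up for illustration, and the exact failure inside the real 
'GenericData' may look different. It only shows the shape of the bug: a 
subclass that feeds a different enum representation into an inherited hook 
fails until that hook is overridden.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical sketch only: none of these types are Avro classes.
class EnumHookSketch {

    /** Stand-in for an enum value that carries its own schema name. */
    interface SchemaCarrier {
        String schemaName();
    }

    /** Stand-in for the generic data model with overridable hooks. */
    static class BaseData {
        protected boolean isEnum(Object datum) {
            return datum instanceof SchemaCarrier;
        }

        /** Default hook: assumes the value carries its schema. May be overridden. */
        protected String getEnumSchemaName(Object datum) {
            // Throws ClassCastException for any other enum representation.
            return ((SchemaCarrier) datum).schemaName();
        }

        /** Returns the index of the union branch whose name matches the datum. */
        public int resolveUnion(List<String> branches, Object datum) {
            String name;
            if (datum == null) {
                name = "null";
            } else if (isEnum(datum)) {
                name = getEnumSchemaName(datum);
            } else {
                name = datum.getClass().getSimpleName();
            }
            int index = branches.indexOf(name);
            if (index < 0) {
                throw new IllegalStateException("Not in union " + branches + ": " + datum);
            }
            return index;
        }
    }

    /** Recognises plain Java enums but inherits the default schema hook. */
    static class BrokenProtoData extends BaseData {
        @Override
        protected boolean isEnum(Object datum) {
            return datum instanceof Enum;
        }
    }

    /** Same as above, with the schema hook overridden for plain Java enums. */
    static class FixedProtoData extends BrokenProtoData {
        @Override
        protected String getEnumSchemaName(Object datum) {
            return ((Enum<?>) datum).getDeclaringClass().getSimpleName();
        }
    }

    enum ET { A, B }

    public static void main(String[] args) {
        List<String> union = Arrays.asList("null", "ET");
        System.out.println("fixed: " + new FixedProtoData().resolveUnion(union, ET.A));
        try {
            new BrokenProtoData().resolveUnion(union, ET.A);
        } catch (ClassCastException e) {
            System.out.println("broken: inherited hook rejected the enum value");
        }
    }
}
```

In Avro itself the analogous step is overriding the relevant hooks in 
'ProtobufData'; the attached patch, not this sketch, is the reference for how 
that is actually done.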

Therefore, I have addressed this in the latest version, which I am going to 
attach. Please review and let me know if you see any more issues.

Thanks
Karel

> Handling correctly optional fields when converting Protobuf to Avro
> -------------------------------------------------------------------
>
>                 Key: AVRO-1827
>                 URL: https://issues.apache.org/jira/browse/AVRO-1827
>             Project: Avro
>          Issue Type: Improvement
>    Affects Versions: 1.7.7, 1.8.0
>            Reporter: Jakub Kahovec
>         Attachments: AVRO-1827.patch, AVRO-1827.patch, AVRO-1827.patch
>
>
> Hello,
> in the current implementation of the protobuf-to-avro conversion, protobuf 
> optional fields are given default values in the avro schema if no default 
> is specified explicitly. 
> So for instance, when the protobuf field is defined as
> {quote}
> optional int64 fieldInt64 = 1;
> {quote}
> in the avro schema it appears as
> {quote}
>  "name" : "fieldInt64",
>   "type" : "long",
>   "default" : 0
> {quote}
> The problem with this implementation is that we lose the information about 
> whether the field was present in the original protobuf, because when we ask 
> for this field's value in avro we are always given the default value. 
> What I'm proposing instead is that if the field in the protobuf is defined as 
> optional and has no default value, then the generated avro schema type will 
> use a union comprising the null type and the matching type, with default 
> value null. It is going to look like this:
> {quote}
>  "name" : "fieldInt64",
>   "type" : [ "null", "long" ],
>   "default" : null
> {quote}
> I'm aware that this is a breaking change, but I think it is the proper way 
> to handle optional fields.
> I've also created a patch which fixes the conversion.
> Jakub 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
