[ https://issues.apache.org/jira/browse/AVRO-1827?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15629209#comment-15629209 ]
Karel Fuka commented on AVRO-1827: ---------------------------------- Hi I had been working on the original patch with Jakub and because this is becoming critical for us I am moving forward with it. [~tinxu] - thanks for your valuable feedback and you are right, there was an issue with the original patch. Though, it is a bit more subtle not completely related to a nested union. Let me explain: 1. If I run the following, it is all fine: {code:title=sample} public static enum E { A, B }; public void test1() { Schema s3 = Schema.createUnion(Arrays.asList( Schema.create(Schema.Type.NULL), ReflectData.get().getSchema(E.class))); int ind = ReflectData.get().resolveUnion(s3, E.A); } {code} 2. When I generate a ProtoBuf with the following definition and try to convert it to Avro, I really experience an exception thrown by resolveUnion(): {code:title=proto} message EMessage { enum ET { A = 1; B = 2; } optional ET et = 3; } {code} Note that the schema for 'et' is equivalent to what you see above. The real problem lies in 'ProtobufData' class, which extends 'GenericData' but does not override some methods (like 'getEnumSchema'), so a GenericData implementation gets called instead (and fails). Note that for example 'getEnumSchema' comment contains a text reading _"May be overridden for alternate enum"_. Therefore, I have addressed this in the latest version, which I am going to attach. Please review and let me know if you see any more issues. Thanks Karel > Handling correctly optional fields when converting Protobuf to Avro > ------------------------------------------------------------------- > > Key: AVRO-1827 > URL: https://issues.apache.org/jira/browse/AVRO-1827 > Project: Avro > Issue Type: Improvement > Affects Versions: 1.7.7, 1.8.0 > Reporter: Jakub Kahovec > Attachments: AVRO-1827.patch, AVRO-1827.patch, AVRO-1827.patch > > > Hello, > as of the current implementation of converting protobuf files into avro > format, protobuf optional fields are being given default values in the avro > schema if not specified explicitly. > So for instance when the protobuf field is defined as > {quote} > optional int64 fieldInt64 = 1; > {quote} > in the avro schema it appears as > {quote} > "name" : "fieldInt64", > "type" : "long", > "default" : 0 > {quote} > The problem with this implementation is that we are losing information about > whether the field was present or not in the original protobuf, as when we ask > for this field's value in avro we will be given the default value. > What I'm proposing instead is that if the field in the protobuf is defined as > optional and has no default value then the generated avro schema type will us > a union comprising the matching type and null type with default value null. > It is going to look like this: > {quote} > "name" : "fieldIn64", > "type" : [ "null", "long" ], > "default" : null > {quote} > I'm aware that is a breaking change but I think that is the proper way how to > handle optional fields. > I've also created a patch which fixes the conversion > Jakub -- This message was sent by Atlassian JIRA (v6.3.4#6332)