While working on AVRO-2090, I noticed what is either an implementation bug or a specification bug in schema resolution for enumerations.
The relevant code is here: https://bit.ly/2q5tsIp. This code uses the reader's default symbol, if it exists, in the case where the writer's symbol is missing from the reader's enum.

Let's think about this through an example. Say the reader defines just two symbols for an enum, "alpha" and "beta", with "alpha" as the default, and the writer had three symbols: "alpha", "beta", and "gamma". The way https://bit.ly/2q5tsIp is written, if the reader encounters a file containing the symbol "gamma", an error will NOT be thrown. Instead, the reader will be told that the actual symbol was "alpha".

Note that the Avro specification says the following about matching enumerations: "if the writer's symbol is not present in the reader's enum, then an error is signalled." This would suggest that, in the example just described, an error should be thrown rather than the value "alpha" returned. So either the code is wrong, or the spec is wrong.

On a related note, the current spec says nothing about a "default" property for enumerations. When should this property be used? As a "default default" for fields? (If so, this isn't happening.) As a value to be used in resolution, when the writer provides a symbol that is not (any longer) defined? (If so, this is happening in the code, but the spec needs an update.) And/or should it be used in other circumstances?

I'm willing to update docs and/or code appropriately, but can someone indicate the intended semantics of "default" for enums?

Raymie
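
P.S. To make the example concrete, here is a minimal Python sketch of the two candidate behaviors for the reader with symbols "alpha" and "beta" and default "alpha". The function names are mine and purely illustrative; this is not Avro's API or its actual resolution code, just my reading of the spec text versus what the linked code appears to do.

```python
# Hypothetical sketch: neither function exists in Avro.
READER_SYMBOLS = ["alpha", "beta"]
READER_DEFAULT = "alpha"  # the enum-level "default" property


def resolve_per_spec(writer_symbol, reader_symbols):
    """Behavior the spec text describes: an unknown symbol is an error."""
    if writer_symbol not in reader_symbols:
        raise ValueError(
            "writer symbol %r is not present in the reader's enum" % writer_symbol
        )
    return writer_symbol


def resolve_with_default(writer_symbol, reader_symbols, default=None):
    """Behavior the linked code appears to have: fall back to the default."""
    if writer_symbol in reader_symbols:
        return writer_symbol
    if default is not None:
        return default  # unknown symbol silently replaced, no error
    raise ValueError(
        "writer symbol %r is not present in the reader's enum" % writer_symbol
    )


if __name__ == "__main__":
    # The writer wrote "gamma", which the reader does not define.
    print(resolve_with_default("gamma", READER_SYMBOLS, READER_DEFAULT))
    try:
        resolve_per_spec("gamma", READER_SYMBOLS)
    except ValueError as err:
        print("error:", err)
```

The question is which of these two functions the spec should describe.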