While working on AVRO-2090, I noticed what is either an implementation
bug or a specification bug in schema resolution for enumerations.

The relevant code is here: https://bit.ly/2q5tsIp.  This code uses the
reader's default symbol, if it exists, in the case where the writer's
symbol is missing from the reader's enum.

Let's think about this through an example.  Let's say the reader
defines just two symbols for an enum: "alpha" and "beta", with "alpha"
as the default.  Let's say that the writer had three symbols: "alpha",
"beta", and "gamma".  The way https://bit.ly/2q5tsIp is written, if
the reader encounters a file containing the symbol "gamma", an error
will NOT be thrown.  Instead, the reader will be told that the actual
symbol was "alpha".

Note that the Avro specification says the following about matching
enumerations: "if the writer's symbol is not present in the reader's
enum, then an error is signalled."  This would suggest that, in the
example just described, an error should be thrown, rather than the
value "alpha" being returned.  So either the code is wrong, or the
spec is wrong.
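
To make the two readings concrete, here is a minimal Python sketch of
the example.  It is only a model of the behavior described in this
message, not the Avro implementation: resolve_enum_symbol, its
use_default flag, and EnumResolutionError are hypothetical names
invented for illustration.

```python
from typing import Optional, Sequence


class EnumResolutionError(Exception):
    """Hypothetical error: the writer's symbol is unknown to the reader."""


def resolve_enum_symbol(
    writer_symbol: str,
    reader_symbols: Sequence[str],
    reader_default: Optional[str] = None,
    use_default: bool = True,
) -> str:
    """Model of enum resolution (an illustration, not Avro's code).

    use_default=True  models the behavior the linked code appears to have:
                      fall back to the reader's default symbol.
    use_default=False models the spec sentence quoted above:
                      an unknown writer symbol is always an error.
    """
    if writer_symbol in reader_symbols:
        return writer_symbol
    if use_default and reader_default is not None:
        return reader_default
    raise EnumResolutionError(
        f"writer symbol {writer_symbol!r} is not in the reader's enum"
    )


reader_symbols = ["alpha", "beta"]

# Reading as implemented: "gamma" silently resolves to the default.
as_implemented = resolve_enum_symbol("gamma", reader_symbols, "alpha")

# Reading per the quoted spec text: "gamma" is an error.
try:
    resolve_enum_symbol("gamma", reader_symbols, "alpha", use_default=False)
    per_spec = "no error"
except EnumResolutionError:
    per_spec = "error"
```

With the reader and writer from the example, the first call yields
"alpha" and the second raises, which is exactly the disagreement
between the code and the spec.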

On a related note, the current spec says nothing about a "default"
property for enumerations.  When should this property be used?  As a
"default default" for fields?  (If so, this isn't happening.)  As a
value to be used in resolution, when the writer provides a symbol
that is not (any longer) defined?  (If so, this is happening in the
code, but the spec needs an update.)  And/or should it be used in
other circumstances?

I'm willing to update docs and/or code appropriately, but can someone
indicate the intended semantics of "default" for enums?

  Raymie
