Consistent laziness when resolving partially-compatible changes
---------------------------------------------------------------

                 Key: AVRO-419
                 URL: https://issues.apache.org/jira/browse/AVRO-419
             Project: Avro
          Issue Type: Bug
          Components: spec
            Reporter: Raymie Stata


Avro schema resolution is generally "lazy" when it comes to dealing with 
incompatible changes.  If the writer writes a union of "int" and "null", and 
the reader expects just an "int", Avro doesn't raise an exception unless the 
writer _actually_ writes a "null" (and the reader attempts to read it).
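
To make the union example concrete, here is a minimal sketch in plain Python. 
It does not use the Avro library; resolve_union_lazily and 
SchemaResolutionError are hypothetical names invented for illustration.  The 
point is only that the mismatch is detected per datum, at read time, rather 
than when the two schemas are first compared.

```python
class SchemaResolutionError(Exception):
    """Hypothetical error for a datum the reader schema cannot accept."""


def resolve_union_lazily(writer_branch, value, reader_type):
    """Resolve one datum written under a union against a non-union reader.

    writer_branch is the union branch the writer actually used for this
    datum ("int" or "null" in the example above).
    """
    if writer_branch != reader_type:
        # Only now, on an actual bad datum, does resolution fail.
        raise SchemaResolutionError(
            f"writer wrote {writer_branch!r}, reader expects {reader_type!r}"
        )
    return value


# Data written under the union ["int", "null"], read as plain "int".
written = [("int", 7), ("int", 42), ("null", None)]
decoded = []
error = None
for branch, value in written:
    try:
        decoded.append(resolve_union_lazily(branch, value, "int"))
    except SchemaResolutionError as exc:
        error = exc  # raised only when the "null" is actually read
        break
```

The first two records decode normally; the error surfaces only on the third, 
which is the one that really contains a "null".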

This laziness is a powerful feature for supporting "forward compatibility" (old 
readers reading data written by new writers).  In the example just given, we 
might decide at some point that a column needs to be "nullable" but 
there's a lot of old code that assumes that it's not.  When using old code, we 
can often arrange to avoid sending the old code any new records that have 
null-values in that column.  It's powerful to allow new writers to write 
against the nullable schema and allow readers to read those records.  (For this 
to be safe, it's also important that this be _checked,_ i.e., that a run-time 
error is thrown if a bad value is passed to the reader.)

Avro is lazy in many places (e.g., in the union example just given, and for 
enumerations).  But it's not _consistently_ lazy.  I propose we comb through 
the spec and make it lazy in all places we can, unless there's a compelling 
reason not to.

Numeric types are one area where Avro is not consistently lazy.  I propose that 
we fairly liberally allow any change from one numeric type to another, and 
raise errors at runtime if bad values are found.  An "int" can be changed to a 
"long", for example, and an error is raised when a reader gets an out-of-bounds 
value.  A "double" can be changed to an "int", and an error is raised if the 
reader gets a non-integer value or an out-of-bounds value.  (I'm not sure if 
there are types beyond numerics where we could be more consistently lazy, but I 
decided to write this issue generically just in case.)
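
As a rough sketch of what such a runtime check could look like, the following 
plain Python shows a reader that expects an "int" accepting values written 
under a wider numeric type.  This illustrates the proposal, not existing Avro 
behavior; read_as_int and NumericResolutionError are hypothetical names.  The 
bounds assume Avro's 32-bit signed "int".

```python
import math

# Avro "int" is a 32-bit signed integer.
INT_MIN, INT_MAX = -(2**31), 2**31 - 1


class NumericResolutionError(Exception):
    """Hypothetical error for a numeric value the reader type cannot hold."""


def read_as_int(value):
    """Accept a value written as "long", "float" or "double" into an "int"."""
    if isinstance(value, float):
        # Reject NaN, infinities and anything with a fractional part.
        if not math.isfinite(value) or not value.is_integer():
            raise NumericResolutionError(f"{value!r} is not an integer value")
        value = int(value)
    if not INT_MIN <= value <= INT_MAX:
        raise NumericResolutionError(f"{value!r} is out of bounds for int")
    return value


ok = read_as_int(3.0)  # a "double" holding an integral, in-range value
```

Values such as 3.5 or 2**31 would raise NumericResolutionError when read, 
which is the checked, lazy behavior described above.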

(One might object that these checks are expensive, but note that they are only 
needed when the reader and writer specs don't agree.  Thus, if these checks are 
induced, then the system designer _wanted_ these checks; we're only adding 
value here, not inducing costs.)

I'm not sure if there are other a


