[ https://issues.apache.org/jira/browse/AVRO-419?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Raymie Stata reassigned AVRO-419:
---------------------------------

    Assignee: Raymie Stata

> Consistent laziness when resolving partially-compatible changes
> ---------------------------------------------------------------
>
>                 Key: AVRO-419
>                 URL: https://issues.apache.org/jira/browse/AVRO-419
>             Project: Avro
>          Issue Type: Bug
>          Components: spec
>            Reporter: Raymie Stata
>            Assignee: Raymie Stata
>            Priority: Major
>
> Avro schema resolution is generally "lazy" when it comes to dealing with 
> incompatible changes.  If the writer writes a union of "int" and "null", and 
> the reader expects just an "int", Avro doesn't raise an exception unless the 
> writer _actually_ writes a "null" (and the reader attempts to read it).
> This laziness is a powerful feature for supporting "forward compatibility" 
> (old readers reading data written by new writers).  In the example just 
> given, we might decide at some point that a column needs to be 
> "nullable", but there's a lot of old code that assumes that it's not.  When 
> using old code, we can often arrange to avoid sending it any new 
> records that have null values in that column.  It's powerful to allow new 
> writers to write against the nullable schema and allow old readers to read 
> those records.  (For this to be safe, it's also important that this be 
> _checked,_ i.e., that a run-time error is thrown if a bad value is passed to 
> the reader.)
> Avro is lazy in many places (e.g., in the union example just given, and for 
> enumerations).  But it's not _consistently_ lazy.  I propose we comb through 
> the spec and make it lazy in all places we can, unless there's a compelling 
> reason not to.
> Numeric types are one area where Avro is not consistently lazy.  I propose 
> that we fairly liberally allow any change from one numeric type to another, 
> and raise errors at run time if bad values are found.  If the writer writes 
> a "long" where the reader expects an "int", for example, an error is raised 
> only when the reader gets an out-of-bounds value.  If the writer writes a 
> "double" where the reader expects an "int", an error is raised if the reader 
> gets a non-integer value or an out-of-bounds value.  
> (I'm not sure if there are types beyond numerics where we could be more 
> consistently lazy, but I decided to write this issue generically just in 
> case.)
> (One might object that these checks are expensive, but note that they are 
> only needed when the reader and writer schemas don't agree.  Thus, if these 
> checks are induced, it's because the system designer _wanted_ them: we're 
> only adding value here, not imposing costs.)
> I'm not sure if there are other a
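
The lazy union behavior described at the top of the issue can be illustrated with a toy resolver. This is a minimal sketch, not Avro's implementation or API: `read_union_as_int` and `SchemaResolutionError` are hypothetical names, and decoded values are modeled as plain Python objects (None for "null") rather than Avro binary data.

```python
class SchemaResolutionError(Exception):
    """Hypothetical error raised when a written value cannot be read."""


def read_union_as_int(value):
    # Writer schema: ["int", "null"]; reader schema: "int" (toy model).
    # Resolution is lazy: nothing fails until a "null" is actually read.
    if value is None:
        raise SchemaResolutionError('writer wrote "null" but reader expects "int"')
    return value


# A writer using the nullable schema that never actually writes null.
records = [1, 2, 3]
print([read_union_as_int(v) for v in records])  # [1, 2, 3]
```

Reading a record whose value is None raises SchemaResolutionError, which is the "checked" part the issue asks for.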

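The numeric proposal, together with the point that checks are only needed when the schemas disagree, could look roughly like this. It is a sketch under stated assumptions: `make_numeric_reader` and `NumericResolutionError` are hypothetical, only integer reader types are modeled, and the ranges are those of Avro's 32-bit "int" and 64-bit "long".

```python
class NumericResolutionError(Exception):
    """Hypothetical error for a value that does not fit the reader's type."""


# Ranges of Avro's 32-bit "int" and 64-bit "long" types.
BOUNDS = {
    "int": (-(2**31), 2**31 - 1),
    "long": (-(2**63), 2**63 - 1),
}


def make_numeric_reader(writer_type, reader_type):
    """Return a converter from values written as writer_type to reader_type.

    Only integer reader types ("int", "long") are modeled in this sketch.
    """
    if writer_type == reader_type:
        # Reader and writer agree: no per-value check, hence no extra cost.
        return lambda value: value
    if reader_type not in BOUNDS:
        raise NotImplementedError(f"reader type {reader_type!r} not modeled here")
    low, high = BOUNDS[reader_type]

    def convert(value):
        # Lazy check: each value is validated only when it is actually read.
        if isinstance(value, float):
            if not value.is_integer():  # also rejects NaN and infinities
                raise NumericResolutionError(f"{value!r} is not an integer")
            value = int(value)
        if not low <= value <= high:
            raise NumericResolutionError(f"{value} is out of range for {reader_type!r}")
        return value

    return convert


long_to_int = make_numeric_reader("long", "int")
print(long_to_int(42))      # 42
double_to_int = make_numeric_reader("double", "int")
print(double_to_int(3.0))   # 3
```

Calling `long_to_int(2**40)` or `double_to_int(3.5)` raises NumericResolutionError, while the identity converter returned for matching schemas never inspects values at all.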

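The issue also cites enumerations as a place where Avro is already lazy. A toy model of that behavior follows, assuming (as in Avro's binary encoding) that an enum value is written as the zero-based index of its symbol in the writer's symbol list; `make_enum_reader` and `EnumResolutionError` are hypothetical names, not part of any Avro library.

```python
class EnumResolutionError(Exception):
    """Hypothetical error for a symbol the reader's enum does not contain."""


def make_enum_reader(writer_symbols, reader_symbols):
    """Return a function mapping a written enum index to a reader symbol."""
    reader_set = frozenset(reader_symbols)

    def read(index):
        symbol = writer_symbols[index]  # the index refers to the writer's symbols
        # Lazy: fail only if a symbol unknown to the reader is actually read.
        if symbol not in reader_set:
            raise EnumResolutionError(f"reader enum has no symbol {symbol!r}")
        return symbol

    return read


read_color = make_enum_reader(["RED", "GREEN", "BLUE"], ["RED", "GREEN"])
print(read_color(0))  # RED
```

Here `read_color(2)` ("BLUE") raises EnumResolutionError, but data that never contains "BLUE" is read without error.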

--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
