[ https://issues.apache.org/jira/browse/AVRO-419?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Raymie Stata reassigned AVRO-419:
---------------------------------

    Assignee: Raymie Stata

> Consistent laziness when resolving partially-compatible changes
> ---------------------------------------------------------------
>
>                 Key: AVRO-419
>                 URL: https://issues.apache.org/jira/browse/AVRO-419
>             Project: Avro
>          Issue Type: Bug
>          Components: spec
>            Reporter: Raymie Stata
>            Assignee: Raymie Stata
>            Priority: Major
>
> Avro schema resolution is generally "lazy" when it comes to dealing with
> incompatible changes. If the writer writes a union of "int" and "null", and
> the reader expects just an "int", Avro doesn't raise an exception unless the
> writer _actually_ writes a "null" (and the reader attempts to read it).
> This laziness is a powerful feature for supporting "forward compatibility"
> (old readers reading data written by new writers). In the example just
> given, we might decide at some point that a column needs to be "nullable"
> but there's a lot of old code that assumes that it's not. When using old
> code, we can often arrange to avoid sending the old code any new records
> that have null values in that column. It's powerful to allow new writers to
> write against the nullable schema and allow old readers to read those
> records. (For this to be safe, it's also important that this be _checked,_
> i.e., that a run-time error is thrown if a bad value is passed to the reader.)
> Avro is lazy in many places (e.g., in the union example just given, and for
> enumerations). But it's not _consistently_ lazy. I propose we comb through
> the spec and make it lazy in all places we can, unless there's a compelling
> reason not to.
> Numeric types are one area where Avro is not consistently lazy. I propose
> that we fairly liberally allow any change from one numeric type to another,
> and raise errors at runtime if bad values are found. An "int" can be changed
> to a "long", for example, and an error is raised when a reader gets an
> out-of-bounds value. A "double" can be changed to an "int", and an error is
> raised if the reader gets a non-integer value or an out-of-bounds value.
> (I'm not sure if there are types beyond numerics where we could be more
> consistently lazy, but I decided to write this issue generically just in
> case.)
> (One might object that these checks are expensive, but note that they are
> only needed when the reader and writer specs don't agree. Thus, if these
> checks are induced, then the system designer _wanted_ these checks; we're
> only adding value here, not inducing costs.)
> I'm not sure if there are other a

--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
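To make the proposed behavior concrete, here is a minimal sketch of a lazy, checked resolution for a reader whose schema is "int". It does not use the Avro library; `resolve_to_int` and `LazyResolutionError` are hypothetical names invented for this illustration, and the assumption is that the writer's schema may be a union of "int" and "null", a "long", or a "double". Nothing fails up front because the schemas differ; an error is raised only when an actual datum cannot be represented as an "int".

```python
# Illustrative sketch only: not Avro's implementation or API.
# Avro "int" is a 32-bit signed integer.
INT_MIN, INT_MAX = -2**31, 2**31 - 1


class LazyResolutionError(ValueError):
    """Raised when a written datum cannot be read as the reader's type."""


def resolve_to_int(datum):
    """Resolve one written datum for a reader that expects "int".

    Hypothetical helper. The check happens per datum, so a writer schema
    that merely *allows* bad values causes no error by itself.
    """
    if datum is None:
        # Writer used the "null" branch of its union.
        raise LazyResolutionError("writer wrote null, reader expects int")
    if isinstance(datum, bool) or not isinstance(datum, (int, float)):
        raise LazyResolutionError(f"not a numeric datum: {datum!r}")
    if isinstance(datum, float):
        # Writer schema was "double": accept only whole, finite values.
        # float.is_integer() is False for NaN and infinities.
        if not datum.is_integer():
            raise LazyResolutionError(f"non-integer value: {datum!r}")
        datum = int(datum)
    if not INT_MIN <= datum <= INT_MAX:
        # Writer schema was "long" or "double": value does not fit in 32 bits.
        raise LazyResolutionError(f"out-of-bounds value: {datum!r}")
    return datum


if __name__ == "__main__":
    for written in (7, 3.0, None, 2.5, 2**31):
        try:
            print(written, "->", resolve_to_int(written))
        except LazyResolutionError as err:
            print(written, "-> error:", err)
```

The checks run only on the path where reader and writer schemas differ, which matches the point made in the issue that the cost is paid only when the system designer asked for it.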