[ 
https://issues.apache.org/jira/browse/AVRO-419?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12835465#action_12835465
 ] 

Scott Carey commented on AVRO-419:
----------------------------------

I think that Avro should have good, consistent default behavior, and agree that 
this default should probably be lazy.  But these defaults also need to be safe 
and consistent across all target languages.

However, the details should really be up to the client, fully controlled by 
some sort of configuration or annotation.  Sometimes a client will want to fail 
eagerly long before a specific tuple is encountered that can't be promoted.  
Sometimes a client will want an exception to be thrown.  Sometimes a client 
will want _something else_ to happen -- perhaps a callback or an override for a 
default value or something else.  Maybe one client thinks it's completely fine 
to take a double that is larger than MAX_INT and cast it to an int with 
truncation to MAX_INT, while another wants an exception in that case, and a 
third never wants to down-cast.
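
To make those three clients concrete, here is a minimal sketch of a 
per-client policy for reading a written double as an int.  The names 
NarrowingPolicy and DoubleToInt are hypothetical and not part of Avro; the 
only library behavior relied on is Java's own narrowing cast, which saturates 
at the int bounds.

```java
// Hypothetical names throughout: none of this is part of the Avro API.
enum NarrowingPolicy {
    CLAMP,   // saturate out-of-range values at Integer.MIN_VALUE / Integer.MAX_VALUE
    THROW,   // raise an exception when the value is not exactly representable as an int
    FORBID   // never down-cast: reject double -> int resolution outright
}

final class DoubleToInt {
    private DoubleToInt() {}

    static int convert(double value, NarrowingPolicy policy) {
        switch (policy) {
            case CLAMP:
                // Java's narrowing cast saturates at the int bounds, maps NaN to 0,
                // and drops any fractional part.
                return (int) value;
            case THROW:
                if (Double.isNaN(value)
                        || value < Integer.MIN_VALUE
                        || value > Integer.MAX_VALUE
                        || value != Math.rint(value)) {
                    throw new ArithmeticException("cannot read " + value + " as int");
                }
                return (int) value;
            case FORBID:
            default:
                throw new UnsupportedOperationException("double -> int resolution is disabled");
        }
    }
}
```

A callback or default-value override would be a fourth branch of the same 
switch; the point is only that the choice belongs to the client, not to the 
decoder.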

There are lots of possibilities, and in the long run I think all those 
decisions can be configurable -- both at the schema level and via client 
specific overrides.  

This can all be achieved with fantastic performance in Java if these 'rules' 
are configured up front and compiled into a parser (fast), static state 
machine (faster), or a class generated by asm and compiled 'to the metal' by 
the JIT (fastest -- zero overhead for resolving decoders beyond the initial 
resolution/compilation cost paid once per schema resolution pair).
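
As a rough illustration of "paid once per schema resolution pair", the sketch 
below resolves the client's rule into a small reader object the first time a 
(writer, reader) type pair is seen and reuses it afterwards.  It is the 
simplest stand-in for the parser / state machine / generated class options 
above, not an implementation of any of them.  ResolvingReaders, IntReader and 
the string type names are assumptions made for the example, not Avro classes.

```java
// Hypothetical sketch: rules are fixed at construction time, and each
// (writer, reader) pair is resolved once and then served from a cache.
final class ResolvingReaders {
    // Pre-resolved conversion for one (writer, reader) type pair.
    interface IntReader { int read(double written); }

    private final java.util.Map<String, IntReader> cache = new java.util.HashMap<>();
    private final boolean clampOutOfRange;  // the client-chosen rule
    int resolutions = 0;                    // counts how often resolution actually ran

    ResolvingReaders(boolean clampOutOfRange) {
        this.clampOutOfRange = clampOutOfRange;
    }

    IntReader readerFor(String writerType, String readerType) {
        return cache.computeIfAbsent(writerType + "->" + readerType, key -> {
            resolutions++;
            return resolve(writerType, readerType);
        });
    }

    // The rule is inspected here, once; the returned reader contains no
    // per-value configuration lookups.
    private IntReader resolve(String writerType, String readerType) {
        if (!"double".equals(writerType) || !"int".equals(readerType)) {
            throw new IllegalArgumentException("unsupported pair in this sketch");
        }
        if (clampOutOfRange) {
            return written -> (int) written;  // Java's cast saturates at the int bounds
        }
        return written -> {
            if (written != Math.rint(written)
                    || written < Integer.MIN_VALUE
                    || written > Integer.MAX_VALUE) {
                throw new ArithmeticException("cannot read " + written + " as int");
            }
            return (int) written;
        };
    }
}
```

Calling readerFor("double", "int") twice returns the same reader and runs 
resolve only once, so the per-value path is a single interface call that the 
JIT is free to inline.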

> Consistent laziness when resolving partially-compatible changes
> ---------------------------------------------------------------
>
>                 Key: AVRO-419
>                 URL: https://issues.apache.org/jira/browse/AVRO-419
>             Project: Avro
>          Issue Type: Bug
>          Components: spec
>            Reporter: Raymie Stata
>
> Avro schema resolution is generally "lazy" when it comes to dealing with 
> incompatible changes.  If the writer writes a union of "int" and "null", and 
> the reader expects just an "int", Avro doesn't raise an exception unless the 
> writer _actually_ writes a "null" (and the reader attempts to read it).
> This laziness is a powerful feature for supporting "forward compatibility" 
> (old readers reading data written by new writers).  In the example just 
> given, we might decide at some point that a column needs to be 
> "nullable" but there's a lot of old code that assumes that it's not.  When 
> using old code, we can often arrange to avoid sending the old code any new 
> records that have null-values in that column.  It's powerful to allow new 
> writers to write against the nullable schema and allow readers to read those 
> records.  (For this to be safe, it's also important that this be _checked,_ 
> i.e., that a runtime error is thrown if a bad value is passed to the reader.)
> Avro is lazy in many places (e.g., in the union example just given, and for 
> enumerations).  But it's not _consistently_ lazy.  I propose we comb through 
> the spec and make it lazy in all places we can, unless there's a compelling 
> reason not to.
> Numeric types are one area where Avro is not consistently lazy.  I propose 
> that we fairly liberally allow any change from one numeric type to another, 
> and raise errors at runtime if bad values are found.  An "int" can be changed 
> to a "long", for example, and an error is raised when a reader gets an 
> out-of-bounds value.  A "double" can be changed to an "int", and an error is 
> raised if the reader gets a non-integer value or an out-of-bounds value.  
> (I'm not sure if there are types beyond numerics where we could be more 
> consistently lazy, but I decided to write this issue generically just in 
> case.)
> (One might object that these checks are expensive, but note that they are 
> only needed when the reader and writer specs don't agree.  Thus, if these 
> checks are induced, then the system designer _wanted_ these checks, we're 
> only adding value here, not inducing costs.)
> I'm not sure if there are other a
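
For reference, the lazy union case from the quoted description (writer 
schema a union of "int" and "null", reader schema a plain "int") can be 
sketched roughly as follows.  LazyUnionReader and its arguments are 
hypothetical and stand in for a real decoder; the only assumption is that 
each written union value carries the index of the branch it used.

```java
// Hypothetical sketch: nothing fails at schema-resolution time; the check
// runs per value and throws only when a "null" was actually written.
final class LazyUnionReader {
    // Branches of the writer's union schema, in declaration order.
    static final String[] WRITER_BRANCHES = {"int", "null"};

    private LazyUnionReader() {}

    static int readAsInt(int branchIndex, Integer value) {
        String branch = WRITER_BRANCHES[branchIndex];
        if (!"int".equals(branch)) {
            throw new IllegalStateException(
                "writer wrote \"" + branch + "\" but reader expects \"int\"");
        }
        return value;
    }
}
```

An eager client would instead reject the schema pair before reading any 
data, which is exactly the kind of choice argued above to be configurable.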

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.