[ 
https://issues.apache.org/jira/browse/AVRO-1340?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16325334#comment-16325334
 ] 

Adam Bellemare edited comment on AVRO-1340 at 1/13/18 9:08 PM:
---------------------------------------------------------------

Avro enum "UNKNOWN" defaults have become extremely important to our company 
recently, especially as we're using Kafka and Avro integrations extensively. 
This ticket is very relevant to what we're doing. Here are my thoughts; let me 
know if I am missing something. I've been following this thread for a while 
and I'm hoping that I can help get it moving towards some form of resolution.



Enum values have a specific meaning tied to them. Aliasing works well under 
the following conditions:

1) When the value added is entirely NEW to the data producer, and should 
therefore be aliased to UNKNOWN. If you alias it to an existing enum value you 
are redefining the data contract of that value. In this case a conversation 
should occur between the producer and the consumers of this data as it is now 
about renegotiating the data contract.

2) When the new enum values to be added are entirely a COMPLETE SUBSET of an 
existing enum value. For example, if the producer produces all 3xx 
HttpResponseCode as 300, splitting the enum value into 300, 301 and 302 and 
aliasing them all to 300 makes sense. It was always 300, and adding more 
granularity to the current schema is OK as it maps directly back to the single, 
original enum value.

The only real value I can see aliasing adding is for #2 above, as #1 is the 
same as having a default for unknown values. #2 above is a scenario that I 
have not yet encountered, and I question how common it is. Even without 
aliasing it would be possible to work around that issue, simply by creating a 
new enum entry with the newly defined enum values and eventually phasing out 
the old one. Note that this would be highly specific to the scenario where you 
need to split an enum value into a complete subset of other values. Additions 
of enum values can be done easily, as the UNKNOWN default value will simply be 
used by older reader schemas.
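
As a rough illustration of the fallback I have in mind, here is a minimal 
sketch in plain Python. It is not the Avro library API; the function name 
resolve_enum and the symbol names are made up for this example, and it only 
models the rule proposed in this ticket (use the reader's default if one is 
set, otherwise fail as today).

```python
# Hypothetical model of reader-side enum resolution; NOT the Avro API.
def resolve_enum(writer_symbol, reader_symbols, default=None):
    """Return the symbol an old reader would see for writer_symbol."""
    if writer_symbol in reader_symbols:
        # The reader knows this symbol, so it is passed through unchanged.
        return writer_symbol
    if default is not None and default in reader_symbols:
        # Unknown symbol: fall back to the reader's declared default.
        return default
    # No default declared: keep the current behaviour and signal an error.
    raise ValueError(
        f"symbol {writer_symbol!r} not in reader's enum and no default set"
    )


# An older reader schema that predates the writer's new REDIRECT symbol.
OLD_READER_SYMBOLS = ["UNKNOWN", "OK", "NOT_FOUND"]

print(resolve_enum("OK", OLD_READER_SYMBOLS, default="UNKNOWN"))        # OK
print(resolve_enum("REDIRECT", OLD_READER_SYMBOLS, default="UNKNOWN"))  # UNKNOWN
```

The point of the sketch is that the new symbol never gets mapped onto an 
existing, meaningful value; the old reader only learns that it saw something 
it does not understand.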

Redefining enum values via aliasing can be extremely dangerous. For instance, 
if HttpResponseCode = 300 has only ever meant exactly 300, then aliasing 301 
and 302 to it breaks the definition of the enum value and can have 
consequences for downstream consumers of this data. As it stands I have major 
concerns that adding aliasing to enum values will greatly weaken the "data 
contract" aspect of a given enum, as it would normalize the redefinition of 
enum values in a way that is invisible to the consumers of the data.
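
To make the concern concrete, here is a small sketch in plain Python (again 
not the Avro API; the ALIASES table, resolve_with_aliases and the HTTP_* 
symbol names are assumptions for illustration only). It shows that once 301 
and 302 are aliased to 300, an old consumer can no longer tell them apart 
from a genuine 300.

```python
# Hypothetical alias table a producer might publish; NOT part of Avro.
ALIASES = {"HTTP_301": "HTTP_300", "HTTP_302": "HTTP_300"}


def resolve_with_aliases(writer_symbol, reader_symbols, aliases):
    """Map a writer symbol onto the reader's enum, following aliases."""
    if writer_symbol in reader_symbols:
        return writer_symbol
    target = aliases.get(writer_symbol)
    if target in reader_symbols:
        # The consumer silently receives the aliased value.
        return target
    raise ValueError(f"symbol {writer_symbol!r} cannot be resolved")


# An old consumer that only knows about 200 and 300.
READER_SYMBOLS = ["HTTP_200", "HTTP_300"]

written = ["HTTP_300", "HTTP_301", "HTTP_302"]
seen = [resolve_with_aliases(s, READER_SYMBOLS, ALIASES) for s in written]
print(seen)  # ['HTTP_300', 'HTTP_300', 'HTTP_300']
```

Three distinct values on the producer side collapse into one on the consumer 
side, with nothing in the decoded data to indicate that the meaning of 
HTTP_300 has changed.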


was (Author: abellemare):
Avro enum "UNKNOWN" defaults have become extremely important to our company in 
the past while, especially as we're using Kafka and Avro integrations 
extensively. This ticket is very relevant to what we're doing. Here are my 
thoughts, let me know if I am missing something. I've been following this 
thread for a while and I'm hoping that I can help get it moving towards some 
form of resolution, otherwise we're going to have to fork our own Avro 
implementation.



enum values have a specific meaning tied to them. Aliasing works well in the 
following conditions:

1) When the value added is entirely NEW to the data producer, and should 
therefore be aliased to UNKNOWN. If you alias it to an existing enum value you 
are redefining the data contract of that value. In this case a conversation 
should occur between the producer and the consumers of this data as it is now 
about renegotiating the data contract.

2) When the new enum values to be added are entirely a COMPLETE SUBSET of 
an existing enum. For example, if the producer produces all 3xx 
HttpResponseCode as 300, splitting the enum value into 300, 301 and 302 and 
aliasing them all to 300 makes sense. It was always 300, and adding more 
granularity to the current schema is OK as it maps directly back to the single, 
original enum value.

The only real value I can see aliasing adding is for #2 above, as #1 is the 
same as having a default field for unknown values. #2 above is a scenario that 
I have not yet encountered, and I question how common it is. Without aliasing 
it would also be possible to work around that issue, simply by creating a new 
enum entry with the newly defined enum values and eventually phasing out the 
old one. Note that this would be highly specific to the scenario where you need 
to split an enum value into a complete subset of other values. Additions of 
enums can be done easily, as the UNKNOWN default value will simply be used by 
older reader schemas.

Redefining enum values via aliasing can be extremely dangerous. For instance, 
if HttpResponseCode = 300 was always ONLY just 300, then aliasing 301 and 302 
to it breaks the definition of the enum value and can have consequences for 
downstream consumers of this data. As it stands I have major concerns that 
adding aliasing to enum values will greatly weaken the "data contract" aspect 
of a given enum as it would normalize the redefinition of enum values in a way 
that is transparent to the consumers of the data. 

> use default to allow old readers to specify default enum value when 
> encountering new enum symbols
> -------------------------------------------------------------------------------------------------
>
>                 Key: AVRO-1340
>                 URL: https://issues.apache.org/jira/browse/AVRO-1340
>             Project: Avro
>          Issue Type: Improvement
>          Components: spec
>         Environment: N/A
>            Reporter: Jim Donofrio
>            Priority: Minor
>
> The schema resolution page says:
> > if both are enums:
> > if the writer's symbol is not present in the reader's enum, then an
> > error is signalled.
> This makes it difficult to use enums because you can never add an enum value 
> and keep old readers compatible. Why not use the default option to refer to 
> one of the enum values, so that when an old reader encounters an enum ordinal 
> it does not recognize, it can fall back to the default provided by the 
> schema? If the old schema does not provide a default, then the old reader 
> can continue to fail as it does today.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)
