[ https://issues.apache.org/jira/browse/AVRO-1340?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16325334#comment-16325334 ]
Adam Bellemare edited comment on AVRO-1340 at 1/13/18 9:08 PM:
---------------------------------------------------------------

Avro enum "UNKNOWN" defaults have become extremely important to our company lately, especially as we're using Kafka and Avro integrations extensively. This ticket is very relevant to what we're doing. Here are my thoughts; let me know if I am missing something. I've been following this thread for a while and I'm hoping that I can help get it moving towards some form of resolution.

Enum values have a specific meaning tied to them. Aliasing works well in the following conditions:

1) When the value added is entirely NEW to the data producer, and should therefore be aliased to UNKNOWN. If you alias it to an existing enum value, you are redefining the data contract of that value. In this case a conversation should occur between the producer and the consumers of this data, as it is now about renegotiating the data contract.

2) When the new enum values to be added are entirely a COMPLETE SUBSET of an existing enum value. For example, if the producer produces all 3xx HttpResponseCode as 300, splitting the enum value into 300, 301 and 302 and aliasing them all to 300 makes sense. It was always 300, and adding more granularity to the current schema is OK as it maps directly back to the single, original enum value.

The only real value I can see aliasing adding is for #2 above, as #1 is the same as having a default field for unknown values. #2 above is a scenario that I have not yet encountered, and I question how common it is. Without aliasing it would also be possible to work around that issue, simply by creating a new enum entry with the newly defined enum values and eventually phasing out the old one. Note that this would be highly specific to the scenario where you need to split an enum value into a complete subset of other values.
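To make the two cases above concrete, here is a minimal Python sketch of how a reader might resolve a writer's enum symbol. It is only an illustration under stated assumptions: `resolve_symbol`, its `default` and `aliases` parameters, and the `HTTP_*` symbol names are all hypothetical and are not part of any Avro library API or of the spec as it stood at the time of this comment.

```python
from typing import Mapping, Optional, Sequence


def resolve_symbol(
    writer_symbol: str,
    reader_symbols: Sequence[str],
    default: Optional[str] = None,
    aliases: Optional[Mapping[str, str]] = None,
) -> str:
    """Map a writer's enum symbol onto one the reader knows (hypothetical helper)."""
    # The reader already knows this symbol: nothing to resolve.
    if writer_symbol in reader_symbols:
        return writer_symbol

    # Case 2 (assumed aliasing behaviour): the new symbol is declared to be
    # a finer-grained piece of an existing symbol, so map it back to that one.
    if aliases is not None and aliases.get(writer_symbol) in reader_symbols:
        return aliases[writer_symbol]

    # Case 1: the symbol is entirely new, so fall back to the reader's default.
    if default is not None and default in reader_symbols:
        return default

    # No alias and no default: fail, as schema resolution does today.
    raise ValueError(f"symbol {writer_symbol!r} is not in the reader's enum")


old_reader = ["HTTP_200", "HTTP_300", "UNKNOWN"]
split_3xx = {"HTTP_301": "HTTP_300", "HTTP_302": "HTTP_300"}

print(resolve_symbol("HTTP_404", old_reader, default="UNKNOWN"))  # UNKNOWN
print(resolve_symbol("HTTP_301", old_reader, default="UNKNOWN", aliases=split_3xx))  # HTTP_300
```

Note that the alias lookup silently turns `HTTP_301` into `HTTP_300` for the old reader, which is only safe if `HTTP_300` always meant "any 3xx" to begin with.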
Additions of enums can be done easily, as the UNKNOWN default value will simply be used by older reader schemas. Redefining enum values via aliasing can be extremely dangerous. For instance, if HttpResponseCode = 300 was always ONLY 300, then aliasing 301 and 302 to it breaks the definition of the enum value and can have consequences for downstream consumers of this data.

As it stands, I have major concerns that adding aliasing to enum values will greatly weaken the "data contract" aspect of a given enum, as it would normalize the redefinition of enum values in a way that is transparent to the consumers of the data.

was (Author: abellemare):

Avro enum "UNKNOWN" defaults have become extremely important to our company lately, especially as we're using Kafka and Avro integrations extensively. This ticket is very relevant to what we're doing. Here are my thoughts; let me know if I am missing something. I've been following this thread for a while and I'm hoping that I can help get it moving towards some form of resolution, otherwise we're going to have to fork our own Avro implementation.

Enum values have a specific meaning tied to them. Aliasing works well in the following conditions:

1) When the value added is entirely NEW to the data producer, and should therefore be aliased to UNKNOWN. If you alias it to an existing enum value, you are redefining the data contract of that value. In this case a conversation should occur between the producer and the consumers of this data, as it is now about renegotiating the data contract.

2) When the new enum values to be added are entirely a COMPLETE SUBSET of an existing enum value. For example, if the producer produces all 3xx HttpResponseCode as 300, splitting the enum value into 300, 301 and 302 and aliasing them all to 300 makes sense. It was always 300, and adding more granularity to the current schema is OK as it maps directly back to the single, original enum value.
The only real value I can see aliasing adding is for #2 above, as #1 is the same as having a default field for unknown values. #2 above is a scenario that I have not yet encountered, and I question how common it is. Without aliasing it would also be possible to work around that issue, simply by creating a new enum entry with the newly defined enum values and eventually phasing out the old one. Note that this would be highly specific to the scenario where you need to split an enum value into a complete subset of other values.

Additions of enums can be done easily, as the UNKNOWN default value will simply be used by older reader schemas. Redefining enum values via aliasing can be extremely dangerous. For instance, if HttpResponseCode = 300 was always ONLY 300, then aliasing 301 and 302 to it breaks the definition of the enum value and can have consequences for downstream consumers of this data.

As it stands, I have major concerns that adding aliasing to enum values will greatly weaken the "data contract" aspect of a given enum, as it would normalize the redefinition of enum values in a way that is transparent to the consumers of the data.

> use default to allow old readers to specify default enum value when encountering new enum symbols
> -------------------------------------------------------------------------------------------------
>
>                 Key: AVRO-1340
>                 URL: https://issues.apache.org/jira/browse/AVRO-1340
>             Project: Avro
>          Issue Type: Improvement
>          Components: spec
>         Environment: N/A
>            Reporter: Jim Donofrio
>            Priority: Minor
>
> The schema resolution page says:
> > if both are enums:
> > if the writer's symbol is not present in the reader's enum, then an error is signalled.
> This makes it difficult to use enums because you can never add an enum value and keep old readers compatible.
> Why not use the default option to refer to one of the enum values, so that when an old reader encounters an enum ordinal it does not recognize, it can default to the one optionally provided by the schema? If the old schema does not provide a default, then the older reader can continue to fail as it does today.

--
This message was sent by Atlassian JIRA
(v6.4.14#64029)