Hi,
I'm working on a project where we are putting Avro records, serialized as
messages, into Kafka. The schemas are made available via a schema registry
of some sort.
Because Kafka stores the messages for a longer period (weeks) we have two
common scenarios that occur when a new version of the schema is introduced
(i.e. from V1 to V2).
1) A V2 producer is released and a V1 consumer must be able to read the
records.
2) A 'new' V2 consumer is released a few days after the V2 producer started
creating records. The V2 consumer starts reading Kafka "from the beginning"
and as a consequence first has to go through a set of V1 records.
So in this use case we need schema evolution in two directions.
To make sure it all works as expected I did some experiments and found that
these requirements are all doable except when you need an enum.
Evolving in 'two directions' turns out to have a problem with changing the
values of an enum.
You cannot write an enum { 'A', 'B', 'C' } and then read it with the schema
enum { 'A', 'B' }, because the reader does not know the value 'C'.
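To illustrate why only one direction works, here is a minimal sketch in
plain Java. It is a simplified model and not the Avro library: it only
assumes that an enum value is encoded as the index of the symbol in the
writer's schema and that the reader resolves it by symbol name. The class
and method names are made up for this example.

```java
import java.util.List;

// Simplified model of Avro enum resolution; this is NOT the Avro library.
class EnumResolutionSketch {

    // An enum value is encoded as the index of the symbol in the writer's schema.
    static int write(List<String> writerSymbols, String symbol) {
        int index = writerSymbols.indexOf(symbol);
        if (index < 0) {
            throw new IllegalArgumentException("Not a writer symbol: " + symbol);
        }
        return index;
    }

    // The reader maps the index back to the writer's symbol name and then
    // needs that name to exist in its own schema.
    static String read(List<String> writerSymbols, List<String> readerSymbols, int encoded) {
        String symbol = writerSymbols.get(encoded);
        if (!readerSymbols.contains(symbol)) {
            throw new IllegalStateException(
                "No match for symbol " + symbol + " in reader enum " + readerSymbols);
        }
        return symbol;
    }

    public static void main(String[] args) {
        List<String> v1 = List.of("A", "B");
        List<String> v2 = List.of("A", "B", "C");

        // Scenario 2: a V2 consumer reading old V1 records works.
        System.out.println(read(v1, v2, write(v1, "B")));

        // Scenario 1: a V1 consumer reading a V2 record that holds 'C' fails.
        try {
            read(v2, v1, write(v2, "C"));
        } catch (IllegalStateException e) {
            System.out.println("V1 reader failed: " + e.getMessage());
        }
    }
}
```

The V1 reader keeps working for as long as the V2 producer only writes 'A'
or 'B'; it breaks on the first record that holds 'C'.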
So I was thinking about a possible way to make this easier for the
developer.
The current idea that I want your opinion on:
1) In the IDL we add a way of directing that we want the enum to be stored
in a different way in the schema. I was thinking about something like
either defining a new type like 'string enum' or perhaps using an annotation
of some sort.
2) The 'string enum' is mapped into the actual schema as a string (which
can contain ANY value). So anyone using the JSON schema can simply read it
because it is a string.
3) The generated code that is used to set/change the value enforces that
only the allowed values can be set.
This way a 'reader' can read any value, and the schema is compatible in all
directions.
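A rough sketch of what the generated code from point 3 could look like, in
plain Java. Everything here is hypothetical: the class name ExampleRecord,
the field 'state' and the method names are only for illustration, and this
is not what the Avro compiler generates today. The assumption is that the
field is a plain string in the schema and that only the setter knows the
allowed values.

```java
import java.util.Set;

// Hypothetical generated class for a record with one 'string enum' field.
// In the schema the field 'state' would simply be a string.
class ExampleRecord {

    // Values allowed by the version of the IDL this class was generated from.
    static final Set<String> ALLOWED_STATES = Set.of("A", "B");

    private String state;

    // Setter used by application code: only the allowed values are accepted.
    void setState(String value) {
        if (!ALLOWED_STATES.contains(value)) {
            throw new IllegalArgumentException(
                "Value '" + value + "' is not one of " + ALLOWED_STATES);
        }
        this.state = value;
    }

    // Path used when deserializing: any string is accepted, so a record
    // written with a newer set of values can still be read.
    void readState(String rawValue) {
        this.state = rawValue;
    }

    String getState() {
        return state;
    }

    // Lets the application decide what to do with a value it does not know.
    boolean hasKnownState() {
        return ALLOWED_STATES.contains(state);
    }
}
```

With this shape a V1 consumer that reads a record containing 'C' gets the
string "C" and can check hasKnownState() itself, instead of failing during
deserialization.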
What do you guys think?
Is this an idea worth trying out?
--
Best regards / Met vriendelijke groeten,
Niels Basjes