Hi Ryan,

Just a little bump ^^. Could you (or anyone else) please give me a little feedback? Thanks in advance.
Regards,
Yacine.

2016-03-21 17:36 GMT+01:00 Yacine Benabderrahmane <[email protected]>:

> Hi Ryan,
>
> Thank you for your feedback. Below I will try to give you some more details about the problem being addressed.
>
> But first, a brief reminder of the context. Avro was chosen in this project (and surely by many others) in particular for one very important feature: forward and backward compatibility management across the schema life-cycle. Our development model makes intensive use of this feature: many heavy developments are carried out in parallel streams inside feature teams that share the same schema, provided the schema's evolution complies with the stated compatibility rules. This implies that every entity supported by the Avro schema must support two-way compatibility, including unions. However, in the special case of unions, this two-way compatibility is not well supported by the current rules. Let me explain the basis of our point of view; it is quite simple.
>
> The use case is to have, for example,
> - a first union version A:
>     { "name": "Vehicle",
>       "type": ["null", "Car"] }
> - a new version of it, B:
>     { "name": "Vehicle",
>       "type": ["null", "Car", "Bus"] }
>
> To be forward compatible, an evolution of the union schema must guarantee that an old reader reading with A can read data written with the new schema B. Getting an error simply means that the forward-compatibility feature is broken. But this is not actually the case (and this behavior is not suitable), because the old reader has a correct schema, and that schema has evolved naturally to version B to incorporate a new Vehicle type. Not knowing this new type must not produce an error; it should just give the reader a default value, which means: "either the data is not there, or you do not know how to handle it".
>
> This is designed while keeping in mind that, in object-oriented code modeling, a union field is seen as a class member of the most generic type ("Any" in Scala, "Object" in Java 5+, ...). It is therefore natural for a modeler/programmer to handle the case of not getting the expected types by falling back on a default value of a known type. To give a more complete specification, the new compatibility mode imposes one rule: the union's default value must not change across versions, and the corresponding type must be placed at the top of the list of types. Handling this once and for all at the very beginning of the schema life-cycle is much easier for the development streams than obliging a number of teams, some of which are simply no longer in place, to update all their code just because another dev team has deployed a new version of the union in the schema.
>
> Now, to be backward compatible, the reader with B must always be able to read data written with schema A; even if the type included in the data is not known, it gets the default value rather than an error.
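To make the compatibility gap described above concrete, here is a minimal, self-contained Java sketch against the generic Avro API. The full schemas (a containing record named Garage and concrete Car/Bus records) are my own elaboration of the Vehicle example, not taken from the thread. Writing a payload that selects the new Bus branch with schema B and then reading it with the older schema A is expected to fail under the standard resolution rules, which is exactly the forward-compatibility break being discussed:

  import java.io.ByteArrayOutputStream;

  import org.apache.avro.AvroTypeException;
  import org.apache.avro.Schema;
  import org.apache.avro.generic.GenericData;
  import org.apache.avro.generic.GenericDatumReader;
  import org.apache.avro.generic.GenericDatumWriter;
  import org.apache.avro.generic.GenericRecord;
  import org.apache.avro.io.BinaryEncoder;
  import org.apache.avro.io.Decoder;
  import org.apache.avro.io.DecoderFactory;
  import org.apache.avro.io.EncoderFactory;

  public class UnionForwardCompatDemo {
    // Hypothetical schemas elaborating the Vehicle example from the thread.
    static final String SCHEMA_A =
        "{\"type\":\"record\",\"name\":\"Garage\",\"fields\":["
        + "{\"name\":\"vehicle\",\"type\":[\"null\","
        + "{\"type\":\"record\",\"name\":\"Car\",\"fields\":[{\"name\":\"plate\",\"type\":\"string\"}]}"
        + "],\"default\":null}]}";
    static final String SCHEMA_B =
        "{\"type\":\"record\",\"name\":\"Garage\",\"fields\":["
        + "{\"name\":\"vehicle\",\"type\":[\"null\","
        + "{\"type\":\"record\",\"name\":\"Car\",\"fields\":[{\"name\":\"plate\",\"type\":\"string\"}]},"
        + "{\"type\":\"record\",\"name\":\"Bus\",\"fields\":[{\"name\":\"line\",\"type\":\"string\"}]}"
        + "],\"default\":null}]}";

    public static void main(String[] args) throws Exception {
      Schema writerSchema = new Schema.Parser().parse(SCHEMA_B);
      Schema readerSchema = new Schema.Parser().parse(SCHEMA_A);

      // The writer (schema B) selects the new Bus branch of the union.
      Schema busSchema = writerSchema.getField("vehicle").schema().getTypes().get(2);
      GenericRecord bus = new GenericData.Record(busSchema);
      bus.put("line", "42");
      GenericRecord garage = new GenericData.Record(writerSchema);
      garage.put("vehicle", bus);

      ByteArrayOutputStream out = new ByteArrayOutputStream();
      BinaryEncoder encoder = EncoderFactory.get().binaryEncoder(out, null);
      new GenericDatumWriter<GenericRecord>(writerSchema).write(garage, encoder);
      encoder.flush();

      // The reader is still on schema A, so the Bus branch is unknown to it.
      Decoder decoder = DecoderFactory.get().binaryDecoder(out.toByteArray(), null);
      try {
        new GenericDatumReader<GenericRecord>(writerSchema, readerSchema).read(null, decoder);
      } catch (AvroTypeException e) {
        // Standard rules: an unknown union branch is an error, not a default value.
        System.out.println("Old reader failed as described: " + e.getMessage());
      }
    }
  }

Under the mode proposed in the thread, the same read would instead return the union's default value (null, the first branch).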
> I understand that getting an error could make sense when the requested field is not present. However, this behavior:
>
> - is very restrictive: it obliges the old reader to update its code to integrate the new schema, while it may not be in a position to do so for many reasons (the development stream of the next delivery is not finished, not started, or not even planned, as with old and stable code);
> - breaks the forward-compatibility feature: the old reader is not able to read the new version of the union without getting an error;
> - breaks the backward-compatibility feature: the new reader is not able to read an old version containing unknown types of the union without getting an error.
>
> By the way, what exactly do you mean by "pushing evolution lower" and "update the record"? Could you please give me an example of the trick you are talking about?
>
> To be a bit more precise, we are not aiming to use a "trick". This life-cycle management should be part of a standard, so as to keep the software development clean, production-safe and compliant with a complex product road-map.
>
> Finally, you seem to be concerned about the "significant change to the current evolution rules". We actually do not change these rules at all; they stay exactly the same. All we propose is to introduce a *mode* in which the rules of union compatibility change. This mode is materialized by a minimal, thin impact on the existing classes, without any change in their behavior; all the logic of the new compatibility mode is implemented in new classes that must be invoked explicitly. But you will see this better in the code patch.
>
> Looking forward to reading your feedback and answers.
>
> Regards,
> Yacine.
>
>
> 2016-03-17 19:00 GMT+01:00 Ryan Blue <[email protected]>:
>
>> Hi Yacine,
>>
>> Thanks for the proposal. It sounds interesting, but I want to make sure there's a clear use case for this because it's a significant change to the current evolution rules. Right now we guarantee that a reader will get an error if the data has an unknown union branch rather than getting a default value. I think that makes sense: if the reader is requesting a field, it should get the actual datum for it rather than a default because it doesn't know how to handle it.
>>
>> Could you give us an example use case that requires this new logic?
>>
>> I just want to make sure we can't solve your problem another way. For example, pushing evolution lower in the schema usually does the trick: rather than having ["null", "RecordV1"] => ["null", "RecordV1", "RecordV2"], it is usually better to update the record so that older readers can ignore the new fields.
>>
>> Thanks,
>>
>> rb
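For readers following the thread, this is presumably the kind of record-level evolution Ryan is referring to (the Vehicle record and its fields are purely illustrative, not taken from the thread). Instead of adding a new branch to the union, the union stays ["null", "Vehicle"] and the Vehicle record itself gains a new, defaulted field:

  Old record version (the surrounding union stays ["null", "Vehicle"]):

    {"type": "record", "name": "Vehicle", "fields": [
      {"name": "plate", "type": "string"}
    ]}

  New record version, adding a defaulted optional field instead of a new union branch:

    {"type": "record", "name": "Vehicle", "fields": [
      {"name": "plate", "type": "string"},
      {"name": "category", "type": ["null", "string"], "default": null}
    ]}

With this shape of evolution, readers on the old record version simply skip the extra "category" field, and readers on the new version fill it with the default when reading old data, so both directions stay compatible without touching the union itself.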
>> On Mon, Mar 14, 2016 at 7:30 AM, Yacine Benabderrahmane <[email protected]> wrote:
>>
>> > Hi all,
>> >
>> > In order to provide a solution to the union schema evolution problem, as raised earlier in the thread "add a type to a union" <http://search-hadoop.com/m/F2svI1IXrQS1bIFgU1/union+evolution&subj=add+a+type+to+a+union> on the user mailing list, we decided, for the needs of the reactive architecture we have implemented for one of our clients, to implement an evolution of Avro's compatibility principle for unions. As a reminder, the question asked was how to handle the case where a reader, using an old version of a schema that includes a union, reads data written with a new version of the schema in which a type has been added to the union.
>> >
>> > As answered by Martin Kleppmann in that thread, one way to handle this kind of evolution (a new version of the schema adds a new type to a union) would be to ensure that all the development streams have integrated the new schema B before deploying it to the IT schema repository. However, in large organizations involving strongly uncorrelated teams (from the product life-cycle point of view), this approach turns out to be quite impracticable, causing production-stream congestion, blocking behavior between teams, and a host of other unwanted, counter-agile / counter-reactive phenomena...
>> >
>> > Therefore, we had to implement a new *compatibility mode* for unions, while taking care to comply with the following rules:
>> >
>> > 1. Clear compatibility rules are stated and integrated for this compatibility mode
>> > 2. The standard Avro behavior must be kept intact
>> > 3. The whole evolution must be implemented without introducing any regression (all existing tests of the Avro stack must succeed)
>> > 4. The code impact on the Avro stack must be minimized
>> >
>> > Just to give you a very brief overview (as I don't know whether this is the right place for a fully detailed description), the evolution addresses the typical problem where two development streams use the same schema but in different versions, briefly described as follows:
>> >
>> > - The first development stream, called "DevA", uses version A of a schema which includes a union referencing two types, say "null" and "string". The default value is set to null.
>> > - The second development stream, called "DevB", uses version B, an evolution of version A that adds a reference to a new type, say "long", to the former union (which makes it "null", "string" and "long").
>> > - When schema B is deployed to the schema registry (in our case, the IO Confluent Schema Registry) after version A:
>> >   - The stream "DevA" must be able to read with schema A, even if the data has been written using schema B with the type "long" in the union. In the latter case, the read value is the union's default value.
>> >   - The stream "DevB" must be able to read/write with schema B, even if it writes the data using the type "long" in the union.
>> >
>> > The evolution we implemented for this mode includes rules based on the principles stated in the Avro documentation. It is even more powerful than shown in the few lines above, as it enables readers to get the union's default value whenever the schema used for reading does not contain the type used by the writer in the union. This achieves a new mode of forward/backward compatibility. The evolution is currently working perfectly and should be in production in the coming weeks. We have also made an evolution of the IO Confluent Schema Registry stack to support it, again in a transparent manner (we intend to contribute to that stack as well, in a second/parallel step).
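The patch itself is not shown in this thread, so the following is only a sketch, against the public org.apache.avro.Schema API, of the union resolution rule the proposal appears to describe: match the writer's branch in the reader's union if possible, otherwise fall back to the union's default (its first branch, which the proposal requires to stay stable across versions) instead of raising an error. The class and method names are hypothetical.

  import java.util.List;

  import org.apache.avro.Schema;

  // Sketch only: NOT the patch discussed in the thread, just an illustration
  // of the lenient union rule described above.
  public class LenientUnionResolution {

    /**
     * Returns the position of the writer's branch inside the reader's union,
     * or -1 when the reader does not know that branch. Under the proposed
     * compatibility mode, -1 would mean "use the union's default value"
     * rather than signalling an error as the standard resolution rules do.
     */
    static int resolveBranch(Schema readerUnion, Schema writerBranch) {
      List<Schema> branches = readerUnion.getTypes();
      for (int i = 0; i < branches.size(); i++) {
        if (branches.get(i).getFullName().equals(writerBranch.getFullName())) {
          return i;
        }
      }
      return -1; // unknown branch: fall back to the union's default value
    }
  }

Whether logic like this would live in new reader classes or hook into the existing resolving machinery is exactly the kind of detail the code patch mentioned in the thread would have to show.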
>> > With the objective of contributing this new compatibility mode for unions to the Avro stack, I have some questions about the procedure:
>> >
>> > 1. How should I submit the contribution proposal? Should I provide a patch directly in JIRA and dive into the details there?
>> > 2. The base version of this evolution is 1.7.7; is it still eligible for contribution evaluation?
>> >
>> > Thanks in advance; looking forward to hearing from you and giving you more details.
>> >
>> > Kind Regards,
>> > --
>> > *Yacine Benabderrahmane*
>> > Architect
>> > *OCTO Technology*
>> > <http://www.octo.com>
>> > -----------------------------------------------
>> > Tel : +33 6 10 88 25 98
>> > 50 avenue des Champs Elysées
>> > 75008 PARIS
>> > www.octo.com
>>
>>
>> --
>> Ryan Blue
>> Software Engineer
>> Netflix
