Hi Ryan,

Just a little bump ^^. Could you (or anyone else) please give me a little feedback? Thanks in advance.
Regards,
Yacine.

2016-03-21 17:36 GMT+01:00 Yacine Benabderrahmane <[email protected]>:

> Hi Ryan,
>
> Thank you for your feedback. Below I will try to give you some more details about the problem being addressed.
>
> But first, a brief reminder of the context. Avro was chosen in this project (and surely by many others) in particular for one very important feature: forward and backward compatibility management across the schema life-cycle. Our development model makes intensive use of this feature: many heavy developments are carried out in parallel streams inside feature teams that share the same schema, provided the schema's evolution complies with the stated compatibility rules. This implies that every entity supported by the Avro schema must support two-way compatibility, including unions. However, in the special case of unions, this two-way compatibility is not well supported by the current rules. Let me explain the basis of our point of view; it is quite simple.
>
> The use case is to have, for example,
> - a first union version A:
>     { "name": "Vehicle",
>       "type": ["null", "Car"] }
> - a new version of it, B:
>     { "name": "Vehicle",
>       "type": ["null", "Car", "Bus"] }
>
> To be forward compatible, an evolution of the union schema must guarantee that an old reader reading with A can read data written with the new schema B. Getting an error simply means that the forward-compatibility feature is broken. But this is not actually the case (and this behavior is not suitable), because the old reader has a correct schema, and that schema has evolved naturally to version B to incorporate a new Vehicle type. Not knowing this new type must not produce an error; it should just give the reader a default value, which means: "either the data is not there, or you do not know how to handle it".
>
> This is designed while keeping in mind that, in object-oriented code modeling, a union field is seen as a class member of the most generic type ("Any" in Scala, "Object" in Java 5+, ...). It is therefore natural for a modeler/programmer to handle the case of not getting the expected types by falling back on a default value of a known type. To give a more complete specification, the new compatibility mode imposes one rule: the union's default value must not change across versions, and the corresponding type must be placed at the top of the list of types. Handling this once and for all at the very beginning of the schema life-cycle is much easier for the development streams than obliging a number of teams, some of which are simply no longer in place, to update all their code just because another dev team has deployed a new version of the union in the schema.
>
> Now, to be backward compatible, the reader with B must always be able to read data written with schema A; even if the type included in the data is not known, it gets the default value rather than an error.
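To make the compatibility gap described above concrete, here is a minimal, self-contained Java sketch against the generic Avro API. The full schemas (a containing record named Garage and concrete Car/Bus records) are my own elaboration of the Vehicle example, not taken from the thread. Writing a payload that selects the new Bus branch with schema B and then reading it with the older schema A is expected to fail under the standard resolution rules, which is exactly the forward-compatibility break being discussed:

  import java.io.ByteArrayOutputStream;

  import org.apache.avro.AvroTypeException;
  import org.apache.avro.Schema;
  import org.apache.avro.generic.GenericData;
  import org.apache.avro.generic.GenericDatumReader;
  import org.apache.avro.generic.GenericDatumWriter;
  import org.apache.avro.generic.GenericRecord;
  import org.apache.avro.io.BinaryEncoder;
  import org.apache.avro.io.Decoder;
  import org.apache.avro.io.DecoderFactory;
  import org.apache.avro.io.EncoderFactory;

  public class UnionForwardCompatDemo {
    // Hypothetical schemas elaborating the Vehicle example from the thread.
    static final String SCHEMA_A =
        "{\"type\":\"record\",\"name\":\"Garage\",\"fields\":["
        + "{\"name\":\"vehicle\",\"type\":[\"null\","
        + "{\"type\":\"record\",\"name\":\"Car\",\"fields\":[{\"name\":\"plate\",\"type\":\"string\"}]}"
        + "],\"default\":null}]}";
    static final String SCHEMA_B =
        "{\"type\":\"record\",\"name\":\"Garage\",\"fields\":["
        + "{\"name\":\"vehicle\",\"type\":[\"null\","
        + "{\"type\":\"record\",\"name\":\"Car\",\"fields\":[{\"name\":\"plate\",\"type\":\"string\"}]},"
        + "{\"type\":\"record\",\"name\":\"Bus\",\"fields\":[{\"name\":\"line\",\"type\":\"string\"}]}"
        + "],\"default\":null}]}";

    public static void main(String[] args) throws Exception {
      Schema writerSchema = new Schema.Parser().parse(SCHEMA_B);
      Schema readerSchema = new Schema.Parser().parse(SCHEMA_A);

      // The writer (schema B) selects the new Bus branch of the union.
      Schema busSchema = writerSchema.getField("vehicle").schema().getTypes().get(2);
      GenericRecord bus = new GenericData.Record(busSchema);
      bus.put("line", "42");
      GenericRecord garage = new GenericData.Record(writerSchema);
      garage.put("vehicle", bus);

      ByteArrayOutputStream out = new ByteArrayOutputStream();
      BinaryEncoder encoder = EncoderFactory.get().binaryEncoder(out, null);
      new GenericDatumWriter<GenericRecord>(writerSchema).write(garage, encoder);
      encoder.flush();

      // The reader is still on schema A, so the Bus branch is unknown to it.
      Decoder decoder = DecoderFactory.get().binaryDecoder(out.toByteArray(), null);
      try {
        new GenericDatumReader<GenericRecord>(writerSchema, readerSchema).read(null, decoder);
      } catch (AvroTypeException e) {
        // Standard rules: an unknown union branch is an error, not a default value.
        System.out.println("Old reader failed as described: " + e.getMessage());
      }
    }
  }

Under the mode proposed in the thread, the same read would instead return the union's default value (null, the first branch).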
> I understand that getting an error could make sense when the requested field is not present. However, this behavior:
>
> - is very restrictive: it obliges the old reader to update its code to integrate the new schema, while it may not be in a position to do so for many reasons (the development stream of the next delivery is not finished, not started, or not even planned, as with old and stable code);
> - breaks the forward-compatibility feature: the old reader is not able to read the new version of the union without getting an error;
> - breaks the backward-compatibility feature: the new reader is not able to read an old version containing unknown types of the union without getting an error.
>
> By the way, what exactly do you mean by "pushing evolution lower" and "update the record"? Could you please give me an example of the trick you are talking about?
>
> To be a bit more precise, we are not aiming to use a "trick". This life-cycle management should be part of a standard, so as to keep the software development clean, production-safe and compliant with a complex product road-map.
>
> Finally, you seem to be concerned about the "significant change to the current evolution rules". We actually do not change these rules at all; they stay exactly the same. All we propose is to introduce a *mode* in which the rules of union compatibility change. This mode is materialized by a minimal, thin impact on the existing classes, without any change in their behavior; all the logic of the new compatibility mode is implemented in new classes that must be invoked explicitly. But you will see this better in the code patch.
>
> Looking forward to reading your feedback and answers.
>
> Regards,
> Yacine.
>
>
> 2016-03-17 19:00 GMT+01:00 Ryan Blue <[email protected]>:
>
>> Hi Yacine,
>>
>> Thanks for the proposal. It sounds interesting, but I want to make sure there's a clear use case for this because it's a significant change to the current evolution rules. Right now we guarantee that a reader will get an error if the data has an unknown union branch rather than getting a default value. I think that makes sense: if the reader is requesting a field, it should get the actual datum for it rather than a default because it doesn't know how to handle it.
>>
>> Could you give us an example use case that requires this new logic?
>>
>> I just want to make sure we can't solve your problem another way. For example, pushing evolution lower in the schema usually does the trick: rather than having ["null", "RecordV1"] => ["null", "RecordV1", "RecordV2"], it is usually better to update the record so that older readers can ignore the new fields.
>>
>> Thanks,
>>
>> rb
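For readers following the thread, this is presumably the kind of record-level evolution Ryan is referring to (the Vehicle record and its fields are purely illustrative, not taken from the thread). Instead of adding a new branch to the union, the union stays ["null", "Vehicle"] and the Vehicle record itself gains a new, defaulted field:

  Old record version (the surrounding union stays ["null", "Vehicle"]):

    {"type": "record", "name": "Vehicle", "fields": [
      {"name": "plate", "type": "string"}
    ]}

  New record version, adding a defaulted optional field instead of a new union branch:

    {"type": "record", "name": "Vehicle", "fields": [
      {"name": "plate", "type": "string"},
      {"name": "category", "type": ["null", "string"], "default": null}
    ]}

With this shape of evolution, readers on the old record version simply skip the extra "category" field, and readers on the new version fill it with the default when reading old data, so both directions stay compatible without touching the union itself.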
>> On Mon, Mar 14, 2016 at 7:30 AM, Yacine Benabderrahmane <[email protected]> wrote:
>>
>> > Hi all,
>> >
>> > In order to provide a solution to the union schema evolution problem, as raised earlier in the thread "add a type to a union" <http://search-hadoop.com/m/F2svI1IXrQS1bIFgU1/union+evolution&subj=add+a+type+to+a+union> on the user mailing list, we decided, for the needs of the reactive architecture we have implemented for one of our clients, to implement an evolution of Avro's compatibility principle for unions. As a reminder, the question asked was how to handle the case where a reader, using an old version of a schema that includes a union, reads data written with a new version of the schema in which a type has been added to the union.
>> >
>> > As answered by Martin Kleppmann in that thread, one way to handle this kind of evolution (a new version of the schema adds a new type to a union) would be to ensure that all the development streams have integrated the new schema B before deploying it to the IT schema repository. However, in large organizations involving strongly uncorrelated teams (from the product life-cycle point of view), this approach turns out to be quite impracticable, causing production-stream congestion, blocking behavior between teams, and a host of other unwanted, counter-agile / counter-reactive phenomena...
>> >
>> > Therefore, we had to implement a new *compatibility mode* for unions, while taking care to comply with the following rules:
>> >
>> > 1. Clear compatibility rules are stated and integrated for this compatibility mode
>> > 2. The standard Avro behavior must be kept intact
>> > 3. The whole evolution must be implemented without introducing any regression (all existing tests of the Avro stack must succeed)
>> > 4. The code impact on the Avro stack must be minimized
>> >
>> > Just to give you a very brief overview (as I don't know whether this is the right place for a fully detailed description), the evolution addresses the typical problem where two development streams use the same schema but in different versions, briefly described as follows:
>> >
>> > - The first development stream, called "DevA", uses version A of a schema which includes a union referencing two types, say "null" and "string". The default value is set to null.
>> > - The second development stream, called "DevB", uses version B, an evolution of version A that adds a reference to a new type, say "long", to the former union (which makes it "null", "string" and "long").
>> > - When schema B is deployed to the schema registry (in our case, the IO Confluent Schema Registry) after version A:
>> >   - The stream "DevA" must be able to read with schema A, even if the data has been written using schema B with the type "long" in the union. In the latter case, the read value is the union's default value.
>> >   - The stream "DevB" must be able to read/write with schema B, even if it writes the data using the type "long" in the union.
>> >
>> > The evolution we implemented for this mode includes rules based on the principles stated in the Avro documentation. It is even more powerful than shown in the few lines above, as it enables readers to get the union's default value whenever the schema used for reading does not contain the type used by the writer in the union. This achieves a new mode of forward/backward compatibility. The evolution is currently working perfectly and should be in production in the coming weeks. We have also made an evolution of the IO Confluent Schema Registry stack to support it, again in a transparent manner (we intend to contribute to that stack as well, in a second/parallel step).
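The patch itself is not shown in this thread, so the following is only a sketch, against the public org.apache.avro.Schema API, of the union resolution rule the proposal appears to describe: match the writer's branch in the reader's union if possible, otherwise fall back to the union's default (its first branch, which the proposal requires to stay stable across versions) instead of raising an error. The class and method names are hypothetical.

  import java.util.List;

  import org.apache.avro.Schema;

  // Sketch only: NOT the patch discussed in the thread, just an illustration
  // of the lenient union rule described above.
  public class LenientUnionResolution {

    /**
     * Returns the position of the writer's branch inside the reader's union,
     * or -1 when the reader does not know that branch. Under the proposed
     * compatibility mode, -1 would mean "use the union's default value"
     * rather than signalling an error as the standard resolution rules do.
     */
    static int resolveBranch(Schema readerUnion, Schema writerBranch) {
      List<Schema> branches = readerUnion.getTypes();
      for (int i = 0; i < branches.size(); i++) {
        if (branches.get(i).getFullName().equals(writerBranch.getFullName())) {
          return i;
        }
      }
      return -1; // unknown branch: fall back to the union's default value
    }
  }

Whether logic like this would live in new reader classes or hook into the existing resolving machinery is exactly the kind of detail the code patch mentioned in the thread would have to show.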
>> > With the objective of contributing this new compatibility mode for unions to the Avro stack, I have some questions about the procedure:
>> >
>> > 1. How should I submit the contribution proposal? Should I provide a patch directly in JIRA and dive into the details there?
>> > 2. The base version of this evolution is 1.7.7; is it still eligible for contribution evaluation?
>> >
>> > Thanks in advance; looking forward to hearing from you and giving you more details.
>> >
>> > Kind Regards,
>> > --
>> > *Yacine Benabderrahmane*
>> > Architect
>> > *OCTO Technology*
>> > <http://www.octo.com>
>> > -----------------------------------------------
>> > Tel : +33 6 10 88 25 98
>> > 50 avenue des Champs Elysées
>> > 75008 PARIS
>> > www.octo.com
>>
>>
>> --
>> Ryan Blue
>> Software Engineer
>> Netflix
