[ https://issues.apache.org/jira/browse/THRIFT-5340?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17270387#comment-17270387 ]
Juan Cruz Viotti commented on THRIFT-5340:
------------------------------------------

Hi [~jensg],

Thanks a lot for the thorough comment. The executive summary and the section on compatibility in the Thrift guide ([https://diwakergupta.github.io/thrift-missing-guide/#_versioning_compatibility]) make a lot of sense.

My experiments led to a perhaps more fine-grained analysis of compatibility that I would like to share with you in case you think it's useful to document. I also found that compatibility is a bit hard to test, as a particular scenario may be more subtle than it seems when using certain data types or depending on the surrounding data. Please let me know if something is off and I can investigate further!

What I found so far is:

* Adding an optional field to the end of a struct is fully compatible
* Removing an optional field (while not re-using the identifier) is fully compatible
* Adding a required field to the end of a struct is forwards compatible. The new schema encodes the new field and the old schema gracefully omits it as it's unknown
* Conversely, removing a required field from the end of the struct is backwards compatible. The old schema encodes the removed field and the new schema omits it as it's unknown
* Changing a field from optional to required is forwards compatible. The new schema always sets the field and the old schema can parse it correctly
* Conversely, following the same reasoning, changing a field from required to optional is backwards compatible
* Adding a choice to an existing union is backwards compatible. The new schema can always parse the data produced by the old schema as the new choices are a superset of the previous choices
* Conversely, removing a choice from an existing union is forwards compatible
* Changing an enumeration into a 32-bit integer is backwards compatible as the set of valid enumeration constants is a subset of the range of values in the integer
* Conversely, turning a 32-bit integer into an enumeration is forwards compatible only. The data produced by the new schema will always be a valid int32 value according to the old schema
* Similar to the union case, adding a new enumeration constant is backwards compatible and removing an enumeration constant (without re-using it in the future) is forwards compatible
* Changing a string field into a "binary" field is backwards compatible. Every UTF-8 string produced by the old schema is of course a valid byte-string from the point of view of the new schema
* Conversely, changing a "binary" field into a string field is forwards compatible. It is not backwards compatible as not every byte-array is a valid UTF-8 string

Everything else I tried resulted either in an exception or in an incompatible result in at least one case.

Do you think the above makes sense? My intention is to provide a more detailed set of operations that can be assumed to be safe, broken down by backwards, forwards, and full compatibility.

Thanks a lot!

> Document schema evolution features
> ----------------------------------
>
> Key: THRIFT-5340
> URL: https://issues.apache.org/jira/browse/THRIFT-5340
> Project: Thrift
> Issue Type: Improvement
> Components: Documentation
> Reporter: Juan Cruz Viotti
> Priority: Minor
> Original Estimate: 24h
> Remaining Estimate: 24h
>
> I could not find a section in the documentation outlining the schema
> evolution/versioning features that Thrift provides.
> In case there is none, I volunteer to write the first draft, as I've been
> writing a paper involving Apache Thrift as part of my MSc at University of
> Oxford, and ran plenty of schema evolution experiments.
> Please let me know your thoughts and where would this section fit!
>
>

--
This message was sent by Atlassian Jira
(v8.3.4#803005)