[ 
https://issues.apache.org/jira/browse/THRIFT-5314?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17242300#comment-17242300
 ] 

Z M edited comment on THRIFT-5314 at 12/2/20, 12:16 PM:
--------------------------------------------------------

[~jensg] [~allengeorge]

The Parquet file that started this thread (in the 
https://issues.apache.org/jira/browse/ARROW-10553 issue) came from a project 
I'm working on.

You refer to Thrift' "soft versioning" but I don't see how this relates to the 
laxed treatment of invalid enum values in the cpp vs rust implementation.

soft versioning, IIUC, relates to the fact that Thrift IDLs don't adhere to any 
strict versioning schema like semantic versioning for example, so no 
handshaking on version numbers is performed, and maintaining backward 
compatibility is more of a common sense and accepted behavior, than a formal 
contract.

Thrift however is used to clearly define a wire protocol, as such one of the 
invariants I'd expect Thrift to uphold is:
 * A binary stream created with a Thrift encoder, from an object which complies 
with the IDL, also complies with the IDL definition
 * an object _successfully_ decoded by Thrift from a binary stream results in 
an object that complies with the IDL definition

WRT to enums, allowing any value as an enum (even if it doesn't correspond to a 
defined value in the Thrift spec) violates this invariant.

The example in the Parquet file in the related issue demonstrates this, a value 
is provided for the ConvertedType enum outside of the enum space, it's an 
invalid value that has no semantics within Parquet. Accepting it (by e.g. the 
cpp implementation) will most likely break something down the line of whatever 
ETL + OLAP pipeline this Parquet goes through.

Failing early (like the Rust implementation does) seems like the right thing to 
do to catch such invalid values in the stream, this is what I'm expecting a 
wire protocol to do.

If the author of an IDL wants to allow a more laxed semantics for a value (say 
ConvertedType in Parquet) they can use an i32 field, and defer any semantic 
interpretation of the value to the application layer that reads this value.

Using an enum is a clear statement that a value has well defined semantics and 
should be restricted to a pre-defined set of values.

Would appreciate your take on this.

Thanks.

 


was (Author: wiredwolf):
[~jensg] [~allengeorge]

The Parquet file that started this thread (in the 
https://issues.apache.org/jira/browse/ARROW-10553 issue) came from a project 
I'm working on.

You refer to Thrift' "soft versioning" but I don't see how this relates to the 
laxed treatment of invalid enum values in the cpp vs rust implementation.

soft versioning, IIUC, relates to the fact that Thrift IDLs don't adhere to any 
strict versioning schema like semantic versioning for example, so no 
handshaking on version numbers is performed, and maintaining backward 
compatibility is more of a common sense and accepted behavior, than a formal 
contract.

Thrift however is used to clearly define a wire protocol, as such one of the 
invariants I'd expect Thrift to uphold is:
 * A binary stream created with a Thrift encoder, from an object which complies 
with the IDL, also complies with the IDL definition
 * an object _successfully_ decoded by Thrift from a binary stream results in 
an object that complies with the IDL definition

WRT to enums, allowing any value as an enum (even if it doesn't correspond to a 
defined value in the Thrift spec) violates this invariant.

The example in the Parquet file in the related issue demonstrates this, a value 
is provided for the ConvertedType enum outside of the enum space, it's an 
invalid value that has no semantics within Parquet. Accepting it (by e.g. the 
cpp implementation) will most likely break something down the line of whatever 
ETL + OLAP pipeline this Parquet goes through.

Failing early (like the Rust implementation does) seems like the right thing to 
do to catch such invalid values in the stream, this is what I'm expecting a 
wire protocol to do.

If the author of an IDL wants to allow a more laxed semantics for a value (say 
ConvertedType in Parquet) they can use an i32 field.

Using an enum is a clear statement that a value has well defined semantics and 
should be restricted to a pre-defined set of values.

Would appreciate your take on this.

Thanks.

 

> Enum forward compatibility
> --------------------------
>
>                 Key: THRIFT-5314
>                 URL: https://issues.apache.org/jira/browse/THRIFT-5314
>             Project: Thrift
>          Issue Type: Bug
>          Components: Rust - Compiler, Rust - Library
>    Affects Versions: 0.13.0
>            Reporter: Remi Dettai
>            Priority: Major
>
> It seems that enums in the Rust implem are not forward compatible. As Thrift 
> enums are mapped 1:1 to Rust enum, if a newer Thrift definition adds a case 
> to an enum, an error will be returned when parsing the message.
> Is this intended? Is there a workaround?
> (We met this problem in the Rust parquet implem: 
> https://issues.apache.org/jira/browse/ARROW-10553)



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

Reply via email to