Hi, > (2) this means that if code sets two fields of a union, both fields get > sent on-the-wire, and the receiving end deserializes both fields. So it > really isn't a union, and it certianly doesn't "provide a means to > transport exactly one field of a possible set of fieldsprovide a means to > transport exactly one field of a possible set of fields"
and > (7) More generally, anybody who uses unions, and relies on them really > being "structs with all optional fields" will find that they can't rely on > that anymore. I think we have to differentiate between what is on the wire and how the library and generated code handles the situation. Technically it could be possible to have two union fields set, which would indeed be incorrect. Technically I also can craft a serialized struct that misses a "required" field or has two values for the same field ID, data for an undefined field ID, or whatever else comes to mind. All of these are transport-wise corect but *semnatically* invalid wire data w/respect to the protocol level. Therefore the code must be able to handle this. And it does: it skips unknown field IDs, it uses only the last field value and it (usually) complains about missing "required" fields. But all of this is code. The wire data are not affected in any way by this. The wire data do not even contain a flag for "required" because we don't need it, because the knowledge is in the generated code. And the same should - in my opinion - be true for the union handling. And since we only have to change code, not wire formats, there should be no compat issue. > (3) It's easy to fix this -- each set_field clears all the other ones. I > won't worry for the moment about making that efficient (though it probably > doesn't need to be) but it can be made efficient using C unions (yes?) Mmmh. But that's a O(N) operation, even if we only set and clear bits. And not all laguages do actually have unions. Proposal: If we change it in a way where we not have a isset field flags set for unions, but only store the currently active ID, that would be an O(1) operation then. The flag set could still be provided as calculated property for (backward) compatibility. The ID would also have the benefit of being able to write things like switch( myUnion.issetFieldID) { case 1: DoThis(myUnion.one); break; case 2: DoThat(myUnion.two); break; case 3: DoSomethingElse(myUnion.three); break; default: Console.Write("You loose"); break; } instead of if( myUnion.isset_one) { DoThis(myUnion.one); } else if( myUnion.isset_one) { case 2: DoThat(myUnion.two); } else if( myUnion.isset_one) { case 3: DoSomethingElse(myUnion.three); } else { Console.Write("You loose"); break; } which would then also be an O(1) op instead of O(N). Re (4), (5), (6): union t_const_value { 1: optional map<t_const_value, t_const_value> map_val 2: optional list<t_const_value> list_val 3: optional string string_val 4: optional i64 integer_val 5: optional double double_val 6: optional string identifier_val 7: optional t_type_id enum_val > and those fields [6 and 7] are -not- alternatives: they are BOTH SET, or > NEITHER ARE SET. If that is true, the union choice here is simply incorrect. If it works today, it relies on undefined behavuiour and that certainly can't be a good thing as any bug fix can potentially break it. That relates exactly to what I wrote earlier: > If unions are used *as they are supposed to be*, this > should be a nonbreaker. If they arent, it is wrong anyway. So let's fix it. Have fun, JensG -----Ursprüngliche Nachricht----- From: Chet Murthy Sent: Saturday, December 2, 2017 12:28 AM To: dev@thrift.apache.org Subject: Re: Thrift unions don't work as documented Jens, YES! That's why I sent the email, rather than diving into code! *grin* (1) the current implementation in C++ and Ocaml treats a union as a struct with all-optional fields. (2) this means that if code sets two fields of a union, both fields get sent on-the-wire, and the receiving end deserializes both fields. So it really isn't a union, and it certianly doesn't "provide a means to transport exactly one field of a possible set of fieldsprovide a means to transport exactly one field of a possible set of fields" (3) It's easy to fix this -- each set_field clears all the other ones. I won't worry for the moment about making that efficient (though it probably doesn't need to be) but it can be made efficient using C unions (yes?) (4) But this *will* be a breaking change. B/c for instance, in plugin.thrift, t_const_value has fields 6: optional string identifier_val 7: optional t_type_id enum_val and those fields are -not- alternatives: they are BOTH SET, or NEITHER ARE SET. (5) So at a minimum, we'd need to replace that wiith something like struct t_const_identifier_value { 1: required string identifier_val 2: required t_type_id enum_val } and then replace those two fields with 8: optional t_const_identifier_value const_identifier_val (6) This will break any code people have written that uses this plugin interface (b/c types change in non-compatible ways) (7) More generally, anybody who uses unions, and relies on them really being "structs with all optional fields" will find that they can't rely on that anymore. I would note that neither of these is a big problem. But a breaking change is a breaking change, right? Anyway, that's the story. I think you can see, that writing the code is the easy part. If you think this is OK to do, I can probably prepare a PR quickly. --chet-- P.S. Why did I look into this subject? B/c I want to write a new ocaml library, that will use "idiomatic" Ocaml types, and hopefully be much more efficient. But certainly be easier-to-use for Ocaml programmers. Thrift unions should be mapped to Ocaml union types ("constructor datatypes"). And those .... well, they really *are* exactly one field can be set at any time. So I figure, yes I can write my new compiler and library. But it'll be incompatible with the current Thrift contract. So I figured, I should look into whether that contract can be fixed, while I hack my code. On Fri, Dec 1, 2017 at 2:48 PM, Jens Geyer <jensge...@hotmail.com> wrote: > > > If you're not sure, you can discuss it. If you are sure, provide a PR > (ideally one per language) and we look at it. > > From my feelings, I'm not really sure if I fully understand the problem > and > how you plan to fix it. Maybe it is a good idea to sketch the idea behind > and get some comments before you deep-dive in the code. >