Hi,

> (2) this means that if code sets two fields of a union, both fields get
> sent on-the-wire, and the receiving end deserializes both fields.  So it
> really isn't a union, and it certianly doesn't "provide a means to
> transport exactly one field of a possible set of fieldsprovide a means to
> transport exactly one field of a possible set of fields"

and

> (7) More generally, anybody who uses unions, and relies on them really
> being "structs with all optional fields" will find that they can't rely on
> that anymore.

I think we have to differentiate between what is on the wire and how the 
library and generated code handles the situation.

Technically it could be possible to have two union fields set, which would 
indeed be incorrect. Technically I also can craft a serialized struct that 
misses a "required" field or has two values for the same field ID, data for 
an undefined field ID, or whatever else comes to mind. All of these are 
transport-wise corect but *semnatically* invalid wire data w/respect to the 
protocol level. Therefore the code must be able to handle this. And it does: 
it skips unknown field IDs, it uses only the last field value and it 
(usually) complains about missing "required" fields.

But all of this is code. The wire data are not affected in any way by this. 
The wire data do not even contain a flag for "required" because we don't 
need it, because the knowledge is in the generated code. And the same 
should - in my opinion - be true for the union handling. And since we only 
have to change code, not wire formats, there should be no compat issue.


> (3) It's easy to fix this -- each set_field clears all the other ones.  I
> won't worry for the moment about making that efficient (though it probably
> doesn't need to be) but it can be made efficient using C unions (yes?)

Mmmh. But that's a O(N) operation, even if we only set and clear bits. And 
not all laguages do actually have unions.

Proposal: If we change it in a way where we not have a isset field flags set 
for unions, but only store the currently active ID, that would be an O(1) 
operation then. The flag set could still be provided as calculated property 
for (backward) compatibility. The ID would also have the benefit of being 
able to write things like

    switch( myUnion.issetFieldID) {
        case 1:  DoThis(myUnion.one); break;
        case 2:  DoThat(myUnion.two); break;
        case 3:  DoSomethingElse(myUnion.three); break;
        default: Console.Write("You loose"); break;
    }

instead of


    if( myUnion.isset_one) {
        DoThis(myUnion.one);
    } else if( myUnion.isset_one) {
        case 2:  DoThat(myUnion.two);
    } else if( myUnion.isset_one) {
        case 3:  DoSomethingElse(myUnion.three);
    } else {
        Console.Write("You loose"); break;
    }

which would then also be an O(1) op instead of O(N).


Re (4), (5), (6):

union t_const_value {
  1: optional map<t_const_value, t_const_value> map_val
  2: optional list<t_const_value> list_val
  3: optional string string_val
  4: optional i64 integer_val
  5: optional double double_val
  6: optional string identifier_val
  7: optional t_type_id enum_val

> and those fields [6 and 7] are -not- alternatives: they are BOTH SET, or 
> NEITHER ARE SET.

If that is true, the union choice here is simply incorrect. If it works 
today, it relies on undefined behavuiour and that certainly can't be a good 
thing as any bug fix can potentially break it. That relates exactly to what 
I wrote earlier:

> If unions are used *as they are supposed to be*, this
> should be a nonbreaker. If they arent, it is wrong anyway.

So let's fix it.

Have fun,
JensG


-----Ursprüngliche Nachricht----- 
From: Chet Murthy
Sent: Saturday, December 2, 2017 12:28 AM
To: dev@thrift.apache.org
Subject: Re: Thrift unions don't work as documented

Jens,

YES!  That's why I sent the email, rather than diving into code!  *grin*

(1) the current implementation in C++ and Ocaml treats a union as a struct
with all-optional fields.

(2) this means that if code sets two fields of a union, both fields get
sent on-the-wire, and the receiving end deserializes both fields.  So it
really isn't a union, and it certianly doesn't "provide a means to
transport exactly one field of a possible set of fieldsprovide a means to
transport exactly one field of a possible set of fields"

(3) It's easy to fix this -- each set_field clears all the other ones.  I
won't worry for the moment about making that efficient (though it probably
doesn't need to be) but it can be made efficient using C unions (yes?)

(4) But this *will* be a breaking change.  B/c for instance, in
plugin.thrift, t_const_value has fields

  6: optional string identifier_val
  7: optional t_type_id enum_val

and those fields are -not- alternatives: they are BOTH SET, or NEITHER ARE
SET.

(5) So at a minimum, we'd need to replace that wiith something like

struct t_const_identifier_value {
  1: required string identifier_val
  2: required t_type_id enum_val
}

and then replace those two fields with

  8: optional t_const_identifier_value const_identifier_val

(6) This will break any code people have written that uses this plugin
interface (b/c types change in non-compatible ways)

(7) More generally, anybody who uses unions, and relies on them really
being "structs with all optional fields" will find that they can't rely on
that anymore.

I would note that neither of these is a big problem.  But a breaking change
is a breaking change, right?

Anyway, that's the story.  I think you can see, that writing the code is
the easy part.

If you think this is OK to do, I can probably prepare a PR quickly.

--chet--

P.S. Why did I look into this subject?  B/c I want to write a new ocaml
library, that will use "idiomatic" Ocaml types, and hopefully be much more
efficient.  But certainly be easier-to-use for Ocaml programmers.  Thrift
unions should be mapped to Ocaml union types ("constructor datatypes").
And those .... well, they really *are* exactly one field can be set at any
time.  So I figure, yes I can write my new compiler and library.  But it'll
be incompatible with the current Thrift contract.  So I figured, I should
look into whether that contract can be fixed, while I hack my code.



On Fri, Dec 1, 2017 at 2:48 PM, Jens Geyer <jensge...@hotmail.com> wrote:

>
>
> If you're not sure, you can discuss it. If you are sure, provide a PR
> (ideally one per language) and we look at it.
>
> From my feelings, I'm not really sure if I fully understand the problem 
> and
> how you plan to fix it. Maybe it is a good idea to sketch the idea behind
> and get some comments before you deep-dive in the code.
> 

Reply via email to