Are there any dissenting votes to making a BooleanCoder a standard (portable) coder?
I'm happy to make a PR to implement a BooleanCoder in python (and to add the Java BooleanCoder to the ModelCoderRegistrar) if everyone agrees that this is useful.

-chad

On Fri, Sep 27, 2019 at 3:32 PM Robert Bradshaw wrote:
> I think boolean is useful to have. What I'm more skeptical of is
> adding standard types for variations like UnsignedInteger16, etc. that
> don't have natural representations in all languages.
>
> On Fri, Sep 27, 2019 at 2:46 PM Brian Hulette wrote:
> >
> > Some more context from an offline discussion I had with +Robert Bradshaw a while ago: We both agreed all of the coders listed in BEAM-7996 should be implemented in Python, but didn't come to a conclusion on whether or not they should actually be _standard_ coders, versus just being implicitly standard as part of row coder.
> >
> > On Fri, Sep 27, 2019 at 2:29 PM Kenneth Knowles wrote:
> >>
> >> Yes, noted here: https://github.com/apache/beam/pull/9188/files#diff-f0d64c2cfc4583bfe2a7e5ee59818ae2R678 and that links to https://issues.apache.org/jira/browse/BEAM-7996
> >>
> >> Kenn
> >>
> >> On Fri, Sep 27, 2019 at 12:57 PM Reuven Lax wrote:
> >>>
> >>> Java has one, implemented as a byte coder. My guess is that nobody has gotten around to implementing it yet for portability.
> >>>
> >>> On Fri, Sep 27, 2019 at 12:44 PM Chad Dombrova wrote:
> >>>>
> >>>> Hi all,
> >>>> It seems a bit unfortunate that there isn’t a portable way to serialize a boolean value.
> >>>>
> >>>> I’m working on porting my external PubsubIO PR over to use the improved schema-based external transform API in python, but because of this limitation I can’t use boolean values. For example, this fails:
> >>>>
> >>>> ReadFromPubsubSchema = typing.NamedTuple(
> >>>>     'ReadFromPubsubSchema',
> >>>>     [
> >>>>         ('topic', typing.Optional[unicode]),
> >>>>         ('subscription', typing.Optional[unicode]),
> >>>>         ('id_label', typing.Optional[unicode]),
> >>>>         ('with_attributes', bool),
> >>>>         ('timestamp_attribute', typing.Optional[unicode]),
> >>>>     ]
> >>>> )
> >>>>
> >>>> It fails because coders.get_coder(bool) returns the non-portable pickle coder.
> >>>>
> >>>> In the short term I can hack something into the external transform API to use varint coder for bools, but this kind of hacky approach to portability won’t work in scenarios where round-tripping is required without user intervention. In other words, in python it is not uncommon to test if x is True, in which case the integer 1 would fail this test. All of that is to say that a BooleanCoder would be a convenient way to ensure the proper type is used everywhere.
> >>>>
> >>>> So, I was just wondering why it’s not there? Are there concerns over whether booleans are universal enough to make part of the portability standard?
> >>>>
> >>>> -chad
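To make the round-tripping concern concrete, here is a minimal, Beam-free Python sketch. `SimpleBooleanCoder`, `encode_varint` and `decode_varint` are hypothetical names for illustration only, not Beam APIs. The single-byte 0x00/0x01 encoding is an assumption modeled on the note that the Java coder is implemented as a byte coder, and the varint helper is a simplified unsigned varint, not Beam's actual VarIntCoder.

```python
class SimpleBooleanCoder:
    """Hypothetical single-byte boolean coder (illustration only, not a Beam API).

    Assumed encoding: False -> 0x00, True -> 0x01, in the spirit of a
    "byte coder". Decoding always yields a real bool, never an int.
    """

    def encode(self, value: bool) -> bytes:
        # Reject ints so that 1/0 cannot silently masquerade as booleans.
        if not isinstance(value, bool):
            raise TypeError('expected bool, got %s' % type(value).__name__)
        return b'\x01' if value else b'\x00'

    def decode(self, encoded: bytes) -> bool:
        if encoded == b'\x01':
            return True
        if encoded == b'\x00':
            return False
        raise ValueError('invalid boolean encoding: %r' % (encoded,))


def encode_varint(value: int) -> bytes:
    """Hypothetical minimal unsigned varint (7 bits per byte), non-negative ints only."""
    if value < 0:
        raise ValueError('this sketch only handles non-negative ints')
    out = bytearray()
    while True:
        byte = value & 0x7F
        value >>= 7
        if value:
            out.append(byte | 0x80)  # more bytes follow
        else:
            out.append(byte)
            return bytes(out)


def decode_varint(data: bytes) -> int:
    """Inverse of encode_varint; always returns a plain int."""
    result = 0
    shift = 0
    for byte in data:
        result |= (byte & 0x7F) << shift
        if not byte & 0x80:
            return result
        shift += 7
    raise ValueError('truncated varint')


coder = SimpleBooleanCoder()
assert coder.decode(coder.encode(True)) is True

# The varint workaround: bool is an int subclass, so True encodes fine,
# but it decodes as the plain int 1 and an identity check against True fails.
roundtripped = decode_varint(encode_varint(True))
print(roundtripped, roundtripped is True)  # prints: 1 False
```

In this sketch both paths put the same byte on the wire for True; the difference shows up only on decode, where the varint path hands back the int 1, so an `x is True` check fails, while a dedicated boolean coder restores a real bool.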
