Yes, go ahead and do this (though for your use case I'm hoping we'll be
able to switch to schemas soon).

On Fri, Sep 27, 2019 at 5:35 PM Chad Dombrova <[email protected]> wrote:
>
> Would BooleanCoder continue to fall into this category? I was under the
> impression we might make it a full-fledged standard coder with this PR.
>
>
>
> On Fri, Sep 27, 2019 at 5:32 PM Brian Hulette <[email protected]> wrote:
>>
>> +1, thank you!
>>
>> Note: in my Row Coder PR I added a new section for "Additional Standard
>> Coders", i.e. coders that have a URN but aren't required for a new
>> runner/SDK to implement the Beam model:
>> https://github.com/apache/beam/pull/9188/files#diff-f0d64c2cfc4583bfe2a7e5ee59818ae2R646
>>
>> I think this would belong there as well, assuming that is a distinction we 
>> want to make.
>>
>> On Fri, Sep 27, 2019 at 5:22 PM Thomas Weise <[email protected]> wrote:
>>>
>>> +1 for adding the coder
>>>
>>> Please also add a test here: 
>>> https://github.com/apache/beam/blob/master/model/fn-execution/src/main/resources/org/apache/beam/model/fnexecution/v1/standard_coders.yaml
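A minimal sketch of what such a cross-SDK test checks, written in Python rather than in the YAML format of standard_coders.yaml (whose exact schema is not reproduced here). It assumes a one-byte wire format (0x01 for True, 0x00 for False); encode_bool, decode_bool and EXAMPLES are hypothetical names. Each example is checked in both directions, and the decoded value's type is checked too, so an SDK that returned 1 instead of True would fail.

```python
from typing import Dict


def encode_bool(value: bool) -> bytes:
    # Assumed wire format: a single byte, 0x01 for True and 0x00 for False.
    return b"\x01" if value else b"\x00"


def decode_bool(data: bytes) -> bool:
    # Accept only the two canonical encodings.
    if data not in (b"\x00", b"\x01"):
        raise ValueError(f"invalid boolean encoding: {data!r}")
    return data == b"\x01"


# Hypothetical example table in the spirit of standard_coders.yaml:
# each entry pairs an encoded byte string with the value it must decode to.
EXAMPLES: Dict[bytes, bool] = {
    b"\x00": False,
    b"\x01": True,
}


def check_examples() -> int:
    """Check every example in both directions; return how many were checked."""
    for encoded, expected in EXAMPLES.items():
        assert encode_bool(expected) == encoded, (expected, encoded)
        decoded = decode_bool(encoded)
        # Check the type as well, so 0/1 ints would not pass for False/True.
        assert type(decoded) is bool and decoded == expected, (encoded, decoded)
    return len(EXAMPLES)


print(check_examples())  # 2
```

Because every SDK is checked against the same byte strings, an encoder in one language and a decoder in another that both pass will agree with each other.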
>>>
>>>
>>> On Fri, Sep 27, 2019 at 5:17 PM Chad Dombrova <[email protected]> wrote:
>>>>
>>>> Are there any dissenting votes to making a BooleanCoder a standard 
>>>> (portable) coder?
>>>>
>>>> I'm happy to make a PR to implement a BooleanCoder in Python (and to add
>>>> the Java BooleanCoder to the ModelCoderRegistrar) if everyone agrees that
>>>> this is useful.
>>>>
>>>> -chad
>>>>
>>>>
>>>> On Fri, Sep 27, 2019 at 3:32 PM Robert Bradshaw <[email protected]> 
>>>> wrote:
>>>>>
>>>>> I think boolean is useful to have. What I'm more skeptical of is
>>>>> adding standard types for variations like UnsignedInteger16, etc. that
>>>>> don't have natural representations in all languages.
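To illustrate the concern with a hypothetical UnsignedInteger16 coder: Python has no fixed-width unsigned 16-bit type, so a sketch like the one below (encode_uint16 and decode_uint16 are illustrative names, and the big-endian 2-byte layout is an assumption) has to enforce the range by hand, and decoding hands back an ordinary unbounded int.

```python
import struct


def encode_uint16(value: int) -> bytes:
    """Encode an unsigned 16-bit integer as 2 big-endian bytes (assumed layout)."""
    # Python ints are unbounded, so the 0..65535 range must be checked by hand.
    if not 0 <= value <= 0xFFFF:
        raise ValueError(f"{value} does not fit in an unsigned 16-bit integer")
    return struct.pack(">H", value)


def decode_uint16(data: bytes) -> int:
    """Decode 2 big-endian bytes; the result is a plain Python int."""
    (value,) = struct.unpack(">H", data)
    return value


print(encode_uint16(513))                # b'\x02\x01'
print(type(decode_uint16(b"\xff\xff")))  # <class 'int'>
```

Nothing in the decoded value records that it was ever unsigned or 16 bits wide, which is the sense in which such types lack a natural representation in Python.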
>>>>>
>>>>> On Fri, Sep 27, 2019 at 2:46 PM Brian Hulette <[email protected]> wrote:
>>>>> >
>>>>> > Some more context from an offline discussion I had with +Robert 
>>>>> > Bradshaw a while ago: We both agreed all of the coders listed in 
>>>>> > BEAM-7996 should be implemented in Python, but didn't come to a 
>>>>> > conclusion on whether or not they should actually be _standard_ coders, 
>>>>> > versus just being implicitly standard as part of row coder.
>>>>> >
>>>>> > On Fri, Sep 27, 2019 at 2:29 PM Kenneth Knowles <[email protected]> wrote:
>>>>> >>
>>>>> >> Yes, noted here: 
>>>>> >> https://github.com/apache/beam/pull/9188/files#diff-f0d64c2cfc4583bfe2a7e5ee59818ae2R678
>>>>> >>  and that links to https://issues.apache.org/jira/browse/BEAM-7996
>>>>> >>
>>>>> >> Kenn
>>>>> >>
>>>>> >> On Fri, Sep 27, 2019 at 12:57 PM Reuven Lax <[email protected]> wrote:
>>>>> >>>
>>>>> >>> Java has one, implemented as a byte coder. My guess is that nobody 
>>>>> >>> has gotten around to implementing it yet for portability.
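A rough sketch of what a byte-based boolean coder could look like in Python, assuming the same single-byte layout (0x01 for True, 0x00 for False). BooleanCoder here is a standalone illustrative class, not Beam's Coder API. Working on a stream and consuming exactly one byte lets the value be nested inside a larger encoding, such as a row followed by other fields.

```python
import io
from typing import BinaryIO


class BooleanCoder:
    """Illustrative one-byte boolean coder (a standalone sketch, not Beam's API).

    Assumed layout: 0x01 for True, 0x00 for False, exactly one byte per value.
    """

    def encode(self, value: bool, out: BinaryIO) -> None:
        if not isinstance(value, bool):
            # Reject ints so that 1 or 0 cannot pass as a boolean.
            raise TypeError(f"expected bool, got {type(value).__name__}")
        out.write(b"\x01" if value else b"\x00")

    def decode(self, stream: BinaryIO) -> bool:
        # Consume exactly one byte so the coder can be nested in a larger stream.
        byte = stream.read(1)
        if byte == b"\x01":
            return True
        if byte == b"\x00":
            return False
        raise ValueError(f"invalid boolean byte: {byte!r}")


coder = BooleanCoder()
buf = io.BytesIO()
for flag in (True, False, True):
    coder.encode(flag, buf)
print(buf.getvalue())  # b'\x01\x00\x01'

buf.seek(0)
print([coder.decode(buf) for _ in range(3)])  # [True, False, True]
```

Rejecting non-bool inputs on encode and anything other than 0x00/0x01 on decode is a design choice of this sketch; it keeps 1 and True from being conflated in either direction.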
>>>>> >>>
>>>>> >>> On Fri, Sep 27, 2019 at 12:44 PM Chad Dombrova <[email protected]> 
>>>>> >>> wrote:
>>>>> >>>>
>>>>> >>>> Hi all,
>>>>> >>>> It seems a bit unfortunate that there isn’t a portable way to 
>>>>> >>>> serialize a boolean value.
>>>>> >>>>
>>>>> >>>> I’m working on porting my external PubsubIO PR over to use the
>>>>> >>>> improved schema-based external transform API in Python, but because
>>>>> >>>> of this limitation I can’t use boolean values. For example, this
>>>>> >>>> fails:
>>>>> >>>>
>>>>> >>>> import typing
>>>>> >>>> from past.builtins import unicode  # Python 2/3-compatible text type
>>>>> >>>>
>>>>> >>>> ReadFromPubsubSchema = typing.NamedTuple(
>>>>> >>>>     'ReadFromPubsubSchema',
>>>>> >>>>     [
>>>>> >>>>         ('topic', typing.Optional[unicode]),
>>>>> >>>>         ('subscription', typing.Optional[unicode]),
>>>>> >>>>         ('id_label', typing.Optional[unicode]),
>>>>> >>>>         ('with_attributes', bool),
>>>>> >>>>         ('timestamp_attribute', typing.Optional[unicode]),
>>>>> >>>>     ]
>>>>> >>>> )
>>>>> >>>>
>>>>> >>>> It fails because coders.get_coder(bool) returns the non-portable 
>>>>> >>>> pickle coder.
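A small sketch of why a bool field ends up with a non-portable coder, using a hypothetical type-to-coder table (the names below are illustrative labels, not Beam's registry entries or URNs). With an exact-type lookup, bool has no entry and falls through to pickle. With a subclass-aware lookup, bool silently matches int, because bool is a subclass of int in Python, which amounts to the varint workaround.

```python
from typing import Dict

# Hypothetical table of types with a portable coder; the values are
# illustrative labels, not Beam's registry entries or coder URNs.
PORTABLE_CODERS: Dict[type, str] = {
    bytes: "bytes",
    str: "string_utf8",
    int: "varint",
    float: "double",
}


def get_coder_name(typ: type) -> str:
    """Exact-type lookup that falls back to a non-portable pickle coder."""
    return PORTABLE_CODERS.get(typ, "pickle")


def get_coder_name_by_subclass(typ: type) -> str:
    """Subclass-aware lookup: the first matching base type wins."""
    for base, name in PORTABLE_CODERS.items():
        if issubclass(typ, base):
            return name
    return "pickle"


print(issubclass(bool, int))             # True
print(get_coder_name(bool))              # pickle
print(get_coder_name_by_subclass(bool))  # varint
```

Giving bool its own entry fixes both lookups, provided the subclass-aware version checks bool before int.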
>>>>> >>>>
>>>>> >>>> In the short term I can hack something into the external transform
>>>>> >>>> API to use the varint coder for bools, but this kind of hacky approach
>>>>> >>>> to portability won’t work in scenarios where round-tripping is
>>>>> >>>> required without user intervention. For example, in Python it is
>>>>> >>>> not uncommon to check whether x is True (an identity check), and a
>>>>> >>>> value that comes back as the integer 1 would fail that check. All of
>>>>> >>>> that is to say that a BooleanCoder would be a convenient way to
>>>>> >>>> ensure the proper type is used everywhere.
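The round-trip problem can be reproduced with a minimal unsigned varint (7 bits per byte, continuation bit set on every byte but the last). This is a simplified stand-in that only handles non-negative integers, not Beam's VarIntCoder. Encoding True works, but what comes back is the int 1: it compares equal to True, yet fails an identity check.

```python
def encode_varint(value: int) -> bytes:
    """Unsigned varint, 7 bits per byte (non-negative ints only in this sketch)."""
    if value < 0:
        raise ValueError("this sketch only handles non-negative integers")
    out = bytearray()
    while True:
        byte = value & 0x7F
        value >>= 7
        if value:
            out.append(byte | 0x80)  # more bytes follow
        else:
            out.append(byte)  # last byte: continuation bit clear
            return bytes(out)


def decode_varint(data: bytes) -> int:
    result = 0
    shift = 0
    for byte in data:
        result |= (byte & 0x7F) << shift
        if not byte & 0x80:
            return result
        shift += 7
    raise ValueError("truncated varint")


# The workaround: reuse the integer coder for a bool field.
flag = True
decoded = decode_varint(encode_varint(flag))
print(decoded, type(decoded).__name__)   # 1 int
print(decoded == True, decoded is True)  # True False
```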
>>>>> >>>>
>>>>> >>>> So, I was just wondering why it’s not there? Are there concerns over 
>>>>> >>>> whether booleans are universal enough to make part of the 
>>>>> >>>> portability standard?
>>>>> >>>>
>>>>> >>>> -chad
