Re: [DISCUSS] Turn `WindowedValue` into `T` in the FnDataService and BeamFnDataClient interface definition

Robert Bradshaw Wed, 31 Jul 2019 03:17:06 -0700

Yep, Python support is under active development, e.g.
https://github.com/apache/beam/pull/9188


On Wed, Jul 31, 2019 at 9:24 AM jincheng sun <[email protected]>
wrote:

> Thanks a lot for sharing the link. I took a quick look at the design and
> the implementation in Java, and I think it could address my concern. It
> seems that it's still not supported in the Python SDK Harness. Is there any
> plan for that?
>
> Robert Bradshaw <[email protected]> wrote on Tue, Jul 30, 2019 at 12:33 PM:
>
>> On Tue, Jul 30, 2019 at 11:52 AM jincheng sun <[email protected]>
>> wrote:
>>
>>>
>>>>> Is it possible to add an interface such as `isSelfContained()` to the
>>>>> `Coder`? This interface indicates
>>>>> whether the serialized bytes are self contained. If it returns true,
>>>>> then there is no need to add a prefixing length.
>>>>> In this way, there is no need to introduce an extra protocol. Please
>>>>> correct me if I missed something :)
>>>>>
>>>>
>>>> The question is how it is self contained. E.g. DoubleCoder is self
>>>> contained because it always uses exactly 8 bytes, but one needs to know the
>>>> double coder to leverage this. VarInt coder is self-contained in a
>>>> different way, as is StringCoder (which does just do prefixing).
>>>>
>>>
>>> Yes, you are right! Thinking about it again, we cannot add such an
>>> interface to the coder, because the runner cannot call it. Just one more
>>> thought: does it make sense to add a method such as
>>> "registerSelfContainedCoder(xxx)" or so to let users register the coders
>>> which can be processed in the SDK Harness? It's the responsibility of the
>>> SDK harness to ensure that the coder is supported.
>>>
>>
>> Basically, a "please don't add length prefixing to this coder, assume
>> everyone else can understand it (and errors will ensue if anyone doesn't)"
>> at the user level? Seems a bit dangerous. Also, there is not "the
>> SDK"--there may be multiple other SDKs in general, and of course runner
>> components, some of which may understand the coder in question and some of
>> which may not.
>>
>> I would say that if this becomes a problem, we could look at the pros and
>> cons of various remedies, this being one alternative.
>>
>>
>>>
>>>
>>>> I am hopeful that schemas give us a rich enough way to encode the vast
>>>> majority of types that we will want to transmit across language barriers
>>>> (possibly with some widening promotions). For high performance one will
>>>> want to use formats like arrow rather than one-off coders as well, which
>>>> also biases us towards the schema work. The set of StandardCoders is not
>>>> closed, and nor is the possibility of figuring out a way to communicate
>>>> outside this set for a particular pair of languages, but I think it makes
>>>> sense to avoid going that direction unless we have to due to the increased
>>>> API surface area and complexity it imposes on all runners and SDKs.
>>>>
>>>
>>> Great! Could you share some links about the schema work? It seems very
>>> interesting and promising.
>>>
>>
>> https://beam.apache.org/contribute/design-documents/#sql--schema and of
>> particular relevance https://s.apache.org/beam-schemas
>>
>>
>>
>
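
For readers following the length-prefixing point quoted above, here is a
minimal Python sketch of the three ways an encoding can mark its own end: a
fixed width (the double case), a continuation bit (the varint case), and an
explicit length prefix. None of these functions are Beam APIs; the names are
made up for illustration, and the byte layouts (big-endian 8-byte doubles,
base-128 varints) are assumptions of the sketch. The last helper shows why a
component that does not know the inner coder can still step over a
length-prefixed element.

```python
import struct


def encode_varint(n: int) -> bytes:
    """Encode a non-negative int as a base-128 varint (7 bits per byte)."""
    if n < 0:
        raise ValueError("this sketch only handles non-negative ints")
    out = bytearray()
    while True:
        low, n = n & 0x7F, n >> 7
        if n:
            out.append(low | 0x80)  # continuation bit: more bytes follow
        else:
            out.append(low)
            return bytes(out)


def decode_varint(buf: bytes, pos: int = 0):
    """Return (value, next_pos); a byte without the high bit ends the value."""
    result = shift = 0
    while True:
        b = buf[pos]
        pos += 1
        result |= (b & 0x7F) << shift
        if not b & 0x80:
            return result, pos
        shift += 7


def encode_double(x: float) -> bytes:
    """Always exactly 8 bytes, so the reader must know it is a double."""
    return struct.pack(">d", x)


def length_prefix(payload: bytes) -> bytes:
    """Wrap opaque bytes so a reader without the coder can find the end."""
    return encode_varint(len(payload)) + payload


def skip_length_prefixed(buf: bytes, pos: int = 0):
    """Return (payload, next_pos) without knowing how payload was encoded."""
    size, pos = decode_varint(buf, pos)
    return buf[pos:pos + size], pos + size


# Two elements from different (hypothetical) coders in one stream.
stream = length_prefix(encode_double(1.5)) + length_prefix(b"opaque")
first, pos = skip_length_prefixed(stream)
second, pos = skip_length_prefixed(stream, pos)
```

Without the prefix, the reader would need to know that the first element
occupies 8 bytes, which is exactly the knowledge a runner or another SDK may
not have for an arbitrary coder.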
