On Tue, Jul 30, 2019 at 11:52 AM jincheng sun <sunjincheng...@gmail.com>
wrote:

>
>>> Is it possible to add an interface such as `isSelfContained()` to the
>>> `Coder`? This interface indicates
>>> whether the serialized bytes are self contained. If it returns true,
>>> then there is no need to add a prefixing length.
>>> In this way, there is no need to introduce an extra protocol. Please
>>> correct me if I missed something :)
>>>
>>
>> The question is how it is self contained. E.g. DoubleCoder is self
>> contained because it always uses exactly 8 bytes, but one needs to know the
>> double coder to leverage this. VarInt coder is self-contained in a
>> different way, as is StringCoder (which does just do prefixing).
>>
>
> Yes, you are right! Thinking about it again, we cannot add such an
> interface to the coder, because the runner cannot call it. And just one
> more thought: does it make sense to add a method such as
> "registerSelfContainedCoder(xxx)" or so to let users register the coders
> which can be processed in the SDK harness?  It's the responsibility of the
> SDK harness to ensure that the coder is supported.
>

Basically, a "please don't add length prefixing to this coder, assume
everyone else can understand it (and errors will ensue if anyone doesn't)"
at the user level? Seems a bit dangerous. Also, there is not "the
SDK"--there may be multiple other SDKs in general, and of course runner
components, some of which may understand the coder in question and some of
which may not.

I would say that if this becomes a problem, we could look at the pros and
cons of various remedies, this being one alternative.
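
To make the "self contained in different ways" point concrete, here is a
rough Python sketch. It is not Beam's actual coder API: every function name
below is hypothetical and the byte layouts are only illustrative. It shows
three values that each mark their own end differently, plus the generic
length prefix that lets a component skip bytes it cannot decode.

```python
import io
import struct


def encode_varint(n: int) -> bytes:
    """Unsigned base-128 varint: 7 payload bits per byte, high bit = 'more follows'."""
    if n < 0:
        raise ValueError("this sketch only handles non-negative integers")
    out = bytearray()
    while True:
        low = n & 0x7F
        n >>= 7
        if n:
            out.append(low | 0x80)  # continuation bit set
        else:
            out.append(low)  # last byte, continuation bit clear
            return bytes(out)


def decode_varint(stream) -> int:
    """Self-delimiting via the continuation bit; no outer length needed."""
    result = 0
    shift = 0
    while True:
        chunk = stream.read(1)
        if not chunk:
            raise EOFError("truncated varint")
        byte = chunk[0]
        result |= (byte & 0x7F) << shift
        if not byte & 0x80:
            return result
        shift += 7


def encode_double(x: float) -> bytes:
    """Fixed width: always exactly 8 bytes."""
    return struct.pack(">d", x)


def decode_double(stream) -> float:
    """Only works if the reader already knows the width is 8."""
    return struct.unpack(">d", stream.read(8))[0]


def encode_string(s: str) -> bytes:
    """Carries its own length prefix as part of the format."""
    data = s.encode("utf-8")
    return encode_varint(len(data)) + data


def decode_string(stream) -> str:
    return stream.read(decode_varint(stream)).decode("utf-8")


def length_prefix(payload: bytes) -> bytes:
    """Generic wrapper for bytes produced by a coder the reader may not know."""
    return encode_varint(len(payload)) + payload


def read_length_prefixed(stream) -> bytes:
    """Returns the opaque payload without needing to understand it."""
    return stream.read(decode_varint(stream))
```

A reader can walk past the first three values only by knowing each format;
the last one it can skip knowing nothing but the prefix, which is exactly
what a user-level "don't prefix this" switch would take away.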


>
>
>> I am hopeful that schemas give us a rich enough way to encode the vast
>> majority of types that we will want to transmit across language barriers
>> (possibly with some widening promotions). For high performance one will
>> want to use formats like arrow rather than one-off coders as well, which
>> also biases us towards the schema work. The set of StandardCoders is not
>> closed, and nor is the possibility of figuring out a way to communicate
>> outside this set for a particular pair of languages, but I think it makes
>> sense to avoid going that direction unless we have to, due to the increased
>> API surface area and complexity it imposes on all runners and SDKs.
>>
>
> Great! Could you share some links about the schema work? It seems very
> interesting and promising.
>

https://beam.apache.org/contribute/design-documents/#sql--schema and of
particular relevance https://s.apache.org/beam-schemas
