On Tue, Jul 30, 2019 at 11:52 AM jincheng sun <sunjincheng...@gmail.com> wrote:

>>> Is it possible to add an interface such as `isSelfContained()` to the
>>> `Coder`? This interface indicates whether the serialized bytes are
>>> self-contained. If it returns true, then there is no need to add a
>>> length prefix. In this way, there is no need to introduce an extra
>>> protocol. Please correct me if I missed something :)
>>
>> The question is how it is self-contained. E.g. DoubleCoder is
>> self-contained because it always uses exactly 8 bytes, but one needs to
>> know the double coder to leverage this. VarInt coder is self-contained in
>> a different way, as is StringCoder (which does just do prefixing).
>
> Yes, you are right! Thinking about it again, we cannot add such an
> interface to the coder, because the runner cannot call it. And just one
> more thought: does it make sense to add a method such as
> "registerSelfContainedCoder(xxx)" or so to let users register the coders
> which can be processed in the SDK harness? It's the responsibility of the
> SDK harness to ensure that the coder is supported.

Basically, a "please don't add length prefixing to this coder, assume
everyone else can understand it (and errors will ensue if anyone doesn't)"
at the user level? Seems a bit dangerous. Also, there is no single "the
SDK": there may be multiple other SDKs in general, and of course runner
components, some of which may understand the coder in question and some of
which may not.

I would say that if this becomes a problem, we could look at the pros and
cons of various remedies, this being one alternative.

>> I am hopeful that schemas give us a rich enough way to encode the vast
>> majority of types that we will want to transmit across language barriers
>> (possibly with some widening promotions). For high performance one will
>> want to use formats like arrow rather than one-off coders as well, which
>> also biases us towards the schema work. The set of StandardCoders is not
>> closed, nor is the possibility of figuring out a way to communicate
>> outside this set for a particular pair of languages, but I think it makes
>> sense to avoid going in that direction unless we have to, due to the
>> increased API surface area and complexity it imposes on all runners and
>> SDKs.
>
> Great! Could you share some links about the schema work? It seems very
> interesting and promising.

https://beam.apache.org/contribute/design-documents/#sql--schema

and of particular relevance

https://s.apache.org/beam-schemas
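
P.S. To make the "self-contained in different ways" point concrete, here is a rough Python sketch. It is not Beam's coder implementation; the function names (`encode_double`, `encode_varint`, `encode_string`, `length_prefix`, `split_opaque`) are made up for illustration, and the byte layouts are assumptions chosen to mirror the three cases discussed (fixed width, varint, internal length prefix). The last function shows what a component that does not know the element coder can still do once an outer length prefix is present.

```python
import struct


def encode_double(x: float) -> bytes:
    # Fixed width: always exactly 8 bytes, but a reader has to know that.
    return struct.pack(">d", x)


def encode_varint(n: int) -> bytes:
    # Base-128 varint: 7 payload bits per byte, high bit marks "more follows".
    if n < 0:
        raise ValueError("this sketch handles non-negative ints only")
    out = bytearray()
    while True:
        low = n & 0x7F
        n >>= 7
        if n:
            out.append(low | 0x80)
        else:
            out.append(low)
            return bytes(out)


def decode_varint(buf: bytes, pos: int = 0):
    # Returns (value, position just past the varint).
    result = 0
    shift = 0
    while True:
        b = buf[pos]
        pos += 1
        result |= (b & 0x7F) << shift
        if not (b & 0x80):
            return result, pos
        shift += 7


def encode_string(s: str) -> bytes:
    # Self-delimiting by carrying its own length prefix.
    data = s.encode("utf-8")
    return encode_varint(len(data)) + data


def length_prefix(payload: bytes) -> bytes:
    # Outer wrapper for bytes whose coder the receiver may not understand.
    return encode_varint(len(payload)) + payload


def split_opaque(stream: bytes):
    # Splits a stream into elements using only the outer length prefixes,
    # without any knowledge of how each element was encoded.
    pos = 0
    elements = []
    while pos < len(stream):
        size, pos = decode_varint(stream, pos)
        elements.append(stream[pos:pos + size])
        pos += size
    return elements
```

Each of the three encoders is self-delimiting only for a reader that knows which one was used; `split_opaque` works for any reader, which is the property the length prefix buys at the cost of a few extra bytes per element.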