Yep, Python support is under active development, e.g. https://github.com/apache/beam/pull/9188
On Wed, Jul 31, 2019 at 9:24 AM jincheng sun <[email protected]> wrote:
> Thanks a lot for sharing the link. I took a quick look at the design and
> the implementation in Java and think it could address my concern. It seems
> that it's still not supported in the Python SDK Harness. Are there any plans
> for that?
>
> Robert Bradshaw <[email protected]> wrote on Tue, Jul 30, 2019 at 12:33 PM:
>
>> On Tue, Jul 30, 2019 at 11:52 AM jincheng sun <[email protected]>
>> wrote:
>>
>>>
>>>>> Is it possible to add an interface such as `isSelfContained()` to
>>>>> `Coder`? This interface would indicate whether the serialized bytes
>>>>> are self-contained. If it returns true, there is no need to add a
>>>>> length prefix. This way, there is no need to introduce an extra
>>>>> protocol. Please correct me if I missed something :)
>>>>>
>>>>
>>>> The question is how it is self-contained. E.g. DoubleCoder is
>>>> self-contained because it always uses exactly 8 bytes, but one needs to
>>>> know the double coder to leverage this. VarInt coder is self-contained
>>>> in a different way, as is StringCoder (which does just do prefixing).
>>>>
>>>
>>> Yes, you are right! Thinking about it again, we cannot add such an
>>> interface to the coder, because the runner cannot call it. Just one more
>>> thought: does it make sense to add a method such as
>>> `registerSelfContainedCoder(xxx)` or similar to let users register the
>>> coders that can be processed in the SDK Harness? It would then be the
>>> responsibility of the SDK harness to ensure that the coder is supported.
>>>
>>
>> Basically, a "please don't add length prefixing to this coder, assume
>> everyone else can understand it (and errors will ensue if anyone doesn't)"
>> at the user level? Seems a bit dangerous. Also, there is no single "SDK":
>> there may be multiple other SDKs in general, and of course runner
>> components, some of which may understand the coder in question and some of
>> which may not.
>>
>> I would say that if this becomes a problem, we could look at the pros and
>> cons of various remedies, this being one alternative.
>>
>>>
>>>
>>>> I am hopeful that schemas give us a rich enough way to encode the vast
>>>> majority of types that we will want to transmit across language barriers
>>>> (possibly with some widening promotions). For high performance, one will
>>>> also want to use formats like Arrow rather than one-off coders, which
>>>> also biases us towards the schema work. The set of StandardCoders is not
>>>> closed, and neither is the possibility of figuring out a way to
>>>> communicate outside this set for a particular pair of languages, but I
>>>> think it makes sense to avoid going in that direction unless we have to,
>>>> due to the increased API surface area and complexity it imposes on all
>>>> runners and SDKs.
>>>>
>>>
>>> Great! Could you share some links about the schema work? It seems very
>>> interesting and promising.
>>>
>>
>> https://beam.apache.org/contribute/design-documents/#sql--schema and, of
>> particular relevance, https://s.apache.org/beam-schemas
>>
>>
>>
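The self-containment distinction in the quoted thread (DoubleCoder always uses 8 bytes, a varint marks its own end, StringCoder prefixes a length) can be illustrated with a small, standard-library-only Python sketch. This is not Beam's implementation: every function name here is hypothetical, the double layout is assumed to be big-endian IEEE 754, and the varint only handles non-negative integers. It shows why a component that does not recognize an element's coder can still split a length-prefixed stream, but cannot split a raw stream of fixed-width doubles without knowing they are doubles.

```python
import struct
from typing import List, Tuple

# Hypothetical helpers for illustration only; these are NOT Beam APIs.


def encode_varint(n: int) -> bytes:
    """Unsigned base-128 varint: 7 payload bits per byte, high bit set = more bytes follow."""
    if n < 0:
        raise ValueError("this sketch only handles non-negative integers")
    out = bytearray()
    while True:
        byte = n & 0x7F
        n >>= 7
        if n:
            out.append(byte | 0x80)
        else:
            out.append(byte)
            return bytes(out)


def decode_varint(buf: bytes, pos: int) -> Tuple[int, int]:
    """Return (value, new_pos). Self-delimiting: the last byte has its high bit clear."""
    result, shift = 0, 0
    while True:
        byte = buf[pos]
        pos += 1
        result |= (byte & 0x7F) << shift
        if not byte & 0x80:
            return result, pos
        shift += 7


def encode_double(x: float) -> bytes:
    # Fixed width: always exactly 8 bytes (big-endian IEEE 754 is an assumption).
    return struct.pack(">d", x)


def decode_double(buf: bytes, pos: int) -> Tuple[float, int]:
    # The reader must already know the element is a double to consume 8 bytes.
    return struct.unpack_from(">d", buf, pos)[0], pos + 8


def length_prefix(payload: bytes) -> bytes:
    # Varint byte count followed by the opaque encoded element.
    return encode_varint(len(payload)) + payload


def split_length_prefixed(buf: bytes) -> List[bytes]:
    """Split a stream of length-prefixed elements without knowing the element coder."""
    elements, pos = [], 0
    while pos < len(buf):
        length, pos = decode_varint(buf, pos)
        elements.append(buf[pos:pos + length])
        pos += length
    return elements


if __name__ == "__main__":
    values = [1.5, -2.0, 3.25]

    # Raw stream: only a reader that knows "these are doubles" can find boundaries.
    raw = b"".join(encode_double(v) for v in values)
    pos, decoded = 0, []
    while pos < len(raw):
        v, pos = decode_double(raw, pos)
        decoded.append(v)
    print(decoded)  # [1.5, -2.0, 3.25]

    # Length-prefixed stream: any component can split it and pass elements on opaquely.
    prefixed = b"".join(length_prefix(encode_double(v)) for v in values)
    print([len(c) for c in split_length_prefixed(prefixed)])  # [8, 8, 8]
    print(len(prefixed) - len(raw))  # 3 (one extra prefix byte per 8-byte element)
```

The trade-off this makes visible is the one debated above: the prefix costs a few bytes per element, but it lets a runner or another SDK handle elements whose coder it does not understand, whereas skipping the prefix only works if every party knows the coder.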
