Hi All,

Beam PTransforms are currently understood mainly as operations in a
pipeline that perform specific tasks, and PTransform implementations have
traditionally been tied to specific Beam SDKs.

With the advent of the portability framework, multi-language pipelines
<https://beam.apache.org/documentation/programming-guide/#multi-language-pipelines>,
and expansion services that can be used to discover and expand
transforms, we have an opportunity to generalize this notion and
re-introduce Beam PTransforms as computation units that can serve any
use-case that needs to discover or use Beam transforms. For example, any
Beam SDK that runs a pipeline using a portable Beam runner should be able
to use a transform offered through an expansion service irrespective of the
implementation SDK of the transform or the pipeline.

I believe we can make such use-cases much easier to manage by introducing a
user-deployable service that encapsulates existing Beam expansion services
in the form of a Kubernetes cluster. The service will offer a single gRPC
endpoint and will include Beam expansion services developed in different
languages. Any Beam pipeline, irrespective of the pipeline SDK, should be
able to use any transform offered by the service.
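
To make the "single endpoint, many expansion services" idea concrete, here
is a rough toy model in plain Python. It is only a sketch and is not taken
from the proposal: the class names, the URNs, and the routing-by-URN
behavior are all hypothetical, and there is no real gRPC or Beam code in it.

```python
from dataclasses import dataclass, field


@dataclass
class ExpansionBackend:
    """Hypothetical stand-in for one language-specific expansion service."""
    language: str
    urns: frozenset

    def expand(self, urn: str, config: dict) -> dict:
        # A real expansion service would return an expanded transform;
        # this toy version only reports which backend handled the request.
        return {"urn": urn, "expanded_by": self.language, "config": config}


@dataclass
class TransformServiceFrontend:
    """Hypothetical single entry point that fronts several backends."""
    backends: list = field(default_factory=list)

    def discover(self) -> list:
        # List every transform URN offered by any backend.
        return sorted(urn for b in self.backends for urn in b.urns)

    def expand(self, urn: str, config: dict) -> dict:
        # Route the request to the first backend that offers the URN.
        for backend in self.backends:
            if urn in backend.urns:
                return backend.expand(urn, config)
        raise KeyError(f"no backend offers transform {urn!r}")


# Made-up URNs, used only for illustration.
frontend = TransformServiceFrontend([
    ExpansionBackend("java", frozenset({"example:transform:java_read:v1"})),
    ExpansionBackend("python", frozenset({"example:transform:py_map:v1"})),
])

result = frontend.expand("example:transform:py_map:v1", {"field": "name"})
print(frontend.discover())
print(result["expanded_by"])
```

The point of the sketch is that the caller only talks to one frontend and
does not need to know which language implements a given transform.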

This will also offer a way to make multi-language pipeline execution, which
currently relies on locally downloaded large dependencies and locally
started expansion service processes, more robust.

I have written a proposal for implementing such a service and it's
available at https://s.apache.org/beam-transform-service.

Please take a look and let me know if you have any comments or questions.

Thanks,
Cham
