> Even when running portably, Dataflow still has its own implementation of
> PubSubIO that is swapped in for Python's "implementation." (It's actually
> built into the same layer that provides the shuffle/group-by-key
> implementation.) However, if you use the external Java PubSubIO, Dataflow
> may not recognize it and will continue to use the Java implementation even
> on Dataflow.
>

That's great, actually, as we still have some headaches around using the
Java PubSubIO transform: it requires a custom build of both the Java Beam
API and the SDK container to add missing dependencies and to handle data
conversions between Python and Java properly.
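
For context, our current usage looks roughly like the sketch below. This is
illustrative only: the wrapper module path (apache_beam.io.external.gcp.pubsub)
and its keyword arguments are assumptions based on the Beam 2.x releases we've
been testing, the expansion service address assumes a Java expansion service
already running locally, and the project/subscription names are made up.

    import apache_beam as beam
    from apache_beam.options.pipeline_options import PipelineOptions
    # Wrapper module path as of recent Beam 2.x releases -- an assumption;
    # check your Beam version before relying on it.
    from apache_beam.io.external.gcp.pubsub import ReadFromPubSub

    pipeline_options = PipelineOptions()

    with beam.Pipeline(options=pipeline_options) as p:
        messages = (
            p
            | 'ReadViaJavaPubSubIO' >> ReadFromPubSub(
                # hypothetical subscription name
                subscription='projects/my-project/subscriptions/my-sub',
                # address of a running Java expansion service (illustrative)
                expansion_service='localhost:8097',
            )
        )

It's the expansion service plus the Python<->Java coder handling that forces
the custom builds mentioned above.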

Next question: when using Dataflow + Portability, can we specify our own
Docker container for the Beam Python SDK when using the Docker executor?

We have two reasons to do this:
1) we have some environments that cannot be bootstrapped on top of the
stock Beam SDK image, and
2) we have a somewhat modified version of the Beam SDK (changes which we
eventually hope to contribute back, but won't be able to for at least a few
months).

If yes, what are the restrictions on custom SDK images? E.g., must they be
the same version of Beam, must they be hosted on a registry accessible to
Dataflow, etc.?
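
For concreteness, what we're hoping to do would look roughly like this. A
sketch only: the flag names are the ones we see in the Python SDK's
WorkerOptions and experiments handling, and the project, bucket, and image
names are placeholders.

    from apache_beam.options.pipeline_options import PipelineOptions

    options = PipelineOptions([
        '--runner=DataflowRunner',
        '--project=my-project',                # hypothetical project
        '--region=us-central1',
        '--temp_location=gs://my-bucket/tmp',  # hypothetical bucket
        '--experiments=beam_fn_api',           # portability / FnAPI path
        # Custom SDK harness image -- flag name taken from the Python SDK's
        # WorkerOptions; the image itself is hypothetical.
        '--worker_harness_container_image='
        'gcr.io/my-project/custom-beam-sdk:2.19.0',
    ])

If that flag is honored on Dataflow's portable path, the remaining question
is the restrictions list above.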

thanks
-chad
