On Wed, May 9, 2018 at 00:57, Henning Rohde <hero...@google.com> wrote:
> There are indeed lots of possibilities for interesting docker
> alternatives with different tradeoffs and capabilities, but in general
> both the runner and the SDK must support them for it to work. As
> mentioned, docker (as used in the container contract) is meant as a
> flexible main option but not necessarily the only option. I see no
> problem with certain pipeline-SDK-runner combinations additionally
> supporting a specialized setup. The pipeline can be a factor, because
> some transforms might depend on aspects of the runtime environment --
> such as system libraries or shelling out to a /bin/foo.
>
> The worker boot code is tied to the current container contract, so
> pre-launched workers would presumably not use that code path and are
> not bound by its assumptions. In particular, such a setup might want to
> invert who initiates the connection from the SDK worker to the runner.
> Pipeline options and global state in the SDK and user functions process
> might make it difficult to safely reuse worker processes across
> pipelines, but it is also doable in certain scenarios.

This is not that hard actually, and most Java environments do it. The
main concerns are 1. being tied to an implementation detail and 2.
a bad architecture which doesn't embrace the community.

> Henning
>
> On Tue, May 8, 2018 at 3:51 PM Thomas Weise <t...@apache.org> wrote:
>
>> On Sat, May 5, 2018 at 3:58 PM, Robert Bradshaw <rober...@google.com>
>> wrote:
>>
>>> I would welcome changes to
>>> https://github.com/apache/beam/blob/v2.4.0/model/pipeline/src/main/proto/beam_runner_api.proto#L730
>>> that would provide alternatives to docker (one of which comes to mind
>>> is "I already brought up a worker(s) for you (which could be the same
>>> process that handled pipeline construction in testing scenarios),
>>> here's how to connect to it/them.") Another option, which would seem
>>> to appeal to you in particular, would be "the worker code is linked
>>> into the runner's binary, use this process as the worker" (though
>>> note even for java-on-java, it can be advantageous to shield the
>>> worker and runner code from each other's environments, dependencies,
>>> and version requirements.) This latter should still likely use the
>>> FnApi to talk to itself (either over GRPC on local ports, or possibly
>>> better via direct function calls, eliminating the RPC overhead
>>> altogether -- this is how the fast local runner in Python works).
>>> There may be runner environments well controlled enough that "start
>>> up the workers" could be specified as "run this command line." We
>>> should make this environment message extensible to alternatives other
>>> than "docker container url," though of course we don't want the set
>>> of options to grow too large, or we lose the promise of portability
>>> unless every runner supports every protocol.
>>>
>>
>> The pre-launched worker would be an interesting option, which might
>> work well for a sidecar deployment.
>>
>> The current worker boot code though makes the assumption that the
>> runner endpoint to phone home to is known when the process is launched.
>> That doesn't work so well with a runner that establishes its endpoint
>> dynamically. Also, the assumption is baked in that a worker will only
>> serve a single pipeline (provisioning API etc.).
>>
>> Thanks,
>> Thomas
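The extensible environment message discussed above could be sketched roughly as follows. This is a hypothetical illustration in Python rather than Beam's actual proto definitions -- all type and field names here are made up -- showing one variant per mechanism (docker image, pre-launched worker endpoint, or a command line) with the runner dispatching on whichever alternative the pipeline carries:

```python
# Hypothetical sketch of an extensible environment description: each
# variant carries the settings for one worker-launch mechanism. These
# names are illustrative only, not Beam's real API.
from dataclasses import dataclass
from typing import Tuple, Union


@dataclass
class DockerEnvironment:
    # The flexible main option from the container contract.
    container_image: str


@dataclass
class ExternalEnvironment:
    # Pre-launched worker: the runner is told where to connect instead
    # of starting the worker itself (the "sidecar" case above).
    endpoint: str


@dataclass
class ProcessEnvironment:
    # The "run this command line" option for well-controlled hosts.
    command: str
    args: Tuple[str, ...]


Environment = Union[DockerEnvironment, ExternalEnvironment, ProcessEnvironment]


def start_worker(env: Environment) -> str:
    """Dispatch on the environment kind; each branch only sketches intent."""
    if isinstance(env, DockerEnvironment):
        return f"docker run {env.container_image}"
    if isinstance(env, ExternalEnvironment):
        return f"connect to pre-launched worker at {env.endpoint}"
    if isinstance(env, ProcessEnvironment):
        return f"exec {env.command} {' '.join(env.args)}"
    raise ValueError(f"unsupported environment: {env!r}")
```

A runner that only understands some variants can reject the others up front, which keeps the set of options small without baking docker in as the sole mechanism.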