JSON and Protobuf aren't the same thing. JSON is for exchanging
unstructured data; Protobuf is for exchanging structured data. The point of
Portability is to define a protocol for exchanging structured messages
across languages. What do you propose using on top of JSON to define
message structure?
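
To make that concrete, here is a minimal Go sketch (the Transform shape
and field names are hypothetical, not the actual Beam model). Plain JSON
decoding yields untyped data that must be validated by hand, while a
Protobuf message's structure lives in a .proto file that generated code
enforces in every language:

  package main

  import (
      "encoding/json"
      "fmt"
  )

  // Hypothetical message shape; with Protobuf this would live in a
  // .proto file and be enforced by generated code in every language.
  type Transform struct {
      URN    string            `json:"urn"`
      Inputs map[string]string `json:"inputs"`
  }

  func main() {
      payload := []byte(`{"urn": "beam:transform:pardo:v1"}`)

      // Untyped JSON: structure must be checked by hand at runtime.
      var raw map[string]interface{}
      _ = json.Unmarshal(payload, &raw)
      if _, ok := raw["urn"].(string); !ok {
          fmt.Println("missing or mistyped field: urn")
      }

      // A typed struct helps within one process, but the definition
      // is not shared across languages the way a .proto is.
      var t Transform
      _ = json.Unmarshal(payload, &t)
      fmt.Println(t.URN)
  }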

I'd like to see the generic runner rewritten in Golang so we can eliminate
the significant overhead imposed by the JVM. I would argue that Go is the
best language for low-overhead infrastructure, and it is already widely
used by projects in this space such as Docker, Kubernetes, and InfluxDB.
Even SQL could take advantage of this: several runners could be passed raw
SQL and use their own SQL engines to implement more efficient transforms
than generic Beam can. Users would save significant $$$ on infrastructure
by not having to involve the JVM at all.
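
As a rough sketch of that SQL routing in Go (the URN and types are
illustrative, not the actual Beam model), the generic runner would inspect
each transform and push raw SQL down to a native engine instead of running
the generic translation:

  package main

  import "fmt"

  // Hypothetical transform carrying raw SQL in its payload.
  type Transform struct {
      URN     string
      Payload string
  }

  func execute(t Transform) {
      switch t.URN {
      case "beam:transform:sql:v1": // illustrative URN, not normative
          fmt.Println("pushing down to native SQL engine:", t.Payload)
      default:
          fmt.Println("executing via generic Beam translation:", t.URN)
      }
  }

  func main() {
      execute(Transform{
          URN:     "beam:transform:sql:v1",
          Payload: "SELECT user, COUNT(*) FROM events GROUP BY user",
      })
  }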

Andrew

On Fri, May 11, 2018 at 8:53 AM Romain Manni-Bucau <rmannibu...@gmail.com>
wrote:

>
>
> Le mer. 9 mai 2018 17:41, Eugene Kirpichov <kirpic...@google.com> a
> écrit :
>
>>
>>
>> On Wed, May 9, 2018 at 1:08 AM Romain Manni-Bucau <rmannibu...@gmail.com>
>> wrote:
>>
>>>
>>>
>>> Le mer. 9 mai 2018 00:57, Henning Rohde <hero...@google.com> a écrit :
>>>
>>>> There are indeed lots of possibilities for interesting docker
>>>> alternatives with different tradeoffs and capabilities, but generally
>>>> both the runner and the SDK must support them for it to work. As
>>>> mentioned, docker (as used in the container contract) is meant as a
>>>> flexible main option but not necessarily the only option. I see no problem
>>>> with certain pipeline-SDK-runner combinations additionally supporting a
>>>> specialized setup. The pipeline can be a factor, because some transforms
>>>> might depend on aspects of the runtime environment -- such as system
>>>> libraries or shelling out to /bin/foo.
>>>>
>>>> The worker boot code is tied to the current container contract, so
>>>> pre-launched workers would presumably not use that code path and would
>>>> not be bound by its assumptions. In particular, such a setup might want
>>>> to invert who initiates the connection from the SDK worker to the runner.
>>>> Pipeline options and global state in the SDK and user-function process
>>>> might make it difficult to safely reuse worker processes across
>>>> pipelines, though it is doable in certain scenarios.
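>>>>
>>>> As a rough Go sketch of that inversion (plain TCP just to show who
>>>> dials whom; a real setup would speak the Fn API over gRPC):
>>>>
>>>>   package main
>>>>
>>>>   import (
>>>>       "fmt"
>>>>       "net"
>>>>   )
>>>>
>>>>   func main() {
>>>>       // Current contract: the worker is launched with the runner's
>>>>       // endpoint in its boot args and phones home, e.g.
>>>>       //   conn, err := net.Dial("tcp", runnerEndpointFromBootArgs)
>>>>
>>>>       // Inverted: a pre-launched worker listens, and the runner
>>>>       // dials in once its own endpoint exists.
>>>>       ln, err := net.Listen("tcp", "127.0.0.1:0")
>>>>       if err != nil {
>>>>           panic(err)
>>>>       }
>>>>       fmt.Println("worker waiting for runner at", ln.Addr())
>>>>       conn, err := ln.Accept() // blocks until a runner connects
>>>>       if err != nil {
>>>>           panic(err)
>>>>       }
>>>>       conn.Close()
>>>>   }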
>>>>
>>>
>>> This is not that hard actually; most Java environments do it.
>>>
>>> My main concerns are 1. being tied to an implementation detail, and 2. a
>>> bad architecture which doesn't embrace the community.
>>>
>> Could you please be more specific? Concerns about Docker dependency have
>> already been repeatedly addressed in this thread.
>>
>
> My concern is that Beam is being driven by an implementation instead of a
> clear and scalable architecture.
>
> The best demonstration is the protobuf usage, which is far from the best
> choice for portability these days due to the implications of its stack in
> several languages (nobody wants it on their classpath in Java/Scala these
> days, for instance, because of dependency conflicts or the security care
> it requires). JSON, to cite just one alternative, has rich tooling and is
> trivial to use with whatever library you like, in any language or
> environment.
>
> Being portable across languages is a good goal, but IMHO it requires:
>
> 1. Runners in each language (otherwise fall back on JSR-223 and you are
> good with just a JSON facade)
> 2. A generic runner able to route each task to the right native runner
> (see the sketch below)
> 3. A way to run in a single runner when relevant (keep in mind most Java
> users don't even want to see Python or portable code or APIs in their
> classpath and runner)
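>
> A rough Go sketch of point 2 (all names hypothetical): the generic
> runner keeps a registry of native runners keyed by language and hands
> each task to the matching one:
>
>   package main
>
>   import "fmt"
>
>   // Hypothetical task model: each task carries the language of the
>   // native runner that should execute it.
>   type Task struct {
>       ID       string
>       Language string
>   }
>
>   func route(t Task, native map[string]func(Task)) {
>       if run, ok := native[t.Language]; ok {
>           run(t) // hand off to the language-native runner
>           return
>       }
>       fmt.Println("no native runner for", t.Language)
>   }
>
>   func main() {
>       native := map[string]func(Task){
>           "java":   func(t Task) { fmt.Println("java runner:", t.ID) },
>           "python": func(t Task) { fmt.Println("python runner:", t.ID) },
>       }
>       route(Task{ID: "read-1", Language: "java"}, native)
>   }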
>
>
>
>
>>
>>>
>>>
>>>
>>>> Henning
>>>>
>>>> On Tue, May 8, 2018 at 3:51 PM Thomas Weise <t...@apache.org> wrote:
>>>>
>>>>>
>>>>>
>>>>> On Sat, May 5, 2018 at 3:58 PM, Robert Bradshaw <rober...@google.com>
>>>>> wrote:
>>>>>
>>>>>>
>>>>>> I would welcome changes to
>>>>>>
>>>>>> https://github.com/apache/beam/blob/v2.4.0/model/pipeline/src/main/proto/beam_runner_api.proto#L730
>>>>>> that would provide alternatives to docker (one of which comes to mind
>>>>>> is "I
>>>>>> already brought up a worker(s) for you (which could be the same
>>>>>> process
>>>>>> that handled pipeline construction in testing scenarios), here's how
>>>>>> to
>>>>>> connect to it/them.") Another option, which would seem to appeal to
>>>>>> you in
>>>>>> particular, would be "the worker code is linked into the runner's
>>>>>> binary,
>>>>>> use this process as the worker" (though note even for java-on-java,
>>>>>> it can
>>>>>> be advantageous to shield the worker and runner code from each others
>>>>>> environments, dependencies, and version requirements.) This latter
>>>>>> should
>>>>>> still likely use the FnApi to talk to itself (either over GRPC on
>>>>>> local
>>>>>> ports, or possibly better via direct function calls eliminating the
>>>>>> RPC
>>>>>> overhead altogether--this is how the fast local runner in Python
>>>>>> works).
>>>>>> There may be runner environments well controlled enough that "start
>>>>>> up the
>>>>>> workers" could be specified as "run this command line." We should
>>>>>> make this
>>>>>> environment message extensible to other alternatives than "docker
>>>>>> container
>>>>>> url," though of course we don't want the set of options to grow too
>>>>>> large
>>>>>> or we lose the promise of portability unless every runner supports
>>>>>> every
>>>>>> protocol.
>>>>>>
>>>>>>
>>>>> The pre-launched worker would be an interesting option, which might
>>>>> work well for a sidecar deployment.
>>>>>
>>>>> The current worker boot code, though, assumes that the runner
>>>>> endpoint to phone home to is known when the process is launched. That
>>>>> doesn't work so well with a runner that establishes its endpoint
>>>>> dynamically. Also, the assumption is baked in that a worker will only
>>>>> serve a single pipeline (provisioning API etc.).
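>>>>>
>>>>> As a sketch of lifting that single-pipeline assumption (types are
>>>>> hypothetical), a long-lived worker could keep provisioning state
>>>>> per pipeline id rather than process-global:
>>>>>
>>>>>   package main
>>>>>
>>>>>   import "fmt"
>>>>>
>>>>>   // Hypothetical: provisioning info kept per pipeline rather
>>>>>   // than as process-global state.
>>>>>   type pipelineState struct {
>>>>>       options map[string]string
>>>>>   }
>>>>>
>>>>>   type worker struct {
>>>>>       pipelines map[string]*pipelineState
>>>>>   }
>>>>>
>>>>>   func (w *worker) provision(id string, opts map[string]string) {
>>>>>       w.pipelines[id] = &pipelineState{options: opts}
>>>>>       fmt.Println("provisioned pipeline", id)
>>>>>   }
>>>>>
>>>>>   func main() {
>>>>>       w := &worker{pipelines: map[string]*pipelineState{}}
>>>>>       w.provision("p1", map[string]string{"sdk": "python"})
>>>>>       w.provision("p2", map[string]string{"sdk": "java"})
>>>>>   }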
>>>>>
>>>>> Thanks,
>>>>> Thomas
>>>>>
>>>>>
>>>>
