Re: (mini-doc) Beam (Flink) portable job templates

Kyle Weaver Thu, 08 Aug 2019 16:38:18 -0700

> It might also be useful to have the option to just output the proto and
> artifacts, as an alternative to the jar file.


Sure, that wouldn't be too big a change if we were to decide to go the SDK
route.

> For the Flink entry point we would need to allow for the job server to be
> used as a library.

We don't need the whole job server; we only need to add a main method to
FlinkPipelineRunner [1] as the entry point, which would basically just do
the setup described in the doc and then call FlinkPipelineRunner::run.

[1]
https://github.com/apache/beam/blob/master/runners/flink/src/main/java/org/apache/beam/runners/flink/FlinkPipelineRunner.java#L53

Kyle Weaver | Software Engineer | github.com/ibzib | [email protected]


On Thu, Aug 8, 2019 at 4:21 PM Thomas Weise <[email protected]> wrote:

> Hi Kyle,
>
> It might also be useful to have the option to just output the proto and
> artifacts, as an alternative to the jar file.
>
> For the Flink entry point we would need to allow for the job server to be
> used as a library. It would probably not be too hard to have the Flink job
> constructed via the context execution environment, which would require no
> changes on the Flink side.
>
> Thanks,
> Thomas
>
>
> On Thu, Aug 8, 2019 at 9:52 AM Kyle Weaver <[email protected]> wrote:
>
>> Re Javaless/serverless solution:
>> I take it this would probably mean that we would construct the jar
>> directly from the SDK. There are advantages to this: full separation of
>> Python and Java environments, no need for a job server, and likely a
>> simpler implementation, since we'd no longer have to work within the
>> constraints of the existing job server infrastructure. The only downside I
>> can think of is the additional cost of implementing/maintaining jar
>> creation code in each SDK, but that cost may be acceptable if it's simple
>> enough.
>>
>> Kyle Weaver | Software Engineer | github.com/ibzib | [email protected]
>>
>>
>> On Thu, Aug 8, 2019 at 9:31 AM Thomas Weise <[email protected]> wrote:
>>
>>>
>>>
>>> On Thu, Aug 8, 2019 at 8:29 AM Robert Bradshaw <[email protected]>
>>> wrote:
>>>
>>>> > Before assembling the jar, the job server runs to create the
>>>> ingredients. That requires the (matching) Java environment on the Python
>>>> developer's machine.
>>>>
>>>> We can run the job server and have it create the jar (and if we keep
>>>> the job server running we can use it to interact with the running
>>>> job). However, if the jar layout is simple enough, there's no need to
>>>> even build it from Java.
>>>>
>>>> Taken to the extreme, this is a one-shot, jar-based JobService API. We
>>>> choose a standard layout of where to put the pipeline description and
>>>> artifacts, and can "augment" an existing jar (that has a
>>>> runner-specific main class whose entry point knows how to read this
>>>> data to kick off a pipeline as if it were a user's driver code) into
>>>> one that has a portable pipeline packaged into it for submission to a
>>>> cluster.
>>>>
>>>
>>> It would be nice if the Python developer doesn't have to run anything
>>> Java at all.
>>>
>>> As we just discussed offline, this could be accomplished by including
>>> the proto that is produced by the SDK into the pre-existing jar.
>>>
>>> And if the jar has an entry point that creates the Flink job in the
>>> prescribed manner [1], it can be directly submitted to the Flink REST API.
>>> That would allow for a Java-free client.
>>>
>>> [1]
>>> https://lists.apache.org/thread.html/6db869c53816f4e2917949a7c6992c2b90856d7d639d7f2e1cd13768@%3Cdev.flink.apache.org%3E
>>>
>>>
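
The "augment an existing jar" idea discussed above can be illustrated from the
SDK side without running any Java. The following is only a rough sketch under
stated assumptions: the thread does not settle on a jar layout, so the entry
names BEAM-PIPELINE/pipeline.pb and BEAM-PIPELINE/artifacts/ are hypothetical,
as is the helper augment_jar. It assumes the pre-existing jar already contains
a runner-specific main class that knows where to look for this data.

```python
import shutil
import zipfile
from pathlib import Path

# Hypothetical layout: the thread only says "a standard layout" is needed,
# it does not name one.
PIPELINE_ENTRY = "BEAM-PIPELINE/pipeline.pb"
ARTIFACT_PREFIX = "BEAM-PIPELINE/artifacts/"


def augment_jar(base_jar, output_jar, pipeline_proto, artifacts):
    """Copy base_jar to output_jar, then append the pipeline and artifacts.

    pipeline_proto: serialized pipeline bytes as produced by the SDK.
    artifacts: mapping of artifact name -> artifact bytes.
    The base jar is left untouched so it can be reused for other pipelines.
    """
    shutil.copyfile(base_jar, output_jar)
    # A jar is a zip archive, so new entries can be appended in "a" mode.
    with zipfile.ZipFile(output_jar, "a", compression=zipfile.ZIP_DEFLATED) as jar:
        jar.writestr(PIPELINE_ENTRY, pipeline_proto)
        for name, data in artifacts.items():
            jar.writestr(ARTIFACT_PREFIX + name, data)
    return Path(output_jar)
```

The resulting jar would then be the one submitted to the cluster, with the
entry point reading the pipeline description back from the agreed location.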
