I found the tracking ticket at BEAM-7966 <https://jira.apache.org/jira/browse/BEAM-7966>
On Sun, Aug 18, 2019 at 11:59 AM enrico canzonieri <ecanzoni...@gmail.com> wrote:

> Is this alternative still being considered? Creating a portable jar sounds
> like a good solution to re-use the existing runner-specific deployment
> mechanism (e.g. the Flink k8s operator) and in general simplify the
> deployment story.
>
> On Fri, Aug 9, 2019 at 12:46 AM Robert Bradshaw <rober...@google.com> wrote:
>
>> The expansion service is a separate service. (The Flink jar happens to
>> bring both up.) However, there is negotiation to receive/validate the
>> pipeline options.
>>
>> On Fri, Aug 9, 2019 at 1:54 AM Thomas Weise <t...@apache.org> wrote:
>>
>>> We would also need to consider cross-language pipelines that (currently)
>>> assume interaction with an expansion service at construction time.
>>>
>>> On Thu, Aug 8, 2019, 4:38 PM Kyle Weaver <kcwea...@google.com> wrote:
>>>
>>>> > It might also be useful to have the option to just output the proto
>>>> > and artifacts, as an alternative to the jar file.
>>>>
>>>> Sure, that wouldn't be too big a change if we were to decide to go the
>>>> SDK route.
>>>>
>>>> > For the Flink entry point we would need to allow for the job server
>>>> > to be used as a library.
>>>>
>>>> We don't need the whole job server; we only need to add a main method
>>>> to FlinkPipelineRunner [1] as the entry point, which would basically
>>>> just do the setup described in the doc and then call
>>>> FlinkPipelineRunner::run.
>>>>
>>>> [1] https://github.com/apache/beam/blob/master/runners/flink/src/main/java/org/apache/beam/runners/flink/FlinkPipelineRunner.java#L53
>>>>
>>>> Kyle Weaver | Software Engineer | github.com/ibzib | kcwea...@google.com
>>>>
>>>> On Thu, Aug 8, 2019 at 4:21 PM Thomas Weise <t...@apache.org> wrote:
>>>>
>>>>> Hi Kyle,
>>>>>
>>>>> It might also be useful to have the option to just output the proto
>>>>> and artifacts, as an alternative to the jar file.
>>>>>
>>>>> For the Flink entry point we would need to allow for the job server
>>>>> to be used as a library. It would probably not be too hard to have the
>>>>> Flink job constructed via the context execution environment, which
>>>>> would require no changes on the Flink side.
>>>>>
>>>>> Thanks,
>>>>> Thomas
>>>>>
>>>>> On Thu, Aug 8, 2019 at 9:52 AM Kyle Weaver <kcwea...@google.com> wrote:
>>>>>
>>>>>> Re Javaless/serverless solution:
>>>>>> I take it this would probably mean that we would construct the jar
>>>>>> directly from the SDK. There are advantages to this: full separation
>>>>>> of Python and Java environments, no need for a job server, and likely
>>>>>> a simpler implementation, since we'd no longer have to work within
>>>>>> the constraints of the existing job server infrastructure. The only
>>>>>> downside I can think of is the additional cost of
>>>>>> implementing/maintaining jar creation code in each SDK, but that cost
>>>>>> may be acceptable if it's simple enough.
>>>>>>
>>>>>> Kyle Weaver | Software Engineer | github.com/ibzib | kcwea...@google.com
>>>>>>
>>>>>> On Thu, Aug 8, 2019 at 9:31 AM Thomas Weise <t...@apache.org> wrote:
>>>>>>
>>>>>>> On Thu, Aug 8, 2019 at 8:29 AM Robert Bradshaw <rober...@google.com> wrote:
>>>>>>>
>>>>>>>> > Before assembling the jar, the job server runs to create the
>>>>>>>> > ingredients. That requires the (matching) Java environment on the
>>>>>>>> > Python developer's machine.
>>>>>>>>
>>>>>>>> We can run the job server and have it create the jar (and if we
>>>>>>>> keep the job server running we can use it to interact with the
>>>>>>>> running job). However, if the jar layout is simple enough, there's
>>>>>>>> no need to even build it from Java.
>>>>>>>>
>>>>>>>> Taken to the extreme, this is a one-shot, jar-based JobService API.
>>>>>>>> We choose a standard layout of where to put the pipeline
>>>>>>>> description and artifacts, and can "augment" an existing jar (one
>>>>>>>> that has a runner-specific main class whose entry point knows how
>>>>>>>> to read this data to kick off a pipeline as if it were a user's
>>>>>>>> driver code) into one that has a portable pipeline packaged into it
>>>>>>>> for submission to a cluster.
>>>>>>>
>>>>>>> It would be nice if the Python developer didn't have to run anything
>>>>>>> Java at all.
>>>>>>>
>>>>>>> As we just discussed offline, this could be accomplished by
>>>>>>> including the proto that is produced by the SDK in the pre-existing
>>>>>>> jar.
>>>>>>>
>>>>>>> And if the jar has an entry point that creates the Flink job in the
>>>>>>> prescribed manner [1], it can be directly submitted to the Flink
>>>>>>> REST API. That would allow for a Java-free client.
>>>>>>>
>>>>>>> [1] https://lists.apache.org/thread.html/6db869c53816f4e2917949a7c6992c2b90856d7d639d7f2e1cd13768@%3Cdev.flink.apache.org%3E
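For reference, the jar "augmentation" idea discussed above can be sketched entirely from the Python side. This is a minimal sketch under stated assumptions: the `BEAM-PIPELINE/...` entry names and the `augment_jar` helper are hypothetical (the real layout would be fixed by the Beam spec, not this sketch). The SDK copies a pre-built runner jar, whose runner-specific main class is assumed to know how to read the pipeline from those paths, and appends the pipeline proto and options as new zip entries, so no Java runs on the client.

```python
import shutil
import zipfile

# Hypothetical standard layout inside the jar; the actual entry names
# would be defined by the Beam portability spec, not by this sketch.
PIPELINE_PATH = "BEAM-PIPELINE/pipeline.json"
OPTIONS_PATH = "BEAM-PIPELINE/pipeline-options.json"


def augment_jar(base_jar, output_jar, pipeline_json, options_json):
    """Copy a runner-provided jar and embed the portable pipeline in it.

    base_jar: path to the pre-built jar whose main class knows how to
        read the pipeline from the standard layout above.
    pipeline_json / options_json: serialized pipeline proto and options
        produced by the SDK.
    """
    shutil.copyfile(base_jar, output_jar)
    # Opening the copy in append mode adds new zip entries without
    # touching the existing ones (classes, manifest, main class).
    with zipfile.ZipFile(output_jar, "a") as jar:
        jar.writestr(PIPELINE_PATH, pipeline_json)
        jar.writestr(OPTIONS_PATH, options_json)
```

The resulting self-contained jar could then go through any existing Flink deployment mechanism, e.g. uploaded via Flink's `POST /jars/upload` REST endpoint and started with `POST /jars/:jarid/run`, matching the Java-free client described above.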