Re: Launching a Portable Pipeline

Reuven Lax Wed, 23 May 2018 15:15:06 -0700

On Wed, May 23, 2018 at 3:09 PM Ankur Goenka <[email protected]> wrote:


> 1. Why is JobService runner-specific? Couldn't at least a good part of it
> be reused, given that the runner-specific parts are mostly in the
> translation? Or am I missing other reasons?
>
> Yes, absolutely. A good chunk of it can be reused. We are reusing a few
> components from the ULR in the Flink runner. Calling JobService
> runner-specific gives a runner the freedom to have a very custom
> JobService if needed.
>

So you're suggesting that we should publish common JobService components
and recommend that runners use them, but that runners are free to build
something completely custom if they prefer?

>
> 2. What about authentication and authorisation for production runners?
> A service that can submit/cancel pipelines is the first thing I can
> think of being abused.
>
> Authentication and authorization are still an unsolved problem. To the best
> of my knowledge, they are runner-specific, and any required information
> should be part of the gRPC headers.
>
> On Wed, May 23, 2018 at 2:48 PM Ismaël Mejía <[email protected]> wrote:
>
>> Interesting document, two questions:
>>
>> 1. Why is JobService runner-specific? Couldn't at least a good part of it
>> be reused, given that the runner-specific parts are mostly in the
>> translation? Or am I missing other reasons?
>>
>> 2. What about authentication and authorisation for production runners?
>> A service that can submit/cancel pipelines is the first thing I can
>> think of being abused.
>> On Tue, May 22, 2018 at 9:40 PM Ankur Goenka <[email protected]> wrote:
>>
>> > Thank you guys for the input.
>>
>> > Here is the summary.
>>
>> > Responsibility of Beam on Job Management
>>
>> > Beam provides a common interface for basic job management operations
>> called JobService. The supported operations can vary between runners.
>>
>>
>> > What is JobService?
>>
>> > JobService is a runner-specific component which implements Beam's
>> JobService interface defined here.
>>
>>
>> > What is the life cycle of a JobService?
>>
>> > There are 3 scenarios:
>>
>> > With the ULR, JobService is short-lived and runs as long as the ULR runs
>> (JobService lifespan ~= job lifespan).
>>
>> > With production runners (Flink, Dataflow, etc.), JobService can either be
>> short-lived or long-lived. The choice is up to the runner.
>>
>> > With production runners (Flink, Dataflow, etc.) without a long-running
>> JobService, the SDK will spin up a local JobService.
>>
>>
>> > JobService state management
>>
>> > The choice of state management is up to the JobService implementation.
>> The basic requirement is that JobService should be able to perform all
>> the operations with the returned job handle.
>>
>> > At the very least, it can be the job handle for the underlying runner job,
>> and JobService will simply proxy actions to the runner using the provided
>> job handle.
>>
>> > A persistent JobService is free to provide a simple string as a
>> job handle. In this case, the job handle can only be used with the same
>> JobService.
>>
>> > A stateless, non-persistent JobService can provide an opaque blob
>> containing all the relevant information about the job. In this case, the
>> job handle can be used with any instance of JobService running the same
>> code.
>>
>>
>> > JobService code distribution and invocation when JobService is
>> short-lived
>>
>> > We will provide an easy-to-run solution using Docker. Docker will help
>> with both distributing the executable and providing a platform-independent
>> binary.
>>
>> > We will also provide a simple setup script with a supporting document for
>> users who do not want to use Docker on their local machine.
>>
>>
>> > Should Flink JobService start a local cluster for testing?
>>
>> > The Flink JobService will be capable of submitting to a remote Flink
>> cluster if a master URL is provided; otherwise it will execute the
>> pipeline in an in-process Flink invocation in the same JVM.
>>
>>
>>
>>
>> > On Tue, May 22, 2018 at 12:37 PM Eugene Kirpichov <[email protected]
>> >
>> wrote:
>>
>> >> Thanks Ankur, I think there's consensus, so it's probably ready to
>> share
>> :)
>>
>> >> On Fri, May 18, 2018 at 3:00 PM Ankur Goenka <[email protected]>
>> wrote:
>>
>> >>> Thanks for all the input.
>> >>> I have summarized the discussions at the bottom of the document ( here
>> ).
>> >>> Please feel free to provide comments.
>> >>> Once we agree, I will publish the conclusion on the mailing list.
>>
>> >>> On Mon, May 14, 2018 at 1:51 PM Eugene Kirpichov <
>> [email protected]>
>> wrote:
>>
>> >>>> Thanks Ankur, this document clarifies a few points and raises some
>> very important questions. I encourage everybody with a stake in
>> Portability
>> to take a look and chime in.
>>
>> >>>> +Aljoscha Krettek +Thomas Weise +Henning Rohde
>>
>> >>>> On Mon, May 14, 2018 at 12:34 PM Ankur Goenka <[email protected]>
>> wrote:
>>
>> >>>>> Updated link to the document as the previous link was not working
>> for
>> some people.
>>
>>
>> >>>>> On Fri, May 11, 2018 at 7:56 PM Ankur Goenka <[email protected]>
>> wrote:
>>
>> >>>>>> Hi,
>>
>> >>>>>> Recent effort on portability has introduced JobService and
>> ArtifactService to the Beam stack along with the SDK. This has opened up
>> a few questions around how we start a pipeline in a portable setup (with
>> JobService).
>> >>>>>> I am trying to document our approach to launching a portable
>> pipeline and make binding decisions based on the discussion.
>> >>>>>> Please review the document and provide your feedback.
>>
>> >>>>>> Thanks,
>> >>>>>> Ankur
>>
>
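
For readers of the quoted summary, the two job-handle styles it describes (a
simple string from a persistent JobService versus an opaque blob from a
stateless one) can be sketched roughly as below. This is only an illustration
in plain Python, not Beam's actual JobService interface: the class names, the
submit/resolve methods, and the fields stored in the handle are all
hypothetical.

```python
import base64
import json
import uuid


class PersistentJobService:
    """Hypothetical persistent service: the handle is only a lookup key."""

    def __init__(self):
        # Job state lives inside this instance, so a handle is meaningless
        # to any other instance.
        self._jobs = {}

    def submit(self, runner_job_id, master_url):
        handle = uuid.uuid4().hex  # simple string handle
        self._jobs[handle] = {
            "runner_job_id": runner_job_id,
            "master_url": master_url,
        }
        return handle

    def resolve(self, handle):
        # Raises KeyError if the handle was issued by a different instance.
        return self._jobs[handle]


class StatelessJobService:
    """Hypothetical stateless service: the handle carries all job info."""

    def submit(self, runner_job_id, master_url):
        payload = json.dumps(
            {"runner_job_id": runner_job_id, "master_url": master_url},
            sort_keys=True,
        )
        # Opaque blob: callers pass it back unchanged and never parse it.
        return base64.urlsafe_b64encode(payload.encode("utf-8")).decode("ascii")

    def resolve(self, handle):
        # Any instance running the same code can decode the handle.
        raw = base64.urlsafe_b64decode(handle.encode("ascii"))
        return json.loads(raw.decode("utf-8"))
```

In this sketch, a handle from one StatelessJobService instance resolves on any
other instance, while a handle from a PersistentJobService is only valid on
the instance that issued it. A real implementation would also need to decide
whether the blob should be signed or encrypted, which ties into the open
authentication question above.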
