Re: Portable Flink Runner plan

Lukasz Cwik Thu, 08 Mar 2018 11:09:06 -0800

The goal is to use containers (and similar technologies) in the future. It
really hinders pipeline portability between runners if you also have to
deal with the dependency conflicts between Flink/Dataflow/Spark/...
execution runtimes.


What kind of penalty are you referring to (perf, user complexity, ...)?



On Thu, Mar 8, 2018 at 11:02 AM, Thomas Weise <[email protected]> wrote:

> I'm curious if pipelines that are exclusively Java will be executed (when
> running on Flink or another JVM-based runner) in separate harness containers
> also? This would impose a significant penalty compared to the current
> execution model. Will this be something the user can control?
>
> Thanks,
> Thomas
>
>
> On Wed, Mar 7, 2018 at 2:09 PM, Aljoscha Krettek <[email protected]>
> wrote:
>
>> @Axel I assigned https://issues.apache.org/jira/browse/BEAM-2588 to you.
>> It might make sense to also grab other issues that you're already working
>> on.
>>
>>
>> On 7. Mar 2018, at 21:18, Aljoscha Krettek <[email protected]> wrote:
>>
>> Cool, so we had the same ideas. I think this indicates that we're not
>> completely on the wrong track with this! ;-)
>>
>> Aljoscha
>>
>> On 7. Mar 2018, at 21:14, Thomas Weise <[email protected]> wrote:
>>
>> Ben,
>>
>> Looks like we hit the send button at the same time. Is the plan then to
>> derive the Flink implementation of the various execution services from
>> those under org.apache.beam.runners.fnexecution?
>>
>> Thanks
>>
>> On Wed, Mar 7, 2018 at 11:02 AM, Thomas Weise <[email protected]> wrote:
>>
>>> What's the plan for the endpoints that the Flink operator needs to
>>> provide (control/data plane, state, logging)? Is the intention to provide
>>> base implementations that can be shared across runners and then implement
>>> the Flink specific parts on top of it? Has work started on those?
>>>
>>> If there are subtasks ready to be taken up I would be interested.
>>>
>>> Thanks,
>>> Thomas
>>>
>>>
>>> On Wed, Mar 7, 2018 at 9:35 AM, Ben Sidhom <[email protected]> wrote:
>>>
>>>> Yes, Axel has started work on such a shim.
>>>>
>>>> Our plan in the short term is to keep the old FlinkRunner around and to
>>>> call into it to process jobs from the job service itself. That way we can
>>>> keep the non-portable runner fully-functional while working on portability.
>>>> Eventually, I think it makes sense for this to go away, but we haven't
>>>> given much thought to that. The translator layer will likely stay the same,
>>>> and the FlinkRunner bits are a relatively simple wrapper around
>>>> translation, so it should be simple enough to factor this out.
>>>>
>>>> Much of the service code from the Universal Local Runner (ULR) should
>>>> be composed and reused with other runner implementations. Thomas and Axel
>>>> have more context around that.
>>>>
>>>>
>>>> On Wed, Mar 7, 2018 at 8:47 AM Aljoscha Krettek <[email protected]>
>>>> wrote:
>>>>
>>>>> Hi,
>>>>>
>>>>> Has anyone started on https://issues.apache.org/jira/browse/BEAM-2588
>>>>> (FlinkRunner shim for serving Job API)? If not, I would start on that.
>>>>>
>>>>> My plan is to implement a FlinkJobService that implements
>>>>> JobServiceImplBase, similar to ReferenceRunnerJobService. This would have
>>>>> a lot of the functionality that FlinkRunner currently has. As a next step,
>>>>> I would add a JobServiceRunner that can submit Pipelines to a JobService.
>>>>>
>>>>> For testing, I would probably add functionality that allows spinning
>>>>> up a JobService in-process with the JobServiceRunner. I can imagine for
>>>>> testing we could even eventually use something like:
>>>>> "--runner=JobServiceRunner", "--streaming=true",
>>>>> "--jobService=FlinkRunnerJobService".
>>>>>
>>>>> Once all of this is done, we only need the Python component that talks
>>>>> to the JobService to submit a pipeline.
>>>>>
>>>>> What do you think about the plan?
>>>>>
>>>>> Btw, I feel that the thing currently called Runner, i.e. FlinkRunner,
>>>>> will go away in the long run and we will have FlinkJobService,
>>>>> SparkJobService and whatnot. What do you think?
>>>>>
>>>>> Aljoscha
>>>>>
>>>>>
>>>>> On 9. Feb 2018, at 01:31, Ben Sidhom <[email protected]> wrote:
>>>>>
>>>>> Hey all,
>>>>>
>>>>> We're working on getting the portability framework plumbed through the
>>>>> Flink runner. The first iteration will likely only support batch and will
>>>>> be limited in its deployment flexibility, but hopefully it shouldn't be
>>>>> too painful to expand this.
>>>>>
>>>>> We have the start of a tracking doc here:
>>>>> https://s.apache.org/portable-beam-on-flink.
>>>>>
>>>>> We've documented the general deployment strategy here:
>>>>> https://s.apache.org/portable-flink-runner-overview.
>>>>>
>>>>> Feel free to provide comments on the docs or jump in on any of the
>>>>> referenced bugs.
>>>>>
>>>>> --
>>>>> -Ben
>>>>>
>>>>>
>>>>>
>>>>
>>>> --
>>>> -Ben
>>>>
>>>
>>>
>>
>>
>>
>
