Re: Portable Flink Runner plan

Thomas Weise Thu, 08 Mar 2018 11:11:12 -0800

Performance, due to the extra gRPC hop.


On Thu, Mar 8, 2018 at 11:08 AM, Lukasz Cwik <[email protected]> wrote:

> The goal is to use containers (and similar technologies) in the future. It
> really hinders pipeline portability between runners if you also have to
> deal with the dependency conflicts between Flink/Dataflow/Spark/...
> execution runtimes.
>
> What kinds of penalty are you referring to (perf, user complexity, ...)?
>
>
>
> On Thu, Mar 8, 2018 at 11:02 AM, Thomas Weise <[email protected]> wrote:
>
>> I'm curious if pipelines that are exclusively Java will be executed (when
>> running on Flink or other JVM based runnner) in separate harness containers
>> also? This would impose a significant penalty compared to the current
>> execution model. Will this be something the user can control?
>>
>> Thanks,
>> Thomas
>>
>>
>> On Wed, Mar 7, 2018 at 2:09 PM, Aljoscha Krettek <[email protected]>
>> wrote:
>>
>>> @Axel I assigned https://issues.apache.org/jira/browse/BEAM-2588 to
>>> you. It might make sense to also grab other issues that you're already
>>> working on.
>>>
>>>
>>> On 7. Mar 2018, at 21:18, Aljoscha Krettek <[email protected]> wrote:
>>>
>>> Cool, so we had the same ideas. I think this indicates that we're not
>>> completely on the wrong track with this! ;-)
>>>
>>> Aljoscha
>>>
>>> On 7. Mar 2018, at 21:14, Thomas Weise <[email protected]> wrote:
>>>
>>> Ben,
>>>
>>> Looks like we hit the send button at the same time. Is the plan the to
>>> derive the Flink implementation of the various execution services from
>>> those under org.apache.beam.runners.fnexecution ?
>>>
>>> Thanks
>>>
>>> On Wed, Mar 7, 2018 at 11:02 AM, Thomas Weise <[email protected]> wrote:
>>>
>>>> What's the plan for the endpoints that the Flink operator needs to
>>>> provide (control/data plane, state, logging)? Is the intention to provide
>>>> base implementations that can be shared across runners and then implement
>>>> the Flink specific parts on top of it? Has work started on those?
>>>>
>>>> If there are subtasks ready to be taken up I would be interested.
>>>>
>>>> Thanks,
>>>> Thomas
>>>>
>>>>
>>>> On Wed, Mar 7, 2018 at 9:35 AM, Ben Sidhom <[email protected]> wrote:
>>>>
>>>>> Yes, Axel has started work on such a shim.
>>>>>
>>>>> Our plan in the short term is to keep the old FlinkRunner around and
>>>>> to call into it to process jobs from the job service itself. That way we
>>>>> can keep the non-portable runner fully-functional while working on
>>>>> portability. Eventually, I think it makes sense for this to go away, but 
>>>>> we
>>>>> haven't given much thought to that. The translator layer will likely stay
>>>>> the same, and the FlinkRunner bits are a relatively simple wrapper around
>>>>> translation, so it should be simple enough to factor this out.
>>>>>
>>>>> Much of the service code from the Universal Local Runner (ULR) should
>>>>> be composed and reused with other runner implementations. Thomas and Axel
>>>>> have more context around that.
>>>>>
>>>>>
>>>>> On Wed, Mar 7, 2018 at 8:47 AM Aljoscha Krettek <[email protected]>
>>>>> wrote:
>>>>>
>>>>>> Hi,
>>>>>>
>>>>>> Has anyone started on https://issues.apache.org/jira/browse/BEAM-2588
>>>>>>  (FlinkRunner shim for serving Job API). If not I would start on
>>>>>> that.
>>>>>>
>>>>>> My plan is to implement a FlinkJobService that implements 
>>>>>> JobServiceImplBase,
>>>>>> similar to ReferenceRunnerJobService. This would have a lot of the
>>>>>> functionality that FlinkRunner currently has. As a next step, I would 
>>>>>> add a
>>>>>> JobServiceRunner that can submit Pipelines to a JobService.
>>>>>>
>>>>>> For testing, I would probably add functionality that allows spinning
>>>>>> up a JobService in-process with the JobServiceRunner. I can imagine for
>>>>>> testing we could even eventually use something like:
>>>>>> "--runner=JobServiceRunner", "--streaming=true",
>>>>>> "--jobService=FlinkRunnerJobService".
>>>>>>
>>>>>> Once all of this is done, we only need the python component that
>>>>>> talks to the JobService to submit a pipeline.
>>>>>>
>>>>>> What do you think about the plan?
>>>>>>
>>>>>> Btw, I feel that the thing currently called Runner, i.e. FlinkRunner
>>>>>> will go way in the long run and we will have FlinkJobService,
>>>>>> SparkJobService and whatnot, what do you think?
>>>>>>
>>>>>> Aljoscha
>>>>>>
>>>>>>
>>>>>> On 9. Feb 2018, at 01:31, Ben Sidhom <[email protected]> wrote:
>>>>>>
>>>>>> Hey all,
>>>>>>
>>>>>> We're working on getting the portability framework plumbed through
>>>>>> the Flink runner. The first iteration will likely only support batch and
>>>>>> will be limited in its deployment flexibility, but hopefully it shouldn't
>>>>>> be too painful to expand this.
>>>>>>
>>>>>> We have the start of a tracking doc here: https://s.apache.org/por
>>>>>> table-beam-on-flink.
>>>>>>
>>>>>> We've documented the general deployment strategy here:
>>>>>> https://s.apache.org/portable-flink-runner-overview.
>>>>>>
>>>>>> Feel free to provide comments on the docs or jump in on any of the
>>>>>> referenced bugs.
>>>>>>
>>>>>> --
>>>>>> -Ben
>>>>>>
>>>>>>
>>>>>>
>>>>>
>>>>> --
>>>>> -Ben
>>>>>
>>>>
>>>>
>>>
>>>
>>>
>>
>

Re: Portable Flink Runner plan

Reply via email to