Re: Portable Flink Runner plan

Lukasz Cwik Thu, 08 Mar 2018 11:19:20 -0800

I ran some very pessimistic, shuffle-heavy pipelines (Random KV ->
GBK -> IdentityDoFn) and found that the performance overhead was 15% when
executed with Dataflow. This was a while back, and there were a lot of
inefficiencies due to coder encode/decode cycles. Based on profiling
information, I surmised that with some work to reduce the number of times
that byte[] are copied, this could be reduced to about 8%. I can't say how
this will impact Flink, as it's a different execution engine, but we should
gather data first.
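
For illustration only, here is a rough stdlib-only Python sketch of the
pipeline shape mentioned above (Random KV -> GBK -> identity). It does not
use Beam, Dataflow, or Flink: pickle stands in for a coder, and every
function name here is hypothetical. It only shows where extra encode/decode
cycles would enter such a pipeline; it says nothing about the measured
numbers.

```python
import pickle
import random
from collections import defaultdict


def random_kvs(n, num_keys, seed=0):
    """Generate n (key, value) pairs with random keys and values."""
    rng = random.Random(seed)
    return [(rng.randrange(num_keys), rng.random()) for _ in range(n)]


def group_by_key(kvs):
    """Group values by key, like a GBK step (keys kept in first-seen order)."""
    grouped = defaultdict(list)
    for key, value in kvs:
        grouped[key].append(value)
    return dict(grouped)


def identity(element):
    """Stand-in for an identity DoFn: returns its input unchanged."""
    return element


def run_direct(kvs):
    """Run the pipeline without any serialization boundary."""
    return [identity(item) for item in group_by_key(kvs).items()]


def run_with_coder_hops(kvs):
    """Run the same pipeline, but round-trip each element through pickle
    before the grouping step and again before the identity step. This is an
    assumption used to mimic extra encode/decode cycles at a process
    boundary, not a model of any real runner."""
    shuffled = [pickle.loads(pickle.dumps(kv)) for kv in kvs]
    grouped = group_by_key(shuffled)
    return [identity(pickle.loads(pickle.dumps(item))) for item in grouped.items()]


def relative_overhead(baseline_seconds, measured_seconds):
    """Relative overhead of a measured run compared to a baseline run."""
    return (measured_seconds - baseline_seconds) / baseline_seconds
```

Timing run_direct against run_with_coder_hops on the same input (for
example with time.perf_counter) and passing both durations to
relative_overhead gives a toy overhead figure; real numbers would have to
come from actual runner measurements.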


On Thu, Mar 8, 2018 at 11:10 AM, Thomas Weise <[email protected]> wrote:

> Performance, due to the extra gRPC hop.
>
>
> On Thu, Mar 8, 2018 at 11:08 AM, Lukasz Cwik <[email protected]> wrote:
>
>> The goal is to use containers (and similar technologies) in the future.
>> It really hinders pipeline portability between runners if you also have to
>> deal with the dependency conflicts between Flink/Dataflow/Spark/...
>> execution runtimes.
>>
>> What kinds of penalty are you referring to (perf, user complexity, ...)?
>>
>>
>>
>> On Thu, Mar 8, 2018 at 11:02 AM, Thomas Weise <[email protected]> wrote:
>>
>>> I'm curious if pipelines that are exclusively Java will also be executed
>>> in separate harness containers (when running on Flink or another JVM-based
>>> runner)? This would impose a significant penalty compared to the
>>> current execution model. Will this be something the user can control?
>>>
>>> Thanks,
>>> Thomas
>>>
>>>
>>> On Wed, Mar 7, 2018 at 2:09 PM, Aljoscha Krettek <[email protected]>
>>> wrote:
>>>
>>>> @Axel I assigned https://issues.apache.org/jira/browse/BEAM-2588 to
>>>> you. It might make sense to also grab other issues that you're already
>>>> working on.
>>>>
>>>>
>>>> On 7. Mar 2018, at 21:18, Aljoscha Krettek <[email protected]> wrote:
>>>>
>>>> Cool, so we had the same ideas. I think this indicates that we're not
>>>> completely on the wrong track with this! ;-)
>>>>
>>>> Aljoscha
>>>>
>>>> On 7. Mar 2018, at 21:14, Thomas Weise <[email protected]> wrote:
>>>>
>>>> Ben,
>>>>
>>>> Looks like we hit the send button at the same time. Is the plan then to
>>>> derive the Flink implementation of the various execution services from
>>>> those under org.apache.beam.runners.fnexecution?
>>>>
>>>> Thanks
>>>>
>>>> On Wed, Mar 7, 2018 at 11:02 AM, Thomas Weise <[email protected]> wrote:
>>>>
>>>>> What's the plan for the endpoints that the Flink operator needs to
>>>>> provide (control/data plane, state, logging)? Is the intention to provide
>>>>> base implementations that can be shared across runners and then implement
>>>>> the Flink specific parts on top of it? Has work started on those?
>>>>>
>>>>> If there are subtasks ready to be taken up I would be interested.
>>>>>
>>>>> Thanks,
>>>>> Thomas
>>>>>
>>>>>
>>>>> On Wed, Mar 7, 2018 at 9:35 AM, Ben Sidhom <[email protected]> wrote:
>>>>>
>>>>>> Yes, Axel has started work on such a shim.
>>>>>>
>>>>>> Our plan in the short term is to keep the old FlinkRunner around and
>>>>>> to call into it to process jobs from the job service itself. That way we
>>>>>> can keep the non-portable runner fully-functional while working on
>>>>>> portability. Eventually, I think it makes sense for this to go away, but
>>>>>> we haven't given much thought to that. The translator layer will likely stay
>>>>>> the same, and the FlinkRunner bits are a relatively simple wrapper around
>>>>>> translation, so it should be simple enough to factor this out.
>>>>>>
>>>>>> Much of the service code from the Universal Local Runner (ULR) should
>>>>>> be composed and reused with other runner implementations. Thomas and Axel
>>>>>> have more context around that.
>>>>>>
>>>>>>
>>>>>> On Wed, Mar 7, 2018 at 8:47 AM Aljoscha Krettek <[email protected]>
>>>>>> wrote:
>>>>>>
>>>>>>> Hi,
>>>>>>>
>>>>>>> Has anyone started on https://issues.apache.org/jira/browse/BEAM-2588
>>>>>>> (FlinkRunner shim for serving Job API)? If not, I would start on that.
>>>>>>>
>>>>>>> My plan is to implement a FlinkJobService that implements
>>>>>>> JobServiceImplBase, similar to ReferenceRunnerJobService. This would have
>>>>>>> a lot of the functionality that FlinkRunner currently has. As a next step,
>>>>>>> I would add a JobServiceRunner that can submit Pipelines to a JobService.
>>>>>>>
>>>>>>> For testing, I would probably add functionality that allows spinning
>>>>>>> up a JobService in-process with the JobServiceRunner. I can imagine for
>>>>>>> testing we could even eventually use something like:
>>>>>>> "--runner=JobServiceRunner", "--streaming=true",
>>>>>>> "--jobService=FlinkRunnerJobService".
>>>>>>>
>>>>>>> Once all of this is done, we only need the python component that
>>>>>>> talks to the JobService to submit a pipeline.
>>>>>>>
>>>>>>> What do you think about the plan?
>>>>>>>
>>>>>>> Btw, I feel that the thing currently called Runner, i.e. FlinkRunner,
>>>>>>> will go away in the long run and we will have FlinkJobService,
>>>>>>> SparkJobService and whatnot. What do you think?
>>>>>>>
>>>>>>> Aljoscha
>>>>>>>
>>>>>>>
>>>>>>> On 9. Feb 2018, at 01:31, Ben Sidhom <[email protected]> wrote:
>>>>>>>
>>>>>>> Hey all,
>>>>>>>
>>>>>>> We're working on getting the portability framework plumbed through
>>>>>>> the Flink runner. The first iteration will likely only support batch and
>>>>>>> will be limited in its deployment flexibility, but hopefully it shouldn't
>>>>>>> be too painful to expand this.
>>>>>>>
>>>>>>> We have the start of a tracking doc here:
>>>>>>> https://s.apache.org/portable-beam-on-flink.
>>>>>>>
>>>>>>> We've documented the general deployment strategy here:
>>>>>>> https://s.apache.org/portable-flink-runner-overview.
>>>>>>>
>>>>>>> Feel free to provide comments on the docs or jump in on any of the
>>>>>>> referenced bugs.
>>>>>>>
>>>>>>> --
>>>>>>> -Ben
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>
>>>>>> --
>>>>>> -Ben
>>>>>>
>>>>>
>>>>>
>>>>
>>>>
>>>>
>>>
>>
>
