If there are no concerns, I say let's merge this.

On Fri, Feb 16, 2018 at 9:39 AM, Charles Chen <c...@google.com> wrote:
> I hope those interested have had time to test this out.  I have sent out
> https://github.com/apache/beam/pull/4696 to switch to using this fast runner
> as the default DirectRunner for local execution.  Let me know if there are
> any concerns.
>
> On Tue, Feb 13, 2018 at 12:17 PM Charles Chen <c...@google.com> wrote:
>>
>> This is now checked into master.  You can use it by setting
>> --runner=SwitchingDirectRunner.  Please let us know if you run into any
>> issues.
>>
>>
>> On Thu, Feb 8, 2018 at 10:30 AM Romain Manni-Bucau <rmannibu...@gmail.com>
>> wrote:
>>>
>>> Very interesting! This sounds like a sane direction for Beam's future, and
>>> I'm very happy it is consistent with the current Java experience: in the
>>> end there is no need to interlace runners, which makes the design, code,
>>> and user experience much better than trying to put everything into the
>>> direct runner :).
>>>
>>> On Feb 8, 2018 19:20, "María García Herrero" <mari...@google.com>
>>> wrote:
>>>>
>>>> Amazing improvement, Charles.
>>>> Thanks for the effort!
>>>>
>>>>
>>>> On Thu, Feb 8, 2018 at 10:14 AM Eugene Kirpichov <kirpic...@google.com>
>>>> wrote:
>>>>>
>>>>> Sounds awesome, congratulations and thanks for making this happen!
>>>>>
>>>>> On Thu, Feb 8, 2018 at 10:07 AM Raghu Angadi <rang...@google.com>
>>>>> wrote:
>>>>>>
>>>>>> This is terrific news! Thanks Charles.
>>>>>>
>>>>>> On Wed, Feb 7, 2018 at 5:55 PM, Charles Chen <c...@google.com> wrote:
>>>>>>>
>>>>>>> Local execution of Beam pipelines on the Python DirectRunner
>>>>>>> currently suffers from performance issues, which makes it hard for
>>>>>>> pipeline authors to iterate, especially on medium to large size
>>>>>>> datasets.  We would like to optimize and make this a better experience
>>>>>>> for Beam users.
>>>>>>>
>>>>>>>
>>>>>>> The FnApiRunner was written as a way of leveraging the portability
>>>>>>> framework execution code path for local portability development.
>>>>>>> We've found it also provides large speedups in batch execution with no
>>>>>>> user changes required, so we propose switching to this runner by
>>>>>>> default for batch pipelines.  For example, WordCount on the Shakespeare
>>>>>>> dataset with a single CPU core now takes 50 seconds to run, compared to
>>>>>>> 12 minutes before; this is a roughly 15x (720 s / 50 s = 14.4x)
>>>>>>> performance improvement that users get for free, with no pipeline
>>>>>>> changes.
>>>>>>>
>>>>>>>
>>>>>>> The JIRA for this change is here
>>>>>>> (https://issues.apache.org/jira/browse/BEAM-3644), and a candidate patch
>>>>>>> is available here (https://github.com/apache/beam/pull/4634).  I have
>>>>>>> been working over the last month on making this an automatic drop-in
>>>>>>> replacement for the current DirectRunner when applicable.  Before it
>>>>>>> becomes the default, you can try this runner now by manually specifying
>>>>>>> apache_beam.runners.portability.fn_api_runner.FnApiRunner as the runner.
>>>>>>>
>>>>>>>
>>>>>>> Even with this change, local Python pipeline execution can only
>>>>>>> effectively use one core because of the Python GIL.  A natural next
>>>>>>> step to further improve performance will be to refactor the FnApiRunner
>>>>>>> to allow for multi-process execution.  This is being tracked here
>>>>>>> (https://issues.apache.org/jira/browse/BEAM-3645).
>>>>>>>
>>>>>>>
>>>>>>> Best,
>>>>>>>
>>>>>>> Charles
>>>>
>
