Very interesting! This sounds like a sane direction for Beam's future, and
I'm very happy it is consistent with the current Java experience: there is
no need to interlace runners in the end, which makes the design, code, and
user experience far better than trying to put everything in the direct
runner :).

On Feb 8, 2018 19:20, "María García Herrero" <mari...@google.com> wrote:

> Amazing improvement, Charles.
> Thanks for the effort!
>
>
> On Thu, Feb 8, 2018 at 10:14 AM Eugene Kirpichov <kirpic...@google.com>
> wrote:
>
>> Sounds awesome, congratulations and thanks for making this happen!
>>
>> On Thu, Feb 8, 2018 at 10:07 AM Raghu Angadi <rang...@google.com> wrote:
>>
>>> This is terrific news! Thanks Charles.
>>>
>>> On Wed, Feb 7, 2018 at 5:55 PM, Charles Chen <c...@google.com> wrote:
>>>
>>>> Local execution of Beam pipelines on the Python DirectRunner currently
>>>> suffers from performance issues, which makes it hard for pipeline authors
>>>> to iterate, especially on medium to large size datasets.  We would like to
>>>> optimize and make this a better experience for Beam users.
>>>>
>>>> The FnApiRunner was written as a way of leveraging the portability
>>>> framework execution code path for local portability development. We've
>>>> found it also provides great speedups in batch execution with no user
>>>> changes required, so we propose to switch to use this runner by default in
>>>> batch pipelines.  For example, WordCount on the Shakespeare dataset with a
>>>> single CPU core now takes 50 seconds to run, compared to 12 minutes before;
>>>> this is roughly a 15x speedup that users get for free.
>>>>
>>>> The JIRA for this change is here
>>>> (https://issues.apache.org/jira/browse/BEAM-3644), and a candidate patch
>>>> is available here (https://github.com/apache/beam/pull/4634). I have been
>>>> working over the last month on making this an automatic drop-in
>>>> replacement for the current DirectRunner when applicable.  Before it
>>>> becomes the default, you can try this runner now by manually specifying
>>>> apache_beam.runners.portability.fn_api_runner.FnApiRunner as the runner.
>>>>
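For anyone who wants to try this before the default changes, here is a
minimal sketch of one way to select the runner by its fully qualified name.
It assumes apache_beam is installed and that the --runner flag accepts a
dotted class path in your SDK version. The helper names with_fn_api_runner
and run_wordcount are hypothetical, made up for this illustration, and are
not part of Beam.

```python
# Fully qualified runner name, copied from the message above.
FN_API_RUNNER = "apache_beam.runners.portability.fn_api_runner.FnApiRunner"


def with_fn_api_runner(argv):
    """Return a copy of argv whose --runner flag points at the FnApiRunner.

    Hypothetical helper, not part of Beam. It drops any existing runner
    flag (both "--runner=X" and "--runner X" forms) and appends a new one.
    """
    kept, skip_value = [], False
    for arg in argv:
        if skip_value:
            # This token is the value of a preceding bare "--runner"; drop it.
            skip_value = False
        elif arg == "--runner":
            skip_value = True
        elif not arg.startswith("--runner="):
            kept.append(arg)
    return kept + ["--runner=" + FN_API_RUNNER]


def run_wordcount(lines, argv=()):
    """Count words in `lines` on the FnApiRunner (requires apache_beam)."""
    # Imported lazily so the flag helper above works without Beam installed.
    import apache_beam as beam
    from apache_beam.options.pipeline_options import PipelineOptions

    options = PipelineOptions(with_fn_api_runner(list(argv)))
    with beam.Pipeline(options=options) as p:
        _ = (
            p
            | "Create" >> beam.Create(lines)
            | "Split" >> beam.FlatMap(lambda line: line.split())
            | "Count" >> beam.combiners.Count.PerElement()
            | "Print" >> beam.Map(print)
        )
```

Calling run_wordcount(["to be or not to be"]) should print (word, count)
pairs; the output order is not guaranteed. Passing runner=FnApiRunner()
directly to beam.Pipeline is an alternative if you prefer not to go through
command-line flags.
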
>>>> Even with this change, local Python pipeline execution can only
>>>> effectively use one core because of the Python GIL.  A natural next step to
>>>> further improve performance will be to refactor the FnApiRunner to allow
>>>> for multi-process execution.  This is being tracked here (
>>>> https://issues.apache.org/jira/browse/BEAM-3645).
>>>>
>>>> Best,
>>>>
>>>> Charles
>>>>
>>>
>
> --
>
> Impact is the effect that wouldn’t have happened if you hadn’t done what you
> did.
>
>
