[ 
https://issues.apache.org/jira/browse/BEAM-3644?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Charles Chen updated BEAM-3644:
-------------------------------
    Summary: Speed up Python DirectRunner execution by using the FnApiRunner 
when possible  (was: Speeding up Python DirectRunner execution by using the 
FnApiRunner when possible)

> Speed up Python DirectRunner execution by using the FnApiRunner when possible
> -----------------------------------------------------------------------------
>
>                 Key: BEAM-3644
>                 URL: https://issues.apache.org/jira/browse/BEAM-3644
>             Project: Beam
>          Issue Type: Improvement
>          Components: sdk-py-core
>    Affects Versions: 2.2.0, 2.3.0
>            Reporter: Charles Chen
>            Assignee: Charles Chen
>            Priority: Major
>
> Local execution of Beam pipelines on the Python DirectRunner currently 
> suffers from performance issues, which makes it hard for pipeline authors 
> to iterate, especially on medium to large datasets. We would like to 
> optimize this and make local execution a better experience for Beam users.
> In the past few months, Robert implemented the FnApiRunner as a way of 
> leveraging the portability framework execution code path for local execution. 
> We've found great speedups in batch execution, so we propose switching to 
> this runner for batch pipelines. For example, WordCount on the Shakespeare 
> dataset with a single CPU core now takes 50 seconds to run, compared to 12 
> minutes before, a roughly 15x performance improvement that users can get for 
> free, with no pipeline changes.
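
As a quick sanity check on the figures quoted above, the speedup can be
recomputed from the two timings. This is only an illustrative sketch: the
helper name speedup is hypothetical, and the inputs are the 12 minutes and
50 seconds stated in the description, not new measurements.

```python
def speedup(before_seconds, after_seconds):
    """Return how many times faster the new run is than the old one."""
    if after_seconds <= 0:
        raise ValueError("after_seconds must be positive")
    return before_seconds / after_seconds


# Timings taken from the issue description (assumed to be wall-clock time).
before = 12 * 60  # 12 minutes on the previous DirectRunner code path
after = 50        # 50 seconds when using the FnApiRunner

factor = speedup(before, after)
print(f"{factor:.1f}x")  # prints "14.4x"
```

The exact ratio is 14.4, which the description rounds to about 15x.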



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
