[ https://issues.apache.org/jira/browse/BEAM-3644?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Charles Chen updated BEAM-3644:
-------------------------------
    Summary: Speed up Python DirectRunner execution by using the FnApiRunner when possible  (was: Speeding up Python DirectRunner execution by using the FnApiRunner when possible)

> Speed up Python DirectRunner execution by using the FnApiRunner when possible
> -----------------------------------------------------------------------------
>
>                 Key: BEAM-3644
>                 URL: https://issues.apache.org/jira/browse/BEAM-3644
>             Project: Beam
>          Issue Type: Improvement
>          Components: sdk-py-core
>    Affects Versions: 2.2.0, 2.3.0
>            Reporter: Charles Chen
>            Assignee: Charles Chen
>            Priority: Major
>
> Local execution of Beam pipelines on the Python DirectRunner currently
> suffers from performance issues, which make it hard for pipeline authors
> to iterate, especially on medium to large datasets. We would like to
> optimize local execution and make it a better experience for Beam users.
> In the past few months, Robert implemented the FnApiRunner as a way of
> leveraging the portability framework's execution code path for local
> execution. We've found great speedups in batch execution, so we propose
> switching to this runner for batch pipelines. For example, WordCount on the
> Shakespeare dataset with a single CPU core now takes 50 seconds to run,
> compared to 12 minutes before, a roughly 15x performance improvement that
> users can get for free, with no pipeline changes.

--
This message was sent by Atlassian JIRA
(v7.6.3#76005)