[ https://issues.apache.org/jira/browse/BEAM-2208?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Ahmet Altay updated BEAM-2208:
------------------------------
    Summary: Python SDK wordcount on cloud Dataflow runner is slow  (was: Apache Beam Python SDK is atleast 5 times slower)

> Python SDK wordcount on cloud Dataflow runner is slow
> -----------------------------------------------------
>
>                 Key: BEAM-2208
>                 URL: https://issues.apache.org/jira/browse/BEAM-2208
>             Project: Beam
>          Issue Type: Improvement
>          Components: runner-dataflow, sdk-py
>    Affects Versions: 0.6.0
>            Reporter: Anant Bhandarkar
>            Assignee: Ahmet Altay
>            Priority: Critical
>
> I have been trying to run the Beam Word count example with a 2GB file.
> When I run the Java Example for word count of this csv file the job gets
> completed in 7.15secs Mins.
> Job ID
> 2017-04-18_23_57_02-2832613177376293063
> But word count example with same file using Python SDK takes 28 to 35mins
> 2017-04-20_04_48_27-8924552896141769408
> SDK version
> Apache Beam SDK for Python 0.6.0

--
This message was sent by Atlassian JIRA
(v6.3.15#6346)