There are known performance issues with Beam on Spark that are being worked on, e.g. https://issues.apache.org/jira/browse/BEAM-5036 . It's possible you're hitting something different, but it would be worth investigating. See also https://lists.apache.org/list.html?dev@beam.apache.org:lte=1M:Performance%20of%20write

On Tue, Sep 18, 2018 at 8:39 AM devinduan(段丁瑞) <devind...@tencent.com> wrote:

> Hi,
> I'm testing Beam on Spark.
> I use spark example code WordCount processing 1G data file, cost 1 minutes.
> However, I use Beam example code WordCount processing the same file, cost 30minutes.
> My Spark parameter is : --deploy-mode client --executor-memory 1g --num-executors 1 --driver-memory 1g
> My Spark version is 2.3.1, Beam version is 2.5
> Is there any optimization method?
> Thank you.