Something here on the Beam side is clearly linear in the input size, as if there's a bottleneck where we're not able to get any parallelization. Is the Spark variant running in parallel?
On Fri, Sep 28, 2018 at 4:57 AM devinduan(段丁瑞) <devind...@tencent.com> wrote:
> Hi
> I have completed my test.
> 1. Spark parameters:
>     deploy-mode client
>     executor-memory 1g
>     num-executors 1
>     driver-memory 1g
>
> WordCount:
>             300MB     600MB     1.2G
>   Spark     1min8s    1min11s   1min18s
>   Beam      6.4min    11min     22min
>
> Filter:
>             300MB     600MB     1.2G
>   Spark     1.2min    1.7min    2.8min
>   Beam      2.7min    4.1min    5.7min
>
> GroupByKey + sum:
>             300MB     600MB     1.2G
>   Spark     3.6min    -         -
>   Beam      Failed, executor OOM
>
> Union:
>             300MB     600MB     1.2G
>   Spark     1.7min    2.6min    5.1min
>   Beam      3.6min    6.2min    11min
>
> 2. Spark parameters:
>     deploy-mode client
>     executor-memory 1g
>     driver-memory 1g
>     spark.dynamicAllocation.enabled true
>
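For reference, the first parameter set quoted above would typically be passed to spark-submit roughly like this (a sketch only; the jar path and main class are hypothetical placeholders, not taken from the thread):

```shell
# Sketch of a spark-submit invocation matching the quoted parameters.
# The --class value and jar name are hypothetical placeholders.
spark-submit \
  --deploy-mode client \
  --num-executors 1 \
  --executor-memory 1g \
  --driver-memory 1g \
  --class com.example.WordCount \
  wordcount-benchmark.jar
```

Note that `--num-executors 1` pins the job to a single executor, so tasks can only run in parallel up to that one executor's core count, which is relevant to the parallelism question above.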