Something here on the Beam side is clearly linear in the input size, as if
there's a bottleneck where we're not able to get any parallelization. Is
the Spark variant running in parallel?
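
To make the scaling concrete, here is a rough sketch that computes how much the WordCount runtime grows each time the input doubles, using the timings from the quoted table below. The names (WORDCOUNT_SECONDS, doubling_ratios, RATIOS) are made up for illustration, and reading a ratio near 2 as "linear" and a ratio near 1 as "mostly fixed overhead" is only a rule of thumb, not a measurement of parallelism.

```python
# Timings copied from the WordCount table in the quoted message, in seconds.
# Input sizes double at each step: 300MB -> 600MB -> 1.2G.
WORDCOUNT_SECONDS = {
    "Spark": [68, 71, 78],                  # 1min8s, 1min11s, 1min18s
    "Beam": [6.4 * 60, 11 * 60, 22 * 60],   # 6.4min, 11min, 22min
}


def doubling_ratios(times):
    """Return the runtime growth factor for each doubling of input size."""
    return [later / earlier for earlier, later in zip(times, times[1:])]


# Hypothetical helper mapping: engine name -> list of growth factors.
RATIOS = {engine: doubling_ratios(t) for engine, t in WORDCOUNT_SECONDS.items()}

if __name__ == "__main__":
    for engine, ratios in RATIOS.items():
        print(engine, [round(r, 2) for r in ratios])
```

With these numbers the Beam ratios come out at roughly 1.7 and 2.0, while the Spark ratios stay close to 1, which is what prompts the question about parallelism.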

On Fri, Sep 28, 2018 at 4:57 AM devinduan(段丁瑞) <devind...@tencent.com>
wrote:

> Hi
>     I have completed my test.
> 1. Spark parameter :
> deploy-mode client
> executor-memory 1g
> num-executors 1
> driver-memory 1g
>
> WordCount:
>
>         300MB     600MB     1.2G
> Spark   1min8s    1min11s   1min18s
> Beam    6.4min    11min     22min
>
> Filter:
>
>         300MB     600MB     1.2G
> Spark   1.2min    1.7min    2.8min
> Beam    2.7min    4.1min    5.7min
>
> GroupbyKey + sum
>
>         300MB                  600MB     1.2G
> Spark   3.6min
> Beam    Failed, executor oom
>
> Union
>
>         300MB     600MB     1.2G
> Spark   1.7min    2.6min    5.1min
> Beam    3.6min    6.2min    11min
>
> 2. Spark parameter :
> deploy-mode client
> executor-memory 1g
> driver-memory 1g
> spark.dynamicAllocation.enabled true
>
