Hi,

Did you compare the stages in the Spark UI to identify which stage is taking the time?
Do you use spark-submit in both cases for the bootstrapping?

I will do a test here as well.

Regards
JB

On 19/09/2018 05:34, devinduan(段丁瑞) wrote:
> Hi,
> Thanks for your reply.
> Our team plans to use Beam instead of Spark, so I'm testing the
> performance of the Beam API.
> I'm coding some examples with the Spark API and the Beam API, like
> "WordCount", "Join", "OrderBy", "Union" ...
> I use the same resources and configuration to run these jobs.
> Tim said I should remove "withNumShards(1)" and
> set spark.default.parallelism=32. I did that and tried again, but the
> Beam job is still running very slowly.
> Here is my Beam code and Spark code:
> Beam "WordCount":
>
> Spark "WordCount":
>
> I will try the other examples later.
>
> Regards
> devin
>
>
> *From:* Jean-Baptiste Onofré <mailto:j...@nanthrax.net>
> *Date:* 2018-09-18 22:43
> *To:* dev@beam.apache.org <mailto:dev@beam.apache.org>
> *Subject:* Re: How to optimize the performance of Beam on
> Spark(Internet mail)
>
> Hi,
>
> The first huge difference is that the Spark runner still uses RDDs,
> whereas when using Spark directly you are using Datasets. A bunch of
> optimizations in Spark are related to Datasets.
>
> I started a large refactoring of the Spark runner to leverage Spark 2.x
> (and Datasets).
> It's not yet ready, as it includes other improvements (the portability
> layer with the Job API, a first check of the state API, ...).
>
> Anyway, by Spark wordcount, do you mean the one included in the Spark
> distribution?
>
> Regards
> JB
>
> On 18/09/2018 08:39, devinduan(段丁瑞) wrote:
> > Hi,
> > I'm testing Beam on Spark.
> > I used the Spark example code WordCount to process a 1G data file;
> > it took 1 minute.
> > However, when I used the Beam example code WordCount to process the
> > same file, it took 30 minutes.
> > My Spark parameters are: --deploy-mode client --executor-memory 1g
> > --num-executors 1 --driver-memory 1g
> > My Spark version is 2.3.1, Beam version is 2.5.
> > Is there any optimization method?
> > Thank you.
> >
>
> --
> Jean-Baptiste Onofré
> jbono...@apache.org
> http://blog.nanthrax.net
> Talend - http://www.talend.com
>

--
Jean-Baptiste Onofré
jbono...@apache.org
http://blog.nanthrax.net
Talend - http://www.talend.com