Re: spark application running in yarn client mode is slower than in local mode.

2018-04-10 Thread Junfeng Chen
But I still have one question. I see that the number of tasks in the stage is 3. Where does this 3 come from, and how can I increase the parallelism? Regards, Junfeng Chen On Tue, Apr 10, 2018 at 11:31 AM, Junfeng Chen wrote: > Yeah, I have increased the number of executors and executor cores, and it

Re: spark application running in yarn client mode is slower than in local mode.

2018-04-09 Thread Junfeng Chen
Yeah, I have increased the number of executors and executor cores, and it runs normally now. The HDP Spark 2 has only 2 executors and 1 executor core by default. Regards, Junfeng Chen On Tue, Apr 10, 2018 at 10:19 AM, Saisai Shao wrote: > In yarn mode, only two executors

Re: spark application running in yarn client mode is slower than in local mode.

2018-04-09 Thread Saisai Shao
> > In yarn mode, only two executors are assigned to process the tasks, and since > one executor can process only one task, they need 6 min in total. > This is not true. You should set --executor-cores/--num-executors to increase the task parallelism of the executors. To be fair, a Spark application should
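The arithmetic behind this exchange can be sketched in plain Java, with no Spark dependency. This is only a rough model, not Spark's scheduler: it assumes every task takes the same time and that each task needs one core (the default spark.task.cpus=1), so the number of tasks that can run at once is --num-executors times --executor-cores. The class and method names (StageTimeSketch, slots, stageMinutes) are hypothetical. The 3 tasks, 3 min per task, and 2 executors with 1 core each come from the thread; the 2 cores per executor in the second call is an illustrative value.

```java
public class StageTimeSketch {
    // Task slots available to the application: one slot per executor core
    // (assumes the default of one core per task).
    static int slots(int numExecutors, int executorCores) {
        return numExecutors * executorCores;
    }

    // Rough stage duration when all tasks take equally long:
    // tasks run in "waves" of at most `slots` tasks each.
    static int stageMinutes(int tasks, int slots, int minutesPerTask) {
        int waves = (tasks + slots - 1) / slots; // ceiling division
        return waves * minutesPerTask;
    }

    public static void main(String[] args) {
        int tasks = 3;          // tasks per stage, as reported in the thread
        int minutesPerTask = 3; // per-task cost used in the thread's example

        // 2 executors x 1 core: two waves
        System.out.println(stageMinutes(tasks, slots(2, 1), minutesPerTask));
        // 2 executors x 2 cores (illustrative): one wave
        System.out.println(stageMinutes(tasks, slots(2, 2), minutesPerTask));
    }
}
```

Under these assumptions the first call gives the 6 min described for yarn mode, and raising the cores per executor brings the estimate back to the 3 min seen in local mode.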

Re: spark application running in yarn client mode is slower than in local mode.

2018-04-09 Thread Junfeng Chen
I found the potential reason. In local mode, all tasks in one stage run concurrently, while tasks in yarn mode run in sequence. For example, suppose each task in a stage takes 3 min. In local mode, they run together and take 3 min in total. In yarn mode, only two executors are assigned to

Re: spark application running in yarn client mode is slower than in local mode.

2018-04-09 Thread Junfeng Chen
Hi Jörn, I checked the log of my application: ResultStage3 (Parquet writing) takes a very long time, nearly 300 s, while the total processing time of this loop is 6 min. Regards, Junfeng Chen On Mon, Apr 9, 2018 at 2:12 PM, Jörn Franke wrote: > Probably network /

Re: spark application running in yarn client mode is slower than in local mode.

2018-04-09 Thread Junfeng Chen
Hi, my Kafka topic has three partitions. By the time cost I mean that each streaming loop takes longer in yarn client mode. For example, yarn mode takes 300 seconds to process some data, while local mode takes just 200 seconds to process a similar amount of data. Regards, Junfeng Chen On

Re: spark application running in yarn client mode is slower than in local mode.

2018-04-09 Thread Junfeng Chen
I read JSON string values from Kafka, then transform them into a DataFrame: Dataset<Row> df = spark.read().json(stringjavaRDD); Then I add some new data to each row: > JavaRDD<Row> rowJavaRDD = df.javaRDD().map(...) > StructType type = df.schema().add() > Dataset<Row> newdf = spark.createDataFrame(rowJavaRDD,type);

Re: spark application running in yarn client mode is slower than in local mode.

2018-04-09 Thread Gopala Krishna Manchukonda
Hi Junfeng, Is your Kafka topic partitioned? Are you referring to the duration or the CPU time spent by the job as being 20% - 50% higher than when running in local mode? Thanks & Regards Gopal > On 09-Apr-2018, at 11:42 AM, Jörn Franke wrote: > > Probably network /

Re: spark application running in yarn client mode is slower than in local mode.

2018-04-09 Thread Jörn Franke
Probably network / shuffling cost? Or broadcast variables? Can you provide more details on what you do and some timings? > On 9. Apr 2018, at 07:07, Junfeng Chen wrote: > > I have written a Spark Streaming application that reads Kafka data and converts > the JSON data to Parquet

spark application running in yarn client mode is slower than in local mode.

2018-04-08 Thread Junfeng Chen
I have written a Spark Streaming application that reads Kafka data, converts the JSON data to Parquet, and saves it to HDFS. What puzzles me is that the app takes 20% to 50% more processing time in yarn mode than in local mode. My cluster has three nodes with three node managers, and all three