Thanks for adding the RDD lineage graph.
I could see 18 parallel tasks for the HDFS read; was that changed?


What is the Spark job configuration, i.e. how many executors and how many
cores per executor?
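
If it's easier, you can read it off the running job roughly like this (just a
sketch; assumes `spark` is your SparkSession and static allocation, so these
keys are set):

  // print what the job was actually launched with
  // (under dynamic allocation spark.executor.instances may be unset)
  val conf = spark.sparkContext.getConf
  println(conf.get("spark.executor.instances", "not set"))
  println(conf.get("spark.executor.cores", "not set"))
  // typically the total core count across executors, i.e. the task slots
  println(spark.sparkContext.defaultParallelism)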

I would say keep the partitioning a multiple of (number of executors * cores
per executor) for all the RDDs.

If you have 3 executors with 3 cores each assigned to the job, 9 parallel
tasks are possible, so set the repartitioning on the RDDs to a multiple of 9,
e.g.:

spark.read.parquet("/path/to/input").repartition(27)
KafkaUtils.createDirectStream(...).repartition(27)
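
Putting it together, roughly something like this (just a sketch; the input
path and the factor of 3 are placeholders, and the config keys assume static
allocation):

  import org.apache.spark.sql.SparkSession

  val spark = SparkSession.builder().appName("repartition-example").getOrCreate()
  val conf = spark.sparkContext.getConf

  // with dynamic allocation these keys may be unset, hence the defaults
  val executors = conf.get("spark.executor.instances", "3").toInt
  val coresPerExecutor = conf.get("spark.executor.cores", "3").toInt
  val slots = executors * coresPerExecutor      // 3 * 3 = 9 concurrent tasks
  val numPartitions = slots * 3                 // arbitrary multiple, here 27

  val df = spark.read.parquet("/path/to/input").repartition(numPartitions)
  println(df.rdd.getNumPartitions)              // 27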

coalesce with shuffle = false will actually cause problems with upstream
parallelism: because it creates a narrow dependency, the upstream
transformations in the same stage can end up running with only the coalesced
number of tasks.
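
A quick way to see the difference (again just a sketch, using the same
`spark` as above; the 200-partition input is only illustrative):

  val rdd = spark.sparkContext.parallelize(1 to 1000000, 200)

  // narrow coalesce: the map may effectively run with only 9 tasks
  val narrow = rdd.map(_ * 2).coalesce(9, shuffle = false)

  // repartition (= coalesce with shuffle = true): the map keeps 200 tasks,
  // then a shuffle brings the result down to 9 partitions
  val shuffled = rdd.map(_ * 2).repartition(9)

  println(narrow.getNumPartitions)    // 9
  println(shuffled.getNumPartitions)  // 9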

Please test the above scenario and share your findings.


