Thanks for adding the RDD lineage graph. I can see 18 parallel tasks for the HDFS read; was that changed?
What is the Spark job configuration: how many executors, and how many cores per executor? I would keep the partitioning a multiple of (number of executors * cores per executor) for all the RDDs. If you have 3 executors with 3 cores each assigned to the job, 9 parallel tasks are possible, so set the repartitioning on the RDDs to a multiple of 9:

spark.read.parquet().repartition(27)
kafka.createDStream().repartition(27)

Note that coalesce with shuffle = false can actually cause problems with upstream parallelism, because it collapses the preceding stage into the smaller number of partitions instead of shuffling.

Please test the above scenario and share the findings.
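For concreteness, here is a minimal Scala sketch of the same idea. The input path, app name, and the 3 executors x 3 cores figures are assumptions for illustration, not taken from your job:

import org.apache.spark.sql.SparkSession

object RepartitionSketch {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("repartition-sketch") // hypothetical app name
      .getOrCreate()

    // Derive the slot count from the job config rather than hard-coding it;
    // the defaults here mirror the 3-executor x 3-core example above.
    val executors = spark.conf.get("spark.executor.instances", "3").toInt
    val cores     = spark.conf.get("spark.executor.cores", "3").toInt
    val slots     = executors * cores // 9 in the example

    // Repartition to a multiple of the slot count (3 * 9 = 27 here) so
    // every core gets work in each wave of tasks.
    val df = spark.read
      .parquet("/data/events.parquet") // placeholder path
      .repartition(slots * 3)

    println(s"partitions = ${df.rdd.getNumPartitions}")
    spark.stop()
  }
}

The same rule applies on the streaming side: repartition the DStream to a multiple of the slot count before the heavy transformations, not after.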