combineByKey at ShuffledDStream.scala

Bill Jay Tue, 22 Jul 2014 11:06:41 -0700

Hi all,

I am currently running a Spark Streaming program that consumes data from
Kafka and performs a group-by operation on the data. I am trying to reduce
the program's running time because it seems slow to me. The stage
named:


* combineByKey at ShuffledDStream.scala:42 *

always takes the longest to run. When I open this stage, I see only
two executors working on it. Does anyone have an idea what this stage does
and how to speed it up? Thanks!

Bill
