Re: [Spark structured streaming] Use of (flat)mapgroupswithstate takes long time

2018-01-23 Thread Christiaan Ras
uary 2018 at 00:04
To: Christiaan Ras <christiaan@semmelwise.nl>
Cc: user <user@spark.apache.org>
Subject: Re: [Spark structured streaming] Use of (flat)mapgroupswithstate takes long time

For computing mapGroupsWithState, can you check the following.
- How many tasks are being l

Re: [Spark structured streaming] Use of (flat)mapgroupswithstate takes long time

2018-01-22 Thread Tathagata Das
For computing mapGroupsWithState, can you check the following.
- How many tasks are being launched in the reduce stage (that is, the stage after the shuffle, which computes mapGroupsWithState)?
- How long is each task taking?
- How many cores does the cluster have?

On Thu, Jan 18, 2018 at
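The three questions above combine into a rough back-of-the-envelope estimate: the reduce-stage tasks run in "waves" of at most one task per core, so many short tasks on few cores can add up. The sketch below is plain Python, not Spark API. The helper name `reduce_stage_estimate`, the core count, and the per-task time are hypothetical and not taken from the thread; 200 is Spark's default value of `spark.sql.shuffle.partitions`.

```python
import math

def reduce_stage_estimate(num_tasks, total_cores, seconds_per_task):
    """Rough lower bound on reduce-stage time, ignoring scheduling overhead and skew."""
    # Each core runs one task at a time, so tasks execute in waves.
    waves = math.ceil(num_tasks / total_cores)
    return waves, waves * seconds_per_task

# 200 tasks matches the default spark.sql.shuffle.partitions; the other two
# numbers are made-up illustration values.
waves, seconds = reduce_stage_estimate(num_tasks=200, total_cores=8, seconds_per_task=0.05)
print(f"{waves} waves, about {seconds:.2f} s per micro-batch")
```

If the number of tasks is much larger than the number of cores, the estimate suggests that lowering the shuffle partition count may be worth testing, though the real cause can only be confirmed from the Spark UI.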

[Spark structured streaming] Use of (flat)mapgroupswithstate takes long time

2018-01-18 Thread chris-sw
Hi, I recently did some experiments with stateful structured streaming using flatMapGroupsWithState. The streaming application is quite simple: it receives data from Kafka, feeds it to the stateful operator (flatMapGroupsWithState), and sinks the output to the console. During a test with small
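As a conceptual illustration of what such a stateful operator does per micro-batch, here is a minimal plain-Python simulation. It does not use Spark or Kafka, and it is not the poster's application: `run_micro_batch` and `running_count` are hypothetical names, and the running count stands in for whatever state the real job keeps.

```python
from collections import defaultdict

def run_micro_batch(records, state, update):
    """Group one micro-batch by key and call `update` once per key with its saved state."""
    groups = defaultdict(list)
    for key, value in records:
        groups[key].append(value)
    output = []
    for key, values in groups.items():
        # `update` returns the new state for the key plus zero or more output rows.
        new_state, emitted = update(key, values, state.get(key))
        state[key] = new_state
        output.extend(emitted)
    return output

def running_count(key, values, old_count):
    """Hypothetical update function: keep a running count of records per key."""
    count = (old_count or 0) + len(values)
    return count, [(key, count)]

state = {}  # survives across micro-batches, like the operator's state store
batch1 = run_micro_batch([("a", 1), ("b", 2), ("a", 3)], state, running_count)
batch2 = run_micro_batch([("a", 4)], state, running_count)
print(batch1, batch2)
```

In Spark the grouping step is a shuffle, and the per-key calls are spread over the tasks of the stage that follows it.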