bq. streamingContext.remember("duration") did not help Can you give a bit more detail on the above ? Did you mean the job encountered OOME later on ?
Which Spark release are you using?

Cheers

On Wed, Feb 17, 2016 at 6:03 PM, ramach1776 <ram...@s1776.com> wrote:

> We have a streaming application containing approximately 12 jobs every
> batch, running in streaming mode (4 sec batches). Each job has several
> transformations and 1 action (output to Cassandra) which triggers
> execution of the job (DAG).
>
> For example, the first job:
>
> /job 1
> ---> receive Stream A --> map --> filter --> (union with another stream B)
> --> map -->/ groupByKey --> transform --> reduceByKey --> map
>
> Likewise we go through a few more transforms and save to the database
> (job 2, job 3, ...).
>
> Recently we added a new transformation further downstream wherein we
> union the output of the DStream from job 1 (in italics) with the output
> from a new transformation (job 5). It appears the whole execution up to
> that point is repeated, which is redundant (I can see this in the
> execution graph and also in performance -> processing time).
>
> That is, with this additional transformation (union with a stream
> processed upstream) each batch runs as much as 2.5 times slower compared
> to runs without the union. If I cache the DStream from job 1 (italics),
> performance improves substantially, but we hit out-of-memory errors
> within a few hours.
>
> What is the recommended way to cache/unpersist in such a scenario? There
> is no DStream-level "unpersist". Setting "spark.streaming.unpersist" to
> true and streamingContext.remember("duration") did not help.
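For what it's worth, here is a rough sketch of persisting the shared job-1 DStream so the downstream union reuses its RDDs instead of recomputing the whole lineage each batch. This assumes the Scala DStream API; the sources, the parsing, the job-5 stage, and the print() calls (standing in for your Cassandra writes) are all invented for illustration:

    import org.apache.spark.SparkConf
    import org.apache.spark.storage.StorageLevel
    import org.apache.spark.streaming.{Seconds, StreamingContext}

    object UnionCacheSketch {
      def main(args: Array[String]): Unit = {
        val ssc = new StreamingContext(
          new SparkConf().setAppName("union-cache-sketch"), Seconds(4))

        // Hypothetical sources standing in for Stream A and Stream B.
        val streamA = ssc.socketTextStream("hostA", 9999)
        val streamB = ssc.socketTextStream("hostB", 9999)

        // Job 1's lineage as described: map -> filter -> union -> map ->
        // groupByKey -> transform -> reduceByKey -> map (parsing is made up).
        val job1Output = streamA
          .map(line => (line.split(",")(0), 1))
          .filter(_._1.nonEmpty)
          .union(streamB.map(line => (line, 1)))
          .map(identity)
          .groupByKey()
          .transform(rdd => rdd.mapValues(_.sum))
          .reduceByKey(_ + _)
          .map { case (k, v) => (k, v) } // stands in for the final projection

        // Persist the shared stage once. A serialized, disk-spilling level
        // has a smaller memory footprint than the MEMORY_ONLY_SER that plain
        // cache() uses for DStreams, which may help avoid the OOME you hit.
        job1Output.persist(StorageLevel.MEMORY_AND_DISK_SER)

        job1Output.print() // stand-in for the job-1 Cassandra write

        // Hypothetical job-5 stage that unions with the persisted stream;
        // with persist() above, this reuses job 1's RDDs per batch instead
        // of re-running the lineage.
        val job5 = job1Output.map { case (k, v) => (k, v * 2) }
        job1Output.union(job5).print() // stand-in for the downstream write

        ssc.start()
        ssc.awaitTermination()
      }
    }

You are right that there is no public DStream-level unpersist. With spark.streaming.unpersist left at its default of true and a modest remember() window, Spark should release the persisted RDDs once no pending batch still references them.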