You cannot guarantee that each key will forever be on the same executor.
Designing an application around that assumption is flawed if you need to
ensure fault tolerance against executor failures.
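
If what you actually need is a stable key-to-partition mapping (rather than
pinning keys to executors), a custom partitioner gets you that. Here is a
minimal sketch, assuming a DStream of key/value pairs; the pair types and
the numPartitions value are illustrative:

    import org.apache.spark.HashPartitioner
    import org.apache.spark.streaming.dstream.DStream

    val numPartitions = 8 // illustrative; size for your workload
    val partitioner = new HashPartitioner(numPartitions)

    def routeByKey(stream: DStream[(String, String)]): DStream[(String, String)] =
      stream.transform { rdd =>
        // hash(key) % numPartitions is deterministic, so the same key lands
        // in the same partition in every batch, but Spark may still place
        // that partition on a different executor after a failure.
        rdd.partitionBy(partitioner)
      }

This gives you per-key locality within a batch and a consistent partition
index across batches, which is usually what the grouping is really after.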

On Thu, Jan 7, 2016 at 9:34 AM, Lin Zhao <l...@exabeam.com> wrote:

> I have a need to route the dstream through the streaming pipeline by some
> key, such that data with the same key always goes through the same
> executor.
>
> There doesn't seem to be a way to do manual routing with Spark Streaming.
> The closest I can come up with is:
>
> stream.foreachRDD {rdd =>
>   rdd.groupBy(_.key).flatMap { line => … }.map(…).map(…)
> }
>
> Does this do what I expect? How about between batches? Does it guarantee
> the same key goes to the same executor in all batches?
>
> Thanks,
>
> Lin
>
