If you run the collect step (the foreach in 1, possibly the reduce in 2) from
two separate threads in the driver, both will be executed in parallel.
Whichever is submitted to Spark first is executed first. You can use a
semaphore if you need to enforce the order of execution, though I would
assume that the ordering doesn't matter here.
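
As a rough illustration of the semaphore idea only: the sketch below uses
plain Python threads, not Spark code. The names run_in_order, function_a and
function_b are hypothetical stand-ins for whatever driver code submits the two
actions, and appending to a list stands in for submitting a job.

```python
import threading


def run_in_order():
    """Run two 'driver threads' and force A to be submitted before B."""
    order = []  # records the order in which the stand-in jobs were submitted
    # Starts at 0, so anyone calling acquire() blocks until release() is called.
    a_submitted = threading.Semaphore(0)

    def function_a():
        # Assumption: stand-in for submitting the foreach on stream Z.
        order.append("A: foreach on Z")
        a_submitted.release()  # signal that A has been submitted

    def function_b():
        a_submitted.acquire()  # wait until A has been submitted
        # Assumption: stand-in for submitting the reduce on stream X.
        order.append("B: reduce on X")

    # B is started first on purpose: the semaphore, not the start order,
    # decides which submission happens first.
    thread_b = threading.Thread(target=function_b)
    thread_a = threading.Thread(target=function_a)
    thread_b.start()
    thread_a.start()
    thread_a.join()
    thread_b.join()
    return order


if __name__ == "__main__":
    print(run_in_order())
```

Without the semaphore the two threads would simply race, which is fine
whenever the order of the two outputs does not matter.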

-------
Regards,
Andy

On Sat, Oct 24, 2015 at 10:08 PM, Nipun Arora <nipunarora2...@gmail.com>
wrote:

> I wanted to understand something about the internals of spark streaming
> executions.
>
> If I have a stream X, and in my program I send stream X to function A and
> function B:
>
> 1. In function A, I do a few transform/filter operations etc. on X->Y->Z
> to create stream Z. Now I do a forEach Operation on Z and print the output
> to a file.
>
> 2. Then in function B, I reduce stream X -> X2 (say min value of each
> RDD), and print the output to file
>
> Are both functions being executed for each RDD in parallel? How does it
> work?
>
> Thanks
> Nipun
>
>
