It's important to note that, as of today, running multiple streaming queries
reads the input data once per query. So there is a trade-off between the two
approaches: even though scenario 1 won't get much Catalyst optimization, it
may be more efficient overall in terms of resources.
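
As a rough illustration of the trade-off above, here is a toy model in plain
Python (no Spark involved). It only counts how many times the input is scanned,
and says nothing about Catalyst or actual runtime. The names CountingSource,
run_single_pass and run_one_query_per_rule are hypothetical, and the mapping
of "scenario 1" to a single pass that applies every rule per row, and of
"scenario 2" to one query per rule, is an assumption based on the text above.

```python
# Toy model only: plain Python, no Spark. It counts input scans, nothing more.

class CountingSource:
    """Hypothetical stand-in for a streaming input; tracks full passes."""

    def __init__(self, rows):
        self.rows = list(rows)
        self.reads = 0

    def scan(self):
        # Every call models one complete read of the input data.
        self.reads += 1
        return iter(self.rows)


def run_single_pass(source, rules):
    """Assumed 'scenario 1': one scan, every rule applied to each row."""
    hits = {name: 0 for name in rules}
    for row in source.scan():
        for name, rule in rules.items():
            if rule(row):
                hits[name] += 1
    return hits


def run_one_query_per_rule(source, rules):
    """Assumed 'scenario 2': one independent scan per rule."""
    return {
        name: sum(1 for row in source.scan() if rule(row))
        for name, rule in rules.items()
    }


rules = {"even": lambda x: x % 2 == 0, "big": lambda x: x > 7}

single = CountingSource(range(10))
multi = CountingSource(range(10))

single_hits = run_single_pass(single, rules)
multi_hits = run_one_query_per_rule(multi, rules)

# Same answers either way; only the number of input scans differs.
print(single.reads, multi.reads)
```

With N rules the second variant scans the input N times, which is the cost
that has to be weighed against whatever the optimizer gains per query.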
This is hard to say without testing. It depends on the type of computation,
among other things, and also on the Spark version. In general, vectorization /
SIMD could make scenario 2 much faster if Spark / the JVM applies it.
> On 9. Aug 2017, at 07:05, Raghavendra Pandey