Hi,

We intend to run adhoc windowed continuous queries on spark streaming data.
The queries could be registered/deregistered dynamically or can be
submitted through command line. Currently Spark streaming doesn’t allow
adding any new inputs, transformations, and output operations after
starting a StreamingContext. But doing following code changes in
DStream.scala allows me to create an window on DStream even after
StreamingContext has started (in StreamingContextState.ACTIVE).

1) In DStream.validateAtInit()
Allowed adding new inputs, transformations, and output operations after
starting a streaming context
2) In DStream.persist()
Allowed to change storage level of an DStream after streaming context has
started

Ultimately the window api just does slice on the parentRDD and returns
allRDDsInWindow.
We create DataFrames out of these RDDs from this particular
WindowedDStream, and evaluate queries on those DataFrames.

1) Do you see any challenges and consequences with this approach ?
2) Will these on the fly created WindowedDStreams be accounted properly in
Runtime and memory management?
3) What is the reason we do not allow creating new windows with
StreamingContextState.ACTIVE state?
4) Does it make sense to add our own implementation of WindowedDStream in
this case?

- Yogesh

Reply via email to