Have you taken a look at DStream.transformWith(...)? It allows you to apply an arbitrary transformation between the RDDs (of the same batch timestamp) of two different streams.
So you can do something like this:

    2s-window-stream.transformWith(1s-window-stream,
      (rdd1: RDD[...], rdd2: RDD[...]) => {
        ... // return a new RDD
      })

And StreamingContext.transform() extends it to N DStreams. :)

Hope this helps!

TD

On Wed, Jul 16, 2014 at 10:42 AM, Walrus theCat <walrusthe...@gmail.com> wrote:

> hey at least it's something (thanks!) ... not sure what i'm going to do if
> i can't find a solution (other than not use spark) as i really need these
> capabilities. anyone got anything else?
>
> On Wed, Jul 16, 2014 at 10:34 AM, Luis Ángel Vicente Sánchez
> <langel.gro...@gmail.com> wrote:
>
>> hum... maybe consuming all streams at the same time with an actor that
>> would act as a new DStream source... but this is just a random idea... I
>> don't really know if that would be a good idea or even possible.
>>
>> 2014-07-16 18:30 GMT+01:00 Walrus theCat <walrusthe...@gmail.com>:
>>
>>> Yeah -- I tried the .union operation and it didn't work for that reason.
>>> Surely there has to be a way to do this, as I imagine this is a commonly
>>> desired goal in streaming applications?
>>>
>>> On Wed, Jul 16, 2014 at 10:10 AM, Luis Ángel Vicente Sánchez
>>> <langel.gro...@gmail.com> wrote:
>>>
>>>> I'm joining several Kafka DStreams using the join operation, but you
>>>> have the limitation that the batch duration has to be the same, i.e. a
>>>> 1-second window for all DStreams... so it would not work for you.
>>>>
>>>> 2014-07-16 18:08 GMT+01:00 Walrus theCat <walrusthe...@gmail.com>:
>>>>
>>>>> Hi,
>>>>>
>>>>> My application has multiple DStreams on the same input stream:
>>>>>
>>>>> dstream1 // 1 second window
>>>>> dstream2 // 2 second window
>>>>> dstream3 // 5 minute window
>>>>>
>>>>> I want to write logic that deals with all three windows (e.g. when the
>>>>> 1-second window differs from the 2-second window by some delta ...)
>>>>>
>>>>> I've found some examples online (there's not much out there!), and I
>>>>> can only see people transforming a single DStream. In conventional Spark,
>>>>> we'd do this sort of thing with a cartesian on RDDs.
>>>>>
>>>>> How can I deal with multiple DStreams at once?
>>>>>
>>>>> Thanks
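The transformWith approach above can be sketched roughly as follows. This is a minimal illustration, not code from the thread: the app name, socket source, key type, and the per-key delta logic are all assumptions made up for the example.

```scala
import org.apache.spark.SparkConf
import org.apache.spark.rdd.RDD
import org.apache.spark.streaming.{Minutes, Seconds, StreamingContext, Time}

val conf = new SparkConf().setAppName("MultiWindowSketch") // hypothetical app name
val ssc  = new StreamingContext(conf, Seconds(1))          // 1-second batch interval

// One input stream, three windowed views of it (source is illustrative).
val events  = ssc.socketTextStream("localhost", 9999).map(word => (word, 1L))
val oneSec  = events.window(Seconds(1))
val twoSec  = events.window(Seconds(2))
val fiveMin = events.window(Minutes(5))

// transformWith pairs up the two streams' RDDs of the same batch timestamp,
// so we can compute, say, a per-key count delta between the two windows.
val delta = oneSec.transformWith(twoSec,
  (rdd1: RDD[(String, Long)], rdd2: RDD[(String, Long)]) =>
    rdd1.cogroup(rdd2).mapValues { case (a, b) => a.sum - b.sum })

// StreamingContext.transform generalizes this to N DStreams; the Seq of
// untyped RDD[_]s has to be cast back to the known element type.
val all = ssc.transform(Seq(oneSec, twoSec, fiveMin),
  (rdds: Seq[RDD[_]], time: Time) => {
    val Seq(r1, r2, r5) = rdds.map(_.asInstanceOf[RDD[(String, Long)]])
    r1.union(r2).union(r5) // e.g. pool all three windows' pairs
  })

delta.print()
ssc.start()
ssc.awaitTermination()
```

Note that all window durations must be multiples of the batch interval (here 1 second), which holds for the 1-second, 2-second, and 5-minute windows in the original question.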