Re: using multiple dstreams together (spark streaming)

2015-09-28 Thread Archit Thakur
@TD: Doesn't transformWith need both of the DStreams to be of same
slideDuration.
[Spark Version: 1.3.1]



--
View this message in context: 
http://apache-spark-user-list.1001560.n3.nabble.com/using-multiple-dstreams-together-spark-streaming-tp9947p24839.html
Sent from the Apache Spark User List mailing list archive at Nabble.com.

-
To unsubscribe, e-mail: user-unsubscr...@spark.apache.org
For additional commands, e-mail: user-h...@spark.apache.org



Re: using multiple dstreams together (spark streaming)

2014-07-17 Thread Walrus theCat
Thanks!


On Wed, Jul 16, 2014 at 6:34 PM, Tathagata Das tathagata.das1...@gmail.com
wrote:

 Have you taken a look at DStream.transformWith( ... ) . That allows you
 apply arbitrary transformation between RDDs (of the same timestamp) of two
 different streams.

 So you can do something like this.

 2s-window-stream.transformWith(1s-window-stream, (rdd1: RDD[...], rdd2:
 RDD[...]) = {
  ...
   // return a new RDD
 })


 And streamingContext.transform() extends it to N DStreams. :)

 Hope this helps!

 TD




 On Wed, Jul 16, 2014 at 10:42 AM, Walrus theCat walrusthe...@gmail.com
 wrote:

 hey at least it's something (thanks!) ... not sure what i'm going to do
 if i can't find a solution (other than not use spark) as i really need
 these capabilities.  anyone got anything else?


 On Wed, Jul 16, 2014 at 10:34 AM, Luis Ángel Vicente Sánchez 
 langel.gro...@gmail.com wrote:

 hum... maybe consuming all streams at the same time with an actor that
 would act as a new DStream source... but this is just a random idea... I
 don't really know if that would be a good idea or even possible.


 2014-07-16 18:30 GMT+01:00 Walrus theCat walrusthe...@gmail.com:

 Yeah -- I tried the .union operation and it didn't work for that
 reason.  Surely there has to be a way to do this, as I imagine this is a
 commonly desired goal in streaming applications?


 On Wed, Jul 16, 2014 at 10:10 AM, Luis Ángel Vicente Sánchez 
 langel.gro...@gmail.com wrote:

 I'm joining several kafka dstreams using the join operation but you
 have the limitation that the duration of the batch has to be same,i.e. 1
 second window for all dstreams... so it would not work for you.


 2014-07-16 18:08 GMT+01:00 Walrus theCat walrusthe...@gmail.com:

 Hi,

 My application has multiple dstreams on the same inputstream:

 dstream1 // 1 second window
 dstream2 // 2 second window
 dstream3 // 5 minute window


 I want to write logic that deals with all three windows (e.g. when
 the 1 second window differs from the 2 second window by some delta ...)

 I've found some examples online (there's not much out there!), and I
 can only see people transforming a single dstream.  In conventional 
 spark,
 we'd do this sort of thing with a cartesian on RDDs.

 How can I deal with multiple Dstreams at once?

 Thanks









using multiple dstreams together (spark streaming)

2014-07-16 Thread Walrus theCat
Hi,

My application has multiple dstreams on the same inputstream:

dstream1 // 1 second window
dstream2 // 2 second window
dstream3 // 5 minute window


I want to write logic that deals with all three windows (e.g. when the 1
second window differs from the 2 second window by some delta ...)

I've found some examples online (there's not much out there!), and I can
only see people transforming a single dstream.  In conventional spark, we'd
do this sort of thing with a cartesian on RDDs.

How can I deal with multiple Dstreams at once?

Thanks


Re: using multiple dstreams together (spark streaming)

2014-07-16 Thread Luis Ángel Vicente Sánchez
I'm joining several kafka dstreams using the join operation but you have
the limitation that the duration of the batch has to be same,i.e. 1 second
window for all dstreams... so it would not work for you.


2014-07-16 18:08 GMT+01:00 Walrus theCat walrusthe...@gmail.com:

 Hi,

 My application has multiple dstreams on the same inputstream:

 dstream1 // 1 second window
 dstream2 // 2 second window
 dstream3 // 5 minute window


 I want to write logic that deals with all three windows (e.g. when the 1
 second window differs from the 2 second window by some delta ...)

 I've found some examples online (there's not much out there!), and I can
 only see people transforming a single dstream.  In conventional spark, we'd
 do this sort of thing with a cartesian on RDDs.

 How can I deal with multiple Dstreams at once?

 Thanks



Re: using multiple dstreams together (spark streaming)

2014-07-16 Thread Walrus theCat
Yeah -- I tried the .union operation and it didn't work for that reason.
Surely there has to be a way to do this, as I imagine this is a commonly
desired goal in streaming applications?


On Wed, Jul 16, 2014 at 10:10 AM, Luis Ángel Vicente Sánchez 
langel.gro...@gmail.com wrote:

 I'm joining several kafka dstreams using the join operation but you have
 the limitation that the duration of the batch has to be same,i.e. 1 second
 window for all dstreams... so it would not work for you.


 2014-07-16 18:08 GMT+01:00 Walrus theCat walrusthe...@gmail.com:

 Hi,

 My application has multiple dstreams on the same inputstream:

 dstream1 // 1 second window
 dstream2 // 2 second window
 dstream3 // 5 minute window


 I want to write logic that deals with all three windows (e.g. when the 1
 second window differs from the 2 second window by some delta ...)

 I've found some examples online (there's not much out there!), and I can
 only see people transforming a single dstream.  In conventional spark, we'd
 do this sort of thing with a cartesian on RDDs.

 How can I deal with multiple Dstreams at once?

 Thanks





Re: using multiple dstreams together (spark streaming)

2014-07-16 Thread Luis Ángel Vicente Sánchez
hum... maybe consuming all streams at the same time with an actor that
would act as a new DStream source... but this is just a random idea... I
don't really know if that would be a good idea or even possible.


2014-07-16 18:30 GMT+01:00 Walrus theCat walrusthe...@gmail.com:

 Yeah -- I tried the .union operation and it didn't work for that reason.
 Surely there has to be a way to do this, as I imagine this is a commonly
 desired goal in streaming applications?


 On Wed, Jul 16, 2014 at 10:10 AM, Luis Ángel Vicente Sánchez 
 langel.gro...@gmail.com wrote:

 I'm joining several kafka dstreams using the join operation but you have
 the limitation that the duration of the batch has to be same,i.e. 1 second
 window for all dstreams... so it would not work for you.


 2014-07-16 18:08 GMT+01:00 Walrus theCat walrusthe...@gmail.com:

 Hi,

 My application has multiple dstreams on the same inputstream:

 dstream1 // 1 second window
 dstream2 // 2 second window
 dstream3 // 5 minute window


 I want to write logic that deals with all three windows (e.g. when the 1
 second window differs from the 2 second window by some delta ...)

 I've found some examples online (there's not much out there!), and I can
 only see people transforming a single dstream.  In conventional spark, we'd
 do this sort of thing with a cartesian on RDDs.

 How can I deal with multiple Dstreams at once?

 Thanks






Re: using multiple dstreams together (spark streaming)

2014-07-16 Thread Walrus theCat
Or, if not, is there a way to do this in terms of a single dstream?  Keep
in mind that dstream1, dstream2, and dstream3 have already had
transformations applied.  I tried creating the dstreams by calling .window
on the first one, but that ends up with me having ... 3 dstreams... which
is the same problem.


On Wed, Jul 16, 2014 at 10:30 AM, Walrus theCat walrusthe...@gmail.com
wrote:

 Yeah -- I tried the .union operation and it didn't work for that reason.
 Surely there has to be a way to do this, as I imagine this is a commonly
 desired goal in streaming applications?


 On Wed, Jul 16, 2014 at 10:10 AM, Luis Ángel Vicente Sánchez 
 langel.gro...@gmail.com wrote:

 I'm joining several kafka dstreams using the join operation but you have
 the limitation that the duration of the batch has to be same,i.e. 1 second
 window for all dstreams... so it would not work for you.


 2014-07-16 18:08 GMT+01:00 Walrus theCat walrusthe...@gmail.com:

 Hi,

 My application has multiple dstreams on the same inputstream:

 dstream1 // 1 second window
 dstream2 // 2 second window
 dstream3 // 5 minute window


 I want to write logic that deals with all three windows (e.g. when the 1
 second window differs from the 2 second window by some delta ...)

 I've found some examples online (there's not much out there!), and I can
 only see people transforming a single dstream.  In conventional spark, we'd
 do this sort of thing with a cartesian on RDDs.

 How can I deal with multiple Dstreams at once?

 Thanks






Re: using multiple dstreams together (spark streaming)

2014-07-16 Thread Walrus theCat
hey at least it's something (thanks!) ... not sure what i'm going to do if
i can't find a solution (other than not use spark) as i really need these
capabilities.  anyone got anything else?


On Wed, Jul 16, 2014 at 10:34 AM, Luis Ángel Vicente Sánchez 
langel.gro...@gmail.com wrote:

 hum... maybe consuming all streams at the same time with an actor that
 would act as a new DStream source... but this is just a random idea... I
 don't really know if that would be a good idea or even possible.


 2014-07-16 18:30 GMT+01:00 Walrus theCat walrusthe...@gmail.com:

 Yeah -- I tried the .union operation and it didn't work for that reason.
 Surely there has to be a way to do this, as I imagine this is a commonly
 desired goal in streaming applications?


 On Wed, Jul 16, 2014 at 10:10 AM, Luis Ángel Vicente Sánchez 
 langel.gro...@gmail.com wrote:

 I'm joining several kafka dstreams using the join operation but you have
 the limitation that the duration of the batch has to be same,i.e. 1 second
 window for all dstreams... so it would not work for you.


 2014-07-16 18:08 GMT+01:00 Walrus theCat walrusthe...@gmail.com:

 Hi,

 My application has multiple dstreams on the same inputstream:

 dstream1 // 1 second window
 dstream2 // 2 second window
 dstream3 // 5 minute window


 I want to write logic that deals with all three windows (e.g. when the
 1 second window differs from the 2 second window by some delta ...)

 I've found some examples online (there's not much out there!), and I
 can only see people transforming a single dstream.  In conventional spark,
 we'd do this sort of thing with a cartesian on RDDs.

 How can I deal with multiple Dstreams at once?

 Thanks







Re: using multiple dstreams together (spark streaming)

2014-07-16 Thread Tathagata Das
Have you taken a look at DStream.transformWith( ... ) . That allows you
apply arbitrary transformation between RDDs (of the same timestamp) of two
different streams.

So you can do something like this.

2s-window-stream.transformWith(1s-window-stream, (rdd1: RDD[...], rdd2:
RDD[...]) = {
 ...
  // return a new RDD
})


And streamingContext.transform() extends it to N DStreams. :)

Hope this helps!

TD




On Wed, Jul 16, 2014 at 10:42 AM, Walrus theCat walrusthe...@gmail.com
wrote:

 hey at least it's something (thanks!) ... not sure what i'm going to do if
 i can't find a solution (other than not use spark) as i really need these
 capabilities.  anyone got anything else?


 On Wed, Jul 16, 2014 at 10:34 AM, Luis Ángel Vicente Sánchez 
 langel.gro...@gmail.com wrote:

 hum... maybe consuming all streams at the same time with an actor that
 would act as a new DStream source... but this is just a random idea... I
 don't really know if that would be a good idea or even possible.


 2014-07-16 18:30 GMT+01:00 Walrus theCat walrusthe...@gmail.com:

 Yeah -- I tried the .union operation and it didn't work for that reason.
 Surely there has to be a way to do this, as I imagine this is a commonly
 desired goal in streaming applications?


 On Wed, Jul 16, 2014 at 10:10 AM, Luis Ángel Vicente Sánchez 
 langel.gro...@gmail.com wrote:

 I'm joining several kafka dstreams using the join operation but you
 have the limitation that the duration of the batch has to be same,i.e. 1
 second window for all dstreams... so it would not work for you.


 2014-07-16 18:08 GMT+01:00 Walrus theCat walrusthe...@gmail.com:

 Hi,

 My application has multiple dstreams on the same inputstream:

 dstream1 // 1 second window
 dstream2 // 2 second window
 dstream3 // 5 minute window


 I want to write logic that deals with all three windows (e.g. when the
 1 second window differs from the 2 second window by some delta ...)

 I've found some examples online (there's not much out there!), and I
 can only see people transforming a single dstream.  In conventional spark,
 we'd do this sort of thing with a cartesian on RDDs.

 How can I deal with multiple Dstreams at once?

 Thanks