15, 2015 8:38 AM
To: 'Ilove Data'; 'Tathagata Das'
Cc: 'Akhil Das'; 'user'
Subject: RE: Join between DStream and Periodically-Changing-RDD
Then go for the second option I suggested - simply turn (keep turning) your
HDFS file (Batch RDD) into a stream of messages (outside Spark Streaming) and
then join the two DStreams.
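A minimal sketch of that option (the directory, socket source and comma-separated
key layout are illustrative assumptions, not from this thread): keep re-publishing
the HDFS data as new files into a monitored directory, read it back with
textFileStream, and join the two DStreams micro-batch by micro-batch.

    import org.apache.spark.SparkConf
    import org.apache.spark.streaming.{Seconds, StreamingContext}

    val conf = new SparkConf().setAppName("dstream-join-sketch")
    val ssc  = new StreamingContext(conf, Seconds(10))

    // Live event stream, keyed by its first comma-separated field.
    val events = ssc.socketTextStream("localhost", 9999)
      .map(line => (line.split(",")(0), line))

    // The former batch RDD, kept flowing as a stream: each time the HDFS data
    // is re-published as new files into this directory, textFileStream sees it.
    val reference = ssc.textFileStream("hdfs:///data/reference/")
      .map(line => (line.split(",")(0), line))

    // Join the two DStreams; note the join happens per micro-batch.
    events.join(reference).print()

    ssc.start()
    ssc.awaitTermination()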
Subject: Re: Join between DStream and Periodically-Changing-RDD
@Akhil Das
Joining two DStreams might not be an option, since I want to join the stream
with historical data in an HDFS folder.
@Tathagata Das @Evo Eftimov
The batch RDD to be reloaded is considerably large compared to the DStream
data, since it is historical data accumulated in HDFS.
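One common way to avoid re-reading a large file on every batch (a minimal
sketch, reusing kvDstream and ssc from Akhil's snippet below; the path, key
layout and refresh interval are illustrative assumptions): reload the historical
RDD on the driver only every N batches and join each micro-batch against the
current snapshot inside transform.

    import org.apache.spark.rdd.RDD

    // Snapshot of the historical data, cached so repeated joins don't hit HDFS.
    var historical: RDD[(String, String)] = ssc.sparkContext
      .textFile("hdfs:///data/historical/")
      .map(line => (line.split(",")(0), line))
      .cache()

    var batchCount = 0L
    val joined = kvDstream.transform { rdd =>
      // transform's function runs on the driver each batch, so the reference
      // can be swapped periodically without restarting the streaming job.
      batchCount += 1
      if (batchCount % 60 == 0) {   // e.g. refresh once an hour with 60s batches
        historical.unpersist()
        historical = ssc.sparkContext
          .textFile("hdfs:///data/historical/")
          .map(line => (line.split(",")(0), line))
          .cache()
      }
      rdd.join(historical)
    }
    joined.print()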
To: user@spark.apache.org
Subject: Re: Join between DStream and Periodically-Changing-RDD
RDDs are immutable, so why not join two DStreams?
Not sure, but you can try something like this also:

kvDstream.foreachRDD(rdd => {
  // Re-read the HDFS directory on every batch interval so file updates are
  // picked up; an output action is still needed to materialize the join.
  val file = ssc.sparkContext.textFile("/sigmoid/")
  val kvFile = file.map(x => (x.split(",")(0), x))
  rdd.join(kvFile)
})
Thanks
Best Regards