RE: Join between DStream and Periodically-Changing-RDD

2015-06-15 Thread Evo Eftimov
Then go for the second option I suggested - simply turn (keep turning) your HDFS file (Batch RDD) into a stream of messages (outside Spark Streaming …
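A minimal sketch of that option, assuming each refresh of the reference data is written as a new file into a watched HDFS directory; the paths, port, and comma-separated key layout are hypothetical:

  import org.apache.spark.SparkConf
  import org.apache.spark.streaming.{Seconds, StreamingContext}

  val conf = new SparkConf().setAppName("StreamStreamJoinSketch")
  val ssc = new StreamingContext(conf, Seconds(10))

  // textFileStream only sees files newly added to the directory, so each
  // refresh of the reference data must arrive as a new file (hypothetical path).
  val refStream = ssc.textFileStream("/data/reference/")
    .map(line => (line.split(",")(0), line))

  // The live event stream, keyed the same way (hypothetical source).
  val events = ssc.socketTextStream("localhost", 9999)
    .map(line => (line.split(",")(0), line))

  // Both sides are now DStreams, so a plain stream-stream join applies; note
  // it only matches records that land in the same batch interval.
  events.join(refStream).print()

  ssc.start()
  ssc.awaitTermination()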

RE: Join between DStream and Periodically-Changing-RDD

2015-06-15 Thread Evo Eftimov
@Akhil Das Joining two DStreams might not be an option, since I want to join the stream with historical data in an HDFS folder. @Tathagata Das @Evo Eftimov The batch RDD to be reloaded is considerably huge compared to the DStream data, since …
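One sketch of how a huge batch RDD can still be refreshed without paying the reload cost on every micro-batch: reload and cache it on its own slow schedule on the driver, and join via transform so each batch picks up whichever snapshot is current. Here ssc and kvDstream are assumed from the code elsewhere in this thread; the path, key layout, and refresh interval are hypothetical:

  import org.apache.spark.rdd.RDD

  // Reload helper (hypothetical path and comma-separated key layout).
  def loadReference(): RDD[(String, String)] =
    ssc.sparkContext.textFile("/data/reference/")
      .map(line => (line.split(",")(0), line))
      .cache()

  @volatile var refRDD: RDD[(String, String)] = loadReference()

  // Driver-side refresh thread: swap in a fresh snapshot every 10 minutes
  // and unpersist the old one, instead of re-reading HDFS every batch.
  new Thread(new Runnable {
    def run(): Unit = while (true) {
      Thread.sleep(10 * 60 * 1000)
      val fresh = loadReference()
      val old = refRDD
      refRDD = fresh
      old.unpersist()
    }
  }).start()

  // The transform function runs on the driver at each batch, so the join
  // sees whichever snapshot refRDD points to at that moment.
  val joined = kvDstream.transform(rdd => rdd.join(refRDD))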

Re: Join between DStream and Periodically-Changing-RDD

2015-06-11 Thread Tathagata Das
RDDs are immutable, why not join two DStreams? Not sure, but you can try something like this also: kvDstream.foreachRDD(rdd => { val file = ssc.sparkContext.textFile("/sigmoid …

Re: Join between DStream and Periodically-Changing-RDD

2015-06-10 Thread Akhil Das
RDDs are immutable, why not join two DStreams? Not sure, but you can try something like this also:

  kvDstream.foreachRDD(rdd => {
    val file = ssc.sparkContext.textFile("/sigmoid/")
    val kvFile = file.map(x => (x.split(",")(0), x))
    rdd.join(kvFile)
  })

Thanks, Best Regards …
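Two things worth noting about the snippet above: foreachRDD re-reads the HDFS path on every batch interval (the reload-cost concern raised elsewhere in the thread), and the join itself is lazy, so nothing executes unless an action consumes the result inside the block. A minimal completion of the idea; the count-based output action is illustrative:

  kvDstream.foreachRDD { rdd =>
    // Re-read on every batch: simple, but costly for a large file.
    val file = ssc.sparkContext.textFile("/sigmoid/")
    val kvFile = file.map(x => (x.split(",")(0), x))
    // Force execution with an action; otherwise the join never runs.
    val joined = rdd.join(kvFile)
    println(s"joined ${joined.count()} records in this batch")
  }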

RE: Join between DStream and Periodically-Changing-RDD

2015-06-10 Thread Evo Eftimov
RDDs are immutable, why not join two DStreams? Not sure, but you can try something like this also: kvDstream.foreachRDD(rdd => { val file = ssc.sparkContext.textFile("/sigmoid …