RDDs are immutable, so an RDD will not pick up new files once it has been
created. Why not join two DStreams instead?

I'm not sure, but you could also try something like this, which re-reads the
folder on every batch:

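kvDstream.foreachRDD(rdd => {

      // Re-read the folder on each batch so newly added files are included
      val file = ssc.sparkContext.textFile("/sigmoid/")
      // Key each line by its first comma-separated field
      val kvFile = file.map(x => (x.split(",")(0), x))

      // join is lazy, so trigger it with an action (here, print a sample)
      val joined = rdd.join(kvFile)
      joined.take(10).foreach(println)

    })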
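
If you need the joined result as a DStream rather than only inside
foreachRDD, transform might be a better fit. Below is a rough sketch of the
same idea, not something I have run against your setup. It assumes the stream
values are plain strings; the object name, the helper names, and the socket
source on localhost:9999 are made up for illustration, and only the
"/sigmoid/" path and the first-field keying come from the snippet above.

```scala
import org.apache.spark.SparkConf
import org.apache.spark.streaming.{Seconds, StreamingContext}
import org.apache.spark.streaming.dstream.DStream

object StreamFileJoin {
  // Key a comma-separated line by its first field, as in the snippet above.
  def keyByFirstField(line: String): (String, String) =
    (line.split(",", 2)(0), line)

  // Join every batch against the folder contents as they are at batch time.
  def joinWithFolder(kv: DStream[(String, String)],
                     dir: String): DStream[(String, (String, String))] =
    kv.transform { rdd =>
      // This function is invoked once per batch, so textFile builds a
      // fresh RDD over whatever files are in the folder at that moment.
      val kvFile = rdd.sparkContext.textFile(dir).map(keyByFirstField)
      rdd.join(kvFile)
    }

  def main(args: Array[String]): Unit = {
    val conf = new SparkConf().setAppName("StreamFileJoin").setMaster("local[2]")
    val ssc = new StreamingContext(conf, Seconds(20)) // 20s batches, per the question

    // Hypothetical input source, used here only to have a stream to join.
    val kvDstream = ssc.socketTextStream("localhost", 9999).map(keyByFirstField)

    joinWithFolder(kvDstream, "/sigmoid/").print()

    ssc.start()
    ssc.awaitTermination()
  }
}
```

Note that this reads the whole folder again on every 20s batch. If that turns
out to be too expensive, caching the file RDD and rebuilding it only every 10
minutes may be worth trying.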


Thanks
Best Regards

On Tue, Jun 9, 2015 at 7:37 PM, Ilove Data <data4...@gmail.com> wrote:

> Hi,
>
> I'm trying to join a DStream with an interval of, say, 20s with an RDD loaded
> from an HDFS folder that changes periodically; say a new file arrives in the
> folder every 10 minutes.
>
> How should this be done, given that new files are periodically added to the
> HDFS folder? Does an RDD automatically detect changes in its HDFS source
> folder and reload itself?
>
> Thanks!
> Rendy
>
