RDDs are immutable, so an existing RDD will not pick up new files. Why not join two DStreams? I'm not sure it is the best way, but you can also try something like this:

kvDstream.foreachRDD(rdd => {
  // Re-read the folder on every batch so newly added files are included
  val file = ssc.sparkContext.textFile("/sigmoid/")
  // Key each line by its first comma-separated field
  val kvFile = file.map(x => (x.split(",")(0), x))
  val joined = rdd.join(kvFile)
  // join is lazy, so run an action on the result, for example:
  joined.take(10).foreach(println)
})

Thanks
Best Regards

On Tue, Jun 9, 2015 at 7:37 PM, Ilove Data <data4...@gmail.com> wrote:

> Hi,
>
> I'm trying to join DStream with interval let say 20s, join with RDD loaded
> from HDFS folder which is changing periodically, let say new file is coming
> to the folder for every 10 minutes.
>
> How should it be done, considering the HDFS files in the folder is
> periodically changing/adding new files? Do RDD automatically detect changes
> in HDFS folder as RDD source and automatically reload RDD?
>
> Thanks!
> Rendy
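
If you need the joined result as a DStream for further processing, rather than handling it inside foreachRDD, transform may be a better fit. Here is a rough sketch, not something I have tested on your setup. It assumes Spark 1.3 or later and a stream of (String, String) pairs; the helper name joinWithFolder and its folder parameter are made up for illustration.

```scala
import org.apache.spark.rdd.RDD
import org.apache.spark.streaming.dstream.DStream

// Hypothetical helper: joins every batch of a keyed stream with a text
// folder that is read again on each batch interval.
def joinWithFolder(
    kvDstream: DStream[(String, String)],
    folder: String): DStream[(String, (String, String))] = {
  kvDstream.transform { (rdd: RDD[(String, String)]) =>
    // This function is called once per batch, so textFile builds a
    // fresh RDD over the folder and includes files added since the
    // previous batch.
    val kvFile = rdd.sparkContext
      .textFile(folder)
      .map(line => (line.split(",")(0), line)) // key = first CSV field
    rdd.join(kvFile)
  }
}
```

You would call it as joinWithFolder(kvDstream, "/sigmoid/") and then apply an output operation such as print() to the returned stream. Note that this reads the whole folder on every 20s batch; if that turns out to be too slow, you could keep a cached RDD and rebuild it only every 10 minutes or so.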