There is no mechanism for keeping an RDD up to date with a changing source. However, you could set up a stream that watches the directory and processes new files as they arrive, or use the Hive integration in Spark SQL to run Hive queries directly. (Old query results will still grow stale, though.)
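As a minimal sketch of the first option, Spark Streaming's `textFileStream` re-scans a directory every batch interval and produces an RDD of the newly arrived files; the directory path and batch interval below are placeholders:

```scala
import org.apache.spark.SparkConf
import org.apache.spark.streaming.{Seconds, StreamingContext}

object DirWatcher {
  def main(args: Array[String]): Unit = {
    val conf = new SparkConf().setAppName("DirWatcher")
    // Batch interval of 30s is an arbitrary choice for illustration
    val ssc = new StreamingContext(conf, Seconds(30))

    // Picks up files newly created under this (hypothetical) directory
    // each batch interval; existing files are not reprocessed
    val updates = ssc.textFileStream("hdfs:///data/incoming")

    updates.foreachRDD { rdd =>
      // Recompute or merge with previously cached results here
      println(s"New records this batch: ${rdd.count()}")
    }

    ssc.start()
    ssc.awaitTermination()
  }
}
```

Note this only sees newly created files; it won't detect in-place updates to existing HDFS files or Hive partitions.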
Sent from my rotary phone.

> On May 31, 2015, at 7:11 AM, Ashish Mukherjee <ashish.mukher...@gmail.com> wrote:
>
> Hello,
>
> Since RDDs are created from data from Hive tables or HDFS, how do we ensure
> they are invalidated when the source data is updated?
>
> Regards,
> Ashish