There is no mechanism for keeping an RDD up to date with a changing source. 
However, you could set up a stream that watches the directory and processes 
new files as they arrive, or use the Hive integration in Spark SQL to run 
Hive queries directly. (Even then, old query results will still grow stale.)
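
The streaming approach could be sketched roughly as below. This is a minimal, untested sketch using Spark Streaming's `textFileStream`, which picks up files newly written into a monitored directory; the directory path and batch interval are placeholder assumptions, not anything from your setup.

```scala
// Sketch: reprocess data as new files land in a directory.
// Assumes Spark Streaming on the classpath; path and interval are illustrative.
import org.apache.spark.SparkConf
import org.apache.spark.streaming.{Seconds, StreamingContext}

object DirWatcher {
  def main(args: Array[String]): Unit = {
    val conf = new SparkConf().setAppName("DirWatcher").setMaster("local[2]")
    // Batch interval of 30s: each batch's RDD contains only files
    // that appeared in the directory since the previous batch.
    val ssc = new StreamingContext(conf, Seconds(30))

    val lines = ssc.textFileStream("hdfs:///data/incoming") // hypothetical path
    lines.foreachRDD { rdd =>
      // Process the newly arrived data here, e.g. merge it into
      // whatever derived state you maintain.
      println(s"New records this batch: ${rdd.count()}")
    }

    ssc.start()
    ssc.awaitTermination()
  }
}
```

Note this only sees files added after the stream starts; it doesn't detect in-place updates to existing files, which is part of why there's no general invalidation mechanism.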

Sent from my rotary phone. 


> On May 31, 2015, at 7:11 AM, Ashish Mukherjee <ashish.mukher...@gmail.com> 
> wrote:
> 
> Hello,
> 
> Since RDDs are created from data from Hive tables or HDFS, how do we ensure 
> they are invalidated when the source data is updated?
> 
> Regards,
> Ashish

---------------------------------------------------------------------
To unsubscribe, e-mail: user-unsubscr...@spark.apache.org
For additional commands, e-mail: user-h...@spark.apache.org
