RDD staleness

2015-05-31 Thread Ashish Mukherjee
Hello,

Since RDDs are created from data from Hive tables or HDFS, how do we ensure
they are invalidated when the source data is updated?

Regards,
Ashish


Re: RDD staleness

2015-05-31 Thread DW @ Gmail
There is no mechanism for keeping an RDD up to date with a changing source.
However, you could set up a stream that watches for changes to the directory and
processes the new files, or use the Hive integration in Spark SQL to run Hive
queries directly. (However, old query results will still grow stale.)
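
For illustration, a minimal sketch of the directory-watching approach using
Spark Streaming's textFileStream (the app name, batch interval, and HDFS path
are placeholders; assumes the Spark 1.x streaming API):

    import org.apache.spark.SparkConf
    import org.apache.spark.streaming.{Seconds, StreamingContext}

    object DirectoryWatcher {
      def main(args: Array[String]): Unit = {
        val conf = new SparkConf().setAppName("DirectoryWatcher")
        // Poll the source directory for new files every 30 seconds.
        val ssc = new StreamingContext(conf, Seconds(30))

        // textFileStream monitors the directory and yields an RDD of
        // lines from each batch of newly arrived files.
        val newLines = ssc.textFileStream("hdfs:///data/incoming")
        newLines.foreachRDD { rdd =>
          // Process only the newly arrived data here.
          println(s"Processed ${rdd.count()} new lines")
        }

        ssc.start()
        ssc.awaitTermination()
      }
    }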

Sent from my rotary phone. 




Re: RDD staleness

2015-05-31 Thread Michael Armbrust
Each time you run a Spark SQL query we will create new RDDs that load the
data, so you should see the newest results.  There is one caveat: formats
that use the native Data Source API (Parquet, ORC (in Spark 1.4), JSON (in
Spark 1.5)) cache file metadata to speed up interactive querying.  To clear
the metadata cache, run sql("REFRESH TABLE <tableName>").
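
For illustration, a minimal sketch of that refresh pattern against a Spark
1.x HiveContext ("my_table" is a placeholder table name):

    import org.apache.spark.{SparkConf, SparkContext}
    import org.apache.spark.sql.hive.HiveContext

    val sc = new SparkContext(new SparkConf().setAppName("RefreshExample"))
    val sqlContext = new HiveContext(sc)

    // The first query plans against cached file metadata.
    sqlContext.sql("SELECT count(*) FROM my_table").show()

    // After new files land in the table's location, clear the cached
    // metadata so the next query sees them.
    sqlContext.sql("REFRESH TABLE my_table")
    sqlContext.sql("SELECT count(*) FROM my_table").show()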
