RDD staleness

2015-05-31 Thread Ashish Mukherjee
Hello,

Since RDDs are created from data from Hive tables or HDFS, how do we ensure
they are invalidated when the source data is updated?

Regards,
Ashish


Re: RDD staleness

2015-05-31 Thread DW @ Gmail
There is no mechanism for keeping an RDD up to date with a changing source.
However, you could set up a stream that watches for changes to the directory and
processes the new files, or use the Hive integration in Spark SQL to run Hive
queries directly. (However, old query results will still grow stale.)
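
For illustration, a minimal sketch of the directory-watching approach using
Spark Streaming's textFileStream (the app name, batch interval, and HDFS path
are placeholders; assumes the Spark 1.x streaming API):

    import org.apache.spark.SparkConf
    import org.apache.spark.streaming.{Seconds, StreamingContext}

    object DirectoryWatcher {
      def main(args: Array[String]): Unit = {
        val conf = new SparkConf().setAppName("DirectoryWatcher")
        // Poll the source directory for new files every 30 seconds.
        val ssc = new StreamingContext(conf, Seconds(30))

        // textFileStream monitors the directory and yields an RDD of
        // lines from each batch of newly arrived files.
        val newLines = ssc.textFileStream("hdfs:///data/incoming")
        newLines.foreachRDD { rdd =>
          // Process only the newly arrived data here.
          println(s"Processed ${rdd.count()} new lines")
        }

        ssc.start()
        ssc.awaitTermination()
      }
    }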

Sent from my rotary phone. 




Re: RDD staleness

2015-05-31 Thread Michael Armbrust
Each time you run a Spark SQL query we will create new RDDs that load the
data, so you should see the newest results.  There is one caveat: formats
that use the native Data Source API (Parquet, ORC (in Spark 1.4), JSON (in
Spark 1.5)) cache file metadata to speed up interactive querying.  To clear
the metadata cache, run sql("REFRESH TABLE <tableName>").
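
For illustration, a minimal sketch of that refresh pattern against a Spark
1.x HiveContext ("my_table" is a placeholder table name):

    import org.apache.spark.{SparkConf, SparkContext}
    import org.apache.spark.sql.hive.HiveContext

    val sc = new SparkContext(new SparkConf().setAppName("RefreshExample"))
    val sqlContext = new HiveContext(sc)

    // The first query plans against cached file metadata.
    sqlContext.sql("SELECT count(*) FROM my_table").show()

    // After new files land in the table's location, clear the cached
    // metadata so the next query sees them.
    sqlContext.sql("REFRESH TABLE my_table")
    sqlContext.sql("SELECT count(*) FROM my_table").show()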
