A little code snippet:

line1: cacheTable(existingRDDTableName)
line2: //some operations which will materialize existingRDD dataset.
line3: existingRDD.union(newRDD).registerAsTable(new_existingRDDTableName)
line4: cacheTable(new_existingRDDTableName)
line5: //some operation that will materialize new_existingRDD.

Now, what we expect at line4 is that, rather than caching both
existingRDDTableName and new_existingRDDTableName, it should cache only
new_existingRDDTableName. But we cannot explicitly uncache
existingRDDTableName, because we want the union to use the cached
existingRDDTableName. Since evaluation is lazy, new_existingRDDTableName
could be materialized later, and until then we can't lose
existingRDDTableName from the cache.
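
To make the sequence concrete, here is a rough sketch of lines 1 to 5, assuming the Spark 1.1 SQL API (SQLContext.cacheTable / uncacheTable and SchemaRDD.registerTempTable, the 1.1 name for registerAsTable). The Event case class and the table names events_v1 / events_v2 are made up for illustration. It uncaches the old table only after the new one has been materialized, so both copies are in memory for a while, which may not be acceptable when memory is tight.

```scala
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.sql.SQLContext

// Hypothetical record type, used only for illustration.
case class Event(ts: Long, value: Double)

object CacheSwapSketch {
  def main(args: Array[String]): Unit = {
    val conf = new SparkConf().setAppName("cache-swap-sketch").setMaster("local[2]")
    val sc = new SparkContext(conf)
    val sqlContext = new SQLContext(sc)
    // Implicit RDD[case class] => SchemaRDD conversion (Spark 1.0 to 1.2).
    import sqlContext.createSchemaRDD

    // line1 + line2: register and cache the existing data, then run a
    // full scan so that the cache is actually populated.
    val existingRDD = sc.parallelize(Seq(Event(1L, 1.0), Event(2L, 2.0)))
    existingRDD.registerTempTable("events_v1")
    sqlContext.cacheTable("events_v1")
    sqlContext.sql("SELECT COUNT(*) FROM events_v1").collect()

    // line3: build the union on top of the registered table and give it
    // a new (hypothetical) name.
    val newRDD = sqlContext.createSchemaRDD(sc.parallelize(Seq(Event(3L, 3.0))))
    sqlContext.table("events_v1").unionAll(newRDD).registerTempTable("events_v2")

    // line4 + line5: cache the union and materialize it with a full scan.
    sqlContext.cacheTable("events_v2")
    sqlContext.sql("SELECT COUNT(*) FROM events_v2").collect()

    // Only now drop the old cached copy; until this point both tables
    // occupy cache memory.
    sqlContext.uncacheTable("events_v1")

    sc.stop()
  }
}
```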

What if we keep the same name for the new table?

So:
cacheTable(existingRDDTableName)
existingRDD.union(newRDD).registerAsTable(existingRDDTableName)
cacheTable(existingRDDTableName) //might not be needed again.

Will both of our requirements be satisfied, i.e. that it uses
existingRDDTableName from the cache for the union and doesn't duplicate the
data in the cache, but somehow appends to the older cached table?

Thanks and Regards,


Archit Thakur.
Sr Software Developer,
Guavus, Inc.

On Sat, Sep 13, 2014 at 12:01 AM, pankaj arora <pankajarora.n...@gmail.com>
wrote:

> I think I should elaborate on the use case a little more.
>
> We have a UI dashboard whose response time is quite fast because all the
> data is cached. Users query data by time range, and new data keeps coming
> into the system at a predefined frequency, let's say every hour.
>
> As you said, I can uncache tables, but that will basically drop all the
> data from memory. I cannot afford to lose my cache even for a short
> interval, as all queries from the UI will be slow until the cache loads
> again. UI response time needs to be predictable and fast enough that
> users do not get irritated.
>
> Also, I cannot keep two copies of the data in memory (until newrdd
> materializes), as that would exceed the total memory available in the
> system.
>
>
>
> --
> View this message in context:
> http://apache-spark-user-list.1001560.n3.nabble.com/Re-Use-Case-of-mutable-RDD-any-ideas-around-will-help-tp14095p14112.html
> Sent from the Apache Spark User List mailing list archive at Nabble.com.
