As the documentation says, the Cache Manager is only invoked when the user
calls a caching function (i.e. persist) in the code. Given that, as far as I
understand, unless cache/persist operations are explicitly called, the job's
results (including inputs and intermediate results) will never be stored for
reuse.
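To make the behaviour described above concrete, here is a small toy model in
Python. It is hypothetical and is not Spark's Cache Manager or API: the class
`LazyDataset` and its `map`, `cache` and `count` methods are illustrative
assumptions only. A lazily evaluated dataset recomputes its whole lineage on
every action, and its result is kept for reuse only if `cache()` was called
explicitly beforehand.

```python
# Hypothetical toy model (NOT Spark's implementation): a lazily evaluated
# dataset that recomputes its lineage on every action unless cache() was
# explicitly called, mirroring the opt-in caching described above.
from typing import Callable, List, Optional


class LazyDataset:
    def __init__(self, compute: Callable[[], List[int]]) -> None:
        self._compute = compute          # recipe for producing the data
        self._cache_requested = False    # set only by an explicit cache()
        self._cached: Optional[List[int]] = None
        self.evaluations = 0             # how many times compute() actually ran

    def map(self, fn: Callable[[int], int]) -> "LazyDataset":
        # Builds a new lazy node; nothing is evaluated here.
        return LazyDataset(lambda: [fn(x) for x in self._materialize()])

    def cache(self) -> "LazyDataset":
        # Only marks the dataset; the data is stored at the next action.
        self._cache_requested = True
        return self

    def _materialize(self) -> List[int]:
        if self._cached is not None:
            return self._cached          # reuse the stored result
        self.evaluations += 1
        data = self._compute()
        if self._cache_requested:
            self._cached = data          # store only when explicitly asked
        return data

    def count(self) -> int:
        # An "action": forces evaluation of the lineage.
        return len(self._materialize())


source = LazyDataset(lambda: list(range(10)))

plain = source.map(lambda x: x * 2)      # no cache(): recomputed per action
plain.count()
plain.count()
print(plain.evaluations)                 # 2

cached = source.map(lambda x: x * 2).cache()
cached.count()
cached.count()
print(cached.evaluations)                # 1
```

In this model nothing is reused implicitly: the second `count()` on `plain`
walks the full lineage again (including `source`), while `cached` is computed
once and then served from storage.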

I am wondering whether there exists any optimization of the query execution
plan that applies an implicit caching mechanism without calling the
cache/persist operation, or whether any other mechanism can implicitly invoke
the cache in some other situation.

If I have understood correctly, is there a strong reason why the Catalyst
Optimizer does not enforce any caching mechanism for intermediate results
between jobs?



--
Sent from: http://apache-spark-developers-list.1001551.n3.nabble.com/