From the Spark documentation (http://spark.apache.org/docs/latest/programming-guide.html#rdd-persistence): yes, you can use persist on a DataFrame instead of cache. cache is just shorthand for persist with the default storage level, which is MEMORY_ONLY for RDDs (note that DataFrame.cache actually defaults to MEMORY_AND_DISK). If you want to persist the DataFrame to disk you should do dataframe.persist(StorageLevel.DISK_ONLY).
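
Something like this, as a minimal sketch (the SparkSession setup and the JDBC connection details are hypothetical stand-ins for your expensive DB read):

    import org.apache.spark.sql.SparkSession
    import org.apache.spark.storage.StorageLevel

    val spark = SparkSession.builder().appName("PersistExample").getOrCreate()

    // Hypothetical source: a JDBC read stands in for the expensive database query.
    val dataframe = spark.read
      .format("jdbc")
      .option("url", "jdbc:postgresql://dbhost/mydb")   // hypothetical connection details
      .option("dbtable", "my_table")
      .load()

    // Keep the cached blocks on disk only, not in memory.
    dataframe.persist(StorageLevel.DISK_ONLY)
    dataframe.count()  // an action forces materialization, so later uses read from disk, not the DB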
IMO, if reads against the DB are expensive and you're afraid of failure, why not just save the data as Parquet on your cluster in Hive and read from there?
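
Roughly like this, reusing the dataframe from the sketch above (assumes Hive support is enabled; the warehouse.db_snapshot table name is hypothetical):

    // Pay the expensive DB read once, then keep a durable Parquet copy in Hive.
    dataframe.write
      .mode("overwrite")
      .format("parquet")
      .saveAsTable("warehouse.db_snapshot")  // hypothetical database.table name

    // Later jobs read the snapshot instead of hitting the database again;
    // if a job dies, the data is still sitting on the cluster.
    val snapshot = spark.table("warehouse.db_snapshot")

Unlike persist, the Parquet table survives application restarts and executor loss, so you only ever hit the database when you decide to refresh the snapshot.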