From the Spark documentation (http://spark.apache.org/docs/latest/programming-guide.html#rdd-persistence): yes, you can use persist on a DataFrame instead of cache. cache is just shorthand for persist with the default storage level, which is MEMORY_ONLY for RDDs (note that DataFrame.cache actually defaults to MEMORY_AND_DISK). If you want to persist the DataFrame to disk you should do dataframe.persist(StorageLevel.DISK_ONLY).
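
Something like this, as a minimal sketch (the SparkSession setup and the JDBC connection details are hypothetical stand-ins for your expensive DB read):

    import org.apache.spark.sql.SparkSession
    import org.apache.spark.storage.StorageLevel

    val spark = SparkSession.builder().appName("PersistExample").getOrCreate()

    // Hypothetical source: a JDBC read stands in for the expensive database query.
    val dataframe = spark.read
      .format("jdbc")
      .option("url", "jdbc:postgresql://dbhost/mydb")   // hypothetical connection details
      .option("dbtable", "my_table")
      .load()

    // Keep the cached blocks on disk only, not in memory.
    dataframe.persist(StorageLevel.DISK_ONLY)
    dataframe.count()  // an action forces materialization, so later uses read from disk, not the DB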
IMO, if reads against the DB are expensive and you're afraid of failure, why not just save the data as Parquet on your cluster in Hive and read from there?
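
Roughly like this, reusing the dataframe from the sketch above (assumes Hive support is enabled; the warehouse.db_snapshot table name is hypothetical):

    // Pay the expensive DB read once, then keep a durable Parquet copy in Hive.
    dataframe.write
      .mode("overwrite")
      .format("parquet")
      .saveAsTable("warehouse.db_snapshot")  // hypothetical database.table name

    // Later jobs read the snapshot instead of hitting the database again;
    // if a job dies, the data is still sitting on the cluster.
    val snapshot = spark.table("warehouse.db_snapshot")

Unlike persist, the Parquet table survives application restarts and executor loss, so you only ever hit the database when you decide to refresh the snapshot.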