Re: Spark DF CacheTable method. Will it save data to disk?

2016-08-18 Thread Olivier Girardot
That's another "pipeline" step to add, whereas persist is only relevant during the lifetime of your job, and the data is kept not in HDFS but on the local disks of your executors. On Wed, Aug 17, 2016 5:56 PM, neil90 neilp1...@icloud.com wrote: >From the spark

Re: Spark DF CacheTable method. Will it save data to disk?

2016-08-17 Thread neil90
From the Spark documentation (http://spark.apache.org/docs/latest/programming-guide.html#rdd-persistence): yes, you can use persist on a DataFrame instead of cache. cache is simply shorthand for persist with the default storage level, "MEMORY_ONLY". If you want to persist the DataFrame to disk you