Re: Spark DF CacheTable method. Will it save data to disk?

2016-08-18 Thread Olivier Girardot
That's another "pipeline" step to add, whereas persist is only relevant during
the lifetime of your job and writes not to HDFS but to the local disk of your
executors.

On Wed, Aug 17, 2016 5:56 PM, neil90 neilp1...@icloud.com wrote:

> From the Spark documentation
> (http://spark.apache.org/docs/latest/programming-guide.html#rdd-persistence),
> yes, you can use persist on a DataFrame instead of cache. cache is just
> shorthand for the default persist storage level, "MEMORY_ONLY". If you want
> to persist the DataFrame to disk, use
> dataframe.persist(StorageLevel.DISK_ONLY).
>
> IMO, if reads against the DB are expensive and you're afraid of failure,
> why not just save the data as Parquet on your cluster in Hive and read
> from there?

Olivier Girardot | Associé
o.girar...@lateral-thoughts.com
+33 6 24 09 17 94

Re: Spark DF CacheTable method. Will it save data to disk?

2016-08-17 Thread neil90
From the Spark documentation
(http://spark.apache.org/docs/latest/programming-guide.html#rdd-persistence),
yes, you can use persist on a DataFrame instead of cache. cache is just
shorthand for the default persist storage level, "MEMORY_ONLY". If you want
to persist the DataFrame to disk, use
dataframe.persist(StorageLevel.DISK_ONLY).
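
For concreteness, a minimal PySpark sketch of that relationship (the session
and DataFrame names are illustrative, not from this thread):

    from pyspark import StorageLevel
    from pyspark.sql import SparkSession

    spark = SparkSession.builder.appName("persist-demo").getOrCreate()
    df = spark.range(1000000)  # illustrative stand-in for a DB-backed DataFrame

    df.cache()      # shorthand for persist() at the default storage level
    df.count()      # persistence is lazy; an action materializes the blocks
    df.unpersist()

    df.persist(StorageLevel.DISK_ONLY)  # keep blocks on executor-local disk only
    df.count()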

IMO, if reads against the DB are expensive and you're afraid of failure, why
not just save the data as Parquet on your cluster in Hive and read from there?
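
Continuing the sketch above, that approach would look roughly like this (the
table name staging.db_snapshot is hypothetical, and saveAsTable assumes a
Hive-enabled SparkSession):

    # One-time (or scheduled) step: write the expensive DB read out as Parquet,
    # registered as a Hive table, so later jobs never touch the DB.
    df.write.mode("overwrite").format("parquet").saveAsTable("staging.db_snapshot")

    # Later jobs read the snapshot instead of re-querying the database.
    snapshot = spark.table("staging.db_snapshot")
    snapshot.count()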


