Re: Questions about caching

2019-01-01 Thread Gourav Sengupta
Hi Andrew, If you use the Spark UI, all your questions are already answered there. Let me know if you need any help browsing the UI to look at the contents that are cached. Regards, Gourav On Tue, 11 Dec 2018, 17:13 Andrew Melo Greetings, Spark Aficionados- > > I'm working on a project to

Re: Questions about caching

2018-12-24 Thread Bin Fan
Hi Andrew, Since you mentioned the alternative solution with Alluxio, here is a more comprehensive tutorial on caching Spark dataframes on Alluxio: https://www.alluxio.com/blog/effective-spark-dataframes-with-alluxio Namely, caching your dataframe is simply running

Re: Questions about caching

2018-12-18 Thread Reza Safi
Hi Andrew, 1) df2 will cache all the columns. 2) In Spark 2 you will receive a warning like: WARN execution.CacheManager: Asked to cache already cached data. I don't recall whether it is the same in 1.6. It seems you are not using Spark 2. 2a) Not sure whether you are suggesting for a feature in

Questions about caching

2018-12-11 Thread Andrew Melo
Greetings, Spark Aficionados- I'm working on a project to (ab-)use PySpark to do particle physics analysis, which involves iterating with a lot of transformations (to apply weights and select candidate events) and reductions (to produce histograms of relevant physics objects). We have a basic