Re: sqlContext.cacheTable("tableName") vs dataFrame.cache()

2016-01-19 Thread Jerry Lam
Is cacheTable similar to asTempTable before?

Sent from my iPhone

> On 19 Jan, 2016, at 4:18 am, George Sigletos wrote:
>
> Thanks Kevin for your reply.
>
> I was suspecting the same thing as well, although it still does not make much
> sense to me why would you need
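
As a point of reference for the question above, a minimal PySpark sketch (assuming the Spark 1.x API; the table name "people" and the sample rows are placeholders, not from the thread): registerTempTable only exposes a DataFrame under a name for SQL, while cacheTable is what actually marks the named table for caching.

    from pyspark import SparkContext
    from pyspark.sql import SQLContext

    sc = SparkContext(appName="cache-table-sketch")
    sqlContext = SQLContext(sc)

    # Placeholder data; any DataFrame behaves the same way.
    df = sqlContext.createDataFrame([(1, "a"), (2, "b")], ["id", "value"])

    # registerTempTable only makes the DataFrame addressable by name in SQL;
    # it does not cache anything by itself.
    df.registerTempTable("people")

    # cacheTable marks the named table for Spark SQL's in-memory columnar cache.
    sqlContext.cacheTable("people")
    print(sqlContext.isCached("people"))  # True once the table is marked as cached

    sqlContext.uncacheTable("people")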

sqlContext.cacheTable("tableName") vs dataFrame.cache()

2016-01-15 Thread George Sigletos
According to the documentation they are exactly the same, but in my queries dataFrame.cache() results in much faster execution times than sqlContext.cacheTable("tableName"). Is there any explanation for this? I am not caching the RDD prior to creating the dataframe. Using Pyspark on Spark
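
For anyone wanting to reproduce the comparison, a minimal PySpark sketch of the two code paths in question; the data, table name, and query are placeholders, and the timing is only illustrative. Note that the first action after cache() or cacheTable() also pays the cost of materializing the cache.

    import time
    from pyspark import SparkContext
    from pyspark.sql import SQLContext

    sc = SparkContext(appName="cache-comparison-sketch")
    sqlContext = SQLContext(sc)

    # Placeholder data registered under a placeholder table name.
    df = sqlContext.createDataFrame([(i, i % 10) for i in range(100000)], ["id", "bucket"])
    df.registerTempTable("my_table")

    def timed(label, thunk):
        start = time.time()
        thunk()
        print(label, time.time() - start)

    # Path 1: cache the DataFrame object directly.
    df.cache()
    timed("dataFrame.cache()", lambda: df.groupBy("bucket").count().collect())
    df.unpersist()

    # Path 2: cache by table name through the SQLContext.
    sqlContext.cacheTable("my_table")
    timed("sqlContext.cacheTable()",
          lambda: sqlContext.sql(
              "SELECT bucket, COUNT(*) FROM my_table GROUP BY bucket").collect())
    sqlContext.uncacheTable("my_table")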

Re: sqlContext.cacheTable("tableName") vs dataFrame.cache()

2016-01-15 Thread Kevin Mellott
Hi George,

I believe that sqlContext.cacheTable("tableName") is to be used when you want to cache the data that is being used within a Spark SQL query. For example, take a look at the code below.

> val myData = sqlContext.load("com.databricks.spark.csv", Map("path" ->
> "hdfs://somepath/file",
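
A PySpark sketch of the pattern Kevin's (truncated) Scala snippet illustrates: load a file through the spark-csv package, register it as a temp table, cache it by name, and query it with Spark SQL. The path (taken from the snippet above), table name, and options are placeholders, and the spark-csv package must be on the classpath (e.g. via --packages) for the format to resolve.

    from pyspark import SparkContext
    from pyspark.sql import SQLContext

    sc = SparkContext(appName="cache-within-sql-sketch")
    sqlContext = SQLContext(sc)

    # Load the CSV through the spark-csv data source, mirroring the Scala snippet.
    myData = (sqlContext.read
              .format("com.databricks.spark.csv")
              .option("header", "true")
              .load("hdfs://somepath/file"))

    # Expose the DataFrame to Spark SQL under a name, then cache it by that name so
    # subsequent SQL queries against the table read from the in-memory columnar cache.
    myData.registerTempTable("my_data")
    sqlContext.cacheTable("my_data")

    result = sqlContext.sql("SELECT COUNT(*) FROM my_data").collect()
    print(result)

    sqlContext.uncacheTable("my_data")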