That's an unfortunate documentation bug in the programming guide... We
failed to update it after making the change.
Cheng
On 2/28/15 8:13 AM, Deborah Siegel wrote:
Hi Michael,
Would you help me understand the apparent difference here?
The Spark 1.2.1 programming guide indicates:
Note that if you call schemaRDD.cache() rather than
sqlContext.cacheTable(...), tables will *not* be cached using the in-memory
columnar format...
... when the table is cached using cacheTable()?
--
View this message in context:
http://apache-spark-user-list.1001560.n3.nabble.com/Running-spark-function-on-parquet-without-sql-tp21833p21850.html
Sent from the Apache Spark User List mailing list archive at Nabble.com
Hi Michael,
Would you help me understand the apparent difference here?
The Spark 1.2.1 programming guide indicates:
Note that if you call schemaRDD.cache() rather than
sqlContext.cacheTable(...), tables will *not* be cached using the in-memory
columnar format, and therefore
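For reference, the two calls the guide contrasts look roughly like this in the 1.2.x Scala API (a sketch only; the "people" table and file names are hypothetical, used for illustration):

```scala
// Sketch against the Spark 1.2.x API; "people.parquet" and the table
// name "people" are made-up examples.
val sqlContext = new org.apache.spark.sql.SQLContext(sc)
val schemaRDD = sqlContext.parquetFile("people.parquet")
schemaRDD.registerTempTable("people")

// Caches the table using the in-memory columnar format:
sqlContext.cacheTable("people")

// Plain RDD caching; per the (since-corrected) guide text, this was
// said NOT to use the columnar format:
schemaRDD.cache()
```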
From Zhan Zhang's reply, yes, I still get Parquet's advantages.
You will need to at least use SQL or the DataFrame API (coming in Spark
1.3) to specify the columns that you want in order to get the parquet
benefits. The rest of your operations can be standard Spark.
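A minimal sketch of that pattern, assuming Spark 1.2 where sql() returns a SchemaRDD (an RDD[Row]); the table name and columns are hypothetical:

```scala
// Select only the needed columns through SQL so Parquet's column
// pruning applies; "people", "name", and "age" are example names.
val rows = sqlContext.sql("SELECT name, age FROM people WHERE age > 21")

// From here on, ordinary Spark transformations on the resulting
// RDD[Row]; fields are accessed positionally.
val names = rows.map(row => row.getString(0))
names.take(10).foreach(println)
```

Only the initial projection needs to go through SQL (or, in 1.3, the DataFrame API); everything downstream is standard Spark.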
My next question is,
Regards
Tridib
--
View this message in context:
http://apache-spark-user-list.1001560.n3.nabble.com/Running-spark-function-on-parquet-without-sql-tp21833.html
Sent from the Apache Spark User List mailing list archive at Nabble.com
as it would have been if I had used SQL?
Please advise.
Thanks & Regards,
Tridib