Hi Mich,

Thank you for your quick reply!

> What type of table is the underlying table? Is it Hbase, Hive ORC or what?

It is a custom datasource, but ultimately backed by HBase.

> By Key you mean a UNIQUE ID or something similar and then you do multiple
> scans on the tempTable which stores data using in-memory columnar format.

The key is a unique ID, yes. But note that I don't actually do multiple scans on the same temp table: I create a new temp table for every query I want to run, because each query will be based on a different key range. The caching is at the level of the full key-value RDD.

If I did instead cache the temp table, I don't see a way of exploiting key ordering for key range filters?

> That is the optimisation of tempTable storage as far as I know.

So it seems to me that my current solution won't be using this optimisation, as I'm caching the RDD rather than the temp table.

> Have you tried it using predicate push-down on the underlying table itself?

No, because I essentially want to load the entire table into memory before doing any queries. At that point I have nothing to push down.

Cheers,
Michael