Hi Mich,

Thank you again for your reply.

> As I see you are caching the table already sorted
>
> val keyValRDDSorted = keyValRDD.sortByKey().cache
>
> and the next stage is you are creating multiple tempTables (different
> ranges) that cache a subset of rows already cached in RDD. The data stored
> in tempTable is in Hive columnar format (I assume that means ORC format)

But the thing is that I don't explicitly cache the tempTables, and I don't really want to because I'll only run a single query on each tempTable. So I expect the SQL query processor to operate directly on the underlying key-value RDD, and my concern is that this may be inefficient.

> Well that is all you can do.

Ok, thanks - that's really what I wanted to get confirmation of.

> Bear in mind that these tempTables are immutable and I do not know any way
> of dropping tempTable to free more memory.

I'm assuming there won't be any (significant) memory overhead of registering the temp tables as long as I don't explicitly cache them. Am I wrong? In any case I'll be calling sqlContext.dropTempTable once the query has completed, which according to the documentation should also free up memory.

Cheers,
Michael
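
P.S. For concreteness, here is a minimal sketch of the pattern I have in mind, written against the Spark 1.x SQLContext API. The row type KeyVal, the stand-in data, the table name range_100_199 and the use of filterByRange to cut out a key range are all assumptions made for illustration; the real schema and range logic are not shown in this thread. As far as I understand, registerTempTable only records a name for the DataFrame's query plan, so nothing beyond the already cached sorted RDD should be materialised unless the table is cached explicitly.

```scala
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.sql.SQLContext

// Hypothetical row type: the actual schema is not given in this thread.
case class KeyVal(key: Long, value: String)

object TempTableSketch {
  def main(args: Array[String]): Unit = {
    val conf = new SparkConf().setAppName("TempTableSketch").setMaster("local[*]")
    val sc = new SparkContext(conf)
    val sqlContext = new SQLContext(sc)
    import sqlContext.implicits._

    // Stand-in data (assumption); the real keyValRDD comes from elsewhere.
    val keyValRDD = sc.parallelize((1L to 1000L).map(k => (k, s"v$k")))

    // Cache the sorted RDD once, as in the quoted snippet.
    val keyValRDDSorted = keyValRDD.sortByKey().cache()

    // Assumed way of selecting one key range (bounds are inclusive).
    // sortByKey leaves a range partitioner in place, which filterByRange
    // can use to skip partitions outside the requested range.
    val subset = keyValRDDSorted.filterByRange(100L, 199L)

    // Register the subset as a temp table WITHOUT caching it.
    subset.map { case (k, v) => KeyVal(k, v) }.toDF().registerTempTable("range_100_199")

    // Run the single query for this range.
    val rowCount = sqlContext.sql("SELECT COUNT(*) FROM range_100_199").first().getLong(0)
    println(s"rows in range: $rowCount")

    // Drop the table once the query has completed.
    sqlContext.dropTempTable("range_100_199")

    keyValRDDSorted.unpersist()
    sc.stop()
  }
}
```

In this sketch the only explicitly cached data is keyValRDDSorted; each temp table is registered, queried once and dropped, so the query runs over the filtered view of the cached RDD.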