I have been working a lot recently with denormalised tables that have many columns, nearly 600. We use this shape to avoid joins.
I have tried CACHE TABLE with this data, but it proves too expensive, as it appears to cache all of the data in the table. With data sets such as this one, certain columns are hot, referenced frequently in queries, while others are used very infrequently. It would therefore be great if caching could be done at the column level. I realise this may not be optimal for every use case, but I think it could be a fairly common need.

Has something like this been considered?

Thanks
Mick

--
View this message in context: http://apache-spark-developers-list.1001551.n3.nabble.com/Caching-tables-at-column-level-tp10377.html
Sent from the Apache Spark Developers List mailing list archive at Nabble.com.
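As a rough sketch of the kind of workaround available today (assuming the Spark 1.x SQLContext API; the table and column names here are purely illustrative), one could register a projection of just the hot columns as its own temp table and cache that, rather than the full wide table:

```scala
import org.apache.spark.sql.SQLContext

// Assumes an existing SparkContext `sc`.
val sqlContext = new SQLContext(sc)

// Project only the frequently-queried ("hot") columns out of the
// ~600-column denormalised table; names are illustrative.
val hot = sqlContext.sql("SELECT id, col_a, col_b FROM wide_table")

// Register the projection under its own name and cache only that,
// so the in-memory columnar store holds just the hot columns.
hot.registerTempTable("wide_table_hot")
sqlContext.cacheTable("wide_table_hot")
```

Queries that touch only hot columns can then run against `wide_table_hot`, while the occasional query over cold columns falls back to `wide_table`; the obvious downside, and part of why per-column caching in the planner itself would be nicer, is that queries have to be rewritten to pick the right table.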