I have been working a lot recently with denormalised tables that have a large
number of columns (nearly 600). We are using this form to avoid joins.

I have tried to use CACHE TABLE on this data, but it proves too expensive,
since it appears to cache every column in the table.
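To make the cost concrete, this is roughly what I am doing today. It is only a
minimal sketch against the 1.x SQLContext API, with a hypothetical table name
and column names (wide_events, user_id, event_time):

    // Caching the whole denormalised table pulls every one of the ~600
    // columns into memory, even though most queries only touch a few.
    sqlContext.cacheTable("wide_events")
    // equivalently: sqlContext.sql("CACHE TABLE wide_events")

    // A typical query that only needs two of the cached columns.
    sqlContext.sql("SELECT user_id, event_time FROM wide_events").collect()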

For data sets such as the one I am using, you find that certain columns are
hot and referenced frequently in queries, while others are used very
infrequently.

Therefore it would be great if caches could be column-based. I realise that
this may not be optimal for all use cases, but I think it could be quite a
common need. Has something like this been considered?
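The closest workaround I have found is to project the hot columns into a
separate temporary table and cache only that. Again just a sketch, with the
same hypothetical names as above:

    // Project only the hot columns and cache the projection.
    val hot = sqlContext.sql(
      "SELECT user_id, event_time, status FROM wide_events")
    hot.registerTempTable("wide_events_hot")
    sqlContext.cacheTable("wide_events_hot")

Every query then has to be rewritten to hit wide_events_hot instead of
wide_events, which is exactly the kind of bookkeeping a column-level cache
would avoid.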

Thanks Mick


