Hi,

I was wondering if there's any planned support for local disk columnar
storage.

This could be an extension of the in-memory columnar store, or possibly
something similar to the recently added local checkpointing for RDDs.

This could also have the added benefit of enabling iterative usage for
DataFrames by pruning the query plan through local checkpoints.
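
To illustrate what I mean by pruning the plan, here is a rough toy model in
plain Python (not Spark code). The Plan class, local_checkpoint() and
iterate() are hypothetical names made up for this sketch; it only assumes that
a checkpoint materializes the current result and drops the lineage behind it.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional


@dataclass(frozen=True)
class Plan:
    """Toy logical plan node: a materialized leaf, or one step on a parent."""
    rows: Optional[List[int]] = None            # set only on materialized leaves
    fn: Optional[Callable[[int], int]] = None   # per-row transformation
    parent: Optional["Plan"] = None

    def map(self, fn: Callable[[int], int]) -> "Plan":
        # Lazily append a step; nothing is computed yet.
        return Plan(fn=fn, parent=self)

    def depth(self) -> int:
        # Number of nodes from this node back to its leaf.
        return 1 if self.parent is None else 1 + self.parent.depth()

    def collect(self) -> List[int]:
        # Evaluate the chain starting from the leaf.
        if self.parent is None:
            return list(self.rows)
        return [self.fn(x) for x in self.parent.collect()]

    def local_checkpoint(self) -> "Plan":
        # Hypothetical helper: materialize the result and drop the lineage.
        return Plan(rows=self.collect())


def iterate(plan: Plan, steps: int, checkpoint_every: Optional[int] = None) -> Plan:
    # Apply the same transformation repeatedly, as an iterative job would.
    for i in range(1, steps + 1):
        plan = plan.map(lambda x: x + 1)
        if checkpoint_every and i % checkpoint_every == 0:
            plan = plan.local_checkpoint()
    return plan


base = Plan(rows=[0, 10])
unpruned = iterate(base, steps=6)
pruned = iterate(base, steps=6, checkpoint_every=3)
```

Both variants produce the same rows, but the unpruned plan keeps growing with
every iteration while the checkpointed one is cut back to a single leaf each
time a checkpoint is taken.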

A further enhancement would be to add update support to the columnar format
(in the immutable copy-on-write sense of course), by maintaining references
to unchanged row blocks and only copying and mutating the ones that have
changed.
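
A minimal sketch of the copy-on-write idea, again in plain Python rather than
anything from the Spark code base. ColumnarSnapshot, from_rows(), get() and
update() are hypothetical names, and blocks are modeled as tuples of integers
purely for illustration.

```python
from typing import Dict, List, Tuple

Block = Tuple[int, ...]  # immutable run of values from one column


class ColumnarSnapshot:
    """Immutable view of a table stored as per-column lists of row blocks."""

    def __init__(self, columns: Dict[str, List[Block]], block_size: int):
        self.columns = columns
        self.block_size = block_size

    @classmethod
    def from_rows(cls, data: Dict[str, List[int]], block_size: int) -> "ColumnarSnapshot":
        # Split every column into fixed-size immutable blocks.
        columns = {
            name: [tuple(values[i:i + block_size])
                   for i in range(0, len(values), block_size)]
            for name, values in data.items()
        }
        return cls(columns, block_size)

    def get(self, column: str, row: int) -> int:
        block_idx, offset = divmod(row, self.block_size)
        return self.columns[column][block_idx][offset]

    def update(self, column: str, row: int, value: int) -> "ColumnarSnapshot":
        # Return a new snapshot; only the touched block is copied.
        block_idx, offset = divmod(row, self.block_size)
        old_blocks = self.columns[column]
        old_block = old_blocks[block_idx]
        new_block = old_block[:offset] + (value,) + old_block[offset + 1:]
        new_blocks = list(old_blocks)      # copies block references only
        new_blocks[block_idx] = new_block
        new_columns = dict(self.columns)   # untouched columns are shared
        new_columns[column] = new_blocks
        return ColumnarSnapshot(new_columns, self.block_size)


v1 = ColumnarSnapshot.from_rows(
    {"id": [1, 2, 3, 4], "qty": [10, 20, 30, 40]}, block_size=2)
v2 = v1.update("qty", 3, 99)
```

Here v1 still sees the old value, v2 sees the new one, and the two snapshots
share every block except the single "qty" block that contains row 3.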

A use case here is streaming and merging updates in a large dataset that
can be efficiently stored internally in a columnar format, rather than
accessing a less efficient external data store like HDFS or Cassandra.

Thanks,
Cristian
