Thanks for the email. Can you explain what the difference is between this
and existing formats such as Parquet/ORC?


On Wed, Nov 11, 2015 at 4:59 AM, Cristian O <cristian.b.op...@googlemail.com> wrote:

> Hi,
>
> I was wondering if there's any planned support for local disk columnar
> storage.
>
> This could be an extension of the in-memory columnar store, or possibly
> something similar to the recently added local checkpointing for RDDs.
>
> This could also have the added benefit of enabling iterative usage for
> DataFrames by pruning the query plan through local checkpoints.
>
> A further enhancement would be to add update support to the columnar
> format (in the immutable copy-on-write sense of course), by maintaining
> references to unchanged row blocks and only copying and mutating the ones
> that have changed.
>
> A use case here is streaming and merging updates into a large dataset that
> can be efficiently stored internally in a columnar format, rather than
> accessing a less efficient external data store like HDFS or Cassandra.
>
> Thanks,
> Cristian
>
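
To make the copy-on-write update idea in the quoted message concrete, here is
a minimal sketch in plain Python. It is only an illustration under stated
assumptions, not Spark code and not an existing Spark API: the names RowBlock,
ColumnarTable, from_rows, update and row are hypothetical, and a fixed block
size is assumed. An update returns a new table that keeps references to the
unchanged row blocks and copies only the blocks containing changed rows.

```python
from dataclasses import dataclass
from typing import Dict, Tuple


@dataclass(frozen=True)
class RowBlock:
    """Immutable block of rows stored column by column (hypothetical type)."""
    columns: Tuple[tuple, ...]  # one tuple of values per column


@dataclass(frozen=True)
class ColumnarTable:
    """Immutable table made of fixed-size row blocks (hypothetical type)."""
    blocks: Tuple[RowBlock, ...]
    block_size: int

    @staticmethod
    def from_rows(rows, block_size):
        # Split the rows into chunks and transpose each chunk into columns.
        blocks = []
        for start in range(0, len(rows), block_size):
            chunk = rows[start:start + block_size]
            blocks.append(RowBlock(tuple(zip(*chunk))))
        return ColumnarTable(tuple(blocks), block_size)

    def row(self, idx):
        # Reassemble one row from the columns of the block that holds it.
        block = self.blocks[idx // self.block_size]
        offset = idx % self.block_size
        return tuple(col[offset] for col in block.columns)

    def update(self, changes: Dict[int, tuple]) -> "ColumnarTable":
        """Return a new table; `changes` maps a global row index to a new row."""
        # Group the changes by the block they fall into.
        by_block = {}
        for row_idx, new_row in changes.items():
            block_idx, offset = divmod(row_idx, self.block_size)
            if not 0 <= block_idx < len(self.blocks):
                raise IndexError(f"row {row_idx} is out of range")
            by_block.setdefault(block_idx, {})[offset] = new_row

        new_blocks = []
        for i, block in enumerate(self.blocks):
            if i not in by_block:
                # Unchanged block: share the existing object, no copy.
                new_blocks.append(block)
                continue
            # Changed block: copy its columns, then apply the new values.
            cols = [list(col) for col in block.columns]
            for offset, new_row in by_block[i].items():
                for c, value in enumerate(new_row):
                    cols[c][offset] = value
            new_blocks.append(RowBlock(tuple(tuple(col) for col in cols)))
        return ColumnarTable(tuple(new_blocks), self.block_size)
```

For example, with four rows and a block size of 2, updating row 2 copies only
the second block. The first block is the same object in both versions, and the
original table is left untouched.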
