>
>> The reason I'm asking about the columnar compressed format is that
>> there are some problems for which Parquet is not practical.
>
>
> Can you elaborate?

Sure.

- The organization has no Hadoop deployment, but a significant
investment in some other NoSQL store.
- A need to efficiently add a new column to existing data (with
Parquet this means rewriting the dataset; see the sketch after this
list).
- A need to mark some existing rows as deleted, or to replace small
bits of existing data.
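
To make the second and third bullets concrete: Parquet files are
immutable, so even a trivial schema change today means reading and
rewriting the entire dataset. A minimal sketch in Scala (the paths and
column names are hypothetical, and it assumes a SparkSession `spark`
is in scope):

    import org.apache.spark.sql.functions.lit

    // Read the existing Parquet data (hypothetical path).
    val events = spark.read.parquet("/data/events")

    // "Adding a column" means deriving a new DataFrame...
    val withFlag = events.withColumn("deleted", lit(false))

    // ...and rewriting every file, since Parquet files cannot be
    // updated in place. Marking rows deleted has the same cost.
    withFlag.write.parquet("/data/events_v2")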

For these use cases, it would be much more efficient and practical if
we didn't have to pull the data out of the datastore and convert it to
Parquet first.  Doing so adds significant latency and creates Ops
headaches, since it forces us to maintain HDFS.  It would be great to
be able to load data directly into the columnar format, i.e. into the
InMemoryColumnarCache.
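
For comparison, the closest path available today is to read from the
external store through a data source connector and cache the result,
which materializes it into Spark's in-memory columnar representation,
but only via a full scan, and with no way to patch the cached copy
afterwards. A rough sketch; the connector format name is made up:

    // Hypothetical data source connector for the NoSQL store.
    val df = spark.read
      .format("org.example.nosqlstore")  // assumed, not a real package
      .option("table", "events")
      .load()

    // cache() stores the data in Spark's in-memory columnar format
    // (the InMemoryColumnarCache mentioned above)...
    df.cache()
    // ...but only after a full scan, and the cached copy is
    // read-only: no appending a column or replacing rows in place.
    df.count()  // forces materialization into the columnar cache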
