Re: merge-on-read?

Owen O'Malley Wed, 28 Nov 2018 10:14:29 -0800

I’m not sure what use case Erik is looking for, but I’ve had users that want to 
do the equivalent of HBase’s column families. They want some of the columns to 
be stored separately and then merged together on read. The requirements would be 
that there is a 1:1 mapping between rows in the matching files and stripes.

It would look like:

file1.orc: struct<name:string,email:string>
file2.orc: struct<lastAccess:timestamp>

It would let them leave the stable information untouched and only re-write the 
second column family when the information in that mutable column family changes. It 
would also support use cases where you add data enrichment columns after the 
data has been ingested. 
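
To make the idea concrete, here is a rough sketch in Python of what merging on read could look like. It is not ORC or Iceberg code: lists of dicts stand in for the rows of file1.orc and file2.orc, and the function name merge_column_families is made up for illustration. The only thing it relies on is the 1:1 row mapping described above.

```python
# Hypothetical sketch: plain lists of dicts stand in for the rows read from
# each column-family file. No ORC or Iceberg API is used here.

def merge_column_families(*families):
    """Zip rows positionally from files that share a 1:1 row mapping."""
    # The 1:1 mapping requirement: every file must hold the same number of rows.
    if len({len(rows) for rows in families}) > 1:
        raise ValueError("column families must have the same number of rows")

    merged = []
    for row_parts in zip(*families):
        out = {}
        for part in row_parts:
            # Without a replace operator, column families must not overlap.
            overlap = out.keys() & part.keys()
            if overlap:
                raise ValueError(f"duplicate columns: {sorted(overlap)}")
            out.update(part)
        merged.append(out)
    return merged


# Stand-in for file1.orc: struct<name:string,email:string>
file1 = [
    {"name": "Ann", "email": "ann@example.com"},
    {"name": "Bob", "email": "bob@example.com"},
]
# Stand-in for file2.orc: struct<lastAccess:timestamp>
file2 = [
    {"lastAccess": "2018-11-28T10:14:29"},
    {"lastAccess": "2018-11-27T12:26:00"},
]

rows = merge_column_families(file1, file2)
```

Rewriting lastAccess then only means producing a new file2; file1 is never touched.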

From there it is easy to imagine having a replace operator where file2’s 
version of a column replaces file1’s version. 
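
A replace operator could be sketched the same way. Again this is only an illustration in Python with made-up names (merge_with_replace, base, overlay), with lists of dicts standing in for the two files; it assumes the same 1:1 row mapping and simply lets the second file's columns win.

```python
# Hypothetical sketch of a replace operator: where both files carry a column,
# the overlay (file2) value replaces the base (file1) value for that row.

def merge_with_replace(base, overlay):
    """Merge rows positionally; overlay columns replace base columns."""
    if len(base) != len(overlay):
        raise ValueError("files must have the same number of rows")
    # In a dict literal, later keys win, so overlay values take precedence.
    return [{**b, **o} for b, o in zip(base, overlay)]


base = [
    {"name": "Ann", "email": "ann@old.example.com"},
    {"name": "Bob", "email": "bob@old.example.com"},
]
overlay = [
    {"email": "ann@new.example.com"},
    {"email": "bob@new.example.com"},
]

replaced = merge_with_replace(base, overlay)
```

Columns that appear only in the base file pass through unchanged, so the overlay only has to carry the columns it replaces.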

.. Owen

> On Nov 28, 2018, at 9:44 AM, Ryan Blue <rb...@netflix.com.INVALID> wrote:
> 
> What do you mean by merge on read?
> 
> A few people I've talked to are interested in building delete and upsert
> features. Those would create files that track the changes, which would be
> merged at read time to apply them. Is that what you mean?
> 
> rb
> 
> On Tue, Nov 27, 2018 at 12:26 PM Erik Wright
> <erik.wri...@shopify.com.invalid> wrote:
> 
>> Has any consideration been given to the possibility of eventual
>> merge-on-read support in the Iceberg table spec?
>> 
> 
> 
> -- 
> Ryan Blue
> Software Engineer
> Netflix
