Re: Parquet's new table format

Paul Rogers Fri, 08 Dec 2017 19:19:14 -0800

Very cool indeed.

Iceberg, if I understand the post, is a file container format: it identifies 
the set of files (in this case, Parquet files) that make up a “table.” Because 
the post mentions Hive, Iceberg presumably works with any file format; it is 
just a file container.

This would be a great way to solve our Parquet metadata problem: it allows us 
to identify the set of files that define Drill’s Parquet metadata, and to make 
changes to that metadata in a transactional way.

I wonder whether Iceberg could be augmented to hold both data and metadata in 
the same container. That way, Drill views and/or Parquet metadata could be 
managed as a unit with the files they describe.

Further, it seems that Iceberg might be extended to support MVCC by listing 
the files that make up each version (or, equivalently, by listing the deltas 
between versions).
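A minimal sketch of that MVCC idea, using a hypothetical in-memory model. The names `Delta` and `VersionedTable` are illustrative only; this is not Iceberg's actual metadata format or API. Each version is an immutable set of file paths, each commit records a delta of added and removed files, and replaying the deltas reproduces the same file list as the stored snapshot. The `expected_version` check is a simplified stand-in for the transactional metadata change described above (optimistic concurrency); a real system would need an atomic swap in shared storage rather than an in-process check.

```python
from dataclasses import dataclass
from typing import FrozenSet, List


@dataclass(frozen=True)
class Delta:
    """Files added and removed by one commit (hypothetical model)."""
    added: FrozenSet[str] = frozenset()
    removed: FrozenSet[str] = frozenset()


class VersionedTable:
    """Hypothetical table whose versions are immutable sets of file paths.

    Single-process sketch only: there is no locking or durable storage.
    """

    def __init__(self) -> None:
        # Version 0 is the empty table.
        self._snapshots: List[FrozenSet[str]] = [frozenset()]
        self._deltas: List[Delta] = []

    @property
    def current_version(self) -> int:
        return len(self._snapshots) - 1

    def commit(self, delta: Delta, expected_version: int) -> int:
        """Apply a delta; reject it if another writer committed first."""
        if expected_version != self.current_version:
            raise RuntimeError("stale version; retry against the new version")
        base = self._snapshots[-1]
        missing = delta.removed - base
        if missing:
            raise ValueError(f"cannot remove files not in table: {sorted(missing)}")
        # Store the full file list for the new version (the "list each version" form).
        self._snapshots.append((base - delta.removed) | delta.added)
        # Also keep the delta (the "list the deltas" form).
        self._deltas.append(delta)
        return self.current_version

    def files(self, version: int) -> FrozenSet[str]:
        """Readers pin a version and see exactly that version's files."""
        return self._snapshots[version]

    def files_from_deltas(self, version: int) -> FrozenSet[str]:
        """Rebuild the same file list by replaying deltas from the empty table."""
        result: FrozenSet[str] = frozenset()
        for delta in self._deltas[:version]:
            result = (result - delta.removed) | delta.added
        return result


# Usage: two commits, then read both versions.
table = VersionedTable()
v1 = table.commit(Delta(added=frozenset({"a.parquet", "b.parquet"})), expected_version=0)
v2 = table.commit(
    Delta(added=frozenset({"c.parquet"}), removed=frozenset({"a.parquet"})),
    expected_version=v1,
)
print(sorted(table.files(v1)), sorted(table.files(v2)))
```

A reader pinned to `v1` keeps seeing the old file list after `v2` commits, which is the MVCC property. Storing full lists makes reading any version a simple lookup at the cost of space; storing only deltas is more compact but requires a replay to reconstruct a version.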

This is definitely something to watch. Thanks, Parth, for bringing it to our 
attention (and thanks to Netflix for open sourcing the format).

- Paul

> On Dec 8, 2017, at 9:37 AM, Parth Chandra <[email protected]> wrote:
> 
> FYI
> 
> The Parquet community is working on introducing a new table format called
> 'Iceberg' [1] that has interesting and useful features.
> 
> Take a look at the initial post.
> 
> 
> [1]
> https://urldefense.proofpoint.com/v2/url?u=https-3A__lists.apache.org_thread.html_f90ac1c268dea4077e358df1df8dd48f3766db8d4db476c3e0d9baa8-40-253Cdev.parquet.apache.org-253E&d=DwIBaQ&c=cskdkSMqhcnjZxdQVpwTXg&r=Dz59a-Un_5n3KbQ2RYN0KA&m=PHcXxx7w-DDoW_UM90WJzI_LGQAhQAGqF1z-3z9xIRc&s=NkV7as3K7vfi7rObvLdSmeDv58FdehfcqxhfRqSBjTU&e=
