Very cool indeed. Iceberg, if I understand the post, is a file container format: it identifies the set of files (in this case, Parquet files) that make up a “table.” Because the post mentions Hive, Iceberg would presumably work with any file format; it is, after all, just a file container.
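To check my understanding of the "table is just a set of files" idea, here is a rough Python sketch. The names (`TableSnapshot`, `commit`, the `part-N.parquet` paths) are made up for illustration and are not Iceberg's actual format or API; the only assumption is that a table definition is an immutable list of files, and that a change produces a new list rather than editing the old one in place.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TableSnapshot:
    """Hypothetical table definition: a version number plus a set of file paths."""
    version: int
    files: frozenset

    def commit(self, added=(), removed=()):
        """Return a new snapshot; the old one is untouched and stays readable."""
        new_files = (self.files - frozenset(removed)) | frozenset(added)
        return TableSnapshot(self.version + 1, new_files)


# Illustrative file names only.
v1 = TableSnapshot(1, frozenset({"part-0.parquet", "part-1.parquet"}))
v2 = v1.commit(added=["part-2.parquet"], removed=["part-0.parquet"])
```

A reader that picked up `v1` keeps seeing the same two files even after `v2` exists, which is what would make the change look atomic from the reader's side.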

This would be a great way to solve our Parquet metadata problem: it allows us to identify the set of files that define Drill’s Parquet metadata, and to make changes to that metadata in a transactional way.

I wonder if Iceberg could be augmented to include both data and metadata in the same container? That way, Drill views and/or Parquet metadata could be managed as a unit with the files they describe. Further, it would seem that Iceberg might be extended to support MVCC by listing the files that make up each version (or, equivalently, by listing the deltas between versions).

This is definitely something to watch. Thanks, Parth, for bringing it to our attention (and thanks to Netflix for open sourcing the format).

- Paul

> On Dec 8, 2017, at 9:37 AM, Parth Chandra <[email protected]> wrote:
>
> FYI
>
> The Parquet is working on introducing new table format called 'Iceberg' [1]
> that has interesting and useful features.
>
> Take a look at the initial post.
>
>
> [1]
> https://urldefense.proofpoint.com/v2/url?u=https-3A__lists.apache.org_thread.html_f90ac1c268dea4077e358df1df8dd48f3766db8d4db476c3e0d9baa8-40-253Cdev.parquet.apache.org-253E&d=DwIBaQ&c=cskdkSMqhcnjZxdQVpwTXg&r=Dz59a-Un_5n3KbQ2RYN0KA&m=PHcXxx7w-DDoW_UM90WJzI_LGQAhQAGqF1z-3z9xIRc&s=NkV7as3K7vfi7rObvLdSmeDv58FdehfcqxhfRqSBjTU&e=
