Re: Parquet's new table format

Joel Pfaff Fri, 05 Jan 2018 00:50:52 -0800

Hello,

Iceberg's code was pushed to GitHub yesterday:
https://github.com/Netflix/iceberg


And a draft of the spec has been published here:
https://docs.google.com/document/d/1Q-zL5lSCle6NEEdyfiYsXYzX_Q8Qf0ctMyGBKslOswA/edit?usp=sharing

--
Joel


On Sat, Dec 9, 2017 at 4:18 AM, Paul Rogers <[email protected]> wrote:

> Very cool indeed.
>
> Iceberg, if I understand the post, is a file container format: it
> identifies the set of files (in this case, Parquet files) that make up a
> “table.” Since Iceberg mentions Hive, it presumably would work for any file
> format (since it is just a file container.)
>
> This would be a great way to solve our Parquet metadata problem: it allows
> us to identify the set of files that define Drill’s Parquet metadata, and
> to make changes to that metadata in a transactional way.
>
> I wonder if Iceberg could be augmented to include both data and metadata
> in the same container? That way, Drill views and/or Parquet metadata could
> be managed as a unit with the files they describe.
>
> Further, it would seem that Iceberg might be extended to support MVCC by
> listing the files that make up each version (or, equivalently, by listing
> the deltas between versions.)
>
> This is definitely something to watch. Thanks, Parth, for bringing it to
> our attention (and thanks to Netflix for open sourcing the format).
>
> - Paul
>
> > On Dec 8, 2017, at 9:37 AM, Parth Chandra <[email protected]> wrote:
> >
> > FYI
> >
> > The Parquet community is working on introducing a new table format called
> > 'Iceberg' [1] that has interesting and useful features.
> >
> > Take a look at the initial post.
> >
> >
> > [1]
> > https://urldefense.proofpoint.com/v2/url?u=https-3A__lists.apache.org_thread.html_f90ac1c268dea4077e358df1df8dd48f3766db8d4db476c3e0d9baa8-40-253Cdev.parquet.apache.org-253E&d=DwIBaQ&c=cskdkSMqhcnjZxdQVpwTXg&r=Dz59a-Un_5n3KbQ2RYN0KA&m=PHcXxx7w-DDoW_UM90WJzI_LGQAhQAGqF1z-3z9xIRc&s=NkV7as3K7vfi7rObvLdSmeDv58FdehfcqxhfRqSBjTU&e=
>
>
