+1 on the format! It looks great!

Thanks for materializing the initial design idea.

Miao
From: Kyle Bendickson <kjbendick...@gmail.com>
Date: Sunday, June 12, 2022 at 1:55 PM
To: dev@iceberg.apache.org <dev@iceberg.apache.org>
Subject: Re: [VOTE] Adopt Puffin format as a file format for statistics and 
indexes

+1 [non-binding]

Thank you Piotr for all of the work you’ve put into this.

This should greatly benefit not only Iceberg on Trino, but hopefully can be 
used in many novel ways due to its well thought out generic design and 
incorporation of the ability to extend with new sketches.

Looking forward to the improvements this will bring.

- Kyle

On Fri, Jun 10, 2022 at 1:47 PM Alexander Jo 
<alex...@starburstdata.com> wrote:
+1, let's do it!

On Fri, Jun 10, 2022 at 2:47 PM John Zhuge 
<jzh...@apache.org> wrote:
+1  Looking forward to the features it enables.

On Fri, Jun 10, 2022 at 10:11 AM Yufei Gu 
<flyrain...@gmail.com> wrote:
+1. Looking forward to the partition stats.
Best,

Yufei


On Thu, Jun 9, 2022 at 6:32 PM Daniel Weeks 
<dwe...@apache.org> wrote:
+1 as well.  Excited about the progress here.

-Dan
On Thu, Jun 9, 2022, 6:25 PM Junjie Chen 
<chenjunjied...@gmail.com> wrote:
+1, really nice! Indexes are coming!

On Fri, Jun 10, 2022 at 8:04 AM Szehon Ho 
<szehon.apa...@gmail.com> wrote:
+1, it's an exciting step for Iceberg, look forward to all the new statistics 
and secondary indices it will allow.

Had a few questions of what the reference to Puffin file(s) will be in the 
Iceberg spec, but it's orthogonal to Puffin file format itself.

Thanks,
Szehon

On Thu, Jun 9, 2022 at 3:32 PM Ryan Blue 
<b...@tabular.io> wrote:
+1 from me!

There may also be people that haven't followed the design discussions and we 
can start a DISCUSS thread if needed. But if everyone is comfortable with the 
design and implementation, I think it's ready for a vote as well.

Huge thanks to Piotr for getting this ready! I think the format is going to be 
really useful for both stats and indexes in Iceberg.

On Thu, Jun 9, 2022 at 3:35 AM Piotr Findeisen 
<pi...@starburstdata.com> wrote:
Hi Everyone,

I propose that we adopt Puffin file format as a file format for statistics and 
indexes in Iceberg tables.

Puffin file format specification:
https://github.com/apache/iceberg/blob/master/format/puffin-spec.md
(previous discussions: https://github.com/apache/iceberg/pull/4944,
https://github.com/apache/iceberg-docs/pull/69)

Intended use:
* statistics in Iceberg tables (see https://github.com/apache/iceberg/pull/4945
  and associated proposed implementation https://github.com/apache/iceberg/pull/4741)
* in the future: storage for secondary indexes

Puffin file reader and writer implementation:
https://github.com/apache/iceberg/pull/4537

Thanks,
PF



--
Ryan Blue
Tabular


--
Best Regards


--
John Zhuge
