It works for me. At first thought, I see a few concerns about storing index data in a consolidated fashion:
1) Maintaining consolidated storage may be somewhat more complex.
2) It may make collecting the index while writing a data file (i.e., online index building) more complex; for example, we would need to handle multiple writers writing to the same consolidated index file in parallel.
3) We would need some auxiliary structure in the index file to quickly locate the relevant index given a key (e.g., a data file name).

That said, I do think consolidated storage is a meaningful optimization of on-disk storage. If we properly design a splittable and mergeable index file format, the consolidated approach and one index file per data file (a 1:1 index file) are not mutually exclusive. In that case, the 1:1 index file can be the building block for larger consolidated index files and for indexes at other levels, such as a partition-level index.

A member of our team went through one pass of the design and shared some thoughts with me. I will complete my own pass.

Thanks!
Miao

From: Ryan Blue <rb...@netflix.com.INVALID>
Date: Wednesday, March 3, 2021 at 6:08 PM
To: OpenInx <open...@gmail.com>
Cc: Iceberg Dev List <dev@iceberg.apache.org>
Subject: Re: Secondary Indexes - Pluggable File Filter interface for Apache Iceberg

Great, thank you for planning to join! I definitely want to get your input on this as well.

On Wed, Mar 3, 2021 at 6:06 PM OpenInx <open...@gmail.com> wrote:

It will be 1:00 AM (China Standard Time) on 18 March, which works for those of us in Asia. I'd love to attend this discussion. Thanks.

On Thu, Mar 4, 2021 at 9:50 AM Ryan Blue <rb...@netflix.com.invalid> wrote:

Thanks for putting this together, Guy! I just did a pass over the doc, and it looks like a really reasonable proposal for injecting custom file filter implementations.

One of the main things we need to think about is how to store and track the index data. There's a comment in the doc about storing them in a "consolidated fashion", and I'd like to hear more about what you're thinking there.
The index-per-file approach that Adobe is working on is a good way to track index data: each index is tied to an immutable data file, which gives the index data a clear lifecycle. On the other hand, the drawback is that we end up with a lot of index files, one per data file.

Let's set up a time to talk through the options. Would 9AM PST (17:00 UTC) on 17 March work for everyone? I'm suggesting the morning so that everyone from IBM can attend. We can also hold a second discussion later, at a time that works better for people in Asia. If that day works, I'll send out an invite.

On Fri, Feb 19, 2021 at 8:49 AM Guy Khazma <guyk...@gmail.com> wrote:

Hi All,

Following up on our discussion at Wednesday's sync, attached is a proposal to enhance Iceberg with a pluggable interface for data skipping indexes, enabling the use of existing indexes in job planning.

https://docs.google.com/document/d/11o3T7XQVITY_5F9Vbri9lF9oJjDZKjHIso7K8tEaFfY/edit?usp=sharing

We will be glad to get your feedback.

Thanks,
Guy

--
Ryan Blue
Software Engineer
Netflix
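The storage trade-off raised in this thread (1:1 index files used as building blocks that can be merged into a consolidated file, plus an auxiliary structure for locating an index by data file name) can be illustrated with a small sketch. The layout below is hypothetical and is not part of the Iceberg spec or the linked proposal: per-file index blobs are concatenated, followed by a JSON footer that maps each data file name to an (offset, length) pair, followed by an 8-byte footer length. The function names `merge`, `lookup`, and `split` are illustrative only.

```python
import json
import struct
from typing import Dict, List, Optional

# Hypothetical on-disk layout (NOT an Iceberg spec; illustration only):
#   [blob_0][blob_1]...[blob_n-1][footer JSON][8-byte big-endian footer length]
# The footer maps data file name -> [offset, length]. It plays the role of the
# auxiliary lookup structure, so one index can be read without a full scan.

FOOTER_LEN_BYTES = 8


def merge(per_file_indexes: Dict[str, bytes]) -> bytes:
    """Consolidate 1:1 index blobs (keyed by data file name) into one file."""
    body = bytearray()
    footer: Dict[str, List[int]] = {}
    for data_file in sorted(per_file_indexes):
        blob = per_file_indexes[data_file]
        footer[data_file] = [len(body), len(blob)]  # offset of this blob in body
        body += blob
    footer_bytes = json.dumps(footer).encode("utf-8")
    return bytes(body) + footer_bytes + struct.pack(">Q", len(footer_bytes))


def read_footer(consolidated: bytes) -> Dict[str, List[int]]:
    """Read the trailing footer-length field, then decode the JSON footer."""
    (footer_len,) = struct.unpack(">Q", consolidated[-FOOTER_LEN_BYTES:])
    start = len(consolidated) - FOOTER_LEN_BYTES - footer_len
    return json.loads(consolidated[start:-FOOTER_LEN_BYTES].decode("utf-8"))


def lookup(consolidated: bytes, data_file: str) -> Optional[bytes]:
    """Return the index blob for one data file, or None if it is absent."""
    entry = read_footer(consolidated).get(data_file)
    if entry is None:
        return None
    offset, length = entry
    return consolidated[offset:offset + length]


def split(consolidated: bytes) -> Dict[str, bytes]:
    """Recover the original 1:1 index blobs from a consolidated file."""
    return {
        name: consolidated[offset:offset + length]
        for name, (offset, length) in read_footer(consolidated).items()
    }


if __name__ == "__main__":
    files = {"data-00001.parquet": b"bloom-A", "data-00002.parquet": b"minmax-B"}
    consolidated = merge(files)
    print(lookup(consolidated, "data-00002.parquet"))  # b'minmax-B'
    print(split(consolidated) == files)  # True
```

Because `split` recovers the original 1:1 blobs, two consolidated files can be combined with `merge({**split(a), **split(b)})`. The sketch deliberately leaves out the concurrency concern from point 2: multiple writers targeting the same consolidated file would still need coordination, and one option consistent with the building-block idea is to write 1:1 files first and consolidate them in a later step.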