> I am not sure examples need to be "official" -- I suspect people would be interested in public open source examples of various types of indexes that they could adapt to their own needs.
I mean, considering the problems above, it may be hard to define "official" indexes. Even though Parquet already has "BlockSplitBloomFilter", indexes evolve fast and it might be hard to make any of them a standard (but maybe someone interested in this can give it a try).

Best,
Xuwei Fu

Andrew Lamb <[email protected]> wrote on Thu, Jul 17, 2025 at 19:18:

> > 1. Parquet file format seems have index page [1], but I don't know who's
>
> The INDEX_PAGE type is a fascinating point -- I am not sure what benefit
> writing indexes using that annotation would be 🤔
>
> > Currently I don't know whether we can have some "official" sample index.
>
> I am not sure examples need to be "official" -- I suspect people would be
> interested in public open source examples of various types of indexes that
> they could adapt to their own needs.
>
> Andrew
>
> On Wed, Jul 16, 2025 at 7:16 AM wish maple <[email protected]> wrote:
>
> > Seems good. Personally I think
> >
> > 1. Parquet file format seems to have an index page [1], but I don't know
> > who's using it.
> > 2. Currently, Parquet only has single-column bloom filters and the column
> > index. Maybe some kind of multi-column or other filter might work.
> > 3. Indexes can have different "levels": the Page Index is designed for
> > "Page", and bloom filters / statistics for RowGroup. We can even define
> > an index for "file".
> >
> > Currently I don't know whether we can have some "official" sample index.
> > Personally I might be interested in some "sketches".
> >
> > Best,
> > Xuwei Fu
> >
> > [1]
> > https://github.com/apache/parquet-format/blob/master/src/main/thrift/parquet.thrift#L655
> >
> > Andrew Lamb <[email protected]> wrote on Wed, Jul 16, 2025 at 19:08:
> >
> > > I wrote a blog with Qi Zhu, Jigao Luo explaining how to embed user
> > > defined indexes into Parquet files without needing any changes to the
> > > format[1].
> > >
> > > I am sorry for the somewhat shameless self promotion, but I think this
> > > topic may be of general interest to the community in the context of
> > > other extensions to the format we have discussed recently. Techniques
> > > such as this widen potential use cases of Parquet without any need for
> > > consensus or timeline for ecosystem adoption.
> > >
> > > Andrew
> > >
> > > [1]:
> > > https://datafusion.apache.org/blog/2025/07/14/user-defined-parquet-indexes/
