> I am not sure examples need to be "official" -- I suspect people would be
> interested in public open source examples of various types of indexes that
> they could adapt to their own needs.

I mean, considering the problems above, it may be hard to define any
"official" indexes. Even though Parquet already has "BlockSplitBloomFilter",
index techniques evolve fast and it might be hard to "make them a standard"
(but maybe someone interested in this could give it a try).

Best,
Xuwei Fu

Andrew Lamb <[email protected]> wrote on Thu, Jul 17, 2025 at 19:18:

> > 1. The Parquet file format seems to have an index page [1], but I don't know who's
>
> The INDEX_PAGE type is a fascinating point -- I am not sure what benefit
> writing indexes using that annotation would bring 🤔
>
> > Currently I don't know whether we can have some "official" sample index.
>
> I am not sure examples need to be "official" -- I suspect people would be
> interested in public open source examples of various types of indexes that
> they could adapt to their own needs.
>
> Andrew
>
> On Wed, Jul 16, 2025 at 7:16 AM wish maple <[email protected]> wrote:
>
> > Seems good. Personally I think:
> >
> > 1. The Parquet file format seems to have an index page [1], but I don't
> >    know who's using it.
> > 2. Currently, Parquet only has single-column bloom filters and the column
> >    index. Maybe some kind of multi-column or other filter might work.
> > 3. Indexes can have different "levels": the Page Index is designed for
> >    pages, and bloom filters / statistics for row groups. We could even
> >    define an index for the whole file.
> >
> > Currently I don't know whether we can have some "official" sample index.
> > Personally I might be interested in some "sketches".
> >
> > Best,
> > Xuwei Fu
> >
> > [1] https://github.com/apache/parquet-format/blob/master/src/main/thrift/parquet.thrift#L655
> >
> > Andrew Lamb <[email protected]> wrote on Wed, Jul 16, 2025 at 19:08:
> >
> > > I wrote a blog post with Qi Zhu and Jigao Luo explaining how to embed
> > > user-defined indexes into Parquet files without needing any changes to
> > > the format [1].
> > >
> > > I am sorry for the somewhat shameless self-promotion, but I think this
> > > topic may be of general interest to the community in the context of
> > > other extensions to the format we have discussed recently. Techniques
> > > such as this widen the potential use cases of Parquet without requiring
> > > community consensus or waiting on a timeline for ecosystem adoption.
> > >
> > > Andrew
> > >
> > > [1]: https://datafusion.apache.org/blog/2025/07/14/user-defined-parquet-indexes/
> > >
> >
>
