This is a big improvement, but I don't think it helps low-cardinality fields,
because the index works at the file level. For a low-cardinality field (e.g.
gender, which only has the values male and female), in most cases (when the
field is not sorted) every value is present in every file, so the index cannot
skip any files.
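A minimal sketch of this point. It assumes a hypothetical file-level index that just records the distinct values of one column per file (a stand-in for a Bloom filter or similar, not Paimon's actual index API). With unsorted data, a lookup for "female" must still scan every file; only after sorting can files be skipped.

```python
from typing import Dict, List, Set


def build_file_index(files: Dict[str, List[str]]) -> Dict[str, Set[str]]:
    """Hypothetical file-level index: file name -> distinct column values."""
    return {name: set(rows) for name, rows in files.items()}


def files_to_scan(index: Dict[str, Set[str]], value: str) -> List[str]:
    """Files that may contain `value`; all other files can be skipped."""
    return sorted(name for name, values in index.items() if value in values)


# Unsorted low-cardinality column: every file holds both values.
unsorted_files = {
    "file-0": ["male", "female", "male"],
    "file-1": ["female", "male", "female"],
    "file-2": ["male", "male", "female"],
}
# The same rows sorted by gender: most files hold a single value.
sorted_files = {
    "file-0": ["female", "female", "female"],
    "file-1": ["female", "male", "male"],
    "file-2": ["male", "male", "male"],
}

print(files_to_scan(build_file_index(unsorted_files), "female"))  # ['file-0', 'file-1', 'file-2']
print(files_to_scan(build_file_index(sorted_files), "female"))    # ['file-0', 'file-1']
```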

For specific business needs, we want JSON indexes, bitmap indexes, inverted
indexes, etc., to adapt to different query conditions. So we also need a
priority mechanism: use different indexes for different query filters, then
combine the results according to the AND/OR structure of the actual filter
criteria.
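A rough sketch of the idea, under stated assumptions: each column has a hypothetical list of indexes in priority order (e.g. bitmap, inverted), the first one is used, and each index returns a set of matching row ids standing in for a bitmap. Results are combined by intersection for AND and union for OR; a column with no index cannot prune anything. None of these names come from Paimon.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Optional, Set, Tuple, Union

RowIds = Set[int]
IndexLookup = Callable[[str], RowIds]
# Hypothetical: for each column, the available indexes in priority order.
Indexes = Dict[str, List[Tuple[str, IndexLookup]]]


@dataclass
class Leaf:  # predicate: column = value
    column: str
    value: str


@dataclass
class And:
    children: List["Node"]


@dataclass
class Or:
    children: List["Node"]


Node = Union[Leaf, And, Or]


def pick_index(column: str, indexes: Indexes) -> Optional[IndexLookup]:
    """Use the highest-priority index registered for the column, if any."""
    candidates = indexes.get(column, [])
    return candidates[0][1] if candidates else None


def evaluate(node: Node, indexes: Indexes, all_rows: RowIds) -> RowIds:
    """Candidate row ids for a filter tree: AND -> intersection, OR -> union."""
    if isinstance(node, Leaf):
        lookup = pick_index(node.column, indexes)
        # No index for this column: nothing can be pruned.
        return set(lookup(node.value)) if lookup else set(all_rows)
    results = [evaluate(child, indexes, all_rows) for child in node.children]
    if isinstance(node, And):
        return set.intersection(*results)
    return set.union(*results)


bitmap_gender = {"male": {0, 2, 4}, "female": {1, 3}}
inverted_city = {"beijing": {0, 1}, "shanghai": {2, 3, 4}}
indexes: Indexes = {
    "gender": [("bitmap", lambda v: bitmap_gender.get(v, set()))],
    "city": [("inverted", lambda v: inverted_city.get(v, set()))],
}
rows = {0, 1, 2, 3, 4}

# gender = 'female' AND (city = 'beijing' OR city = 'hangzhou')
query = And([Leaf("gender", "female"),
             Or([Leaf("city", "beijing"), Leaf("city", "hangzhou")])])
print(sorted(evaluate(query, indexes, rows)))  # [1]
```

The rows that survive this step are only candidates; the query engine would still apply the full filter when reading them.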

________________________________
From: yu zelin <[email protected]>
Sent: March 15, 2024 14:43
To: [email protected] <[email protected]>
Subject: Re: [DISCUSS] PIP-17: Introduce secondary column index

An exciting feature, +1.

Best Regards,
Zelin Yu

On Thu, Mar 14, 2024 at 5:53 PM yejunhao <[email protected]> wrote:

> Hi, Paimon Devs, I’d like to start a discussion about PIP-17[1].
>
> Up to now, Paimon uses z-order, order, and Hilbert sort compaction to speed up
> queries. After sort compaction, files are sorted by the specified columns.
> But in some situations this does not work well. For example, we may have tens
> of columns that can appear in filters: sometimes all of them come up together,
> sometimes just a few of them. Z-order or order compaction can't handle this
> situation, because sorting on too many columns reduces the effect of sorting.
> So if the cardinality of these columns is small, we can use a Bloom filter or
> other indexes to speed up queries. That's why this PIP comes up: I want to
> introduce an index framework that gives Paimon a flexible index system.
>
> Looking forward to your questions and suggestions.
>
> Best, junhao
>
> [1]
> https://cwiki.apache.org/confluence/display/PAIMON/PIP-17%3A+Introduce+secondary+column+index
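For reference, a minimal sketch of the Bloom-filter idea mentioned in the proposal, assuming one hypothetical filter per data file built over a single column (this is not Paimon's index format or API). A negative answer means the value is definitely absent and the file can be skipped; a positive answer may be a false positive, so the file is still read.

```python
import hashlib
from typing import Dict, Iterable, List


class BloomFilter:
    """Small Bloom filter: k hash positions over an m-bit array."""

    def __init__(self, num_bits: int = 1024, num_hashes: int = 3) -> None:
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.bits = 0  # a Python int used as the bit array

    def _positions(self, value: str) -> List[int]:
        # Derive k positions from one SHA-256 digest via double hashing (h1 + i*h2).
        digest = hashlib.sha256(value.encode("utf-8")).digest()
        h1 = int.from_bytes(digest[:8], "big")
        h2 = int.from_bytes(digest[8:16], "big") | 1
        return [(h1 + i * h2) % self.num_bits for i in range(self.num_hashes)]

    def add(self, value: str) -> None:
        for pos in self._positions(value):
            self.bits |= 1 << pos

    def might_contain(self, value: str) -> bool:
        # False: definitely absent. True: possibly present (may be a false positive).
        return all((self.bits >> pos) & 1 for pos in self._positions(value))


def build_indexes(files: Dict[str, Iterable[str]]) -> Dict[str, BloomFilter]:
    """Build one filter per file over that file's column values."""
    indexes = {}
    for name, values in files.items():
        bf = BloomFilter()
        for v in values:
            bf.add(v)
        indexes[name] = bf
    return indexes


def prune(indexes: Dict[str, BloomFilter], value: str) -> List[str]:
    """Files that must still be scanned for `column = value`."""
    return sorted(name for name, bf in indexes.items() if bf.might_contain(value))


files = {
    "file-0": ["user-1", "user-2"],
    "file-1": ["user-3", "user-4"],
    "file-2": ["user-5", "user-6"],
}
indexes = build_indexes(files)
print(prune(indexes, "user-3"))
```

Because Bloom filters have no false negatives, the file that actually holds the value is always kept; how many other files are skipped depends on the filter size and the number of distinct values per file.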
