Re: [DISCUSS] PIP-17: Introduce secondary column index

Aitozi Tue, 19 Mar 2024 08:10:54 -0700

Hi, junhao

    It's nice to see the secondary index feature in Paimon. After reading the
PIP, I have several questions.


(1) For the primary key table, we only push down filters on the primary
key, because we cannot filter on a value that may still need to be merged
with data from other levels. So will the primary key table benefit from the
secondary column index? Or is the main improvement for the append table?
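
To make the concern in (1) concrete, here is a minimal sketch. It assumes a
simplified model in which each level maps a primary key to a (sequence
number, value) pair and the newest sequence number wins on merge. The level
layout and function names are hypothetical, not Paimon's actual reader.

```python
# Hypothetical model: each level maps primary key -> (sequence number, value).
# This only illustrates the merge concern; it is not Paimon's file format.
level0 = {1: (2, "b")}                 # newer write: key 1 now has value "b"
level1 = {1: (1, "a"), 2: (1, "a")}    # older data for keys 1 and 2


def merge(levels):
    """Keep the record with the highest sequence number for each key."""
    merged = {}
    for level in levels:
        for key, (seq, value) in level.items():
            if key not in merged or seq > merged[key][0]:
                merged[key] = (seq, value)
    return {key: value for key, (_, value) in merged.items()}


def merge_then_filter(levels, predicate):
    """Safe order: resolve the latest value per key, then apply the filter."""
    return {k: v for k, v in merge(levels).items() if predicate(v)}


def filter_then_merge(levels, predicate):
    """Unsafe order: rows are dropped per level before the merge sees them."""
    filtered = [
        {k: sv for k, sv in level.items() if predicate(sv[1])}
        for level in levels
    ]
    return merge(filtered)


def is_a(value):
    return value == "a"


correct = merge_then_filter([level0, level1], is_a)
wrong = filter_then_merge([level0, level1], is_a)
```

In this sketch, filtering first hides the newer record for key 1, so the
stale value "a" from the older level reappears in the result.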

(2) The storage of the index file is "one file for one datafile of one index
type". Will this produce too many extra files? Would a single index type
double the number of files?

(3) "While drop column index, for example, I have indexed column a and b, I
don't want to index a anymore. I just need to drop the target index bytes
from index file, and don't have to read the data file again."

Do you mean we will have to rewrite the index file when dropping one
column's index from it?
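
As a sketch of what I imagine for (3): assume one index file stores the
index bytes of several columns behind a small header of offsets. This layout
(4-byte header length, JSON header, concatenated bodies) and all function
names are my own assumption for illustration, not the format proposed in the
PIP. Under that assumption, dropping a column rewrites the index file only.

```python
import json
import struct


def write_index_file(column_indexes: dict[str, bytes]) -> bytes:
    """Layout: 4-byte header length, JSON {column: [offset, length]}, bodies."""
    header = {}
    body = b""
    for column, data in column_indexes.items():
        header[column] = [len(body), len(data)]
        body += data
    header_bytes = json.dumps(header).encode("utf-8")
    return struct.pack(">I", len(header_bytes)) + header_bytes + body


def read_index_file(blob: bytes) -> dict[str, bytes]:
    """Parse the header and slice each column's index bytes out of the body."""
    (header_len,) = struct.unpack(">I", blob[:4])
    header = json.loads(blob[4:4 + header_len].decode("utf-8"))
    body = blob[4 + header_len:]
    return {col: body[off:off + length] for col, (off, length) in header.items()}


def drop_column_index(blob: bytes, column: str) -> bytes:
    """Rewrite only the index file; the data file is never opened."""
    remaining = {c: d for c, d in read_index_file(blob).items() if c != column}
    return write_index_file(remaining)


original = write_index_file({"a": b"\x01\x02\x03", "b": b"\x0a\x0b"})
rewritten = drop_column_index(original, "a")
```

So the question is whether the drop is this kind of index-file rewrite, or
whether the bytes can be removed in some cheaper way.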

Best,
Aitozi

JUNHAO YE <[email protected]> wrote on Tue, Mar 19, 2024 at 19:26:

> Hi, Zhang YiLong
>
> You are right, as I mentioned in PIP-17. We should assign priorities to
> the different index types, and we should consider combining the results
> of different index types.
>
> Best, junhao.
>
>
> > On Mar 18, 2024, at 10:49 AM, Zhang YiLong <[email protected]> wrote:
> >
> > This is a big improvement, but I don't think it helps low-cardinality
> > fields, because the index is at the file level, and a low-cardinality
> > field (e.g. gender, with only male and female) is in most cases (when
> > the field is not sorted) present in all files.
> >
> > For specific business needs, we want a JSON index, bitmap index, reverse
> > index, etc. to adapt to different query conditions. So we also need a
> > priority, using different indexes for different query filters and finally
> > combining the results (based on the and/or in the actual filter criteria).
> >
> > ________________________________
> > From: yu zelin <[email protected]>
> > Sent: March 15, 2024 14:43
> > To: [email protected] <[email protected]>
> > Subject: Re: [DISCUSS] PIP-17: Introduce secondary column index
> >
> > An exciting feature, +1.
> >
> > Best Regards,
> > Zelin Yu
> >
> > On Thu, Mar 14, 2024 at 5:53 PM yejunhao <[email protected]> wrote:
> >
> >> Hi, Paimon Devs, I’d like to start a discussion about PIP-17[1].
> >>
> >> Up to now, Paimon has used zorder, order, and hilbert sort compaction to
> >> speed up queries. After sort compaction, files are sorted by the
> >> specified columns. But in some situations we have, for example, tens of
> >> columns that may appear in filters: sometimes all of them come up
> >> together, sometimes just a few of them. Zorder or order compaction can't
> >> handle this situation, because too many columns reduce the effect of
> >> sorting. So if the cardinality of these columns is small, we can use a
> >> bloom filter or other indexes to speed up queries. That's why this PIP
> >> comes up. I want to introduce an index framework that gives Paimon a
> >> flexible index system.
> >>
> >> Looking forward to your questions and suggestions.
> >>
> >> Best, junhao
> >>
> >> [1] https://cwiki.apache.org/confluence/display/PAIMON/PIP-17%3A+Introduce+secondary+column+index
>
>
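
For reference, the two points Zhang YiLong raises in the quoted thread
(file-level indexes do little for low-cardinality fields, and results from
several indexes need to be combined by and/or) can be sketched as follows.
This assumes a hypothetical per-file index that records the set of values
present for each column. A real bloom filter may also return false
positives; exact sets just keep the sketch simple. All names are
illustrative.

```python
# Hypothetical per-file index: column -> set of values present in that file.
file_indexes = {
    "data-0": {"gender": {"male", "female"}, "user_id": {1, 2, 3}},
    "data-1": {"gender": {"male", "female"}, "user_id": {4, 5, 6}},
    "data-2": {"gender": {"male", "female"}, "user_id": {7, 8, 9}},
}


def may_contain(index, column, value):
    """Return True if the file might hold a row with column == value."""
    return value in index[column]


def files_to_read(indexes, predicate):
    """Keep only the files whose index cannot rule the predicate out."""
    return sorted(name for name, index in indexes.items() if predicate(index))


# Low-cardinality column: every file contains "male", so nothing is skipped.
by_gender = files_to_read(
    file_indexes, lambda i: may_contain(i, "gender", "male")
)

# Higher-cardinality column: only one file can match.
by_user = files_to_read(file_indexes, lambda i: may_contain(i, "user_id", 5))

# AND of two index results: a file must pass both checks.
both = files_to_read(
    file_indexes,
    lambda i: may_contain(i, "gender", "male") and may_contain(i, "user_id", 5),
)

# OR of two index results: a file is read if either check passes.
either = files_to_read(
    file_indexes,
    lambda i: may_contain(i, "gender", "other") or may_contain(i, "user_id", 8),
)
```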
