Hi Jacky,

If we create the bloom filter at blocklet level, it may end up too similar to the bloom datamap and face the same problems the bloom datamap faces, except that the pruning runs on the executor side. Page level is preferred because the page size is KNOWN, which frees us from deciding how many bits the bloom filter's bitmap needs; only the FPP has to be set.

I checked, and the problem you mentioned does exist. It is also a problem when pruning pages by page min/max: even when min/max indicates that a page does not need to be scanned, the current query logic has already loaded both the DataChunk3 and the column pages, so the IO for the column pages is wasted. Should we change this first? Is it worth separating one IO operation into two?
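To illustrate why a known page size simplifies things: with the row count per page fixed, the bitmap size falls out of the standard bloom filter formulas once FPP is chosen. A minimal sketch (class and method names are illustrative, not CarbonData APIs; 32,000 rows is assumed as the default page size):

```java
// Sketch: sizing a per-page bloom filter when the page row count is known.
// Only the false-positive probability (FPP) remains as a tunable.
public class PageBloomSizing {

    // Optimal bit count for n elements at false-positive probability p:
    // m = -n * ln(p) / (ln 2)^2
    static long optimalNumOfBits(long n, double p) {
        return (long) Math.ceil(-n * Math.log(p) / (Math.log(2) * Math.log(2)));
    }

    // Optimal number of hash functions: k = (m / n) * ln 2
    static int optimalNumOfHashFunctions(long n, long m) {
        return Math.max(1, (int) Math.round((double) m / n * Math.log(2)));
    }

    public static void main(String[] args) {
        long pageSize = 32_000;  // rows per page, known at write time (assumed default)
        double fpp = 0.01;       // the single knob the user still sets
        long bits = optimalNumOfBits(pageSize, fpp);
        int hashes = optimalNumOfHashFunctions(pageSize, bits);
        System.out.println(bits + " bits (~" + bits / 8 / 1024
                + " KB per page), " + hashes + " hash functions");
    }
}
```

At FPP = 0.01 this comes to roughly 37 KB and 7 hash functions per page, which gives a concrete sense of the per-page storage overhead being discussed.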
Anyone interested in this part is welcome to share your ideas as well.

Thanks,
Manhua

On 2019/11/04 09:15:35, Jacky Li <jacky.li...@qq.com> wrote:
> Hi Manhua,
>
> +1 for this feature.
>
> One question:
> Since one column chunk in one blocklet is carbon's minimum IO unit, why not
> create bloom filter in blocklet level? If it is page level, we still need to
> read page data into memory, the saving is only for decompression.
>
> Regards,
> Jacky
>
> --
> Sent from:
> http://apache-carbondata-dev-mailing-list-archive.1130556.n5.nabble.com/