Hi Jacky,

If we create the bloom filter at blocklet level, it may end up too similar to the bloom datamap and face the same problems the bloom datamap faces, except that the pruning runs on the executor side. Page level is preferred because the page size is KNOWN, which frees us from deciding how many bits the bloom filter's bitmap needs; only the FPP has to be set.

I checked, and the problem you mentioned does exist. It is also a problem when pruning pages by page min/max: even when min/max indicates that a page does not need to be scanned, the current query logic has already loaded both the DataChunk3 and the column pages, so the IO for the column pages is wasted. Should we change this first? Is it worth separating one IO operation into two?
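To illustrate why a known page size simplifies things: with the row count per page fixed, the bitmap size falls out of the standard bloom filter formulas once FPP is chosen. A minimal sketch (class and method names are illustrative, not CarbonData APIs; 32,000 rows is assumed as the default page size):

```java
// Sketch: sizing a per-page bloom filter when the page row count is known.
// Only the false-positive probability (FPP) remains as a tunable.
public class PageBloomSizing {

    // Optimal bit count for n elements at false-positive probability p:
    // m = -n * ln(p) / (ln 2)^2
    static long optimalNumOfBits(long n, double p) {
        return (long) Math.ceil(-n * Math.log(p) / (Math.log(2) * Math.log(2)));
    }

    // Optimal number of hash functions: k = (m / n) * ln 2
    static int optimalNumOfHashFunctions(long n, long m) {
        return Math.max(1, (int) Math.round((double) m / n * Math.log(2)));
    }

    public static void main(String[] args) {
        long pageSize = 32_000;  // rows per page, known at write time (assumed default)
        double fpp = 0.01;       // the single knob the user still sets
        long bits = optimalNumOfBits(pageSize, fpp);
        int hashes = optimalNumOfHashFunctions(pageSize, bits);
        System.out.println(bits + " bits (~" + bits / 8 / 1024
                + " KB per page), " + hashes + " hash functions");
    }
}
```

At FPP = 0.01 this comes to roughly 37 KB and 7 hash functions per page, which gives a concrete sense of the per-page storage overhead being discussed.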
Anyone interested in this part is welcome to share your ideas as well.

Thanks,
Manhua

On 2019/11/04 09:15:35, Jacky Li <jacky.li...@qq.com> wrote:
> Hi Manhua,
>
> +1 for this feature.
>
> One question:
> Since one column chunk in one blocklet is carbon's minimum IO unit, why not
> create bloom filter in blocklet level? If it is page level, we still need to
> read page data into memory, the saving is only for decompression.
>
> Regards,
> Jacky
>
> --
> Sent from:
> http://apache-carbondata-dev-mailing-list-archive.1130556.n5.nabble.com/