[ 
https://issues.apache.org/jira/browse/KUDU-2854?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16864309#comment-16864309
 ] 

Todd Lipcon commented on KUDU-2854:
-----------------------------------

I think we should also consider fast-pathing equality checks for dictionary 
predicates. Currently, we evaluate the predicate against the dictionary and 
come up with a bitmap of matching values. Then, for each codeword, we test the 
corresponding bit in the bitmap. That bitmap testing likely requires a few 
cycles and a branch, and can't readily be done with SIMD outside of AVX512 
gather instructions.

In the case that we see that exactly one dictionary value matches the 
predicate, we can transform it into an equality predicate on the codewords, and 
then use the SIMD-optimized equality code path.
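
To make the idea concrete, here is a rough sketch of the two paths. It is not Kudu's actual code: every name in it (FindSingleMatchingCodeword, SelectByBitmap, SelectByEquality, EvaluateDictPredicate) is hypothetical, the dictionary match bitmap is modeled as a std::vector<bool>, and the "SIMD-optimized" path is stood in for by a plain branch-free loop that a compiler can often auto-vectorize.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical helper (not a Kudu API): returns true and sets *codeword if
// exactly one dictionary entry matched the predicate.
bool FindSingleMatchingCodeword(const std::vector<bool>& dict_matches,
                                uint32_t* codeword) {
  bool found = false;
  for (size_t i = 0; i < dict_matches.size(); ++i) {
    if (!dict_matches[i]) continue;
    if (found) return false;  // second match: cannot reduce to equality
    found = true;
    *codeword = static_cast<uint32_t>(i);
  }
  return found;
}

// General path: one bitmap lookup per codeword.
std::vector<uint8_t> SelectByBitmap(const std::vector<uint32_t>& codewords,
                                    const std::vector<bool>& dict_matches) {
  std::vector<uint8_t> sel(codewords.size());
  for (size_t i = 0; i < codewords.size(); ++i) {
    assert(codewords[i] < dict_matches.size());
    sel[i] = dict_matches[codewords[i]] ? 1 : 0;
  }
  return sel;
}

// Fast path: a plain equality compare against a single codeword.
std::vector<uint8_t> SelectByEquality(const std::vector<uint32_t>& codewords,
                                      uint32_t target) {
  std::vector<uint8_t> sel(codewords.size());
  for (size_t i = 0; i < codewords.size(); ++i) {
    sel[i] = (codewords[i] == target) ? 1 : 0;
  }
  return sel;
}

// Picks the equality path when exactly one dictionary value matches,
// otherwise falls back to the bitmap path.
std::vector<uint8_t> EvaluateDictPredicate(
    const std::vector<uint32_t>& codewords,
    const std::vector<bool>& dict_matches) {
  uint32_t target = 0;
  if (FindSingleMatchingCodeword(dict_matches, &target)) {
    return SelectByEquality(codewords, target);
  }
  return SelectByBitmap(codewords, dict_matches);
}
```

Both paths produce the same selection vector, so the transformation only changes how the per-codeword work is done, not the result.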

I don't have perf numbers on hand, but I often query datasets using 
equality predicates on dictionary-coded columns.

> Short circuit predicates on dictionary-coded columns
> ----------------------------------------------------
>
>                 Key: KUDU-2854
>                 URL: https://issues.apache.org/jira/browse/KUDU-2854
>             Project: Kudu
>          Issue Type: Improvement
>          Components: cfile, perf, tserver
>            Reporter: Todd Lipcon
>            Priority: Major
>
> In the common case that a column has no updates in a given DRS (DiskRowSet), 
> if we see that no entries in the dictionary match the predicate, we can 
> short circuit at a few layers:
> - we can store a flag in the cfile footer that indicates that all blocks are 
> dict-coded (i.e. there are no fallbacks). In that case, we can skip the whole 
> rowset.
> - if a cfile is partially dict-encoded, we can skip any dict-coded blocks 
> without decoding the dictionary words
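
The decision described in the issue could be sketched roughly as follows. This is only an illustration under stated assumptions: the enum, the function name, and the three boolean inputs (including the proposed footer flag) are hypothetical and do not correspond to existing Kudu code.

```cpp
// Hypothetical outcome of the short-circuit check for one column in one rowset.
enum class DictSkipDecision {
  kNoSkip,              // evaluate the predicate normally
  kSkipDictBlocksOnly,  // skip dict-coded blocks, still scan fallback blocks
  kSkipRowSet           // no row in this rowset can match
};

// column_has_updates:     the column has updates in this rowset, so the base
//                         data alone cannot rule rows out.
// any_dict_entry_matches: at least one dictionary entry satisfies the predicate.
// all_blocks_dict_coded:  the proposed (hypothetical) cfile footer flag saying
//                         that no block fell back to another encoding.
DictSkipDecision DecideDictSkip(bool column_has_updates,
                                bool any_dict_entry_matches,
                                bool all_blocks_dict_coded) {
  if (column_has_updates || any_dict_entry_matches) {
    return DictSkipDecision::kNoSkip;
  }
  if (all_blocks_dict_coded) {
    return DictSkipDecision::kSkipRowSet;
  }
  return DictSkipDecision::kSkipDictBlocksOnly;
}
```

The update check comes first because the short circuit is only stated for the case where the column has no updates in the rowset.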



