[jira] [Created] (CARBONDATA-527) Greater than/less-than/Like filters optimization for dictionary columns

Sujith (JIRA) Mon, 12 Dec 2016 10:18:20 -0800

Sujith created CARBONDATA-527:
---------------------------------

             Summary: Greater than/less-than/Like filters optimization for 
dictionary columns
                 Key: CARBONDATA-527
                 URL: https://issues.apache.org/jira/browse/CARBONDATA-527
             Project: CarbonData
          Issue Type: Improvement
            Reporter: Sujith



Current design:
For greater-than/less-than/Like filters, the system first iterates over every
member in the dictionary cache and applies the filter expression to identify
the valid filter members. Once evaluation is done, the system holds a list of
the matching actual member values (String). In the next step, the system looks
up the dictionary cache again to find the dictionary surrogate value of each
identified member. This lookup is an additional cost to the system, even
though it is done as a binary search in the dictionary cache.
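
The two steps described above could be sketched roughly as follows. This is
only an illustration, not CarbonData's actual code: the class and method names
are hypothetical, the dictionary cache is modelled as a sorted list of
members, and surrogate values are assumed to be the 1-based position in that
list.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;
import java.util.function.Predicate;

// Hypothetical sketch of the current two-step flow; not CarbonData's API.
class TwoStepDictionaryFilter {

    // Assumption: members are sorted, and the surrogate of members.get(i) is i + 1.
    static List<Integer> resolveSurrogates(List<String> members, Predicate<String> filter) {
        // Step 1: evaluate the filter on every member and keep the matching strings.
        List<String> matched = new ArrayList<>();
        for (String member : members) {
            if (filter.test(member)) {
                matched.add(member);
            }
        }
        // Step 2: look up each matching string again to find its surrogate.
        // This second pass is the extra cost described in the issue.
        List<Integer> surrogates = new ArrayList<>();
        for (String member : matched) {
            int index = Collections.binarySearch(members, member);
            if (index >= 0) {
                surrogates.add(index + 1);
            }
        }
        return surrogates;
    }

    public static void main(String[] args) {
        List<String> members = Arrays.asList("apple", "banana", "cherry", "date");
        // Roughly equivalent to the filter: column > 'banana'
        System.out.println(resolveSurrogates(members, m -> m.compareTo("banana") > 0));
    }
}
```

With k matching members out of n, step 2 adds about k binary searches of
O(log n) each on top of the full scan in step 1.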
 
Proposed design/solution:
Identify the dictionary surrogate values in the filter expression evaluation
step itself, while the actual dictionary values are being scanned to identify
valid filter members.

Keep a dictionary counter variable that is incremented as the system iterates
through the dictionary cache to retrieve each actual member. The system
evaluates each member against the filter expression to decide whether it is a
valid filter member. During this process, the counter value can be taken
directly as the selected dictionary surrogate value, since the actual member
values and their dictionary values are kept in the dictionary cache in the
same order as the iteration order.
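
A minimal sketch of the counter idea, under stated assumptions: the names
below are hypothetical and not taken from CarbonData, the dictionary cache is
modelled as an Iterable that yields members in surrogate order, and the first
surrogate value is assumed to be 1.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.function.Predicate;

// Hypothetical sketch of the proposed single-pass flow; not CarbonData's API.
class SinglePassDictionaryFilter {

    // Assumption: iteration order matches surrogate order, starting at 1.
    static List<Integer> resolveSurrogates(Iterable<String> membersInSurrogateOrder,
                                           Predicate<String> filter) {
        List<Integer> surrogates = new ArrayList<>();
        int counter = 0;
        for (String member : membersInSurrogateOrder) {
            counter++; // surrogate value of the member being evaluated
            if (filter.test(member)) {
                // No second lookup: the counter already is the surrogate value.
                surrogates.add(counter);
            }
        }
        return surrogates;
    }

    public static void main(String[] args) {
        List<String> members = Arrays.asList("apple", "banana", "blueberry", "cherry");
        // Roughly equivalent to the filter: column LIKE 'b%'
        System.out.println(resolveSurrogates(members, m -> m.startsWith("b")));
    }
}
```

The result is only correct while the iteration order really matches the
surrogate order, which is the property the paragraph above relies on.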

This eliminates the additional dictionary lookup step that is otherwise
required to retrieve the dictionary surrogate value for each identified filter
member. It can also significantly improve the performance of filter queries
that require expression evaluation against the dictionary cache to identify
their filter members, such as greater-than/less-than/Like filters.

Note: this optimization applies to dictionary columns.




--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
