Todd Lipcon created KUDU-2844:
---------------------------------

             Summary: Avoid copying strings from dictionary or plain-encoded blocks
                 Key: KUDU-2844
                 URL: https://issues.apache.org/jira/browse/KUDU-2844
             Project: Kudu
          Issue Type: Improvement
          Components: cfile, perf
            Reporter: Todd Lipcon


When scanning a plain- or dictionary-encoded binary column, we currently loop 
over each entry and copy the string into the destination RowBlock's arena. In 
TPC-H Q1, the scanner threads spend a significant percentage of their CPU time 
on this copying. The copies also increase the CPU cache footprint, which likely 
hurts performance in downstream operations such as predicate evaluation, 
merging, and result serialization.
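The copy path described above can be sketched roughly as follows. This is a minimal illustration, not Kudu's code: `Slice`, `Arena`, and `DecodeByCopy` are hypothetical stand-ins, the arena is simplified to one heap allocation per string, and the dictionary is modeled as a plain `std::vector<std::string>` indexed by per-row codes.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <memory>
#include <string>
#include <utility>
#include <vector>

// Hypothetical non-owning view of string bytes (not Kudu's actual Slice).
struct Slice {
  const char* data = nullptr;
  size_t size = 0;
};

// Hypothetical, simplified arena: owns copied bytes until Reset().
class Arena {
 public:
  // Copies n bytes from src into arena-owned memory; the pointer stays valid
  // until Reset() because each copy gets its own heap buffer.
  const char* AddCopy(const char* src, size_t n) {
    std::unique_ptr<char[]> buf(new char[n > 0 ? n : 1]);
    if (n > 0) std::memcpy(buf.get(), src, n);
    bytes_used_ += n;
    chunks_.push_back(std::move(buf));
    return chunks_.back().get();
  }
  size_t bytes_used() const { return bytes_used_; }
  void Reset() {
    chunks_.clear();
    bytes_used_ = 0;
  }

 private:
  std::vector<std::unique_ptr<char[]>> chunks_;
  size_t bytes_used_ = 0;
};

// Current behavior (sketch): every decoded row copies its dictionary entry
// into the arena, so CPU and cache cost grow with rows * string length.
void DecodeByCopy(const std::vector<std::string>& dict,
                  const std::vector<uint32_t>& codes, Arena* arena,
                  std::vector<Slice>* out) {
  out->clear();
  for (uint32_t code : codes) {
    const std::string& entry = dict[code];
    out->push_back(Slice{arena->AddCopy(entry.data(), entry.size()),
                         entry.size()});
  }
}
```

Note that a value repeated across rows (code `0` appearing twice, say) is copied once per occurrence, even though the dictionary already holds exactly one copy of it.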

Instead, we could "attach" the dictionary block (with ref-counting) to the 
RowBlock and have the RowBlock refer directly to the dictionary entries. When 
the RowBlock is eventually reset, we can drop the reference. This should be 
safe because we never mutate indirect data in place.
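A minimal sketch of the proposed "attach" approach, assuming `std::shared_ptr` as the ref-counting mechanism. `Slice`, `DictBlock`, `RowBlock::AttachBlock`, and `DecodeByReference` are hypothetical names chosen for illustration; they are not Kudu's actual classes or APIs.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <memory>
#include <string>
#include <utility>
#include <vector>

// Hypothetical non-owning view of string bytes (not Kudu's actual Slice).
struct Slice {
  const char* data = nullptr;
  size_t size = 0;
};

// Hypothetical decoded dictionary block; treated as immutable once shared,
// which is what makes handing out pointers into it safe.
struct DictBlock {
  std::vector<std::string> entries;
};

// Hypothetical RowBlock that can hold references to blocks its cells point into.
class RowBlock {
 public:
  // Takes a reference so the block's bytes outlive the reader that produced it.
  void AttachBlock(std::shared_ptr<const DictBlock> block) {
    attached_.push_back(std::move(block));
  }
  std::vector<Slice>& cells() { return cells_; }
  // Drops the cells and every attached reference; a block is freed only when
  // no RowBlock (or reader) still holds it.
  void Reset() {
    cells_.clear();
    attached_.clear();
  }

 private:
  std::vector<Slice> cells_;
  std::vector<std::shared_ptr<const DictBlock>> attached_;
};

// Proposed behavior (sketch): one reference per batch, zero string copies;
// each cell points straight at the dictionary entry.
void DecodeByReference(const std::shared_ptr<const DictBlock>& dict,
                       const std::vector<uint32_t>& codes, RowBlock* rb) {
  rb->AttachBlock(dict);
  for (uint32_t code : codes) {
    const std::string& entry = dict->entries[code];
    rb->cells().push_back(Slice{entry.data(), entry.size()});
  }
}
```

The trade-off in this sketch is lifetime: a dictionary block stays in memory for as long as any RowBlock references it, in exchange for replacing a per-row copy with a single reference-count increment per batch.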



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)