Alex Behm has posted comments on this change.

Change subject: IMPALA-2494: Support for byte array-encoded decimals in Parquet 
scanner
......................................................................


Patch Set 1:

(2 comments)

http://gerrit.cloudera.org:8080/#/c/5115/1//COMMIT_MSG
Commit Message:

Line 21:  * Tested computing SUM(col) for 1 billion distinct dictionary-encoded
> Can do - but where would that extra slowness really come from? I would have
I'm assuming you are measuring response time. Since the scanner does more work
overall in your dict-encoded experiment, any difference in perf will be less
pronounced because it affects a relatively smaller portion of the work. With
plain encoding there is no "overhead" of decoding the dictionary indexes and
fetching the values from the dictionary. For a single decimal column, the work
of decoding the dict indexes and fetching their values should be in the same
ballpark as just populating the slot directly with plain encoding, so it seems
roughly 50% of the measured time is "noise".
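
To make the dilution argument concrete, here is a rough sketch with made-up
unit costs (none of these numbers are measurements, and the names are purely
illustrative). It assumes the dict decode costs about as much as populating the
slot, and that the patch adds the same fixed per-value cost in both cases:

```python
def relative_slowdown(baseline_cost, added_cost):
    """Fractional increase in per-value time when added_cost is introduced."""
    return added_cost / baseline_cost


# Hypothetical per-value costs in arbitrary units (assumptions, not measured).
populate_slot = 1.0   # writing the decimal value into the slot
dict_decode = 1.0     # decoding the dict index and fetching the value
extra_work = 0.2      # assumed extra cost introduced by the change

# Plain encoding: the baseline is only the slot population.
plain_slowdown = relative_slowdown(populate_slot, extra_work)

# Dict encoding: the baseline also includes the dictionary decode.
dict_slowdown = relative_slowdown(populate_slot + dict_decode, extra_work)

print(f"plain-encoded slowdown: {plain_slowdown:.0%}")
print(f"dict-encoded slowdown:  {dict_slowdown:.0%}")
```

Under these assumptions the same extra work looks half as large in the
dict-encoded run as in the plain-encoded one.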


Line 23:  * No performance difference measured by introduction of extra
> No, but I can do. What do you expect to change?
I'm assuming you compared response times. With multi-threaded scans the loss in 
perf might not be apparent.

With mt_dop=1, we're running the whole query in a single thread, so any slowdown
along that critical path should show up prominently in response time.
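
As a toy illustration of why the thread count matters, the sketch below models
response time under two simplifying assumptions that are mine, not a
description of Impala's actual scheduling: with multiple scanner threads the
scan overlaps with a single consumer, and with one thread all work is serial.
All numbers are hypothetical:

```python
def multi_threaded_time(scan_work, consumer_work, scanner_threads):
    """Toy model: scan is split across threads and overlaps with the consumer."""
    return max(scan_work / scanner_threads, consumer_work)


def single_threaded_time(scan_work, consumer_work):
    """Toy model: everything runs serially on one thread."""
    return scan_work + consumer_work


# Hypothetical work units (assumptions, not measurements).
scan_before, scan_after = 100.0, 110.0   # scan gets 10% slower
consumer = 40.0
threads = 4

mt_before = multi_threaded_time(scan_before, consumer, threads)
mt_after = multi_threaded_time(scan_after, consumer, threads)
st_before = single_threaded_time(scan_before, consumer)
st_after = single_threaded_time(scan_after, consumer)

print("multi-threaded: ", mt_before, "->", mt_after)
print("single-threaded:", st_before, "->", st_after)
```

In this model the multi-threaded response time does not move at all, while the
single-threaded run absorbs the full extra scan cost.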


-- 
To view, visit http://gerrit.cloudera.org:8080/5115
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: If95171e65aa48f08b08b8e87f4555dc75e867977
Gerrit-PatchSet: 1
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Henry Robinson <he...@cloudera.com>
Gerrit-Reviewer: Alex Behm <alex.b...@cloudera.com>
Gerrit-Reviewer: Henry Robinson <he...@cloudera.com>
Gerrit-Reviewer: Tim Armstrong <tarmstr...@cloudera.com>
Gerrit-HasComments: Yes
