Alex Behm has posted comments on this change.

Change subject: IMPALA-2494: Support for byte array-encoded decimals in Parquet scanner
......................................................................

Patch Set 1:

(2 comments)

http://gerrit.cloudera.org:8080/#/c/5115/1//COMMIT_MSG
Commit Message:

Line 21: * Tested computing SUM(col) for 1 billion distinct dictionary-encoded
> Can do - but where would that extra slowness really come from? I would have

I'm assuming you are measuring response time. Since the scanner has more work to do overall in your dict-encoded experiment, any difference in perf will be less pronounced because it affects a relatively smaller portion of the work. With plain encoding there is no "overhead" of decoding the dictionary indexes and fetching the values from the dictionary. For a single decimal column, the work of decoding the dict indexes and fetching their values should be in the same ballpark as just populating the slot directly with plain encoding, so it seems there is roughly 50% "noise".

Line 23: * No performance difference measured by introduction of extra
> No, but I can do. What do you expect to change?

I'm assuming you compared response times. With multi-threaded scans the loss in perf might not be apparent. With mt_dop=1 we're running the whole query in a single thread, so any slowdown along that critical path should prominently affect response time.

--
To view, visit http://gerrit.cloudera.org:8080/5115
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: If95171e65aa48f08b08b8e87f4555dc75e867977
Gerrit-PatchSet: 1
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Henry Robinson <he...@cloudera.com>
Gerrit-Reviewer: Alex Behm <alex.b...@cloudera.com>
Gerrit-Reviewer: Henry Robinson <he...@cloudera.com>
Gerrit-Reviewer: Tim Armstrong <tarmstr...@cloudera.com>
Gerrit-HasComments: Yes