Taras Bobrovytsky has uploaded a new patch set (#8). ( http://gerrit.cloudera.org:8080/7438 )
Change subject: IMPALA-4939, IMPALA-4940: Decimal V2 multiplication
......................................................................

IMPALA-4939, IMPALA-4940: Decimal V2 multiplication

Implement the new DECIMAL return type rules for multiply expressions, active when query option DECIMAL_V2=1. The algorithm for determining the type of the result of multiplication is described in the JIRA.

DECIMAL V1:
+-----------------------------------------------------------------------+
| typeof(cast('0.1' as decimal(38,38)) * cast('0.1' as decimal(38,38))) |
+-----------------------------------------------------------------------+
| DECIMAL(38,38)                                                        |
+-----------------------------------------------------------------------+
+-----------------------------------------------------------------------+
| typeof(cast('0.1' as decimal(38,15)) * cast('0.1' as decimal(38,15))) |
+-----------------------------------------------------------------------+
| DECIMAL(38,30)                                                        |
+-----------------------------------------------------------------------+

DECIMAL V2:
+-----------------------------------------------------------------------+
| typeof(cast('0.1' as decimal(38,38)) * cast('0.1' as decimal(38,38))) |
+-----------------------------------------------------------------------+
| DECIMAL(38,37)                                                        |
+-----------------------------------------------------------------------+
+-----------------------------------------------------------------------+
| typeof(cast('0.1' as decimal(38,15)) * cast('0.1' as decimal(38,15))) |
+-----------------------------------------------------------------------+
| DECIMAL(38,6)                                                         |
+-----------------------------------------------------------------------+

In this patch, we also fix the early multiplication overflow. We compute a 256 bit integer intermediate value, which we then attempt to scale down and round.

Performance:
I ran TPCH 300 and TPCDS 1000 workloads and the performance is almost identical.
For TPCH Q1, there was an improvement from 21 seconds to 16 seconds. I did not see any regressions in those workloads. The performance improvement is due to the way we check for overflows after this patch (by counting the leading zeros instead of dividing). It can be clearly seen in this query:
select cast(2.2 as decimal(38, 1)) * cast(2.2 as decimal(38, 1))
before: 7.85s
after: 2.03s

I noticed performance regressions in the following cases:
- When we need to convert to a 256-bit integer before multiplying, which was introduced in this patch. Whether this happens depends on the resulting precision and the values of the inputs. In the following extreme case, the intermediate value is converted to a 256-bit integer every time.
  select cast(1.1 as decimal(38, 37)) * cast(1.1 as decimal(38, 37))
  before: 14.56s (returns null)
  after: 126.17s
- When we need to scale down the intermediate value. In the following query the result is decimal(38,6) after the patch, so the intermediate needs to be scaled down.
  select cast(2.2 as decimal(38,1)) * cast(2.2 as decimal(38,19))
  before: 7.25s
  after: 13.06s

These regressions are possible only when the resulting precision is 38, which is not common in typical workloads.

Note: The actual queries that I ran for the benchmark are not exactly as above. I constructed tables with millions of rows containing those values. I ran the queries with the DECIMAL_V2=1 option before and after the patch.
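
For readers without the JIRA at hand, the result types quoted above and the leading-zero overflow check can be illustrated roughly as follows. This is a Python sketch, not the code in this patch: the function names, the scale floor of 6, and the "precision = p1 + p2 + 1" rule are assumptions chosen because they reproduce the types shown in this message, and the overflow test is only one conservative way to use leading-zero counts.

```python
MAX_PRECISION = 38
MIN_ADJUSTED_SCALE = 6  # assumed floor on the scale once precision is capped


def multiply_type_v1(p1, s1, p2, s2):
    """Assumed V1-style rule: add precisions and scales, cap each at 38."""
    return min(p1 + p2, MAX_PRECISION), min(s1 + s2, MAX_PRECISION)


def multiply_type_v2(p1, s1, p2, s2):
    """Assumed V2-style rule: if the precision exceeds 38, give up scale
    digits first, but keep at least min(scale, 6) of them."""
    precision = p1 + p2 + 1
    scale = s1 + s2
    if precision > MAX_PRECISION:
        min_scale = min(scale, MIN_ADJUSTED_SCALE)
        scale = max(scale - (precision - MAX_PRECISION), min_scale)
        precision = MAX_PRECISION
    return precision, scale


def may_need_256_bits(x, y):
    """Conservative test on unscaled values: could x * y fall outside a
    signed 128-bit integer? Uses leading-zero counts, no division."""
    leading_zeros = (128 - abs(x).bit_length()) + (128 - abs(y).bit_length())
    # With more than 128 leading zeros in total, the product needs at most
    # 127 bits and therefore fits. Otherwise a wider intermediate may be needed.
    return leading_zeros <= 128
```

With these assumptions, multiply_type_v2(38, 15, 38, 15) returns (38, 6) and multiply_type_v2(38, 38, 38, 38) returns (38, 37), matching the DECIMAL V2 types above. The check is deliberately conservative: it can report True for inputs whose product would in fact fit in 128 bits.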

Change-Id: I37ad6232d7953bd75c18dc86e665b2b501a1ebe1
---
M be/src/exprs/expr-test.cc
M be/src/runtime/decimal-value.inline.h
M be/src/util/bit-util.h
M fe/src/main/java/org/apache/impala/analysis/TypesUtil.java
4 files changed, 332 insertions(+), 59 deletions(-)

  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/38/7438/8
--
To view, visit http://gerrit.cloudera.org:8080/7438
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I37ad6232d7953bd75c18dc86e665b2b501a1ebe1
Gerrit-Change-Number: 7438
Gerrit-PatchSet: 8
Gerrit-Owner: Taras Bobrovytsky <tbobrovyt...@cloudera.com>
Gerrit-Reviewer: Dan Hecht <dhe...@cloudera.com>
Gerrit-Reviewer: Jim Apple <jbapple-imp...@apache.org>
Gerrit-Reviewer: Michael Ho <k...@cloudera.com>
Gerrit-Reviewer: Taras Bobrovytsky <tbobrovyt...@cloudera.com>
Gerrit-Reviewer: Tim Armstrong <tarmstr...@cloudera.com>
Gerrit-Reviewer: Zach Amsden <zams...@cloudera.com>