Zach Amsden has uploaded a new patch set (#16).

Change subject: IMPALA-2020: Add rounding for decimal casts
......................................................................

IMPALA-2020: Add rounding for decimal casts

This change adds support for DECIMAL_V2 rounding behavior for both
DECIMAL to INT and DOUBLE to DECIMAL casts.  The round behavior
implemented for exact halves is round halves away from zero (e.g
(0.5 -> 1) and (-0.5 -> -1)).

Testing: Added expr-test and decimal-test test coverage as well as
manual testing.  I tried to update the expr benchmark to get some
kind of results but the benchmark is pretty bit-rotted.  It was
throwing JNI exceptions.  Fixed up the JNI init call, but there is
still a lot of work to do to get this back in a runnable state.
Even with the hack to get at the RuntimeContext, we end up getting
null derefs due to the slot descriptor table not being initialized.

I have decided to wait on expanding the python test until the bugs
with overflow are fixed, which will make it easier to test sane
behavior.

[localhost:21000] > select cast(0.59999 AS int);
+----------------------+
| cast(0.59999 as int) |
+----------------------+
| 0                    |
+----------------------+
Fetched 1 row(s) in 0.01s
[localhost:21000] > select cast(cast(0.5999 as float) as decimal(5,1));
+---------------------------------------------+
| cast(cast(0.5999 as float) as decimal(5,1)) |
+---------------------------------------------+
| 0.5                                         |
+---------------------------------------------+
Fetched 1 row(s) in 0.01s
[localhost:21000] > set decimal_v2=1;
DECIMAL_V2 set to 1
[localhost:21000] > select cast(0.59999 AS int);
+----------------------+
| cast(0.59999 as int) |
+----------------------+
| 1                    |
+----------------------+
Fetched 1 row(s) in 0.01s
[localhost:21000] > select cast(cast(0.5999 as float) as decimal(5,1));
+---------------------------------------------+
| cast(cast(0.5999 as float) as decimal(5,1)) |
+---------------------------------------------+
| 0.6                                         |
+---------------------------------------------+
Fetched 1 row(s) in 0.01s

Performance summary.  In all cases I have tried multiple times and
taken the fastest query results.

Old version, head at 815c76f9cbbe6585ebed961da506fc54ce2ef4e3:

[localhost:21000] > select sum(cast(l_extendedprice as bigint)) from
tpch10_parquet.lineitem;
Query: select sum(cast(l_extendedprice as bigint)) from
tpch10_parquet.lineitem
Query submitted at: 2017-02-21 19:31:32 (Coordinator:
http://impala-dev:25000)
Query progress can be monitored at:
http://impala-dev:25000/query_plan?query_id=bf4d06517e3093ca:b053953b00000000
+--------------------------------------+
| sum(cast(l_extendedprice as bigint)) |
+--------------------------------------+
| 2293784575265                        |
+--------------------------------------+
Fetched 1 row(s) in 0.53s

With this change, and decimal_v2 off:

[localhost:21000] > select sum(cast(l_extendedprice as bigint)) from
tpch10_parquet.lineitem;
Query: select sum(cast(l_extendedprice as bigint)) from
tpch10_parquet.lineitem
Query submitted at: 2017-02-21 19:06:05 (Coordinator:
http://impala-dev:25000)
Query progress can be monitored at:
http://impala-dev:25000/query_plan?query_id=e54f6fc2c21d63d0:da7b6baa00000000
+--------------------------------------+
| sum(cast(l_extendedprice as bigint)) |
+--------------------------------------+
| 2293784575265                        |
+--------------------------------------+
Fetched 1 row(s) in 0.52s

Note that there is some noise / instability in these results and across
invocations there is quite a bit of variance.  Still we appear not to
have regressed.

With decimal V2 enabled, we loose some performance due to rounding.

[localhost:21000] > set decimal_v2=1;
DECIMAL_V2 set to 1
[localhost:21000] >  select sum(cast(l_extendedprice as bigint)) from
tpch10_parquet.lineitem;
Query: select sum(cast(l_extendedprice as bigint)) from
tpch10_parquet.lineitem
Query submitted at: 2017-02-21 21:04:45 (Coordinator:
http://impala-dev:25000)
Query progress can be monitored at:
http://impala-dev:25000/query_plan?query_id=444a15a709f0672:74197af00000000
+--------------------------------------+
| sum(cast(l_extendedprice as bigint)) |
+--------------------------------------+
| 2293814088985                        |
+--------------------------------------+
Fetched 1 row(s) in 0.63s

So we're about 20% slower.  The variance is quite a lot so this is not a
scientific number, but the trend is maintained.  So we have some work to
do to get this back.

Casting from double seems to be roughly at parity:

Old version, head at 815c76f9cbbe6585ebed961da506fc54ce2ef4e3:

[localhost:21000] >  select sum(cast(cast(l_extendedprice as double) as
decimal(14,2))) from tpch10_parquet.lineitem;
Query: select sum(cast(cast(l_extendedprice as double) as
decimal(14,2))) from tpch10_parquet.lineitem
Query submitted at: 2017-02-21 19:41:07 (Coordinator:
http://impala-dev:25000)
Query progress can be monitored at:
http://impala-dev:25000/query_plan?query_id=2645812a55b081c7:8b8eecc800000000
+-------------------------------------------------------------+
| sum(cast(cast(l_extendedprice as double) as decimal(14,2))) |
+-------------------------------------------------------------+
| 2293813121802.09                                            |
+-------------------------------------------------------------+
Fetched 1 row(s) in 0.63s

[localhost:21000] >  select sum(cast(cast(l_extendedprice as double) as
decimal(38,10))) from tpch10_parquet.lineitem;
Query: select sum(cast(cast(l_extendedprice as double) as
decimal(38,10))) from tpch10_parquet.lineitem
Query submitted at: 2017-02-21 20:59:38 (Coordinator:
http://impala-dev:25000)
Query progress can be monitored at:
http://impala-dev:25000/query_plan?query_id=72472ad54b4d2324:4e5354c500000000
+--------------------------------------------------------------+
| sum(cast(cast(l_extendedprice as double) as decimal(38,10))) |
+--------------------------------------------------------------+
| 2293813156773.3596978911                                     |
+--------------------------------------------------------------+
Fetched 1 row(s) in 0.72s

[localhost:21000] > set decimal_v2=0;
DECIMAL_V2 set to 0
[localhost:21000] >  select sum(cast(cast(l_extendedprice as double) as
decimal(14,2))) from tpch10_parquet.lineitem;
Query: select sum(cast(cast(l_extendedprice as double) as
decimal(14,2))) from tpch10_parquet.lineitem
Query submitted at: 2017-02-21 21:02:48 (Coordinator:
http://impala-dev:25000)
Query progress can be monitored at:
http://impala-dev:25000/query_plan?query_id=5f48db565c4426e1:9a9b255000000000
+-------------------------------------------------------------+
| sum(cast(cast(l_extendedprice as double) as decimal(14,2))) |
+-------------------------------------------------------------+
| 2293813121802.09                                            |
+-------------------------------------------------------------+
Fetched 1 row(s) in 0.64s

[localhost:21000] >  select sum(cast(cast(l_extendedprice as double) as
decimal(38,10))) from tpch10_parquet.lineitem;
Query: select sum(cast(cast(l_extendedprice as double) as
decimal(38,10))) from tpch10_parquet.lineitem
Query submitted at: 2017-02-21 21:01:44 (Coordinator:
http://impala-dev:25000)
Query progress can be monitored at:
http://impala-dev:25000/query_plan?query_id=a148fc7f88508bf3:7d0381a00000000
+--------------------------------------------------------------+
| sum(cast(cast(l_extendedprice as double) as decimal(38,10))) |
+--------------------------------------------------------------+
| 2293813156773.3596978911                                     |
+--------------------------------------------------------------+
Fetched 1 row(s) in 0.73s

[localhost:21000] > set decimal_v2=1;
DECIMAL_V2 set to 1
[localhost:21000] >  select sum(cast(cast(l_extendedprice as double) as
decimal(14,2))) from tpch10_parquet.lineitem;
Query: select sum(cast(cast(l_extendedprice as double) as
decimal(14,2))) from tpch10_parquet.lineitem
Query submitted at: 2017-02-21 21:03:44 (Coordinator:
http://impala-dev:25000)
Query progress can be monitored at:
http://impala-dev:25000/query_plan?query_id=724bf16838f51065:78675af700000000
+-------------------------------------------------------------+
| sum(cast(cast(l_extendedprice as double) as decimal(14,2))) |
+-------------------------------------------------------------+
| 2293813156773.36                                            |
+-------------------------------------------------------------+
Fetched 1 row(s) in 0.63s
[localhost:21000] >  select sum(cast(cast(l_extendedprice as double) as
decimal(38,10))) from tpch10_parquet.lineitem;
Query: select sum(cast(cast(l_extendedprice as double) as
decimal(38,10))) from tpch10_parquet.lineitem
Query submitted at: 2017-02-21 21:03:53 (Coordinator:
http://impala-dev:25000)
Query progress can be monitored at:
http://impala-dev:25000/query_plan?query_id=d746528416873278:a94e61ef00000000
+--------------------------------------------------------------+
| sum(cast(cast(l_extendedprice as double) as decimal(38,10))) |
+--------------------------------------------------------------+
| 2293813156773.3600000000                                     |
+--------------------------------------------------------------+
Fetched 1 row(s) in 0.73s

Interestingly, you can see the effect of the rounding as well - the
decimal 38,10 result is now precise, where as the truncation before
left artifacts from the division.

Change-Id: I2daf186b4770a022f9cb349d512067a1dd624810
---
M be/src/benchmarks/expr-benchmark.cc
M be/src/exprs/decimal-operators-ir.cc
M be/src/exprs/expr-test.cc
M be/src/exprs/expr.h
M be/src/exprs/literal.cc
M be/src/runtime/decimal-test.cc
M be/src/runtime/decimal-value.h
M be/src/runtime/decimal-value.inline.h
M be/src/udf/udf.h
9 files changed, 485 insertions(+), 112 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/51/5951/16
-- 
To view, visit http://gerrit.cloudera.org:8080/5951
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I2daf186b4770a022f9cb349d512067a1dd624810
Gerrit-PatchSet: 16
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Zach Amsden <zams...@cloudera.com>
Gerrit-Reviewer: Dan Hecht <dhe...@cloudera.com>
Gerrit-Reviewer: Michael Ho
Gerrit-Reviewer: Michael Ho <k...@cloudera.com>
Gerrit-Reviewer: Zach Amsden <zams...@cloudera.com>

Reply via email to