On 13.02.2017 19:20, Tom Lane wrote:
> Konstantin Knizhnik <k.knizh...@postgrespro.ru> writes:
>> I wonder why the SUM aggregate is calculated for the real (float4) type
>> using a floating-point accumulator?
> If you can't deal with the vagaries of floating-point arithmetic, you
> shouldn't be storing your data in float format. Use numeric.
4-byte floats are widely used, for example, in trading applications, simply
because they are half the size of doubles and the range of the stored data
is relatively small (not many significant digits are needed). At the same
time, the volume of stored data is very large, and switching from float4 to
float8 would almost double it: twice as much storage and almost twice the
query execution time.
So this is not an acceptable answer.
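
As a minimal illustration of the size difference (pg_column_size() reports
the on-disk size of a value; this example is mine, not from the thread):

    -- float4 takes 4 bytes per value, float8 takes 8:
    SELECT pg_column_size(1.5::float4) AS float4_bytes,  -- 4
           pg_column_size(1.5::float8) AS float8_bytes;  -- 8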
>> Are there any reasons for using the float4pl function for the SUM
>> aggregate instead of float4_accum?
> The latter is probably a good two orders of magnitude slower, and it
> wouldn't really do much to solve the inherent accuracy problems of
> adding float4 values that have a wide dynamic range.
That is not true; please compare the execution times of these two queries:
postgres=# select sum(l_quantity) from lineitem where l_shipdate <= '1998-12-01';
     sum
-------------
 1.52688e+09
(1 row)

Time: 2858.852 ms

postgres=# select sum(l_quantity+0.0) from lineitem where l_shipdate <= '1998-12-01';
    sum
------------
 1529738036
(1 row)

Time: 3174.529 ms
It looks like the aggregate calculation itself is no longer a bottleneck in
Postgres, compared with the cost of tuple deforming.
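
And for anyone who wants a float8 accumulator without the array state that
float4_accum maintains, a user-defined aggregate is enough. A minimal
sketch, relying on the built-in float84pl function (float8 state + float4
input -> float8); the name sum_d is my own invention:

    -- A SUM-like aggregate over float4 that accumulates in float8.
    -- Note: because of initcond, it returns 0 instead of NULL for an
    -- empty input, unlike the standard SUM.
    CREATE AGGREGATE sum_d (float4) (
        sfunc       = float84pl,  -- float8 state + float4 input -> float8
        stype       = float8,
        initcond    = '0',
        combinefunc = float8pl,   -- allows parallel aggregation (9.6+)
        parallel    = safe
    );

    -- Usage:
    -- SELECT sum_d(l_quantity) FROM lineitem WHERE l_shipdate <= '1998-12-01';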
> The expectation for SUM(float4) is that you want speed and are
> prepared to cope with the consequences. It's easy enough to cast your
> input to float8 if you want a wider accumulator, or to numeric if
> you'd like more stable (not necessarily more accurate :-() results.
> I do not think it's the database's job to make those choices for you.
From my point of view, this is a strange and wrong expectation.
I choose the float4 type for a column precisely because it is enough to
represent the range of data I have and I need to minimize the record size.
But when I calculate a sum, I expect to receive a more or less precise
result. Certainly, I realize that even with double it is possible to lose
precision during the calculation, and the result may depend on the
summation order (if we add very small and very large values). But in real
use cases (for example, trading data) such a large spread in attribute
values is very rare. If you have, say, a stock price, it is very unlikely
that one company is valued at 0.000001 and another at 10000000.0.
At least on the TPC-H example (which certainly deals with dummy generated
data), the double type produces an almost precise result.
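
To make the accuracy point concrete, here is an illustrative comparison of
the two accumulators on narrow-range input (my own example, not from the
thread; exact figures are platform-dependent):

    -- The true sum is ~100000. The float4 accumulator typically drifts
    -- visibly from it, while the float8 accumulator stays accurate.
    SELECT sum(x)         AS sum_float4,  -- float4pl: float4 accumulator
           sum(x::float8) AS sum_float8   -- float8pl: float8 accumulator
    FROM (SELECT 0.01::float4 AS x
          FROM generate_series(1, 10000000)) AS t;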
--
Konstantin Knizhnik
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company