On 13.02.2017 19:20, Tom Lane wrote:
> Konstantin Knizhnik <k.knizh...@postgrespro.ru> writes:
>> I wonder why the SUM aggregate is calculated for the real (float4) type
>> using a floating-point accumulator?
> If you can't deal with the vagaries of floating-point arithmetic, you
> shouldn't be storing your data in float format. Use numeric.
4-byte floats are widely used, for example, in trading applications, simply
because they are half the size of doubles and the range of the stored data
is relatively small (not many significant digits are needed). At the same
time, the volume of stored data is very large, and switching from float4 to
float8 would almost double it: twice as much storage and almost twice the
query execution time.
So this is not an acceptable answer.
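
As a minimal illustration of the size difference (pg_column_size() reports
the on-disk size of a value; this example is mine, not from the thread):

    -- float4 takes 4 bytes per value, float8 takes 8:
    SELECT pg_column_size(1.5::float4) AS float4_bytes,  -- 4
           pg_column_size(1.5::float8) AS float8_bytes;  -- 8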
>> Are there any reasons for using the float4pl function for the SUM
>> aggregate instead of float4_accum?
> The latter is probably a good two orders of magnitude slower, and it
> wouldn't really do much to solve the inherent accuracy problems of
> adding float4 values that have a wide dynamic range.
That is not true; please compare the execution times of these two queries:
postgres=# select sum(l_quantity) from lineitem where l_shipdate <= '1998-12-01';
     sum
-------------
 1.52688e+09
(1 row)

Time: 2858.852 ms

postgres=# select sum(l_quantity+0.0) from lineitem where l_shipdate <= '1998-12-01';
    sum
------------
 1529738036
(1 row)

Time: 3174.529 ms
It looks like the aggregate calculation itself is no longer a bottleneck in
Postgres, compared with the cost of tuple deforming.
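
And for anyone who wants a float8 accumulator without the array state that
float4_accum maintains, a user-defined aggregate is enough. A minimal
sketch, relying on the built-in float84pl function (float8 state + float4
input -> float8); the name sum_d is my own invention:

    -- A SUM-like aggregate over float4 that accumulates in float8.
    -- Note: because of initcond, it returns 0 instead of NULL for an
    -- empty input, unlike the standard SUM.
    CREATE AGGREGATE sum_d (float4) (
        sfunc       = float84pl,  -- float8 state + float4 input -> float8
        stype       = float8,
        initcond    = '0',
        combinefunc = float8pl,   -- allows parallel aggregation (9.6+)
        parallel    = safe
    );

    -- Usage:
    -- SELECT sum_d(l_quantity) FROM lineitem WHERE l_shipdate <= '1998-12-01';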
> The expectation for SUM(float4) is that you want speed and are
> prepared to cope with the consequences. It's easy enough to cast your
> input to float8 if you want a wider accumulator, or to numeric if
> you'd like more stable (not necessarily more accurate :-() results.
> I do not think it's the database's job to make those choices for you.
From my point of view, this is a strange and wrong expectation.
I choose the float4 type for a column precisely because it is enough to
represent the range of data I have and I need to minimize the record size.
But when I calculate a sum, I expect to receive a more or less precise
result. Certainly, I realize that even with double it is possible to lose
precision during the calculation, and the result may depend on the
summation order (if we add very small and very large values). But in real
use cases (for example, trading data) such a large spread in attribute
values is very rare. If you have, say, a stock price, it is very unlikely
that one company is valued at 0.000001 and another at 10000000.0.
At least on the TPC-H example (which certainly deals with dummy generated
data), the double type produces an almost precise result.
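
To make the accuracy point concrete, here is an illustrative comparison of
the two accumulators on narrow-range input (my own example, not from the
thread; exact figures are platform-dependent):

    -- The true sum is ~100000. The float4 accumulator typically drifts
    -- visibly from it, while the float8 accumulator stays accurate.
    SELECT sum(x)         AS sum_float4,  -- float4pl: float4 accumulator
           sum(x::float8) AS sum_float8   -- float8pl: float8 accumulator
    FROM (SELECT 0.01::float4 AS x
          FROM generate_series(1, 10000000)) AS t;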
--
Konstantin Knizhnik
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company