On 14.09.2012 11:25, Kyotaro HORIGUCHI wrote:
Hello, I'd like to propose reducing the number of pallocs in numeric operations.

Numeric operations are slow by nature, but usually that is not
a problem for on-disk operations. However, the slowdown is more
pronounced for in-memory operations.

I inspected them and found some very short-lived pallocs. These
pallocs are used as temporary storage for the digits of unpacked
numerics.

The digits of a numeric have the same format in the packed and
unpacked forms. So we can eliminate some of the pallocs by using
the digits of the packed numeric in place to build the unpacked one.

In this patch, I added a new function, set_var_from_num_nocopy(), to
do this, and used it for operands that won't be modified.
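To illustrate the idea, here is a simplified sketch. It assumes hypothetical stand-in types (PackedNumeric, Var) and functions (var_from_packed_copy(), var_from_packed_nocopy(), free_var()) rather than the real numeric.c code, which also has to deal with the sign, the display scale and the varlena header. The copying version allocates a private digit buffer for every operand; the no-copy version just points at the packed digits.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

typedef int16_t NumericDigit;   /* one base-10000 digit */

#define MAX_DIGITS 8            /* hypothetical fixed capacity for the sketch */

/* Hypothetical, simplified stand-in for a packed (on-disk) numeric. */
typedef struct
{
    int          ndigits;
    int          weight;
    NumericDigit digits[MAX_DIGITS];
} PackedNumeric;

/* Hypothetical, simplified stand-in for an unpacked numeric variable. */
typedef struct
{
    int           ndigits;
    int           weight;
    NumericDigit *buf;      /* private allocation, or NULL when borrowing */
    NumericDigit *digits;   /* points into buf, or into the packed value */
} Var;

/* Copying unpack: one short-lived allocation per operand. */
static void
var_from_packed_copy(Var *var, const PackedNumeric *num)
{
    assert(num->ndigits >= 0 && num->ndigits <= MAX_DIGITS);
    var->ndigits = num->ndigits;
    var->weight = num->weight;
    /* one extra slot, so the allocation size is never zero */
    var->buf = malloc(sizeof(NumericDigit) * (size_t) (num->ndigits + 1));
    if (var->buf == NULL)
        abort();
    memcpy(var->buf, num->digits, sizeof(NumericDigit) * (size_t) num->ndigits);
    var->digits = var->buf;
}

/*
 * No-copy unpack: the digits alias the packed value and nothing is
 * allocated.  The result must be treated as read-only, because writing
 * through var->digits would modify the packed value itself.
 */
static void
var_from_packed_nocopy(Var *var, const PackedNumeric *num)
{
    assert(num->ndigits >= 0 && num->ndigits <= MAX_DIGITS);
    var->ndigits = num->ndigits;
    var->weight = num->weight;
    var->buf = NULL;
    var->digits = (NumericDigit *) num->digits;
}

/* Release a Var produced by either function; free(NULL) is a no-op. */
static void
free_var(Var *var)
{
    free(var->buf);
    var->buf = NULL;
    var->digits = NULL;
    var->ndigits = 0;
}
```

The catch is that any code reached through a no-copy Var must never write to var->digits, since that would change the packed value it was unpacked from.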

You have to be careful to really not modify the operands. numeric_out() and numeric_out_sci() are wrong; they call get_str_from_var(), which modifies the argument. Same with numeric_expr(): it passes the argument to numericvar_to_double_no_overflow(), which passes it to get_str_from_var(). numericvar_to_int8() also modifies its argument, so all the functions that use that, directly or indirectly, must make a copy.

Perhaps get_str_from_var(), and the other functions that currently scribble on the arguments, should be modified to not do so. They could easily make a copy of the argument within the function. Then the callers could safely use set_var_from_num_nocopy(). The performance would be the same, you would have the same number of pallocs, but you would get rid of the surprising argument-modifying behavior of those functions.
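A minimal sketch of that suggestion, with a hypothetical rounding step standing in for whatever in-place modification get_str_from_var() and friends actually perform (this is not their real code). round_digits_in_place(), round_digits_copy() and RoundedDigits are hypothetical names. The destructive step still writes into a buffer, but the wrapper copies its input into a private buffer first (with one spare leading slot for a carry), so a caller may pass digits that alias a packed value.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

typedef int16_t NumericDigit;   /* one base-10000 digit */
#define NBASE 10000

/*
 * Destructive step (hypothetical): round a base-NBASE digit array, most
 * significant digit first, to its first `keep` digits, rounding half up.
 * Writes into `digits`.  If the carry ripples out of digits[0], it is
 * stored in digits[-1], so the caller must supply one spare slot in front.
 * Returns 1 if that spare slot was used, 0 otherwise.
 */
static int
round_digits_in_place(NumericDigit *digits, int ndigits, int keep)
{
    int carry;
    int i;

    assert(keep >= 0);
    if (keep >= ndigits)
        return 0;               /* nothing to drop */
    carry = digits[keep] >= NBASE / 2;
    for (i = keep - 1; i >= 0 && carry; i--)
    {
        digits[i]++;
        carry = digits[i] >= NBASE;
        if (carry)
            digits[i] -= NBASE;
    }
    if (carry)
    {
        digits[-1] = 1;
        return 1;
    }
    return 0;
}

/* Result of the non-destructive wrapper. */
typedef struct
{
    NumericDigit *buf;      /* allocation owned by the caller; free() it */
    NumericDigit *digits;   /* first significant digit, inside buf */
    int           ndigits;
} RoundedDigits;

/*
 * Non-destructive wrapper: copy the input into a private buffer with one
 * spare leading slot, then round the copy.  `digits` is only read, so it
 * may safely alias a packed value.
 */
static RoundedDigits
round_digits_copy(const NumericDigit *digits, int ndigits, int keep)
{
    RoundedDigits r;

    assert(ndigits >= 0);
    r.buf = malloc(sizeof(NumericDigit) * (size_t) (ndigits + 1));
    if (r.buf == NULL)
        abort();
    r.buf[0] = 0;           /* spare slot for a carry out of the top digit */
    memcpy(r.buf + 1, digits, sizeof(NumericDigit) * (size_t) ndigits);

    r.ndigits = keep < ndigits ? keep : ndigits;
    if (round_digits_in_place(r.buf + 1, ndigits, keep))
    {
        r.digits = r.buf;   /* the carry became a new top digit */
        r.ndigits++;
    }
    else
        r.digits = r.buf + 1;
    return r;
}
```

The number of allocations is the same as when every caller makes its own copy, as noted above; the difference is that the scratch copy becomes an internal detail of the callee, and callers can pass a no-copy variable without worrying about it being overwritten.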

The performance gain seems fairly modest...

'SELECT SUM(numeric_column) FROM on_memory_table' over ten million
rows of numerics with about 8 digits runs in 3480 ms, against 3930 ms
originally. That's an 11% gain. For comparison, 'SELECT SUM(int_column)
FROM on_memory_table' takes 1570 ms.

Similarly, there is an 8% gain for numerics of about 30 to 50 digits.
In contrast, avg(numeric) showed no gain.

Do you think this is worth doing?

Yes, I think this is worthwhile. I'm seeing an even bigger gain with smaller numerics. I created a table with this:

CREATE TABLE numtest AS SELECT a::numeric AS col FROM generate_series(1, 10000000) a;

And repeated this query with \timing:

SELECT SUM(col) FROM numtest;

The execution time of that query fell from about 5300 ms to 4300 ms, i.e. by about 20%.
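The percentages in this thread are reductions in run time relative to the original time, (before - after) / before. A trivial helper to check them (percent_gain() is a hypothetical name):

```c
#include <assert.h>

/* Reduction in run time, as a percentage of the original run time. */
static double
percent_gain(double before_ms, double after_ms)
{
    assert(before_ms > 0.0);    /* avoid dividing by zero */
    return (before_ms - after_ms) / before_ms * 100.0;
}
```

With Kyotaro's figures, percent_gain(3930, 3480) is about 11.5, and with the figures above, percent_gain(5300, 4300) is about 18.9, which is what "about 20%" rounds from. Expressed as a speedup ratio instead, 5300 / 4300 is about 1.23.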

- Heikki


--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)