As part of making tuplestores faster [1], I noticed that in WindowAgg, when
we end one partition we call tuplestore_end() and then we do
tuplestore_begin_heap() again for the next partition in begin_partition()
and then go on to set up the tuplestore read pointers according to what's
required for the given frameOptions of the WindowAgg.  This might make
sense if the frameOptions could change between partitions, but they can't,
so I don't see any reason why we can't just do tuplestore_clear() at the
end of a partition.  That resets the read pointer positions back to the
start again ready for the next partition.

I wrote the attached patch and checked how it affects performance. It helps
quite a bit when there are lots of partitions.

CREATE TABLE a (a INT NOT NULL);
INSERT INTO a SELECT x FROM generate_series(1,1000000)x;
VACUUM FREEZE ANALYZE a;

bench.sql:
SELECT a,count(*) OVER (PARTITION BY a) FROM a OFFSET 1000000;

master:
$ pgbench -n -f bench.sql -T 60 -M prepared postgres | grep latency
latency average = 293.488 ms
latency average = 295.509 ms
latency average = 297.772 ms

patched:
$ pgbench -n -f bench.sql -T 60 -M prepared postgres | grep latency
latency average = 203.234 ms
latency average = 204.538 ms
latency average = 203.877 ms

About 45% faster.

Here's the top of perf top of each:

master:
     8.61%  libc.so.6          [.] _int_malloc
     6.71%  libc.so.6          [.] _int_free
     6.42%  postgres           [.] heap_form_minimal_tuple
     6.40%  postgres           [.] tuplestore_alloc_read_pointer
     5.87%  libc.so.6          [.] unlink_chunk.constprop.0
     3.86%  libc.so.6          [.] __memmove_avx512_unaligned_erms
     3.80%  postgres           [.] AllocSetFree
     3.51%  postgres           [.] spool_tuples
     3.45%  postgres           [.] ExecWindowAgg
     2.09%  postgres           [.] tuplesort_puttuple_common
     1.92%  postgres           [.] comparetup_datum
     1.88%  postgres           [.] AllocSetAlloc
     1.85%  postgres           [.] tuplesort_heap_replace_top
     1.84%  postgres           [.] ExecStoreBufferHeapTuple
     1.69%  postgres           [.] window_gettupleslot

patched:
     8.95%  postgres           [.] ExecWindowAgg
     8.10%  postgres           [.] heap_form_minimal_tuple
     5.16%  postgres           [.] spool_tuples
     4.03%  libc.so.6          [.] __memmove_avx512_unaligned_erms
     4.02%  postgres           [.] begin_partition
     3.11%  postgres           [.] tuplesort_puttuple_common
     2.97%  postgres           [.] AllocSetAlloc
     2.96%  postgres           [.] comparetup_datum
     2.83%  postgres           [.] tuplesort_heap_replace_top
     2.79%  postgres           [.] window_gettupleslot
     2.22%  postgres           [.] AllocSetFree
     2.11%  postgres           [.] MemoryContextReset
     1.99%  postgres           [.] LogicalTapeWrite
     1.98%  postgres           [.] advance_windowaggregate

Lots less malloc/free work going on.

I'm also tempted to do a cleanup of the state machine inside
nodeWindowAgg.c as I had to add another bool flag to WindowAggState to
replace the previous test that checked if the tuplestore was empty (i.e
just finished a partition) with if (winstate->buffer == NULL). I couldn't
do that anymore due to no longer nullifying that field. I think such a
cleanup would be a separate patch. Annoyingly the extra bool is the 9th
bool flag and widens that struct by 8 bytes and leaves a 7 byte hole.

David

[1]
https://git.postgresql.org/gitweb/?p=postgresql.git;a=commit;h=590b045c37aad44915f7f472343f24c2bafbe5d8

Attachment: v1-0001-Optimize-WindowAgg-s-use-of-tuplestores.patch
Description: Binary data

Reply via email to