On Thu, Feb 8, 2024 at 6:33 PM Hayato Kuroda (Fujitsu) <[email protected]> wrote:
>
> Dear Sawada-san,
>
> Thanks for making the v3 patchset. I have also benchmarked the case [1].
> The results below are the average of 5 runs; the results are almost the
> same even when the median is used for the comparison. On my env, the
> regression cannot be seen.
>
> HEAD (1e285a5)    HEAD + v3 patches    difference
> 10910.722 ms      10714.540 ms         around 1.8%
>
Thank you for doing the performance test!
> Also, here are minor comments for the v3 set.
>
> 01.
> bh_nodeidx_entry and ReorderBufferMemTrackState are missing in typedefs.list.
Will add them.
>
> 02. ReorderBufferTXNSizeCompare
> Should we assert {ta, tb} are not NULL?
Not sure we really need it as other binaryheap users don't have such checks.
On Tue, Feb 6, 2024 at 2:45 PM Hayato Kuroda (Fujitsu)
<[email protected]> wrote:
>
> > I've run a benchmark test that I shared before[1]. Here are the results of
> > decoding a transaction that has 1M subtransactions, each of which has 1
> > INSERT:
> >
> > HEAD:
> > 1810.192 ms
> >
> > HEAD w/ patch:
> > 2001.094 ms
> >
> > I set a large enough value to logical_decoding_work_mem not to evict
> > any transactions. I can see about a 10% performance regression in
> > this case.
>
> Thanks for running. I think this workload is the worst and an extreme case
> which would not occur on a real system (such a system should be fixed), so
> we can say that the regression is up to -10%. I felt it could be negligible,
> but what do others think?
I think this performance regression is not acceptable. In this
workload, one transaction has 10k subtransactions and the logical
decoding becomes quite slow if logical_decoding_work_mem is not big
enough. Therefore, it's a legitimate and common approach to increase
logical_decoding_work_mem to speed up the decoding. However, with this
patch, the decoding becomes slower than today. It's a bad idea in
general to optimize an extreme case while sacrificing the normal (or
more common) cases.
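For reference, raising the memory budget is an ordinary per-session or system-wide GUC change; the value below is purely illustrative:

```sql
-- logical_decoding_work_mem bounds per-decoding-session memory before
-- transactions are spilled to disk or streamed; '1GB' is an example value.
SET logical_decoding_work_mem = '1GB';
```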
Therefore, I've improved the algorithm so that we don't touch the
max-heap at all if the number of transactions is small enough. I've
run benchmark tests with two workloads:
workload-1, decode single transaction with 800k tuples (normal.sql):
* without spill
HEAD: 13235.136 ms
v3 patch: 14320.082 ms
v4 patch: 13300.665 ms
* with spill
HEAD: 22970.204 ms
v3 patch: 23625.649 ms
v4 patch: 23304.366 ms
workload-2, decode one transaction with 100k subtransactions (many-subtxn.sql):
* without spill
HEAD: 345.718 ms
v3 patch: 409.686 ms
v4 patch: 353.026 ms
* with spill
HEAD: 136718.313 ms
v3 patch: 2675.539 ms
v4 patch: 2734.981 ms
Regards,
--
Masahiko Sawada
Amazon Web Services: https://aws.amazon.com
v4-0001-Make-binaryheap-enlareable.patch
Description: Binary data
v4-0002-Add-functions-to-binaryheap-to-efficiently-remove.patch
Description: Binary data
v4-0003-Use-max-heap-to-evict-largest-transactions-in-Reo.patch
Description: Binary data
normal.sql
Description: Binary data
many-subtxn.sql
Description: Binary data
