On Tue, Dec 12, 2023 at 9:01 AM Masahiko Sawada <sawada.m...@gmail.com> wrote:
>
> Hi all,
>
> As the comment of ReorderBufferLargestTXN() says, it's very slow with
> many subtransactions:
>
> /*
>  * Find the largest transaction (toplevel or subxact) to evict (spill to disk).
>  *
>  * XXX With many subtransactions this might be quite slow, because we'll have
>  * to walk through all of them. There are some options how we could improve
>  * that: (a) maintain some secondary structure with transactions sorted by
>  * amount of changes, (b) not looking for the entirely largest transaction,
>  * but e.g. for transaction using at least some fraction of the memory limit,
>  * and (c) evicting multiple transactions at once, e.g. to free a given portion
>  * of the memory limit (e.g. 50%).
>  */
>
> This is because the reorderbuffer has transaction entries for each
> top-level and sub transaction, and ReorderBufferLargestTXN() walks
> through all transaction entries to pick the transaction to evict.
> I've heard a report internally that replication lag became huge when
> decoding transactions each consisting of 500k subtransactions. Note
> that ReorderBufferLargestTXN() is used only in non-streaming mode.
>
> Here is a test script for a many-subtransactions scenario. In my
> environment, the logical decoding took over 2 min to decode one top
> transaction having 100k subtransactions.
>
> -----
> create table test (c int);
> create or replace function testfn (cnt int) returns void as $$
> begin
>   for i in 1..cnt loop
>     begin
>       insert into test values (i);
>     exception when division_by_zero then
>       raise notice 'caught error';
>       return;
>     end;
>   end loop;
> end;
> $$
> language plpgsql;
> select testfn(100000);
> set logical_decoding_work_mem to '4MB';
> select count(*) from pg_logical_slot_peek_changes('s', null, null);
> -----
>
> To deal with this problem, I initially thought of the idea (a)
> mentioned in the comment: use a binary heap to maintain the
> transactions sorted by the amount of changes or the size. But it seems
> not a good idea to try to maintain all transactions by their size,
> since the size of each transaction could change frequently.
>
> The attached patch uses a different approach that consists of three
> strategies: (1) maintain a list of transactions whose size is larger
> than 10% of logical_decoding_work_mem, and preferentially evict a
> transaction from this list. If the list is empty, all transactions are
> small enough, (2) so we evict the oldest top-level transaction from
> the rb->toplevel_by_lsn list. Evicting older transactions would help
> in freeing memory blocks in the GenerationContext. Finally, if this is
> also empty, (3) we evict a transaction whose size is > 0. Here, we
> need to note the fact that even if a transaction is evicted, its
> ReorderBufferTXN entry is not removed from rb->by_txn but its size
> becomes 0. In the worst case where all (quite a few) transactions are
> smaller than 10% of the memory limit, we might end up checking many
> transactions to find non-zero-size transaction entries to evict. So
> the patch adds a new list to maintain all transactions that have at
> least one change in memory.
>
> Summarizing the algorithm I've implemented in the patch:
>
> 1. pick a transaction from the list of large transactions (larger than
> 10% of the memory limit).
> 2. pick a transaction from the top-level transaction list in LSN order.
> 3. pick a transaction from the list of transactions that have at least
> one change in memory.
>
> With the patch, the above test case completed within 3 seconds in my
> environment.
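
To make the quoted pick order concrete, here is a rough, self-contained
sketch of the three strategies described above. It is purely illustrative:
the type and function names (SketchTXN, SketchBuffer, pick_txn_to_evict)
and the single next link are invented for this example and are not the
actual reorderbuffer structures, nor the code in the attached patch.

/*
 * Illustrative sketch only: simplified stand-ins for the reorderbuffer
 * structures, showing the three-tier pick order described above.
 */
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

typedef struct SketchTXN
{
    uint64_t    lsn;        /* start LSN, used to find the oldest txn */
    size_t      size;       /* bytes of changes currently in memory */
    struct SketchTXN *next; /* single link reused for brevity; real code
                             * would track membership per list */
} SketchTXN;

typedef struct SketchBuffer
{
    size_t      memory_limit;      /* logical_decoding_work_mem */
    SketchTXN  *large_txns;        /* txns larger than 10% of memory_limit */
    SketchTXN  *toplevel_by_lsn;   /* top-level txns, oldest (lowest LSN) first */
    SketchTXN  *txns_with_changes; /* all txns with at least one change in memory */
} SketchBuffer;

/*
 * Pick the next transaction to evict: (1) any transaction already known
 * to be large, (2) the oldest top-level transaction that still has
 * changes in memory, (3) any remaining transaction with changes in
 * memory (e.g. a subtransaction).
 */
static SketchTXN *
pick_txn_to_evict(SketchBuffer *rb)
{
    if (rb->large_txns != NULL)
        return rb->large_txns;

    for (SketchTXN *txn = rb->toplevel_by_lsn; txn != NULL; txn = txn->next)
    {
        if (txn->size > 0)      /* skip entries already spilled to disk */
            return txn;
    }

    return rb->txns_with_changes;   /* NULL means nothing is evictable */
}

int
main(void)
{
    /* one large txn (> 10% of a 4MB limit) and one small, older txn */
    SketchTXN   big = {.lsn = 200, .size = 1024 * 1024};
    SketchTXN   old = {.lsn = 100, .size = 64 * 1024, .next = &big};

    SketchBuffer rb = {
        .memory_limit = 4 * 1024 * 1024,
        .large_txns = &big,         /* strategy (1) picks this first */
        .toplevel_by_lsn = &old,
        .txns_with_changes = &old,
    };

    SketchTXN  *victim = pick_txn_to_evict(&rb);

    printf("evicting txn at LSN %llu (%zu bytes)\n",
           (unsigned long long) victim->lsn, victim->size);
    return 0;
}

With the lists maintained incrementally as changes are added, the common
case is picking a victim from the head of a list rather than walking every
(sub)transaction entry, which is presumably where the speedup on the
100k-subtransaction test comes from.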
Thanks for working on this. I think it would be good to test other
scenarios as well, where this might have some negative impact, and see
where we stand:

1) A scenario where one very large transaction is consuming ~40% of the
memory limit and 5-6 comparatively smaller transactions are each just
above 10% of the limit. Now, to come back under the memory limit, instead
of evicting the one large transaction we may end up evicting the smaller
ones and spilling multiple times. (For example, with a 10MB limit,
evicting a transaction holding just over 1MB frees far less memory than
evicting the 4MB one, so several evictions may be needed instead of one.)

2) Another scenario where all the transactions are under 10% of the
memory limit, but some transactions are each consuming around 8-9% of the
limit and are not very old, whereas there are many old transactions that
are fairly small, each consuming under 1% of the limit. How would it
affect us if, to come under the memory limit, we frequently select many
of these small old transactions instead of a couple of the large ones
consuming 8-9%?

> As a side note, the idea (c) mentioned in the comment, evicting
> multiple transactions at once to free a given portion of the memory,
> would also help in avoiding going back and forth across the memory
> threshold. It's also worth considering.

Yes, I think it is worth considering.

--
Regards,
Dilip Kumar
EnterpriseDB: http://www.enterprisedb.com