Re: Sync scan & regression tests

Heikki Linnakangas Mon, 18 Sep 2023 03:49:53 -0700

On 05/09/2023 06:16, Tom Lane wrote:

Heikki Linnakangas <hlinn...@iki.fi> writes:

With shared_buffers='20MB', the tests passed. I'm going to change it
back to 10MB now, so that we continue to cover that case.


So chipmunk is getting through the core tests now, but instead it
is failing in contrib/pg_visibility [1]:

diff -U3 /home/pgbfarm/buildroot/HEAD/pgsql.build/contrib/pg_visibility/expected/pg_visibility.out /home/pgbfarm/buildroot/HEAD/pgsql.build/contrib/pg_visibility/results/pg_visibility.out
--- /home/pgbfarm/buildroot/HEAD/pgsql.build/contrib/pg_visibility/expected/pg_visibility.out  2022-10-08 19:00:15.905074105 +0300
+++ /home/pgbfarm/buildroot/HEAD/pgsql.build/contrib/pg_visibility/results/pg_visibility.out  2023-09-02 00:25:51.814148116 +0300
@@ -218,7 +218,8 @@
       0 | t           | t
       1 | t           | t
       2 | t           | t
-(3 rows)
+     3 | f           | f
+(4 rows)

select * from pg_check_frozen('copyfreeze');

   t_ctid

I find this easily reproducible by setting shared_buffers=10MB.
But I'm confused about why, because the affected test case
dates to Tomas' commit 7db0cd214 of 2021-01-17, and chipmunk
passed many times after that.  Might be worth bisecting in
the interval where chipmunk wasn't reporting?


I bisected it to this:

commit 82a4edabd272f70d044faec8cf7fd1eab92d9991 (HEAD)
Author: Andres Freund <and...@anarazel.de>
Date:   Mon Aug 14 09:54:03 2023 -0700

    hio: Take number of prior relation extensions into account

    The new relation extension logic, introduced in 00d1e02be24, could lead to
    slowdowns in some scenarios. E.g., when loading narrow rows into a table using
    COPY, the caller of RelationGetBufferForTuple() will only request a small
    number of pages. Without concurrency, we just extended using pwritev() in that
    case. However, if there is *some* concurrency, we switched between extending
    by a small number of pages and a larger number of pages, depending on the
    number of waiters for the relation extension logic.  However, some
    filesystems, XFS in particular, do not perform well when switching between
    extending files using fallocate() and pwritev().

    To avoid that issue, remember the number of prior relation extensions in
    BulkInsertState and extend more aggressively if there were prior relation
    extensions. That not just avoids the aforementioned slowdown, but also leads
    to noticeable performance gains in other situations, primarily due to
    extending more aggressively when there is no concurrency. I should have done
    it this way from the get go.

    Reported-by: Masahiko Sawada <sawada.m...@gmail.com>
    Author: Andres Freund <and...@anarazel.de>

    Discussion: https://postgr.es/m/CAD21AoDvDmUQeJtZrau1ovnT_smN940=kp6mszngk3bq9yr...@mail.gmail.com

    Backpatch: 16-, where the new relation extension code was added

Before this patch, the test table was 3 pages long; now it is 4 pages with a small shared_buffers setting.

In this test, the relation needs to be at least 3 pages long to hold all the COPYed rows. With a larger shared_buffers, the table is extended to three pages in a single call to heap_multi_insert(). With shared_buffers='10 MB', the table is extended twice, because LimitAdditionalPins() restricts how many pages are extended in one go to two pages. With the logic that that commit added, we first extend the table with 2 pages, then with 2 pages again.
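As a rough illustration of the arithmetic above, here is a toy Python model. It is not the hio.c code: the function name, the parameters, and the exact request formula (pages still needed, plus pages added by earlier calls, capped by a pin limit standing in for LimitAdditionalPins()) are assumptions made for this sketch, and the limit of 64 used for the "larger shared_buffers" case is an arbitrary stand-in.

```python
def simulate_extensions(pages_needed, pin_limit, count_prior_extensions):
    """Return the per-call extension sizes for a toy bulk insert.

    Toy model only (hypothetical helper, not PostgreSQL code):
    each call requests the pages still needed, optionally adds the
    pages already added by earlier calls (the behavior attributed to
    commit 82a4edabd27), and is then capped at pin_limit.
    """
    if pin_limit < 1:
        raise ValueError("pin_limit must be at least 1")

    extensions = []
    already_extended_by = 0
    while already_extended_by < pages_needed:
        request = pages_needed - already_extended_by
        if count_prior_extensions:
            # Assumed rule: extend more aggressively after prior extensions.
            request += already_extended_by
        granted = min(request, pin_limit)
        extensions.append(granted)
        already_extended_by += granted
    return extensions


if __name__ == "__main__":
    # Small shared_buffers (limit of 2 pages per call), with and without
    # the prior-extension rule, then a large limit.
    for args in [(3, 2, True), (3, 2, False), (3, 64, True)]:
        sizes = simulate_extensions(*args)
        print(args, sizes, "total pages:", sum(sizes))
```

Under these assumptions, a two-page limit with the prior-extension rule gives two extensions of 2 pages (4 pages in total), the same limit without the rule gives 2 + 1 = 3 pages, and a large limit gives a single 3-page extension, which matches the behavior described above.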

I think the behavior is fine. The reasons given in the commit message make sense. But it would be nice to silence the test failure. Some alternatives:


a) Add an alternative expected output file

b) Change the pg_visibility_map query so that it passes even if the table has four pages: "select * from pg_visibility_map('copyfreeze') where blkno < 3";

c) Change the extension logic so that we don't extend so much when the table is small. The efficiency of bulk extension doesn't matter when the table is tiny, so arguably we should rather try to minimize the table size. If you have millions of tiny tables, allocating one extra block on each adds up.

d) Copy fewer rows to the table in the test. If we copy only 6 rows, for example, the table will have only two pages, regardless of shared_buffers.

I'm leaning towards d). The whole test is a little fragile; it will also fail with a non-default block size, for example. But d) seems like a simple fix and wouldn't look too out of place in the test.


--
Heikki Linnakangas
Neon (https://neon.tech)
