On 05/09/2023 06:16, Tom Lane wrote:
Heikki Linnakangas <hlinn...@iki.fi> writes:
With shared_buffers='20MB', the tests passed. I'm going to change it
back to 10MB now, so that we continue to cover that case.

So chipmunk is getting through the core tests now, but instead it
is failing in contrib/pg_visibility [1]:

diff -U3 
/home/pgbfarm/buildroot/HEAD/pgsql.build/contrib/pg_visibility/expected/pg_visibility.out
 
/home/pgbfarm/buildroot/HEAD/pgsql.build/contrib/pg_visibility/results/pg_visibility.out
--- 
/home/pgbfarm/buildroot/HEAD/pgsql.build/contrib/pg_visibility/expected/pg_visibility.out
   2022-10-08 19:00:15.905074105 +0300
+++ 
/home/pgbfarm/buildroot/HEAD/pgsql.build/contrib/pg_visibility/results/pg_visibility.out
    2023-09-02 00:25:51.814148116 +0300
@@ -218,7 +218,8 @@
       0 | t           | t
       1 | t           | t
       2 | t           | t
-(3 rows)
+     3 | f           | f
+(4 rows)
select * from pg_check_frozen('copyfreeze');
   t_ctid

I find this easily reproducible by setting shared_buffers=10MB.
But I'm confused about why, because the affected test case
dates to Tomas' commit 7db0cd214 of 2021-01-17, and chipmunk
passed many times after that.  Might be worth bisecting in
the interval where chipmunk wasn't reporting?

I bisected it to this:

commit 82a4edabd272f70d044faec8cf7fd1eab92d9991 (HEAD)
Author: Andres Freund <and...@anarazel.de>
Date:   Mon Aug 14 09:54:03 2023 -0700

    hio: Take number of prior relation extensions into account

The new relation extension logic, introduced in 00d1e02be24, could lead to slowdowns in some scenarios. E.g., when loading narrow rows into a table using COPY, the caller of RelationGetBufferForTuple() will only request a small number of pages. Without concurrency, we just extended using pwritev() in that case. However, if there is *some* concurrency, we switched between extending by a small number of pages and a larger number of pages, depending on the
    number of waiters for the relation extension logic.  However, some
filesystems, XFS in particular, do not perform well when switching between
    extending files using fallocate() and pwritev().

To avoid that issue, remember the number of prior relation extensions in BulkInsertState and extend more aggressively if there were prior relation extensions. That not just avoids the aforementioned slowdown, but also leads
    to noticeable performance gains in other situations, primarily due to
extending more aggressively when there is no concurrency. I should have done
    it this way from the get go.

    Reported-by: Masahiko Sawada <sawada.m...@gmail.com>
    Author: Andres Freund <and...@anarazel.de>
Discussion: https://postgr.es/m/CAD21AoDvDmUQeJtZrau1ovnT_smN940=kp6mszngk3bq9yr...@mail.gmail.com
    Backpatch: 16-, where the new relation extension code was added


Before this patch, the test table was 3 pages long, now it is 4 pages with a small shared_buffers setting.

In this test, the relation needs to be at least 3 pages long to hold all the COPYed rows. With a larger shared_buffers, the table is extended to three pages in a single call to heap_multi_insert(). With shared_buffers='10 MB', the table is extended twice, because LimitAdditionalPins() restricts how many pages are extended in one go to two pages. With the logic that that commit added, we first extend the table with 2 pages, then with 2 pages again.

I think the behavior is fine. The reasons given in the commit message make sense. But it would be nice to silence the test failure. Some alternatives:

a) Add an alternative expected output file

b) Change the pg_visibilitymap query so that it passes even if the table has four pages. "select * from pg_visibility_map('copyfreeze') where blkno <= 3";

c) Change the extension logic so that we don't extend so much when the table is small. The efficiency of bulk extension doesn't matter when the table is tiny, so arguably we should rather try to minimize the table size. If you have millions of tiny tables, allocating one extra block on each adds up.

d) Copy fewer rows to the table in the test. If we copy only 6 rows, for example, the table will have only two pages, regardless of shared_buffers.

I'm leaning towards d). The whole test is a little fragile, it will also fail with a non-default block size, for example. But c) seems like a simple fix and wouldn't look too out of place in the test.

--
Heikki Linnakangas
Neon (https://neon.tech)



Reply via email to