Hello Amit,

03.01.2024 14:42, Amit Kapila wrote:

So I started to think about other approach: to perform unlink as it's
implemented now, but then wait until the DELETE_PENDING state is gone.

There is a comment in the code which suggests we shouldn't wait
indefinitely. See "However, we won't wait indefinitely for someone
else to close the file, as the caller might be holding locks and
blocking other backends."

Yes, I saw it, but initially I thought that we have a transient condition
there, so waiting in open() (instead of failing immediately) seemed like a
good idea then...

And the internal process is ... background writer (BgBufferSync()).

So, I tried just adding bgwriter_lru_maxpages = 0 to postgresql.conf and
got 20 x 10 tests passing.

Thus, it we want just to get rid of the test failure, maybe it's enough to
add this to the test's config...

What about checkpoints? Can't it do the same while writing the buffers?

As we deal here with pg_upgrade/pg_restore, it must not be very easy to get
the desired effect, but I think it's not impossible in principle.
More details below.
What happens during the pg_upgrade execution is essentially:
1) CREATE DATABASE "postgres" WITH TEMPLATE = template0 OID = 5 ...;
-- this command flushes file buffers as well
2) ALTER DATABASE postgres OWNER TO ...
3) COMMENT ON DATABASE "postgres" IS ...
4)     -- For binary upgrade, preserve pg_largeobject and index relfilenodes
    SELECT 
pg_catalog.binary_upgrade_set_next_index_relfilenode('2683'::pg_catalog.oid);
    SELECT 
pg_catalog.binary_upgrade_set_next_heap_relfilenode('2613'::pg_catalog.oid);
    TRUNCATE pg_catalog.pg_largeobject;
--  ^^^ here we can get the error "could not create file "base/5/2683": File 
exists"
...

We get the effect discussed when the background writer process decides to
flush a file buffer for pg_largeobject during stage 1.
(Thus, if a checkpoint somehow happened to occur during CREATE DATABASE,
the result must be the same.)
And another important factor is shared_buffers = 1MB (set during the test).
With the default setting of 128MB I couldn't see the failure.

It can be reproduced easily (on old Windows versions) just by running
pg_upgrade in a loop (I've got failures on iterations 22, 37, 17 (with the
default cluster)).
If an old cluster contains dozen of databases, this increases the failure
probability significantly (with 10 additional databases I've got failures
on iterations 4, 1, 6).

Please see the reproducing script attached.

Best regards,
Alexander
@echo off
REM define PGBIN and PATH
set PGBIN=%cd%\tmp_install\usr\local\pgsql\bin
set PATH=%PGBIN%;%PATH%

setlocal enabledelayedexpansion

rmdir /S /Q tmpdb
initdb -D tmpdb >initdb.log 2>&1
echo shared_buffers = 1MB>>tmpdb\postgresql.conf

set /a i = 0
:LOOP
set /a i+=1
echo ITERATION %i%

rmdir /S /Q tmpdb_old
xcopy /e /i /q /r tmpdb tmpdb_old\

pg_ctl -D tmpdb_old -l server.log start
echo SELECT concat('CREATE DATABASE db', g) FROM generate_series(1, 10) g 
\gexec | psql -d postgres
pg_ctl -D tmpdb_old stop

rmdir /S /Q tmpdb_new
xcopy /e /i /q /r tmpdb tmpdb_new\

pg_upgrade --no-sync -r -b "%PGBIN%" -B "%PGBIN%" -d tmpdb_old -D tmpdb_new
if %ERRORLEVEL% NEQ 0 goto ERR
goto LOOP

:ERR
echo ERROR
pause

:EXIT

Reply via email to