Hello. The few things I have got so far:
1) It is not required to order by random() to reproduce the issue - it could be done using queries like: BEGIN; SELECT omg.* FROM something_is_wrong_here AS omg ORDER BY value -- change is here LIMIT 1 FOR UPDATE \gset UPDATE something_is_wrong_here SET value = :value + 1 WHERE id = :id; COMMIT; But for some reason it is harder to reproduce without random in my case (typically need to wait for about a minute with 100 connections). 2) It is not an issue at table creation time. Issue is reproducible if vacuum_defer_cleanup_age set after table preparation. 3) To reproduce the issue, vacuum_defer_cleanup_age should flip xid over zero (be >= txid_current()). And it is stable.... So, for example - unable to reproduce with 733 value, but 734 gives error each time. Just a single additional txid_current() (after data is filled) fixes a crash... It looks like the first SELECT FOR UPDATE + UPDATE silently poisons everything somehow. You could use such PSQL script: DROP TABLE IF EXISTS something_is_wrong_here; CREATE TABLE something_is_wrong_here (id bigserial PRIMARY KEY, value numeric(15,4) DEFAULT 0 NOT NULL); INSERT INTO something_is_wrong_here (value) (SELECT 10000 from generate_series(0, 100)); SELECT txid_current() \gset SELECT :txid_current + 1 as txid \gset ALTER SYSTEM SET vacuum_defer_cleanup_age to :txid;SELECT pg_reload_conf(); I have attached some scripts if someone goes to reproduce. Best regards, Michail.
reproduce.sh
Description: application/shellscript
reproduce.bench
Description: Binary data
scheme.sql
Description: application/sql