Hi Hackers, PG19 added support for stats on virtual generated columns [1]. Creating extended statistics on a virtual generated column whose expression can raise an error leads to ANALYZE failing repeatedly, and autovacuum retrying indefinitely. This floods the server logs and also wastes resources. Vacuum analyze on that column (without extended stats) succeeds.
In order to avoid retry storms, I think we have two options. (1)
skipping the offending row from the sample, (2) skipping the extended stats
computation for that table with a warning message. At least this avoid
autovacuum infinite retry. Attached a draft patch for the option (2).
Thoughts?
Repro:
CREATE TABLE t (
id int PRIMARY KEY,
a int,
gen int GENERATED ALWAYS AS (100 / a) VIRTUAL
);
INSERT INTO t VALUES (1, 10), (2, 5), (3, 0);
-- This succeeds (per-column stats don't evaluate the expression for
every row)
ANALYZE t;
-- Add extended statistics referencing the virtual gen col
CREATE STATISTICS t_stat ON a, gen FROM t;
-- This fails
ANALYZE t;
-- ERROR: division by zero
-- this succeeds
ANALYZE t(gen)
[1]:
https://www.postgresql.org/message-id/flat/20250422181006.dd6f9d1d81299f5b2ad55e1a%40sraoss.co.jp
Thanks,
Satya
v1-0001-fix-analyze-extended-stats-virtual-gen-col.patch
Description: Binary data
v1-0001-test-analyze-extended-stats-virtual-gen-col.patch
Description: Binary data
