Hey Adam,

Apologies for the delay. As promised on Discord, I did a review of the
current patch (cf 6305) and wanted to share findings that line up with the
thread’s design discussion, plus one additional correctness bug that I was
able to reproduce.

1. In the non-concurrent REFRESH ... WHERE ... path, the UPSERT SQL is
built from the unique index metadata. The code currently uses indnatts
when building the ON CONFLICT (...) target list. That count includes
INCLUDE columns, so for an index like:

CREATE UNIQUE INDEX ON mv (id) INCLUDE (extra);

the generated statement is effectively ON CONFLICT (id, extra) ...,
which fails with:

ERROR: there is no unique or exclusion constraint matching the ON CONFLICT
specification

The fix appears straightforward: use indnkeyatts (key attributes only) when
generating the conflict target, and also when deciding which columns count
as “key” for the UPDATE SET clause. I’ve attached a minimal repro SQL
script (repro_include_issue.sql).
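
In short, the repro boils down to something like this (a sketch only; the
table/MV names are mine, and I’m assuming the patch’s REFRESH ... WHERE
syntax from this thread):

CREATE TABLE base (id int PRIMARY KEY, val text);
INSERT INTO base VALUES (1, 'old');
CREATE MATERIALIZED VIEW mv AS SELECT id, val FROM base;
CREATE UNIQUE INDEX ON mv (id) INCLUDE (val);
UPDATE base SET val = 'new' WHERE id = 1;
REFRESH MATERIALIZED VIEW mv WHERE id = 1;
-- ERROR: there is no unique or exclusion constraint matching
-- the ON CONFLICT specification

The upsert generated for the last statement targets ON CONFLICT (id, val),
and no unique constraint covers that pair.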

2. A smaller test-quality issue: the regression script has a comment
“Subqueries -> Error”, but the expected output shows no error for the
schema-qualified subquery case. There is no explicit check forbidding
subqueries in transformRefreshWhereClause(), so schema-qualified subqueries
appear to be allowed; either the comment or the behavior should change.
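
For example, something like the following appears to go through cleanly
(hypothetical statement; the names are mine):

REFRESH MATERIALIZED VIEW mv
    WHERE id IN (SELECT id FROM public.base);
-- the test comment says "Error", but the expected output is success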

Moving on to the broader design questions:

> I believe the issue is that the DELETE -> INSERT strategy leaves a
> consistency gap. Since we relied on ROW EXCLUSIVE locks to allow concurrent
> reads, the moment we delete the rows, we lose the physical lock on them. If
> a concurrent transaction inserts a colliding row during that gap, the
> materialized view ends up inconsistent with the base query (or hits a
> constraint violation).


The consistency gap in the non-concurrent mode matches what I’d expect:
with ROW EXCLUSIVE you allow concurrent readers/writers, and a pure
DELETE → INSERT approach can create a window where the old tuple is gone
and a concurrent session can insert a conflicting logical row.
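
To make the window concrete, here is the interleaving I have in mind,
reduced to the effective SQL (a sketch; the key value and names are mine,
and ... stands for the view query restricted to the predicate):

-- Session 1: REFRESH ... WHERE id = 1, as its DELETE -> INSERT steps
BEGIN;
DELETE FROM mv WHERE id = 1;    -- the row lock disappears with the tuple

-- Session 2, during the gap: another writer (e.g. a concurrent partial
-- refresh of the same key) re-inserts id = 1 and commits

INSERT INTO mv SELECT ... WHERE id = 1;   -- Session 1 re-inserts from the
COMMIT;                                   -- base query: constraint violation,
                                          -- or an MV inconsistent with the
                                          -- base query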

That said, I think it would help the patch to explicitly define the
intended safety model:
1. Is the goal to be safe against concurrent DML on base tables only
(i.e., the refresh sees a snapshot and updates the MV accordingly), or
also to be safe against concurrent partial refreshes and direct writes to
the MV (when maintenance is enabled)?
2. Should the non-concurrent partial refresh be “best effort” like normal
DML (the user coordinates concurrent access), or should it be
“maintenance-like” (serialized / logically safe by default)?

If the intent is “safe by default”, I’d encourage documenting very clearly
what is and isn’t guaranteed, and adding regression tests and README-style
notes for the known footguns.

From a reviewer standpoint, I think the feature concept is sound and
valuable, but it needs a crisp statement of semantics and safety
boundaries. The tricky part is exactly what you called out: incremental
refresh implies concurrency questions that aren’t present with full rebuild
+ strong locks.

I’m happy to keep reviewing iterations (especially around the advisory lock
approach), and I’ll attach the reproduction scripts and notes I used.
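
Re the advisory lock approach: here is how I read the proposed sequence if
written out as user-level SQL (a sketch only; in the patch this would run
inside the refresh logic, <predicate> and ... are placeholders, and the
key derivation mirrors your hashtext(ROW(...)::text) suggestion):

BEGIN;
-- take a logical lock per matching key before the physical delete
SELECT pg_advisory_xact_lock('mv'::regclass::oid::int,
                             hashtext(ROW(id)::text))
  FROM mv WHERE <predicate>;
DELETE FROM mv WHERE <predicate>;
INSERT INTO mv SELECT ... ;   -- re-run the view query for the predicate
COMMIT;                       -- xact-level advisory locks released here

One nice property of the xact-level variant is that the lock spans the
whole DELETE → INSERT window with no explicit unlock needed.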

As a possible staging approach: it might be simplest to start with a
conservative serialization model for non-concurrent WHERE (while still
allowing readers), and then iterate toward finer-grained logical locking
if/when needed for throughput.


Thanks,
Dharin


On Sun, Jan 4, 2026 at 3:56 AM Adam Brusselback <[email protected]>
wrote:

> Hi all,
>
> I've been running some more concurrency tests against this patch
> (specifically looking for race conditions), and I found a flaw in the
> implementation for the  REFRESH ... WHERE ... mode (without CONCURRENTLY).
>
> I believe the issue is that the DELETE -> INSERT strategy leaves a
> consistency gap. Since we relied on ROW EXCLUSIVE locks to allow concurrent
> reads, the moment we delete the rows, we lose the physical lock on them. If
> a concurrent transaction inserts a colliding row during that gap, the
> materialized view ends up inconsistent with the base query (or hits a
> constraint violation).
>
> I initially was using SELECT ... FOR UPDATE to lock the rows before
> modification, but that lock is (now that I know) obviously lost when the
> row is deleted.
>
> My plan is to replace that row-locking strategy with transaction-level
> advisory locks inside the refresh logic:
>
> Before the DELETE, run a SELECT pg_advisory_xact_lock(mv_oid,
> hashtext(ROW(unique_keys)::text)) for the rows matching the WHERE clause.
>
> This effectively locks the "logical" ID of the row, preventing concurrent
> refreshes on the same ID even while the physical tuple is temporarily gone.
> Hash collisions should not have any correctness issues that I can think of.
>
> However, before I sink time into implementing that fix:
>
> Is there general interest in having REFRESH MATERIALIZED VIEW ... WHERE
> ... in core?
> If the community feels this feature is a footgun or conceptually wrong for
> Postgres, I'd rather know now before spending more time on this.
>
> If the feature concept is sound, does the advisory lock approach seem like
> the right way to handle the concurrency safety here?
>
> Thanks,
> Adam Brusselback
>

Attachment: test_include_bug.sql