Hi all,

I've been running some more concurrency tests against this patch
(specifically looking for race conditions), and I found a flaw in the
implementation for the  REFRESH ... WHERE ... mode (without CONCURRENTLY).

I believe the issue is that the DELETE -> INSERT strategy leaves a
consistency gap. Since we relied on ROW EXCLUSIVE locks to allow concurrent
reads, the moment we delete the rows, we lose the physical lock on them. If
a concurrent transaction inserts a colliding row during that gap, the
materialized view ends up inconsistent with the base query (or hits a
constraint violation).

I initially was using SELECT ... FOR UPDATE to lock the rows before
modification, but that lock is (now that I know) obviously lost when the
row is deleted.

My plan is to replace that row-locking strategy with transaction-level
advisory locks inside the refresh logic:

Before the DELETE, run a SELECT pg_advisory_xact_lock(mv_oid,
hashtext(ROW(unique_keys)::text)) for the rows matching the WHERE clause.

This effectively locks the "logical" ID of the row, preventing concurrent
refreshes on the same ID even while the physical tuple is temporarily gone.
Hash collisions should not have any correctness issues that I can think of.

However, before I sink time into implementing that fix:

Is there general interest in having REFRESH MATERIALIZED VIEW ... WHERE ...
in core?
If the community feels this feature is a footgun or conceptually wrong for
Postgres, I'd rather know now before spending more time on this.

If the feature concept is sound, does the advisory lock approach seem like
the right way to handle the concurrency safety here?

Thanks,
Adam Brusselback

Reply via email to