Re: Consider Parallelism While Planning For REFRESH MATERIALIZED VIEW

Thomas Munro Tue, 16 Mar 2021 19:08:54 -0700

On Tue, Mar 16, 2021 at 2:41 PM Thomas Munro <thomas.mu...@gmail.com> wrote:
> On Mon, Mar 15, 2021 at 8:25 PM Bharath Rupireddy
> <bharath.rupireddyforpostg...@gmail.com> wrote:
> > > > The problem with a case like REFRESH MATERIALIZED VIEW is that there's
> > > > nothing to prevent something that gets run in the course of the query
> > > > from trying to access the view (and the heavyweight lock won't prevent
> > > > that, due to group locking). That's probably a stupid thing to do,
> > > > but it can't be allowed to break the world. The other cases are safe
> > > > from that particular problem because the table doesn't exist yet.
> >
> > Please correct me if my understanding of the above comment (from the
> > commit e9baa5e9) is wrong - even if the leader opens the matview
> > relation in exclusive mode, because of group locking (in case we allow
> > parallel workers to feed in the data to the new heap that gets created
> > for RMV, see ExecRefreshMatView->make_new_heap), can other sessions
> > still access the matview relation with older data?
> >
> > I performed below testing to prove myself wrong for the above understanding:
> > session 1:
> > 1) added few rows to the table t1 on which the mv1 is defined;
> > 2) refresh materialized view mv1;
> >
> > session 2:
> > 1) select count(*) from mv1;   ---> this query is blocked until
> > session 1's step (2) is completed and gives the latest result even if
> > the underlying data-generating query runs select part in parallel.
> >
> > Is there anything I'm missing here?
>
> I think he was talking about things like functions that try to access
> the mv from inside the same query, in a worker.  I haven't figured out
> exactly which hazards he meant.  I thought about wrong-relfilenode
> hazards and combocid hazards, but considering the way this thing
> always inserts into a fresh table before performing merge or swap
> steps later, I don't yet see why this is different from any other
> insert-from-select-with-gather.


I asked Robert whether he had some hazard in mind, when he wrote that,
that we haven't already discussed here, and he didn't recall any.  I
think we're OK here.

I added the "concurrently" variant to the regression test, just to get
it exercised too.
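
For anyone who wants to try both variants by hand, here is a rough
sketch.  It is not the committed regression test; the names t1 and mv1
are borrowed from the test described earlier in the thread, and the
column and row counts are made up for illustration.  Whether the
underlying query actually gets a parallel plan depends on table size
and planner settings.

```sql
-- Hypothetical setup: t1 and mv1 are illustrative names, not the
-- objects used in the committed regression test.
create table t1 as select g as id from generate_series(1, 100000) g;
create materialized view mv1 as select id from t1;

-- REFRESH ... CONCURRENTLY requires a unique index on the matview.
create unique index on mv1 (id);

-- Add rows to the base table, then exercise both refresh variants.
insert into t1 select g from generate_series(100001, 200000) g;
refresh materialized view mv1;
refresh materialized view concurrently mv1;

-- The matview should now reflect the newly inserted rows.
select count(*) from mv1;
```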

The documentation needed a small tweak where we have a list of
data-writing commands that are allowed to use parallelism.  That run
of sentences was getting a bit tortured so I converted it into a
bullet list; I hope I didn't upset the documentation style police.

Pushed.  Thanks for working on this!  This is really going to fly with
INSERT pushdown.  The 3 merge queries used by CONCURRENTLY will take
some more work.
