Re: generic plans and "initial" pruning

2024-06-19 Thread Alvaro Herrera
I had occasion to run the same benchmark you described in the initial
email in this thread.  To do so I applied patch series v49 on top of
07cb29737a4e, which is just one that happened to have the same date as
v49.

I then used a script like this (against a server having
plan_cache_mode=force_generic_plan)

for numparts in 0 1 2 4 8 16 32 48 64 80 81 96 127 128 160 200 256 257 288 300 384 512 1024 1536 2048; do
  pgbench testdb -i --partitions=$numparts 2>/dev/null
  echo -ne "$numparts\t"
  pgbench -n testdb -S -T30 -Mprepared | grep "^tps" | sed -e 's/^tps = \([0-9.]*\) .*/\1/'
done

and did the same with the commit mentioned above (that is, unpatched).
I got this table as result

 partitions │   patched    │  07cb29737a
────────────┼──────────────┼──────────────
          0 │ 65632.090431 │ 68967.712741
          1 │ 68096.641831 │ 65356.587223
          2 │ 59456.507575 │ 60884.679464
          4 │    62097.426 │ 59698.747104
          8 │ 58044.311175 │ 57817.104562
         16 │ 59741.926563 │ 52549.916262
         32 │ 59261.693449 │ 44815.317215
         48 │ 59047.125629 │ 38362.123652
         64 │ 59748.738797 │ 34051.158525
         80 │ 59276.839183 │ 32026.135076
         81 │ 62318.572932 │ 30418.122933
         96 │ 59678.857163 │ 28478.113651
        127 │ 58761.960028 │ 24272.303742
        128 │ 59934.268306 │ 24275.214593
        160 │ 56688.790899 │ 21119.043564
        200 │ 56323.188599 │ 18111.212849
        256 │  55915.22466 │ 14753.953709
        257 │ 57810.530461 │ 15093.497575
        288 │ 56874.780092 │ 13873.332162
        300 │ 57222.056549 │ 13463.768946
        384 │  54073.77295 │ 11183.558339
        512 │ 37503.766847 │   8114.32532
       1024 │ 42746.866448 │   4468.41359
       1536 │  39500.58411 │  3049.984599
       2048 │ 36988.519486 │  2269.362006

where already at 16 partitions we can see that things are going downhill
with the unpatched code.  (However, what happens when the table is not
partitioned looks a bit funny.)

I hope we can get this new executor code in 18.

-- 
Álvaro Herrera               Breisgau, Deutschland  —  https://www.EnterpriseDB.com/
"La primera ley de las demostraciones en vivo es: no trate de usar el sistema.
Escriba un guión que no toque nada para no causar daños." (Jakob Nielsen)




Re: generic plans and "initial" pruning

2024-05-20 Thread Amit Langote
On Sun, May 19, 2024 at 9:39 AM David Rowley  wrote:
> For #1, the locks taken for SELECT queries are less likely to conflict
> with other locks obtained by PostgreSQL, but at least at the moment if
> someone is getting deadlocks with a DDL type operation, they can
> change their query or DDL script so that locks are taken in the same
> order.  If we allowed executor startup to do this then if someone
> comes complaining that PG18 deadlocks when PG17 didn't we'd just have
> to tell them to live with it.  There's a comment at the bottom of
> find_inheritance_children_extended() just above the qsort() which
> explains about the deadlocking issue.

Thought to chime in on this.

A deadlock can occur with the execution-time locking proposed in the
patch if a concurrent DDL script makes assumptions about the order in
which a cached plan's execution will lock the children of multiple
parent relations.  Specifically, the deadlock can happen if the script
locks the child relations directly, instead of locking them through
their respective parent relations.  The patch doesn't change the order
in which relations mentioned directly in the query are locked; that
order is determined by AcquirePlannerLocks().

--
Thanks, Amit Langote




Re: generic plans and "initial" pruning

2024-05-18 Thread David Rowley
On Sun, 19 May 2024 at 13:27, Tom Lane  wrote:
>
> David Rowley  writes:
> > 1. No ability to control the order that the locks are obtained. The
> > order in which the locks are taken will be at the mercy of the plan
> > the planner chooses.
>
> I do not think I buy this argument, because plancache.c doesn't
> provide any "ability to control the order" today, and never has.
> The order in which AcquireExecutorLocks re-gets relation locks is only
> weakly related to the order in which the parser/planner got them
> originally.  The order in which AcquirePlannerLocks re-gets the locks
> is even less related to the original.  This doesn't cause any big
> problems that I'm aware of, because these locks are fairly weak.

It may not bite many people, but if it does, I don't see what we could
do to help them.  At the moment we could tell them to adjust their DDL
script to obtain the locks in the same order as their query.  With your
idea that cannot be done, as the order could change when the planner
switches the join order.

> I think we do have a guarantee that for partitioned tables, parents
> will be locked before children, and that's probably valuable.
> But an executor-driven lock order could preserve that property too.

I think you'd have to lock the parent before the child, and that
property would be preserved anyway if locks were taken during a
breadth-first traversal of the plan.

> > For #3, I've been thinking about what improvements we can do to make
> > the executor more efficient. In [1], Andres talks about some very
> > interesting things. In particular, in his email items 3) and 5) are
> > relevant here. If we did move lots of executor startup code into the
> > planner, I think it would be possible to one day get rid of executor
> > startup and have the plan record how much memory is needed for the
> > non-readonly part of the executor state and tag each plan node with
> > the offset in bytes they should use for their portion of the executor
> > working state.
>
> I'm fairly skeptical about that idea.  The entire reason we have an
> issue here is that we want to do runtime partition pruning, which
> by definition can't be done at plan time.  So I doubt it's going
> to play nice with what we are trying to accomplish in this thread.

I think we could have both, provided there was still a way to traverse
the executor state tree in EXPLAIN.  We'd need a way to skip portions of
the plan that are not relevant or could be invalid for the current
execution, e.g. we can't show an Index Scan whose index has been
dropped.

> > I think what Amit had before your objection was starting to turn into
> > something workable and we should switch back to working on that.
>
> The reason I posted this idea was that I didn't think the previously
> existing patch looked promising at all.

Ok.  It would be good if you could expand on that so we could
determine if there's some fundamental reason it can't work or if
that's because you were blinded by your epiphany and didn't give that
any thought after thinking of the alternative idea.

I've gone to effort to point out things that I think are concerning
with your idea. It would be good if you could do the same for the
previous patch other than "it didn't look promising". It's pretty hard
for me to argue with that level of detail.

David




Re: generic plans and "initial" pruning

2024-05-18 Thread Tom Lane
David Rowley  writes:
> With the caveat of not yet having looked at the latest patch, my
> thoughts are that having the executor startup responsible for taking
> locks is a bad idea and I don't think we should go down this path.

OK, it's certainly still up for argument, but ...

> 1. No ability to control the order that the locks are obtained. The
> order in which the locks are taken will be at the mercy of the plan
> the planner chooses.

I do not think I buy this argument, because plancache.c doesn't
provide any "ability to control the order" today, and never has.
The order in which AcquireExecutorLocks re-gets relation locks is only
weakly related to the order in which the parser/planner got them
originally.  The order in which AcquirePlannerLocks re-gets the locks
is even less related to the original.  This doesn't cause any big
problems that I'm aware of, because these locks are fairly weak.
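
For reference, the heart of AcquireExecutorLocks() is just a walk over
each PlannedStmt's flat range table in index order (a simplified sketch;
utility statements and non-relation RTEs are handled separately):

foreach(lc, plannedstmt->rtable)
{
    RangeTblEntry *rte = (RangeTblEntry *) lfirst(lc);

    if (rte->rtekind != RTE_RELATION)
        continue;           /* other RTE kinds are skipped or special-cased */

    if (acquire)
        LockRelationOid(rte->relid, rte->rellockmode);
    else
        UnlockRelationOid(rte->relid, rte->rellockmode);
}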

I think we do have a guarantee that for partitioned tables, parents
will be locked before children, and that's probably valuable.
But an executor-driven lock order could preserve that property too.

> 2. It introduces lots of complexity regarding how to cleanly clean up
> after a failed executor startup which is likely to make exec startup
> slower and the code more complex

Perhaps true, I'm not sure.  But the patch we'd been discussing
before this proposal was darn complex as well.

> 3. It puts us even further down the path of actually needing an
> executor startup phase.

Huh?  We have such a thing already.

> For #1, the locks taken for SELECT queries are less likely to conflict
> with other locks obtained by PostgreSQL, but at least at the moment if
> someone is getting deadlocks with a DDL type operation, they can
> change their query or DDL script so that locks are taken in the same
> order.  If we allowed executor startup to do this then if someone
> comes complaining that PG18 deadlocks when PG17 didn't we'd just have
> to tell them to live with it.  There's a comment at the bottom of
> find_inheritance_children_extended() just above the qsort() which
> explains about the deadlocking issue.

The reason it's important there is that function is (sometimes)
used for lock modes that *are* exclusive.

> For #3, I've been thinking about what improvements we can do to make
> the executor more efficient. In [1], Andres talks about some very
> interesting things. In particular, in his email items 3) and 5) are
> relevant here. If we did move lots of executor startup code into the
> planner, I think it would be possible to one day get rid of executor
> startup and have the plan record how much memory is needed for the
> non-readonly part of the executor state and tag each plan node with
> the offset in bytes they should use for their portion of the executor
> working state.

I'm fairly skeptical about that idea.  The entire reason we have an
issue here is that we want to do runtime partition pruning, which
by definition can't be done at plan time.  So I doubt it's going
to play nice with what we are trying to accomplish in this thread.

Moreover, while "replace a bunch of small pallocs with one big one"
would save some palloc effort, what are you going to do to ensure
that that memory has the right initial contents?  I think this idea is
likely to make the executor a great deal more notationally complex
without actually buying all that much.  Maybe Andres can make it work,
but I don't want to contort other parts of the system design on the
purely hypothetical basis that this might happen.

> I think what Amit had before your objection was starting to turn into
> something workable and we should switch back to working on that.

The reason I posted this idea was that I didn't think the previously
existing patch looked promising at all.

regards, tom lane




Re: generic plans and "initial" pruning

2024-05-18 Thread David Rowley
On Fri, 20 Jan 2023 at 08:39, Tom Lane  wrote:
> I spent some time re-reading this whole thread, and the more I read
> the less happy I got.  We are adding a lot of complexity and introducing
> coding hazards that will surely bite somebody someday.  And after awhile
> I had what felt like an epiphany: the whole problem arises because the
> system is wrongly factored.  We should get rid of AcquireExecutorLocks
> altogether, allowing the plancache to hand back a generic plan that
> it's not certain of the validity of, and instead integrate the
> responsibility for acquiring locks into executor startup.  It'd have
> to be optional there, since we don't need new locks in the case of
> executing a just-planned plan; but we can easily add another eflags
> bit (EXEC_FLAG_GET_LOCKS or so).  Then there has to be a convention
> whereby the ExecInitNode traversal can return an indicator that
> "we failed because the plan is stale, please make a new plan".

I also reread the entire thread up to this point yesterday. I've also
been thinking about this recently as Amit has mentioned it to me a few
times over the past few months.

With the caveat of not yet having looked at the latest patch, my
thoughts are that having the executor startup responsible for taking
locks is a bad idea and I don't think we should go down this path. My
reasons are:

1. No ability to control the order that the locks are obtained. The
order in which the locks are taken will be at the mercy of the plan
the planner chooses.
2. It introduces lots of complexity regarding how to cleanly clean up
after a failed executor startup which is likely to make exec startup
slower and the code more complex
3. It puts us even further down the path of actually needing an
executor startup phase.

For #1, the locks taken for SELECT queries are less likely to conflict
with other locks obtained by PostgreSQL, but at least at the moment if
someone is getting deadlocks with a DDL type operation, they can
change their query or DDL script so that locks are taken in the same
order.  If we allowed executor startup to do this then if someone
comes complaining that PG18 deadlocks when PG17 didn't we'd just have
to tell them to live with it.  There's a comment at the bottom of
find_inheritance_children_extended() just above the qsort() which
explains about the deadlocking issue.

I don't have much extra to say about #2.  As mentioned, I've not
looked at the patch. On paper, it sounds possible, but it also sounds
bug-prone and ugly.

For #3, I've been thinking about what improvements we can do to make
the executor more efficient. In [1], Andres talks about some very
interesting things. In particular, in his email items 3) and 5) are
relevant here. If we did move lots of executor startup code into the
planner, I think it would be possible to one day get rid of executor
startup and have the plan record how much memory is needed for the
non-readonly part of the executor state and tag each plan node with
the offset in bytes they should use for their portion of the executor
working state. This would be a single memory allocation for the entire
plan.  The exact details are not important here, but I feel like if we
load up executor startup with more responsibilities, it'll just make
doing something like this harder.  The init run-time pruning code that
I worked on likely already has done that, but I don't think it's
closed the door on it as it might just mean allocating more executor
state memory than we need to. Providing the plan node records the
offset into that memory, I think it could be made to work, just with
the inefficiency of having a (possibly) large unused hole in that
state memory.
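
To make the shape of that idea concrete, a minimal illustration (the
fields involved are purely hypothetical and don't exist today):

/*
 * Suppose the planner recorded, in the PlannedStmt, the total size of the
 * writable executor state, and in each Plan node that node's byte offset
 * within it.  "Executor startup" would then reduce to one allocation ...
 */
char       *exec_state = palloc0(plannedstmt->exec_state_size);

/* ... with each node locating its working state at its assigned offset */
SeqScanState *sss = (SeqScanState *)
    (exec_state + ((Plan *) seqscan_plan)->exec_state_offset);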

As far as I understand it, your objection to the original proposal is
just on the grounds of concerns about introducing hazards that could
turn into bugs.  I think we could come up with some way to make the
prior method of doing pruning before executor startup work. I think
what Amit had before your objection was starting to turn into
something workable and we should switch back to working on that.

David

[1] 
https://www.postgresql.org/message-id/20180525033538.6ypfwcqcxce6zkjj%40alap3.anarazel.de




Re: generic plans and "initial" pruning

2024-04-08 Thread Amit Langote
Hi Andrey,

On Sun, Mar 31, 2024 at 2:03 PM Andrey M. Borodin  wrote:
> > On 6 Dec 2023, at 23:52, Robert Haas  wrote:
> >
> >  I hope that it's at least somewhat useful.
>
> > On 5 Jan 2024, at 15:46, vignesh C  wrote:
> >
> > There is a leak reported
>
> Hi Amit,
>
> this is a kind reminder that some feedback on your patch[0] is waiting for 
> your reply.
> Thank you for your work!

Thanks for moving this to the next CF.

My apologies (especially to Robert) for not replying on this thread
for a long time.

I plan to start working on this soon.

--
Thanks, Amit Langote




Re: generic plans and "initial" pruning

2024-03-30 Thread Andrey M. Borodin



> On 6 Dec 2023, at 23:52, Robert Haas  wrote:
> 
>  I hope that it's at least somewhat useful.
> 


> On 5 Jan 2024, at 15:46, vignesh C  wrote:
> 
> There is a leak reported 

Hi Amit,

this is a kind reminder that some feedback on your patch[0] is waiting for your 
reply.
Thank you for your work!

Best regards, Andrey Borodin.


[0] https://commitfest.postgresql.org/47/3478/



Re: generic plans and "initial" pruning

2024-01-05 Thread vignesh C
On Mon, 20 Nov 2023 at 10:00, Amit Langote  wrote:
>
> On Thu, Sep 28, 2023 at 5:26 PM Amit Langote  wrote:
> > On Tue, Sep 26, 2023 at 10:06 PM Amit Langote  
> > wrote:
> > > After sleeping on this, I think we do need the checks after all the
> > > ExecInitNode() calls too, because we have many instances of the code
> > > like the following one:
> > >
> > > outerPlanState(gatherstate) = ExecInitNode(outerNode, estate, eflags);
> > > tupDesc = ExecGetResultType(outerPlanState(gatherstate));
> > > 
> > >
> > > If outerNode is a SeqScan and ExecInitSeqScan() returned early because
> > > ExecOpenScanRelation() detected that plan was invalidated, then
> > > tupDesc would be NULL in this case, causing the code to crash.
> > >
> > > Now one might say that perhaps we should only add the
> > > is-CachedPlan-valid test in the instances where there is an actual
> > > risk of such misbehavior, but that could lead to confusion, now or
> > > later.  It seems better to add them after every ExecInitNode() call
> > > while we're inventing the notion, because doing so relieves the
> > > authors of future enhancements of the ExecInit*() routines from
> > > worrying about any of this.
> > >
> > > Attached 0003 should show how that turned out.
> > >
> > > Updated 0002 as mentioned in the previous reply -- setting pointers to
> > > NULL after freeing them more consistently across various ExecEnd*()
> > > routines and using the `if (pointer != NULL)` style over the `if
> > > (pointer)` more consistently.
> > >
> > > Updated 0001's commit message to remove the mention of its relation to
> > > any future commits.  I intend to push it tomorrow.
> >
> > Pushed that one.  Here are the rebased patches.
> >
> > 0001 seems ready to me, but I'll wait a couple more days for others to
> > weigh in.  Just to highlight a kind of change that others may have
> > differing opinions on, consider this hunk from the patch:
> >
> > -   MemoryContextDelete(node->aggcontext);
> > +   if (node->aggcontext != NULL)
> > +   {
> > +   MemoryContextDelete(node->aggcontext);
> > +   node->aggcontext = NULL;
> > +   }
> > ...
> > +   ExecEndNode(outerPlanState(node));
> > +   outerPlanState(node) = NULL;
> >
> > So the patch wants to enhance the consistency of setting the pointer
> > to NULL after freeing part.  Robert mentioned his preference for doing
> > it in the patch, which I agree with.
>
> Rebased.

There is a leak reported at [1], details for the same is available at [2]:
diff -U3 /tmp/cirrus-ci-build/src/test/regress/expected/select_views.out /tmp/cirrus-ci-build/build/testrun/regress-running/regress/results/select_views.out
--- /tmp/cirrus-ci-build/src/test/regress/expected/select_views.out	2023-12-19 23:00:04.677385000 +
+++ /tmp/cirrus-ci-build/build/testrun/regress-running/regress/results/select_views.out	2023-12-19 23:06:26.870259000 +
@@ -1288,6 +1288,7 @@
   (102, '2011-10-12', 120),
   (102, '2011-10-28', 200),
   (103, '2011-10-15', 480);
+WARNING:  resource was not closed: relation "customer_pkey"
 CREATE VIEW my_property_normal AS
SELECT * FROM customer WHERE name = current_user;
 CREATE VIEW my_property_secure WITH (security_barrier) A

[1] - https://cirrus-ci.com/task/6494009196019712
[2] - https://api.cirrus-ci.com/v1/artifact/task/6494009196019712/testrun/build/testrun/regress-running/regress/regression.diffs

Regards,
Vignesh




Re: generic plans and "initial" pruning

2023-12-06 Thread Robert Haas
Reviewing 0001:

Perhaps ExecEndCteScan needs an adjustment. What if node->leader was never set?

Other than that, I think this is in good shape. Maybe there are other
things we'd want to adjust here, or maybe there aren't, but there
doesn't seem to be any good reason to bundle more changes into the
same patch.

Reviewing 0002 and beyond:

I think it's good that you have tried to divide up a big change into
little pieces, but I'm finding the result difficult to understand. It
doesn't really seem like each patch stands on its own. I keep flipping
between patches to try to understand why other patches are doing
things, which kind of defeats the purpose of splitting stuff up. For
example, 0002 adds a NodeTag field to QueryDesc, but it doesn't even
seem to initialize that field, let alone use it for anything. It adds
a CachedPlan pointer to QueryDesc too, and adapts CreateQueryDesc to
allow one as an argument, but none of the callers actually pass
anything. I suspect that the first change (adding a NodeTag field)
is a bug, and that the second one is intentional, but it's hard
to tell without flipping through all of the other patches to see how
they build on what 0002 does. And even when something isn't a bug,
it's also hard to tell whether it's the right design, again because
you can't consider each patch in isolation. Ideally, splitting a patch
set should bring related changes together in a single patch and push
unrelated changes apart into different patches, but I don't really see
this particular split having that effect.

There is a chicken and egg problem here, to be fair. If we add code
that can make plan initialization fail without teaching the planner to
cope with failures, then we have broken the server, and if we do the
reverse, then we have a bunch of dead code that we can't test. Neither
is very satisfactory. But I still hope there's some better division
possible than what you have here currently. For instance, I wonder if
it would be possible to add all the stuff to cope with plan
initialization failing and then have a test patch that makes
initialization randomly fail with some probability (or maybe you can
even cause failures at specific points). Then you could test that
infrastructure by running the regression tests in a loop with various
values of the relevant setting.
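
For instance, the injection itself could be as small as this (the GUC and
the EState flag shown here are hypothetical, just to illustrate the idea):

/* somewhere in executor startup, enabled only in a test build */
if (exec_init_failure_probability > 0.0 &&
    pg_prng_double(&pg_global_prng_state) < exec_init_failure_probability)
    estate->es_cachedplan_valid = false;    /* force the stale-plan path */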

Another overall comment that I have is that it doesn't feel like
there's enough high-level explanation of the design. I don't know how
much of that should go in comments vs. commit messages vs. a README
that accompanies the patch set vs. whatever else, and I strongly
suspect that some of the stuff that seems confusing now is actually
stuff that at one point I understood and have just forgotten about.
But rediscovering it shouldn't be quite so hard. For example, consider
the question "why are we storing the CachedPlan in the QueryDesc?" I
eventually figured out that it's so that ExecPlanStillValid can call
CachedPlanStillValid which can then consult the cached plan's is_valid
flag. But is that the only access to the CachedPlan that we ever
expect to occur via the QueryDesc? If not, what else is allowable? If
so, why not just store a Boolean in the QueryDesc and arrange for the
plancache to be able to flip it when invalidating? I'm not saying
that's a better design -- I'm saying that it looks hard to understand
your thought process from the patch set. And also, you know, assuming
the current design is correct, could there be some way of dividing up
the patch set so that this one change, where we add the CachedPlan to
the QueryDesc, isn't so spread out across the whole series?

Some more detailed review comments below. This isn't really a full
review because I don't understand the patches well enough for that,
but it's some stuff I noticed.

In 0002:

+ * result-rel info, etc.  Also, we don't pass the parent't copy of the

Typo.

+/*
+ * All the necessary locks must already have been taken when
+ * initializing the parent's copy of subplanstate, so the CachedPlan,
+ * if any, should not have become invalid during ExecInitNode().
+ */
+Assert(ExecPlanStillValid(rcestate));

This -- and the other similar instance -- feel very uncomfortable.
There's a lot of action at a distance here. If this assertion ever
failed, how would anyone ever figure out what went wrong? You wouldn't
for example know which object got invalidated, presumably
corresponding to a lock that you failed to take. Unless the problem
were easily reproducible in a test environment, trying to guess what
happened might be pretty awful; imagine seeing this assertion failure
in a customer log file and trying to back-track to the find the
underlying bug. A further problem is that what would actually happen
is you *wouldn't* see this in the customer log file, because
assertions wouldn't be enabled, so you'd just see queries occasionally
returning wrong answers, I guess? Or crashing in some other random
part of the 

Re: generic plans and "initial" pruning

2023-09-25 Thread Amit Langote
On Wed, Sep 6, 2023 at 11:20 PM Robert Haas  wrote:
> On Wed, Sep 6, 2023 at 5:12 AM Amit Langote  wrote:
> > Attached updated patches.  Thanks for the review.
>
> I think 0001 looks ready to commit. I'm not sure that the commit
> message needs to mention future patches here, since this code cleanup
> seems like a good idea regardless, but if you feel otherwise, fair
> enough.

OK, I will remove the mention of future patches.

> On 0002, some questions:
>
> - In ExecEndLockRows, is the call to EvalPlanQualEnd a concern? i.e.
> Does that function need any adjustment?

I think it does with the patch as it stands.  It needs to have an
early exit at the top if parentestate is NULL, which it would be if
EvalPlanQualInit() wasn't called from an ExecInit*() function.

Though, as I answer below your question as to whether there is
actually any need to interrupt all of the ExecInit*() routines,
nothing needs to change in ExecEndLockRows().

> - In ExecEndMemoize, should there be a null-test around
> MemoryContextDelete(node->tableContext) as we have in
> ExecEndRecursiveUnion, ExecEndSetOp, etc.?

Oops, you're right.  Added.

> I wonder how we feel about setting pointers to NULL after freeing the
> associated data structures. The existing code isn't consistent about
> doing that, and making it do so would be a fairly large change that
> would bloat this patch quite a bit. On the other hand, I think it's a
> good practice as a general matter, and we do do it in some ExecEnd
> functions.

I agree that it might be worthwhile to take the opportunity and make
the code more consistent in this regard.  So, I've included those
changes too in 0002.

> On 0003, I have some doubt about whether we really have all the right
> design decisions in detail here:
>
> - Why have this weird rule where sometimes we return NULL and other
> times the planstate? Is there any point to such a coding rule? Why not
> just always return the planstate?
>
> - Is there any point to all of these early exit cases? For example, in
> ExecInitBitmapAnd, why exit early if initialization fails? Why not
> just plunge ahead and if initialization failed the caller will notice
> that and when we ExecEndNode some of the child node pointers will be
> NULL but who cares? The obvious disadvantage of this approach is that
> we're doing a bunch of unnecessary initialization, but we're also
> speeding up the common case where we don't need to abort by avoiding a
> branch that will rarely be taken. I'm not quite sure what the right
> thing to do is here.
>
> - The cases where we call ExecGetRangeTableRelation or
> ExecOpenScanRelation are a bit subtler ... maybe initialization that
> we're going to do later is going to barf if the tuple descriptor of
> the relation isn't what we thought it was going to be. In that case it
> becomes important to exit early. But if that's not actually a problem,
> then we could apply the same principle here also -- don't pollute the
> code with early-exit cases, just let it do its thing and sort it out
> later. Do you know what the actual problems would be here if we didn't
> exit early in these cases?
>
> - Depending on the answers to the above points, one thing we could
> think of doing is put an early exit case into ExecInitNode itself: if
> (unlikely(!ExecPlanStillValid(whatever)) return NULL. Maybe Andres or
> someone is going to argue that that checks too often and is thus too
> expensive, but it would be a lot more maintainable than having similar
> checks strewn throughout the ExecInit* functions. Perhaps it deserves
> some thought/benchmarking. More generally, if there's anything we can
> do to centralize these checks in fewer places, I think that would be
> worth considering. The patch isn't terribly large as it stands, so I
> don't necessarily think that this is a critical issue, but I'm just
> wondering if we can do better. I'm not even sure that it would be too
> expensive to just initialize the whole plan always, and then just do
> one test at the end. That's not OK if the changed tuple descriptor (or
> something else) is going to crash or error out in a funny way or
> something before initialization is completed, but if it's just going
> to result in burning a few CPU cycles in a corner case, I don't know
> if we should really care.

I thought about this some and figured that adding the
is-CachedPlan-still-valid tests in the following places should suffice
after all:

1. In InitPlan() right after the top-level ExecInitNode() calls
2. In ExecInit*() functions of Scan nodes, right after
ExecOpenScanRelation() calls
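
For example, placement 2 would look like this in a scan node's ExecInit
function (a sketch based on ExecInitSeqScan(); ExecPlanStillValid() is the
validity test proposed in this thread, not an existing function):

scanstate->ss.ss_currentRelation =
    ExecOpenScanRelation(estate, node->scan.scanrelid, eflags);

/* invalidation messages may have been processed while locking the rel */
if (!ExecPlanStillValid(estate))
    return scanstate;       /* or NULL, depending on the convention chosen */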

CachedPlans can only become invalid because of concurrent changes to
the inheritance child tables referenced in the plan.  Only the
following schema modifications of child tables are possible to be
performed concurrently:

* Addition of a column (allowed only if traditional inheritance child)
* Addition of an index
* Addition of a non-index constraint
* Dropping of a child table (allowed only if traditional 

Re: generic plans and "initial" pruning

2023-09-06 Thread Robert Haas
On Wed, Sep 6, 2023 at 5:12 AM Amit Langote  wrote:
> Attached updated patches.  Thanks for the review.

I think 0001 looks ready to commit. I'm not sure that the commit
message needs to mention future patches here, since this code cleanup
seems like a good idea regardless, but if you feel otherwise, fair
enough.

On 0002, some questions:

- In ExecEndLockRows, is the call to EvalPlanQualEnd a concern? i.e.
Does that function need any adjustment?
- In ExecEndMemoize, should there be a null-test around
MemoryContextDelete(node->tableContext) as we have in
ExecEndRecursiveUnion, ExecEndSetOp, etc.?

I wonder how we feel about setting pointers to NULL after freeing the
associated data structures. The existing code isn't consistent about
doing that, and making it do so would be a fairly large change that
would bloat this patch quite a bit. On the other hand, I think it's a
good practice as a general matter, and we do do it in some ExecEnd
functions.

On 0003, I have some doubt about whether we really have all the right
design decisions in detail here:

- Why have this weird rule where sometimes we return NULL and other
times the planstate? Is there any point to such a coding rule? Why not
just always return the planstate?

- Is there any point to all of these early exit cases? For example, in
ExecInitBitmapAnd, why exit early if initialization fails? Why not
just plunge ahead and if initialization failed the caller will notice
that and when we ExecEndNode some of the child node pointers will be
NULL but who cares? The obvious disadvantage of this approach is that
we're doing a bunch of unnecessary initialization, but we're also
speeding up the common case where we don't need to abort by avoiding a
branch that will rarely be taken. I'm not quite sure what the right
thing to do is here.

- The cases where we call ExecGetRangeTableRelation or
ExecOpenScanRelation are a bit subtler ... maybe initialization that
we're going to do later is going to barf if the tuple descriptor of
the relation isn't what we thought it was going to be. In that case it
becomes important to exit early. But if that's not actually a problem,
then we could apply the same principle here also -- don't pollute the
code with early-exit cases, just let it do its thing and sort it out
later. Do you know what the actual problems would be here if we didn't
exit early in these cases?

- Depending on the answers to the above points, one thing we could
think of doing is put an early exit case into ExecInitNode itself: if
(unlikely(!ExecPlanStillValid(whatever))) return NULL (see the sketch at
the end of this email). Maybe Andres or
someone is going to argue that that checks too often and is thus too
expensive, but it would be a lot more maintainable than having similar
checks strewn throughout the ExecInit* functions. Perhaps it deserves
some thought/benchmarking. More generally, if there's anything we can
do to centralize these checks in fewer places, I think that would be
worth considering. The patch isn't terribly large as it stands, so I
don't necessarily think that this is a critical issue, but I'm just
wondering if we can do better. I'm not even sure that it would be too
expensive to just initialize the whole plan always, and then just do
one test at the end. That's not OK if the changed tuple descriptor (or
something else) is going to crash or error out in a funny way or
something before initialization is completed, but if it's just going
to result in burning a few CPU cycles in a corner case, I don't know
if we should really care.

- The "At this point" comments don't give any rationale for why we
shouldn't have received any such invalidation messages. That  makes
them fairly useless; the Assert by itself clarifies that you think
that case shouldn't happen. The comment's job is to justify that
claim.
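
Here is the sketch of that centralized check in ExecInitNode() mentioned
above (ExecPlanStillValid() being the proposed validity test, not existing
code):

PlanState *
ExecInitNode(Plan *node, EState *estate, int eflags)
{
    if (node == NULL)
        return NULL;

    /* bail out as soon as the underlying CachedPlan has been invalidated */
    if (unlikely(!ExecPlanStillValid(estate)))
        return NULL;

    /* ... existing per-node-type switch and dispatch follows ... */
}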

-- 
Robert Haas
EDB: http://www.enterprisedb.com




Re: generic plans and "initial" pruning

2023-09-05 Thread Robert Haas
On Tue, Sep 5, 2023 at 3:13 AM Amit Langote  wrote:
> Attached 0001 removes unnecessary cleanup calls from ExecEnd*() routines.

It also adds a few random Assert()s to verify that unrelated pointers
are not NULL. I suggest that it shouldn't do that.

The commit message doesn't mention the removal of the calls to
ExecDropSingleTupleTableSlot. It's not clear to me why that's OK and I
think it would be nice to mention it in the commit message, assuming
that it is in fact OK.

I suggest changing the subject line of the commit to something like
"Remove obsolete executor cleanup code."

> 0002 adds NULLness checks in ExecEnd*() routines on some pointers that
> may not be initialized by the corresponding ExecInit*() routines in
> the case where it returns early.

I think you should only add these where it's needed. For example, I
think list_free_deep(NIL) is fine.

The changes to ExecEndForeignScan look like they include stuff that
belongs in 0001.

Personally, I prefer explicit NULL-tests i.e. if (x != NULL) to
implicit ones like if (x), but opinions vary.

-- 
Robert Haas
EDB: http://www.enterprisedb.com




Re: generic plans and "initial" pruning

2023-08-28 Thread Robert Haas
On Fri, Aug 11, 2023 at 9:50 AM Amit Langote  wrote:
> After removing the unnecessary cleanup code from most node types’ ExecEnd* 
> functions, one thing I’m tempted to do is remove the functions that do 
> nothing else but recurse to close the outerPlan, innerPlan child nodes.  We 
> could instead have ExecEndNode() itself recurse to close outerPlan, innerPlan 
> child nodes at the top, which preserves the close-child-before-self behavior 
> for Gather* nodes, and call node-type-specific cleanup functions for nodes 
> that do have any local cleanup to do.  Perhaps, we could even use 
> planstate_tree_walker() called at the top instead of the usual bottom so that 
> nodes with a list of child subplans like Append also don’t need to have their 
> own ExecEnd* functions.

I think 0001 needs to be split up. Like, this is code cleanup:

-   /*
-    * Free the exprcontext
-    */
-   ExecFreeExprContext(&node->ss.ps);

This is providing for NULL pointers where we don't currently:

-   list_free_deep(aggstate->hash_batches);
+   if (aggstate->hash_batches)
+   list_free_deep(aggstate->hash_batches);

And this is the early return mechanism per se:

+   if (!ExecPlanStillValid(estate))
+   return aggstate;

I think at least those 3 kinds of changes deserve to be in separate
patches with separate commit messages explaining the rationale behind
each e.g. "Remove unnecessary cleanup calls in ExecEnd* functions.
These calls are no longer required, because <reason>. Removing them
saves a few CPU cycles and simplifies planned refactoring, so do
that."

-- 
Robert Haas
EDB: http://www.enterprisedb.com




Re: generic plans and "initial" pruning

2023-08-11 Thread Amit Langote
On Fri, Aug 11, 2023 at 14:31 Amit Langote  wrote:

> On Wed, Aug 9, 2023 at 1:05 AM Robert Haas  wrote:
> > On Tue, Aug 8, 2023 at 10:32 AM Amit Langote 
> wrote:
> > > But should ExecInitNode() subroutines return the partially initialized
> > > PlanState node or NULL on detecting invalidation?  If I'm
> > > understanding how you think this should be working correctly, I think
> > > you mean the former, because if it were the latter, ExecInitNode()
> > > would end up returning NULL at the top for the root and then there's
> > > nothing to pass to ExecEndNode(), so no way to clean up to begin with.
> > > In that case, I think we will need to adjust ExecEndNode() subroutines
> > > to add `if (node->ps.ps_ResultTupleSlot)` in the above code, for
> > > example.  That's something Tom had said he doesn't like very much [1].
> >
> > Yeah, I understood Tom's goal as being "don't return partially
> > initialized nodes."
> >
> > Personally, I'm not sure that's an important goal. In fact, I don't
> > even think it's a desirable one. It doesn't look difficult to audit
> > the end-node functions for cases where they'd fail if a particular
> > pointer were NULL instead of pointing to some real data, and just
> > fixing all such cases to have NULL-tests looks like purely mechanical
> > work that we are unlikely to get wrong. And at least some cases
> > wouldn't require any changes at all.
> >
> > If we don't do that, the complexity doesn't go away. It just moves
> > someplace else. Presumably what we do in that case is have
> > ExecInitNode functions undo any initialization that they've already
> > done before returning NULL. There are basically two ways to do that.
> > Option one is to add code at the point where they return early to
> > clean up anything they've already initialized, but that code is likely
> > to substantially duplicate whatever the ExecEndNode function already
> > knows how to do, and it's very easy for logic like this to get broken
> > if somebody rearranges an ExecInitNode function down the road.
>
> Yeah, I too am not a fan of making ExecInitNode() clean up partially
> initialized nodes.
>
> > Option
> > two is to rearrange the ExecInitNode functions now, to open relations
> > or recurse at the beginning, so that we discover the need to fail
> > before we initialize anything. That restricts our ability to further
> > rearrange the functions in future somewhat, but more importantly,
> > IMHO, it introduces more risk right now. Checking that the ExecEndNode
> > function will not fail if some pointers are randomly null is a lot
> > easier than checking that changing the order of operations in an
> > ExecInitNode function breaks nothing.
> >
> > I'm not here to say that we can't do one of those things. But I think
> > adding null-tests to ExecEndNode functions looks like *far* less work
> > and *way* less risk.
>
> +1
>
> > There's a second issue here, too, which is when we abort ExecInitNode
> > partway through, how do we signal that? You're rightly pointing out
> > here that if we do that by returning NULL, then we don't do it by
> > returning a pointer to the partially initialized node that we just
> > created, which means that we either need to store those partially
> > initialized nodes in a separate data structure as you propose to do in
> > 0001,
> >
> > or else we need to pick a different signalling convention. We
> > could change (a) ExecInitNode to have an additional argument, bool
> > *kaboom, or (b) we could make it return bool and return the node
> > pointer via a new additional argument, or (c) we could put a Boolean
> > flag into the estate and let the function signal failure by flipping
> > the value of the flag.
>
> The failure can already be detected by seeing that
> ExecPlanIsValid(estate) is false.  The question is what ExecInitNode()
> or any of its subroutines should return once it is.  I think the
> following convention works:
>
> Return partially initialized state from ExecInit* function where we
> detect the invalidation after calling ExecInitNode() on a child plan,
> so that ExecEndNode() can recurse to clean it up.
>
> Return NULL from ExecInit* functions where we detect the invalidation
> after opening and locking a relation but before calling ExecInitNode()
> to initialize a child plan if there's one at all.  Even if we may set
> things like ExprContext, TupleTableSlot fields, they are cleaned up
> independently of the plan tree anyway via the cleanup called with
> es_exprcontexts, es_tupleTable, respectively.  I even noticed bits
> like this in ExecEnd* functions:
>
> -   /*
> -    * Free the exprcontext(s) ... now dead code, see ExecFreeExprContext
> -    */
> -#ifdef NOT_USED
> -   ExecFreeExprContext(&node->ss.ps);
> -   if (node->ioss_RuntimeContext)
> -   FreeExprContext(node->ioss_RuntimeContext, true);
> -#endif
>
> So, AFAICS, ExprContext, TupleTableSlot cleanup in ExecNode* functions
> is unnecessary but remain around because nobody cared about and got
> around to 

Re: generic plans and "initial" pruning

2023-08-08 Thread Robert Haas
On Tue, Aug 8, 2023 at 10:32 AM Amit Langote  wrote:
> But should ExecInitNode() subroutines return the partially initialized
> PlanState node or NULL on detecting invalidation?  If I'm
> understanding how you think this should be working correctly, I think
> you mean the former, because if it were the latter, ExecInitNode()
> would end up returning NULL at the top for the root and then there's
> nothing to pass to ExecEndNode(), so no way to clean up to begin with.
> In that case, I think we will need to adjust ExecEndNode() subroutines
> to add `if (node->ps.ps_ResultTupleSlot)` in the above code, for
> example.  That's something Tom had said he doesn't like very much [1].

Yeah, I understood Tom's goal as being "don't return partially
initialized nodes."

Personally, I'm not sure that's an important goal. In fact, I don't
even think it's a desirable one. It doesn't look difficult to audit
the end-node functions for cases where they'd fail if a particular
pointer were NULL instead of pointing to some real data, and just
fixing all such cases to have NULL-tests looks like purely mechanical
work that we are unlikely to get wrong. And at least some cases
wouldn't require any changes at all.

If we don't do that, the complexity doesn't go away. It just moves
someplace else. Presumably what we do in that case is have
ExecInitNode functions undo any initialization that they've already
done before returning NULL. There are basically two ways to do that.
Option one is to add code at the point where they return early to
clean up anything they've already initialized, but that code is likely
to substantially duplicate whatever the ExecEndNode function already
knows how to do, and it's very easy for logic like this to get broken
if somebody rearranges an ExecInitNode function down the road. Option
two is to rearrange the ExecInitNode functions now, to open relations
or recurse at the beginning, so that we discover the need to fail
before we initialize anything. That restricts our ability to further
rearrange the functions in future somewhat, but more importantly,
IMHO, it introduces more risk right now. Checking that the ExecEndNode
function will not fail if some pointers are randomly null is a lot
easier than checking that changing the order of operations in an
ExecInitNode function breaks nothing.

I'm not here to say that we can't do one of those things. But I think
adding null-tests to ExecEndNode functions looks like *far* less work
and *way* less risk.

There's a second issue here, too, which is when we abort ExecInitNode
partway through, how do we signal that? You're rightly pointing out
here that if we do that by returning NULL, then we don't do it by
returning a pointer to the partially initialized node that we just
created, which means that we either need to store those partially
initialized nodes in a separate data structure as you propose to do in
0001, or else we need to pick a different signalling convention. We
could change (a) ExecInitNode to have an additional argument, bool
*kaboom, or (b) we could make it return bool and return the node
pointer via a new additional argument, or (c) we could put a Boolean
flag into the estate and let the function signal failure by flipping
the value of the flag. If we do any of those things, then as far as I
can see 0001 is unnecessary. If we do none of them but also avoid
creating partially initialized nodes by one of the two techniques
mentioned two paragraphs prior, then 0001 is also unnecessary. If we
do none of them but do create partially initialized nodes, then we
need 0001.

So if this were a restaurant menu, then it might look like this:

Prix Fixe Menu (choose one from each)

First Course - How do we clean up after partial initialization?
(1) ExecInitNode functions produce partially initialized nodes
(2) ExecInitNode functions get refactored so that the stuff that can
cause early exit always happens first, so that no cleanup is ever
needed
(3) ExecInitNode functions do any required cleanup in situ

Second Course - How do we signal that initialization stopped early?
(A) Return NULL.
(B) Add a bool * out-parameter to ExecInitNode.
(C) Add a Node * out-parameter to ExecInitNode and change the return
value to bool.
(D) Add a bool to the EState.
(E) Something else, maybe.

I think that we need 0001 if we choose specifically (1) and (A). My
gut feeling is that the least-invasive way to do this project is to
choose (1) and (D). My second choice would be (1) and (C), and my
third choice would be (1) and (A). If I can't have (1), I think I
prefer (2) over (3), but I also believe I prefer hiding in a deep hole
to either of them. Maybe I'm not seeing the whole picture correctly
here, but both (2) and (3) look awfully painful to me.

-- 
Robert Haas
EDB: http://www.enterprisedb.com




Re: generic plans and "initial" pruning

2023-08-08 Thread Amit Langote
On Tue, Aug 8, 2023 at 12:36 AM Robert Haas  wrote:
> On Thu, Aug 3, 2023 at 4:37 AM Amit Langote  wrote:
> > Here's a patch set where the refactoring to move the ExecutorStart()
> > calls to be closer to GetCachedPlan() (for the call sites that use a
> > CachedPlan) is extracted into a separate patch, 0002.  Its commit
> > message notes an aspect of this refactoring that I feel a bit nervous
> > about -- needing to also move the CommandCounterIncrement() call from
> > the loop in PortalRunMulti() to PortalStart() which now does
> > ExecutorStart() for the PORTAL_MULTI_QUERY case.
>
> I spent some time today reviewing 0001. Here are a few thoughts and
> notes about things that I looked at.

Thanks for taking a look at this.

> First, I wondered whether it was really adequate for ExecEndPlan() to
> just loop over estate->es_plan_nodes and call it good. Put
> differently, is it possible that we could ever have more than one
> relevant EState, say for a subplan or an EPQ execution or something,
> so that this loop wouldn't cover everything? I found nothing to make
> me think that this is a real danger.

Check.

> Second, I wondered whether the ordering of cleanup operations could be
> an issue. Right now, a node can position cleanup code before, after,
> or both before and after recursing to child nodes, whereas with this
> design change, the cleanup code will always be run before recursing to
> child nodes.

Because a node is appended to es_planstate_nodes at the end of
ExecInitNode(), child nodes get added before their parent nodes.  So
the children are cleaned up first.
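
In other words, roughly (a sketch of the mechanism; the exact field name
and iteration details are those of the patch, not of current sources):

/* at the end of ExecInitNode(), after the per-node-type ExecInit* call */
estate->es_planstate_nodes = lappend(estate->es_planstate_nodes, result);

/* in ExecEndPlan(): children therefore appear in the list before parents */
foreach(lc, estate->es_planstate_nodes)
    ExecEndNode((PlanState *) lfirst(lc));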

> Here, I think we have problems. Both ExecGather and
> ExecEndGatherMerge intentionally clean up the children before the
> parent, so that the child shutdown happens before
> ExecParallelCleanup(). Based on the comment and commit
> acf555bc53acb589b5a2827e65d655fa8c9adee0, this appears to be
> intentional, and you can sort of see why from looking at the stuff
> that happens in ExecParallelCleanup(). If the instrumentation data
> vanishes before the child nodes have a chance to clean things up,
> maybe EXPLAIN ANALYZE won't reflect that instrumentation any more. If
> the DSA vanishes, maybe we'll crash if we try to access it. If we
> actually reach DestroyParallelContext(), we're just going to start
> killing the workers. None of that sounds like what we want.
>
> The good news, of a sort, is that I think this might be the only case
> of this sort of problem. Most nodes recurse at the end, after doing
> all the cleanup, so the behavior won't change. Moreover, even if it
> did, most cleanup operations look pretty localized -- they affect only
> the node itself, and not its children. A somewhat interesting case is
> nodes associated with subplans. Right now, because of the coding of
> ExecEndPlan, nodes associated with subplans are all cleaned up at the
> very end, after everything that's not inside of a subplan. But with
> this change, they'd get cleaned up in the order of initialization,
> which actually seems more natural, as long as it doesn't break
> anything, which I think it probably won't, since as I mention in most
> cases node cleanup looks quite localized, i.e. it doesn't care whether
> it happens before or after the cleanup of other nodes.
>
> I think something will have to be done about the parallel query stuff,
> though. I'm not sure exactly what. It is a little weird that Gather
> and Gather Merge treat starting and killing workers as a purely
> "private matter" that they can decide to handle without the executor
> overall being very much aware of it. So maybe there's a way that some
> of the cleanup logic here could be hoisted up into the general
> executor machinery, that is, first end all the nodes, and then go
> back, and end all the parallelism using, maybe, another list inside of
> the estate. However, I think that the existence of ExecShutdownNode()
> is a complication here -- we need to make sure that we don't break
> either the case where that happen before overall plan shutdown, or the
> case where it doesn't.

Given that children are closed before parent, the order of operations
in ExecEndGather[Merge] is unchanged.

> Third, a couple of minor comments on details of how you actually made
> these changes in the patch set. Personally, I would remove all of the
> "is closed separately" comments that you added. I think it's a
> violation of the general coding principle that you should make the
> code look like it's always been that way. Sure, in the immediate
> future, people might wonder why you don't need to recurse, but 5 or 10
> years from now that's just going to be clutter. Second, in the cases
> where the ExecEndNode functions end up completely empty, I would
> suggest just removing the functions entirely and making the switch
> that dispatches on the node type have a switch case that lists all the
nodes that don't need a callback here and say /* Nothing to do for these
> node types */ break;. This will save a 

Re: generic plans and "initial" pruning

2023-08-07 Thread Robert Haas
On Mon, Aug 7, 2023 at 11:44 AM Tom Lane  wrote:
> Right, I doubt that changing that is going to work out well.
> Hash joins might have issues with it too.

I thought about the case, because Hash and Hash Join are such closely
intertwined nodes, but I don't see any problem there. It doesn't
really look like it would matter in what order things got cleaned up.
Unless I'm missing something, all of the data structures are just
independent things that we have to get rid of sometime.

> Could it work to make the patch force child cleanup before parent,
> instead of after?  Or would that break other places?

To me, it seems like the overwhelming majority of the code simply
doesn't care. You could pick an order out of a hat and it would be
100% OK. But I haven't gone and looked through it with this specific
idea in mind.

> On the whole though I think it's probably a good idea to leave
> parent nodes in control of the timing, so I kind of side with
> your later comment about whether we want to change this at all.

My overall feeling here is that what Gather and Gather Merge is doing
is pretty weird. I think I kind of knew that at the time this was all
getting implemented and reviewed, but I wasn't keen to introduce more
infrastructure changes than necessary given that parallel query, as a
project, was still pretty new and I didn't want to give other hackers
more reasons to be unhappy with what was already a lot of very
wide-ranging change to the system. A good number of years having gone
by now, and other people having worked on that code some more, I'm not
too worried about someone calling for a wholesale revert of parallel
query. However, there's a second problem here as well, which is that
I'm still not sure what the right thing to do is. We've fiddled around
with the shutdown sequence for parallel query a number of times now,
and I think there's still stuff that doesn't work quite right,
especially around getting all of the instrumentation data back to the
leader. I haven't spent enough time on this recently enough to be sure
what if any problems remain, though.

So on the one hand, I don't really like the fact that we have an
ad-hoc recursion arrangement here, instead of using
planstate_tree_walker or, as Amit proposes, a List. Giving subordinate
nodes control over the ordering when they don't really need it just
means we have more code with more possibility for bugs and less
certainty about whether the theoretical flexibility is doing anything
in practice. But on the other hand, because we know that at least for
the Gather/GatherMerge case it seems like it probably matters
somewhat, it definitely seems appealing not to change anything as part
of this patch set that we don't really have to.

I've had it firmly in my mind here that we were going to need to
change something somehow -- I mean, the possibility of returning in
the middle of node initialization seems like a pretty major change to
the way this stuff works, and it seems hard for me to believe that we
can just do that and not have to adjust any code anywhere else. Can it
really be true that we can do that and yet not end up creating any
states anywhere with which the current cleanup code is unprepared to
cope? Maybe, but it would seem like rather good luck if that's how it
shakes out. Still, at the moment, I'm having a hard time understanding
what this particular change buys us.

-- 
Robert Haas
EDB: http://www.enterprisedb.com




Re: generic plans and "initial" pruning

2023-08-07 Thread Tom Lane
Robert Haas  writes:
> Second, I wondered whether the ordering of cleanup operations could be
> an issue. Right now, a node can position cleanup code before, after,
> or both before and after recursing to child nodes, whereas with this
> design change, the cleanup code will always be run before recursing to
> child nodes. Here, I think we have problems. Both ExecGather and
> ExecEndGatherMerge intentionally clean up the children before the
> parent, so that the child shutdown happens before
> ExecParallelCleanup(). Based on the comment and commit
> acf555bc53acb589b5a2827e65d655fa8c9adee0, this appears to be
> intentional, and you can sort of see why from looking at the stuff
> that happens in ExecParallelCleanup().

Right, I doubt that changing that is going to work out well.
Hash joins might have issues with it too.

Could it work to make the patch force child cleanup before parent,
instead of after?  Or would that break other places?

On the whole though I think it's probably a good idea to leave
parent nodes in control of the timing, so I kind of side with
your later comment about whether we want to change this at all.

regards, tom lane




Re: generic plans and "initial" pruning

2023-08-07 Thread Robert Haas
On Thu, Aug 3, 2023 at 4:37 AM Amit Langote  wrote:
> Here's a patch set where the refactoring to move the ExecutorStart()
> calls to be closer to GetCachedPlan() (for the call sites that use a
> CachedPlan) is extracted into a separate patch, 0002.  Its commit
> message notes an aspect of this refactoring that I feel a bit nervous
> about -- needing to also move the CommandCounterIncrement() call from
> the loop in PortalRunMulti() to PortalStart() which now does
> ExecutorStart() for the PORTAL_MULTI_QUERY case.

I spent some time today reviewing 0001. Here are a few thoughts and
notes about things that I looked at.

First, I wondered whether it was really adequate for ExecEndPlan() to
just loop over estate->es_plan_nodes and call it good. Put
differently, is it possible that we could ever have more than one
relevant EState, say for a subplan or an EPQ execution or something,
so that this loop wouldn't cover everything? I found nothing to make
me think that this is a real danger.

Second, I wondered whether the ordering of cleanup operations could be
an issue. Right now, a node can position cleanup code before, after,
or both before and after recursing to child nodes, whereas with this
design change, the cleanup code will always be run before recursing to
child nodes. Here, I think we have problems. Both ExecGather and
ExecEndGatherMerge intentionally clean up the children before the
parent, so that the child shutdown happens before
ExecParallelCleanup(). Based on the comment and commit
acf555bc53acb589b5a2827e65d655fa8c9adee0, this appears to be
intentional, and you can sort of see why from looking at the stuff
that happens in ExecParallelCleanup(). If the instrumentation data
vanishes before the child nodes have a chance to clean things up,
maybe EXPLAIN ANALYZE won't reflect that instrumentation any more. If
the DSA vanishes, maybe we'll crash if we try to access it. If we
actually reach DestroyParallelContext(), we're just going to start
killing the workers. None of that sounds like what we want.

The good news, of a sort, is that I think this might be the only case
of this sort of problem. Most nodes recurse at the end, after doing
all the cleanup, so the behavior won't change. Moreover, even if it
did, most cleanup operations look pretty localized -- they affect only
the node itself, and not its children. A somewhat interesting case is
nodes associated with subplans. Right now, because of the coding of
ExecEndPlan, nodes associated with subplans are all cleaned up at the
very end, after everything that's not inside of a subplan. But with
this change, they'd get cleaned up in the order of initialization,
which actually seems more natural, as long as it doesn't break
anything, which I think it probably won't, since as I mention in most
cases node cleanup looks quite localized, i.e. it doesn't care whether
it happens before or after the cleanup of other nodes.

I think something will have to be done about the parallel query stuff,
though. I'm not sure exactly what. It is a little weird that Gather
and Gather Merge treat starting and killing workers as a purely
"private matter" that they can decide to handle without the executor
overall being very much aware of it. So maybe there's a way that some
of the cleanup logic here could be hoisted up into the general
executor machinery, that is, first end all the nodes, and then go
back, and end all the parallelism using, maybe, another list inside of
the estate. However, I think that the existence of ExecShutdownNode()
is a complication here -- we need to make sure that we don't break
either the case where that happen before overall plan shutdown, or the
case where it doesn't.
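
For concreteness, a minimal sketch of that "end the nodes first, then the
parallelism" idea in ExecEndPlan() might look like the following;
es_parallel_cleanup is an invented EState field here, not something that
exists today:

    foreach(l, estate->es_plan_nodes)
        ExecEndNode((PlanState *) lfirst(l));

    /* now tear down parallel contexts registered by Gather / Gather Merge */
    foreach(l, estate->es_parallel_cleanup)     /* hypothetical list */
        ExecParallelCleanup((ParallelExecutorInfo *) lfirst(l));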

Third, a couple of minor comments on details of how you actually made
these changes in the patch set. Personally, I would remove all of the
"is closed separately" comments that you added. I think it's a
violation of the general coding principle that you should make the
code look like it's always been that way. Sure, in the immediate
future, people might wonder why you don't need to recurse, but 5 or 10
years from now that's just going to be clutter. Second, in the cases
where the ExecEndNode functions end up completely empty, I would
suggest just removing the functions entirely and making the switch
that dispatches on the node type have a switch case that lists all the
nodes that don't need a callback here and say /* Nothing do for these
node types */ break;. This will save a few CPU cycles and I think it
will be easier to read as well.
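
To illustrate (just a sketch; which node types actually end up with empty
callbacks depends on the patch, the ones below are arbitrary examples), the
dispatch switch in ExecEndNode() would then contain cases like:

    switch (nodeTag(node))
    {
        /* ... */
        case T_LimitState:
        case T_MaterialState:
        case T_SortState:
            /* Nothing to do for these node types */
            break;

        case T_GatherState:
            ExecEndGather((GatherState *) node);
            break;
        /* ... */
    }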

Fourth, I wonder whether we really need this patch at all. I initially
thought we did, because if we abandon the initialization of a plan
partway through, then we end up with a plan that is in a state that
previously would never have occurred, and we still have to be able to
clean it up. However, perhaps it's a difference without a distinction.
Say we have a partial plan tree, where not all of the PlanState nodes
ever got created. We then just call 

Re: generic plans and "initial" pruning

2023-07-18 Thread Thom Brown
On Tue, 18 Jul 2023, 08:26 Amit Langote,  wrote:

> Hi Thom,
>
> On Tue, Jul 18, 2023 at 1:33 AM Thom Brown  wrote:
> > On Thu, 13 Jul 2023 at 13:59, Amit Langote 
> wrote:
> > > In an absolutely brown-paper-bag moment, I realized that I had not
> > > updated src/backend/executor/README to reflect the changes to the
> > > executor's control flow that this patch makes.   That is, after
> > > scrapping the old design back in January whose details *were*
> > > reflected in the patches before that redesign.
> > >
> > > Anyway, the attached fixes that.
> > >
> > > Tom, do you think you have bandwidth in the near future to give this
> > > another look?  I think I've addressed the comments that you had given
> > > back in April, though as mentioned in the previous message, there may
> > > still be some funny-looking aspects still remaining.  In any case, I
> > > have no intention of pressing ahead with the patch without another
> > > committer having had a chance to sign off on it.
> >
> > I've only just started taking a look at this, and my first test drive
> > yields very impressive results:
> >
> > 8192 partitions (3 runs, 1 rows)
> > Head 391.294989 382.622481 379.252236
> > Patched 13088.145995 13406.135531 13431.828051
>
> Just to be sure, did you use pgbench -Mprepared with plan_cache_mode
> = force_generic_plan in postgresql.conf?
>

I did.

For full disclosure, I also had max_locks_per_transaction set to 1.

>
> > Looking at your changes to README, I would like to suggest rewording
> > the following:
> >
> > +table during planning.  This means that inheritance child tables, which
> are
> > +added to the query's range table during planning, if they are present
> in a
> > +cached plan tree would not have been locked.
> >
> > To:
> >
> > This means that inheritance child tables present in a cached plan
> > tree, which are added to the query's range table during planning,
> > would not have been locked.
> >
> > Also, further down:
> >
> > s/intiatialize/initialize/
> >
> > I'll carry on taking a closer look and see if I can break it.
>
> Thanks for looking.  I've fixed these issues in the attached updated
> patch.  I've also changed the position of a newly added paragraph in
> src/backend/executor/README so that it doesn't break the flow of the
> existing text.
>

Thanks.

Thom

>


Re: generic plans and "initial" pruning

2023-07-17 Thread Thom Brown
On Thu, 13 Jul 2023 at 13:59, Amit Langote  wrote:
> In an absolutely brown-paper-bag moment, I realized that I had not
> updated src/backend/executor/README to reflect the changes to the
> executor's control flow that this patch makes.   That is, after
> scrapping the old design back in January whose details *were*
> reflected in the patches before that redesign.
>
> Anyway, the attached fixes that.
>
> Tom, do you think you have bandwidth in the near future to give this
> another look?  I think I've addressed the comments that you had given
> back in April, though as mentioned in the previous message, there may
> still be some funny-looking aspects still remaining.  In any case, I
> have no intention of pressing ahead with the patch without another
> committer having had a chance to sign off on it.

I've only just started taking a look at this, and my first test drive
yields very impressive results:

8192 partitions (3 runs, 1 rows)
Head 391.294989 382.622481 379.252236
Patched 13088.145995 13406.135531 13431.828051

Looking at your changes to README, I would like to suggest rewording
the following:

+table during planning.  This means that inheritance child tables, which are
+added to the query's range table during planning, if they are present in a
+cached plan tree would not have been locked.

To:

This means that inheritance child tables present in a cached plan
tree, which are added to the query's range table during planning,
would not have been locked.

Also, further down:

s/intiatialize/initialize/

I'll carry on taking a closer look and see if I can break it.

Thom




Re: generic plans and "initial" pruning

2023-07-03 Thread Daniel Gustafsson
> On 8 Jun 2023, at 16:23, Amit Langote  wrote:
> 
> Here is a new version.

The local planstate variable in the hunk below is shadowing the function
parameter planstate, which causes a compiler warning:

@@ -1495,18 +1556,15 @@ ExecEndPlan(PlanState *planstate, EState *estate)
ListCell   *l;
 
/*
-* shut down the node-type-specific query processing
-*/
-   ExecEndNode(planstate);
-
-   /*
-* for subplans too
+* Shut down the node-type-specific query processing for all nodes that
+* were initialized during InitPlan(), both in the main plan tree and 
those
+* in subplans (es_subplanstates), if any.
 */
-   foreach(l, estate->es_subplanstates)
+   foreach(l, estate->es_inited_plannodes)
{
-   PlanState  *subplanstate = (PlanState *) lfirst(l);
+   PlanState  *planstate = (PlanState *) lfirst(l);

--
Daniel Gustafsson





Re: generic plans and "initial" pruning

2023-04-05 Thread Amit Langote
On Tue, Apr 4, 2023 at 10:29 PM Amit Langote 
wrote:
> On Tue, Apr 4, 2023 at 6:41 AM Tom Lane  wrote:
> > A few concrete thoughts:
> >
> > * I understand that your plan now is to acquire locks on all the
> > originally-named tables, then do permissions checks (which will
> > involve only those tables), then dynamically lock just inheritance and
> > partitioning child tables as we descend the plan tree.
>
> Actually, with the current implementation of the patch, *all* of the
> relations mentioned in the plan tree would get locked during the
> ExecInitNode() traversal of the plan tree (and of those in
> plannedstmt->subplans), not just the inheritance child tables.
> Locking of non-child tables done by the executor after this patch is
> duplicative with AcquirePlannerLocks(), so that's something to be
> improved.
>
> > That seems
> > more or less okay to me, but it could be reflected better in the
> > structure of the patch perhaps.
> >
> > * In particular I don't much like the "viewRelations" list, which
> > seems like a wart; those ought to be handled more nearly the same way
> > as other RTEs.  (One concrete reason why is that this scheme is going
> > to result in locking views in a different order than they were locked
> > during original parsing, which perhaps could contribute to deadlocks.)
> > Maybe we should store an integer list of which RTIs need to be locked
> > in the early phase?  Building that in the parser/rewriter would provide
> > a solid guide to the original locking order, so we'd be trivially sure
> > of duplicating that.  (It might be close enough to follow the RT list
> > order, which is basically what AcquireExecutorLocks does today, but
> > this'd be more certain to do the right thing.)  I'm less concerned
> > about lock order for child tables because those are just going to
> > follow the inheritance or partitioning structure.
>
> What you've described here sounds somewhat like what I had implemented
> in the patch versions till v31, though it used a bitmapset named
> minLockRelids that is initialized by setrefs.c.  Your idea of
> initializing a list before planning seems more appealing offhand than
> the code I had added in setrefs.c to populate that minLockRelids
> bitmapset, which would be bms_add_range(1, list_length(finalrtable)),
> followed by bms_del_members(set-of-child-rel-rtis).
>
> I'll give your idea a try.

After sleeping on this, I think we perhaps don't need to remember
originally-named relations if only for the purpose of locking them for
execution.  That's because, for a reused (cached) plan,
AcquirePlannerLocks() would have taken those locks anyway.

AcquirePlannerLocks() doesn't lock inheritance children because they would
be added to the range table by the planner, so they should be locked
separately for execution, if needed.  I thought taking the execution-time
locks only when inside ExecInit[Merge]Append would work, but then we have
cases where single-child Append/MergeAppend are stripped of the
Append/MergeAppend nodes by setrefs.c.  Maybe we need a place to remember
such child relations, that is, only in the cases where Append/MergeAppend
elision occurs, in something maybe esoteric-sounding like
PlannedStmt.elidedAppendChildRels or something?

Another set of child relations that are not covered by Append/MergeAppend
child nodes is non-leaf partitions.  I've proposed adding a List of
Bitmapset field to Append/MergeAppend named 'allpartrelids' as part of this
patchset (patch 0001) to track those for execution-time locking.


-- 
Thanks, Amit Langote
EDB: http://www.enterprisedb.com


Re: generic plans and "initial" pruning

2023-04-04 Thread Amit Langote
On Tue, Apr 4, 2023 at 6:41 AM Tom Lane  wrote:
> Amit Langote  writes:
> > [ v38 patchset ]
>
> I spent a little bit of time looking through this, and concluded that
> it's not something I will be wanting to push into v16 at this stage.
> The patch doesn't seem very close to being committable on its own
> terms, and even if it was now is not a great time in the dev cycle
> to be making significant executor API changes.  Too much risk of
> having to thrash the API during beta, or even change it some more
> in v17.  I suggest that we push this forward to the next CF with the
> hope of landing it early in v17.

OK, thanks a lot for your feedback.

> A few concrete thoughts:
>
> * I understand that your plan now is to acquire locks on all the
> originally-named tables, then do permissions checks (which will
> involve only those tables), then dynamically lock just inheritance and
> partitioning child tables as we descend the plan tree.

Actually, with the current implementation of the patch, *all* of the
relations mentioned in the plan tree would get locked during the
ExecInitNode() traversal of the plan tree (and of those in
plannedstmt->subplans), not just the inheritance child tables.
Locking of non-child tables done by the executor after this patch is
duplicative with AcquirePlannerLocks(), so that's something to be
improved.

> That seems
> more or less okay to me, but it could be reflected better in the
> structure of the patch perhaps.
>
> * In particular I don't much like the "viewRelations" list, which
> seems like a wart; those ought to be handled more nearly the same way
> as other RTEs.  (One concrete reason why is that this scheme is going
> to result in locking views in a different order than they were locked
> during original parsing, which perhaps could contribute to deadlocks.)
> Maybe we should store an integer list of which RTIs need to be locked
> in the early phase?  Building that in the parser/rewriter would provide
> a solid guide to the original locking order, so we'd be trivially sure
> of duplicating that.  (It might be close enough to follow the RT list
> order, which is basically what AcquireExecutorLocks does today, but
> this'd be more certain to do the right thing.)  I'm less concerned
> about lock order for child tables because those are just going to
> follow the inheritance or partitioning structure.

What you've described here sounds somewhat like what I had implemented
in the patch versions till v31, though it used a bitmapset named
minLockRelids that is initialized by setrefs.c.  Your idea of
initializing a list before planning seems more appealing offhand than
the code I had added in setrefs.c to populate that minLockRelids
bitmapset, which would be bms_add_range(1, list_length(finalrtable)),
followed by bms_del_members(set-of-child-rel-rtis).

I'll give your idea a try.

> * I don't understand the need for changes like this:
>
> /* clean up tuple table */
> -   ExecClearTuple(node->ps.ps_ResultTupleSlot);
> +   if (node->ps.ps_ResultTupleSlot)
> +   ExecClearTuple(node->ps.ps_ResultTupleSlot);
>
> ISTM that the process ought to involve taking a lock (if needed)
> before we have built any execution state for a given plan node,
> and if we find we have to fail, returning NULL instead of a
> partially-valid planstate node.  Otherwise, considerations of how
> to handle partially-valid nodes are going to metastasize into all
> sorts of places, almost certainly including EXPLAIN for instance.
> I think we ought to be able to limit the damage to "parent nodes
> might have NULL child links that you wouldn't have expected".
> That wouldn't faze ExecEndNode at all, nor most other code.

Hmm, yes, taking a lock before allocating any of the stuff to add into
the planstate seems like it's much easier to reason about than the
alternative I've implemented.

> * More attention is needed to comments.  For example, in a couple of
> places in plancache.c you have removed function header comments
> defining API details and not replaced them with any info about the new
> details, despite the fact that those details are more complex than the
> old.

OK, yeah, maybe I've added a bunch of explanations in execMain.c that
should perhaps have been in plancache.c.

> > It seems I hadn't noted in the ExecEndNode()'s comment that all node
> > types' recursive subroutines need to  handle the change made by this
> > patch that the corresponding ExecInitNode() subroutine may now return
> > early without having initialized all state struct fields.
> > Also noted in the documentation for CustomScan and ForeignScan that
> > the Begin*Scan callback may not have been called at all, so the
> > End*Scan should handle that gracefully.
>
> Yeah, I think we need to avoid adding such requirements.  It's the
> sort of thing that would far too easily get past developer testing
> and only fail once in a blue moon in the field.

OK, got it.

--
Thanks, Amit Langote
EDB: 

Re: generic plans and "initial" pruning

2023-04-03 Thread Tom Lane
Amit Langote  writes:
> [ v38 patchset ]

I spent a little bit of time looking through this, and concluded that
it's not something I will be wanting to push into v16 at this stage.
The patch doesn't seem very close to being committable on its own
terms, and even if it was now is not a great time in the dev cycle
to be making significant executor API changes.  Too much risk of
having to thrash the API during beta, or even change it some more
in v17.  I suggest that we push this forward to the next CF with the
hope of landing it early in v17.

A few concrete thoughts:

* I understand that your plan now is to acquire locks on all the
originally-named tables, then do permissions checks (which will
involve only those tables), then dynamically lock just inheritance and
partitioning child tables as we descend the plan tree.  That seems
more or less okay to me, but it could be reflected better in the
structure of the patch perhaps.

* In particular I don't much like the "viewRelations" list, which
seems like a wart; those ought to be handled more nearly the same way
as other RTEs.  (One concrete reason why is that this scheme is going
to result in locking views in a different order than they were locked
during original parsing, which perhaps could contribute to deadlocks.)
Maybe we should store an integer list of which RTIs need to be locked
in the early phase?  Building that in the parser/rewriter would provide
a solid guide to the original locking order, so we'd be trivially sure
of duplicating that.  (It might be close enough to follow the RT list
order, which is basically what AcquireExecutorLocks does today, but
this'd be more certain to do the right thing.)  I'm less concerned
about lock order for child tables because those are just going to
follow the inheritance or partitioning structure.

* I don't understand the need for changes like this:

/* clean up tuple table */
-   ExecClearTuple(node->ps.ps_ResultTupleSlot);
+   if (node->ps.ps_ResultTupleSlot)
+   ExecClearTuple(node->ps.ps_ResultTupleSlot);

ISTM that the process ought to involve taking a lock (if needed)
before we have built any execution state for a given plan node,
and if we find we have to fail, returning NULL instead of a
partially-valid planstate node.  Otherwise, considerations of how
to handle partially-valid nodes are going to metastasize into all
sorts of places, almost certainly including EXPLAIN for instance.
I think we ought to be able to limit the damage to "parent nodes
might have NULL child links that you wouldn't have expected".
That wouldn't faze ExecEndNode at all, nor most other code.

* More attention is needed to comments.  For example, in a couple of
places in plancache.c you have removed function header comments
defining API details and not replaced them with any info about the new
details, despite the fact that those details are more complex than the
old.

> It seems I hadn't noted in the ExecEndNode()'s comment that all node
> types' recursive subroutines need to  handle the change made by this
> patch that the corresponding ExecInitNode() subroutine may now return
> early without having initialized all state struct fields.
> Also noted in the documentation for CustomScan and ForeignScan that
> the Begin*Scan callback may not have been called at all, so the
> End*Scan should handle that gracefully.

Yeah, I think we need to avoid adding such requirements.  It's the
sort of thing that would far too easily get past developer testing
and only fail once in a blue moon in the field.

regards, tom lane




Re: generic plans and "initial" pruning

2023-03-02 Thread Amit Langote
On Wed, Feb 8, 2023 at 7:31 PM Amit Langote  wrote:
> On Tue, Feb 7, 2023 at 23:38 Andres Freund  wrote:
>> The tests seem to frequently hang on freebsd:
>> https://cirrus-ci.com/github/postgresql-cfbot/postgresql/commitfest%2F42%2F3478
>
> Thanks for the heads up.  I’ve noticed this one too, though couldn’t find the 
> testrun artifacts like I could get for some other failures (on other cirrus 
machines).  Has anyone else been in a similar situation?

I think I have figured out what might be going wrong on that cfbot
animal after building with the same CPPFLAGS as that animal locally.
I had forgotten to update _out/_readRangeTblEntry() to account for the
patch's change that a view's RTE_SUBQUERY now also preserves relkind
in addition to relid and rellockmode for the locking consideration.

Also, I noticed that a multi-query Portal execution with rules was
failing (thanks to a regression test added in a7d71c41db) because the
snapshot used for the 2nd query onward was not being updated for the
command ID change under the patched model of multi-query Portal execution.
To wit, under the patched model, all queries in the multi-query Portal
case undergo ExecutorStart() before any of them is run with
ExecutorRun().  The patch hadn't, however, changed things to update the
snapshot's command ID for the 2nd query onwards, which caused the
aforementioned test case to fail.

This new model does, however, mean that the 2nd query onwards must use
PushCopiedSnapshot(), given the current requirement of
UpdateActiveSnapshotCommandId() that the active snapshot it updates must
not be referenced anywhere else.  The new model basically requires
that each query's QueryDesc points to its own copy of the
ActiveSnapshot.  That may not be a thing in favor of the patched model
though.  For now, I haven't been able to come up with a better
alternative.
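
Concretely (a rough sketch, not the actual patch code), the per-statement
setup under the new model ends up doing something like this for the 2nd
and later queries:

    bool        first = true;
    ListCell   *lc;

    foreach(lc, portal->stmts)
    {
        PlannedStmt *pstmt = lfirst_node(PlannedStmt, lc);

        if (!first)
        {
            /* let each query after the first see the previous ones' effects */
            CommandCounterIncrement();
            PushCopiedSnapshot(GetActiveSnapshot());
            UpdateActiveSnapshotCommandId();
        }

        /* ... CreateQueryDesc() + ExecutorStart() for pstmt ... */

        first = false;
    }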

-- 
Thanks, Amit Langote
EDB: http://www.enterprisedb.com


v34-0001-Move-AcquireExecutorLocks-s-responsibility-into-.patch
Description: Binary data


Re: generic plans and "initial" pruning

2023-02-08 Thread Amit Langote
On Tue, Feb 7, 2023 at 23:38 Andres Freund  wrote:

> Hi,
>
> On 2023-02-03 22:01:09 +0900, Amit Langote wrote:
> > I've added a test case under src/modules/delay_execution by adding a
> > new ExecutorStart_hook that works similarly as
> > delay_execution_planner().  The test works by allowing a concurrent
> > session to drop an object being referenced in a cached plan being
> > initialized while the ExecutorStart_hook waits to get an advisory
> > lock.  The concurrent drop of the referenced object is detected during
> > ExecInitNode() and thus triggers replanning of the cached plan.
> >
> > I also fixed a bug in the ExplainExecuteQuery() while testing and some
> comments.
>
> The tests seem to frequently hang on freebsd:
>
> https://cirrus-ci.com/github/postgresql-cfbot/postgresql/commitfest%2F42%2F3478


Thanks for the heads up.  I’ve noticed this one too, though couldn’t find
the testrun artifacts like I could get for some other failures (on other
cirrus machines).  Has anyone else been in a similar situation?

>
> 

-- 
Thanks, Amit Langote
EDB: http://www.enterprisedb.com


Re: generic plans and "initial" pruning

2023-02-07 Thread Andres Freund
Hi,

On 2023-02-03 22:01:09 +0900, Amit Langote wrote:
> I've added a test case under src/modules/delay_execution by adding a
> new ExecutorStart_hook that works similarly as
> delay_execution_planner().  The test works by allowing a concurrent
> session to drop an object being referenced in a cached plan being
> initialized while the ExecutorStart_hook waits to get an advisory
> lock.  The concurrent drop of the referenced object is detected during
> ExecInitNode() and thus triggers replanning of the cached plan.
> 
> I also fixed a bug in the ExplainExecuteQuery() while testing and some 
> comments.

The tests seem to frequently hang on freebsd:
https://cirrus-ci.com/github/postgresql-cfbot/postgresql/commitfest%2F42%2F3478

Greetings,

Andres Freund




Re: generic plans and "initial" pruning

2023-02-03 Thread Amit Langote
On Thu, Feb 2, 2023 at 11:49 PM Amit Langote  wrote:
> On Fri, Jan 27, 2023 at 4:01 PM Amit Langote  wrote:
> > I didn't actually go with calling the plancache on every lock taken on
> > a relation, that is, in ExecGetRangeTableRelation().  One thing about
> > doing it that way that I didn't quite like (or didn't see a clean
> > enough way to code) is the need to complicate the ExecInitNode()
> > traversal for handling the abrupt suspension of the ongoing setup of
> > the PlanState tree.
>
> OK, I gave this one more try and attached is what I came up with.
>
> This adds a ExecPlanStillValid(), which is called right after anything
> that may in turn call ExecGetRangeTableRelation() which has been
> taught to lock a relation if EXEC_FLAG_GET_LOCKS has been passed in
> EState.es_top_eflags.  That includes all ExecInitNode() calls, and a
> few other functions that call ExecGetRangeTableRelation() directly,
> such as ExecOpenScanRelation().  If ExecPlanStillValid() returns
> false, that is, if EState.es_cachedplan is found to have been
> invalidated after a lock being taken by ExecGetRangeTableRelation(),
> whatever function called it must return immediately and so must its
> caller and so on.  ExecEndPlan() seems to be able to clean up after a
> partially finished attempt of initializing a PlanState tree in this
> way.  Maybe my preliminary testing didn't catch cases where pointers
> to resources that are normally put into the nodes of a PlanState tree
> are now left dangling, because a partially built PlanState tree is not
> accessible to ExecEndPlan; QueryDesc.planstate would remain NULL in
> such cases.  Maybe there's only es_tupleTable and es_relations that
> needs to be explicitly released and the rest is taken care of by
> resetting the ExecutorState context.

In the attached updated patch, I've made the functions that check
ExecPlanStillValid() return NULL (if they return something) instead
of returning partially initialized structs.  Those partially
initialized structs were not being subsequently looked at anyway.

> On testing, I'm afraid we're going to need something like
> src/test/modules/delay_execution to test that concurrent changes to
> relation(s) in PlannedStmt.relationOids that occur somewhere between
> RevalidateCachedQuery() and InitPlan() result in the latter being
> aborted and that it is handled correctly.  It seems like it is only
> the locking of partitions (that are not present in an unplanned Query
> and thus not protected by AcquirePlannerLocks()) that can trigger
> replanning of a CachedPlan, so any tests we write should involve
> partitions.  Should this try to test as many plan shapes as possible
> though given the uncertainty around ExecEndPlan() robustness or should
> manual auditing suffice to be sure that nothing's broken?

I've added a test case under src/test/modules/delay_execution by adding a
new ExecutorStart_hook that works similarly to
delay_execution_planner().  The test works by allowing a concurrent
session to drop an object being referenced in a cached plan being
initialized while the ExecutorStart_hook waits to get an advisory
lock.  The concurrent drop of the referenced object is detected during
ExecInitNode() and thus triggers replanning of the cached plan.

I also fixed a bug in ExplainExecuteQuery() found while testing, and fixed up some comments.

-- 
Thanks, Amit Langote
EDB: http://www.enterprisedb.com


v33-0001-Move-AcquireExecutorLocks-s-responsibility-into-.patch
Description: Binary data


Re: generic plans and "initial" pruning

2023-02-02 Thread Amit Langote
On Fri, Jan 27, 2023 at 4:01 PM Amit Langote  wrote:
> On Fri, Jan 20, 2023 at 12:52 PM Amit Langote  wrote:
> > Alright, I'll try to get something out early next week.  Thanks for
> > all the pointers.
>
> Sorry for the delay.  Attached is what I've come up with so far.
>
> I didn't actually go with calling the plancache on every lock taken on
> a relation, that is, in ExecGetRangeTableRelation().  One thing about
> doing it that way that I didn't quite like (or didn't see a clean
> enough way to code) is the need to complicate the ExecInitNode()
> traversal for handling the abrupt suspension of the ongoing setup of
> the PlanState tree.

OK, I gave this one more try and attached is what I came up with.

This adds an ExecPlanStillValid(), which is called right after anything
that may in turn call ExecGetRangeTableRelation() which has been
taught to lock a relation if EXEC_FLAG_GET_LOCKS has been passed in
EState.es_top_eflags.  That includes all ExecInitNode() calls, and a
few other functions that call ExecGetRangeTableRelation() directly,
such as ExecOpenScanRelation().  If ExecPlanStillValid() returns
false, that is, if EState.es_cachedplan is found to have been
invalidated after a lock being taken by ExecGetRangeTableRelation(),
whatever function called it must return immediately and so must its
caller and so on.  ExecEndPlan() seems to be able to clean up after a
partially finished attempt of initializing a PlanState tree in this
way.  Maybe my preliminary testing didn't catch cases where pointers
to resources that are normally put into the nodes of a PlanState tree
are now left dangling, because a partially built PlanState tree is not
accessible to ExecEndPlan; QueryDesc.planstate would remain NULL in
such cases.  Maybe there's only es_tupleTable and es_relations that
needs to be explicitly released and the rest is taken care of by
resetting the ExecutorState context.
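
To make the control flow concrete, the check amounts to roughly this
(sketch only; whether the executor peeks at the plancache's validity flag
directly or goes through a small accessor is an open detail):

    static inline bool
    ExecPlanStillValid(EState *estate)
    {
        /* plans not coming from the plancache can never go stale here */
        return estate->es_cachedplan == NULL ||
            estate->es_cachedplan->is_valid;
    }

and a typical call site inside an ExecInit* routine looks like:

    outerPlanState(planstate) = ExecInitNode(outerPlan(plan), estate, eflags);
    if (!ExecPlanStillValid(estate))
        return NULL;    /* caller unwinds; ExecEndPlan() cleans up */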

On testing, I'm afraid we're going to need something like
src/test/modules/delay_execution to test that concurrent changes to
relation(s) in PlannedStmt.relationOids that occur somewhere between
RevalidateCachedQuery() and InitPlan() result in the latter being
aborted and that it is handled correctly.  It seems like it is only
the locking of partitions (that are not present in an unplanned Query
and thus not protected by AcquirePlannerLocks()) that can trigger
replanning of a CachedPlan, so any tests we write should involve
partitions.  Should this try to test as many plan shapes as possible
though given the uncertainty around ExecEndPlan() robustness or should
manual auditing suffice to be sure that nothing's broken?

On possibly needing to move permission checking to occur *after*
taking locks, I realized that we don't really need to, because no
relation whose permissions need checking should be unlocked by the time we
get to ExecCheckPermissions(); note we only check permissions of
tables that are present in the original parse tree, and
RevalidateCachedQuery() should have locked those.  I found a couple of
exceptions to that invariant in that views sometimes appear not to be
in the set of relations that RevalidateCachedQuery() locks.  So, I
invented PlannedStmt.viewRelations, a list of RT indexes of view RTEs
that is populated in setrefs.c.  ExecLockViewRelations(), called before
ExecCheckPermissions(), locks those.
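
For the record, ExecLockViewRelations() amounts to little more than this
(a sketch; the actual code in the patch may differ in details):

    static void
    ExecLockViewRelations(EState *estate, PlannedStmt *stmt)
    {
        ListCell   *lc;

        foreach(lc, stmt->viewRelations)
        {
            Index           rti = lfirst_int(lc);
            RangeTblEntry  *rte = exec_rt_fetch(rti, estate);

            LockRelationOid(rte->relid, rte->rellockmode);
        }
    }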


--
Thanks, Amit Langote
EDB: http://www.enterprisedb.com


v32-0001-Move-AcquireExecutorLocks-s-responsibility-into-.patch
Description: Binary data


Re: generic plans and "initial" pruning

2023-01-19 Thread Amit Langote
On Fri, Jan 20, 2023 at 12:58 PM Tom Lane  wrote:
> Amit Langote  writes:
> > On Fri, Jan 20, 2023 at 12:31 PM Tom Lane  wrote:
> >> It might be possible to incorporate this pointer into PlannedStmt
> >> instead of passing it separately.
>
> > Yeah, that would be less churn.  Though, I wonder if you still hold
> > that PlannedStmt should not be scribbled upon outside the planner as
> > you said upthread [1]?
>
> Well, the whole point of that rule is that the executor can't modify
> a plancache entry.  If the plancache itself sets a field in such an
> entry, that doesn't seem problematic from here.
>
> But there's other possibilities if that bothers you; QueryDesc
> could hold the field, for example.  Also, I bet we'd want to copy
> it into EState for the main initialization recursion.

QueryDesc sounds good to me, and yes, also a copy in EState in any case.

So I started looking at the call sites of CreateQueryDesc() and
stopped to look at ExecParallelGetQueryDesc().  AFAICS, we wouldn't
need to pass the CachedPlan to a parallel worker's rerun of
InitPlan(), because 1) it doesn't make sense to call the plancache in
a parallel worker, 2) the leader should already have taken all the
locks necessary for executing a given plan subnode that it intends
to pass to a worker in ExecInitGather().  Does that make sense?

-- 
Thanks, Amit Langote
EDB: http://www.enterprisedb.com




Re: generic plans and "initial" pruning

2023-01-19 Thread Tom Lane
Amit Langote  writes:
> On Fri, Jan 20, 2023 at 12:31 PM Tom Lane  wrote:
>> It might be possible to incorporate this pointer into PlannedStmt
>> instead of passing it separately.

> Yeah, that would be less churn.  Though, I wonder if you still hold
> that PlannedStmt should not be scribbled upon outside the planner as
> you said upthread [1]?

Well, the whole point of that rule is that the executor can't modify
a plancache entry.  If the plancache itself sets a field in such an
entry, that doesn't seem problematic from here.

But there's other possibilities if that bothers you; QueryDesc
could hold the field, for example.  Also, I bet we'd want to copy
it into EState for the main initialization recursion.

regards, tom lane




Re: generic plans and "initial" pruning

2023-01-19 Thread Amit Langote
On Fri, Jan 20, 2023 at 12:31 PM Tom Lane  wrote:
> Amit Langote  writes:
> > On Fri, Jan 20, 2023 at 4:39 AM Tom Lane  wrote:
> >> I had what felt like an epiphany: the whole problem arises because the
> >> system is wrongly factored.  We should get rid of AcquireExecutorLocks
> >> altogether, allowing the plancache to hand back a generic plan that
> >> it's not certain of the validity of, and instead integrate the
> >> responsibility for acquiring locks into executor startup.
>
> > Interesting.  The current implementation relies on
> > PlanCacheRelCallback() marking a generic CachedPlan as invalid, so
> > perhaps there will have to be some sharing of state between the
> > plancache and the executor for this to work?
>
> Yeah.  Thinking a little harder, I think this would have to involve
> passing a CachedPlan pointer to the executor, and what the executor
> would do after acquiring each lock is to ask the plancache "hey, do
> you still think this CachedPlan entry is valid?".  In the case where
> there's a problem, the AcceptInvalidationMessages call involved in
> lock acquisition would lead to a cache inval that clears the validity
> flag on the CachedPlan entry, and this would provide an inexpensive
> way to check if that happened.

OK, thanks, this is useful.

> It might be possible to incorporate this pointer into PlannedStmt
> instead of passing it separately.

Yeah, that would be less churn.  Though, I wonder if you still hold
that PlannedStmt should not be scribbled upon outside the planner as
you said upthread [1]?

> >> * In a successfully built execution state tree, there will simply
> >> not be any nodes corresponding to pruned-away, never-locked subplans.
>
> > I think this is true with the patch as proposed too, but I was still a
> > bit worried about what an ExecutorStart_hook may be doing with an
> > uninitialized plan tree.  Maybe we're mandating that the hook must
> > call standard_ExecutorStart() and only work with the finished
> > PlanState tree?
>
> It would certainly be incumbent on any such hook to not touch
> not-yet-locked parts of the plan tree.  I'm not particularly concerned
> about that sort of requirements change, because we'd be breaking APIs
> all through this area in any case.

OK.  Perhaps something that should be documented around ExecutorStart().

> >> * In some cases (views, at least) we need to acquire lock on relations
> >> that aren't directly reflected anywhere in the plan tree.  So there'd
> >> have to be a separate mechanism for getting those locks and rechecking
> >> validity afterward.  A list of relevant relation OIDs might be enough
> >> for that.
>
> > Hmm, a list of only the OIDs wouldn't preserve the lock mode,
>
> Good point.  I wonder if we could integrate this with the
> RTEPermissionInfo data structure?

You mean adding a rellockmode field to RTEPermissionInfo?

> > Would you like me to hack up a PoC or are you already on that?
>
> I'm not planning to work on this myself, I was hoping you would.

Alright, I'll try to get something out early next week.  Thanks for
all the pointers.

-- 
Thanks, Amit Langote
EDB: http://www.enterprisedb.com

[1] https://www.postgresql.org/message-id/922566.1648784745%40sss.pgh.pa.us




Re: generic plans and "initial" pruning

2023-01-19 Thread Tom Lane
Amit Langote  writes:
> On Fri, Jan 20, 2023 at 4:39 AM Tom Lane  wrote:
>> I had what felt like an epiphany: the whole problem arises because the
>> system is wrongly factored.  We should get rid of AcquireExecutorLocks
>> altogether, allowing the plancache to hand back a generic plan that
>> it's not certain of the validity of, and instead integrate the
>> responsibility for acquiring locks into executor startup.

> Interesting.  The current implementation relies on
> PlanCacheRelCallback() marking a generic CachedPlan as invalid, so
> perhaps there will have to be some sharing of state between the
> plancache and the executor for this to work?

Yeah.  Thinking a little harder, I think this would have to involve
passing a CachedPlan pointer to the executor, and what the executor
would do after acquiring each lock is to ask the plancache "hey, do
you still think this CachedPlan entry is valid?".  In the case where
there's a problem, the AcceptInvalidationMessages call involved in
lock acquisition would lead to a cache inval that clears the validity
flag on the CachedPlan entry, and this would provide an inexpensive
way to check if that happened.

It might be possible to incorporate this pointer into PlannedStmt
instead of passing it separately.

>> * In a successfully built execution state tree, there will simply
>> not be any nodes corresponding to pruned-away, never-locked subplans.

> I think this is true with the patch as proposed too, but I was still a
> bit worried about what an ExecutorStart_hook may be doing with an
> uninitialized plan tree.  Maybe we're mandating that the hook must
> call standard_ExecutorStart() and only work with the finished
> PlanState tree?

It would certainly be incumbent on any such hook to not touch
not-yet-locked parts of the plan tree.  I'm not particularly concerned
about that sort of requirements change, because we'd be breaking APIs
all through this area in any case.

>> * In some cases (views, at least) we need to acquire lock on relations
>> that aren't directly reflected anywhere in the plan tree.  So there'd
>> have to be a separate mechanism for getting those locks and rechecking
>> validity afterward.  A list of relevant relation OIDs might be enough
>> for that.

> Hmm, a list of only the OIDs wouldn't preserve the lock mode,

Good point.  I wonder if we could integrate this with the
RTEPermissionInfo data structure?

> Would you like me to hack up a PoC or are you already on that?

I'm not planning to work on this myself, I was hoping you would.

regards, tom lane




Re: generic plans and "initial" pruning

2023-01-19 Thread Amit Langote
On Fri, Jan 20, 2023 at 4:39 AM Tom Lane  wrote:
> I spent some time re-reading this whole thread, and the more I read
> the less happy I got.

Thanks a lot for your time on this.

>  We are adding a lot of complexity and introducing
> coding hazards that will surely bite somebody someday.  And after awhile
> I had what felt like an epiphany: the whole problem arises because the
> system is wrongly factored.  We should get rid of AcquireExecutorLocks
> altogether, allowing the plancache to hand back a generic plan that
> it's not certain of the validity of, and instead integrate the
> responsibility for acquiring locks into executor startup.  It'd have
> to be optional there, since we don't need new locks in the case of
> executing a just-planned plan; but we can easily add another eflags
> bit (EXEC_FLAG_GET_LOCKS or so).  Then there has to be a convention
> whereby the ExecInitNode traversal can return an indicator that
> "we failed because the plan is stale, please make a new plan".

Interesting.  The current implementation relies on
PlanCacheRelCallback() marking a generic CachedPlan as invalid, so
perhaps there will have to be some sharing of state between the
plancache and the executor for this to work?

> There are a couple reasons why this feels like a good idea:
>
> * There's no need for worry about keeping the locking decisions in sync
> with what executor startup does.
>
> * We don't need to add the overhead proposed in the current patch to
> pass forward data about what got locked/pruned.  While that overhead
> is hopefully less expensive than the locks it saved acquiring, it's
> still overhead (and in some cases the patch will fail to save acquiring
> any locks, making it certainly a net negative).
>
> * In a successfully built execution state tree, there will simply
> not be any nodes corresponding to pruned-away, never-locked subplans.
> As long as code like EXPLAIN follows the state tree and doesn't poke
> into plan nodes that have no matching state, it's secure against the
> sort of problems that Robert worried about upthread.

I think this is true with the patch as proposed too, but I was still a
bit worried about what an ExecutorStart_hook may be doing with an
uninitialized plan tree.  Maybe we're mandating that the hook must
call standard_ExecutorStart() and only work with the finished
PlanState tree?

> While I've not attempted to write any code for this, I can also
> think of a few issues that'd have to be resolved:
>
> * We'd be pushing the responsibility for looping back and re-planning
> out to fairly high-level calling code.  There are only half a dozen
> callers of GetCachedPlan, so there's not that many places to be
> touched; but in some of those places the subsequent executor-start call
> is not close by, so that the necessary refactoring might be pretty
> painful.  I doubt there's anything insurmountable, but we'd definitely
> be changing some fundamental APIs.

Yeah.  I suppose mostly the same place that the current patch is
touching to pass around the PartitionPruneResult nodes.

> * In some cases (views, at least) we need to acquire lock on relations
> that aren't directly reflected anywhere in the plan tree.  So there'd
> have to be a separate mechanism for getting those locks and rechecking
> validity afterward.  A list of relevant relation OIDs might be enough
> for that.

Hmm, a list of only the OIDs wouldn't preserve the lock mode, so maybe
a list or bitmapset of the RTIs, something along the lines of
PlannedStmt.minLockRelids in the patch?

It perhaps even makes sense to make a special list in PlannedStmt for
only the views?

> * We currently do ExecCheckPermissions() before initializing the
> plan state tree.  It won't do to check permissions on relations we
> haven't yet locked, so that responsibility would have to be moved.
> Maybe that could also be integrated into the initialization recursion?
> Not sure.

Ah, I remember mentioning moving that into ExecGetRangeTableRelation()
[1], but I guess that misses relations that are not referenced in the
plan tree, such as views.  Though maybe that's not a problem if we
track views separately as mentioned above.

> * In the existing usage of AcquireExecutorLocks, if we do decide
> that the plan is stale then we are able to release all the locks
> we got before we go off and replan.  I'm not certain if that behavior
> needs to be preserved, but if it does then that would require some
> additional bookkeeping in the executor.

I think maybe we'll want to continue to release the existing locks,
because if we don't, we may end up holding some locks uselessly, given that
replanning might lock a different set of relations.

> * This approach is optimizing on the assumption that we usually
> won't need to replan, because if we do then we might waste a fair
> amount of executor startup overhead before discovering we have
> to throw all that state away.  I think that's clearly the right
> way to bet, but perhaps somebody else has a 

Re: generic plans and "initial" pruning

2023-01-19 Thread Tom Lane
I spent some time re-reading this whole thread, and the more I read
the less happy I got.  We are adding a lot of complexity and introducing
coding hazards that will surely bite somebody someday.  And after awhile
I had what felt like an epiphany: the whole problem arises because the
system is wrongly factored.  We should get rid of AcquireExecutorLocks
altogether, allowing the plancache to hand back a generic plan that
it's not certain of the validity of, and instead integrate the
responsibility for acquiring locks into executor startup.  It'd have
to be optional there, since we don't need new locks in the case of
executing a just-planned plan; but we can easily add another eflags
bit (EXEC_FLAG_GET_LOCKS or so).  Then there has to be a convention
whereby the ExecInitNode traversal can return an indicator that
"we failed because the plan is stale, please make a new plan".

There are a couple reasons why this feels like a good idea:

* There's no need for worry about keeping the locking decisions in sync
with what executor startup does.

* We don't need to add the overhead proposed in the current patch to
pass forward data about what got locked/pruned.  While that overhead
is hopefully less expensive than the locks it saved acquiring, it's
still overhead (and in some cases the patch will fail to save acquiring
any locks, making it certainly a net negative).

* In a successfully built execution state tree, there will simply
not be any nodes corresponding to pruned-away, never-locked subplans.
As long as code like EXPLAIN follows the state tree and doesn't poke
into plan nodes that have no matching state, it's secure against the
sort of problems that Robert worried about upthread.

While I've not attempted to write any code for this, I can also
think of a few issues that'd have to be resolved:

* We'd be pushing the responsibility for looping back and re-planning
out to fairly high-level calling code.  There are only half a dozen
callers of GetCachedPlan, so there's not that many places to be
touched; but in some of those places the subsequent executor-start call
is not close by, so that the necessary refactoring might be pretty
painful.  I doubt there's anything insurmountable, but we'd definitely
be changing some fundamental APIs.

* In some cases (views, at least) we need to acquire lock on relations
that aren't directly reflected anywhere in the plan tree.  So there'd
have to be a separate mechanism for getting those locks and rechecking
validity afterward.  A list of relevant relation OIDs might be enough
for that.

* We currently do ExecCheckPermissions() before initializing the
plan state tree.  It won't do to check permissions on relations we
haven't yet locked, so that responsibility would have to be moved.
Maybe that could also be integrated into the initialization recursion?
Not sure.

* In the existing usage of AcquireExecutorLocks, if we do decide
that the plan is stale then we are able to release all the locks
we got before we go off and replan.  I'm not certain if that behavior
needs to be preserved, but if it does then that would require some
additional bookkeeping in the executor.

* This approach is optimizing on the assumption that we usually
won't need to replan, because if we do then we might waste a fair
amount of executor startup overhead before discovering we have
to throw all that state away.  I think that's clearly the right
way to bet, but perhaps somebody else has a different view.

Thoughts?

regards, tom lane




Re: generic plans and "initial" pruning

2022-12-21 Thread Tom Lane
Alvaro Herrera  writes:
> This version of the patch looks not entirely unreasonable to me.  I'll
> set this as Ready for Committer in case David or Tom or someone else
> want to have a look and potentially commit it.

I will have a look during the January CF.

regards, tom lane




Re: generic plans and "initial" pruning

2022-12-21 Thread Amit Langote
On Wed, Dec 21, 2022 at 7:18 PM Alvaro Herrera  wrote:
> This version of the patch looks not entirely unreasonable to me.  I'll
> set this as Ready for Committer in case David or Tom or someone else
> want to have a look and potentially commit it.

Thank you, Alvaro.

-- 
Thanks, Amit Langote
EDB: http://www.enterprisedb.com




Re: generic plans and "initial" pruning

2022-12-21 Thread Alvaro Herrera
This version of the patch looks not entirely unreasonable to me.  I'll
set this as Ready for Committer in case David or Tom or someone else
want to have a look and potentially commit it.

-- 
Álvaro Herrera   48°01'N 7°57'E  —  https://www.EnterpriseDB.com/




Re: generic plans and "initial" pruning

2022-12-15 Thread Amit Langote
On Wed, Dec 14, 2022 at 5:35 PM Amit Langote  wrote:
> I have moved the original functionality of GetCachedPlan() to
> GetCachedPlanInternal(), turning the former into a sort of controller
> as described shortly.  The latter's CheckCachedPlan() part now only
> locks the "minimal" set of, non-prunable, relations, making a note of
> whether the plan contains any prunable subnodes and thus prunable
> relations whose locking is deferred to the caller, GetCachedPlan().
> GetCachedPlan(), as a sort of controller as mentioned before, does the
> pruning if needed on the minimally valid plan returned by
> GetCachedPlanInternal(), locks the partitions that survive, and redoes
> the whole thing if the locking of partitions invalidates the plan.

After sleeping on it, I realized this doesn't have to be that
complicated.   Rather than turn GetCachedPlan() into a wrapper for
handling deferred partition locking as outlined above, I could have
changed it more simply as follows to get the same thing done:

if (!customplan)
{
-   if (CheckCachedPlan(plansource))
+   bool        hasUnlockedParts = false;
+
+   if (CheckCachedPlan(plansource, &hasUnlockedParts) &&
+   hasUnlockedParts &&
+   CachedPlanLockPartitions(plansource, boundParams, owner, extra))
{
/* We want a generic plan, and we already have a valid one */
plan = plansource->gplan;

Attached updated patch does it like that.

-- 
Thanks, Amit Langote
EDB: http://www.enterprisedb.com


v30-0002-In-GetCachedPlan-only-lock-unpruned-partitions.patch
Description: Binary data


v30-0001-Preparatory-refactoring-before-reworking-CachedP.patch
Description: Binary data


Re: generic plans and "initial" pruning

2022-12-14 Thread Amit Langote
On Tue, Dec 13, 2022 at 2:24 AM Alvaro Herrera  wrote:
> On 2022-Dec-12, Amit Langote wrote:
> > I started feeling like putting all the new logic being added
> > by this patch into plancache.c at the heart of GetCachedPlan() and
> > tweaking its API in kind of unintuitive ways may not have been such a
> > good idea to begin with.  So I started thinking again about your
> > GetRunnablePlan() wrapper idea and thought maybe we could do something
> > with it.  Let's say we name it GetCachedPlanLockPartitions() and put
> > the logic that does initial pruning with the new
> > ExecutorDoInitialPruning() in it, instead of in the normal
> > GetCachedPlan() path.  Any callers that call GetCachedPlan() instead
> > call GetCachedPlanLockPartitions() with either the List ** parameter
> > as now or some container struct if that seems better.  Whether
> > GetCachedPlanLockPartitions() needs to do anything other than return
> > the CachedPlan returned by GetCachedPlan() can be decided by the
> > latter setting, say, CachedPlan.has_unlocked_partitions.  That will be
> > done by AcquireExecutorLocks() when it sees containsInitialPruning in
> > any of the PlannedStmts it sees, locking only the
> > PlannedStmt.minLockRelids set (which is all relations where no pruning
> > is needed!), leaving the partition locking to
> > GetCachedPlanLockPartitions().
>
> Hmm.  This doesn't sound totally unreasonable, except to the point David
> was making that perhaps we may want this container struct to accommodate
> other things in the future than just the partition pruning results, so I
> think its name (and that of the function that produces it) ought to be a
> little more generic than that.
>
> (I think this also answers your question on whether a List ** is better
> than a container struct.)

OK, so here's a WIP attempt at that.

I have moved the original functionality of GetCachedPlan() to
GetCachedPlanInternal(), turning the former into a sort of controller
as described shortly.  The latter's CheckCachedPlan() part now only
locks the "minimal" set of, non-prunable, relations, making a note of
whether the plan contains any prunable subnodes and thus prunable
relations whose locking is deferred to the caller, GetCachedPlan().
GetCachedPlan(), as a sort of controller as mentioned before, does the
pruning if needed on the minimally valid plan returned by
GetCachedPlanInternal(), locks the partitions that survive, and redoes
the whole thing if the locking of partitions invalidates the plan.

The pruning results are returned through the new output parameter of
GetCachedPlan() of type CachedPlanExtra.  I named it so after much
consideration, because all the new logic that produces stuff to put
into it is a part of the plancache module and has to do with
manipulating a CachedPlan.  (I had considered CachedPlanExecInfo to
indicate that it contains information that is to be forwarded to the
executor, though that just didn't seem to fit in plancache.h.)

I have broken out a few things into a preparatory patch 0001.  Mainly,
it invents PlannedStmt.minLockRelids to replace
AcquireExecutorLocks()'s current loop over the range table to figure
out the relations to lock.  I also threw in a couple of pruning-related
non-functional changes there to make it easier to read
0002, which is the main patch.
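
With that field, the locking loop in AcquireExecutorLocks() reduces to
roughly the following (sketch):

    int         rti = -1;

    while ((rti = bms_next_member(plannedstmt->minLockRelids, rti)) >= 0)
    {
        RangeTblEntry *rte = rt_fetch(rti, plannedstmt->rtable);

        if (rte->rtekind == RTE_RELATION)
        {
            if (acquire)
                LockRelationOid(rte->relid, rte->rellockmode);
            else
                UnlockRelationOid(rte->relid, rte->rellockmode);
        }
    }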



--
Thanks, Amit Langote
EDB: http://www.enterprisedb.com


v29-0001-Preparatory-refactoring-before-reworking-CachedP.patch
Description: Binary data


v29-0002-In-GetCachedPlan-only-lock-unpruned-partitions.patch
Description: Binary data


Re: generic plans and "initial" pruning

2022-12-12 Thread Alvaro Herrera
On 2022-Dec-12, Amit Langote wrote:

> I started feeling like putting all the new logic being added
> by this patch into plancache.c at the heart of GetCachedPlan() and
> tweaking its API in kind of unintuitive ways may not have been such a
> good idea to begin with.  So I started thinking again about your
> GetRunnablePlan() wrapper idea and thought maybe we could do something
> with it.  Let's say we name it GetCachedPlanLockPartitions() and put
> the logic that does initial pruning with the new
> ExecutorDoInitialPruning() in it, instead of in the normal
> GetCachedPlan() path.  Any callers that call GetCachedPlan() instead
> call GetCachedPlanLockPartitions() with either the List ** parameter
> as now or some container struct if that seems better.  Whether
> GetCachedPlanLockPartitions() needs to do anything other than return
> the CachedPlan returned by GetCachedPlan() can be decided by the
> latter setting, say, CachedPlan.has_unlocked_partitions.  That will be
> done by AcquireExecutorLocks() when it sees containsInitialPruning in
> any of the PlannedStmts it sees, locking only the
> PlannedStmt.minLockRelids set (which is all relations where no pruning
> is needed!), leaving the partition locking to
> GetCachedPlanLockPartitions().

Hmm.  This doesn't sound totally unreasonable, except to the point David
was making that perhaps we may want this container struct to accommodate
other things in the future than just the partition pruning results, so I
think its name (and that of the function that produces it) ought to be a
little more generic than that.

(I think this also answers your question on whether a List ** is better
than a container struct.)

-- 
Álvaro Herrera PostgreSQL Developer  —  https://www.EnterpriseDB.com/
"Las cosas son buenas o malas segun las hace nuestra opinión" (Lisias)




Re: generic plans and "initial" pruning

2022-12-12 Thread Amit Langote
On Fri, Dec 9, 2022 at 8:37 PM Alvaro Herrera  wrote:
> On 2022-Dec-09, Amit Langote wrote:
>
> > Pruning will be done afresh on every fetch of a given cached plan when
> > CheckCachedPlan() is called on it, so the part_prune_results_list part
> > will be discarded and rebuilt as many times as the plan is executed.
> > You'll find a description around CachedPlanSavePartitionPruneResults()
> > that's in v12.
>
> I see.
>
> In that case, a separate container struct seems warranted.

I thought about this today and played around with some container struct ideas.

Though, I started feeling like putting all the new logic being added
by this patch into plancache.c at the heart of GetCachedPlan() and
tweaking its API in kind of unintuitive ways may not have been such a
good idea to begin with.  So I started thinking again about your
GetRunnablePlan() wrapper idea and thought maybe we could do something
with it.  Let's say we name it GetCachedPlanLockPartitions() and put
the logic that does initial pruning with the new
ExecutorDoInitialPruning() in it, instead of in the normal
GetCachedPlan() path.  Any callers that call GetCachedPlan() instead
call GetCachedPlanLockPartitions() with either the List ** parameter
as now or some container struct if that seems better.  Whether
GetCachedPlanLockPartitions() needs to do anything other than return
the CachedPlan returned by GetCachedPlan() can be decided by the
latter setting, say, CachedPlan.has_unlocked_partitions.  That will be
done by AcquireExecutorLocks() when it sees containsInitialPruning in
any of the PlannedStmts it sees, locking only the
PlannedStmt.minLockRelids set (which is all relations where no pruning
is needed!), leaving the partition locking to
GetCachedPlanLockPartitions().  If the CachedPlan is invalidated
during the partition locking phase, it calls GetCachedPlan() again;
maybe some refactoring is needed to avoid too much useless work in
such cases.

Thoughts?

-- 
Thanks, Amit Langote
EDB: http://www.enterprisedb.com




Re: generic plans and "initial" pruning

2022-12-09 Thread Alvaro Herrera
On 2022-Dec-09, Amit Langote wrote:

> Pruning will be done afresh on every fetch of a given cached plan when
> CheckCachedPlan() is called on it, so the part_prune_results_list part
> will be discarded and rebuilt as many times as the plan is executed.
> You'll find a description around CachedPlanSavePartitionPruneResults()
> that's in v12.

I see.

In that case, a separate container struct seems warranted.

-- 
Álvaro Herrera   48°01'N 7°57'E  —  https://www.EnterpriseDB.com/
"Industry suffers from the managerial dogma that for the sake of stability
and continuity, the company should be independent of the competence of
individual employees."  (E. Dijkstra)




Re: generic plans and "initial" pruning

2022-12-09 Thread Amit Langote
On Fri, Dec 9, 2022 at 7:49 PM Alvaro Herrera  wrote:
> On 2022-Dec-09, Amit Langote wrote:
> > On Fri, Dec 9, 2022 at 6:52 PM Alvaro Herrera  
> > wrote:
> > > Remind me again why is part_prune_results_list not part of struct
> > > CachedPlan then?  I tried to understand that based on comments upthread,
> > > but I was unable to find anything.
> >
> > > (My first reaction to your above comment was "well, rename GetCachedPlan
> > > then, maybe to GetRunnablePlan", but then I'm wondering if CachedPlan is
> > > in any way a structure that must be "immutable" in the way parser output
> > > is.  Looking at the comment at the top of plancache.c it appears to me
> > > that it isn't, but maybe I'm missing something.)
> >
> > CachedPlan *is* supposed to be read-only per the comment above
> > CachedPlanSource definition:
> >
> >  * ...If we are using a generic
> >  * cached plan then it is meant to be re-used across multiple executions, so
> >  * callers must always treat CachedPlans as read-only.
>
> I read that as implying that the part_prune_results_list must remain
> intact as long as no invalidations occur.  Does part_prune_result_list
> really change as a result of something other than a sinval event?
> Keep in mind that if a sinval message that touches one of the relations
> in the plan arrives, then we'll discard it and generate it afresh.  I
> don't see that the part_prune_results_list would change otherwise, but
> maybe I misunderstand?

Pruning will be done afresh on every fetch of a given cached plan when
CheckCachedPlan() is called on it, so the part_prune_results_list part
will be discarded and rebuilt as many times as the plan is executed.
You'll find a description around CachedPlanSavePartitionPruneResults()
that's in v12.

-- 
Thanks, Amit Langote
EDB: http://www.enterprisedb.com




Re: generic plans and "initial" pruning

2022-12-09 Thread Alvaro Herrera
On 2022-Dec-09, Amit Langote wrote:

> On Fri, Dec 9, 2022 at 6:52 PM Alvaro Herrera  wrote:

> > Remind me again why is part_prune_results_list not part of struct
> > CachedPlan then?  I tried to understand that based on comments upthread,
> > but I was unable to find anything.
> 
> It used to be part of CachedPlan for a brief period of time (in patch
> v12 I posted in [1]), but David, in his reply to [1], said he wasn't
> so sure that it belonged there.

I'm not sure I necessarily agree with that.  I'll have a look at v12 to
try and understand what was David so unhappy about.

> > (My first reaction to your above comment was "well, rename GetCachedPlan
> > then, maybe to GetRunnablePlan", but then I'm wondering if CachedPlan is
> > in any way a structure that must be "immutable" in the way parser output
> > is.  Looking at the comment at the top of plancache.c it appears to me
> > that it isn't, but maybe I'm missing something.)
> 
> CachedPlan *is* supposed to be read-only per the comment above
> CachedPlanSource definition:
> 
>  * ...If we are using a generic
>  * cached plan then it is meant to be re-used across multiple executions, so
>  * callers must always treat CachedPlans as read-only.

I read that as implying that the part_prune_results_list must remain
intact as long as no invalidations occur.  Does part_prune_result_list
really change as a result of something other than a sinval event?
Keep in mind that if a sinval message that touches one of the relations
in the plan arrives, then we'll discard it and generate it afresh.  I
don't see that the part_prune_results_list would change otherwise, but
maybe I misunderstand?

> FYI, there was even an idea of putting a PartitionPruneResults for a
> given PlannedStmt into the PlannedStmt itself [2], but PlannedStmt is
> supposed to be read-only too [3].

Hmm, I'm not familiar with PlannedStmt lifetime, but I'm definitely not
betting that Tom is wrong about this.

> Maybe we need some new overarching context when invoking plancache, if
> Portal can't already be it, whose struct can be passed to
> GetCachedPlan() to put the pruning results in?  Perhaps,
> GetRunnablePlan() that you floated could be a wrapper for
> GetCachedPlan(), owning that new context.

Perhaps that is a solution.  I'm not sure.

-- 
Álvaro Herrera PostgreSQL Developer  —  https://www.EnterpriseDB.com/
"Uno puede defenderse de los ataques; contra los elogios se esta indefenso"




Re: generic plans and "initial" pruning

2022-12-09 Thread Amit Langote
On Fri, Dec 9, 2022 at 6:52 PM Alvaro Herrera  wrote:
> On 2022-Dec-09, Amit Langote wrote:
> > On Wed, Dec 7, 2022 at 4:00 AM Alvaro Herrera  
> > wrote:
> > > I find the API of GetCachedPlans a little weird after this patch.
>
> > David, in his Apr 7 reply on this thread, also seemed to suggest
> > something similar.
> >
> > Hmm, I was / am not so sure if GetCachedPlan() should return something
> > that is not CachedPlan.  An idea I had today was to replace the
> > part_prune_results_list output List parameter with, say,
> > QueryInitPruningResult, or something like that and put the current
> > list into that struct.   Was looking at QueryEnvironment to come up
> > with *that* name.  Any thoughts?
>
> Remind me again why is part_prune_results_list not part of struct
> CachedPlan then?  I tried to understand that based on comments upthread,
> but I was unable to find anything.

It used to be part of CachedPlan for a brief period of time (in patch
v12 I posted in [1]), but David, in his reply to [1], said he wasn't
so sure that it belonged there.

> (My first reaction to your above comment was "well, rename GetCachedPlan
> then, maybe to GetRunnablePlan", but then I'm wondering if CachedPlan is
> in any way a structure that must be "immutable" in the way parser output
> is.  Looking at the comment at the top of plancache.c it appears to me
> that it isn't, but maybe I'm missing something.)

CachedPlan *is* supposed to be read-only per the comment above
CachedPlanSource definition:

 * ...If we are using a generic
 * cached plan then it is meant to be re-used across multiple executions, so
 * callers must always treat CachedPlans as read-only.

FYI, there was even an idea of putting a PartitionPruneResults for a
given PlannedStmt into the PlannedStmt itself [2], but PlannedStmt is
supposed to be read-only too [3].

Maybe we need some new overarching context when invoking plancache, if
Portal can't already be it, whose struct can be passed to
GetCachedPlan() to put the pruning results in?  Perhaps,
GetRunnablePlan() that you floated could be a wrapper for
GetCachedPlan(), owning that new context.

-- 
Thanks, Amit Langote
EDB: http://www.enterprisedb.com

[1] 
https://www.postgresql.org/message-id/CA%2BHiwqH4qQ_YVROr7TY0jSCuGn0oHhH79_DswOdXWN5UnMCBtQ%40mail.gmail.com
[2] 
https://www.postgresql.org/message-id/CAApHDvp_DjVVkgSV24%2BUF7p_yKWeepgoo%2BW2SWLLhNmjwHTVYQ%40mail.gmail.com
[3] https://www.postgresql.org/message-id/922566.1648784745%40sss.pgh.pa.us




Re: generic plans and "initial" pruning

2022-12-09 Thread Alvaro Herrera
On 2022-Dec-09, Amit Langote wrote:

> On Wed, Dec 7, 2022 at 4:00 AM Alvaro Herrera  wrote:
> > I find the API of GetCachedPlans a little weird after this patch.

> David, in his Apr 7 reply on this thread, also seemed to suggest
> something similar.
> 
> Hmm, I was / am not so sure if GetCachedPlan() should return something
> that is not CachedPlan.  An idea I had today was to replace the
> part_prune_results_list output List parameter with, say,
> QueryInitPruningResult, or something like that and put the current
> list into that struct.   Was looking at QueryEnvironment to come up
> with *that* name.  Any thoughts?

Remind me again why is part_prune_results_list not part of struct
CachedPlan then?  I tried to understand that based on comments upthread,
but I was unable to find anything.

(My first reaction to your above comment was "well, rename GetCachedPlan
then, maybe to GetRunnablePlan", but then I'm wondering if CachedPlan is
in any way a structure that must be "immutable" in the way parser output
is.  Looking at the comment at the top of plancache.c it appears to me
that it isn't, but maybe I'm missing something.)

-- 
Álvaro Herrera PostgreSQL Developer  —  https://www.EnterpriseDB.com/
"The Postgresql hackers have what I call a "NASA space shot" mentality.
 Quite refreshing in a world of "weekend drag racer" developers."
(Scott Marlowe)




Re: generic plans and "initial" pruning

2022-12-09 Thread Amit Langote
Thanks for the review.

On Wed, Dec 7, 2022 at 4:00 AM Alvaro Herrera  wrote:
> I find the API of GetCachedPlans a little weird after this patch.  I
> think it may be better to have it return a pointer of a new struct --
> one that contains both the CachedPlan pointer and the list of pruning
> results.  (As I understand, the sole caller that isn't interested in the
> pruning results, SPI_plan_get_cached_plan, can be explained by the fact
> that it knows there won't be any.  So I don't think we need to worry
> about this case?)

David, in his Apr 7 reply on this thread, also seemed to suggest
something similar.

Hmm, I was / am not so sure if GetCachedPlan() should return something
that is not CachedPlan.  An idea I had today was to replace the
part_prune_results_list output List parameter with, say,
QueryInitPruningResult, or something like that and put the current
list into that struct.   Was looking at QueryEnvironment to come up
with *that* name.  Any thoughts?

> And I think you should make that struct also be the last argument of
> PortalDefineQuery, so you don't need the separate
> PortalStorePartitionPruneResults function -- because as far as I can
> tell, the callers that pass a non-NULL pointer there are exactly the
> same ones that later call PortalStorePartitionPruneResults.

Yes, it would be better to not need PortalStorePartitionPruneResults.


--
Thanks, Amit Langote
EDB: http://www.enterprisedb.com




Re: generic plans and "initial" pruning

2022-12-06 Thread Alvaro Herrera
I find the API of GetCachedPlans a little weird after this patch.  I
think it may be better to have it return a pointer of a new struct --
one that contains both the CachedPlan pointer and the list of pruning
results.  (As I understand, the sole caller that isn't interested in the
pruning results, SPI_plan_get_cached_plan, can be explained by the fact
that it knows there won't be any.  So I don't think we need to worry
about this case?)
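
For illustration, the container being suggested might look something like
this (the name and fields are hypothetical):

typedef struct GetCachedPlanResult
{
    CachedPlan *cplan;              /* the plan, as GetCachedPlan returns today */
    List       *part_prune_results; /* PartitionPruneResult list, or NIL */
} GetCachedPlanResult;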

And I think you should make that struct also be the last argument of
PortalDefineQuery, so you don't need the separate
PortalStorePartitionPruneResults function -- because as far as I can
tell, the callers that pass a non-NULL pointer there are exactly the
same ones that later call PortalStorePartitionPruneResults.

-- 
Álvaro Herrera   48°01'N 7°57'E  —  https://www.EnterpriseDB.com/
"La primera ley de las demostraciones en vivo es: no trate de usar el sistema.
Escriba un guión que no toque nada para no causar daños." (Jakob Nielsen)




Re: generic plans and "initial" pruning

2022-12-04 Thread Amit Langote
On Mon, Dec 5, 2022 at 12:00 PM Amit Langote  wrote:
> On Fri, Dec 2, 2022 at 7:40 PM Amit Langote  wrote:
> > Thought it might be good for PartitionPruneResult to also have
> > root_parent_relids that matches with the corresponding
> > PartitionPruneInfo.  ExecInitPartitionPruning() does a sanity check
> > that the root_parent_relids of a given pair of PartitionPrune{Info |
> > Result} match.
> >
> > Posting the patch separately as the attached 0002, just in case you
> > might think that the extra cross-checking would be an overkill.
>
> Rebased over 92c4dafe1eed and fixed some factual mistakes in the
> comment above ExecutorDoInitialPruning().

Sorry, I had forgotten to git-add hunks including some cosmetic
changes in that one.  Here's another version.

-- 
Thanks, Amit Langote
EDB: http://www.enterprisedb.com


v28-0002-Add-root_parent_relids-to-PartitionPruneResult.patch
Description: Binary data


v28-0001-Optimize-AcquireExecutorLocks-by-locking-only-un.patch
Description: Binary data


Re: generic plans and "initial" pruning

2022-12-04 Thread Amit Langote
On Fri, Dec 2, 2022 at 7:40 PM Amit Langote  wrote:
> On Thu, Dec 1, 2022 at 9:43 PM Amit Langote  wrote:
> > On Thu, Dec 1, 2022 at 8:21 PM Alvaro Herrera  
> > wrote:
> > > On 2022-Dec-01, Amit Langote wrote:
> > > > Hmm, how about keeping the [Merge]Append's parent relation's RT index
> > > > in the PartitionPruneInfo and passing it down to
> > > > ExecInitPartitionPruning() from ExecInit[Merge]Append() for
> > > > cross-checking?  Both Append and MergeAppend already have an
> > > > 'apprelids' field that we can save a copy of in the
> > > > PartitionPruneInfo.  Tried that in the attached delta patch.
> > >
> > > Ah yeah, that sounds about what I was thinking.  I've merged that in and
> > > pushed to github, which had a strange pg_upgrade failure on Windows
> > > mentioning log files that were not captured by the CI tooling.  So I
> > > pushed another one trying to grab those files, in case it wasn't an
> > > one-off failure.  It's running now:
> > >   https://cirrus-ci.com/task/5857239638999040
> > >
> > > If all goes well with this run, I'll get this 0001 pushed.
> >
> > Thanks for pushing 0001.
> >
> > Rebased 0002 attached.
>
> Thought it might be good for PartitionPruneResult to also have
> root_parent_relids that matches with the corresponding
> PartitionPruneInfo.  ExecInitPartitionPruning() does a sanity check
> that the root_parent_relids of a given pair of PartitionPrune{Info |
> Result} match.
>
> Posting the patch separately as the attached 0002, just in case you
> might think that the extra cross-checking would be an overkill.

Rebased over 92c4dafe1eed and fixed some factual mistakes in the
comment above ExecutorDoInitialPruning().

-- 
Thanks, Amit Langote
EDB: http://www.enterprisedb.com


v27-0001-Optimize-AcquireExecutorLocks-by-locking-only-un.patch
Description: Binary data


v27-0002-Add-root_parent_relids-to-PartitionPruneResult.patch
Description: Binary data


Re: generic plans and "initial" pruning

2022-12-02 Thread Amit Langote
On Thu, Dec 1, 2022 at 9:43 PM Amit Langote  wrote:
> On Thu, Dec 1, 2022 at 8:21 PM Alvaro Herrera  wrote:
> > On 2022-Dec-01, Amit Langote wrote:
> > > Hmm, how about keeping the [Merge]Append's parent relation's RT index
> > > in the PartitionPruneInfo and passing it down to
> > > ExecInitPartitionPruning() from ExecInit[Merge]Append() for
> > > cross-checking?  Both Append and MergeAppend already have an
> > > 'apprelids' field that we can save a copy of in the
> > > PartitionPruneInfo.  Tried that in the attached delta patch.
> >
> > Ah yeah, that sounds about what I was thinking.  I've merged that in and
> > pushed to github, which had a strange pg_upgrade failure on Windows
> > mentioning log files that were not captured by the CI tooling.  So I
> > pushed another one trying to grab those files, in case it wasn't a
> > one-off failure.  It's running now:
> >   https://cirrus-ci.com/task/5857239638999040
> >
> > If all goes well with this run, I'll get this 0001 pushed.
>
> Thanks for pushing 0001.
>
> Rebased 0002 attached.

Thought it might be good for PartitionPruneResult to also have
root_parent_relids that matches with the corresponding
PartitionPruneInfo.  ExecInitPartitionPruning() does a sanity check
that the root_parent_relids of a given pair of PartitionPrune{Info |
Result} match.
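
The check amounts to something like this in ExecInitPartitionPruning() (a
sketch; root_parent_relids is the patch's field, the local variable names
are mine):

    /* Cross-check that the pruning result pairs with this pruning info. */
    if (!bms_equal(pruneinfo->root_parent_relids,
                   pruneresult->root_parent_relids))
        elog(ERROR, "mismatching PartitionPruneInfo and PartitionPruneResult");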

Posting the patch separately as the attached 0002, just in case you
might think that the extra cross-checking would be overkill.

-- 
Thanks, Amit Langote
EDB: http://www.enterprisedb.com


v26-0002-Add-root_parent_relids-to-PartitionPruneResult.patch
Description: Binary data


v26-0001-Optimize-AcquireExecutorLocks-by-locking-only-un.patch
Description: Binary data


Re: generic plans and "initial" pruning

2022-12-01 Thread Amit Langote
On Thu, Dec 1, 2022 at 8:21 PM Alvaro Herrera  wrote:
> On 2022-Dec-01, Amit Langote wrote:
> > Hmm, how about keeping the [Merge]Append's parent relation's RT index
> > in the PartitionPruneInfo and passing it down to
> > ExecInitPartitionPruning() from ExecInit[Merge]Append() for
> > cross-checking?  Both Append and MergeAppend already have an
> > 'apprelids' field that we can save a copy of in the
> > PartitionPruneInfo.  Tried that in the attached delta patch.
>
> Ah yeah, that sounds about what I was thinking.  I've merged that in and
> pushed to github, which had a strange pg_upgrade failure on Windows
> mentioning log files that were not captured by the CI tooling.  So I
> pushed another one trying to grab those files, in case it wasn't an
> one-off failure.  It's running now:
>   https://cirrus-ci.com/task/5857239638999040
>
> If all goes well with this run, I'll get this 0001 pushed.

Thanks for pushing 0001.

Rebased 0002 attached.

-- 
Thanks, Amit Langote
EDB: http://www.enterprisedb.com


v25-0001-Optimize-AcquireExecutorLocks-by-locking-only-un.patch
Description: Binary data


Re: generic plans and "initial" pruning

2022-12-01 Thread Alvaro Herrera
On 2022-Dec-01, Amit Langote wrote:

> Hmm, how about keeping the [Merge]Append's parent relation's RT index
> in the PartitionPruneInfo and passing it down to
> ExecInitPartitionPruning() from ExecInit[Merge]Append() for
> cross-checking?  Both Append and MergeAppend already have a
> 'apprelids' field that we can save a copy of in the
> PartitionPruneInfo.  Tried that in the attached delta patch.

Ah yeah, that sounds about what I was thinking.  I've merged that in and
pushed to github, which had a strange pg_upgrade failure on Windows
mentioning log files that were not captured by the CI tooling.  So I
pushed another one trying to grab those files, in case it wasn't a
one-off failure.  It's running now:
  https://cirrus-ci.com/task/5857239638999040

If all goes well with this run, I'll get this 0001 pushed.

-- 
Álvaro Herrera PostgreSQL Developer  —  https://www.EnterpriseDB.com/
"Investigación es lo que hago cuando no sé lo que estoy haciendo"
(Wernher von Braun)




Re: generic plans and "initial" pruning

2022-11-30 Thread Amit Langote
Hi Alvaro,

Thanks for looking at this one.

On Thu, Dec 1, 2022 at 3:12 AM Alvaro Herrera  wrote:
> Looking at 0001, I wonder if we should have a crosscheck that a
> PartitionPruneInfo you got from following an index is indeed constructed
> for the relation that you think it is: previously, you were always sure
> that the prune struct is for this node, because you followed a pointer
> that was set up in the node itself.  Now you only have an index, and you
> have to trust that the index is correct.

Yeah, a crosscheck sounds like a good idea.

> I'm not sure how to implement this, or even if it's doable at all.
> Keeping the OID of the partitioned table in the PartitionPruneInfo
> struct is easy, but I don't know how to check it in ExecInitMergeAppend
> and ExecInitAppend.

Hmm, how about keeping the [Merge]Append's parent relation's RT index
in the PartitionPruneInfo and passing it down to
ExecInitPartitionPruning() from ExecInit[Merge]Append() for
cross-checking?  Both Append and MergeAppend already have an
'apprelids' field that we can save a copy of in the
PartitionPruneInfo.  Tried that in the attached delta patch.
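
Concretely, the cross-check in ExecInit[Merge]Append() could look roughly
like this (a sketch; it assumes the 'relids' copy added to
PartitionPruneInfo by the delta patch and the part_prune_index /
partPruneInfos fields from 0001):

    PartitionPruneInfo *pruneinfo =
        list_nth_node(PartitionPruneInfo,
                      estate->es_plannedstmt->partPruneInfos,
                      node->part_prune_index);

    /* The pruning info must have been built for this Append's parents. */
    if (!bms_equal(node->apprelids, pruneinfo->relids))
        elog(ERROR, "PartitionPruneInfo at part_prune_index %d does not match this plan node",
             node->part_prune_index);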

-- 
Thanks, Amit Langote
EDB: http://www.enterprisedb.com


PartitionPruneInfo-relids.patch
Description: Binary data


Re: generic plans and "initial" pruning

2022-11-30 Thread Alvaro Herrera
Looking at 0001, I wonder if we should have a crosscheck that a
PartitionPruneInfo you got from following an index is indeed constructed
for the relation that you think it is: previously, you were always sure
that the prune struct is for this node, because you followed a pointer
that was set up in the node itself.  Now you only have an index, and you
have to trust that the index is correct.

I'm not sure how to implement this, or even if it's doable at all.
Keeping the OID of the partitioned table in the PartitionPruneInfo
struct is easy, but I don't know how to check it in ExecInitMergeAppend
and ExecInitAppend.

-- 
Álvaro HerreraBreisgau, Deutschland  —  https://www.EnterpriseDB.com/
"Find a bug in a program, and fix it, and the program will work today.
Show the program how to find and fix a bug, and the program
will work forever" (Oliver Silfridge)




Re: generic plans and "initial" pruning

2022-11-07 Thread Amit Langote
On Thu, Oct 27, 2022 at 11:41 AM Amit Langote  wrote:
> On Mon, Oct 17, 2022 at 6:29 PM Amit Langote  wrote:
> > On Wed, Oct 12, 2022 at 4:36 PM Amit Langote  
> > wrote:
> > > On Fri, Jul 29, 2022 at 1:20 PM Amit Langote  
> > > wrote:
> > > > On Thu, Jul 28, 2022 at 1:27 AM Robert Haas  
> > > > wrote:
> > > > > 0001 adds es_part_prune_result but does not use it, so maybe the
> > > > > introduction of that field should be deferred until it's needed for
> > > > > something.
> > > >
> > > > Oops, looks like a mistake when breaking the patch.  Will move that bit 
> > > > to 0002.
> > >
> > > Fixed that and also noticed that I had defined PartitionPruneResult in
> > > the wrong header (execnodes.h).  That led to PartitionPruneResult
> > > nodes not being able to be written and read, because
> > > src/backend/nodes/gen_node_support.pl doesn't create _out* and _read*
> > > routines for the nodes defined in execnodes.h.  I moved its definition
> > > to plannodes.h, even though it is not actually the planner that
> > > instantiates those; no other include/nodes header sounds better.
> > >
> > > One more thing I realized is that Bitmapsets added to the List
> > > PartitionPruneResult.valid_subplan_offs_list are not actually
> > > read/write-able.  That's a problem that I also faced in [1], so I
> > > proposed a patch there to make Bitmapset a read/write-able Node and
> > > mark (only) the Bitmapsets that are added into read/write-able node
> > > trees with the corresponding NodeTag.  I'm including that patch here
> > > as well (0002) for the main patch to work (pass
> > > -DWRITE_READ_PARSE_PLAN_TREES build tests), though it might make sense
> > > to discuss it in its own thread?
> >
> > Had second thoughts on the use of List of Bitmapsets for this, such
> > that the make-Bitmapset-Nodes patch is no longer needed.
> >
> > I had defined PartitionPruneResult such that it stood for the results
> > of pruning for all PartitionPruneInfos contained in
> > PlannedStmt.partPruneInfos (covering all Append/MergeAppend nodes that
> > can use partition pruning in a given plan).  So, it had a List of
> > Bitmapset.  I think it's perhaps better for PartitionPruneResult to
> > cover only one PartitionPruneInfo and thus need only a Bitmapset and
> > not a List thereof, which I have implemented in the attached updated
> > patch 0002.  So, instead of needing to pass around a
> > PartitionPruneResult with each PlannedStmt, this now passes a List of
> > PartitionPruneResult with an entry for each in
> > PlannedStmt.partPruneInfos.
>
> Rebased over 3b2db22fe.

Updated 0002 to cope with AssertArg() being removed from the tree.

-- 
Thanks, Amit Langote
EDB: http://www.enterprisedb.com


v24-0002-Optimize-AcquireExecutorLocks-by-locking-only-un.patch
Description: Binary data


v24-0001-Move-PartitioPruneInfo-out-of-plan-nodes-into-Pl.patch
Description: Binary data


Re: generic plans and "initial" pruning

2022-10-26 Thread Amit Langote
On Mon, Oct 17, 2022 at 6:29 PM Amit Langote  wrote:
> On Wed, Oct 12, 2022 at 4:36 PM Amit Langote  wrote:
> > On Fri, Jul 29, 2022 at 1:20 PM Amit Langote  
> > wrote:
> > > On Thu, Jul 28, 2022 at 1:27 AM Robert Haas  wrote:
> > > > 0001 adds es_part_prune_result but does not use it, so maybe the
> > > > introduction of that field should be deferred until it's needed for
> > > > something.
> > >
> > > Oops, looks like a mistake when breaking the patch.  Will move that bit 
> > > to 0002.
> >
> > Fixed that and also noticed that I had defined PartitionPruneResult in
> > the wrong header (execnodes.h).  That led to PartitionPruneResult
> > nodes not being able to be written and read, because
> > src/backend/nodes/gen_node_support.pl doesn't create _out* and _read*
> > routines for the nodes defined in execnodes.h.  I moved its definition
> > to plannodes.h, even though it is not actually the planner that
> > instantiates those; no other include/nodes header sounds better.
> >
> > One more thing I realized is that Bitmapsets added to the List
> > PartitionPruneResult.valid_subplan_offs_list are not actually
> > read/write-able.  That's a problem that I also faced in [1], so I
> > proposed a patch there to make Bitmapset a read/write-able Node and
> > mark (only) the Bitmapsets that are added into read/write-able node
> > trees with the corresponding NodeTag.  I'm including that patch here
> > as well (0002) for the main patch to work (pass
> > -DWRITE_READ_PARSE_PLAN_TREES build tests), though it might make sense
> > to discuss it in its own thread?
>
> Had second thoughts on the use of List of Bitmapsets for this, such
> that the make-Bitmapset-Nodes patch is no longer needed.
>
> I had defined PartitionPruneResult such that it stood for the results
> of pruning for all PartitionPruneInfos contained in
> PlannedStmt.partPruneInfos (covering all Append/MergeAppend nodes that
> can use partition pruning in a given plan).  So, it had a List of
> Bitmapset.  I think it's perhaps better for PartitionPruneResult to
> cover only one PartitionPruneInfo and thus need only a Bitmapset and
> not a List thereof, which I have implemented in the attached updated
> patch 0002.  So, instead of needing to pass around a
> PartitionPruneResult with each PlannedStmt, this now passes a List of
> PartitionPruneResult with an entry for each in
> PlannedStmt.partPruneInfos.

Rebased over 3b2db22fe.

-- 
Thanks, Amit Langote
EDB: http://www.enterprisedb.com


v23-0001-Move-PartitioPruneInfo-out-of-plan-nodes-into-Pl.patch
Description: Binary data


v23-0002-Optimize-AcquireExecutorLocks-by-locking-only-un.patch
Description: Binary data


Re: generic plans and "initial" pruning

2022-10-17 Thread Amit Langote
On Wed, Oct 12, 2022 at 4:36 PM Amit Langote  wrote:
> On Fri, Jul 29, 2022 at 1:20 PM Amit Langote  wrote:
> > On Thu, Jul 28, 2022 at 1:27 AM Robert Haas  wrote:
> > > 0001 adds es_part_prune_result but does not use it, so maybe the
> > > introduction of that field should be deferred until it's needed for
> > > something.
> >
> > Oops, looks like a mistake when breaking the patch.  Will move that bit to 
> > 0002.
>
> Fixed that and also noticed that I had defined PartitionPruneResult in
> the wrong header (execnodes.h).  That led to PartitionPruneResult
> nodes not being able to be written and read, because
> src/backend/nodes/gen_node_support.pl doesn't create _out* and _read*
> routines for the nodes defined in execnodes.h.  I moved its definition
> to plannodes.h, even though it is not actually the planner that
> instantiates those; no other include/nodes header sounds better.
>
> One more thing I realized is that Bitmapsets added to the List
> PartitionPruneResult.valid_subplan_offs_list are not actually
> read/write-able.  That's a problem that I also faced in [1], so I
> proposed a patch there to make Bitmapset a read/write-able Node and
> mark (only) the Bitmapsets that are added into read/write-able node
> trees with the corresponding NodeTag.  I'm including that patch here
> as well (0002) for the main patch to work (pass
> -DWRITE_READ_PARSE_PLAN_TREES build tests), though it might make sense
> to discuss it in its own thread?

Had second thoughts on the use of List of Bitmapsets for this, such
that the make-Bitmapset-Nodes patch is no longer needed.

I had defined PartitionPruneResult such that it stood for the results
of pruning for all PartitionPruneInfos contained in
PlannedStmt.partPruneInfos (covering all Append/MergeAppend nodes that
can use partition pruning in a given plan).  So, it had a List of
Bitmapset.  I think it's perhaps better for PartitionPruneResult to
cover only one PartitionPruneInfo and thus need only a Bitmapset and
not a List thereof, which I have implemented in the attached updated
patch 0002.  So, instead of needing to pass around a
PartitionPruneResult with each PlannedStmt, this now passes a List of
PartitionPruneResult with an entry for each in
PlannedStmt.partPruneInfos.
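
In code terms, each result then boils down to roughly this (a sketch; only
valid_subplan_offs is implied by the description above):

typedef struct PartitionPruneResult
{
    NodeTag     type;
    Bitmapset  *valid_subplan_offs; /* indexes of surviving subplans for the
                                     * corresponding PartitionPruneInfo */
} PartitionPruneResult;

and the executor would presumably pair the i'th element of that list with
the PartitionPruneInfo at index i of PlannedStmt.partPruneInfos.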

-- 
Thanks, Amit Langote
EDB: http://www.enterprisedb.com


v22-0001-Move-PartitioPruneInfo-out-of-plan-nodes-into-Pl.patch
Description: Binary data


v22-0002-Optimize-AcquireExecutorLocks-by-locking-only-un.patch
Description: Binary data


Re: generic plans and "initial" pruning

2022-07-29 Thread Robert Haas
On Fri, Jul 29, 2022 at 12:47 PM Tom Lane  wrote:
> > I'm just uncertain whether what Amit has implemented is the
> > least-annoying way to go about it... any thoughts on that,
> > specifically as it pertains to this patch?
>
> I haven't looked at this patch at all.  I'll try to make some
> time for it, but probably not today.

OK, thanks. The preliminary patch I'm talking about here is pretty
short, so you could probably look at that part of it, at least, in
some relatively small amount of time. And I think it's also in pretty
reasonable shape apart from this issue. But, as usual, there's the
question of how well one can evaluate a preliminary patch without
reviewing the full patch in detail.

-- 
Robert Haas
EDB: http://www.enterprisedb.com




Re: generic plans and "initial" pruning

2022-07-29 Thread Tom Lane
Robert Haas  writes:
> ... it's
> always struck me as a little unfortunate that we basically test
> whether a var is equal by testing whether the varno and varlevelsup
> are equal. That only works if you assume that you can never end up
> comparing two vars from thoroughly unrelated parts of the tree, such
> that the subquery one level up from one might be different from the
> subquery one level up from the other.

Yeah, that's always bothered me a little as well.  I've yet to see a
case where it causes a problem in practice.  But I think that if, say,
we were to try to do any sort of cross-query-level optimization, then
the ambiguity could rise up to bite us.  That might be a situation
where a flat rangetable would be worth the trouble.

> I'm just uncertain whether what Amit has implemented is the
> least-annoying way to go about it... any thoughts on that,
> specifically as it pertains to this patch?

I haven't looked at this patch at all.  I'll try to make some
time for it, but probably not today.

regards, tom lane




Re: generic plans and "initial" pruning

2022-07-29 Thread Robert Haas
On Fri, Jul 29, 2022 at 11:04 AM Tom Lane  wrote:
> We could probably make that work, but I'm skeptical that it would
> really be an improvement overall, for a couple of reasons.
>
> (1) The need for merge-rangetables-and-renumber-Vars logic doesn't
> go away.  It just moves from setrefs.c to the rewriter, which would
> have to do it when expanding views.  This would be a net loss
> performance-wise, I think, because setrefs.c can do it as part of a
> parsetree scan that it has to perform anyway for other housekeeping
> reasons; but the rewriter would need a brand new pass over the tree.
> Admittedly that pass would only happen for view replacement, but
> it's still not open-and-shut that there'd be a performance win.
>
> (2) The need for varlevelsup and similar fields doesn't go away,
> I think, because we need those for semantic purposes such as
> discovering the query level that aggregates are associated with.
> That means that subquery flattening still has to make a pass over
> the tree to touch every Var's varlevelsup; so not having to adjust
> varno at the same time would save little.
>
> I'm not sure whether I think it's a net plus or net minus that
> varno would become effectively independent of varlevelsup.
> It'd be different from the way we think of them now, for sure,
> and I think it'd take awhile to flush out bugs arising from such
> a redefinition.

Interesting. Thanks for your thoughts. I guess it's not as clear-cut
as I thought, but I still can't help feeling like we're doing an awful
lot of expensive rearrangement at the end of query planning.

I kind of wonder whether varlevelsup is the wrong idea. Like, suppose
we instead handed out subquery identifiers serially, sort of like what
we do with SubTransactionId values. Then instead of testing whether
varlevelsup>0 you test whether varsubqueryid==mysubqueryid. If you
flatten a query into its parent, you still need to adjust every var
that refers to the dead subquery, but you don't need to adjust vars
that refer to subqueries underneath it. Their level changes, but their
identity doesn't. Maybe that doesn't really help that much, but it's
always struck me as a little unfortunate that we basically test
whether a var is equal by testing whether the varno and varlevelsup
are equal. That only works if you assume that you can never end up
comparing two vars from thoroughly unrelated parts of the tree, such
that the subquery one level up from one might be different from the
subquery one level up from the other.
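
To make that concrete, Var identity under such a scheme might look like the
following; this is purely hypothetical, none of these fields exist today:

typedef struct HypotheticalVar
{
    int         varno;          /* RT index within the owning subquery */
    uint32      varsubqueryid;  /* serially assigned id of that subquery */
    AttrNumber  varattno;       /* column number within that relation */
} HypotheticalVar;

static inline bool
vars_refer_to_same_column(const HypotheticalVar *a, const HypotheticalVar *b)
{
    /* No varlevelsup needed: the subquery id pins down the query level. */
    return a->varsubqueryid == b->varsubqueryid &&
           a->varno == b->varno &&
           a->varattno == b->varattno;
}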

> > I don't really expect that we're ever going to change this -- and
> > certainly not on this thread. The idea of running around and replacing
> > RT indexes all over the tree is deeply embedded in the system. But are
> > we really sure we want to add a second kind of index that we have to
> > run around and adjust at the same time?
>
> You probably want to avert your eyes from [1], then ;-).  Although
> I'm far from convinced that the cross-list index fields currently
> proposed there are actually necessary; the cost to adjust them
> during rangetable merging could outweigh any benefit.

I really like the idea of that patch overall, actually; I think
permissions checking is a good example of something that shouldn't
require walking the whole query tree but currently does. And actually,
I think the same thing is true here: we shouldn't need to walk the
whole query tree to find the pruning information, but right now we do.
I'm just uncertain whether what Amit has implemented is the
least-annoying way to go about it... any thoughts on that,
specifically as it pertains to this patch?

-- 
Robert Haas
EDB: http://www.enterprisedb.com




Re: generic plans and "initial" pruning

2022-07-29 Thread Tom Lane
Robert Haas  writes:
> That's not quite my question, though. Why do we ever build a non-flat
> range table in the first place? Like, instead of assigning indexes
> relative to the current subquery level, why not just assign them
> relative to the whole query from the start?

We could probably make that work, but I'm skeptical that it would
really be an improvement overall, for a couple of reasons.

(1) The need for merge-rangetables-and-renumber-Vars logic doesn't
go away.  It just moves from setrefs.c to the rewriter, which would
have to do it when expanding views.  This would be a net loss
performance-wise, I think, because setrefs.c can do it as part of a
parsetree scan that it has to perform anyway for other housekeeping
reasons; but the rewriter would need a brand new pass over the tree.
Admittedly that pass would only happen for view replacement, but
it's still not open-and-shut that there'd be a performance win.

(2) The need for varlevelsup and similar fields doesn't go away,
I think, because we need those for semantic purposes such as
discovering the query level that aggregates are associated with.
That means that subquery flattening still has to make a pass over
the tree to touch every Var's varlevelsup; so not having to adjust
varno at the same time would save little.

I'm not sure whether I think it's a net plus or net minus that
varno would become effectively independent of varlevelsup.
It'd be different from the way we think of them now, for sure,
and I think it'd take awhile to flush out bugs arising from such
a redefinition.

> I don't really expect that we're ever going to change this -- and
> certainly not on this thread. The idea of running around and replacing
> RT indexes all over the tree is deeply embedded in the system. But are
> we really sure we want to add a second kind of index that we have to
> run around and adjust at the same time?

You probably want to avert your eyes from [1], then ;-).  Although
I'm far from convinced that the cross-list index fields currently
proposed there are actually necessary; the cost to adjust them
during rangetable merging could outweigh any benefit.

regards, tom lane

[1] 
https://www.postgresql.org/message-id/flat/ca+hiwqgjjdmuhdsfv-u2qhkjjt9st7xh9jxc_irsaq1taus...@mail.gmail.com




Re: generic plans and "initial" pruning

2022-07-29 Thread Robert Haas
On Fri, Jul 29, 2022 at 12:55 AM Tom Lane  wrote:
> It would not be profitable to flatten the range table before we've
> done remove_useless_joins.  We'd end up with useless entries from
> subqueries that ultimately aren't there.  We could perhaps do it
> after we finish that phase, but I don't really see the point: it
> wouldn't be better than what we do now, just the same work at a
> different time.

That's not quite my question, though. Why do we ever build a non-flat
range table in the first place? Like, instead of assigning indexes
relative to the current subquery level, why not just assign them
relative to the whole query from the start? It can't really be that
we've done it this way because of remove_useless_joins(), because
we've been building separate range tables and later flattening them
for longer than join removal has existed as a feature.

What bugs me is that it's very much not free. By building a bunch of
separate range tables and combining them later, we generate extra
work: we have to go back and adjust RT indexes after-the-fact. We pay
that overhead for every query, not just the ones that end up with some
unused entries in the range table. And why would it matter if we did
end up with some useless entries in the range table, anyway? If
there's some semantic difference, we could add a flag to mark those
entries as needing to be ignored, which seems way better than crawling
all over the whole tree adjusting RTIs everywhere.

I don't really expect that we're ever going to change this -- and
certainly not on this thread. The idea of running around and replacing
RT indexes all over the tree is deeply embedded in the system. But are
we really sure we want to add a second kind of index that we have to
run around and adjust at the same time?

If we are, so be it, I guess. It just looks really ugly and unnecessary to me.

-- 
Robert Haas
EDB: http://www.enterprisedb.com




Re: generic plans and "initial" pruning

2022-07-28 Thread Tom Lane
Amit Langote  writes:
> On Thu, Jul 28, 2022 at 1:27 AM Robert Haas  wrote:
>> I wonder whether it's really necessary to add the PartitionPruneInfo
>> objects to a list in PlannerInfo first and then roll them up into
>> PlannerGlobal later. I know we do that for range table entries, but
>> I've never quite understood why we do it that way instead of creating
>> a flat range table in PlannerGlobal from the start. And so by
>> extension I wonder whether this table couldn't be flat from the start
>> also.

> Tom may want to correct me but my understanding of why the planner
> waits till the end of planning to start populating the PlannerGlobal
> range table is that it is not until then that we know which subqueries
> will be scanned by the final plan tree, so also whose range table
> entries will be included in the range table passed to the executor.

It would not be profitable to flatten the range table before we've
done remove_useless_joins.  We'd end up with useless entries from
subqueries that ultimately aren't there.  We could perhaps do it
after we finish that phase, but I don't really see the point: it
wouldn't be better than what we do now, just the same work at a
different time.

regards, tom lane




Re: generic plans and "initial" pruning

2022-07-28 Thread Amit Langote
On Thu, Jul 28, 2022 at 1:27 AM Robert Haas  wrote:
> On Tue, Jul 26, 2022 at 11:01 PM Amit Langote  wrote:
> > Needed to be rebased again, over 2d04277121f this time.

Thanks for looking.

> 0001 adds es_part_prune_result but does not use it, so maybe the
> introduction of that field should be deferred until it's needed for
> something.

Oops, looks like a mistake when breaking the patch.  Will move that bit to 0002.

> I wonder whether it's really necessary to add the PartitionPruneInfo
> objects to a list in PlannerInfo first and then roll them up into
> PlannerGlobal later. I know we do that for range table entries, but
> I've never quite understood why we do it that way instead of creating
> a flat range table in PlannerGlobal from the start. And so by
> extension I wonder whether this table couldn't be flat from the start
> also.

Tom may want to correct me but my understanding of why the planner
waits till the end of planning to start populating the PlannerGlobal
range table is that it is not until then that we know which subqueries
will be scanned by the final plan tree, so also whose range table
entries will be included in the range table passed to the executor.  I
can see that subquery pull-up causes a pulled-up subquery's range
table entries to be added into the parent's query's and all its nodes
changed using OffsetVarNodes() to refer to the new RT indexes.  But
for subqueries that are not pulled up, their subplans' nodes (present
in PlannerGlobal.subplans) would still refer to the original RT
indexes (per range table in the corresponding PlannerGlobal.subroot),
which must be fixed and the end of planning is the time to do so.  Or
maybe that could be done when build_subplan() creates a subplan and
adds it to PlannerGlobal.subplans, but for some reason it's not?

--
Thanks, Amit Langote
EDB: http://www.enterprisedb.com




Re: generic plans and "initial" pruning

2022-07-27 Thread Robert Haas
On Tue, Jul 26, 2022 at 11:01 PM Amit Langote  wrote:
> Needed to be rebased again, over 2d04277121f this time.

0001 adds es_part_prune_result but does not use it, so maybe the
introduction of that field should be deferred until it's needed for
something.

I wonder whether it's really necessary to add the PartitionPruneInfo
objects to a list in PlannerInfo first and then roll them up into
PlannerGlobal later. I know we do that for range table entries, but
I've never quite understood why we do it that way instead of creating
a flat range table in PlannerGlobal from the start. And so by
extension I wonder whether this table couldn't be flat from the start
also.

-- 
Robert Haas
EDB: http://www.enterprisedb.com




Re: generic plans and "initial" pruning

2022-07-26 Thread Amit Langote
On Wed, Jul 13, 2022 at 4:03 PM Amit Langote  wrote:
> On Wed, Jul 13, 2022 at 3:40 PM Amit Langote  wrote:
> > Rebased over 964d01ae90c.
>
> Sorry, left some pointless hunks in there while rebasing.  Fixed in
> the attached.

Needed to be rebased again, over 2d04277121f this time.

-- 
Thanks, Amit Langote
EDB: http://www.enterprisedb.com


v20-0001-Move-PartitioPruneInfo-out-of-plan-nodes-into-Pl.patch
Description: Binary data


v20-0002-Optimize-AcquireExecutorLocks-by-locking-only-un.patch
Description: Binary data


Re: generic plans and "initial" pruning

2022-07-13 Thread Amit Langote
On Wed, Jul 13, 2022 at 3:40 PM Amit Langote  wrote:
> Rebased over 964d01ae90c.

Sorry, left some pointless hunks in there while rebasing.  Fixed in
the attached.

-- 
Thanks, Amit Langote
EDB: http://www.enterprisedb.com


v19-0001-Move-PartitioPruneInfo-out-of-plan-nodes-into-Pl.patch
Description: Binary data


v19-0002-Optimize-AcquireExecutorLocks-by-locking-only-un.patch
Description: Binary data


Re: generic plans and "initial" pruning

2022-07-13 Thread Amit Langote
Rebased over 964d01ae90c.

-- 
Thanks, Amit Langote
EDB: http://www.enterprisedb.com


v18-0002-Optimize-AcquireExecutorLocks-by-locking-only-un.patch
Description: Binary data


v18-0001-Move-PartitioPruneInfo-out-of-plan-nodes-into-Pl.patch
Description: Binary data


Re: generic plans and "initial" pruning

2022-07-05 Thread Jacob Champion
On Fri, May 27, 2022 at 1:09 AM Amit Langote  wrote:
> 0001 contains the mechanical changes of moving PartitionPruneInfo out
> of Append/MergeAppend into a list in PlannedStmt.
>
> 0002 is the main patch to "Optimize AcquireExecutorLocks() by locking
> only unpruned partitions".

This patchset will need to be rebased over 835d476fd21; looks like
just a cosmetic change.

--Jacob




Re: generic plans and "initial" pruning

2022-05-27 Thread Zhihong Yu
On Fri, May 27, 2022 at 1:10 AM Amit Langote 
wrote:

> On Mon, Apr 11, 2022 at 12:53 PM Zhihong Yu  wrote:
> > On Sun, Apr 10, 2022 at 8:05 PM Amit Langote 
> wrote:
> >> Sending v15 that fixes that to keep the cfbot green for now.
> >
> > Hi,
> >
> > +   /* RT index of the partitione table. */
> >
> > partitione -> partitioned
>
> Thanks, fixed.
>
> Also, I broke this into patches:
>
> 0001 contains the mechanical changes of moving PartitionPruneInfo out
> of Append/MergeAppend into a list in PlannedStmt.
>
> 0002 is the main patch to "Optimize AcquireExecutorLocks() by locking
> only unpruned partitions".
>
> --
> Thanks, Amit Langote
> EDB: http://www.enterprisedb.com

Hi,
In the description:

is made available to the actual execution via
PartitionPruneResult, made available along with the PlannedStmt by the

I think the second `made available` is redundant (can be omitted).

+ * Initial pruning is performed here if needed (unless it has already been
done
+ * by ExecDoInitialPruning()), and in that case only the surviving
subplans'

I wonder if there is a typo above - I don't find ExecDoInitialPruning
either in the PG codebase or in the patches (except for this one mention in
the comment).
I think it should be ExecutorDoInitialPruning.

+* bit from it just above to prevent empty tail bits resulting in

I searched the code base but didn't find any mention of `empty tail bit`.
Do you mind explaining a bit about it?

Cheers


Re: generic plans and "initial" pruning

2022-04-10 Thread Zhihong Yu
On Sun, Apr 10, 2022 at 8:05 PM Amit Langote 
wrote:

> On Fri, Apr 8, 2022 at 8:45 PM Amit Langote 
> wrote:
> > Most of the changes looked fine to me except a couple of typos, so I've
> > adopted those into the attached new version, even though I know it's
> > too late to try to apply it.
> >
> > + * XXX is it worth doing a bms_copy() on glob->minLockRelids if
> > + * glob->containsInitialPruning is true?  I'm slightly worried that the
> > + * Bitmapset could have a very long empty tail resulting in excessive
> > + * looping during AcquireExecutorLocks().
> > + */
> >
> > I guess I trust your instincts about bitmapset operation efficiency
> > and what you've written here makes sense.  It's typical for leaf
> > partitions to have been appended toward the tail end of rtable and I'd
> > imagine their indexes would be in the tail words of minLockRelids.  If
> > copying the bitmapset removes those useless words, I don't see why we
> > shouldn't do that.  So added:
> >
> > + /*
> > + * It seems worth doing a bms_copy() on glob->minLockRelids if we deleted
> > + * a bit from it just above to prevent empty tail bits resulting in
> > + * inefficient looping during AcquireExecutorLocks().
> > + */
> > + if (glob->containsInitialPruning)
> > + glob->minLockRelids = bms_copy(glob->minLockRelids)
> >
> > Not 100% about the comment I wrote.
>
> And the quoted code change missed a semicolon in the v14 that I
> hurriedly sent on Friday.   (Had apparently forgotten to `git add` the
> hunk to fix that).
>
> Sending v15 that fixes that to keep the cfbot green for now.
>
> --
> Amit Langote
> EDB: http://www.enterprisedb.com

Hi,

+   /* RT index of the partitione table. */

partitione -> partitioned

Cheers


Re: generic plans and "initial" pruning

2022-04-10 Thread Amit Langote
On Fri, Apr 8, 2022 at 8:45 PM Amit Langote  wrote:
> Most of the changes looked fine to me except a couple of typos, so I've
> adopted those into the attached new version, even though I know it's
> too late to try to apply it.
>
> + * XXX is it worth doing a bms_copy() on glob->minLockRelids if
> + * glob->containsInitialPruning is true?  I'm slightly worried that the
> + * Bitmapset could have a very long empty tail resulting in excessive
> + * looping during AcquireExecutorLocks().
> + */
>
> I guess I trust your instincts about bitmapset operation efficiency
> and what you've written here makes sense.  It's typical for leaf
> partitions to have been appended toward the tail end of rtable and I'd
> imagine their indexes would be in the tail words of minLockRelids.  If
> copying the bitmapset removes those useless words, I don't see why we
> shouldn't do that.  So added:
>
> + /*
> + * It seems worth doing a bms_copy() on glob->minLockRelids if we deleted
> + * bit from it just above to prevent empty tail bits resulting in
> + * inefficient looping during AcquireExecutorLocks().
> + */
> + if (glob->containsInitialPruning)
> + glob->minLockRelids = bms_copy(glob->minLockRelids)
>
> Not 100% about the comment I wrote.

And the quoted code change missed a semicolon in the v14 that I
hurriedly sent on Friday.   (Had apparently forgotten to `git add` the
hunk to fix that).

Sending v15 that fixes that to keep the cfbot green for now.

-- 
Amit Langote
EDB: http://www.enterprisedb.com


v15-0001-Optimize-AcquireExecutorLocks-to-skip-pruned-par.patch
Description: Binary data


Re: generic plans and "initial" pruning

2022-04-08 Thread Amit Langote
Hi David,

On Fri, Apr 8, 2022 at 8:16 PM David Rowley  wrote:
> On Fri, 8 Apr 2022 at 17:49, Amit Langote  wrote:
> > Attached updated patch with these changes.
> Thanks for making the changes.  I started looking over this patch but
> really feel like it needs quite a few more iterations of what we've
> just been doing to get it into proper committable shape. There seems
> to be only about 40 mins to go before the freeze, so it seems very
> unrealistic that it could be made to work.

Yeah, totally understandable.

> I started trying to take a serious look at it this evening, but I feel
> like I just failed to get into it deep enough to make any meaningful
> improvements.  I'd need more time to study the problem before I could
> build up a proper opinion on how exactly I think it should work.
>
> Anyway. I've attached a small patch that's just a few things I
> adjusted or questions while reading over your v13 patch.  Some of
> these are just me questioning your code (See XXX comments) and some I
> think are improvements. Feel free to take the hunks that you see fit
> and drop anything you don't.

Thanks a lot for compiling those.

Most of the changes looked fine to me except a couple of typos, so I've
adopted those into the attached new version, even though I know it's
too late to try to apply it.  Re the XXX comments:

+ /* XXX why would pprune->rti_map[i] ever be zero here??? */

Yeah, no, there can't be; I was perhaps being overly paranoid.

+ * XXX is it worth doing a bms_copy() on glob->minLockRelids if
+ * glob->containsInitialPruning is true?  I'm slightly worried that the
+ * Bitmapset could have a very long empty tail resulting in excessive
+ * looping during AcquireExecutorLocks().
+ */

I guess I trust your instincts about bitmapset operation efficiency
and what you've written here makes sense.  It's typical for leaf
partitions to have been appended toward the tail end of rtable and I'd
imagine their indexes would be in the tail words of minLockRelids.  If
copying the bitmapset removes those useless words, I don't see why we
shouldn't do that.  So added:

+ /*
+ * It seems worth doing a bms_copy() on glob->minLockRelids if we deleted
+ * a bit from it just above to prevent empty tail bits resulting in
+ * inefficient looping during AcquireExecutorLocks().
+ */
+ if (glob->containsInitialPruning)
+ glob->minLockRelids = bms_copy(glob->minLockRelids)

Not 100% about the comment I wrote.

-- 
Amit Langote
EDB: http://www.enterprisedb.com


v14-0001-Optimize-AcquireExecutorLocks-to-skip-pruned-par.patch
Description: Binary data


Re: generic plans and "initial" pruning

2022-04-08 Thread David Rowley
On Fri, 8 Apr 2022 at 17:49, Amit Langote  wrote:
> Attached updated patch with these changes.

Thanks for making the changes.  I started looking over this patch but
really feel like it needs quite a few more iterations of what we've
just been doing to get it into proper committable shape. There seems
to be only about 40 mins to go before the freeze, so it seems very
unrealistic that it could be made to work.

I started trying to take a serious look at it this evening, but I feel
like I just failed to get into it deep enough to make any meaningful
improvements.  I'd need more time to study the problem before I could
build up a proper opinion on how exactly I think it should work.

Anyway. I've attached a small patch that's just a few things I
adjusted or questions while reading over your v13 patch.  Some of
these are just me questioning your code (See XXX comments) and some I
think are improvements. Feel free to take the hunks that you see fit
and drop anything you don't.

David
diff --git a/src/backend/executor/execMain.c b/src/backend/executor/execMain.c
index 05cc99df8f..5ee978937d 100644
--- a/src/backend/executor/execMain.c
+++ b/src/backend/executor/execMain.c
@@ -121,6 +121,8 @@ static void EvalPlanQualStart(EPQState *epqstate, Plan *planTree);
  *
  * Note: Partitioned tables mentioned in PartitionedRelPruneInfo nodes that
  * drive the pruning will be locked before doing the pruning.
+ *
+ * 
  */
 PartitionPruneResult *
 ExecutorDoInitialPruning(PlannedStmt *plannedstmt, ParamListInfo params)
diff --git a/src/backend/executor/execPartition.c b/src/backend/executor/execPartition.c
index 3037742b8d..e9ca6bc55f 100644
--- a/src/backend/executor/execPartition.c
+++ b/src/backend/executor/execPartition.c
@@ -1707,6 +1707,7 @@ ExecInitPartitionPruning(PlanState *planstate,
Assert(n_total_subplans > 0);
*initially_valid_subplans = bms_add_range(NULL, 0,
                                          n_total_subplans - 1);
+   return prunestate;
}
 
/*
@@ -1714,14 +1715,15 @@ ExecInitPartitionPruning(PlanState *planstate,
 * that were removed above due to initial pruning.  No need to do this if
 * no steps were removed.
 */
-   if (bms_num_members(*initially_valid_subplans) < n_total_subplans)
+   if (prunestate &&
+   bms_num_members(*initially_valid_subplans) < n_total_subplans)
{
/*
 * We can safely skip this when !do_exec_prune, even though that
 * leaves invalid data in prunestate, because that data won't be
 * consulted again (cf initial Assert in ExecFindMatchingSubPlans).
 */
-   if (prunestate && prunestate->do_exec_prune)
+   if (prunestate->do_exec_prune)
PartitionPruneFixSubPlanMap(prunestate,
                            *initially_valid_subplans,
                            n_total_subplans);
@@ -1751,7 +1753,8 @@ ExecPartitionDoInitialPruning(PlannedStmt *plannedstmt, ParamListInfo params,
Bitmapset  *valid_subplan_offs;

/*
-* A temporary context to allocate stuff needded to run the pruning steps.
+* A temporary context for memory allocations required while executing
+* partition pruning steps.
 */
tmpcontext = AllocSetContextCreate(CurrentMemoryContext,
                                   "initial pruning working data",
@@ -1765,11 +1768,12 @@ ExecPartitionDoInitialPruning(PlannedStmt *plannedstmt, ParamListInfo params,
pdir = CreatePartitionDirectory(CurrentMemoryContext, false);

/*
-* We don't yet have a PlanState for the parent plan node, so must create
-* a standalone ExprContext to evaluate pruning expressions, equipped with
-* the information about the EXTERN parameters that the caller passed us.
-* Note that that's okay because the initial pruning steps do not contain
-* anything that requires the execution to have started.
+* We don't yet have a PlanState for the parent plan node, so we must
+* create a standalone ExprContext to evaluate pruning expressions,
+* equipped with the information about the EXTERN parameters that the
+* caller passed us.  Note that that's okay because the initial pruning
+* steps do not contain anything that requires the execution to have
+* started.
 */
econtext = CreateStandaloneExprContext();
econtext->ecxt_param_list_info = params;
@@ -1874,7 +1878,6 @@ CreatePartitionPruneState(PlanState *planstate, PartitionedRelPruneInfo 

Re: generic plans and "initial" pruning

2022-04-07 Thread Amit Langote
On Thu, Apr 7, 2022 at 9:41 PM David Rowley  wrote:
> On Thu, 7 Apr 2022 at 20:28, Amit Langote  wrote:
> > Here's an updated version.  In particular, I removed the
> > part_prune_results list from PortalData, in favor of having anything that
> > needs to look at the list get it from the CachedPlan instead
> > (PortalData.cplan).  This makes things better in 2 ways:
>
> Thanks for making those changes.
>
> I'm not overly familiar with the data structures we use for planning
> around plans between the planner and executor, but storing the pruning
> results in CachedPlan seems pretty bad. I see you've stashed it in
> there and invented a new memory context to stop leaks into the cache
> memory.
>
> Since I'm not overly familiar with these structures, I'm trying to
> imagine why you made that choice and the best I can come up with was
> that it was the most convenient thing you had to hand inside
> CheckCachedPlan().

Yeah, it's that way because it felt convenient, though I have wondered
if a simpler scheme that doesn't require any changes to the CachedPlan
data structure might be better after all.  Your pointing it out has
made me think a bit harder on that.

> I don't really have any great ideas right now on how to make this
> better. I wonder if GetCachedPlan() should be changed to return some
> struct that wraps up the CachedPlan with some sort of executor prep
> info struct that we can stash the list of PartitionPruneResults in,
> and perhaps something else one day.

I think what might be better to do now is just add an output List
parameter to GetCachedPlan() to add the PartitionPruneResult nodes to,
instead of stashing them into CachedPlan as now.  IMHO, we should
leave inventing a new generic struct to the next project that will
make it necessary to return more information from GetCachedPlan() to
its users.  I find it hard to convincingly describe what the new
generic struct really is if we invent it *now*, when it's going to
carry a single list whose purpose is pretty narrow.

So, I've implemented this by making the callers of GetCachedPlan()
pass a list to which any PartitionPruneResults that are produced get added.
Most callers can put that list into the Portal to pass it along to other
modules, so I have reinstated PortalData.part_prune_results.  As for
its memory management, the list and the PartitionPruneResults therein
will be allocated in a context that holds the Portal itself.
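
To illustrate the allocation pattern (just a sketch, not the exact call site
in the patch; portal->portalContext is my assumption for "a context that
holds the Portal itself"):

/* Stash the pruning results in the portal's own memory context. */
MemoryContext oldcxt;

oldcxt = MemoryContextSwitchTo(portal->portalContext);
portal->part_prune_results = copyObject(part_prune_results);
MemoryContextSwitchTo(oldcxt);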

> Some lesser important stuff that I think could be done better.
>
> * Are you also able to put meaningful comments on the
> PartitionPruneResult struct in execnodes.h?
>
> * In create_append_plan() and create_merge_append_plan() you have the
> same code to set the part_prune_index. Why not just move all that code
> into make_partition_pruneinfo() and have make_partition_pruneinfo()
> return the index and append to the PlannerInfo.partPruneInfos List?

That sounds better, so done.

> * Why not forboth() here?
>
> i = 0;
> foreach(stmtlist_item, portal->stmts)
> {
> PlannedStmt *pstmt = lfirst_node(PlannedStmt, stmtlist_item);
> PartitionPruneResult *part_prune_result = part_prune_results ?
>   list_nth(part_prune_results, i) :
>   NULL;
>
> i++;

Because the PartitionPruneResult list may not always be available.  To
wit, it's only available when it is GetCachedPlan() that gave the
portal its plan.  I know this is a bit ugly, but it seems better than
fixing all users of Portal to build a dummy list, not that it is
totally avoidable even in the current implementation.

> * It would be good if ReleaseExecutorLocks() already knew the RTIs
> that were locked. Maybe the executor prep info struct I mentioned
> above could also store the RTIs that have been locked already and
> allow ReleaseExecutorLocks() to just iterate over those to release the
> locks.

Rewrote this such that ReleaseExecutorLocks() just receives a list of
per-PlannedStmt bitmapsets containing the RT indexes of only the
locked entries in that plan.
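
Something like this, roughly (a sketch, not the actual code in the patch; the
parameter names here are made up):

/*
 * Release only the locks that were actually taken, using one bitmapset of
 * locked RT indexes per PlannedStmt.
 */
static void
ReleaseExecutorLocks(List *stmt_list, List *lockedRelids_per_stmt)
{
    ListCell   *lc1,
               *lc2;

    forboth(lc1, stmt_list, lc2, lockedRelids_per_stmt)
    {
        PlannedStmt *plannedstmt = lfirst_node(PlannedStmt, lc1);
        Bitmapset  *lockedRelids = (Bitmapset *) lfirst(lc2);
        int         rti = -1;

        while ((rti = bms_next_member(lockedRelids, rti)) >= 0)
        {
            RangeTblEntry *rte = rt_fetch(rti, plannedstmt->rtable);

            if (rte->rtekind == RTE_RELATION)
                UnlockRelationOid(rte->relid, rte->rellockmode);
        }
    }
}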

Attached updated patch with these changes.



--
Amit Langote
EDB: http://www.enterprisedb.com


v13-0001-Optimize-AcquireExecutorLocks-to-skip-pruned-par.patch
Description: Binary data


Re: generic plans and "initial" pruning

2022-04-07 Thread David Rowley
On Thu, 7 Apr 2022 at 20:28, Amit Langote  wrote:
> Here's an updated version.  In particular, I removed the
> part_prune_results list from PortalData, in favor of having anything that
> needs to look at the list get it from the CachedPlan instead
> (PortalData.cplan).  This makes things better in 2 ways:

Thanks for making those changes.

I'm not overly familiar with the data structures we use for planning
around plans between the planner and executor, but storing the pruning
results in CachedPlan seems pretty bad. I see you've stashed it in
there and invented a new memory context to stop leaks into the cache
memory.

Since I'm not overly familiar with these structures, I'm trying to
imagine why you made that choice and the best I can come up with was
that it was the most convenient thing you had to hand inside
CheckCachedPlan().

I don't really have any great ideas right now on how to make this
better. I wonder if GetCachedPlan() should be changed to return some
struct that wraps up the CachedPlan with some sort of executor prep
info struct that we can stash the list of PartitionPruneResults in,
and perhaps something else one day.

Some lesser important stuff that I think could be done better.

* Are you also able to put meaningful comments on the
PartitionPruneResult struct in execnodes.h?

* In create_append_plan() and create_merge_append_plan() you have the
same code to set the part_prune_index. Why not just move all that code
into make_partition_pruneinfo() and have make_partition_pruneinfo()
return the index and append to the PlannerInfo.partPruneInfos List?

* Why not forboth() here?

i = 0;
foreach(stmtlist_item, portal->stmts)
{
PlannedStmt *pstmt = lfirst_node(PlannedStmt, stmtlist_item);
PartitionPruneResult *part_prune_result = part_prune_results ?
  list_nth(part_prune_results, i) :
  NULL;

i++;

* It would be good if ReleaseExecutorLocks() already knew the RTIs
that were locked. Maybe the executor prep info struct I mentioned
above could also store the RTIs that have been locked already and
allow ReleaseExecutorLocks() to just iterate over those to release the
locks.

David




Re: generic plans and "initial" pruning

2022-04-07 Thread Amit Langote
On Wed, Apr 6, 2022 at 4:20 PM Amit Langote  wrote:
> And here is a version like that that passes make check-world.  Maybe
> still a WIP as I think comments could use more editing.
>
> Here's how the new implementation works:
>
> AcquireExecutorLocks() calls ExecutorDoInitialPruning(), which in turn
> iterates over a list of PartitionPruneInfos in a given PlannedStmt
> coming from a CachedPlan.  For each PartitionPruneInfo,
> ExecPartitionDoInitialPruning() is called, which sets up
> PartitionPruneState and performs initial pruning steps present in the
> PartitionPruneInfo.  The resulting bitmapsets of valid subplans, one
> for each PartitionPruneInfo, are collected in a list and added to a
> result node called PartitionPruneResult.  It represents the result of
> performing initial pruning on all PartitionPruneInfos found in a plan.
> A list of PartitionPruneResults is passed along with the PlannedStmt
> to the executor, which is referenced when initializing
> Append/MergeAppend nodes.
>
> PlannedStmt.minLockRelids defined by the planner contains the RT
> indexes of all the entries in the range table minus those of the leaf
> partitions whose subplans are subject to removal due to initial
> pruning.  AcquireExecutorLocks() adds back the RT indexes of only those
> leaf partitions whose subplans survive ExecutorDoInitialPruning().  To
> get the leaf partition RT indexes from the PartitionPruneInfo, a new
> rti_map array is added to PartitionedRelPruneInfo.
>
> There's only one patch this time.  Patches that added partitioned_rels
> and plan_tree_walker() are no longer necessary.

Here's an updated version.  In particular, I removed the
part_prune_results list from PortalData, in favor of having anything that
needs to look at the list get it from the CachedPlan instead
(PortalData.cplan).  This makes things better in 2 ways:

* All the changes that were needed to produce the list to be passed to
PortalDefineQuery() are now unnecessary (especially ugly were the ones
made to pg_plan_queries()'s interface)

* The cases in which the PartitionPruneResult being added to a
QueryDesc can be assumed to be valid are more clearly defined now; they
are the cases where the portal's CachedPlan is also valid, that is,
where the accompanying PlannedStmt is a cached one.

-- 
Amit Langote
EDB: http://www.enterprisedb.com


v12-0001-Optimize-AcquireExecutorLocks-to-skip-pruned-par.patch
Description: Binary data


Re: generic plans and "initial" pruning

2022-04-06 Thread Amit Langote
On Fri, Apr 1, 2022 at 5:36 PM Amit Langote  wrote:
> On Fri, Apr 1, 2022 at 5:20 PM David Rowley  wrote:
> > On Fri, 1 Apr 2022 at 19:58, Amit Langote  wrote:
> > > Yes, the ExecLockRelsInfo node in the current patch, that first gets
> > > added to the QueryDesc and subsequently to the EState of the query,
> > > serves as that stashing place.  Not sure if you've looked at
> > > ExecLockRelInfo in detail in your review of the patch so far, but it
> > > carries the initial pruning result in what are called
> > > PlanInitPruningOutput nodes, which are stored in a list in
> > > ExecLockRelsInfo and their offsets in the list are in turn stored in
> > > an adjacent array that contains an element for every plan node in the
> > > tree.  If we go with a PlannedStmt.partpruneinfos list, then maybe we
> > > don't need to have that array, because the Append/MergeAppend nodes
> > > would be carrying those offsets by themselves.
> >
> > I saw it, just not in great detail. I saw that you had an array that
> > was indexed by the plan node's ID.  I thought that wouldn't be so good
> > with large complex plans that we often get with partitioning
> > workloads.  That's why I mentioned using another index that you store
> > in Append/MergeAppend that starts at 0 and increments by 1 for each
> > node that has a PartitionPruneInfo made for it during create_plan.
> >
> > > Maybe a different name for ExecLockRelsInfo would be better?
> > >
> > > Also, given Tom's apparent dislike for carrying that in PlannedStmt,
> > > maybe the way I have it now is fine?
> >
> > I think if you change how it's indexed and the other stuff then we can
> > have another look.  I think the patch will be much easier to review
> > once the ParitionPruneInfos are moved into PlannedStmt.
>
> Will do, thanks.

And here is a version like that that passes make check-world.  Maybe
still a WIP as I think comments could use more editing.

Here's how the new implementation works:

AcquireExecutorLocks() calls ExecutorDoInitialPruning(), which in turn
iterates over a list of PartitionPruneInfos in a given PlannedStmt
coming from a CachedPlan.  For each PartitionPruneInfo,
ExecPartitionDoInitialPruning() is called, which sets up
PartitionPruneState and performs initial pruning steps present in the
PartitionPruneInfo.  The resulting bitmapsets of valid subplans, one
for each PartitionPruneInfo, are collected in a list and added to a
result node called PartitionPruneResult.  It represents the result of
performing initial pruning on all PartitionPruneInfos found in a plan.
A list of PartitionPruneResults is passed along with the PlannedStmt
to the executor, which is referenced when initializing
Append/MergeAppend nodes.

PlannedStmt.minLockRelids defined by the planner contains the RT
indexes of all the entries in the range table minus those of the leaf
partitions whose subplans are subject to removal due to initial
pruning.  AcquireExecutorLocks() adds back the RT indexes of only those
leaf partitions whose subplans survive ExecutorDoInitialPruning().  To
get the leaf partition RT indexes from the PartitionPruneInfo, a new
rti_map array is added to PartitionedRelPruneInfo.
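
To illustrate the overall flow, a rough sketch of the locking loop (only a
sketch: ExecutorDoInitialPruning() and minLockRelids are the names used in
the patch, but valid_rtis here stands in for mapping the surviving subplans'
offsets through rti_map, and boundParams is assumed to be available at this
point in plancache.c):

/* In AcquireExecutorLocks(), for each PlannedStmt of the cached plan */
Bitmapset  *lockrelids;
int         rti = -1;

if (plannedstmt->containsInitialPruning)
{
    PartitionPruneResult *pruneresult;

    /* Run only the initial pruning steps; no executor state exists yet. */
    pruneresult = ExecutorDoInitialPruning(plannedstmt, boundParams);

    /* Start from the planner-computed minimal set of RT indexes ... */
    lockrelids = bms_copy(plannedstmt->minLockRelids);

    /* ... and add back the RT indexes of the surviving leaf partitions. */
    lockrelids = bms_add_members(lockrelids, pruneresult->valid_rtis);
}
else
    lockrelids = plannedstmt->minLockRelids;

while ((rti = bms_next_member(lockrelids, rti)) >= 0)
{
    RangeTblEntry *rte = rt_fetch(rti, plannedstmt->rtable);

    if (rte->rtekind == RTE_RELATION)
        LockRelationOid(rte->relid, rte->rellockmode);
}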

There's only one patch this time.  Patches that added partitioned_rels
and plan_tree_walker() are no longer necessary.

-- 
Amit Langote
EDB: http://www.enterprisedb.com


v11-0001-Optimize-AcquireExecutorLocks-to-skip-pruned-par.patch
Description: Binary data


Re: generic plans and "initial" pruning

2022-04-05 Thread Amit Langote
On Tue, Apr 5, 2022 at 7:00 PM Alvaro Herrera  wrote:
> On 2022-Apr-05, Amit Langote wrote:
> > While at it, maybe it's better to rename ExecInitPruningContext() to
> > InitPartitionPruneContext(), which I've done in the attached updated
> > patch.
>
> Good call.  I had changed that name too, but yours seems a better
> choice.
>
> I made a few other cosmetic changes and pushed.

Thanks!

>  I'm afraid this will
> cause a few conflicts with your 0004 -- hopefully these should mostly be
> minor.
>
> One change that's not completely cosmetic is a change in the test on
> whether to call PartitionPruneFixSubPlanMap or not.  Originally it was:
>
> if (partprune->do_exec_prune &&
> bms_num_members( ... ))
> do_stuff();
>
> which meant that bms_num_members() is only evaluated if do_exec_prune.
> However, the do_exec_prune bit is an optimization (we can skip doing
> that stuff if it's not going to be used), but the other test is more
> strict: the stuff is completely irrelevant if no plans have been
> removed, since the data structure does not need fixing.  So I changed it
> to be like this
>
> if (bms_num_members( .. ))
> {
> /* can skip if it's pointless */
> if (do_exec_prune)
> do_stuff();
> }
>
> I think that it is clearer to the human reader this way; and I think a
> smart compiler may realize that the test can be reversed and avoid
> counting bits when it's pointless.
>
> So your 0004 patch should add the new condition to the outer if(), since
> it's a critical consideration rather than an optimization:
> if (partprune && bms_num_members())
> {
> /* can skip if pointless */
> if (do_exec_prune)
> do_stuff()
> }
>
> Now, if we disagree and think that counting bits in the BMS when it's
> going to be discarded because do_exec_prune is false is a waste, then we
> can flip that back to the original coding, with a more explicit comment.
> With no evidence, I doubt it matters.

I agree that counting bits in the outer condition makes this easier to
read, so I see no problem with keeping it that way.

Will post the rebased main patch soon, whose rewrite I'm close to
being done with.

-- 
Amit Langote
EDB: http://www.enterprisedb.com




Re: generic plans and "initial" pruning

2022-04-05 Thread Alvaro Herrera
On 2022-Apr-05, Amit Langote wrote:

> While at it, maybe it's better to rename ExecInitPruningContext() to
> InitPartitionPruneContext(), which I've done in the attached updated
> patch.

Good call.  I had changed that name too, but yours seems a better
choice.

I made a few other cosmetic changes and pushed.  I'm afraid this will
cause a few conflicts with your 0004 -- hopefully these should mostly be
minor.

One change that's not completely cosmetic is a change in the test on
whether to call PartitionPruneFixSubPlanMap or not.  Originally it was:

if (partprune->do_exec_prune &&
bms_num_members( ... ))
do_stuff();

which meant that bms_num_members() is only evaluated if do_exec_prune.
However, the do_exec_prune bit is an optimization (we can skip doing
that stuff if it's not going to be used), but the other test is more
strict: the stuff is completely irrelevant if no plans have been
removed, since the data structure does not need fixing.  So I changed it
to be like this

if (bms_num_members( .. ))
{
/* can skip if it's pointless */
if (do_exec_prune)
do_stuff();
}

I think that it is clearer to the human reader this way; and I think a
smart compiler may realize that the test can be reversed and avoid
counting bits when it's pointless.

So your 0004 patch should add the new condition to the outer if(), since
it's a critical consideration rather than an optimization:
if (partprune && bms_num_members())
{
/* can skip if pointless */
if (do_exec_prune)
do_stuff()
}

Now, if we disagree and think that counting bits in the BMS when it's
going to be discarded because do_exec_prune is false is a waste, then we
can flip that back to the original coding, with a more explicit comment.
With no evidence, I doubt it matters.

Thanks for the patch!  I think the new coding is indeed a bit easier to
follow.

-- 
Álvaro HerreraBreisgau, Deutschland  —  https://www.EnterpriseDB.com/
 really, I see PHP as like a strange amalgamation of C, Perl, Shell
 inflex: you know that "amalgam" means "mixture with mercury",
   more or less, right?
 i.e., "deadly poison"




Re: generic plans and "initial" pruning

2022-04-04 Thread Amit Langote
On Mon, Apr 4, 2022 at 9:55 PM Amit Langote  wrote:
> On Sun, Apr 3, 2022 at 8:33 PM Alvaro Herrera  wrote:
> > I think the names ExecCreatePartitionPruneState and
> > ExecInitPartitionPruning are too confusingly similar.  Maybe the former
> > should be renamed to somehow make it clear that it is a subroutine of
> > the latter.
>
> Ah, yes.  I've taken out the "Exec" from the former.

While at it, maybe it's better to rename ExecInitPruningContext() to
InitPartitionPruneContext(), which I've done in the attached updated
patch.

-- 
Amit Langote
EDB: http://www.enterprisedb.com


v10-0001-Some-refactoring-of-runtime-pruning-code.patch
Description: Binary data


Re: generic plans and "initial" pruning

2022-04-04 Thread Amit Langote
Thanks for the review.

On Sun, Apr 3, 2022 at 8:33 PM Alvaro Herrera  wrote:
> I noticed a definitional problem in 0001 that's also a bug in some
> conditions -- namely that the bitmapset "validplans" is never explicitly
> initialized to NIL.  In the original coding, the BMS was always returned
> from somewhere; in the new code, it is passed from an uninitialized
> stack variable into the new ExecInitPartitionPruning function, which
> then proceeds to add new members to it without initializing it first.

Hmm, the following blocks in ExecInitPartitionPruning() define
*initially_valid_subplans:

/*
 * Perform an initial partition prune pass, if required.
 */
if (prunestate->do_initial_prune)
{
/* Determine which subplans survive initial pruning */
*initially_valid_subplans = ExecFindInitialMatchingSubPlans(prunestate);
}
else
{
/* We'll need to initialize all subplans */
Assert(n_total_subplans > 0);
*initially_valid_subplans = bms_add_range(NULL, 0,
  n_total_subplans - 1);
}

AFAICS, both assign *initially_valid_subplans a value whose
computation is not dependent on reading it first, so I don't see a
problem.

Am I missing something?

> Indeed that function's header comment explicitly indicates that it is
> not initialized:
>
> + * Initial pruning can be done immediately, so it is done here if needed and
> + * the set of surviving partition subplans' indexes are added to the output
> + * parameter *initially_valid_subplans.
>
> even though this is not fully correct, because when 
> prunestate->do_initial_prune
> is false, then the BMS *is* initialized.
>
> I have no opinion on where to initialize it, but it needs to be done
> somewhere and the comment needs to agree.

I can see that the comment is insufficient, so I've expanded it as follows:

- * Initial pruning can be done immediately, so it is done here if needed and
- * the set of surviving partition subplans' indexes are added to the output
- * parameter *initially_valid_subplans.
+ * On return, *initially_valid_subplans is assigned the set of indexes of
+ * child subplans that must be initialized along with the parent plan node.
+ * Initial pruning is performed here if needed and in that case only the
+ * surviving subplans' indexes are added.

> I think the names ExecCreatePartitionPruneState and
> ExecInitPartitionPruning are too confusingly similar.  Maybe the former
> should be renamed to somehow make it clear that it is a subroutine of
> the latter.

Ah, yes.  I've taken out the "Exec" from the former.

> At the top of the file, there's a new comment that reads:
>
>   * ExecInitPartitionPruning:
>   * Creates the PartitionPruneState required by each of the two pruning
>   * functions.
>
> What are "the two pruning functions"?  I think here you mean "Append"
> and "MergeAppend".  Maybe spell that out explicitly.

Actually it meant: ExecFindInitialMatchingSubPlans() and
ExecFindMatchingSubPlans().  They perform the "initial" and "exec" sets of
pruning steps, respectively.

I realized that both functions have identical bodies at this point,
except that they pass 'true' and 'false', respectively, for
initial_prune argument of the sub-routine
find_matching_subplans_recurse(), which is where the pruning using the
appropriate set of steps contained in PartitionPruneState
(initial_pruning_steps or exec_pruning_steps) actually occurs.  So,
I've updated the patch to just retain the latter, adding an
initial_prune parameter to it to pass to the aforementioned
find_matching_subplans_recurse().
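
The consolidated function's shape thus becomes roughly the following (a
simplified sketch; the real function also deals with switching into the
prune context and other bookkeeping):

Bitmapset *
ExecFindMatchingSubPlans(PartitionPruneState *prunestate, bool initial_prune)
{
    Bitmapset  *result = NULL;
    int         i;

    for (i = 0; i < prunestate->num_partprunedata; i++)
    {
        PartitionPruningData *prunedata = prunestate->partprunedata[i];
        PartitionedRelPruningData *pprune = &prunedata->partrelprunedata[0];

        /* initial_prune selects initial_pruning_steps vs. exec_pruning_steps */
        find_matching_subplans_recurse(prunedata, pprune, initial_prune,
                                       &result);
    }

    return result;
}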

I've also updated the run-time pruning module comment to describe this change:

  * ExecFindMatchingSubPlans:
- * Returns indexes of matching subplans after evaluating all available
- * expressions, that is, using execution pruning steps.  This function can
- * can only be called during execution and must be called again each time
- * the value of a Param listed in PartitionPruneState's 'execparamids'
- * changes.
+ * Returns indexes of matching subplans after evaluating the expressions
+ * that are safe to evaluate at a given point.  This function is first
+ * called during ExecInitPartitionPruning() to find the initially
+ * matching subplans based on performing the initial pruning steps and
+ * then must be called again each time the value of a Param listed in
+ * PartitionPruneState's 'execparamids' changes.

> I think this comment needs to be reworded:
>
> + * Subplans would previously be indexed 0..(n_total_subplans - 1) should be
> + * changed to index range 0..num(initially_valid_subplans).

Assuming you meant to ask me to rewrite this without the odd notation, I've
expanded the comment as follows:

- * Subplans would previously be indexed 0..(n_total_subplans - 1) should be
- * changed to index range 0..num(initially_valid_subplans).
+ * Current values of the indexes present in 

Re: generic plans and "initial" pruning

2022-04-03 Thread Alvaro Herrera
I noticed a definitional problem in 0001 that's also a bug in some
conditions -- namely that the bitmapset "validplans" is never explicitly
initialized to NIL.  In the original coding, the BMS was always returned
from somewhere; in the new code, it is passed from an uninitialized
stack variable into the new ExecInitPartitionPruning function, which
then proceeds to add new members to it without initializing it first.
Indeed that function's header comment explicitly indicates that it is
not initialized:

+ * Initial pruning can be done immediately, so it is done here if needed and
+ * the set of surviving partition subplans' indexes are added to the output
+ * parameter *initially_valid_subplans.

even though this is not fully correct, because when prunestate->do_initial_prune
is false, then the BMS *is* initialized.

I have no opinion on where to initialize it, but it needs to be done
somewhere and the comment needs to agree.


I think the names ExecCreatePartitionPruneState and
ExecInitPartitionPruning are too confusingly similar.  Maybe the former
should be renamed to somehow make it clear that it is a subroutine of
the latter.


At the top of the file, there's a new comment that reads:

  * ExecInitPartitionPruning:
  * Creates the PartitionPruneState required by each of the two pruning
  * functions.

What are "the two pruning functions"?  I think here you mean "Append"
and "MergeAppend".  Maybe spell that out explicitly.


I think this comment needs to be reworded:

+ * Subplans would previously be indexed 0..(n_total_subplans - 1) should be
+ * changed to index range 0..num(initially_valid_subplans).

-- 
Álvaro HerreraBreisgau, Deutschland  —  https://www.EnterpriseDB.com/




Re: generic plans and "initial" pruning

2022-04-01 Thread Amit Langote
On Fri, Apr 1, 2022 at 5:20 PM David Rowley  wrote:
> On Fri, 1 Apr 2022 at 19:58, Amit Langote  wrote:
> > Yes, the ExecLockRelsInfo node in the current patch, that first gets
> > added to the QueryDesc and subsequently to the EState of the query,
> > serves as that stashing place.  Not sure if you've looked at
> > ExecLockRelsInfo in detail in your review of the patch so far, but it
> > carries the initial pruning result in what are called
> > PlanInitPruningOutput nodes, which are stored in a list in
> > ExecLockRelsInfo and their offsets in the list are in turn stored in
> > an adjacent array that contains an element for every plan node in the
> > tree.  If we go with a PlannedStmt.partpruneinfos list, then maybe we
> > don't need to have that array, because the Append/MergeAppend nodes
> > would be carrying those offsets by themselves.
>
> I saw it, just not in great detail. I saw that you had an array that
> was indexed by the plan node's ID.  I thought that wouldn't be so good
> with large complex plans that we often get with partitioning
> workloads.  That's why I mentioned using another index that you store
> in Append/MergeAppend that starts at 0 and increments by 1 for each
> node that has a PartitionPruneInfo made for it during create_plan.
>
> > Maybe a different name for ExecLockRelsInfo would be better?
> >
> > Also, given Tom's apparent dislike for carrying that in PlannedStmt,
> > maybe the way I have it now is fine?
>
> I think if you change how it's indexed and the other stuff then we can
> have another look.  I think the patch will be much easier to review
> once the ParitionPruneInfos are moved into PlannedStmt.

Will do, thanks.

-- 
Amit Langote
EDB: http://www.enterprisedb.com




Re: generic plans and "initial" pruning

2022-04-01 Thread David Rowley
On Fri, 1 Apr 2022 at 19:58, Amit Langote  wrote:
> Yes, the ExecLockRelsInfo node in the current patch, that first gets
> added to the QueryDesc and subsequently to the EState of the query,
> serves as that stashing place.  Not sure if you've looked at
> ExecLockRelsInfo in detail in your review of the patch so far, but it
> carries the initial pruning result in what are called
> PlanInitPruningOutput nodes, which are stored in a list in
> ExecLockRelsInfo and their offsets in the list are in turn stored in
> an adjacent array that contains an element for every plan node in the
> tree.  If we go with a PlannedStmt.partpruneinfos list, then maybe we
> don't need to have that array, because the Append/MergeAppend nodes
> would be carrying those offsets by themselves.

I saw it, just not in great detail. I saw that you had an array that
was indexed by the plan node's ID.  I thought that wouldn't be so good
with large complex plans that we often get with partitioning
workloads.  That's why I mentioned using another index that you store
in Append/MergeAppend that starts at 0 and increments by 1 for each
node that has a PartitionPruneInfo made for it during create_plan.

> Maybe a different name for ExecLockRelsInfo would be better?
>
> Also, given Tom's apparent dislike for carrying that in PlannedStmt,
> maybe the way I have it now is fine?

I think if you change how it's indexed and the other stuff then we can
have another look.  I think the patch will be much easier to review
once the ParitionPruneInfos are moved into PlannedStmt.

David




Re: generic plans and "initial" pruning

2022-04-01 Thread Amit Langote
On Fri, Apr 1, 2022 at 12:45 PM Tom Lane  wrote:
> Amit Langote  writes:
> > On Fri, Apr 1, 2022 at 10:32 AM David Rowley  wrote:
> >> 1. You've changed the signature of various functions by adding
> >> ExecLockRelsInfo *execlockrelsinfo.  I'm wondering why you didn't just
> >> put the ExecLockRelsInfo as a new field in PlannedStmt?
>
> > I'm worried about that churn myself and did consider this idea, though
> > I couldn't shake the feeling that it's maybe wrong to put something in
> > PlannedStmt that the planner itself doesn't produce.
>
> PlannedStmt is part of the plan tree, which MUST be read-only to
> the executor.  This is not negotiable.  However, there's other
> places that this data could be put, such as QueryDesc.
> Or for that matter, couldn't the data structure be created by
> the planner?  (It looks like David is proposing exactly that
> further down.)

The data structure in question is for storing the results of
performing initial partition pruning on a generic plan, which the patch
proposes to do in plancache.c -- inside the body of
AcquireExecutorLocks()'s loop over PlannedStmts -- so, it's hard to
see it as a product of the planner. :-(

-- 
Amit Langote
EDB: http://www.enterprisedb.com




Re: generic plans and "initial" pruning

2022-04-01 Thread Amit Langote
On Fri, Apr 1, 2022 at 1:08 PM David Rowley  wrote:
> On Fri, 1 Apr 2022 at 16:09, Amit Langote  wrote:
> > definition of PlannedStmt says this:
> >
> > /* 
> >  *  PlannedStmt node
> >  *
> >  * The output of the planner
> >
> > With the ideas that you've outlined below, perhaps we can frame most
> > of the things that the patch wants to do as the planner and the
> > plancache changes.  If we twist the above definition a bit to say what
> > the plancache does in this regard is part of planning, maybe it makes
> > sense to add the initial pruning related fields (nodes, outputs) into
> > PlannedStmt.
>
> How about the PartitionPruneInfos go into PlannedStmt as a List
> indexed in the way I mentioned and the cache of the results of pruning
> in EState?
>
> I think that leaves you adding  List *partpruneinfos,  Bitmapset
> *minimumlockrtis to PlannedStmt and the thing you have to cache the
> pruning results into EState.   I'm not very clear on where you should
> stash the results of run-time pruning in the meantime before you can
> put them in EState.  You might need to invent some intermediate struct
> that gets passed around that you can scribble down some details you're
> going to need during execution.

Yes, the ExecLockRelsInfo node in the current patch, that first gets
added to the QueryDesc and subsequently to the EState of the query,
serves as that stashing place.  Not sure if you've looked at
ExecLockRelsInfo in detail in your review of the patch so far, but it
carries the initial pruning result in what are called
PlanInitPruningOutput nodes, which are stored in a list in
ExecLockRelsInfo and their offsets in the list are in turn stored in
an adjacent array that contains an element for every plan node in the
tree.  If we go with a PlannedStmt.partpruneinfos list, then maybe we
don't need to have that array, because the Append/MergeAppend nodes
would be carrying those offsets by themselves.

Maybe a different name for ExecLockRelsInfo would be better?

Also, given Tom's apparent dislike for carrying that in PlannedStmt,
maybe the way I have it now is fine?

> > One question is whether the planner should always pay the overhead of
> > initializing this bitmapset?  I mean it's only worthwhile if
> > AcquireExecutorLocks() is going to be involved, that is, the plan will
> > be cached and reused.
>
> Maybe the Bitmapset for the minimal locks needs to be built with
> bms_add_range(NULL, 0, list_length(rtable));  then do
> bms_del_members() on the relevant RTIs you find in the listed
> PartitionPruneInfos.  That way it's very simple and cheap to do when
> there are no PartitionPruneInfos.

Ah, okay.  Looking at make_partition_pruneinfo(), I think I see a way
to delete the RTIs of prunable relations -- construct an
all_matched_leaf_part_relids in parallel to allmatchedsubplans and
delete those from the initial set.
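
Roughly like this, I think (an illustrative sketch only; RT indexes being
1-based, the range starts at 1, and all_matched_leaf_part_relids is the set
mentioned above, collected while building the PartitionPruneInfos):

/* Start by assuming that every range table entry must be locked ... */
glob->minLockRelids = bms_add_range(NULL, 1, list_length(root->parse->rtable));

/*
 * ... then knock out the leaf partitions whose subplans initial pruning
 * may remove; their RT indexes are collected in parallel to
 * allmatchedsubplans.
 */
glob->minLockRelids = bms_del_members(glob->minLockRelids,
                                      all_matched_leaf_part_relids);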

> > > 4. It's a bit disappointing to see RelOptInfo.partitioned_rels getting
> > > revived here.  Why don't you just add a partitioned_relids to
> > > PartitionPruneInfo and just have make_partitionedrel_pruneinfo build
> > > you a Relids of them. PartitionedRelPruneInfo already has an rtindex
> > > field, so you just need to bms_add_member whatever that rtindex is.
> >
> > Hmm, not all Append/MergeAppend nodes in the plan tree may have
> > make_partition_pruneinfo() called on them though.
>
> For Append/MergeAppends without run-time pruning you'll want to add
> the RTIs to the minimal locking set of RTIs to go into PlannedStmt.
> The only things you want to leave out of that are RTIs for the RTEs
> that you might run-time prune away during AcquireExecutorLocks().

Yeah, I see it now.

Thanks.

-- 
Amit Langote
EDB: http://www.enterprisedb.com




Re: generic plans and "initial" pruning

2022-03-31 Thread David Rowley
On Fri, 1 Apr 2022 at 16:09, Amit Langote  wrote:
> definition of PlannedStmt says this:
>
> /* 
>  *  PlannedStmt node
>  *
>  * The output of the planner
>
> With the ideas that you've outlined below, perhaps we can frame most
> of the things that the patch wants to do as the planner and the
> plancache changes.  If we twist the above definition a bit to say what
> the plancache does in this regard is part of planning, maybe it makes
> sense to add the initial pruning related fields (nodes, outputs) into
> PlannedStmt.

How about the PartitionPruneInfos go into PlannedStmt as a List
indexed in the way I mentioned and the cache of the results of pruning
in EState?

I think that leaves you adding  List *partpruneinfos,  Bitmapset
*minimumlockrtis to PlannedStmt and the thing you have to cache the
pruning results into EState.   I'm not very clear on where you should
stash the results of run-time pruning in the meantime before you can
put them in EState.  You might need to invent some intermediate struct
that gets passed around that you can scribble down some details you're
going to need during execution.

> One question is whether the planner should always pay the overhead of
> initializing this bitmapset?  I mean it's only worthwhile if
> AcquireExecutorLocks() is going to be involved, that is, the plan will
> be cached and reused.

Maybe the Bitmapset for the minimal locks needs to be built with
bms_add_range(NULL, 0, list_length(rtable));  then do
bms_del_members() on the relevant RTIs you find in the listed
PartitionPruneInfos.  That way it's very simple and cheap to do when
there are no PartitionPruneInfos.

> > 4. It's a bit disappointing to see RelOptInfo.partitioned_rels getting
> > revived here.  Why don't you just add a partitioned_relids to
> > PartitionPruneInfo and just have make_partitionedrel_pruneinfo build
> > you a Relids of them. PartitionedRelPruneInfo already has an rtindex
> > field, so you just need to bms_add_member whatever that rtindex is.
>
> Hmm, not all Append/MergeAppend nodes in the plan tree may have
> make_partition_pruneinfo() called on them though.

For Append/MergeAppends without run-time pruning you'll want to add
the RTIs to the minimal locking set of RTIs to go into PlannedStmt.
The only things you want to leave out of that are RTIs for the RTEs
that you might run-time prune away during AcquireExecutorLocks().

David




Re: generic plans and "initial" pruning

2022-03-31 Thread Tom Lane
Amit Langote  writes:
> On Fri, Apr 1, 2022 at 10:32 AM David Rowley  wrote:
>> 1. You've changed the signature of various functions by adding
>> ExecLockRelsInfo *execlockrelsinfo.  I'm wondering why you didn't just
>> put the ExecLockRelsInfo as a new field in PlannedStmt?

> I'm worried about that churn myself and did consider this idea, though
> I couldn't shake the feeling that it's maybe wrong to put something in
> PlannedStmt that the planner itself doesn't produce.

PlannedStmt is part of the plan tree, which MUST be read-only to
the executor.  This is not negotiable.  However, there's other
places that this data could be put, such as QueryDesc.
Or for that matter, couldn't the data structure be created by
the planner?  (It looks like David is proposing exactly that
further down.)

regards, tom lane




Re: generic plans and "initial" pruning

2022-03-31 Thread Amit Langote
Thanks a lot for looking into this.

On Fri, Apr 1, 2022 at 10:32 AM David Rowley  wrote:
> I've been looking over the v8 patch and I'd like to propose semi-baked
> ideas to improve things.  I'd need to go and write them myself to
> fully know if they'd actually work ok.
>
> 1. You've changed the signature of various functions by adding
> ExecLockRelsInfo *execlockrelsinfo.  I'm wondering why you didn't just
> put the ExecLockRelsInfo as a new field in PlannedStmt?
>
> I think the above gets around messing with the signatures of
> CreateQueryDesc(), ExplainOnePlan(), pg_plan_queries(),
> PortalDefineQuery(), and ProcessQuery(). It would get rid of your change of
> foreach to forboth in execute_sql_string() / PortalRunMulti() and gets
> rid of a number of places where you're carrying around a variable named
> execlockrelsinfo_list. It would also make the patch significantly
> easier to review as you'd be touching far fewer files.

I'm worried about that churn myself and did consider this idea, though
I couldn't shake the feeling that it's maybe wrong to put something in
PlannedStmt that the planner itself doesn't produce.  I mean the
definition of PlannedStmt says this:

/* 
 *  PlannedStmt node
 *
 * The output of the planner

With the ideas that you've outlined below, perhaps we can frame most
of the things that the patch wants to do as the planner and the
plancache changes.  If we twist the above definition a bit to say what
the plancache does in this regard is part of planning, maybe it makes
sense to add the initial pruning related fields (nodes, outputs) into
PlannedStmt.

> 2. I don't really like the way you've gone about most of the patch...
>
> The way I imagine this working is that during create_plan() we visit
> all nodes that have run-time pruning, then inside create_append_plan()
> and create_merge_append_plan() we'd tag those onto a new field in
> PlannerGlobal.  That way you can store the PartitionPruneInfos in the
> new PlannedStmt field in standard_planner() after the
> makeNode(PlannedStmt).
>
> Instead of storing the PartitionPruneInfo in the Append / MergeAppend
> struct, you'd just add a new index field to those structs. The index
> would start with 0 for the 0th PartitionPruneInfo. You'd basically
> just know the index by assigning
> list_length(root->glob->partitionpruneinfos).
>
> You'd then assign the root->glob->partitionpruneinfos to
> PlannedStmt.partitionpruneinfos and anytime you needed to do run-time
> pruning during execution, you'd need to use the Append / MergeAppend's
> partition_prune_info_idx to lookup the PartitionPruneInfo in some new
> field you add to EState to store those.  You'd leave that index as -1
> if there's no PartitionPruneInfo for the Append / MergeAppend node.
>
> When you do AcquireExecutorLocks(), you'd iterate over the
> PlannedStmt's PartitionPruneInfo to figure out which subplans to
> prune. You'd then have an array sized
> list_length(plannedstmt->runtimepruneinfos) where you'd store the
> result.  When the Append/MergeAppend node starts up you just check if
> the part_prune_info_idx >= 0 and if there's a non-NULL result stored
> then use that result.  That's how you'd ensure you always got the same
> run-time prune result between locking and plan startup.

Actually, Robert too suggested such an idea to me off-list and I think
it's worth trying.  I was not sure about the implementation, because
then we'd be passing around lists of initial pruning nodes/results
across many function/module boundaries that you mentioned in your
comment 1, but if we agree that PlannedStmt is an acceptable place for
those things to be stored, then I agree it's an attractive idea.

> 3. Also, looking at ExecGetLockRels(), shouldn't it be the planner's
> job to determine the minimum set of relations which must be locked?  I
> think the plan tree traversal during execution is not great.  Seems the
> whole point of this patch is to reduce overhead during execution. A
> full additional plan traversal aside from the 3 that we already do for
> start/run/end of execution seems not great.
>
> I think this means that during AcquireExecutorLocks() you'd start with
> the minimum set of RTEs that need to be locked as determined during
> create_plan() and stored in some Bitmapset field in PlannedStmt.

The patch did have a PlannedStmt.lockrels till v6.  Though, it wasn't
the same thing as you are describing it...

> This
> minimal set would also only exclude RTIs that would only possibly be
> used due to a PartitionPruneInfo with initial pruning steps, i.e.
> include RTIs from PartitionPruneInfo with no init pruning steps (you
> can't skip any locks for those).  All you need to do to determine the
> RTEs to lock is to take the minimal set and execute each
> PartitionPruneInfo in the PlannedStmt that has init steps

So just thinking about an Append/MergeAppend, the minimum set must
include the RT indexes of all the partitioned tables whose direct and
indirect children's plans 

Re: generic plans and "initial" pruning

2022-03-31 Thread David Rowley
On Thu, 31 Mar 2022 at 16:25, Amit Langote  wrote:
> Rebased.

I've been looking over the v8 patch and I'd like to propose semi-baked
ideas to improve things.  I'd need to go and write them myself to
fully know if they'd actually work ok.

1. You've changed the signature of various functions by adding
ExecLockRelsInfo *execlockrelsinfo.  I'm wondering why you didn't just
put the ExecLockRelsInfo as a new field in PlannedStmt?

I think the above gets around messing with the signatures of
CreateQueryDesc(), ExplainOnePlan(), pg_plan_queries(),
PortalDefineQuery(), and ProcessQuery(). It would get rid of your change of
foreach to forboth in execute_sql_string() / PortalRunMulti() and gets
rid of a number of places where you're carrying around a variable named
execlockrelsinfo_list. It would also make the patch significantly
easier to review as you'd be touching far fewer files.

2. I don't really like the way you've gone about most of the patch...

The way I imagine this working is that during create_plan() we visit
all nodes that have run-time pruning, then inside create_append_plan()
and create_merge_append_plan() we'd tag those onto a new field in
PlannerGlobal.  That way you can store the PartitionPruneInfos in the
new PlannedStmt field in standard_planner() after the
makeNode(PlannedStmt).

Instead of storing the PartitionPruneInfo in the Append / MergeAppend
struct, you'd just add a new index field to those structs. The index
would start with 0 for the 0th PartitionPruneInfo. You'd basically
just know the index by assigning
list_length(root->glob->partitionpruneinfos).

You'd then assign the root->glob->partitionpruneinfos to
PlannedStmt.partitionpruneinfos and anytime you needed to do run-time
pruning during execution, you'd need to use the Append / MergeAppend's
partition_prune_info_idx to lookup the PartitionPruneInfo in some new
field you add to EState to store those.  You'd leave that index as -1
if there's no PartitionPruneInfo for the Append / MergeAppend node.

When you do AcquireExecutorLocks(), you'd iterate over the
PlannedStmt's PartitionPruneInfo to figure out which subplans to
prune. You'd then have an array sized
list_length(plannedstmt->runtimepruneinfos) where you'd store the
result.  When the Append/MergeAppend node starts up you just check if
the part_prune_info_idx >= 0 and if there's a non-NULL result stored
then use that result.  That's how you'd ensure you always got the same
run-time prune result between locking and plan startup.
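
For illustration, the startup-time check might look something like this (a
sketch of the idea only; part_prune_index, es_part_prune_results, and
valid_subplan_offs are placeholder names, prunestate is assumed to have been
set up as it is today, and Append is used as the example):

/* In ExecInitAppend(), say, after setting up the run-time pruning state */
Bitmapset  *validsubplans;

if (node->part_prune_index >= 0 && estate->es_part_prune_results != NIL)
{
    /* Reuse the result computed while acquiring locks on the cached plan. */
    PartitionPruneResult *presult =
        list_nth_node(PartitionPruneResult,
                      estate->es_part_prune_results,
                      node->part_prune_index);

    validsubplans = bms_copy(presult->valid_subplan_offs);
}
else
{
    /* No stashed result: perform initial pruning here, as happens today. */
    validsubplans = ExecFindInitialMatchingSubPlans(prunestate,
                                                    list_length(node->appendplans));
}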

3. Also, looking at ExecGetLockRels(), shouldn't it be the planner's
job to determine the minimum set of relations which must be locked?  I
think the plan tree traversal during execution is not great.  Seems the
whole point of this patch is to reduce overhead during execution. A
full additional plan traversal aside from the 3 that we already do for
start/run/end of execution seems not great.

I think this means that during AcquireExecutorLocks() you'd start with
the minimum set of RTEs that need to be locked as determined during
create_plan() and stored in some Bitmapset field in PlannedStmt. This
minimal set would also only exclude RTIs that would only possibly be
used due to a PartitionPruneInfo with initial pruning steps, i.e.
include RTIs from PartitionPruneInfo with no init pruning steps (you
can't skip any locks for those).  All you need to do to determine the
RTEs to lock is to take the minimal set and execute each
PartitionPruneInfo in the PlannedStmt that has init steps

4. It's a bit disappointing to see RelOptInfo.partitioned_rels getting
revived here.  Why don't you just add a partitioned_relids to
PartitionPruneInfo and just have make_partitionedrel_pruneinfo build
you a Relids of them. PartitionedRelPruneInfo already has an rtindex
field, so you just need to bms_add_member whatever that rtindex is.

It's a fairly high-level review at this stage. I can look in more
detail if the above points get looked at.  You may find or know of
some reason why it can't be done like I mention above.

David




Re: generic plans and "initial" pruning

2022-03-31 Thread Amit Langote
On Thu, Mar 31, 2022 at 6:55 PM Alvaro Herrera  wrote:
> I'm looking at 0001 here with intention to commit later.  I see that
> there is some resistance to 0004, but I think a final verdict on that
> one doesn't materially affect 0001.

Thanks.

While the main goal of the refactoring patch is to make it easier to
review the more complex changes that 0004 makes to execPartition.c, I
agree it has merit on its own.  Although, one may say that the bit
about providing a PlanState-independent ExprContext is more closely
tied with 0004's requirements...

-- 
Amit Langote
EDB: http://www.enterprisedb.com




Re: generic plans and "initial" pruning

2022-03-31 Thread Alvaro Herrera
I'm looking at 0001 here with intention to commit later.  I see that
there is some resistance to 0004, but I think a final verdict on that
one doesn't materially affect 0001.

-- 
Álvaro Herrera PostgreSQL Developer  —  https://www.EnterpriseDB.com/
"El destino baraja y nosotros jugamos" (A. Schopenhauer)




Re: generic plans and "initial" pruning

2022-03-22 Thread Amit Langote
On Tue, Mar 15, 2022 at 3:19 PM Amit Langote  wrote:
> On Tue, Mar 15, 2022 at 5:06 AM Robert Haas  wrote:
> > On Mon, Mar 14, 2022 at 3:38 PM Tom Lane  wrote:
> > > Also, while I've not spent much time at all reading this patch,
> > > it seems rather desperately undercommented, and a lot of the
> > > new names are unintelligible.  In particular, I suspect that the
> > > patch is significantly redesigning when/where run-time pruning
> > > happens (unless it's just letting that be run twice); but I don't
> > > see any documentation or name changes suggesting where that
> > > responsibility is now.
> >
> > I am sympathetic to that concern. I spent a while staring at a
> > baffling comment in 0001 only to discover it had just been moved from
> > elsewhere. I really don't feel that things in this are as clear as
> > they could be -- although I hasten to add that I respect the people
> > who have done work in this area previously and am grateful for what
> > they did. It's been a huge benefit to the project in spite of the
> > bumps in the road. Moreover, this isn't the only code in PostgreSQL
> > that needs improvement, or the worst. That said, I do think there are
> > problems. I don't yet have a position on whether this patch is making
> > that better or worse.
>
> Okay, I'd like to post a new version with the comments edited to make
> them a bit more intelligible.  I understand that the comments around
> the new invocation mode(s) of runtime pruning are not as clear as they
> should be, especially as the changes that this patch wants to make to
> how things work are not very localized.

Actually, another area where the comments may not be as clear as they
should have been is the changes that the patch makes to the
AcquireExecutorLocks() logic that decides which relations are locked
to safeguard the plan tree for execution, which are those given by
RTE_RELATION entries in the range table.

Without the patch, they are found by actually scanning the range table.

With the patch, it's the same set of RTEs if the plan doesn't contain
any pruning nodes, though instead of the range table, what is scanned
is a bitmapset of their RT indexes that is made available by the
planner in the form of PlannedStmt.lockrels.  When the plan does
contain a pruning node (PlannedStmt.containsInitialPruning), the
bitmapset is constructed by calling ExecutorGetLockRels() on the plan
tree, which walks it to add RT indexes of relations mentioned in the
Scan nodes, while skipping any nodes that are pruned after performing
initial pruning steps that may be present in their containing parent
node's PartitionPruneInfo.  Also, the RT indexes of partitioned tables
that are present in the PartitionPruneInfo itself are also added to
the set.

While expanding comments added by the patch to make this clear, I
realized that there are two problems, one of them quite glaring:

* The planner's constructing this bitmapset and copying it along with the
PlannedStmt is pure overhead in cases that this patch has nothing to do
with, which is the kind of thing that Andres cautioned against
upthread.

* Not all partitioned tables that would have been locked without the
patch to come up with an Append/MergeAppend plan may be returned by
ExecutorGetLockRels().  For example, if none of the query's
runtime-prunable quals were found to match the partition key of an
intermediate partitioned table, that partitioned table is not
included in the PartitionPruneInfo.  Or an Append/MergeAppend
covering a partition tree may not contain any PartitionPruneInfo to
begin with, in which case only the leaf partitions and none of the
partitioned parents would be accounted for by the
ExecutorGetLockRels() logic.

The 1st one seems easy to fix by not inventing PlannedStmt.lockrels
and just doing what's being done now: scan the range table if
(!PlannedStmt.containsInitialPruning).
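
That is, something along these lines (a sketch; the ExecutorGetLockRels()
branch is only indicated by a comment, since its details are what the patch
itself provides):

/* In AcquireExecutorLocks() */
if (!plannedstmt->containsInitialPruning)
{
    ListCell   *lc;

    /* No initial-pruning nodes: lock every RTE_RELATION, as today. */
    foreach(lc, plannedstmt->rtable)
    {
        RangeTblEntry *rte = lfirst_node(RangeTblEntry, lc);

        if (rte->rtekind == RTE_RELATION)
            LockRelationOid(rte->relid, rte->rellockmode);
    }
}
else
{
    /*
     * The plan contains initial pruning steps: walk the plan tree with
     * ExecutorGetLockRels(), performing those steps so that the relations
     * of pruned subplans are left out, and lock only what remains.
     */
}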

The only way perhaps to fix the second one is to reconsider the
decision we made in the following commit:

commit 52ed730d511b7b1147f2851a7295ef1fb5273776
Author: Tom Lane 
Date:   Sun Oct 7 14:33:17 2018 -0400

Remove some unnecessary fields from Plan trees.

In the wake of commit f2343653f, we no longer need some fields that
were used before to control executor lock acquisitions:

* PlannedStmt.nonleafResultRelations can go away entirely.

* partitioned_rels can go away from Append, MergeAppend, and ModifyTable.
However, ModifyTable still needs to know the RT index of the partition
root table if any, which was formerly kept in the first entry of that
list.  Add a new field "rootRelation" to remember that.  rootRelation is
partly redundant with nominalRelation, in that if it's set it will have
the same value as nominalRelation.  However, the latter field has a
different purpose so it seems best to keep them distinct.

That is, add back the partitioned_rels field, at least to Append and
MergeAppend, to store the RT indexes of partitioned 

Re: generic plans and "initial" pruning

2022-03-15 Thread Amit Langote
On Tue, Mar 15, 2022 at 5:06 AM Robert Haas  wrote:
> On Mon, Mar 14, 2022 at 3:38 PM Tom Lane  wrote:
> > What I am skeptical about is that this work actually accomplishes
> > anything under real-world conditions.  That's because if pruning would
> > save enough to make skipping the lock-acquisition phase worth the
> > trouble, the plan cache is almost certainly going to decide it should
> > be using a custom plan not a generic plan.  Now if we had a better
> > cost model (or, indeed, any model at all) for run-time pruning effects
> > then maybe that situation could be improved.  I think we'd be better
> > served to worry about that end of it before we spend more time making
> > the executor even less predictable.
>
> I don't agree with that analysis, because setting plan_cache_mode is
> not uncommon. Even if that GUC didn't exist, I'm pretty sure there are
> cases where the planner naturally falls into a generic plan anyway,
> even though pruning is happening. But as it is, the GUC does exist,
> and people use it. Consequently, while I'd love to see something done
> about the costing side of things, I do not accept that all other
> improvements should wait for that to happen.

I agree that making generic plans execute faster has merit even before
we make the costing changes to allow plancache.c to prefer generic plans
over custom ones in these cases.  As the numbers in my previous email
show, simply executing a generic plan with the proposed improvements
applied is significantly cheaper than having the planner do the
pruning on every execution:

nparts  auto/custom  generic
======  ===========  =======
32      13359        28204
64      15760        26795
128     15825        26387
256     15017        25601
512     13479        19911
1024    13200        20158
2048    12884        16180

> > Also, while I've not spent much time at all reading this patch,
> > it seems rather desperately undercommented, and a lot of the
> > new names are unintelligible.  In particular, I suspect that the
> > patch is significantly redesigning when/where run-time pruning
> > happens (unless it's just letting that be run twice); but I don't
> > see any documentation or name changes suggesting where that
> > responsibility is now.
>
> I am sympathetic to that concern. I spent a while staring at a
> baffling comment in 0001 only to discover it had just been moved from
> elsewhere. I really don't feel that things in this are as clear as
> they could be -- although I hasten to add that I respect the people
> who have done work in this area previously and am grateful for what
> they did. It's been a huge benefit to the project in spite of the
> bumps in the road. Moreover, this isn't the only code in PostgreSQL
> that needs improvement, or the worst. That said, I do think there are
> problems. I don't yet have a position on whether this patch is making
> that better or worse.

Okay, I'd like to post a new version with the comments edited to make
them a bit more intelligible.  I understand that the comments around
the new invocation mode(s) of runtime pruning are not as clear as they
should be, especially as the changes that this patch wants to make to
how things work are not very localized.

> That said, I believe that the core idea of the patch is to optionally
> perform pruning before we acquire locks or spin up the main executor
> and then remember the decisions we made. If once the main executor is
> spun up we already made those decisions, then we must stick with what
> we decided. If not, we make those pruning decisions at the same point
> we do currently

Right.  The "initial" pruning, that this patch wants to make occur at
an earlier point (plancache.c), is currently performed in
ExecInit[Merge]Append().

If it does occur early due to the plan being a cached one,
ExecInit[Merge]Append() simply refers to its result that would be made
available via a new data structure that plancache.c has been made to
pass down to the executor alongside the plan tree.

If it does not, ExecInit[Merge]Append() does the pruning in the same
way it does now.  Such cases include initial pruning that uses only
STABLE expressions (which the planner doesn't evaluate by itself,
because the resulting plan may be cached) and no EXTERN parameters.
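
To illustrate the two situations, a hypothetical example (assuming a
range-partitioned "measurements" table and plan_cache_mode =
force_generic_plan; not taken from the patch or its tests):

-- EXTERN parameter: the value arrives with EXECUTE, so under this
-- proposal the initial pruning can be done in plancache.c, before
-- executor startup.
PREPARE p1(date) AS SELECT * FROM measurements WHERE logdate = $1;
EXECUTE p1('2024-01-01');

-- STABLE expression only: the planner leaves now() unevaluated because
-- the resulting plan may be cached and reused later, so the pruning
-- still happens in ExecInit[Merge]Append() at executor startup, as today.
PREPARE p2 AS SELECT * FROM measurements WHERE logdate = now()::date;
EXECUTE p2;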

--
Amit Langote
EDB: http://www.enterprisedb.com




Re: generic plans and "initial" pruning

2022-03-14 Thread Robert Haas
On Mon, Mar 14, 2022 at 3:38 PM Tom Lane  wrote:
> ... like EXPLAIN, for example?

Exactly! I think that's the foremost example, but extension modules
like auto_explain or even third-party extensions are also a risk. I
think there was some discussion of this previously.

> If "pruning" means physical removal from the plan tree, then it's
> probably all right.  However, it looks to me like that doesn't
> actually happen, or at least doesn't happen till much later, so
> there's room for worry about a disconnect between what plancache.c
> has verified and what executor startup will try to touch.  As you
> say, in the absence of any bugs, that's not a problem ... but if
> there are such bugs, tracking them down would be really hard.

Surgery on the plan would violate the general principle that plans are
read-only once constructed. I think the idea ought to be to pass a
secondary data structure around with the plan that defines which parts
you must ignore. Any code that fails to use that other data structure
in the appropriate manner gets defined to be buggy and has to be fixed
by making it follow the new rules.
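
As a minimal sketch of what such a secondary structure could look like
(names and layout here are purely illustrative, not what the patch
actually defines):

/*
 * One entry per Append/MergeAppend node for which initial pruning was
 * performed before executor startup.  Executor and EXPLAIN code consult
 * this instead of assuming every subplan in the (read-only) plan tree
 * is safe to touch.
 */
typedef struct InitialPruneResult
{
    int         plan_node_id;   /* Plan.plan_node_id of the pruned node */
    Bitmapset  *valid_subplans; /* indexes of subplans that survived */
} InitialPruneResult;

/* Carried alongside the cached PlannedStmt when handed to the executor. */
typedef struct PlanPruneInfo
{
    List       *prune_results;  /* list of InitialPruneResult pointers */
} PlanPruneInfo;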

> What I am skeptical about is that this work actually accomplishes
> anything under real-world conditions.  That's because if pruning would
> save enough to make skipping the lock-acquisition phase worth the
> trouble, the plan cache is almost certainly going to decide it should
> be using a custom plan not a generic plan.  Now if we had a better
> cost model (or, indeed, any model at all) for run-time pruning effects
> then maybe that situation could be improved.  I think we'd be better
> served to worry about that end of it before we spend more time making
> the executor even less predictable.

I don't agree with that analysis, because setting plan_cache_mode is
not uncommon. Even if that GUC didn't exist, I'm pretty sure there are
cases where the planner naturally falls into a generic plan anyway,
even though pruning is happening. But as it is, the GUC does exist,
and people use it. Consequently, while I'd love to see something done
about the costing side of things, I do not accept that all other
improvements should wait for that to happen.

> Also, while I've not spent much time at all reading this patch,
> it seems rather desperately undercommented, and a lot of the
> new names are unintelligible.  In particular, I suspect that the
> patch is significantly redesigning when/where run-time pruning
> happens (unless it's just letting that be run twice); but I don't
> see any documentation or name changes suggesting where that
> responsibility is now.

I am sympathetic to that concern. I spent a while staring at a
baffling comment in 0001 only to discover it had just been moved from
elsewhere. I really don't feel that things in this are as clear as
they could be -- although I hasten to add that I respect the people
who have done work in this area previously and am grateful for what
they did. It's been a huge benefit to the project in spite of the
bumps in the road. Moreover, this isn't the only code in PostgreSQL
that needs improvement, or the worst. That said, I do think there are
problems. I don't yet have a position on whether this patch is making
that better or worse.

That said, I believe that the core idea of the patch is to optionally
perform pruning before we acquire locks or spin up the main executor
and then remember the decisions we made. If once the main executor is
spun up we already made those decisions, then we must stick with what
we decided. If not, we make those pruning decisions at the same point
we do currently - more or less on demand, at the point when we'd need
to know whether to descend that branch of the plan tree or not. I
think this scheme comes about because there are a couple of different
interfaces to the parameterized query stuff, and in some code paths we
have the values early enough to use them for pre-pruning, and in
others we don't.

-- 
Robert Haas
EDB: http://www.enterprisedb.com




Re: generic plans and "initial" pruning

2022-03-14 Thread Tom Lane
Robert Haas  writes:
> In my opinion, the most important theoretical issue here is around
> reuse of plans that are no longer entirely valid, but the parts that
> are no longer valid are certain to be pruned. If, because we know that
> some parameter has some particular value, we skip locking a bunch of
> partitions, then when we're executing the plan, those partitions need
> not exist any more -- or they could have different indexes, be
> detached from the partitioning hierarchy and subsequently altered,
> whatever.

Check.

> That seems fine to me provided that all of our code (and any
> third-party code) is careful not to rely on the portion of the plan
> that we've pruned away, and doesn't assume that (for example) we can
> still fetch the name of an index whose OID appears in there someplace.

... like EXPLAIN, for example?

If "pruning" means physical removal from the plan tree, then it's
probably all right.  However, it looks to me like that doesn't
actually happen, or at least doesn't happen till much later, so
there's room for worry about a disconnect between what plancache.c
has verified and what executor startup will try to touch.  As you
say, in the absence of any bugs, that's not a problem ... but if
there are such bugs, tracking them down would be really hard.

What I am skeptical about is that this work actually accomplishes
anything under real-world conditions.  That's because if pruning would
save enough to make skipping the lock-acquisition phase worth the
trouble, the plan cache is almost certainly going to decide it should
be using a custom plan not a generic plan.  Now if we had a better
cost model (or, indeed, any model at all) for run-time pruning effects
then maybe that situation could be improved.  I think we'd be better
served to worry about that end of it before we spend more time making
the executor even less predictable.

Also, while I've not spent much time at all reading this patch,
it seems rather desperately undercommented, and a lot of the
new names are unintelligible.  In particular, I suspect that the
patch is significantly redesigning when/where run-time pruning
happens (unless it's just letting that be run twice); but I don't
see any documentation or name changes suggesting where that
responsibility is now.

regards, tom lane



