Re: automating RangeTblEntry node support

2024-03-21 Thread Peter Eisentraut

On 21.03.24 10:51, Andrew Dunstan wrote:

At this point, I'm not too stressed about pressing forward with these
last two patches.  We can look at them again perhaps if we make
progress
on a more compact node output format.  When I started this thread, I
had
a lot of questions about various details about the RangeTblEntry
struct,
and we have achieved many answers during the discussions, so I'm happy
with the progress.  So for PG17, I'd like to just do patches 0001..0005.

Patches 1 thru 5 look good to me


Thanks for checking.  I have committed these (1 through 5) and will 
close the commit fest entry.





Re: automating RangeTblEntry node support

2024-03-21 Thread Andrew Dunstan
On Mon, Mar 11, 2024 at 5:29 AM Peter Eisentraut 
wrote:

> On 20.02.24 08:57, Peter Eisentraut wrote:
> > On 18.02.24 00:06, Matthias van de Meent wrote:
> >> I'm not sure that the cleanup which is done when changing a RTE's
> >> rtekind is also complete enough for this purpose.
> >> Things like inline_cte_walker change the node->rtekind, which could
> >> leave residual junk data in fields that are currently dropped during
> >> serialization (as the rtekind specifically ignores those fields), but
> >> which would add overhead when the default omission is expected to
> >> handle these fields; as they could then contain junk. It looks like
> >> there is some care about zeroing now unused fields, but I haven't
> >> checked that it covers all cases and fields to the extent required so
> >> that removing this specialized serializer would have zero impact on
> >> size once the default omission patch is committed.
> >>
> >> An additional patch with a single function that for this purpose
> >> clears junk fields from RTEs that changed kind would be appreciated:
> >> it is often hand-coded at those locations the kind changes, but that's
> >> more sensitive to programmer error.
> >
> > Yes, interesting idea.  Or maybe an assert-like function that checks an
> > existing structure for consistency.  Or maybe both.  I'll try this out.
> >
> > In the meantime, if there are no remaining concerns, I propose to commit
> > the first two patches
> >
> > Remove custom Constraint node read/write implementations
> > Remove custom _jumbleRangeTblEntry()
>
> After a few side quests, here is an updated patch set.  (I had committed
> the first of the two patches mentioned above, but not yet the second one.)
>
> v3-0001-Remove-obsolete-comment.patch
> v3-0002-Improve-comment.patch
>
> These just update a few comments around the RangeTblEntry definition.
>
> v3-0003-Reformat-some-node-comments.patch
> v3-0004-Remove-custom-_jumbleRangeTblEntry.patch
>
> This is pretty much the same patch as before.  I have now split it up to
> first reformat the comments to make room for the node annotations.  This
> patch is now also pgindent-proof.  After some side quest discussions,
> the set of fields to jumble seems correct now, so commit message
> comments to the contrary have been dropped.
>
> v3-0005-Make-RangeTblEntry-dump-order-consistent.patch
>
> I separated that from the 0008 patch below.  I think it useful even if
> we don't go ahead with 0008 now, for example in dumps from the debugger,
> and just in general to keep everything more consistent.
>
> v3-0006-WIP-AssertRangeTblEntryIsValid.patch
>
> This is in response to some of the discussions where there was some
> doubt whether all fields are always filled and cleared correctly when
> the RTE kind is changed.  Seems correct as far as this goes.  I didn't
> know of a good way to hook this in, so I put it into the write/read
> functions, which is obviously a bit weird if I'm proposing to remove
> them later.  Consider it a proof of concept.
>
> v3-0007-Simplify-range_table_mutator_impl-and-range_table.patch
> v3-0008-WIP-Remove-custom-RangeTblEntry-node-read-write-i.patch
>
> At this point, I'm not too stressed about pressing forward with these
> last two patches.  We can look at them again perhaps if we make progress
> on a more compact node output format.  When I started this thread, I had
> a lot of questions about various details about the RangeTblEntry struct,
> and we have achieved many answers during the discussions, so I'm happy
> with the progress.  So for PG17, I'd like to just do patches 0001..0005.
>


Patches 1 thru 5 look good to me

cheers

andrew


Re: automating RangeTblEntry node support

2024-03-11 Thread Peter Eisentraut

On 20.02.24 08:57, Peter Eisentraut wrote:

On 18.02.24 00:06, Matthias van de Meent wrote:

I'm not sure that the cleanup which is done when changing a RTE's
rtekind is also complete enough for this purpose.
Things like inline_cte_walker change the node->rtekind, which could
leave residual junk data in fields that are currently dropped during
serialization (as the rtekind specifically ignores those fields), but
which would add overhead when the default omission is expected to
handle these fields; as they could then contain junk. It looks like
there is some care about zeroing now unused fields, but I haven't
checked that it covers all cases and fields to the extent required so
that removing this specialized serializer would have zero impact on
size once the default omission patch is committed.

An additional patch with a single function that for this purpose
clears junk fields from RTEs that changed kind would be appreciated:
it is often hand-coded at those locations the kind changes, but that's
more sensitive to programmer error.


Yes, interesting idea.  Or maybe an assert-like function that checks an 
existing structure for consistency.  Or maybe both.  I'll try this out.


In the meantime, if there are no remaining concerns, I propose to commit 
the first two patches


Remove custom Constraint node read/write implementations
Remove custom _jumbleRangeTblEntry()


After a few side quests, here is an updated patch set.  (I had committed 
the first of the two patches mentioned above, but not yet the second one.)


v3-0001-Remove-obsolete-comment.patch
v3-0002-Improve-comment.patch

These just update a few comments around the RangeTblEntry definition.

v3-0003-Reformat-some-node-comments.patch
v3-0004-Remove-custom-_jumbleRangeTblEntry.patch

This is pretty much the same patch as before.  I have now split it up to 
first reformat the comments to make room for the node annotations.  This 
patch is now also pgindent-proof.  After some side quest discussions, 
the set of fields to jumble seems correct now, so commit message 
comments to the contrary have been dropped.


v3-0005-Make-RangeTblEntry-dump-order-consistent.patch

I separated that from the 0008 patch below.  I think it useful even if 
we don't go ahead with 0008 now, for example in dumps from the debugger, 
and just in general to keep everything more consistent.


v3-0006-WIP-AssertRangeTblEntryIsValid.patch

This is in response to some of the discussions where there was some 
doubt whether all fields are always filled and cleared correctly when 
the RTE kind is changed.  Seems correct as far as this goes.  I didn't 
know of a good way to hook this in, so I put it into the write/read 
functions, which is obviously a bit weird if I'm proposing to remove 
them later.  Consider it a proof of concept.


v3-0007-Simplify-range_table_mutator_impl-and-range_table.patch
v3-0008-WIP-Remove-custom-RangeTblEntry-node-read-write-i.patch

At this point, I'm not too stressed about pressing forward with these 
last two patches.  We can look at them again perhaps if we make progress 
on a more compact node output format.  When I started this thread, I had 
a lot of questions about various details about the RangeTblEntry struct, 
and we have achieved many answers during the discussions, so I'm happy 
with the progress.  So for PG17, I'd like to just do patches 0001..0005.
From dc53b7ddfc5aa004a0d222b4084a1c580f05a296 Mon Sep 17 00:00:00 2001
From: Peter Eisentraut 
Date: Mon, 11 Mar 2024 09:15:33 +0100
Subject: [PATCH v3 1/8] Remove obsolete comment

The idea to use a union in the definition of RangeTblEntry is clearly
not being pursued.

Discussion: 
https://www.postgresql.org/message-id/flat/4b27fc50-8cd6-46f5-ab20-88dbaadca...@eisentraut.org
---
 src/include/nodes/parsenodes.h | 6 --
 1 file changed, 6 deletions(-)

diff --git a/src/include/nodes/parsenodes.h b/src/include/nodes/parsenodes.h
index 2380821600..5113f97363 100644
--- a/src/include/nodes/parsenodes.h
+++ b/src/include/nodes/parsenodes.h
@@ -1028,12 +1028,6 @@ typedef struct RangeTblEntry
 
RTEKind rtekind;/* see above */
 
-   /*
-* XXX the fields applicable to only some rte kinds should be merged 
into
-* a union.  I didn't do this yet because the diffs would impact a lot 
of
-* code that is being actively worked on.  FIXME someday.
-*/
-
/*
 * Fields valid for a plain relation RTE (else zero):
 *

base-commit: af0e7deb4a1c369bb8154ac55f085d6a93fe5c35
-- 
2.44.0

From 9fd2ae6a561ea72f627057d922e48d92eaafe099 Mon Sep 17 00:00:00 2001
From: Peter Eisentraut 
Date: Mon, 11 Mar 2024 09:15:33 +0100
Subject: [PATCH v3 2/8] Improve comment

Clarify that RangeTblEntry.lateral reflects whether LATERAL was
specified in the statement (as opposed to whether lateralness is
implicit).  Also, the list of applicable entry types was incomplete.

Discussion: 
https://www.postgresql.org/message-id/flat/4b27fc50-8cd6-46f5-

Re: automating RangeTblEntry node support

2024-02-19 Thread Peter Eisentraut

On 18.02.24 00:06, Matthias van de Meent wrote:

I'm not sure that the cleanup which is done when changing a RTE's
rtekind is also complete enough for this purpose.
Things like inline_cte_walker change the node->rtekind, which could
leave residual junk data in fields that are currently dropped during
serialization (as the rtekind specifically ignores those fields), but
which would add overhead when the default omission is expected to
handle these fields; as they could then contain junk. It looks like
there is some care about zeroing now unused fields, but I haven't
checked that it covers all cases and fields to the extent required so
that removing this specialized serializer would have zero impact on
size once the default omission patch is committed.

An additional patch with a single function that for this purpose
clears junk fields from RTEs that changed kind would be appreciated:
it is often hand-coded at those locations the kind changes, but that's
more sensitive to programmer error.


Yes, interesting idea.  Or maybe an assert-like function that checks an 
existing structure for consistency.  Or maybe both.  I'll try this out.


In the meantime, if there are no remaining concerns, I propose to commit 
the first two patches


Remove custom Constraint node read/write implementations
Remove custom _jumbleRangeTblEntry()





Re: automating RangeTblEntry node support

2024-02-17 Thread Matthias van de Meent
On Fri, 16 Feb 2024 at 21:36, Paul Jungwirth
 wrote:
>
> On 1/15/24 02:37, Peter Eisentraut wrote:
> > In this updated patch set, I have also added the treatment of the 
> > Constraint type.  (I also noted
> > that the manual read/write functions for the Constraint type are 
> > out-of-sync again, so simplifying
> > this would be really helpful.)  I have also added commit messages to each 
> > patch.
> >
> > The way I have re-ordered the patch series now, I think patches 0001 
> > through 0003 are candidates for
> > inclusion after review, patch 0004 still needs a bit more analysis and 
> > testing, as described therein.
>
> Re bloating the serialization output, we could leave this last patch until 
> after the work on that
> other thread is done to skip default-valued items.

I'm not sure that the cleanup which is done when changing a RTE's
rtekind is also complete enough for this purpose.
Things like inline_cte_walker change the node->rtekind, which could
leave residual junk data in fields that are currently dropped during
serialization (as the rtekind specifically ignores those fields), but
which would add overhead when the default omission is expected to
handle these fields; as they could then contain junk. It looks like
there is some care about zeroing now unused fields, but I haven't
checked that it covers all cases and fields to the extent required so
that removing this specialized serializer would have zero impact on
size once the default omission patch is committed.

An additional patch with a single function that for this purpose
clears junk fields from RTEs that changed kind would be appreciated:
it is often hand-coded at those locations the kind changes, but that's
more sensitive to programmer error.

Kind regards,

Matthias van de Meent




Re: automating RangeTblEntry node support

2024-02-16 Thread Paul Jungwirth

On 1/15/24 02:37, Peter Eisentraut wrote:
In this updated patch set, I have also added the treatment of the Constraint type.  (I also noted 
that the manual read/write functions for the Constraint type are out-of-sync again, so simplifying 
this would be really helpful.)  I have also added commit messages to each patch.


The way I have re-ordered the patch series now, I think patches 0001 through 0003 are candidates for 
inclusion after review, patch 0004 still needs a bit more analysis and testing, as described therein.


I had to apply the first patch by hand (on 9f13376396), so this looks due for a rebase. Patches 2-4 
applied fine.


Compiles & passes tests after each patch.

The overall idea seems like a good improvement to me.

A few remarks about cleaning up the RangeTblEntry comments:

After the fourth patch we have a "Fields valid in all RTEs" comment twice in the struct, once at the 
top and once at the bottom. It's fine IMO but maybe the second could be "More fields valid in all RTEs"?


The new order of fields in RangleTblEntry matches the intro comment, which seems like another small 
benefit.


It seems like we are moving away from ever putting RTEKind-specific fields into a union as suggested 
by the FIXME comment here. It was written in 2002. Is it time to remove it?


This now needs to say "above" not "below":

/*
 * join_using_alias is an alias clause attached directly to JOIN/USING. It
 * is different from the alias field (below) in that it does not hide the
 * range variables of the tables being joined.
 */
Alias  *join_using_alias pg_node_attr(query_jumble_ignore);

Re bloating the serialization output, we could leave this last patch until after the work on that 
other thread is done to skip default-valued items.


Yours,

--
Paul  ~{:-)
p...@illuminatedcomputing.com




Re: automating RangeTblEntry node support

2024-01-15 Thread Peter Eisentraut

On 06.12.23 21:02, Peter Eisentraut wrote:
I have been looking into what it would take to get rid of the 
custom_read_write and custom_query_jumble for the RangeTblEntry node 
type.  This is one of the larger and more complex exceptions left.


(Similar considerations would also apply to the Constraint node type.)


In this updated patch set, I have also added the treatment of the 
Constraint type.  (I also noted that the manual read/write functions for 
the Constraint type are out-of-sync again, so simplifying this would be 
really helpful.)  I have also added commit messages to each patch.


The way I have re-ordered the patch series now, I think patches 0001 
through 0003 are candidates for inclusion after review, patch 0004 still 
needs a bit more analysis and testing, as described therein.
From f0e896a201367a639be9d08d070066867c46d7cf Mon Sep 17 00:00:00 2001
From: Peter Eisentraut 
Date: Mon, 15 Jan 2024 11:23:39 +0100
Subject: [PATCH v2 1/4] Remove custom Constraint node read/write
 implementations

This is part of an effort to reduce the number of special cases in the
automatically generated node support functions.

Allegedly, only certain fields of the Constraint node are valid based
on contype.  But this has historically not been kept up to date in the
read/write functions.  The Constraint node is only used for debugging
DDL statements, so there are no strong requirements for its output,
and there is no enforcement for its correctness.  (There was no read
support before a6bc3301925.)  Commits e7a552f303c and abf46ad9c7b are
examples of where omissions were fixed.  And the current output is
again incorrect, because the recently added "inhcount" field is not
included.

This patch just removes the custom read/write implementations for the
Constraint node type.  Now we just output all the fields, which is a
bit more than before, but at least we don't have to maintain these
functions anymore.  Also, we lose the string representation of the
contype field, but for this marginal use case that seems tolerable.
This patch also changes the documentation of the Constraint struct to
put less emphasis on grouping fields by constraint type but rather
document for each field how it's used.

Discussion: 
https://www.postgresql.org/message-id/flat/4b27fc50-8cd6-46f5-ab20-88dbaadca...@eisentraut.org
---
 src/backend/nodes/outfuncs.c   | 126 -
 src/backend/nodes/readfuncs.c  | 143 -
 src/include/nodes/parsenodes.h |  38 +++--
 3 files changed, 13 insertions(+), 294 deletions(-)

diff --git a/src/backend/nodes/outfuncs.c b/src/backend/nodes/outfuncs.c
index 296ba845187..dee2b9e87b2 100644
--- a/src/backend/nodes/outfuncs.c
+++ b/src/backend/nodes/outfuncs.c
@@ -699,132 +699,6 @@ _outA_Const(StringInfo str, const A_Const *node)
WRITE_LOCATION_FIELD(location);
 }
 
-static void
-_outConstraint(StringInfo str, const Constraint *node)
-{
-   WRITE_NODE_TYPE("CONSTRAINT");
-
-   WRITE_STRING_FIELD(conname);
-   WRITE_BOOL_FIELD(deferrable);
-   WRITE_BOOL_FIELD(initdeferred);
-   WRITE_LOCATION_FIELD(location);
-
-   appendStringInfoString(str, " :contype ");
-   switch (node->contype)
-   {
-   case CONSTR_NULL:
-   appendStringInfoString(str, "NULL");
-   break;
-
-   case CONSTR_NOTNULL:
-   appendStringInfoString(str, "NOT_NULL");
-   WRITE_NODE_FIELD(keys);
-   WRITE_INT_FIELD(inhcount);
-   WRITE_BOOL_FIELD(is_no_inherit);
-   WRITE_BOOL_FIELD(skip_validation);
-   WRITE_BOOL_FIELD(initially_valid);
-   break;
-
-   case CONSTR_DEFAULT:
-   appendStringInfoString(str, "DEFAULT");
-   WRITE_NODE_FIELD(raw_expr);
-   WRITE_STRING_FIELD(cooked_expr);
-   break;
-
-   case CONSTR_IDENTITY:
-   appendStringInfoString(str, "IDENTITY");
-   WRITE_NODE_FIELD(options);
-   WRITE_CHAR_FIELD(generated_when);
-   break;
-
-   case CONSTR_GENERATED:
-   appendStringInfoString(str, "GENERATED");
-   WRITE_NODE_FIELD(raw_expr);
-   WRITE_STRING_FIELD(cooked_expr);
-   WRITE_CHAR_FIELD(generated_when);
-   break;
-
-   case CONSTR_CHECK:
-   appendStringInfoString(str, "CHECK");
-   WRITE_BOOL_FIELD(is_no_inherit);
-   WRITE_NODE_FIELD(raw_expr);
-   WRITE_STRING_FIELD(cooked_expr);
-   WRITE_BOOL_FIELD(skip_validation);
-   WRITE_BOOL_FIELD(initially_valid);
-   break;
-
-   c

Re: automating RangeTblEntry node support

2023-12-06 Thread Matthias van de Meent
On Wed, 6 Dec 2023 at 21:02, Peter Eisentraut  wrote:
>
> I have been looking into what it would take to get rid of the
> custom_read_write and custom_query_jumble for the RangeTblEntry node
> type.  This is one of the larger and more complex exceptions left.
> [...]
> Now one could probably rightfully complain that having all these unused
> fields dumped would make the RangeTblEntry serialization bigger.  I'm
> not sure who big of a problem that actually is, considering how many
> often-unset fields other node types have.  But it deserves some
> consideration.  I think the best way to work around that would be to
> have a mode that omits fields that have their default value (zero).
> This would be more generally useful; for example Query also has a bunch
> of fields that are not often set.  I think this would be pretty easy to
> implement, for example like

Actually, I've worked on this last weekend, and got some good results.
It did need some fine-tuning and field annotations, but got raw
nodeToString sizes down 50%+ for the pg_rewrite table's ev_action
column, and compressed-with-pglz size of pg_rewrite total down 30%+.

> #define WRITE_INT_FIELD(fldname) \
>  if (full_mode || node->fldname) \
>  appendStringInfo(str, " :" CppAsString(fldname) " %d",
> node->fldname)
>
> There is also the discussion over at [0] about larger redesigns of the
> node serialization format.  I'm also interested in that, but here I'm
> mainly trying to remove more special cases to make that kind of work
> easier in the future.
>
> Any thoughts about the direction?

I've created a new thread [0] with my patch. It actually didn't need
_that_ many manual changes - most of it was just updating the
gen_node_support.pl code generation, and making the macros do a good
job.

In general I'm all for reducing special cases, so +1 on the idea. I'll
have to check the specifics of the patches at a later point in time.

Kind regards,

Matthias van de Meent
Neon (https://neon.tech)




automating RangeTblEntry node support

2023-12-06 Thread Peter Eisentraut
I have been looking into what it would take to get rid of the 
custom_read_write and custom_query_jumble for the RangeTblEntry node 
type.  This is one of the larger and more complex exceptions left.


(Similar considerations would also apply to the Constraint node type.)

Allegedly, only certain fields of RangeTblEntry are valid based on 
rtekind.  But exactly which ones seems to be documented and handled 
inconsistently.  It seems that over time, new RTE kinds have "borrowed" 
fields that notionally belong to other RTE kinds, which is technically 
not a problem but creates a bit of a mess when trying to understand all 
this.


I have some WIP patches to accompany this discussion.

Let's start with the jumble function.  I suppose that this was just 
carried over from the pg_stat_statements-specific code without any 
detailed review.  For example, the "inh" field is documented to be valid 
in all RTEs, but it's only jumbled for RTE_RELATION.  The "lateral" 
field isn't looked at at all.  I wouldn't be surprised if there are more 
cases like this.


In the first attached patch, I remove _jumbleRangeTblEntry() and instead 
add per-field query_jumble_ignore annotations to approximately match the 
behavior of the previous custom code.  The pg_stat_statements test suite 
has some coverage of this.  I get rid of switch on rtekind; this should 
be technically correct, since we do the equal and copy functions like 
this also.  So for example the "inh" field is now considered in each 
case.  But I left "lateral" alone.  I suspect several of these new 
query_jumble_ignore should actually be dropped because the code was 
wrong before.


In the second patch, I'm removing the switch on rtekind from 
range_table_mutator_impl().  This should be fine because all the 
subroutines can handle unset/NULL fields.  And it removes one more place 
that needs to track knowledge about which fields are valid when.


In the third patch, I'm removing the custom read/write functions for 
RangeTblEntry.  Those functions wanted to have a few fields at the front 
to make the dump more legible; I'm doing that now by moving the fields 
up in the actual struct.


Not done here, but something we should do is restructure the 
documentation of RangeTblEntry itself.  I'm still thinking about the 
best way to structure this, but I'm thinking more like noting for each 
field when it's used, instead by block like it is now, which makes it 
awkward if a new RTE wants to borrow some fields.


Now one could probably rightfully complain that having all these unused 
fields dumped would make the RangeTblEntry serialization bigger.  I'm 
not sure who big of a problem that actually is, considering how many 
often-unset fields other node types have.  But it deserves some 
consideration.  I think the best way to work around that would be to 
have a mode that omits fields that have their default value (zero). 
This would be more generally useful; for example Query also has a bunch 
of fields that are not often set.  I think this would be pretty easy to 
implement, for example like


#define WRITE_INT_FIELD(fldname) \
if (full_mode || node->fldname) \
appendStringInfo(str, " :" CppAsString(fldname) " %d", 
node->fldname)


There is also the discussion over at [0] about larger redesigns of the 
node serialization format.  I'm also interested in that, but here I'm 
mainly trying to remove more special cases to make that kind of work 
easier in the future.


Any thoughts about the direction?


[0]: 
https://www.postgresql.org/message-id/flat/CACxu%3DvL_SD%3DWJiFSJyyBuZAp_2v_XBqb1x9JBiqz52a_g9z3jA%40mail.gmail.comFrom ef08061a6d8f01e7db350b74eaa2d403df6079b9 Mon Sep 17 00:00:00 2001
From: Peter Eisentraut 
Date: Fri, 1 Dec 2023 11:45:50 +0100
Subject: [PATCH v1 1/3] Remove custom _jumbleRangeTblEntry()

---
 src/backend/nodes/queryjumblefuncs.c | 48 
 src/include/nodes/parsenodes.h   | 42 
 2 files changed, 21 insertions(+), 69 deletions(-)

diff --git a/src/backend/nodes/queryjumblefuncs.c 
b/src/backend/nodes/queryjumblefuncs.c
index 281907a4d8..9094ea02d8 100644
--- a/src/backend/nodes/queryjumblefuncs.c
+++ b/src/backend/nodes/queryjumblefuncs.c
@@ -347,51 +347,3 @@ _jumbleA_Const(JumbleState *jstate, Node *node)
}
}
 }
-
-static void
-_jumbleRangeTblEntry(JumbleState *jstate, Node *node)
-{
-   RangeTblEntry *expr = (RangeTblEntry *) node;
-
-   JUMBLE_FIELD(rtekind);
-   switch (expr->rtekind)
-   {
-   case RTE_RELATION:
-   JUMBLE_FIELD(relid);
-   JUMBLE_NODE(tablesample);
-   JUMBLE_FIELD(inh);
-   break;
-   case RTE_SUBQUERY:
-   JUMBLE_NODE(subquery);
-   break;
-   case RTE_JOIN:
-   JUMBLE_FIELD(jointype);
-   break;
-   case RTE_FUNCTION