from:"Dean Rasheed"

Re: plpgsql: fix parsing of integer range with underscores

2024-06-04 Thread Dean Rasheed

On Fri, 17 May 2024 at 09:22, Dean Rasheed  wrote:
>
> On Wed, 15 May 2024 at 02:14, Erik Wienhold  wrote:
> >
> > plpgsql fails to parse 1_000..1_000 as 1000..1000 in FOR loops:
> >
> > Fixed in the attached patch.
> >
>
> Nice catch! The patch looks good to me on a quick read-through.
>
> I'll take a closer look next week, after the beta release, since it's
> a v16+ bug.
>

(finally got back to this)

Committed and back-patched to v16. Thanks for the report and patch.

Regards,
Dean

pgsql: Fix PL/pgSQL's handling of integer ranges containing underscores

2024-06-04 Thread Dean Rasheed

Fix PL/pgSQL's handling of integer ranges containing underscores.

Commit faff8f8e47 allowed integer literals to contain underscores, but
failed to update the lexer's "numericfail" rule. As a result, a
decimal integer literal containing underscores would fail to parse, if
used in an integer range with no whitespace after the first number,
such as "1_001..1_003" in a PL/pgSQL FOR loop.

Fix and backpatch to v16, where support for underscores in integer
literals was added.

Report and patch by Erik Wienhold.

Discussion: https://postgr.es/m/808ce947-46ec-4628-85fa-3dd600b2c154%40ewie.name

Branch
--
REL_16_STABLE

Details
---
https://git.postgresql.org/pg/commitdiff/b4e909082fa114d5934ca622b225d2352ec639fa

Modified Files
--
src/backend/parser/scan.l|  2 +-
src/fe_utils/psqlscan.l  |  2 +-
src/interfaces/ecpg/preproc/pgc.l|  2 +-
src/test/regress/expected/numerology.out | 11 +++
src/test/regress/sql/numerology.sql  |  9 +
5 files changed, 23 insertions(+), 3 deletions(-)

pgsql: Fix PL/pgSQL's handling of integer ranges containing underscores

2024-06-04 Thread Dean Rasheed

Fix PL/pgSQL's handling of integer ranges containing underscores.

Commit faff8f8e47 allowed integer literals to contain underscores, but
failed to update the lexer's "numericfail" rule. As a result, a
decimal integer literal containing underscores would fail to parse, if
used in an integer range with no whitespace after the first number,
such as "1_001..1_003" in a PL/pgSQL FOR loop.

Fix and backpatch to v16, where support for underscores in integer
literals was added.

Report and patch by Erik Wienhold.

Discussion: https://postgr.es/m/808ce947-46ec-4628-85fa-3dd600b2c154%40ewie.name

Branch
--
master

Details
---
https://git.postgresql.org/pg/commitdiff/cd2624fd97b0c36b68da278abc5362647f69b07d

Modified Files
--
src/backend/parser/scan.l|  2 +-
src/fe_utils/psqlscan.l  |  2 +-
src/interfaces/ecpg/preproc/pgc.l|  2 +-
src/test/regress/expected/numerology.out | 11 +++
src/test/regress/sql/numerology.sql  |  9 +
5 files changed, 23 insertions(+), 3 deletions(-)

Re: Minor fixes for couple some comments around MERGE RETURNING

2024-06-04 Thread Dean Rasheed

On Thu, 23 May 2024 at 04:26, David Rowley  wrote:
>
> On Sun, 19 May 2024 at 15:20, David Rowley  wrote:
> >
> > I noticed that PlannedStmt.hasReturning and hasModifyingCTE have an
> > outdated comment now that MERGE supports RETURNING (per commit
> > c649fa24a)
> >
> > i.e. these two:
> >
> > > bool hasReturning; /* is it insert|update|delete RETURNING? */
> >
> > > bool hasModifyingCTE; /* has insert|update|delete in WITH? */
>
> I've pushed the fix for that.
>

Thanks for taking care of that.

I found another couple of similar comments that also needed updating,
so I've pushed a fix for them too.

Regards,
Dean

pgsql: Fix another couple of outdated comments for MERGE RETURNING.

2024-06-04 Thread Dean Rasheed

Fix another couple of outdated comments for MERGE RETURNING.

Oversights in c649fa24a4 which added RETURNING support to MERGE.

Discussion: 
https://postgr.es/m/caaphdvpqp6vtuzg-_josueibgyqnrnvxj-vdf+hjlxjhdhz...@mail.gmail.com

Branch
--
master

Details
---
https://git.postgresql.org/pg/commitdiff/5c5bccef211cfc98e0d6c4bc1af40a33c8ac2488

Modified Files
--
src/backend/utils/adt/ruleutils.c | 2 +-
src/include/nodes/parsenodes.h| 2 +-
src/include/utils/portal.h| 6 +++---
3 files changed, 5 insertions(+), 5 deletions(-)

Re: plpgsql: fix parsing of integer range with underscores

2024-05-17 Thread Dean Rasheed

On Wed, 15 May 2024 at 02:14, Erik Wienhold  wrote:
>
> plpgsql fails to parse 1_000..1_000 as 1000..1000 in FOR loops:
>
> Fixed in the attached patch.
>

Nice catch! The patch looks good to me on a quick read-through.

I'll take a closer look next week, after the beta release, since it's
a v16+ bug.

Regards,
Dean

Re: avoid MERGE_ACTION keyword?

2024-05-17 Thread Dean Rasheed

On Thu, 16 May 2024 at 15:15, Peter Eisentraut  wrote:
>
> I wonder if we can avoid making MERGE_ACTION a keyword.
>

Yeah, there was a lot of back and forth on this point on the original
thread, and I'm still not sure which approach is best.

> I think we could parse it initially as a function and then transform it
> to a more special node later.  In the attached patch, I'm doing this in
> parse analysis.  We could try to do it even later and actually execute
> it as a function, if we could get the proper context passed into it somehow.
>

Whichever way it's done, I do think it's preferable to have the parse
analysis check, to ensure that it's being used in the right part of
the query, rather than leaving that to plan/execution time.

If it is turned into a function, the patch also needs to update the
ruleutils code --- it needs to be prepared to output a
schema-qualified function name, if necessary (something that the
keyword approach saves us from).

Regards,
Dean

Re: Underscore in positional parameters?

2024-05-14 Thread Dean Rasheed

On Tue, 14 May 2024 at 07:43, Michael Paquier  wrote:
>
> On Tue, May 14, 2024 at 05:18:24AM +0200, Erik Wienhold wrote:
> > Parameter $1_2 is taken as $1 because in rule {param} in scan.l we get
> > the parameter number with atol which stops at the underscore.  That's a
> > regression in faff8f8e47f.  Before that commit, $1_2 resulted in
> > "ERROR: trailing junk after parameter".
>
> Indeed, the behavior of HEAD is confusing.  "1_2" means 12 as a
> constant in a query, not 1, but HEAD implies 1 in the context of
> PREPARE here.
>
> > I can't tell which fix is the way to go: (1) accept underscores without
> > using atol, or (2) just forbid underscores.  Any ideas?
>
> Does the SQL specification tell anything about the way parameters
> should be marked?  Not everything out there uses dollar-marked
> parameters, so I guess that the answer to my question is no.  My take
> is all these cases should be rejected for params, only apply to
> numeric and integer constants in the queries.
>
> Adding Dean in CC as the committer of faff8f8e47f, Peter E for the SQL
> specification part, and an open item.

I'm sure that this wasn't intentional -- I think we just failed to
notice that "param" also uses "decinteger" in the scanner. Taking a
quick look, there don't appear to be any other uses of "decinteger",
so at least it only affects params.

Unless the spec explicitly says otherwise, I agree that we should
reject this, as we used to do, and add a comment saying that it's
intentionally not supported. I can't believe it would ever be useful,
and the current behaviour is clearly broken.

I've moved this to "Older bugs affecting stable branches", since it
came in with v16.

Regards,
Dean

Re: [PATCH] Replace magic constant 3 with NUM_MERGE_MATCH_KINDS

2024-04-22 Thread Dean Rasheed

On Mon, 22 Apr 2024 at 06:04, Michael Paquier  wrote:
>
> On Fri, Apr 19, 2024 at 12:47:43PM +0300, Aleksander Alekseev wrote:
> > Thanks. I see a few pieces of code that use special FOO_NUMBER enum
> > values instead of a macro. Should we refactor these pieces
> > accordingly? PFA another patch.
>
> I don't see why not for the places you are changing here, we can be
> more consistent.

[Shrug] I do prefer using a macro. Adding a counter element to the end
of the enum feels like a hack, because the counter isn't the same kind
of thing as all the other enum elements, so it feels out of place in
the enum. On the other hand, I think it's a fairly common pattern that
most people will recognise, and for other enums that are more likely
to grow over time, it might be less error-prone than a macro, which
people might overlook and fail to update.

>  Now, such changes are material for v18.

Agreed. This has been added to the next commitfest, so let's see what
others think.

Regards,
Dean

Re: [PATCH] Replace magic constant 3 with NUM_MERGE_MATCH_KINDS

2024-04-19 Thread Dean Rasheed

On Thu, 18 Apr 2024 at 13:00, Aleksander Alekseev
 wrote:
>
> Fair point. PFA the alternative version of the patch.
>

Thanks. Committed.

Regards,
Dean

pgsql: Use macro NUM_MERGE_MATCH_KINDS instead of '3' in MERGE code.

2024-04-19 Thread Dean Rasheed

Use macro NUM_MERGE_MATCH_KINDS instead of '3' in MERGE code.

Code quality improvement for 0294df2f1f84.

Aleksander Alekseev, reviewed by Richard Guo.

Discussion: 
https://postgr.es/m/CAJ7c6TMsiaV5urU_Pq6zJ2tXPDwk69-NKVh4AMN5XrRiM7N%2BGA%40mail.gmail.com

Branch
--
master

Details
---
https://git.postgresql.org/pg/commitdiff/2e068db56e31dfb510fe7416e52b7affe26f278f

Modified Files
--
src/backend/optimizer/prep/prepjointree.c | 2 +-
src/backend/parser/parse_merge.c  | 2 +-
src/include/nodes/execnodes.h | 2 +-
src/include/nodes/primnodes.h | 2 ++
4 files changed, 5 insertions(+), 3 deletions(-)

Re: [PATCH] Replace magic constant 3 with NUM_MERGE_MATCH_KINDS

2024-04-16 Thread Dean Rasheed

On Tue, 16 Apr 2024 at 11:35, Richard Guo  wrote:
>
> On Tue, Apr 16, 2024 at 5:48 PM Aleksander Alekseev 
>  wrote:
>>
>> Oversight of 0294df2f1f84 [1].
>>
>> [1]: 
>> https://git.postgresql.org/gitweb/?p=postgresql.git;a=commit;h=0294df2f1f84
>
> +1.  I think this change improves the code quality.  I searched for
> other arrays indexed by merge match kind, but found none.  So this patch
> seems thorough.
>

Yes this makes sense, though I note that some other similar code uses
a #define rather than inserting an enum element at the end (e.g.,
NUM_ROWFILTER_PUBACTIONS).

I guess the argument against inserting an enum element at the end is
that a switch statement on the enum value might generate a compiler
warning if it didn't have a default clause.

Looking at how NUM_ROWFILTER_PUBACTIONS is defined as the last element
plus one, it might seem to be barely any better than just defining it
to be 3, since any new enum element would probably be added at the
end, requiring it to be updated in any case. But if the number of
elements were much larger, it would be much more obviously correct,
making it a good general pattern to follow. So in the interests of
code consistency, I think we should do the same here.

Regards,
Dean

pgsql: Add support for MERGE ... WHEN NOT MATCHED BY SOURCE.

2024-03-30 Thread Dean Rasheed

Add support for MERGE ... WHEN NOT MATCHED BY SOURCE.

This allows MERGE commands to include WHEN NOT MATCHED BY SOURCE
actions, which operate on rows that exist in the target relation, but
not in the data source. These actions can execute UPDATE, DELETE, or
DO NOTHING sub-commands.

This is in contrast to already-supported WHEN NOT MATCHED actions,
which operate on rows that exist in the data source, but not in the
target relation. To make this distinction clearer, such actions may
now be written as WHEN NOT MATCHED BY TARGET.

Writing WHEN NOT MATCHED without specifying BY SOURCE or BY TARGET is
equivalent to writing WHEN NOT MATCHED BY TARGET.

Dean Rasheed, reviewed by Alvaro Herrera, Ted Yu and Vik Fearing.

Discussion: 
https://postgr.es/m/CAEZATCWqnKGc57Y_JanUBHQXNKcXd7r=0r4nezuvwp+syrk...@mail.gmail.com

Branch
--
master

Details
---
https://git.postgresql.org/pg/commitdiff/0294df2f1f842dfb0eed79007b21016f486a3c6c

Modified Files
--
doc/src/sgml/mvcc.sgml|  12 +-
doc/src/sgml/ref/merge.sgml   | 100 +--
src/backend/executor/execMain.c   |   6 +-
src/backend/executor/execPartition.c  |  21 +-
src/backend/executor/nodeModifyTable.c| 387 --
src/backend/nodes/nodeFuncs.c |   3 +
src/backend/optimizer/plan/createplan.c   |   8 +-
src/backend/optimizer/plan/planner.c  |  22 ++
src/backend/optimizer/plan/setrefs.c  |  20 +-
src/backend/optimizer/prep/prepjointree.c |  51 +++-
src/backend/optimizer/prep/preptlist.c|  26 +-
src/backend/optimizer/util/pathnode.c |   5 +-
src/backend/parser/gram.y |  62 +++--
src/backend/parser/parse_merge.c  |  56 ++--
src/backend/utils/adt/ruleutils.c |  41 ++-
src/bin/psql/tab-complete.c   |  30 +-
src/include/catalog/catversion.h  |   2 +-
src/include/nodes/execnodes.h |  15 +-
src/include/nodes/parsenodes.h|   7 +-
src/include/nodes/pathnodes.h |   2 +
src/include/nodes/plannodes.h |   2 +
src/include/nodes/primnodes.h |  10 +-
src/include/optimizer/pathnode.h  |   3 +-
src/include/parser/kwlist.h   |   2 +
src/test/isolation/expected/merge-update.out  |  88 --
src/test/isolation/specs/merge-update.spec|  10 +-
src/test/regress/expected/merge.out   | 299 +++-
src/test/regress/expected/rules.out   |  32 +++
src/test/regress/expected/updatable_views.out |  90 ++
src/test/regress/sql/merge.sql| 122 ++--
src/test/regress/sql/rules.sql|  19 ++
src/test/regress/sql/updatable_views.sql  |  32 +++
src/tools/pgindent/typedefs.list  |   1 +
33 files changed, 1226 insertions(+), 360 deletions(-)

Re: Adding OLD/NEW support to RETURNING

2024-03-27 Thread Dean Rasheed

On Wed, 27 Mar 2024 at 07:47, Jeff Davis  wrote:
>
> This isn't a complete review, but I spent a while looking at this, and
> it looks like it's in good shape.

Thanks for looking.

> I like the syntax, and I think the solution for renaming the alias
> ("RETURNING WITH (new as n, old as o)") is a good one.

Thanks, that's good to know. Settling on a good syntax can be
difficult, so it's good to know that people are generally supportive
of this.

> The implementation touches quite a few areas. How did you identify all
> of the potential problem areas?

Hmm, well that's one of the hardest parts, and it's really difficult
to be sure that I have.

Initially, when I was just adding a new field to Var, I just tried to
look at all the existing code that made Vars, or copied other
non-default fields like varnullingrels around. I still managed to miss
the necessary change in assign_param_for_var() on my first attempt,
but fortunately that was an easy fix.

More worrying was the fact that I managed to completely overlook the
fact that I needed to worry about non-updatable columns in
auto-updatable views until v6, which added the ReturningExpr node.
Once I realised that I needed that, and that it needed to be tied to a
particular query level, and so needed a "levelsup" field, I just
looked at GroupingFunc to identify the places in code that needed to
be updated to do the right thing for a query-level-aware node.

What I'm most worried about now is that there are other areas of
functionality like that, that I'm overlooking, and which will interact
with this feature in non-trivial ways.

> It seems the primary sources of
> complexity came from rules, partitioning, and updatable views, is that
> right?

Foreign tables looked like it would be tricky at first, but then
turned out to be trivial, after disallowing direct-modify when
returning old/new.

Rules are a whole area that I wish I didn't have to worry about (I
wish we had deprecated them a long time ago). In practice though, I
haven't done much beyond what seemed like the most obvious (and
simplest) thing.

Nonetheless, there are some interesting interactions that probably
need more careful examination. For example, the fact that the
RETURNING clause in a RULE already has its own "special table names"
OLD and NEW, which are actually references to different RTEs, unlike
the OLD and NEW that this patch introduces, which are references to
the result relation. This leads to a number of different cases:

Case 1
==

In the simplest case, the rule can simply contain "RETURNING *". This
leads to what I think is the most obvious and intuitive behaviour:

DROP TABLE IF EXISTS t1, t2 CASCADE;
CREATE TABLE t1 (val1 text);
INSERT INTO t1 VALUES ('Old value 1');
CREATE TABLE t2 (val2 text);
INSERT INTO t2 VALUES ('Old value 2');

CREATE RULE r2 AS ON UPDATE TO t2
  DO INSTEAD UPDATE t1 SET val1 = NEW.val2
  RETURNING *;

UPDATE t2 SET val2 = 'New value 2'
  RETURNING old.val2 AS old_val2, new.val2 AS new_val2,
t2.val2 AS t2_val2, val2;

  old_val2   |  new_val2   |   t2_val2   |val2
-+-+-+-
 Old value 1 | New value 2 | New value 2 | New value 2
(1 row)

So someone using the table with the rule can access old and new values
in the obvious way, and they will get new values by default for an
UPDATE.

The query plan for this is pretty-much what you'd expect:

  QUERY PLAN
---
 Update on public.t1
   Output: old.val1, new.val1, t1.val1, t1.val1
   ->  Nested Loop
 Output: 'New value 2'::text, t1.ctid, t2.ctid
 ->  Seq Scan on public.t1
   Output: t1.ctid
 ->  Materialize
   Output: t2.ctid
   ->  Seq Scan on public.t2
 Output: t2.ctid

Case 2
==

If the rule contains "RETURNING OLD.*", it means that the RETURNING
list of the rewritten query contains Vars that no longer refer to the
result relation, but instead refer to the old data in t2. This leads
the the following behaviour:

DROP TABLE IF EXISTS t1, t2 CASCADE;
CREATE TABLE t1 (val1 text);
INSERT INTO t1 VALUES ('Old value 1');
CREATE TABLE t2 (val2 text);
INSERT INTO t2 VALUES ('Old value 2');

CREATE RULE r2 AS ON UPDATE TO t2
  DO INSTEAD UPDATE t1 SET val1 = NEW.val2
  RETURNING OLD.*;

UPDATE t2 SET val2 = 'New value 2'
  RETURNING old.val2 AS old_val2, new.val2 AS new_val2,
t2.val2 AS t2_val2, val2;

  old_val2   |  new_val2   |   t2_val2   |val2
-+-+-+-
 Old value 2 | Old value 2 | Old value 2 | Old value 2
(1 row)

The reason this happens is that the Vars in the returning list don't
refer to the result relation, and so setting varreturningtype on them
has no effect, and is simply ignored. This can be seen by looking at
the query plan:

   QUERY PLAN

Re: Functions to return random numbers in a given range

2024-03-27 Thread Dean Rasheed

On Tue, 26 Mar 2024 at 06:57, Dean Rasheed  wrote:
>
> Based on the reviews so far, I think this is ready for commit, so
> unless anyone objects, I will do so in a day or so.
>

Committed. Thanks for the reviews.

Regards,
Dean

pgsql: Add functions to generate random numbers in a specified range.

2024-03-27 Thread Dean Rasheed

Add functions to generate random numbers in a specified range.

This adds 3 new variants of the random() function:

random(min integer, max integer) returns integer
random(min bigint, max bigint) returns bigint
random(min numeric, max numeric) returns numeric

Each returns a random number x in the range min <= x <= max.

For the numeric function, the number of digits after the decimal point
is equal to the number of digits that "min" or "max" has after the
decimal point, whichever has more.

The main entry points for these functions are in a new C source file.
The existing random(), random_normal(), and setseed() functions are
moved there too, so that they can all share the same PRNG state, which
is kept private to that file.

Dean Rasheed, reviewed by Jian He, David Zhang, Aleksander Alekseev,
and Tomas Vondra.

Discussion: 
https://postgr.es/m/caezatcv89vxuq93xqdmc0t-0y2zeenqtdsjbmv7dyfbpykb...@mail.gmail.com

Branch
--
master

Details
---
https://git.postgresql.org/pg/commitdiff/e6341323a8da64b18e9af3e75a4578230702d61c

Modified Files
--
doc/src/sgml/func.sgml|  43 +++-
src/backend/utils/adt/Makefile|   1 +
src/backend/utils/adt/float.c |  95 
src/backend/utils/adt/meson.build |   1 +
src/backend/utils/adt/numeric.c   | 219 ++
src/backend/utils/adt/pseudorandomfuncs.c | 185 +++
src/common/pg_prng.c  |  36 +++
src/include/catalog/catversion.h  |   2 +-
src/include/catalog/pg_proc.dat   |  12 +
src/include/common/pg_prng.h  |   1 +
src/include/utils/numeric.h   |   4 +
src/test/regress/expected/random.out  | 360 ++
src/test/regress/sql/random.sql   | 164 ++
13 files changed, 1022 insertions(+), 101 deletions(-)

Re: Catalog domain not-null constraints

2024-03-26 Thread Dean Rasheed

On Tue, 26 Mar 2024 at 07:30, Alvaro Herrera  wrote:
>
> On 2024-Mar-25, Dean Rasheed wrote:
>
> > Also (not this patch's fault), psql doesn't seem to offer a way to
> > display domain constraint names -- something you need to know to drop
> > or alter them. Perhaps \dD+ could be made to do that?
>
> Ooh, I remember we had offered a patch for \d++ to display these
> constraint names for tables, but didn't get around to gather consensus
> for it.  We did gather consensus on *not* wanting \d+ to display them,
> but we need *something*.  I suppose we should do something symmetrical
> for tables and domains.  How about \dD++ and \dt++?
>

Personally, I quite like the fact that \d+ displays NOT NULL
constraints, because it puts them on an equal footing with CHECK
constraints. However, I can appreciate that it will significantly
increase the length of the output in some cases.

With \dD it's not so nice because of the way it puts all the details
on one line. The obvious output might look something like this:

\dD
   List of domains
 Schema | Name |  Type   | Collation | Nullable | Default |  Check
+--+-+---+--+-+---
 public | d1   | integer |   | NOT NULL | | CHECK (VALUE > 0)

\dD+
List of domains
 Schema | Name |  Type   | Collation | Nullable
| Default |Check  | Access privileges
| Description
+--+-+---+-+-+---+---+-
 public | d1   | integer |   | CONSTRAINT d1_not_null NOT NULL
| | CONSTRAINT d1_check CHECK (VALUE > 0) |
|

So you'd need quite a wide window to easily view it (or use \x). I
suppose the width could be reduced by dropping the word "CONSTRAINT"
in the \dD+ case, but it's probably still going to be wider than the
average window.

Regards,
Dean

Re: MERGE ... WHEN NOT MATCHED BY SOURCE

2024-03-26 Thread Dean Rasheed

On Thu, 21 Mar 2024 at 09:35, Dean Rasheed  wrote:
>
> Trivial rebase forced by 6185c9737c.
>

I think it would be good to get this committed.

It has had a decent amount of review, at least up to v9, but a number
of things have changed since then:

1). Concurrent update behaviour -- now if a concurrent update causes a
matched candidate row to no longer match the join condition, it will
execute the first qualifying NOT MATCHED BY SOURCE action, and then
the first qualifying NOT MATCHED [BY TARGET] action. I.e., it may
execute 2 actions, which makes sense because if the rows had started
out not matching, the full join would have output 2 rows.

2). ResultRelInfo now has 3 lists of actions, one per match kind.
Previously I was putting the NOT MATCHED BY SOURCE actions in the same
list as the MATCHED actions, since they are both handled by
ExecMergeMatched(). However, to achieve (1) above, it turned out to be
easier to have 3 separate lists, and this makes some other code a
little neater.

3). I've added a new field Query.mergeJoinCondition so that
transformMergeStmt() no longer puts the join conditions in
qry->jointree->quals. That's necessary to make it work correctly on an
auto-updatable view which might have its own quals, but it also seems
neater anyway.

4). To distinguish the MATCHED case from NOT MATCHED BY SOURCE case in
the executor, it now uses the join condition (previously it added a
"source IS [NOT] NULL" clause to each merge action). This has the
advantage that it involves just one qual check per candidate row,
rather than one for each action. On the downside, it's checking the
join condition twice (since the source subplan's join node already
checked it), but I couldn't see an easy way round that. (It only does
this if there are both MATCHED and NOT MATCHED BY SOURCE actions, so
it's not making any existing queries worse.)

5). To support (4), I added new fields
ModifyTablePath.mergeJoinConditions, ModifyTable.mergeJoinConditions
and ResultRelInfo.ri_MergeJoinCondition, since the attribute numbers
in the join condition might vary by partition.

6). I got rid of Query.mergeSourceRelation, which is no longer needed,
because of (4). (And as before, it also gets rid of
Query.mergeUseOuterJoin, since the parser is no longer making the
decision about what kind of join to build.)

7). To support (1), I added a new field
ModifyTableState.mt_merge_pending_not_matched, because if it has to
execute 2 actions following a concurrent update, and there is a
RETURNING clause, it has to defer the second action until the next
call to ExecModifyTable().

8). I've added isolation tests to test (1).

9). I've added a lot more regression tests.

10). I've made a lot of comment changes in nodeModifyTable.c,
especially relating to the discussion around concurrent updates.

Overall, I feel like this is in pretty good shape, and it manages to
make a few code simplifications that look quite nice.

Regards,
Dean

Re: Functions to return random numbers in a given range

2024-03-26 Thread Dean Rasheed

On Tue, 27 Feb 2024 at 17:33, Dean Rasheed  wrote:
>
> On Sat, 24 Feb 2024 at 17:10, Tomas Vondra
> >
> > I did a quick review and a little bit of testing on the patch today. I
> > think it's a good/useful idea, and I think the code is ready to go (the
> > code is certainly much cleaner than anything I'd written ...).
>

Based on the reviews so far, I think this is ready for commit, so
unless anyone objects, I will do so in a day or so.

As a quick summary, this adds a new file:

src/backend/utils/adt/pseudorandomfuncs.c

which contains SQL-callable functions that access a single shared
pseudorandom number generator, whose state is private to that file.
Currently the functions are:

  random() returns double precision [moved from float.c]
  random(min integer, max integer) returns integer  [new]
  random(min bigint, max bigint) returns bigint [new]
  random(min numeric, max numeric) returns numeric  [new]
  random_normal() returns double precision  [moved from float.c]
  setseed(seed double precision) returns void   [moved from float.c]

It's possible that functions to return other random distributions or
other datatypes might get added in the future, but I have no plans to
do so at the moment.

Regards,
Dean

Re: Catalog domain not-null constraints

2024-03-25 Thread Dean Rasheed

On Fri, 22 Mar 2024 at 08:28, jian he  wrote:
>
> On Thu, Mar 21, 2024 at 7:23 PM Peter Eisentraut  wrote:
> >
> > Hmm.  CREATE DOMAIN uses column constraint syntax, but ALTER DOMAIN uses
> > table constraint syntax.  Attached is a patch to try to sort this out.
>
> also you should also change src/backend/utils/adt/ruleutils.c?
>
> src6=# \dD
>   List of domains
>  Schema |Name |  Type   | Collation | Nullable | Default |
>  Check
> +-+-+---+--+-+--
>  public | domain_test | integer |   | not null | |
> CHECK (VALUE > 0) NOT NULL VALUE
> (1 row)
>
> probably change to CHECK (VALUE IS NOT NULL)

I'd say it should just output "NOT NULL", since that's the input
syntax that created the constraint. But then again, why display NOT
NULL constraints in that column at all, when there's a separate
"Nullable" column?

Also (not this patch's fault), psql doesn't seem to offer a way to
display domain constraint names -- something you need to know to drop
or alter them. Perhaps \dD+ could be made to do that?

+   The syntax NOT NULL in this command is a
+   PostgreSQL extension.  (A standard-conforming
+   way to write the same would be CHECK (VALUE IS NOT
+   NULL).  However, per ,
+   such constraints a best avoided in practice anyway.)  The
+   NULL constraint is a
+   PostgreSQL extension (see also ).

I didn't verify this, but I thought that according to the SQL
standard, only non-NULL values should be passed to CHECK constraints,
so there is no standard-conforming way to write a NOT NULL domain
constraint.

FWIW, I think NOT NULL domain constraints are a useful feature to
have, and I suspect that there are more people out there who use them
and like them, than who care what the SQL standard says. If so, I'm in
favour of allowing them to be named and managed in the same way as NOT
NULL table constraints.

+   processCASbits($5, @5, "CHECK",
+  NULL, NULL, >skip_validation,
+  >is_no_inherit, yyscanner);
+   n->initially_valid = !n->skip_validation;

+   /* no NOT VALID support yet */
+   processCASbits($3, @3, "NOT NULL",
+  NULL, NULL, NULL,
+  >is_no_inherit, yyscanner);
+   n->initially_valid = true;

NO INHERIT is allowed for domain constraints? What does that even mean?

There's something very wonky about this:

CREATE DOMAIN d1 AS int CHECK (value > 0) NO INHERIT; -- Rejected
ERROR:  check constraints for domains cannot be marked NO INHERIT

CREATE DOMAIN d1 AS int;
ALTER DOMAIN d1 ADD CHECK (value > 0) NO INHERIT; -- Allowed

CREATE DOMAIN d2 AS int NOT NULL NO INHERIT; -- Now allowed (used to
syntax error)

CREATE DOMAIN d3 AS int;
ALTER DOMAIN d3 ADD NOT NULL NO INHERIT; -- Allowed

Presumably all of those should be rejected in the grammar.

Regards,
Dean

Re: Catalog domain not-null constraints

2024-03-20 Thread Dean Rasheed

On Wed, 20 Mar 2024 at 09:43, Peter Eisentraut  wrote:
>
> On 19.03.24 10:57, jian he wrote:
> > this new syntax need to be added into the alter_domain.sgml's synopsis and 
> > also
> > need an explanation varlistentry?
>
> The ALTER DOMAIN reference page refers to CREATE DOMAIN about the
> details of the constraint syntax.  I believe this is still accurate.  We
> could add more detail locally on the ALTER DOMAIN page, but that is not
> this patch's job.  For example, the details of CHECK constraints are
> also not shown on the ALTER DOMAIN page right now.
>

Hmm, for CHECK constraints, the ALTER DOMAIN syntax for adding a
constraint is the same as for CREATE DOMAIN, but that's not the case
for NOT NULL constraints. So, for example, these both work:

CREATE DOMAIN d AS int CONSTRAINT c1 CHECK (value > 0);

ALTER DOMAIN d ADD CONSTRAINT c2 CHECK (value < 10);

However, for NOT NULL constraints, the ALTER DOMAIN syntax differs
from the CREATE DOMAIN syntax, because it expects "NOT NULL" to be
followed by a column name. So the following CREATE DOMAIN syntax
works:

CREATE DOMAIN d AS int CONSTRAINT nn NOT NULL;

but the equivalent ALTER DOMAIN syntax doesn't work:

ALTER DOMAIN d ADD CONSTRAINT nn NOT NULL;

ERROR:  syntax error at or near ";"
LINE 1: ALTER DOMAIN d ADD CONSTRAINT nn NOT NULL;
 ^

All the examples in the tests append "value" to this, presumably by
analogy with CHECK constraints, but it looks as though anything works,
and is simply ignored:

ALTER DOMAIN d ADD CONSTRAINT nn NOT NULL xxx; -- works

That doesn't seem particularly satisfactory. I think it should not
require (and reject) a column name after "NOT NULL".

Looking in the SQL spec, it seems to only mention adding CHECK
constraints to domains, so the option to add NOT NULL constraints
should probably be listed in the "Compatibility" section.

Regards,
Dean

Re: Proposal to include --exclude-extension Flag in pg_dump

2024-03-20 Thread Dean Rasheed

On Tue, 19 Mar 2024 at 19:17, Ayush Vatsa  wrote:
>
> > I'm marking this ready-for-commit (which I'll probably do myself in a
> > day or two, unless anyone else claims it first).
>
> Thank you very much, this marks my first contribution to the open-source 
> community, and I'm enthusiastic about making further meaningful contributions 
> to PostgreSQL in the future.
>

Committed. Congratulations on your first contribution to PostgreSQL!
May it be the first of many to come.

Regards,
Dean

pgsql: Add "--exclude-extension" to pg_dump's options.

2024-03-20 Thread Dean Rasheed

Add "--exclude-extension" to pg_dump's options.

This option (or equivalently specifying "exclude extension pattern" in
a filter file) allows extensions matching the specified pattern to be
excluded from the dump.

Ayush Vatsa, reviewed by Junwang Zhao, Dean Rasheed, and Daniel
Gustafsson.

Discussion: 
https://postgr.es/m/CACX+KaP=VgVy9h-EUh598DTu+-fNr1jyEmpghC8rRp9s=w3...@mail.gmail.com

Branch
--
master

Details
---
https://git.postgresql.org/pg/commitdiff/522ed12f7c600243870b13d9ff59f8fd5af10978

Modified Files
--
doc/src/sgml/ref/pg_dump.sgml   | 34 +--
src/bin/pg_dump/pg_dump.c   | 33 ++-
src/test/modules/test_pg_dump/t/001_base.pl | 88 ++---
3 files changed, 139 insertions(+), 16 deletions(-)

Re: Improving EXPLAIN's display of SubPlan nodes

2024-03-19 Thread Dean Rasheed

On Tue, 19 Mar 2024 at 21:40, Tom Lane  wrote:
>
> I'm inclined to leave that alone.  The actual source sub-SELECT
> could only have had one column, so no real confusion is possible.
> Yeah, there's a resjunk grouping column visible in the plan as well,
> but those exist in many other queries, and we've not gotten questions
> about them.
>

Fair enough. I have no further comments.

Regards,
Dean

Re: Improving EXPLAIN's display of SubPlan nodes

2024-03-19 Thread Dean Rasheed

On Tue, 19 Mar 2024 at 16:42, Tom Lane  wrote:
>
> Here's a hopefully-final version that makes that adjustment and
> tweaks a couple of comments.
>

This looks very good to me.

One final case that could possibly be improved is this one from aggregates.out:

explain (verbose, costs off)
select array(select sum(x+y) s
from generate_series(1,3) y group by y order by s)
  from generate_series(1,3) x;
QUERY PLAN
---
 Function Scan on pg_catalog.generate_series x
   Output: ARRAY(SubPlan 1)
   Function Call: generate_series(1, 3)
   SubPlan 1
 ->  Sort
   Output: (sum((x.x + y.y))), y.y
   Sort Key: (sum((x.x + y.y)))
   ->  HashAggregate
 Output: sum((x.x + y.y)), y.y
 Group Key: y.y
 ->  Function Scan on pg_catalog.generate_series y
   Output: y.y
   Function Call: generate_series(1, 3)

ARRAY operates on a SELECT with a single targetlist item, but in this
case it looks like the subplan output has 2 columns, which might
confuse people.

I wonder if we should output "ARRAY((SubPlan 1).col1)" to make it
clearer. Since ARRAY_SUBLINK is a special case, which always collects
the first column's values, we could just always output "col1" for
ARRAY.

Regards,
Dean

Re: Proposal to include --exclude-extension Flag in pg_dump

2024-03-19 Thread Dean Rasheed

On Sat, 16 Mar 2024 at 17:36, Ayush Vatsa  wrote:
>
> Attached is the complete patch with all the required code changes.
> Looking forward to your review and feedback.
>

This looks good to me. I tested it and everything worked as expected.

I ran it through pgindent to fix some whitespace issues and added
another test for the filter option, based on the test case you added.

I'm marking this ready-for-commit (which I'll probably do myself in a
day or two, unless anyone else claims it first).

Regards,
Dean
From f757ebe748ab47d1e1ab40b343af2a43a9183287 Mon Sep 17 00:00:00 2001
From: Ayush Vatsa 
Date: Mon, 25 Dec 2023 14:46:05 +0530
Subject: [PATCH v3] Add support for --exclude-extension in pg_dump

When specified, extensions matching the given pattern are excluded
in dumps.
---
 doc/src/sgml/ref/pg_dump.sgml   | 34 ++--
 src/bin/pg_dump/pg_dump.c   | 33 +++-
 src/test/modules/test_pg_dump/t/001_base.pl | 88 ++---
 3 files changed, 139 insertions(+), 16 deletions(-)

diff --git a/doc/src/sgml/ref/pg_dump.sgml b/doc/src/sgml/ref/pg_dump.sgml
index 0caf56e0e0..8edf03a03d 100644
--- a/doc/src/sgml/ref/pg_dump.sgml
+++ b/doc/src/sgml/ref/pg_dump.sgml
@@ -256,6 +256,27 @@ PostgreSQL documentation
   
  
 
+ 
+  --exclude-extension=pattern
+  
+   
+Do not dump any extensions matching pattern.  The pattern is
+interpreted according to the same rules as for -e.
+--exclude-extension can be given more than once to exclude extensions
+matching any of several patterns.
+   
+
+   
+When both -e and --exclude-extension are given, the behavior
+is to dump just the extensions that match at least one -e
+switch but no --exclude-extension switches.  If --exclude-extension
+appears without -e, then extensions matching --exclude-extension are
+excluded from what is otherwise a normal dump.
+   
+  
+ 
+
  
   -E encoding
   --encoding=encoding
@@ -848,10 +869,11 @@ PostgreSQL documentation
 --exclude-table-and-children or
 -T for tables,
 -n/--schema for schemas,
---include-foreign-data for data on foreign servers and
+--include-foreign-data for data on foreign servers,
 --exclude-table-data,
---exclude-table-data-and-children for table data,
--e/--extension for extensions.
+--exclude-table-data-and-children for table data, and
+-e/--extension or
+--exclude-extension for extensions.
 To read from STDIN, use - as the
 filename.  The --filter option can be specified in
 conjunction with the above listed options for including or excluding
@@ -874,8 +896,7 @@ PostgreSQL documentation
  
   
extension: extensions, works like the
-   --extension option. This keyword can only be
-   used with the include keyword.
+   -e/--extension option.
   
  
  
@@ -1278,7 +1299,8 @@ PostgreSQL documentation


 This option has no effect
-on -N/--exclude-schema,
+on --exclude-extension,
+-N/--exclude-schema,
 -T/--exclude-table,
 or --exclude-table-data.  An exclude pattern failing
 to match any objects is not considered an error.
diff --git a/src/bin/pg_dump/pg_dump.c b/src/bin/pg_dump/pg_dump.c
index a5149ca823..3ab7c6676a 100644
--- a/src/bin/pg_dump/pg_dump.c
+++ b/src/bin/pg_dump/pg_dump.c
@@ -136,6 +136,9 @@ static SimpleOidList foreign_servers_include_oids = {NULL, NULL};
 static SimpleStringList extension_include_patterns = {NULL, NULL};
 static SimpleOidList extension_include_oids = {NULL, NULL};
 
+static SimpleStringList extension_exclude_patterns = {NULL, NULL};
+static SimpleOidList extension_exclude_oids = {NULL, NULL};
+
 static const CatalogId nilCatalogId = {0, 0};
 
 /* override for standard extra_float_digits setting */
@@ -437,6 +440,7 @@ main(int argc, char **argv)
 		{"exclude-table-data-and-children", required_argument, NULL, 14},
 		{"sync-method", required_argument, NULL, 15},
 		{"filter", required_argument, NULL, 16},
+		{"exclude-extension", required_argument, NULL, 17},
 
 		{NULL, 0, NULL, 0}
 	};
@@ -672,6 +676,11 @@ main(int argc, char **argv)
 read_dump_filters(optarg, );
 break;
 
+			case 17:			/* exclude extension(s) */
+simple_string_list_append(_exclude_patterns,
+		  optarg);
+break;
+
 			default:
 /* getopt_long already emitted a complaint */
 pg_log_error_hint("Try \"%s --help\" for more information.", progname);
@@ -890,6 +899,10 @@ main(int argc, char **argv)
 		if (extension_include_oids.head == NULL)
 			pg_fatal("no matching extensions were found");
 	}
+	expand_extension_name_patterns(fout, _exclude_patterns,
+   _exclude_oids,
+   false);
+	/* non-matching exclusion patterns aren't an error */
 
 	/*
 	 * Dumping LOs

Re: Improving EXPLAIN's display of SubPlan nodes

2024-03-18 Thread Dean Rasheed

On Mon, 18 Mar 2024 at 23:19, Tom Lane  wrote:
>
> > Hm.  I used "rescan(SubPlan)" in the attached, but it still looks
> > a bit odd to my eye.
>
> After thinking a bit more, I understood what was bothering me about
> that notation: it looks too much like a call of a user-defined
> function named "rescan()".  I think we'd be better off with the
> all-caps "RESCAN()".
>

Or perhaps move the parentheses, and write "(rescan SubPlan N)" or
"(reset SubPlan N)". Dunno.

Regards,
Dean

Re: Improving EXPLAIN's display of SubPlan nodes

2024-03-18 Thread Dean Rasheed

On Sun, 17 Mar 2024 at 19:39, Tom Lane  wrote:
>
> Here's a cleaned-up version that seems to successfully resolve
> PARAM_EXEC references in all cases.  I haven't changed the
> "(plan_name).colN" notation, but that's an easy fix once we have
> consensus on the spelling.

I took a more detailed look at this and the code and doc changes all
look good to me.

There's a comment at the end of find_param_generator() that should
probably say "No generator found", rather than "No referent found".

The get_rule_expr() code could perhaps be simplified a bit, getting
rid of the show_subplan_name variable and moving the recursive calls
to get_rule_expr() to after the switch statement -- if testexpr is
non-NULL, print it, else print the subplan name probably works for all
subplan types.

The "colN" notation has grown on me, especially when you look at
examples like those in partition_prune.out with a mix of Param types.
Not using "$n" for 2 different purposes is good, and I much prefer
this to the original "FROM [HASHED] SubPlan N (returns ...)" notation.

> There are two other loose ends bothering me:
>
> 1.  I see that Gather nodes sometimes print things like
>
>->  Gather (actual rows=N loops=N)
>  Workers Planned: 2
>  Params Evaluated: $0, $1
>  Workers Launched: N
>
> I propose we nuke the "Params Evaluated" output altogether.

+1

> 2.  MULTIEXPR_SUBLINK subplans now result in EXPLAIN output like
>
>->  Result
>  Output: (SubPlan 1).col1, (SubPlan 1).col2, (SubPlan 1), i.tableoid, 
> i.ctid
>
> The undecorated reference to (SubPlan 1) is fairly confusing, since
> it doesn't correspond to anything that will actually get output.
> I suggest that perhaps instead this should read
>
>  Output: (SubPlan 1).col1, (SubPlan 1).col2, IGNORE(SubPlan 1), 
> i.tableoid, i.ctid
>
> or
>
>  Output: (SubPlan 1).col1, (SubPlan 1).col2, RESET(SubPlan 1), 
> i.tableoid, i.ctid

I think "RESET()" or "RESCAN()" or something like that is better than
"INGORE()", because it indicates that it is actually doing something.
I don't really have a better idea. Perhaps not all uppercase though,
since that seems to go against the rest of the EXPLAIN output.

Regards,
Dean

Re: MERGE ... RETURNING

2024-03-18 Thread Dean Rasheed

On Fri, 15 Mar 2024 at 17:14, Jeff Davis  wrote:
>
> On Fri, 2024-03-15 at 11:20 +, Dean Rasheed wrote:
>
> > So barring any further objections, I'd like to go ahead and get this
> > patch committed.
>
> I like this feature from a user perspective. So +1 from me.
>

I have committed this. Thanks for all the feedback everyone.

Regards,
Dean

pgsql: Fix PDF doc generation.

2024-03-17 Thread Dean Rasheed

Fix PDF doc generation.

Commit c649fa24a4 broke PDF generation, due to a misplaced id
attribute.

Per buildfarm member crake.

Branch
--
master

Details
---
https://git.postgresql.org/pg/commitdiff/7eb9a8201890f3b208fd4c109a5b08bf139b692a

Modified Files
--
doc/src/sgml/func.sgml  |  4 ++--
doc/src/sgml/ref/merge.sgml | 17 +
2 files changed, 11 insertions(+), 10 deletions(-)

pgsql: Add RETURNING support to MERGE.

2024-03-17 Thread Dean Rasheed

Add RETURNING support to MERGE.

This allows a RETURNING clause to be appended to a MERGE query, to
return values based on each row inserted, updated, or deleted. As with
plain INSERT, UPDATE, and DELETE commands, the returned values are
based on the new contents of the target table for INSERT and UPDATE
actions, and on its old contents for DELETE actions. Values from the
source relation may also be returned.

As with INSERT/UPDATE/DELETE, the output of MERGE ... RETURNING may be
used as the source relation for other operations such as WITH queries
and COPY commands.

Additionally, a special function merge_action() is provided, which
returns 'INSERT', 'UPDATE', or 'DELETE', depending on the action
executed for each row. The merge_action() function can be used
anywhere in the RETURNING list, including in arbitrary expressions and
subqueries, but it is an error to use it anywhere outside of a MERGE
query's RETURNING list.

Dean Rasheed, reviewed by Isaac Morland, Vik Fearing, Alvaro Herrera,
Gurjeet Singh, Jian He, Jeff Davis, Merlin Moncure, Peter Eisentraut,
and Wolfgang Walther.

Discussion: 
http://postgr.es/m/CAEZATCWePEGQR5LBn-vD6SfeLZafzEm2Qy_L_Oky2=qw2w3...@mail.gmail.com

Branch
--
master

Details
---
https://git.postgresql.org/pg/commitdiff/c649fa24a42ba89bf5460c7110e4fc8eeca65959

Modified Files
--
doc/src/sgml/dml.sgml |  22 ++-
doc/src/sgml/func.sgml|  79 
doc/src/sgml/glossary.sgml|   6 +-
doc/src/sgml/plpgsql.sgml |  16 +-
doc/src/sgml/queries.sgml |   9 +-
doc/src/sgml/ref/copy.sgml|  19 +-
doc/src/sgml/ref/merge.sgml   |  68 +--
doc/src/sgml/ref/select.sgml  |  11 +-
doc/src/sgml/rowtypes.sgml|   2 +-
doc/src/sgml/spi.sgml |  14 +-
doc/src/sgml/xfunc.sgml   |   8 +-
src/backend/commands/copy.c   |   6 -
src/backend/commands/copyto.c |   3 +-
src/backend/executor/execExpr.c   |  13 ++
src/backend/executor/execExprInterp.c |  48 +
src/backend/executor/execPartition.c  |   8 +-
src/backend/executor/functions.c  |   9 +-
src/backend/executor/nodeModifyTable.c| 202 +--
src/backend/executor/spi.c|   7 +-
src/backend/jit/llvm/llvmjit_expr.c   |   6 +
src/backend/jit/llvm/llvmjit_types.c  |   1 +
src/backend/nodes/nodeFuncs.c |  17 ++
src/backend/optimizer/plan/subselect.c|   9 +-
src/backend/optimizer/util/paramassign.c  |  51 +
src/backend/parser/analyze.c  |  19 +-
src/backend/parser/gram.y |  14 +-
src/backend/parser/parse_agg.c|   2 +
src/backend/parser/parse_cte.c|  10 +-
src/backend/parser/parse_expr.c   |  34 
src/backend/parser/parse_func.c   |   1 +
src/backend/parser/parse_merge.c  |   7 +-
src/backend/parser/parse_relation.c   |   7 +-
src/backend/parser/parse_target.c |   4 +
src/backend/rewrite/rewriteHandler.c  |   9 +-
src/backend/rewrite/rowsecurity.c |  28 ++-
src/backend/tcop/utility.c|   3 +-
src/backend/utils/adt/ruleutils.c |  14 +-
src/bin/psql/common.c |   8 +-
src/include/catalog/catversion.h  |   2 +-
src/include/executor/execExpr.h   |   3 +
src/include/executor/spi.h|   1 +
src/include/nodes/execnodes.h |   3 +
src/include/nodes/parsenodes.h|   1 +
src/include/nodes/primnodes.h |  21 ++
src/include/optimizer/paramassign.h   |   2 +
src/include/parser/analyze.h  |   2 +
src/include/parser/kwlist.h   |   1 +
src/include/parser/parse_node.h   |   3 +-
src/pl/plpgsql/src/pl_exec.c  |  12 +-
src/pl/tcl/pltcl.c|   1 +
src/test/regress/expected/merge.out   | 266 --
src/test/regress/expected/rowsecurity.out |  32 +++-
src/test/regress/expected/rules.out   |  16 +-
src/test/regress/expected/updatable_views.out |  30 ++-
src/test/regress/expected/with.out|  10 +
src/test/regress/sql/merge.sql| 169 ++--
src/test/regress/sql/rowsecurity.sql  |  21 ++
src/test/regress/sql/rules.sql|   6 +-
src/test/regress/sql/updatable_views.sql  |   9 +-
src/test/regress/sql/with.sql |   8 +
src/tools/pgindent/typedefs.list  |   1 +
61 files changed, 1198 insertions(+), 216 deletions(-)

pgsql: Fix EXPLAIN output for subplans in MERGE.

2024-03-17 Thread Dean Rasheed

Fix EXPLAIN output for subplans in MERGE.

Given a subplan in a MERGE query, EXPLAIN would sometimes fail to
properly display expressions involving Params referencing variables in
other parts of the plan tree.

This would affect subplans outside the topmost join plan node, for
which expansion of Params would go via the top-level ModifyTable plan
node.  The problem was that "inner_tlist" for the ModifyTable node's
deparse_namespace was set to the join node's targetlist, but
"inner_plan" was set to the ModifyTable node itself, rather than the
join node, leading to incorrect results when descending to the
referenced variable.

Fix and backpatch to v15, where MERGE was introduced.

Discussion: 
https://postgr.es/m/CAEZATCWAv-sZuH%2BwG5xJ-%2BGt7qGNGX8wUQd3XYydMFDKgRB9nw%40mail.gmail.com

Branch
--
master

Details
---
https://git.postgresql.org/pg/commitdiff/33e729c5148c3a697abc552621b34bdc5fd497ed

Modified Files
--
src/backend/utils/adt/ruleutils.c   | 21 +---
src/test/regress/expected/merge.out | 50 +
src/test/regress/sql/merge.sql  | 17 +
3 files changed, 79 insertions(+), 9 deletions(-)

pgsql: Fix EXPLAIN output for subplans in MERGE.

2024-03-17 Thread Dean Rasheed

Fix EXPLAIN output for subplans in MERGE.

Given a subplan in a MERGE query, EXPLAIN would sometimes fail to
properly display expressions involving Params referencing variables in
other parts of the plan tree.

This would affect subplans outside the topmost join plan node, for
which expansion of Params would go via the top-level ModifyTable plan
node.  The problem was that "inner_tlist" for the ModifyTable node's
deparse_namespace was set to the join node's targetlist, but
"inner_plan" was set to the ModifyTable node itself, rather than the
join node, leading to incorrect results when descending to the
referenced variable.

Fix and backpatch to v15, where MERGE was introduced.

Discussion: 
https://postgr.es/m/CAEZATCWAv-sZuH%2BwG5xJ-%2BGt7qGNGX8wUQd3XYydMFDKgRB9nw%40mail.gmail.com

Branch
--
REL_16_STABLE

Details
---
https://git.postgresql.org/pg/commitdiff/34c854b93fbec6d7b6dce4ee48c5b7459364a124

Modified Files
--
src/backend/utils/adt/ruleutils.c   | 21 +---
src/test/regress/expected/merge.out | 50 +
src/test/regress/sql/merge.sql  | 17 +
3 files changed, 79 insertions(+), 9 deletions(-)

pgsql: Fix EXPLAIN output for subplans in MERGE.

2024-03-17 Thread Dean Rasheed

Fix EXPLAIN output for subplans in MERGE.

Given a subplan in a MERGE query, EXPLAIN would sometimes fail to
properly display expressions involving Params referencing variables in
other parts of the plan tree.

This would affect subplans outside the topmost join plan node, for
which expansion of Params would go via the top-level ModifyTable plan
node.  The problem was that "inner_tlist" for the ModifyTable node's
deparse_namespace was set to the join node's targetlist, but
"inner_plan" was set to the ModifyTable node itself, rather than the
join node, leading to incorrect results when descending to the
referenced variable.

Fix and backpatch to v15, where MERGE was introduced.

Discussion: 
https://postgr.es/m/CAEZATCWAv-sZuH%2BwG5xJ-%2BGt7qGNGX8wUQd3XYydMFDKgRB9nw%40mail.gmail.com

Branch
--
REL_15_STABLE

Details
---
https://git.postgresql.org/pg/commitdiff/89ee14a2f22b2353a6cf9d193c54504b4d3131f5

Modified Files
--
src/backend/utils/adt/ruleutils.c   | 21 +---
src/test/regress/expected/merge.out | 50 +
src/test/regress/sql/merge.sql  | 17 +
3 files changed, 79 insertions(+), 9 deletions(-)

Re: Improving EXPLAIN's display of SubPlan nodes

2024-03-17 Thread Dean Rasheed

On Sat, 16 Mar 2024 at 17:25, Tom Lane  wrote:
>
> After looking at your draft some more, it occurred to me that we're
> not that far from getting rid of showing "$n" entirely in this
> context.  Instead of (subplan_name).$n, we could write something like
> (subplan_name).colN or (subplan_name).columnN or (subplan_name).fN,
> depending on your taste for verboseness.  "fN" would correspond to the
> names we assign to columns of anonymous record types, but it hasn't
> got much else to recommend it.  In the attached I used "colN";
> "columnN" would be my next choice.

Using the column number rather than the parameter index looks a lot
neater, especially in output with multiple subplans. Of those choices,
"colN" looks nicest, however...

I think it would be confusing if there were tables whose columns are
named "colN". In that case, a SQL qual like "t1.col2 = t2.col2" might
be output as something like "t1.col2 = (SubPlan 1).col3", since the
subplan's targetlist wouldn't necessarily just output the table
columns in order.

I actually think "$n" is not so bad (especially if we make n the
column number). The fact that it's qualified by the subplan name ought
to be sufficient to avoid it being confused with an external
parameter. Maybe there are other options, but I think it's important
to choose something that's unlikely to be confused with a real column
name.

Whatever name is chosen, I think we should still output "(returns
...)" on the subplan nodes. In a complex query there might be a lot of
output columns, and the expressions might be quite complex, making it
hard to see how many columns the subplan is returning. Besides,
without that, it might not be obvious to everyone what "colN" (or
whatever we settle on) means in places that refer to the subplan.

> You could also imagine trying to use the sub-SELECT's actual output
> column names, but I fear that would be ambiguous --- too often it'd
> be "?column?" or some other non-unique name.

Yeah, I think that's a non-starter, because the output column names
aren't necessarily unique.

> The attached proof-of-concept is incomplete: it fails to replace some
> $n occurrences with subplan references, as is visible in some of the
> test cases.  I believe your quick hack in get_parameter() is not
> correct in detail, but for the moment I didn't bother to debug it.

Yeah, that's exactly what it was, a quick hack. I just wanted to get
some output to see what it would look like in a few real cases.

Overall, I think this is heading in the right direction. I think we
just need a good way to say "the n'th output column of the subplan",
that can't be confused with anything else in the output.

Regards,
Dean

Re: MERGE ... RETURNING

2024-03-15 Thread Dean Rasheed

On Fri, 15 Mar 2024 at 11:06, Dean Rasheed  wrote:
>
> Updated patch attached.
>

I have gone over this patch again in detail, and I believe that the
code is in good shape. All review comments have been addressed, and
the only thing remaining is the syntax question.

To recap, this adds support for a single RETURNING list at the end of
a MERGE command, and a special MERGE_ACTION() function that may be
used in the RETURNING list to return the action command string
('INSERT', 'UPDATE', or 'DELETE') that was executed.

Looking for similar precedents in other databases, SQL Server uses a
slightly different (non-standard) syntax for MERGE, and uses "OUTPUT"
instead of "RETURNING" to return rows. But it does allow "$action" in
the output list, which is functionally equivalent to MERGE_ACTION():

https://learn.microsoft.com/en-us/sql/t-sql/statements/merge-transact-sql?view=sql-server-ver16#output_clause

In the future, we may choose to support the SQL standard syntax for
returning rows modified by INSERT, UPDATE, DELETE, and MERGE commands,
but I don't think that this patch needs to do that.

What this patch does is to make MERGE more consistent with INSERT,
UPDATE, and DELETE, by allowing RETURNING. And if the patch to add
support for returning OLD/NEW values [1] makes it in too, it will be
more powerful than the SQL standard syntax, since it will allow both
old and new values to be returned at the same time, in arbitrary
expressions.

So barring any further objections, I'd like to go ahead and get this
patch committed.

Regards,
Dean

[1] 
https://www.postgresql.org/message-id/flat/CAEZATCWx0J0-v=Qjc6gXzR=KtsdvAE7Ow=D=mu50agoe+pv...@mail.gmail.com

Re: MERGE ... WHEN NOT MATCHED BY SOURCE

2024-03-13 Thread Dean Rasheed

Rebased version attached.

Regards,
Dean
diff --git a/doc/src/sgml/mvcc.sgml b/doc/src/sgml/mvcc.sgml
new file mode 100644
index f8f83d4..380d0c9
--- a/doc/src/sgml/mvcc.sgml
+++ b/doc/src/sgml/mvcc.sgml
@@ -394,10 +394,14 @@
 conditions for each action are re-evaluated on the updated version of
 the row, starting from the first action, even if the action that had
 originally matched appears later in the list of actions.
-On the other hand, if the row is concurrently updated or deleted so
-that the join condition fails, then MERGE will
-evaluate the condition's NOT MATCHED actions next,
-and execute the first one that succeeds.
+On the other hand, if the row is concurrently updated so that the join
+condition fails, then MERGE will evaluate the
+command's NOT MATCHED BY SOURCE and
+NOT MATCHED [BY TARGET] actions next, and execute
+the first one of each kind that succeeds.
+If the row is concurrently deleted, then MERGE
+will evaluate the command's NOT MATCHED [BY TARGET]
+actions, and execute the first one that succeeds.
 If MERGE attempts an INSERT
 and a unique index is present and a duplicate row is concurrently
 inserted, then a uniqueness violation error is raised;
diff --git a/doc/src/sgml/ref/merge.sgml b/doc/src/sgml/ref/merge.sgml
new file mode 100644
index e745fbd..6845f14
--- a/doc/src/sgml/ref/merge.sgml
+++ b/doc/src/sgml/ref/merge.sgml
@@ -33,7 +33,8 @@ USING dat
 and when_clause is:
 
 { WHEN MATCHED [ AND condition ] THEN { merge_update | merge_delete | DO NOTHING } |
-  WHEN NOT MATCHED [ AND condition ] THEN { merge_insert | DO NOTHING } }
+  WHEN NOT MATCHED BY SOURCE [ AND condition ] THEN { merge_update | merge_delete | DO NOTHING } |
+  WHEN NOT MATCHED [BY TARGET] [ AND condition ] THEN { merge_insert | DO NOTHING } }
 
 and merge_insert is:
 
@@ -72,7 +73,9 @@ DELETE
from data_source to
the target table
producing zero or more candidate change rows.  For each candidate change
-   row, the status of MATCHED or NOT MATCHED
+   row, the status of MATCHED,
+   NOT MATCHED BY SOURCE,
+   or NOT MATCHED [BY TARGET]
is set just once, after which WHEN clauses are evaluated
in the order specified.  For each candidate change row, the first clause to
evaluate as true is executed.  No more than one WHEN
@@ -254,18 +257,40 @@ DELETE
   At least one WHEN clause is required.
  
  
+  The WHEN clause may specify WHEN MATCHED,
+  WHEN NOT MATCHED BY SOURCE, or
+  WHEN NOT MATCHED [BY TARGET].
+  Note that the SQL standard only defines
+  WHEN MATCHED and WHEN NOT MATCHED
+  (which is defined to mean no matching target row).
+  WHEN NOT MATCHED BY SOURCE is an extension to the
+  SQL standard, as is the option to append
+  BY TARGET to WHEN NOT MATCHED, to
+  make its meaning more explicit.
+ 
+ 
   If the WHEN clause specifies WHEN MATCHED
   and the candidate change row matches a row in the
-  target table,
-  the WHEN clause is executed if the
+  data_source to a row in the
+  target table, the WHEN clause is executed if the
   condition is
   absent or it evaluates to true.
  
  
-  Conversely, if the WHEN clause specifies
-  WHEN NOT MATCHED
-  and the candidate change row does not match a row in the
-  target table,
+  If the WHEN clause specifies
+  WHEN NOT MATCHED BY SOURCE and the candidate change
+  row represents a row in the target table that does not match a row in the
+  data_source, the
+  WHEN clause is executed if the
+  condition is
+  absent or it evaluates to true.
+ 
+ 
+  If the WHEN clause specifies
+  WHEN NOT MATCHED [BY TARGET] and the candidate change
+  row represents a row in the
+  data_source that does not
+  match a row in the target table,
   the WHEN clause is executed if the
   condition is
   absent or it evaluates to true.
@@ -285,7 +310,10 @@ DELETE
  
   A condition on a WHEN MATCHED clause can refer to columns
   in both the source and the target relations. A condition on a
-  WHEN NOT MATCHED clause can only refer to columns from
+  WHEN NOT MATCHED BY SOURCE clause can only refer to
+  columns from the target relation, since by definition there is no matching
+  source row. A condition on a WHEN NOT MATCHED [BY TARGET]
+  clause can only refer to columns from
   the source relation, since by definition there is no matching target row.
   Only the system attributes from the target table are accessible.
  
@@ -409,8 +437,10 @@ DELETE
   WHEN MATCHED clause, the expression can use values
   from the original row in the target table, and values from the
   data_source row.
-  If used in a WHEN NOT MATCHED clause, the
-  expression can use values from the
+  If used in a WHEN NOT MATCHED BY SOURCE clause, the
+

Re: MERGE ... RETURNING

2024-03-13 Thread Dean Rasheed

On Wed, 13 Mar 2024 at 08:58, Dean Rasheed  wrote:
>
> I think I'll go make those doc changes, and back-patch them
> separately, since they're not related to this patch.
>

OK, I've done that. Here is a rebased patch on top of that, with the
other changes you suggested.

Regards,
Dean
diff --git a/doc/src/sgml/dml.sgml b/doc/src/sgml/dml.sgml
new file mode 100644
index cbbc5e2..3d95bdb
--- a/doc/src/sgml/dml.sgml
+++ b/doc/src/sgml/dml.sgml
@@ -283,10 +283,15 @@ DELETE FROM products;
RETURNING
   
 
+  
+   MERGE
+   RETURNING
+  
+
   
Sometimes it is useful to obtain data from modified rows while they are
being manipulated.  The INSERT, UPDATE,
-   and DELETE commands all have an
+   DELETE, and MERGE commands all have an
optional RETURNING clause that supports this.  Use
of RETURNING avoids performing an extra database query to
collect the data, and is especially valuable when it would otherwise be
@@ -339,6 +344,21 @@ DELETE FROM products
 
   
 
+  
+   In a MERGE, the data available to RETURNING is
+   the content of the source row plus the content of the inserted, updated, or
+   deleted target row.  Since it is quite common for the source and target to
+   have many of the same columns, specifying RETURNING *
+   can lead to a lot of duplicated columns, so it is often more useful to
+   qualify it so as to return just the source or target row.  For example:
+
+MERGE INTO products p USING new_products n ON p.product_no = n.product_no
+  WHEN NOT MATCHED THEN INSERT VALUES (n.product_no, n.name, n.price)
+  WHEN MATCHED THEN UPDATE SET name = n.name, price = n.price
+  RETURNING p.*;
+
+  
+
   
If there are triggers () on the target table,
the data available to RETURNING is the row as modified by
diff --git a/doc/src/sgml/func.sgml b/doc/src/sgml/func.sgml
new file mode 100644
index 0bb7aeb..0e2b888
--- a/doc/src/sgml/func.sgml
+++ b/doc/src/sgml/func.sgml
@@ -22421,6 +22421,85 @@ SELECT count(*) FROM sometable;
 
  
 
+ 
+  Merge Support Functions
+
+  
+   MERGE
+   RETURNING
+  
+
+  
+   PostgreSQL includes one merge support function
+   that may be used in the RETURNING list of a
+command to identify the action taken for each
+   row.
+  
+
+  
+   Merge Support Functions
+
+   
+
+ 
+  
+   Function
+  
+  
+   Description
+  
+ 
+
+
+
+ 
+  
+   
+merge_action
+   
+   merge_action ( )
+   text
+  
+  
+   Returns the merge action command executed for the current row.  This
+   will be 'INSERT', 'UPDATE', or
+   'DELETE'.
+  
+ 
+
+   
+  
+
+  
+   Example:
+
+  
+
+  
+   Note that this function can only be used in the RETURNING
+   list of a MERGE command.  It is an error to use it in any
+   other part of a query.
+  
+
+ 
+
  
   Subquery Expressions
 
diff --git a/doc/src/sgml/glossary.sgml b/doc/src/sgml/glossary.sgml
new file mode 100644
index 8c2f114..a81c17a
--- a/doc/src/sgml/glossary.sgml
+++ b/doc/src/sgml/glossary.sgml
@@ -1442,9 +1442,9 @@
  to a client upon the
  completion of an SQL command, usually a
  SELECT but it can be an
- INSERT, UPDATE, or
- DELETE command if the RETURNING
- clause is specified.
+ INSERT, UPDATE,
+ DELETE, or MERGE command if the
+ RETURNING clause is specified.
 
 
  The fact that a result set is a relation means that a query can be used
diff --git a/doc/src/sgml/plpgsql.sgml b/doc/src/sgml/plpgsql.sgml
new file mode 100644
index c2b9c6a..406834e
--- a/doc/src/sgml/plpgsql.sgml
+++ b/doc/src/sgml/plpgsql.sgml
@@ -1043,8 +1043,8 @@ INSERT INTO mytable VALUES (1,'one'), (2
 
 
 
- If the command does return rows (for example SELECT,
- or INSERT/UPDATE/DELETE
+ If the command does return rows (for example SELECT, or
+ INSERT/UPDATE/DELETE/MERGE
  with RETURNING), there are two ways to proceed.
  When the command will return at most one row, or you only care about
  the first row of output, write the command as usual but add
@@ -1172,6 +1172,7 @@ SELECT select_expressionsexpressions INTO STRICT target;
 UPDATE ... RETURNING expressions INTO STRICT target;
 DELETE ... RETURNING expressions INTO STRICT target;
+MERGE ... RETURNING expressions INTO STRICT target;
 
 
  where target can be a record variable, a row
@@ -1182,8 +1183,8 @@ DELETE ... RETURNING expres
  INTO clause) just as described above,
  and the plan is cached in the same way.
  This works for SELECT,
- INSERT/UPDATE/DELETE with
- RETURNING, and certain utility commands
+ INSERT/UPDATE/DELETE/MERGE
+ with RETURNING, and certain utility commands
  that return row sets, such as EXPLAIN.
  Except for the INTO clause, the SQL command is the same
  as it would be written outside PL/pgSQL.
@@ -1259,7 +1260,7 @@ END;
 
 
 
- For INSERT/UPDATE/DELETE with
+ For INSERT/UPDATE/DELETE/MERGE with
  R

pgsql: doc: Improve a couple of places in the MERGE docs.

2024-03-13 Thread Dean Rasheed

doc: Improve a couple of places in the MERGE docs.

In the synopsis, make the syntax for merge_update consistent with the
syntax for a plain UPDATE command. It was missing the optional "ROW"
keyword that can be used in a multi-column assignment, and the option
to assign from a multi-column subquery, both of which have been
supported by MERGE since it was introduced.

In the parameters section for the with_query parameter, mention that
WITH RECURSIVE isn't supported, since this is different from plain
INSERT, UPDATE, and DELETE commands. While at it, move that entry to
the top of the list, for consistency with the other pages.

Back-patch to v15, where MERGE was introduced.

Discussion: 
https://postgr.es/m/CAEZATCWoQyWkMFfu7JXXQr8dA6%3DgxjhYzgpuBP2oz0QoJTxGWw%40mail.gmail.com

Branch
--
REL_15_STABLE

Details
---
https://git.postgresql.org/pg/commitdiff/a875743ff40226f4096140efc506bd909da2a21f

Modified Files
--
doc/src/sgml/ref/merge.sgml | 36 
1 file changed, 28 insertions(+), 8 deletions(-)

pgsql: doc: Improve a couple of places in the MERGE docs.

2024-03-13 Thread Dean Rasheed

doc: Improve a couple of places in the MERGE docs.

In the synopsis, make the syntax for merge_update consistent with the
syntax for a plain UPDATE command. It was missing the optional "ROW"
keyword that can be used in a multi-column assignment, and the option
to assign from a multi-column subquery, both of which have been
supported by MERGE since it was introduced.

In the parameters section for the with_query parameter, mention that
WITH RECURSIVE isn't supported, since this is different from plain
INSERT, UPDATE, and DELETE commands. While at it, move that entry to
the top of the list, for consistency with the other pages.

Back-patch to v15, where MERGE was introduced.

Discussion: 
https://postgr.es/m/CAEZATCWoQyWkMFfu7JXXQr8dA6%3DgxjhYzgpuBP2oz0QoJTxGWw%40mail.gmail.com

Branch
--
REL_16_STABLE

Details
---
https://git.postgresql.org/pg/commitdiff/d4c573d8e81ed28ee335e9cb2115098b99d38e71

Modified Files
--
doc/src/sgml/ref/merge.sgml | 36 
1 file changed, 28 insertions(+), 8 deletions(-)

pgsql: doc: Improve a couple of places in the MERGE docs.

2024-03-13 Thread Dean Rasheed

doc: Improve a couple of places in the MERGE docs.

In the synopsis, make the syntax for merge_update consistent with the
syntax for a plain UPDATE command. It was missing the optional "ROW"
keyword that can be used in a multi-column assignment, and the option
to assign from a multi-column subquery, both of which have been
supported by MERGE since it was introduced.

In the parameters section for the with_query parameter, mention that
WITH RECURSIVE isn't supported, since this is different from plain
INSERT, UPDATE, and DELETE commands. While at it, move that entry to
the top of the list, for consistency with the other pages.

Back-patch to v15, where MERGE was introduced.

Discussion: 
https://postgr.es/m/CAEZATCWoQyWkMFfu7JXXQr8dA6%3DgxjhYzgpuBP2oz0QoJTxGWw%40mail.gmail.com

Branch
--
master

Details
---
https://git.postgresql.org/pg/commitdiff/97d4262683acb586778af2b2a3721fa7cdc402f1

Modified Files
--
doc/src/sgml/ref/merge.sgml | 36 
1 file changed, 28 insertions(+), 8 deletions(-)

Re: MERGE ... RETURNING

2024-03-13 Thread Dean Rasheed

On Wed, 13 Mar 2024 at 06:44, jian he  wrote:
>
> 
> [ WITH with_query [, ...] ]
> MERGE INTO [ ONLY ] 
> here the "WITH" part should have "[ RECURSIVE ]"

Actually, no. MERGE doesn't support WITH RECURSIVE.

It's not entirely clear to me why though. I did a quick test, removing
that restriction in the parse analysis code, and it seemed to work
fine. Alvaro, do you remember why that restriction is there?

It's probably worth noting it in the docs, since it's different from
INSERT, UPDATE and DELETE. I think this would suffice:

with_query

  The WITH clause allows you to specify one or more
  subqueries that can be referenced by name in the MERGE
  query. See  and 
  for details.  Note that WITH RECURSIVE is not supported
  by MERGE.

And then maybe we can remove that restriction in HEAD, if there really
isn't any need for it anymore.

I also noticed that the "UPDATE SET ..." syntax in the synopsis is
missing a couple of options that are supported -- the optional "ROW"
keyword in the multi-column assignment syntax, and the syntax to
assign from a subquery that returns multiple columns. So this should
be updated to match update.sgml:

UPDATE SET { column_name
= { expression | DEFAULT
} |
 ( column_name [, ...] ) = [ ROW ] ( {
expression | DEFAULT } [,
...] ) |
 ( column_name [, ...] ) = ( sub-SELECT )
   } [, ...]

and then in the parameter section:

sub-SELECT

  A SELECT sub-query that produces as many output columns
  as are listed in the parenthesized column list preceding it.  The
  sub-query must yield no more than one row when executed.  If it
  yields one row, its column values are assigned to the target columns;
  if it yields no rows, NULL values are assigned to the target columns.
  The sub-query can refer to values from the original row in the
target table,
  and values from the data_source.

(basically copied verbatim from update.sgml)

I think I'll go make those doc changes, and back-patch them
separately, since they're not related to this patch.

Regards,
Dean

Broken EXPLAIN output for SubPlan in MERGE

2024-03-12 Thread Dean Rasheed

While playing around with EXPLAIN and SubPlans, I noticed that there's
a bug in how this is handled for MERGE. For example:

drop table if exists src, tgt, ref;
create table src (a int, b text);
create table tgt (a int, b text);
create table ref (a int);

explain (verbose, costs off)
merge into tgt t
  using (select (select r.a from ref r where r.a = s.a) a, b from src s) s
  on t.a = s.a
  when not matched then insert values (s.a, s.b);

QUERY PLAN
---
 Merge on public.tgt t
   ->  Merge Left Join
 Output: t.ctid, s.a, s.b, s.ctid
 Merge Cond: (((SubPlan 1)) = t.a)
 ->  Sort
   Output: s.a, s.b, s.ctid, ((SubPlan 1))
   Sort Key: ((SubPlan 1))
   ->  Seq Scan on public.src s
 Output: s.a, s.b, s.ctid, (SubPlan 1)
 SubPlan 1
   ->  Seq Scan on public.ref r
 Output: r.a
 Filter: (r.a = s.a)
 ->  Sort
   Output: t.ctid, t.a
   Sort Key: t.a
   ->  Seq Scan on public.tgt t
 Output: t.ctid, t.a
   SubPlan 2
 ->  Seq Scan on public.ref r_1
   Output: r_1.a
   Filter: (r_1.a = t.ctid)

The final filter condition "(r_1.a = t.ctid)" is incorrect, and should
be "(r_1.a = s.a)".

What's happening is that the right hand side of that filter expression
is an input Param node which get_parameter() tries to display by
calling find_param_referent() and then drilling down through the
ancestor node (the ModifyTable node) to try to find the real name of
the variable (s.a).

However, that isn't working properly for MERGE because the inner_plan
and inner_tlist of the corresponding deparse_namespace aren't set
correctly. Actually the inner_tlist is correct, but the inner_plan is
set to the ModifyTable node, whereas it needs to be the outer child
node -- in a MERGE, any references to the source relation will be
INNER_VAR references to the targetlist of the join node immediately
under the ModifyTable node.

So I think we want to do something like the attached.

Regards,
Dean
diff --git a/src/backend/utils/adt/ruleutils.c b/src/backend/utils/adt/ruleutils.c
new file mode 100644
index 2a1ee69..2231752
--- a/src/backend/utils/adt/ruleutils.c
+++ b/src/backend/utils/adt/ruleutils.c
@@ -4988,8 +4988,11 @@ set_deparse_plan(deparse_namespace *dpns
 	 * For a WorkTableScan, locate the parent RecursiveUnion plan node and use
 	 * that as INNER referent.
 	 *
-	 * For MERGE, make the inner tlist point to the merge source tlist, which
-	 * is same as the targetlist that the ModifyTable's source plan provides.
+	 * For MERGE, pretend the ModifyTable's source plan (its outer plan) is
+	 * INNER referent.  This is the join from the target relation to the data
+	 * source, and all INNER_VAR Vars in other parts of the query refer to its
+	 * targetlist.
+	 *
 	 * For ON CONFLICT .. UPDATE we just need the inner tlist to point to the
 	 * excluded expression's tlist. (Similar to the SubqueryScan we don't want
 	 * to reuse OUTER, it's used for RETURNING in some modify table cases,
@@ -5004,17 +5007,17 @@ set_deparse_plan(deparse_namespace *dpns
 		dpns->inner_plan = find_recursive_union(dpns,
 (WorkTableScan *) plan);
 	else if (IsA(plan, ModifyTable))
-		dpns->inner_plan = plan;
-	else
-		dpns->inner_plan = innerPlan(plan);
-
-	if (IsA(plan, ModifyTable))
 	{
 		if (((ModifyTable *) plan)->operation == CMD_MERGE)
-			dpns->inner_tlist = dpns->outer_tlist;
+			dpns->inner_plan = outerPlan(plan);
 		else
-			dpns->inner_tlist = ((ModifyTable *) plan)->exclRelTlist;
+			dpns->inner_plan = plan;
 	}
+	else
+		dpns->inner_plan = innerPlan(plan);
+
+	if (IsA(plan, ModifyTable) && ((ModifyTable *) plan)->operation == CMD_INSERT)
+		dpns->inner_tlist = ((ModifyTable *) plan)->exclRelTlist;
 	else if (dpns->inner_plan)
 		dpns->inner_tlist = dpns->inner_plan->targetlist;
 	else
diff --git a/src/test/regress/expected/merge.out b/src/test/regress/expected/merge.out
new file mode 100644
index e3ebf46..1a6f6ad
--- a/src/test/regress/expected/merge.out
+++ b/src/test/regress/expected/merge.out
@@ -1473,6 +1473,56 @@ WHEN MATCHED AND t.a < 10 THEN
 
 DROP TABLE ex_msource, ex_mtarget;
 DROP FUNCTION explain_merge(text);
+-- EXPLAIN SubPlans and InitPlans
+CREATE TABLE src (a int, b int, c int, d int);
+CREATE TABLE tgt (a int, b int, c int, d int);
+CREATE TABLE ref (ab int, cd int);
+EXPLAIN (verbose, costs off)
+MERGE INTO tgt t
+USING (SELECT *, (SELECT count(*) FROM ref r
+   WHERE r.ab = s.a + s.b
+ AND r.cd = s.c - s.d) cnt
+ FROM src s) s
+ON t.a = s.a AND t.b < s.cnt
+WHEN MATCHED AND t.c > s.cnt THEN
+  UPDATE SET (b, c) = (SELECT s.b, s.cnt);
+ QUERY PLAN

Re: Improving EXPLAIN's display of SubPlan nodes

2024-03-09 Thread Dean Rasheed

On Fri, 16 Feb 2024 at 19:39, Tom Lane  wrote:
>
> > So now I'm thinking that we do have enough detail in the present
> > proposal, and we just need to think about whether there's some
> > nicer way to present it than the particular spelling I used here.
>

One thing that concerns me about making even greater use of "$n" is
the potential for confusion with generic plan parameters. Maybe it's
always possible to work out which is which from context, but still it
looks messy:

drop table if exists foo;
create table foo(id int, x int, y int);

explain (verbose, costs off, generic_plan)
select row($3,$4) = (select x,y from foo where id=y) and
   row($1,$2) = (select min(x+y),max(x+y) from generate_series(1,3) x)
from generate_series(1,3) y;

  QUERY PLAN
---
 Function Scan on pg_catalog.generate_series y
   Output: (($3 = $0) AND ($4 = $1) AND (ROWCOMPARE (($1 = $3) AND ($2
= $4)) FROM SubPlan 2 (returns $3,$4)))
   Function Call: generate_series(1, 3)
   InitPlan 1 (returns $0,$1)
 ->  Seq Scan on public.foo
   Output: foo.x, foo.y
   Filter: (foo.id = foo.y)
   SubPlan 2 (returns $3,$4)
 ->  Aggregate
   Output: min((x.x + y.y)), max((x.x + y.y))
   ->  Function Scan on pg_catalog.generate_series x
 Output: x.x
 Function Call: generate_series(1, 3)

Another odd thing about that is the inconsistency between how the
SubPlan and InitPlan expressions are displayed. I think "ROWCOMPARE"
is really just an internal detail that could be omitted without losing
anything. But the "FROM SubPlan ..." is useful to work out where it's
coming from. Should it also output "FROM InitPlan ..."? I think that
would risk making it harder to read.

Another possibility is to put the SubPlan and InitPlan names inline,
rather than outputting "FROM SubPlan ...". I had a go at hacking that
up and this was the result:

  QUERY PLAN
---
 Function Scan on pg_catalog.generate_series y
   Output: (($3 = (InitPlan 1).$0) AND ($4 = (InitPlan 1).$1) AND
((($1 = (SubPlan 2).$3) AND ($2 = (SubPlan 2).$4
   Function Call: generate_series(1, 3)
   InitPlan 1 (returns $0,$1)
 ->  Seq Scan on public.foo
   Output: foo.x, foo.y
   Filter: (foo.id = foo.y)
   SubPlan 2 (returns $3,$4)
 ->  Aggregate
   Output: min((x.x + y.y)), max((x.x + y.y))
   ->  Function Scan on pg_catalog.generate_series x
 Output: x.x
 Function Call: generate_series(1, 3)

It's a little more verbose in this case, but in a lot of other cases
it ended up being more compact.

The code is a bit messy, but I think the regression test output
(attached) is clearer and easier to interpret. SubPlans and InitPlans
are displayed consistently, and it's easier to distinguish
SubPlan/InitPlan outputs from external parameters.

There are a few more regression test changes, corresponding to cases
where InitPlans are referenced, such as:

  Seq Scan on document
-   Filter: ((dlevel <= $0) AND f_leak(dtitle))
+   Filter: ((dlevel <= (InitPlan 1).$0) AND f_leak(dtitle))
InitPlan 1 (returns $0)
  ->  Index Scan using uaccount_pkey on uaccount
Index Cond: (pguser = CURRENT_USER)

but I think that's useful extra clarification.

Regards,
Dean
diff --git a/contrib/postgres_fdw/expected/postgres_fdw.out 
b/contrib/postgres_fdw/expected/postgres_fdw.out
new file mode 100644
index c355e8f..f0ff936
--- a/contrib/postgres_fdw/expected/postgres_fdw.out
+++ b/contrib/postgres_fdw/expected/postgres_fdw.out
@@ -3021,7 +3021,7 @@ select exists(select 1 from pg_enum), su
 QUERY PLAN
 --
  Foreign Scan
-   Output: $0, (sum(ft1.c1))
+   Output: (InitPlan 1).$0, (sum(ft1.c1))
Relations: Aggregate on (public.ft1)
Remote SQL: SELECT sum("C 1") FROM "S 1"."T 1"
InitPlan 1 (returns $0)
@@ -3039,7 +3039,7 @@ select exists(select 1 from pg_enum), su
 QUERY PLAN 
 ---
  GroupAggregate
-   Output: $0, sum(ft1.c1)
+   Output: (InitPlan 1).$0, sum(ft1.c1)
InitPlan 1 (returns $0)
  ->  Seq Scan on pg_catalog.pg_enum
->  Foreign Scan on public.ft1
@@ -3264,14 +3264,14 @@ select sum(c1) filter (where (c1 / c1) *

 explain (verbose, costs off)
 select sum(c2) filter (where c2 in (select c2 from ft1 where c2 < 5)) from ft1;
-QUERY PLAN 

+ QUERY PLAN

Re: Dump-restore loosing 'attnotnull' bit for DEFERRABLE PRIMARY KEY column(s).

2024-03-08 Thread Dean Rasheed

On Thu, 7 Mar 2024 at 17:32, Alvaro Herrera  wrote:
>
> This seems to work okay.
>

Yes, this looks good. I tested it against CREATE TABLE ... LIKE, and
it worked as expected. It might be worth adding a test case for that,
to ensure that it doesn't get broken in the future. Do we also want a
test case that does what pg_dump would do:

 - Add a NOT NULL constraint
 - Add a deferrable PK constraint
 - Drop the NOT NULL constraint
 - Check the column is still not nullable

Looking at rel.h, I think that the new field should probably come
after rd_pkindex, under the comment "data managed by
RelationGetIndexList", and have its own comment.

Also, if I'm nitpicking, the new field and local variables should use
the term "deferrable" rather than "deferred". A DEFERRABLE constraint
can be set to be either DEFERRED or IMMEDIATE within a transaction,
but "deferrable" is the right term to use to describe the persistent
property of an index/constraint that can be deferred. (The same
objection applies to the field name "indimmediate", but it's too late
to change that.)

Also, for neatness/consistency, the new field should probably be reset
in load_relcache_init_file(), alongside rd_pkindex, though I don't
think it can matter in practice.

Regards,
Dean

Re: vacuumdb/clusterdb/reindexdb: allow specifying objects to process in all databases

2024-03-08 Thread Dean Rasheed

On Wed, 6 Mar 2024 at 22:22, Nathan Bossart  wrote:
>
> Thanks for taking a look.  I updated the synopsis sections in v3.

OK, that looks good. The vacuumdb synopsis in particular looks a lot
better now that "-N | --exclude-schema" is on its own line, because it
was hard to read previously, and easy to mistakenly think that -n
could be combined with -N.

If I'm nitpicking, "[--verbose | -v]" in the clusterdb synopsis should
be replaced with "[option...]", like the other commands, because there
are other general-purpose options like --quiet and --echo.

> I also spent some more time on the reindexdb patch (0003).  I previously
> had decided to restrict combinations of tables, schemas, and indexes
> because I felt it was "ambiguous and inconsistent with vacuumdb," but
> looking closer, I think that's the wrong move.  reindexdb already supports
> such combinations, which it interprets to mean it should reindex each
> listed object.  So, I removed that change in v3.

Makes sense.

> Even though reindexdb allows combinations of tables, schema, and indexes,
> it doesn't allow combinations of "system catalogs" and other objects, and
> it's not clear why.  In v3, I've removed this restriction, which ended up
> simplifying the 0003 patch a bit.  Like combinations of tables, schemas,
> and indexes, reindexdb will now interpret combinations that include
> --system to mean it should reindex each listed object as well as the system
> catalogs.

OK, that looks useful, especially given that most people will still
probably use this against a single database, and it's making that more
flexible.

I think this is good to go.

Regards,
Dean

Re: Dump-restore loosing 'attnotnull' bit for DEFERRABLE PRIMARY KEY column(s).

2024-03-07 Thread Dean Rasheed

On Thu, 7 Mar 2024 at 13:00, Alvaro Herrera  wrote:
>
> So I think the original developers of REPLICA IDENTITY had the right
> idea here (commit 07cacba983ef), and we mustn't change this aspect,
> because it'll lead to data corruption in replication.  Using a deferred
> PK for DDL considerations seems OK, but it seems certain that for actual
> data replication it's going to be a disaster.
>

Yes, that makes sense. If I understand correctly though, the
replication code uses relation->rd_replidindex (not
relation->rd_pkindex), although sometimes it's the same thing. So can
we get away with making sure that RelationGetIndexList() doesn't set
relation->rd_replidindex to a deferrable PK, while still allowing
relation->rd_pkindex to be one?

Regards,
Dean

Re: bug report: some issues about pg_15_stable(8fa4a1ac61189efffb8b851ee77e1bc87360c445)

2024-03-07 Thread Dean Rasheed

On Tue, 5 Mar 2024 at 13:55, Dean Rasheed  wrote:
>
> >  If I only execute merge , I will get the following error:
> > merge into tgt a using src1 c on  a.a = c.a when matched then update 
> > set b = c.b when not matched then insert (a,b) values(c.a,c.b);  -- excute 
> > fail
> > ERROR:  MERGE command cannot affect row a second time
> > HIINT:  Ensure that not more than one source row matches any one target 
> > row.
> >
> >  But when I execute the update and merge concurrently, I will get the 
> > following result set.
>
> Yes, this should still produce that error, even after a concurrent update.
>
> My initial reaction is that neither of those blocks of code is
> entirely correct, and that they should both be doing both of those
> checks. I.e., something like the attached (which probably needs some
> additional test cases).
>

OK, I've pushed and back-patched that fix for this issue, after adding
some tests (nice catch, by the way!).

That wasn't related to the original issue though, so the problem with
UNION ALL still remains to be fixed. The patch from [1] looks
promising (for HEAD at least), but it really needs more pairs of eyes
on it (bearing in mind that it's just a rough proof-of-concept patch
at this stage).

[1] 
https://www.postgresql.org/message-id/CAEZATCVa-mgPuOdgd8%2BYVgOJ4wgJuhT2mJznbj_tmsGAP8hHJw%40mail.gmail.com

Regards,
Dean

pgsql: Fix handling of self-modified tuples in MERGE.

2024-03-07 Thread Dean Rasheed

Fix handling of self-modified tuples in MERGE.

When an UPDATE or DELETE action in MERGE returns TM_SelfModified,
there are 2 possible causes:

1). The target tuple was already updated or deleted by the current
command. This can happen if the target row joins to more than one
source row, and the SQL standard explicitly says that this must be
an error.

2). The target tuple was already updated or deleted by a later command
in the current transaction. This can happen if the tuple is
modified by a BEFORE trigger or a volatile function used in the
query, and should be an error for the same reason that it is in a
plain UPDATE or DELETE command.

In MERGE's primary error handling block, it failed to check for (2),
causing it to return a misleading error message in such cases.

In the secondary error handling block, following a concurrent update
from another session, it failed to check for (1), causing it to
silently ignore target rows joined to more than one source row,
instead of reporting an error.

Fix this, and add tests for both of these cases.

Per report from Wenjiang Zhang. Back-patch to v15, where MERGE was
introduced.

Discussion: 
https://postgr.es/m/tencent_41DE0FF443FE14B94A5898D373792109E408%40qq.com

Branch
--
REL_15_STABLE

Details
---
https://git.postgresql.org/pg/commitdiff/b5c645d2a2652070ac128a42cf0d7b34403a6cfe

Modified Files
--
src/backend/executor/nodeModifyTable.c   | 46 +++-
src/test/isolation/expected/merge-update.out | 43 ++
src/test/isolation/specs/merge-update.spec   | 13 
src/test/regress/expected/triggers.out   |  8 +
src/test/regress/sql/triggers.sql|  4 +++
5 files changed, 106 insertions(+), 8 deletions(-)

pgsql: Fix handling of self-modified tuples in MERGE.

2024-03-07 Thread Dean Rasheed

Fix handling of self-modified tuples in MERGE.

When an UPDATE or DELETE action in MERGE returns TM_SelfModified,
there are 2 possible causes:

1). The target tuple was already updated or deleted by the current
command. This can happen if the target row joins to more than one
source row, and the SQL standard explicitly says that this must be
an error.

2). The target tuple was already updated or deleted by a later command
in the current transaction. This can happen if the tuple is
modified by a BEFORE trigger or a volatile function used in the
query, and should be an error for the same reason that it is in a
plain UPDATE or DELETE command.

In MERGE's primary error handling block, it failed to check for (2),
causing it to return a misleading error message in such cases.

In the secondary error handling block, following a concurrent update
from another session, it failed to check for (1), causing it to
silently ignore target rows joined to more than one source row,
instead of reporting an error.

Fix this, and add tests for both of these cases.

Per report from Wenjiang Zhang. Back-patch to v15, where MERGE was
introduced.

Discussion: 
https://postgr.es/m/tencent_41DE0FF443FE14B94A5898D373792109E408%40qq.com

Branch
--
REL_16_STABLE

Details
---
https://git.postgresql.org/pg/commitdiff/dd73d10adf0048d2d8dc4e94cd64ce5e5ff45a4b

Modified Files
--
src/backend/executor/nodeModifyTable.c   | 46 +++-
src/test/isolation/expected/merge-update.out | 43 ++
src/test/isolation/specs/merge-update.spec   | 13 
src/test/regress/expected/triggers.out   |  8 +
src/test/regress/sql/triggers.sql|  4 +++
5 files changed, 106 insertions(+), 8 deletions(-)

pgsql: Fix handling of self-modified tuples in MERGE.

2024-03-07 Thread Dean Rasheed

Fix handling of self-modified tuples in MERGE.

When an UPDATE or DELETE action in MERGE returns TM_SelfModified,
there are 2 possible causes:

1). The target tuple was already updated or deleted by the current
command. This can happen if the target row joins to more than one
source row, and the SQL standard explicitly says that this must be
an error.

2). The target tuple was already updated or deleted by a later command
in the current transaction. This can happen if the tuple is
modified by a BEFORE trigger or a volatile function used in the
query, and should be an error for the same reason that it is in a
plain UPDATE or DELETE command.

In MERGE's primary error handling block, it failed to check for (2),
causing it to return a misleading error message in such cases.

In the secondary error handling block, following a concurrent update
from another session, it failed to check for (1), causing it to
silently ignore target rows joined to more than one source row,
instead of reporting an error.

Fix this, and add tests for both of these cases.

Per report from Wenjiang Zhang. Back-patch to v15, where MERGE was
introduced.

Discussion: 
https://postgr.es/m/tencent_41DE0FF443FE14B94A5898D373792109E408%40qq.com

Branch
--
master

Details
---
https://git.postgresql.org/pg/commitdiff/29ef1dd19b4f3eb54569b2eece4a8a65034a2216

Modified Files
--
src/backend/executor/nodeModifyTable.c   | 46 +++-
src/test/isolation/expected/merge-update.out | 43 ++
src/test/isolation/specs/merge-update.spec   | 13 
src/test/regress/expected/triggers.out   |  8 +
src/test/regress/sql/triggers.sql|  4 +++
5 files changed, 106 insertions(+), 8 deletions(-)

Re: MERGE ... RETURNING

2024-03-06 Thread Dean Rasheed

On Wed, 6 Mar 2024 at 08:51, Peter Eisentraut  wrote:
>
> For comparison with standard SQL (see ):
>
> For an INSERT you could write
>
> SELECT whatever FROM NEW TABLE (INSERT statement here)
>
> or for an DELETE
>
> SELECT whatever FROM OLD TABLE (DELETE statement here)
>
> And for an UPDATE could can pick either OLD or NEW.
>

Thanks, that's very interesting. I hadn't seen that syntax before.

Over on [1], I have a patch in the works that extends RETURNING,
allowing it to return OLD.colname, NEW.colname, OLD.*, and NEW.*. It
looks like this new SQL standard syntax could be built on top of that
(perhaps by having the rewriter turn queries of the above form into
CTEs).

However, the RETURNING syntax is more powerful, because it allows OLD
and NEW to be used together in arbitrary expressions, for example:

RETURNING ..., NEW.val - OLD.val AS delta, ...

> > The current implementation uses a special function MERGING (a
> > grammatical construct without an OID that parses into a new MergingFunc
> > expr), which takes keywords ACTION or CLAUSE_NUMBER in the argument
> > positions. That's not totally unprecedented in SQL -- the XML and JSON
> > functions are kind of similar. But it's different in the sense that
> > MERGING is also context-sensitive: grammatically, it fits pretty much
> > anywhere a function fits, but then gets rejected at parse analysis time
> > (or perhaps even execution time?) if it's not called from the right
> > place.
>
> An analogy here might be that MATCH_RECOGNIZE (row-pattern recognition)
> has a magic function MATCH_NUMBER() that can be used inside that clause.
>   So a similar zero-argument magic function might make sense.  I don't
> like the MERGING(ACTION) syntax, but something like MERGE_ACTION() might
> make sense.  (This is just in terms of what kind of syntax might be
> palatable.  Depending on where the syntax of the overall clause ends up,
> we might not need it (see above).)
>

It could be that having the ability to return OLD and NEW values, as
in [1], is sufficient for use in MERGE, to identify the action
performed. However, I still think that dedicated functions would be
useful, if we can agree on names/syntax.

I think that I prefer the names MERGE_ACTION() and
MERGE_CLAUSE_NUMBER() from an aesthetic point of view, but it requires
2 new COL_NAME_KEYWORD keywords. Maybe that's OK, I don't know.

Alternatively, we could avoid adding new keywords by going back to
making these regular functions, as they were in an earlier version of
this patch, and then use some special-case code during parse analysis
to turn them into MergeFunc nodes (not quite a complete revert back to
an earlier version of the patch, but not far off).

Regards,
Dean

[1] 
https://www.postgresql.org/message-id/flat/CAEZATCWx0J0-v=Qjc6gXzR=KtsdvAE7Ow=D=mu50agoe+pv...@mail.gmail.com

Re: Proposal to include --exclude-extension Flag in pg_dump

2024-03-06 Thread Dean Rasheed

On Mon, 1 Jan 2024 at 13:28, Ayush Vatsa  wrote:
>
> According to the documentation of pg_dump when the --extension option is not 
> specified, all non-system extensions in the target database will get dumped.
> > Why do we need to explicitly exclude extensions?
> Hence to include only a few we use --extension, but to exclude a few I am 
> proposing --exclude-extension.
>

Thanks for working on this. It seems like a useful feature to have.
The code changes look good, and it appears to work as expected.

In my opinion the order of options in pg_dump.sgml and the --help
output is fine. Keeping this new option together with -e/--extension
makes it easier to see, while otherwise it would get lost much further
down. Other opinions on that might differ though.

There are a couple of things missing from the patch, that should be added:

1). The --filter option should be extended to support "exclude
extension pattern" lines in the filter file. That syntax is already
accepted, but it throws a not-supported error, but it's hopefully not
too hard to make that work now.

2). It ought to have some tests in the test script.

Regards,
Dean

Re: vacuumdb/clusterdb/reindexdb: allow specifying objects to process in all databases

2024-03-05 Thread Dean Rasheed

On Tue, 5 Mar 2024 at 02:22, Nathan Bossart  wrote:
>
> I saw that this thread was referenced elsewhere [0], so I figured I'd take
> a fresh look.  From a quick glance, I'd say 0001 and 0002 are pretty
> reasonable and could probably be committed for v17.
>

I'm not sure how useful these changes are, but I don't really object.
You need to update the synopsis section of the docs though.

Regards,
Dean

Re: Dump-restore loosing 'attnotnull' bit for DEFERRABLE PRIMARY KEY column(s).

2024-03-05 Thread Dean Rasheed

On Tue, 5 Mar 2024 at 12:36, Alvaro Herrera  wrote:
>
> Yeah.  As I said upthread, a good fix seems to require no longer relying
> on RelationGetIndexAttrBitmap to obtain the columns in the primary key,
> because that function does not include deferred primary keys.  I came up
> with the attached POC, which seems to fix the reported problem, but of
> course it needs more polish, a working test case, and verifying whether
> the new function should be used in more places -- in particular, whether
> it can be used to revert the changes to RelationGetIndexList that
> b0e96f311985 did.
>

Looking at the other places that call RelationGetIndexAttrBitmap()
with INDEX_ATTR_BITMAP_PRIMARY_KEY, they all appear to want to include
deferrable PKs, since they are relying on the result to see which
columns are not nullable.

So there are other bugs here. For example:

CREATE TABLE foo (id int PRIMARY KEY DEFERRABLE, val text);
CREATE TABLE bar (LIKE foo);

now fails to mark bar.id as not nullable, whereas prior to
b0e96f311985 it would have been.

So I think RelationGetIndexAttrBitmap() should include deferrable PKs,
but not all the changes made to RelationGetIndexList() by b0e96f311985
need reverting.

Regards,
Dean

Re: bug report: some issues about pg_15_stable(8fa4a1ac61189efffb8b851ee77e1bc87360c445)

2024-03-05 Thread Dean Rasheed

[cc'ing Alvaro]

On Tue, 5 Mar 2024 at 10:04, zwj  wrote:
>
>  If I only execute merge , I will get the following error:
> merge into tgt a using src1 c on  a.a = c.a when matched then update set 
> b = c.b when not matched then insert (a,b) values(c.a,c.b);  -- excute fail
> ERROR:  MERGE command cannot affect row a second time
> HIINT:  Ensure that not more than one source row matches any one target 
> row.
>
>  But when I execute the update and merge concurrently, I will get the 
> following result set.
>

Yes, this should still produce that error, even after a concurrent update.

In the first example without a concurrent update, when we reach the
tgt.a = 3 row the second time, ExecUpdateAct() returns TM_SelfModified
and we do this:

case TM_SelfModified:

/*
 * The SQL standard disallows this for MERGE.
 */
if (TransactionIdIsCurrentTransactionId(context->tmfd.xmax))
ereport(ERROR,
(errcode(ERRCODE_CARDINALITY_VIOLATION),
/* translator: %s is a SQL command name */
 errmsg("%s command cannot affect row a second time",
"MERGE"),
 errhint("Ensure that not more than one source row
matches any one target row.")));
/* This shouldn't happen */
elog(ERROR, "attempted to update or delete invisible tuple");
break;

whereas in the second case, after a concurrent update, ExecUpdateAct()
returns TM_Updated, we attempt to lock the tuple prior to running EPQ,
and table_tuple_lock() returns TM_SelfModified, which does this:

case TM_SelfModified:

/*
 * This can be reached when following an update
 * chain from a tuple updated by another session,
 * reaching a tuple that was already updated in
 * this transaction. If previously modified by
 * this command, ignore the redundant update,
 * otherwise error out.
 *
 * See also response to TM_SelfModified in
 * ExecUpdate().
 */
if (context->tmfd.cmax != estate->es_output_cid)
ereport(ERROR,
(errcode(ERRCODE_TRIGGERED_DATA_CHANGE_VIOLATION),
 errmsg("tuple to be updated or deleted
was already modified by an operation triggered by the current
command"),
 errhint("Consider using an AFTER trigger
instead of a BEFORE trigger to propagate changes to other rows.")));
return false;

My initial reaction is that neither of those blocks of code is
entirely correct, and that they should both be doing both of those
checks. I.e., something like the attached (which probably needs some
additional test cases).

Regards,
Dean
diff --git a/src/backend/executor/nodeModifyTable.c b/src/backend/executor/nodeModifyTable.c
new file mode 100644
index fcb6133..9351fbc
--- a/src/backend/executor/nodeModifyTable.c
+++ b/src/backend/executor/nodeModifyTable.c
@@ -3001,8 +3001,29 @@ lmerge_matched:
 			case TM_SelfModified:

 /*
- * The SQL standard disallows this for MERGE.
+ * The target tuple was already updated or deleted by the
+ * current command, or by a later command in the current
+ * transaction.  The former case is explicitly disallowed by
+ * the SQL standard for MERGE, which insists that the MERGE
+ * join condition should not join a target row to more than
+ * one source row.
+ *
+ * The latter case arises if the tuple is modified by a
+ * command in a BEFORE trigger, or perhaps by a command in a
+ * volatile function used in the query.  In such situations we
+ * should not ignore the MERGE action, but it is equally
+ * unsafe to proceed.  We don't want to discard the original
+ * MERGE action while keeping the triggered actions based on
+ * it; and it would be no better to allow the original MERGE
+ * action while discarding the updates that it triggered.  So
+ * throwing an error is the only safe course.
  */
+if (context->tmfd.cmax != estate->es_output_cid)
+	ereport(ERROR,
+			(errcode(ERRCODE_TRIGGERED_DATA_CHANGE_VIOLATION),
+			 errmsg("tuple to be updated or deleted was already modified by an operation triggered by the current command"),
+			 errhint("Consider using an AFTER trigger instead of a BEFORE trigger to propagate changes to other rows.")));
+
 if (TransactionIdIsCurrentTransactionId(context->tmfd.xmax))
 	ereport(ERROR,
 			(errcode(ERRCODE_CARDINALITY_VIOLATION),
@@ -3010,6 +3031,7 @@ lmerge_matched:
 			 errmsg("%s command cannot affect row a second time",
 	"MERGE"),
 			 errhint("Ensure that not more than one source row matches any one target row.")));
+
 /* This shouldn't happen */
 elog(ERROR, "attempted to update or delete invisible

Re: Dump-restore loosing 'attnotnull' bit for DEFERRABLE PRIMARY KEY column(s).

2024-03-04 Thread Dean Rasheed

On Mon, 4 Mar 2024 at 12:34, Aleksander Alekseev
 wrote:
>
> > This was an experimental patch, I was looking for the comment on the 
> > proposed
> > approach -- whether we could simply skip the throwaway NOT NULL constraint 
> > for
> > deferred PK constraint.  Moreover,  skip that for any PK constraint.
>
> I confirm that the patch fixes the bug. All the tests pass. Looks like
> RfC to me.
>

I don't think that this is the right fix. ISTM that the real issue is
that dropping a NOT NULL constraint should not mark the column as
nullable if it is part of a PK, whether or not that PK is deferrable
-- a deferrable PK still marks a  column as not nullable.

The reason pg_dump creates these throwaway NOT NULL constraints is to
avoid a table scan to check for NULLs when the PK is later created.
That rationale still applies to deferrable PKs, so we still want the
throwaway NOT NULL constraints in that case, otherwise we'd be hurting
performance of restore.

Regards,
Dean

pgsql: Fix doc omission for MERGE into updatable views.

2024-03-04 Thread Dean Rasheed

Fix doc omission for MERGE into updatable views.

Commit 5f2e179bd3 missed one place in rules.sgml that should have
mentioned MERGE. Also, be more specific when saying that MERGE doesn't
support rules, since it does support SELECT rules.

Branch
--
master

Details
---
https://git.postgresql.org/pg/commitdiff/8545b28679a2ec2aa66ad2ec81f17019dab5d780

Modified Files
--
doc/src/sgml/rules.sgml | 6 --
1 file changed, 4 insertions(+), 2 deletions(-)

Re: Supporting MERGE on updatable views

2024-02-29 Thread Dean Rasheed

On Thu, 29 Feb 2024 at 09:50, Alvaro Herrera  wrote:
>
> By all means let's get the feature out there.  It's not a frequently
> requested thing but it does seem to come up.
>

Pushed. Thanks for reviewing.

Regards,
Dean

pgsql: Support MERGE into updatable views.

2024-02-29 Thread Dean Rasheed

Support MERGE into updatable views.

This allows the target relation of MERGE to be an auto-updatable or
trigger-updatable view, and includes support for WITH CHECK OPTION,
security barrier views, and security invoker views.

A trigger-updatable view must have INSTEAD OF triggers for every type
of action (INSERT, UPDATE, and DELETE) mentioned in the MERGE command.
An auto-updatable view must not have any INSTEAD OF triggers. Mixing
auto-update and trigger-update actions (i.e., having a partial set of
INSTEAD OF triggers) is not supported.

Rule-updatable views are also not supported, since there is no
rewriter support for non-SELECT rules with MERGE operations.

Dean Rasheed, reviewed by Jian He and Alvaro Herrera.

Discussion: 
https://postgr.es/m/CAEZATCVcB1g0nmxuEc-A+gGB0HnfcGQNGYH7gS=7rq0u0zo...@mail.gmail.com

Branch
--
master

Details
---
https://git.postgresql.org/pg/commitdiff/5f2e179bd31e5f5803005101eb12a8d7bf8db8f3

Modified Files
--
doc/src/sgml/ref/create_view.sgml |  42 +-
doc/src/sgml/ref/merge.sgml   |  22 +-
doc/src/sgml/rules.sgml   |  40 +-
src/backend/commands/copyfrom.c   |   2 +-
src/backend/executor/execMain.c   |  51 +--
src/backend/executor/execPartition.c  |   4 +-
src/backend/executor/nodeModifyTable.c| 138 +--
src/backend/optimizer/prep/prepjointree.c |  20 +-
src/backend/optimizer/util/appendinfo.c   |   3 +-
src/backend/parser/parse_merge.c  |  21 +-
src/backend/rewrite/rewriteHandler.c  | 403 +--
src/backend/rewrite/rewriteManip.c|  20 +-
src/bin/psql/tab-complete.c   |   1 +
src/include/catalog/catversion.h  |   2 +-
src/include/executor/executor.h   |   3 +-
src/include/nodes/parsenodes.h|   8 +
src/include/rewrite/rewriteHandler.h  |   6 +
src/test/regress/expected/merge.out   |  10 -
src/test/regress/expected/rules.out   |  12 +
src/test/regress/expected/updatable_views.out | 553 +-
src/test/regress/sql/merge.sql|   9 -
src/test/regress/sql/rules.sql|  13 +
src/test/regress/sql/updatable_views.sql  | 285 -
23 files changed, 1380 insertions(+), 288 deletions(-)

pgsql: Remove field UpdateContext->updated in nodeModifyTable.c

2024-02-29 Thread Dean Rasheed

Remove field UpdateContext->updated in nodeModifyTable.c

This field has been redundant ever since it was added by commit
25e777cf8e, which split up ExecUpdate() and ExecDelete() into reusable
pieces. The only place that reads it is ExecMergeMatched(), if the
result from ExecUpdateAct() is TM_Ok. However, all paths through
ExecUpdateAct() that return TM_Ok also set this field to true, so the
return status by itself is sufficient to tell if the update happened.

Removing this field is a modest simplification, and it brings the
UPDATE path in ExecMergeMatched() more into line with ExecUpdate(),
ensuring that ExecUpdateEpilogue() is always called if ExecUpdateAct()
returns TM_Ok, reducing the chance of bugs.

Dean Rasheed, reviewed by Alvaro Herrera.

Discussion: 
https://postgr.es/m/CAEZATCWGGmigGBzLHkJm5Ccv2mMxXmwi3%2Buq0yhwDHm-tsvSLg%40mail.gmail.com

Branch
--
master

Details
---
https://git.postgresql.org/pg/commitdiff/362de947cd7e8c826d9b3c5dc2590348263ed3c1

Modified Files
--
src/backend/executor/nodeModifyTable.c | 6 +-
1 file changed, 1 insertion(+), 5 deletions(-)

Re: bug report: some issues about pg_15_stable(8fa4a1ac61189efffb8b851ee77e1bc87360c445)

2024-02-28 Thread Dean Rasheed

On Wed, 28 Feb 2024 at 09:16, jian he  wrote:
>
> + oldcontext = MemoryContextSwitchTo(estate->es_query_cxt);
> +
> + node->as_epq_tupdesc = lookup_rowtype_tupdesc_copy(tupType, tupTypmod);
> +
> + ExecAssignExprContext(estate, >ps);
> +
> + node->ps.ps_ProjInfo =
> + ExecBuildProjectionInfo(castNode(Append, node->ps.plan)->epq_targetlist,
> +
> EvalPlanQualStart, EvalPlanQualNext will switch the memory context to
> es_query_cxt.
> so the memory context switch here is not necessary?
>

Yes it is necessary. The EvalPlanQual mechanism switches to the
epqstate->recheckestate->es_query_cxt memory context, which is not the
same as the main query's estate->es_query_cxt (they're different
executor states). Most stuff allocated under EvalPlanQual() is
intended to be short-lived (just for the duration of that specific EPQ
check), whereas this stuff (the TupleDesc and Projection) is intended
to last for the duration of the main query, so that it can be reused
in later EPQ checks.

> do you think it's sane to change
> `ExecAssignExprContext(estate, >ps);`
> to
> `
> /* need an expression context to do the projection */
> if (node->ps.ps_ExprContext == NULL)
> ExecAssignExprContext(estate, >ps);
> `
> ?

Possibly. More importantly, it should be doing a ResetExprContext() to
free any previous stuff before projecting the new row.

At this stage, this is just a rough draft patch. There are many things
like that and code comments that will need to be improved before it is
committable, but for now I'm more concerned with whether it actually
works, and if the approach it's taking is sane.

I've tried various things and I haven't been able to break it, but I'm
still nervous that I might be overlooking something, particularly in
relation to what might get added to the targetlist at various stages
during planning for different types of query.

Regards,
Dean

Re: Functions to return random numbers in a given range

2024-02-27 Thread Dean Rasheed

On Sat, 24 Feb 2024 at 17:10, Tomas Vondra
 wrote:
>
> Hi Dean,
>
> I did a quick review and a little bit of testing on the patch today. I
> think it's a good/useful idea, and I think the code is ready to go (the
> code is certainly much cleaner than anything I'd written ...).
>

Thanks for reviewing!

> I do have one minor comments regarding the docs - it refers to "random
> functions" in a couple places, which sounds to me as if it was talking
> about some functions arbitrarily taken from some list, although it
> clearly means "functions generating random numbers". (I realize this
> might be just due to me not being native speaker.)
>

Yes, I think you're right, that wording was a bit clumsy. Attached is
an update that's hopefully a bit better.

> Did you think about adding more functions generating either other types
> of data distributions (now we have uniform and normal), or random data
> for other data types (I often need random strings, for example)?
>
> Of course, I'm not saying this patch needs to do that. But perhaps it
> might affect how we name stuff to make it "extensible".
>

I don't have any plans to add more random functions, but I did think
about it from that perspective. Currently we have "random" and
"random_normal", so the natural extension would be
"random_${distribution}" for other data distributions, with "uniform"
as the default distribution, if omitted.

For different result datatypes, it ought to be mostly possible to
determine the result type from the arguments. There might be some
exceptions, like maybe "random_bytes(length)" to generate a byte
array, but I think that would be OK.

Regards,
Dean
From b1a63ecce667377435dc16fc262509bff2355b29 Mon Sep 17 00:00:00 2001
From: Dean Rasheed 
Date: Fri, 25 Aug 2023 10:42:38 +0100
Subject: [PATCH v3] Add random-number-in-range functions.

This adds 3 functions:

random(min int, max int) returns int
random(min bigint, max bigint) returns bigint
random(min numeric, max numeric) returns numeric

Each returns a random number in the range [min, max].

In the numeric case, the result scale is Max(scale(min), scale(max)).
---
 doc/src/sgml/func.sgml|  43 ++-
 src/backend/utils/adt/Makefile|   1 +
 src/backend/utils/adt/float.c |  95 --
 src/backend/utils/adt/meson.build |   1 +
 src/backend/utils/adt/numeric.c   | 219 +
 src/backend/utils/adt/pseudorandomfuncs.c | 185 +++
 src/common/pg_prng.c  |  36 +++
 src/include/catalog/pg_proc.dat   |  12 +
 src/include/common/pg_prng.h  |   1 +
 src/include/utils/numeric.h   |   4 +
 src/test/regress/expected/random.out  | 360 ++
 src/test/regress/sql/random.sql   | 164 ++
 12 files changed, 1021 insertions(+), 100 deletions(-)
 create mode 100644 src/backend/utils/adt/pseudorandomfuncs.c

diff --git a/doc/src/sgml/func.sgml b/doc/src/sgml/func.sgml
index e5fa82c161..e39e569fb6 100644
--- a/doc/src/sgml/func.sgml
+++ b/doc/src/sgml/func.sgml
@@ -1862,6 +1862,39 @@ SELECT NOT(ROW(table.*) IS NOT NULL) FROM TABLE; -- detect at least one null in

   
 
+  
+   
+
+ random
+
+random ( min integer, max integer )
+integer
+   
+   
+random ( min bigint, max bigint )
+bigint
+   
+   
+random ( min numeric, max numeric )
+numeric
+   
+   
+Returns a random value in the range
+min = x = max.
+For type numeric, the result will have the same number of
+fractional decimal digits as min or
+max, whichever has more.
+   
+   
+random(1, 10)
+7
+   
+   
+random(-0.499, 0.499)
+0.347
+   
+  
+
   

 
@@ -1906,19 +1939,19 @@ SELECT NOT(ROW(table.*) IS NOT NULL) FROM TABLE; -- detect at least one null in

 
   
-   The random() function uses a deterministic
-   pseudo-random number generator.
+   The random() and random_normal()
+   functions listed in  use a
+   deterministic pseudo-random number generator.
It is fast but not suitable for cryptographic
applications; see the  module for a more
secure alternative.
If setseed() is called, the series of results of
-   subsequent random() calls in the current session
+   subsequent calls to these functions in the current session
can be repeated by re-issuing setseed() with the same
argument.
Without any prior setseed() call in the same
-   session, the first random() call obtains a seed
+   session, the first call to any of these functions obtains a seed
from a platform-dependent source of random bits.
-   These remarks hold equally for random_normal().
   
 
   
diff --git a/src/ba

Re: bug report: some issues about pg_15_stable(8fa4a1ac61189efffb8b851ee77e1bc87360c445)

2024-02-27 Thread Dean Rasheed

On Fri, 23 Feb 2024 at 00:12, Tom Lane  wrote:
>
> So after studying this for awhile, I see that the planner is emitting
> a PlanRowMark that presumes that the UNION ALL subquery will be
> scanned as though it's a base relation; but since we've converted it
> to an appendrel, the executor just ignores that rowmark, and the wrong
> things happen.  I think the really right fix would be to teach the
> executor to honor such PlanRowMarks, by getting nodeAppend.c and
> nodeMergeAppend.c to perform EPQ row substitution.

Yes, I agree that's a much better solution, if it can be made to work,
though I have been really struggling to see how.


> the planner produces targetlists like
>
> Output: src_1.val, src_1.id, ROW(src_1.id, src_1.val)
>
> and as you can see the order of the columns doesn't match.
> I can see three ways we might attack that:
>
> 1. Persuade the planner to build output tlists that always match
> the row identity Var.
>
> 2. Change generation of the ROW() expression so that it lists only
> the values we're going to output, in the order we're going to
> output them.
>
> 3. Fix the executor to remap what it gets out of the ROW() into the
> order of the subquery tlists.  This is probably do-able but I'm
> not certain; it may be that the executor hasn't enough info.
> We might need to teach the planner to produce a mapping projection
> and attach it to the Append node, which carries some ABI risk (but
> in the past we've gotten away with adding new fields to the ends
> of plan nodes in the back branches).  Another objection is that
> adding cycles to execution rather than planning might be a poor
> tradeoff --- although if we only do the work when EPQ is invoked,
> maybe it'd be the best way.
>

Of those, option 3 feels like the best one, though I'm really not
sure. I played around with it and convinced myself that the executor
doesn't have the information it needs to make it work, but I think all
it needs is the Append node's original targetlist, as it is just
before it's rewritten by set_dummy_tlist_references(), which rewrites
the attribute numbers sequentially. In the original targetlist, all
the Vars have the right attribute numbers, so it can be used to build
the required projection (I think).

Attached is a very rough patch. It seemed better to build the
projection in the executor rather than the planner, since then the
extra work can be avoided, if EPQ is not invoked.

It seems to work (it passes the isolation tests, and I couldn't break
it in ad hoc testing), but it definitely needs tidying up, and it's
hard to be sure that it's not overlooking something.

Regards,
Dean
diff --git a/src/backend/executor/nodeAppend.c b/src/backend/executor/nodeAppend.c
new file mode 100644
index c7059e7..ccd994c
--- a/src/backend/executor/nodeAppend.c
+++ b/src/backend/executor/nodeAppend.c
@@ -275,6 +275,8 @@ ExecInitAppend(Append *node, EState *est
 	/* For parallel query, this will be overridden later. */
 	appendstate->choose_next_subplan = choose_next_subplan_locally;
 
+	appendstate->as_epq_tupdesc = NULL;
+
 	return appendstate;
 }
 
@@ -288,8 +290,107 @@ static TupleTableSlot *
 ExecAppend(PlanState *pstate)
 {
 	AppendState *node = castNode(AppendState, pstate);
+	EState	   *estate = node->ps.state;
 	TupleTableSlot *result;
 
+	if (estate->es_epq_active != NULL)
+	{
+		/*
+		 * We are inside an EvalPlanQual recheck.  If there is a relevant
+		 * rowmark for the append relation, return the test tuple if one is
+		 * available.
+		 */
+		EPQState   *epqstate = estate->es_epq_active;
+		int			scanrelid;
+
+		if (bms_get_singleton_member(castNode(Append, node->ps.plan)->apprelids,
+	 ))
+		{
+			if (epqstate->relsubs_done[scanrelid - 1])
+			{
+/*
+ * Return empty slot, as either there is no EPQ tuple for this
+ * rel or we already returned it.
+ */
+TupleTableSlot *slot = node->ps.ps_ResultTupleSlot;
+
+return ExecClearTuple(slot);
+			}
+			else if (epqstate->relsubs_slot[scanrelid - 1] != NULL)
+			{
+/*
+ * Return replacement tuple provided by the EPQ caller.
+ */
+TupleTableSlot *slot = epqstate->relsubs_slot[scanrelid - 1];
+
+Assert(epqstate->relsubs_rowmark[scanrelid - 1] == NULL);
+
+/* Mark to remember that we shouldn't return it again */
+epqstate->relsubs_done[scanrelid - 1] = true;
+
+return slot;
+			}
+			else if (epqstate->relsubs_rowmark[scanrelid - 1] != NULL)
+			{
+/*
+ * Fetch and return replacement tuple using a non-locking
+ * rowmark.
+ */
+ExecAuxRowMark *earm = epqstate->relsubs_rowmark[scanrelid - 1];
+ExecRowMark *erm = earm->rowmark;
+Datum		datum;
+bool		isNull;
+TupleTableSlot *slot;
+
+Assert(erm->markType == ROW_MARK_COPY);
+
+datum = ExecGetJunkAttribute(epqstate->origslot,
+			 earm->wholeAttNo,
+			 );
+if (isNull)
+	return NULL;
+
+if (node->as_epq_tupdesc == NULL)
+{
+	HeapTupleHeader tuple;
+	Oid			tupType;

Re: RangeTblEntry.inh vs. RTE_SUBQUERY

2024-02-23 Thread Dean Rasheed

On Fri, 23 Feb 2024 at 14:35, Peter Eisentraut  wrote:
>
> Various code comments say that the RangeTblEntry field inh may only be
> set for entries of kind RTE_RELATION.
>
> The function pull_up_simple_union_all() in prepjointree.c sets ->inh to
> true for RTE_SUBQUERY entries:
>
>  /*
>   * Mark the parent as an append relation.
>   */
>  rte->inh = true;
>
> Whatever this is doing appears to be some undocumented magic.

Yes, it's explained a bit more clearly/accurately in expand_inherited_rtentry():

/*
 * expand_inherited_rtentry
 *Expand a rangetable entry that has the "inh" bit set.
 *
 * "inh" is only allowed in two cases: RELATION and SUBQUERY RTEs.
 *
 * "inh" on a plain RELATION RTE means that it is a partitioned table or the
 * parent of a traditional-inheritance set.  In this case we must add entries
 * for all the interesting child tables to the query's rangetable, and build
 * additional planner data structures for them, including RelOptInfos,
 * AppendRelInfos, and possibly PlanRowMarks.
 *
 * Note that the original RTE is considered to represent the whole inheritance
 * set.  In the case of traditional inheritance, the first of the generated
 * RTEs is an RTE for the same table, but with inh = false, to represent the
 * parent table in its role as a simple member of the inheritance set.  For
 * partitioning, we don't need a second RTE because the partitioned table
 * itself has no data and need not be scanned.
 *
 * "inh" on a SUBQUERY RTE means that it's the parent of a UNION ALL group,
 * which is treated as an appendrel similarly to inheritance cases; however,
 * we already made RTEs and AppendRelInfos for the subqueries.  We only need
 * to build RelOptInfos for them, which is done by expand_appendrel_subquery.
 */

> Is this something we should explain the RangeTblEntry comments?
>

+1

Regards,
Dean

Re: bug report: some issues about pg_15_stable(8fa4a1ac61189efffb8b851ee77e1bc87360c445)

2024-02-22 Thread Dean Rasheed

On Thu, 22 Feb 2024 at 03:46, zwj  wrote:
>
>   If I want to get the same results as Oracle, do I need to adjust the lock 
> behavior of the update and merge statements?
>   If I want to achieve the same results as Oracle, can I achieve exclusive 
> locking by adjusting update and merge?  Do you have any suggestions?
>

I think that trying to get the same results in Oracle and Postgres may
not always be possible. Each has their own (probably quite different)
implementation of these features, that simply may not be compatible.

In Postgres, MERGE aims to make UPDATE and DELETE actions behave in
the same way as standalone UPDATE and DELETE commands under concurrent
modifications. However, it does not attempt to prevent INSERT actions
from inserting duplicates.

In that context, the UNION ALL issue is a clear bug, and I'll aim to
get that patch committed and back-patched sometime in the next few
days, if there are no objections from other hackers.

However, the issue with INSERT actions inserting duplicates is a
design choice, rather than something that we regard as a bug. It's
possible that a future version of Postgres might improve MERGE,
providing some way round that issue, but there's no guarantee of that
ever happening. Similarly, it sounds like Oracle also sometimes allows
duplicates, as well as having other "bugs" like the one discussed in
[1], that may be difficult for them to fix within their
implementation.

In Postgres, if the target table is subject to concurrent inserts (or
primary key updates), it might be better to use INSERT ... ON CONFLICT
DO UPDATE [2] instead of MERGE. That would avoid inserting duplicates
(though I can't say how compatible that is with anything in Oracle).

Regards,
Dean

[1] 
https://www.postgresql.org/message-id/CAEZATCV_6t5E57q7HsWQBX6a5YOjN5o7K-HicZ8a73EPzfwo=a...@mail.gmail.com

[2] https://www.postgresql.org/docs/current/sql-insert.html#SQL-ON-CONFLICT

Re: bug report: some issues about pg_15_stable(8fa4a1ac61189efffb8b851ee77e1bc87360c445)

2024-02-21 Thread Dean Rasheed

On Tue, 20 Feb 2024 at 14:49, Dean Rasheed  wrote:
>
> On the face of it, the simplest fix is to tweak is_simple_union_all()
> to prevent UNION ALL subquery pullup for MERGE, forcing a
> subquery-scan plan. A quick test shows that that fixes the reported
> issue.
>
> However, that leaves the question of whether we should do the same for
> UPDATE and DELETE.
>

Attached is a patch that prevents UNION ALL subquery pullup in MERGE only.

I've re-used and extended the isolation test cases added by
1d5caec221, since it's clear that replacing the plain source relation
in those tests with a UNION ALL subquery that returns the same results
should produce the same end result. (Without this patch, the UNION ALL
subquery is pulled up, EPQ rechecking fails to re-find the match, and
a WHEN NOT MATCHED THEN INSERT action is executed instead, resulting
in a primary key violation.)

It's still not quite clear whether preventing UNION ALL subquery
pullup should also apply to UPDATE and DELETE, but I wasn't able to
find any live bug there, so I think they're best left alone.

This fixes the reported issue, though it's worth noting that
concurrent WHEN NOT MATCHED THEN INSERT actions will still lead to
duplicate rows being inserted, which is a limitation that is already
documented [1].

[1] https://www.postgresql.org/docs/current/transaction-iso.html

Regards,
Dean
diff --git a/src/backend/optimizer/prep/prepjointree.c b/src/backend/optimizer/prep/prepjointree.c
new file mode 100644
index aa83dd3..50ebdd2
--- a/src/backend/optimizer/prep/prepjointree.c
+++ b/src/backend/optimizer/prep/prepjointree.c
@@ -102,7 +102,7 @@ static bool is_simple_values(PlannerInfo
 static Node *pull_up_constant_function(PlannerInfo *root, Node *jtnode,
 	   RangeTblEntry *rte,
 	   AppendRelInfo *containing_appendrel);
-static bool is_simple_union_all(Query *subquery);
+static bool is_simple_union_all(PlannerInfo *root, Query *subquery);
 static bool is_simple_union_all_recurse(Node *setOp, Query *setOpQuery,
 		List *colTypes);
 static bool is_safe_append_member(Query *subquery);
@@ -849,7 +849,7 @@ pull_up_subqueries_recurse(PlannerInfo *
 		 * in set_append_rel_pathlist, not here.)
 		 */
 		if (rte->rtekind == RTE_SUBQUERY &&
-			is_simple_union_all(rte->subquery))
+			is_simple_union_all(root, rte->subquery))
 			return pull_up_simple_union_all(root, jtnode, rte);
 
 		/*
@@ -1888,8 +1888,9 @@ pull_up_constant_function(PlannerInfo *r
  * same datatypes.
  */
 static bool
-is_simple_union_all(Query *subquery)
+is_simple_union_all(PlannerInfo *root, Query *subquery)
 {
+	Query	   *parse = root->parse;
 	SetOperationStmt *topop;
 
 	/* Let's just make sure it's a valid subselect ... */
@@ -1910,6 +1911,21 @@ is_simple_union_all(Query *subquery)
 		subquery->cteList)
 		return false;
 
+	/*
+	 * UPDATE/DELETE/MERGE is also a problem, because preprocess_rowmarks()
+	 * will add a rowmark to the UNION ALL subquery, to ensure that it returns
+	 * the same row if rescanned by EvalPlanQual.  Allowing the subquery to be
+	 * pulled up would render that rowmark ineffective.
+	 *
+	 * In practice, however, this seems to only be a problem for MERGE, which
+	 * allows the subquery to appear under an outer join with the target
+	 * relation --- such an outer join might output multiple not-matched rows
+	 * in addition to the required matched row, breaking EvalPlanQual's
+	 * expectation that rerunning the query returns just one row.
+	 */
+	if (parse->commandType == CMD_MERGE)
+		return false;
+
 	/* Recursively check the tree of set operations */
 	return is_simple_union_all_recurse((Node *) topop, subquery,
 	   topop->colTypes);
diff --git a/src/test/isolation/expected/merge-join.out b/src/test/isolation/expected/merge-join.out
new file mode 100644
index 57f048c..6cf045a
--- a/src/test/isolation/expected/merge-join.out
+++ b/src/test/isolation/expected/merge-join.out
@@ -146,3 +146,151 @@ id|val
  3| 30
 (3 rows)
 
+
+starting permutation: b1 b2 m1 hj exu m2u c1 c2 s1
+step b1: BEGIN ISOLATION LEVEL READ COMMITTED;
+step b2: BEGIN ISOLATION LEVEL READ COMMITTED;
+step m1: MERGE INTO tgt USING src ON tgt.id = src.id
+ WHEN MATCHED THEN UPDATE SET val = src.val
+ WHEN NOT MATCHED THEN INSERT VALUES (src.id, src.val);
+step hj: SET LOCAL enable_mergejoin = off; SET LOCAL enable_nestloop = off;
+step exu: EXPLAIN (verbose, costs off)
+   MERGE INTO tgt USING (SELECT * FROM src
+ UNION ALL
+ SELECT * FROM src2) src ON tgt.id = src.id
+ WHEN MATCHED THEN UPDATE SET val = src.val
+ WHEN NOT MATCHED THEN INSERT VALUES (src.id, src.val);
+QUERY PLAN   
+-
+Merge on public.tgt  
+  ->  Hash Left Join

Re: numeric_big in make check?

2024-02-20 Thread Dean Rasheed

On Tue, 20 Feb 2024 at 15:16, Tom Lane  wrote:
>
> Dean Rasheed  writes:
> > Looking at the script itself, the addition, subtraction,
> > multiplication and division tests at the top are probably pointless,
> > since I would expect those operations to be tested adequately (and
> > probably more thoroughly) by the transcendental test cases. In fact, I
> > think it would probably be OK to delete everything above line 650, and
> > just keep the bottom half of the script -- the pow(), exp(), ln() and
> > log() tests, which cover various edge cases, as well as exercising
> > basic arithmetic operations internally.
>
> I could go with that, but let's just transpose those into numeric.
>

Works for me.

Regards,
Dean

Re: bug report: some issues about pg_15_stable(8fa4a1ac61189efffb8b851ee77e1bc87360c445)

2024-02-20 Thread Dean Rasheed

On Tue, 20 Feb 2024 at 14:49, Dean Rasheed  wrote:
>
> Also, if the concurrent update were an update of a key
> column that was included in the join condition, the re-scan would
> follow the update to a new matching source row, which is inconsistent
> with what would happen if it were a join to a regular relation.
>

In case it wasn't clear what I was talking about there, here's a simple example:

-- Setup
DROP TABLE IF EXISTS src1, src2, tgt;
CREATE TABLE src1 (a int, b text);
CREATE TABLE src2 (a int, b text);
CREATE TABLE tgt (a int, b text);

INSERT INTO src1 SELECT x, 'Src1 '||x FROM generate_series(1, 3) g(x);
INSERT INTO src2 SELECT x, 'Src2 '||x FROM generate_series(4, 6) g(x);
INSERT INTO tgt SELECT x, 'Tgt '||x FROM generate_series(1, 6, 2) g(x);

-- Session 1
BEGIN;
UPDATE tgt SET a = 2 WHERE a = 1;

-- Session 2
UPDATE tgt t SET b = s.b
  FROM (SELECT * FROM src1 UNION ALL SELECT * FROM src2) s
 WHERE s.a = t.a;
SELECT * FROM tgt;

-- Session 1
COMMIT;

and the result in tgt is:

 a |   b
---+
 2 | Src1 2
 3 | Src1 3
 5 | Src2 5
(3 rows)

whereas if that UNION ALL subquery had been a regular table with the
same contents, the result would have been:

 a |   b
---+
 2 | Tgt 1
 3 | Src1 3
 5 | Src2 5

i.e., the concurrently modified row would not have been updated.

Regards,
Dean

Re: bug report: some issues about pg_15_stable(8fa4a1ac61189efffb8b851ee77e1bc87360c445)

2024-02-20 Thread Dean Rasheed

On Mon, 19 Feb 2024 at 17:48, zwj  wrote:
>
> Hello,
>
>I found an issue while using the latest version of PG15 
> (8fa4a1ac61189efffb8b851ee77e1bc87360c445).
>This question is about 'merge into'.
>
>When two merge into statements are executed concurrently, I obtain the 
> following process and results.
>Firstly, the execution results of each Oracle are different, and secondly, 
> I tried to understand its execution process and found that it was not very 
> clear.
>

Hmm, looking at this I think there is a problem with how UNION ALL
subqueries are pulled up, and I don't think it's necessarily limited
to MERGE.

Looking at the plan for this MERGE operation:

explain (verbose, costs off)
merge into mergeinto_0023_tb01 a using (select aid,name,year from
mergeinto_0023_tb02 union all select aid,name,year from
mergeinto_0023_tb03) c on (a.id=c.aid) when matched then update set
year=c.year when not matched then insert(id,name,year)
values(c.aid,c.name,c.year);

QUERY PLAN
-
 Merge on public.mergeinto_0023_tb01 a
   ->  Merge Right Join
 Output: a.ctid, mergeinto_0023_tb02.year,
mergeinto_0023_tb02.aid, mergeinto_0023_tb02.name,
(ROW(mergeinto_0023_tb02.aid, mergeinto_0023_tb02.name,
mergeinto_0023_tb02.year))
 Merge Cond: (a.id = mergeinto_0023_tb02.aid)
 ->  Sort
   Output: a.ctid, a.id
   Sort Key: a.id
   ->  Seq Scan on public.mergeinto_0023_tb01 a
 Output: a.ctid, a.id
 ->  Sort
   Output: mergeinto_0023_tb02.year,
mergeinto_0023_tb02.aid, mergeinto_0023_tb02.name,
(ROW(mergeinto_0023_tb02.aid, mergeinto_0023_tb02.name,
mergeinto_0023_tb02.year))
   Sort Key: mergeinto_0023_tb02.aid
   ->  Append
 ->  Seq Scan on public.mergeinto_0023_tb02
   Output: mergeinto_0023_tb02.year,
mergeinto_0023_tb02.aid, mergeinto_0023_tb02.name,
ROW(mergeinto_0023_tb02.aid, mergeinto_0023_tb02.name,
mergeinto_0023_tb02.year)
 ->  Seq Scan on public.mergeinto_0023_tb03
   Output: mergeinto_0023_tb03.year,
mergeinto_0023_tb03.aid, mergeinto_0023_tb03.name,
ROW(mergeinto_0023_tb03.aid, mergeinto_0023_tb03.name,
mergeinto_0023_tb03.year)

The "ROW(...)" targetlist entries are added because
preprocess_rowmarks() adds a rowmark to the UNION ALL subquery, which
at that point is the only non-target relation in the jointree. It does
this intending that the same values be returned during EPQ rechecking.
However, pull_up_subqueries() causes the UNION all subquery and its
leaf subqueries to be pulled up into the main query as appendrel
entries. So when it comes to EPQ rechecking, the rowmark does
absolutely nothing, and EvalPlanQual() does a full re-scan of
mergeinto_0023_tb02 and mergeinto_0023_tb03 and a re-sort for each
concurrently modified row.

A similar thing happens for UPDATE and DELETE, if they're joined to a
UNION ALL subquery. However, AFAICS that doesn't cause a problem
(other than being pretty inefficient) since, for UPDATE and DELETE,
the join to the UNION ALL subquery will always be an inner join, I
think, and so the join output will always be correct.

However, for MERGE, the join may be an outer join, so during an EPQ
recheck, we're joining the target relation (fixed to return just the
updated row) to the full UNION ALL subquery. So if it's an outer join,
the join output will return all-but-one of the subquery rows as not
matched rows in addition to the one matched row that we want, whereas
the EPQ mechanism is expecting the plan to return just one row.

On the face of it, the simplest fix is to tweak is_simple_union_all()
to prevent UNION ALL subquery pullup for MERGE, forcing a
subquery-scan plan. A quick test shows that that fixes the reported
issue.

is_simple_union_all() already has a test for rowmarks, and a comment
saying that locking isn't supported, but since it is called before
preprocess_rowmarks(), it doesn't know that the subquery is about to
be marked.

However, that leaves the question of whether we should do the same for
UPDATE and DELETE. There doesn't appear to be a live bug there, so
maybe they're best left alone. Also, back-patching a change like that
might make existing queries less efficient. But I feel like I might be
overlooking something here, and this doesn't seem to be how EPQ
rechecks are meant to work (doing a full re-scan of non-target
relations). Also, if the concurrent update were an update of a key
column that was included in the join condition, the re-scan would
follow the update to a new matching source row, which is inconsistent
with what would happen if it were a join to a regular relation.

Thoughts?

Regards,
Dean

Re: numeric_big in make check?

2024-02-20 Thread Dean Rasheed

On Mon, 19 Feb 2024 at 15:35, Tom Lane  wrote:
>
> I thought I'd try to acquire some actual facts here, so I compared
> the code coverage shown by "make check" as of HEAD, versus "make
> check" after adding numeric_big to parallel_schedule.  I saw the
> following lines of numeric.c as being covered in the second run
> and not the first:
>

I don't think that tells the whole story. Code coverage only tells you
whether a particular line of code has been hit, not whether it has
been properly tested with all the values that might lead to different
cases. For example:

> sqrt_var():
> 10073 res_ndigits = res_weight + 1 - (-rscale - 1) / DEC_DIGITS;
>

To get code coverage of this line, all you need to do is ensure that
sqrt_var() is called with rscale < -1 (which can only happen from the
range-reduction code in ln_var()). You could do that by computing
ln(1e50), which results in calling sqrt_var() with rscale = -2,
causing that line of code in sqrt_var() to be hit. That would satisfy
code coverage, but I would argue that you've only really tested that
line of code properly if you also hit it with rscale = -3, and
probably a few more values, to check that the round-down division is
working as intended.

Similarly, looking at commit 18a02ad2a5, the crucial code change was
the following in power_var():

-val = Max(val, -NUMERIC_MAX_RESULT_SCALE);
-val = Min(val, NUMERIC_MAX_RESULT_SCALE);
val *= 0.434294481903252;/* approximate decimal result weight */

Any test that calls numeric_power() is going to hit those lines of
code, but to see a failure, it was necessary to hit them with the
absolute value of "val" greater than NUMERIC_MAX_RESULT_SCALE, which
is why that commit added 2 new test cases to numeric_big, calling
power_var() with "val" outside that range. Code coverage is never
going to tell you whether or not that is being tested, since the code
change was to delete lines. Even if that weren't the case, any line of
code involving Min() or Max() has 2 branches, and code coverage won't
tell you if you've hit both of them.

> Pretty poor return on investment for the runtime consumed.  I don't
> object to adding something to numeric.sql that's targeted at adding
> coverage for these (or anyplace else that's not covered), but let's
> not just throw cycles at the problem.
>

I agree that blindly performing a bunch of large computations (just
throwing cycles at the problem) is not a good approach to testing. And
maybe that's a fair characterisation of parts of numeric_big. However,
it also contains some fairly well-targeted tests intended to test
specific edge cases that only occur with particular ranges of inputs,
which don't necessarily show up as increased code coverage.

So I think this requires more careful analysis. Simply deleting
numeric_big and adding tests to numeric.sql until the same level of
code coverage is achieved will not give the same level of testing.

It's also worth noting that the cost of running numeric_big has come
down very significantly over the last few years, as can be seen by
running the current numeric_big script against old backends. There's a
lot of random variation in the timings, but the trend is very clear:

9.51.641s
9.60.856s
100.845s
110.750s
120.760s
130.672s
140.430s
150.347s
160.336s

Arguably, this is a useful benchmark to spot performance regressions
and test proposed performance-improving patches.

If I run "EXTRA_TESTS=numeric_big make check | grep 'ok ' | sort
-nrk5", numeric_big is not in the top 10 most expensive tests (it's
usually down at around 15'th place).

Looking at the script itself, the addition, subtraction,
multiplication and division tests at the top are probably pointless,
since I would expect those operations to be tested adequately (and
probably more thoroughly) by the transcendental test cases. In fact, I
think it would probably be OK to delete everything above line 650, and
just keep the bottom half of the script -- the pow(), exp(), ln() and
log() tests, which cover various edge cases, as well as exercising
basic arithmetic operations internally. We might want to check that
I/O of large numerics is still being tested properly though.

If we did that, numeric_big would be even further down the list of
expensive tests, and I'd say it should be run by default.

Regards,
Dean

Re: numeric_big in make check?

2024-02-19 Thread Dean Rasheed

> > On 19 Feb 2024, at 12:48, Tom Lane  wrote:
> >
> > Or we could just flush it.  It's never detected a bug, and I think
> > you'd find that it adds zero code coverage (or if not, we could
> > fix that in a far more surgical and less expensive manner).
>

Off the top of my head, I can't say to what extent that's true, but it
wouldn't surprise me if at least some of the tests added in the last 4
commits to touch that file aren't covered by tests elsewhere. Indeed
that certainly looks like the case for 18a02ad2a5. I'm sure those
tests could be pared down though.

Regards,
Dean

Re: bug report: some issues about pg_15_stable(8fa4a1ac61189efffb8b851ee77e1bc87360c445)

2024-02-05 Thread Dean Rasheed

On Sun, 4 Feb 2024 at 18:19, zwj  wrote:
>
>I found an issue while using the latest version of PG15 
> (8fa4a1ac61189efffb8b851ee77e1bc87360c445).
>
>This issue is related to Megre into and other concurrent statements being 
> written.
>I obtained different results in Oracle and PG using the same statement 
> below.
>Based on my personal judgment, the result set of this statement in pg is 
> abnormal.
>

I would say that the Postgres result is correct here. The merge update
action would have updated the row with id=2, setting year=30, but
since there is a concurrent update, changing that id to 5, it waits to
see if that transaction commits. Once it does, the id no longer
matches, and merge update action is aborted, and the "not matched"
action is performed instead. That's consistent with the result that
you'd get if you performed the 2 transactions serially, doing the
update first then the merge.

Oracle, on the other hand, proceeds with the merge update action,
after the other transaction commits, even though the row being updated
no longer matches the join condition. Not only does that seem to be
incorrect, it's inconsistent with what both Oracle and Postgres do if
you replace the merge command with the equivalent standalone update
for that row: "update mergeinto_0023_tb01 set year = 30 where id = 2"
in the second session. So I'd say that this is an Oracle bug.

Regards,
Dean

Re: Functions to return random numbers in a given range

2024-01-30 Thread Dean Rasheed

On Tue, 30 Jan 2024 at 12:47, Aleksander Alekseev
 wrote:
>
> Maybe I'm missing something but I'm not sure if I understand what this
> test tests particularly:
>
> ```
> -- There should be no triple duplicates in 1000 full-range 32-bit random()
> -- values.  (Each of the C(1000, 3) choices of triplets from the 1000 values
> -- has a probability of 1/(2^32)^2 of being a triple duplicate, so the
> -- average number of triple duplicates is 1000 * 999 * 998 / 6 / 2^64, which
> -- is roughly 9e-12.)
> SELECT r, count(*)
> FROM (SELECT random(-2147483648, 2147483647) r
>   FROM generate_series(1, 1000)) ss
> GROUP BY r HAVING count(*) > 2;
> ```
>
> The intent seems to be to check the fact that random numbers are
> distributed evenly. If this is the case I think the test is wrong. The
> sequence of numbers 100, 100, 100, 100, 100 is as random as 99, 8, 4,
> 12, 45 and every particular sequence has low probability. All in all
> personally I would argue that this is a meaningless test that just
> fails with a low probability. Same for the tests that follow below.
>

I'm following the same approach used to test the existing random
functions, and the idea is the same. For example, this existing test:

-- There should be no duplicates in 1000 random() values.
-- (Assuming 52 random bits in the float8 results, we could
-- take as many as 3000 values and still have less than 1e-9 chance
-- of failure, per https://en.wikipedia.org/wiki/Birthday_problem)
SELECT r, count(*)
FROM (SELECT random() r FROM generate_series(1, 1000)) ss
GROUP BY r HAVING count(*) > 1;

If the underlying PRNG were non-uniform, or the method of reduction to
the required range was flawed in some way that reduced the number of
actual possible return values, then the probability of duplicates
would be increased. A non-uniform distribution would probably be
caught by the KS tests, but uniform gaps in the possible outputs might
not be, so I think this test still has value.

> The proper way of testing PRNG would be to call setseed() and compare
> return values with expected ones. I don't mind testing the proposed
> invariants but they should do this after calling setseed(). Currently
> the patch places the tests right before the call.
>

There are also new tests of that nature, following the call to
setseed(0.5). They're useful for a quick visual check of the results,
and confirming the expected number of digits after the decimal point
in the numeric case. However, I think those tests are insufficient on
their own.

Regards,
Dean

Re: Functions to return random numbers in a given range

2024-01-29 Thread Dean Rasheed

On Fri, 26 Jan 2024 at 20:44, David Zhang  wrote:
>
> Thank you for the patch.
>
> I applied this patch manually to the master branch, resolving a conflict
> in `numeric.h`. It successfully passed both `make check` and `make
> check-world`.
>

Thanks for testing.

Interestingly, the cfbot didn't pick up on the fact that it needed
rebasing. Anyway, the copyright years in the new file's header comment
needed updating, so here is a rebase doing that.

Regards,
Dean
From 15d0ba981ff03eca7143726fe7512adf00ee3a84 Mon Sep 17 00:00:00 2001
From: Dean Rasheed 
Date: Fri, 25 Aug 2023 10:42:38 +0100
Subject: [PATCH v2] Add random-number-in-range functions.

This adds 3 functions:

random(min int, max int) returns int
random(min bigint, max bigint) returns bigint
random(min numeric, max numeric) returns numeric

Each returns a random number in the range [min, max].

In the numeric case, the result scale is Max(scale(min), scale(max)).
---
 doc/src/sgml/func.sgml|  39 ++-
 src/backend/utils/adt/Makefile|   1 +
 src/backend/utils/adt/float.c |  95 --
 src/backend/utils/adt/meson.build |   1 +
 src/backend/utils/adt/numeric.c   | 219 +
 src/backend/utils/adt/pseudorandomfuncs.c | 185 +++
 src/common/pg_prng.c  |  36 +++
 src/include/catalog/pg_proc.dat   |  12 +
 src/include/common/pg_prng.h  |   1 +
 src/include/utils/numeric.h   |   4 +
 src/test/regress/expected/random.out  | 360 ++
 src/test/regress/sql/random.sql   | 164 ++
 12 files changed, 1017 insertions(+), 100 deletions(-)
 create mode 100644 src/backend/utils/adt/pseudorandomfuncs.c

diff --git a/doc/src/sgml/func.sgml b/doc/src/sgml/func.sgml
index 6788ba8ef4..6d76fb5853 100644
--- a/doc/src/sgml/func.sgml
+++ b/doc/src/sgml/func.sgml
@@ -1862,6 +1862,36 @@ SELECT NOT(ROW(table.*) IS NOT NULL) FROM TABLE; -- detect at least one null in

   
 
+  
+   
+
+ random
+
+random ( min integer, max integer )
+integer
+   
+   
+random ( min bigint, max bigint )
+bigint
+   
+   
+random ( min numeric, max numeric )
+numeric
+   
+   
+Return a random value in the range
+min = x = max.
+   
+   
+random(1, 10)
+7
+   
+   
+random(-0.499, 0.499)
+0.347
+   
+  
+
   

 
@@ -1906,19 +1936,18 @@ SELECT NOT(ROW(table.*) IS NOT NULL) FROM TABLE; -- detect at least one null in

 
   
-   The random() function uses a deterministic
-   pseudo-random number generator.
+   The random functions listed in 
+   use a deterministic pseudo-random number generator.
It is fast but not suitable for cryptographic
applications; see the  module for a more
secure alternative.
If setseed() is called, the series of results of
-   subsequent random() calls in the current session
+   subsequent calls to these random functions in the current session
can be repeated by re-issuing setseed() with the same
argument.
Without any prior setseed() call in the same
-   session, the first random() call obtains a seed
+   session, the first call to any of these random functions obtains a seed
from a platform-dependent source of random bits.
-   These remarks hold equally for random_normal().
   
 
   
diff --git a/src/backend/utils/adt/Makefile b/src/backend/utils/adt/Makefile
index 199eae525d..610ccf2f79 100644
--- a/src/backend/utils/adt/Makefile
+++ b/src/backend/utils/adt/Makefile
@@ -82,6 +82,7 @@ OBJS = \
 	pg_lsn.o \
 	pg_upgrade_support.o \
 	pgstatfuncs.o \
+	pseudorandomfuncs.o \
 	pseudotypes.o \
 	quote.o \
 	rangetypes.o \
diff --git a/src/backend/utils/adt/float.c b/src/backend/utils/adt/float.c
index 901edcc896..cbbb8aecaf 100644
--- a/src/backend/utils/adt/float.c
+++ b/src/backend/utils/adt/float.c
@@ -21,10 +21,8 @@
 
 #include "catalog/pg_type.h"
 #include "common/int.h"
-#include "common/pg_prng.h"
 #include "common/shortest_dec.h"
 #include "libpq/pqformat.h"
-#include "miscadmin.h"
 #include "utils/array.h"
 #include "utils/float.h"
 #include "utils/fmgrprotos.h"
@@ -64,10 +62,6 @@ float8		degree_c_sixty = 60.0;
 float8		degree_c_one_half = 0.5;
 float8		degree_c_one = 1.0;
 
-/* State for drandom() and setseed() */
-static bool drandom_seed_set = false;
-static pg_prng_state drandom_seed;
-
 /* Local function prototypes */
 static double sind_q1(double x);
 static double cosd_q1(double x);
@@ -2785,95 +2779,6 @@ derfc(PG_FUNCTION_ARGS)
 }
 
 
-/* == RANDOM FUNCTIONS == */
-
-
-/*
- * initialize_drandom_seed - initialize drandom_seed if not yet done
- */
-static void
-initialize_drandom_seed(void)
-{
-	/* Initialize random seed

Re: Functions to return random numbers in a given range

2024-01-29 Thread Dean Rasheed

On Thu, 28 Dec 2023 at 07:34, jian he  wrote:
>
> Your patch works.
> performance is the best amount for other options in [0].
> I don't have deep knowledge about which one is more random.
>

Thanks for testing.

> Currently we have to explicitly mention the lower and upper bound.
> but can we do this:
> just give me an int, int means the int data type can be represented.
> or just give me a random bigint.
> but for numeric, the full numeric values that can be represented are very big.
>
> Maybe we can use the special value null to achieve this
> like use
> select random(NULL::int,null)
> to represent a random int in the full range of integers values can be
> represented.
>

Hmm, I don't particularly like that idea. It seems pretty ugly. Now
that we support literal integers in hex, with underscores, it's
relatively easy to pass INT_MIN/MAX as arguments to these functions,
if that's what you need. I think if we were going to have a shorthand
for getting full-range random integers, it would probably be better to
introduce separate no-arg functions for that. I'm not really sure if
that's a sufficiently common use case to justify the effort though.

Regards,
Dean

Re: MERGE ... WHEN NOT MATCHED BY SOURCE

2024-01-29 Thread Dean Rasheed

On Fri, 26 Jan 2024 at 15:57, Alvaro Herrera  wrote:
>
> Well, firstly this is clearly a feature we want to have, even though
> it's non-standard, because people use it and other implementations have
> it.  (Eh, so maybe somebody should be talking to the SQL standard
> committee about it).  As for code quality, I didn't do a comprehensive
> review, but I think it is quite reasonable.  Therefore, my inclination
> would be to get it committed soonish, and celebrate it widely so that
> people can test it soon and complain if they see something they don't
> like.
>

Thanks. I have been going over this patch again, and for the most
part, I'm pretty happy with it.

One thing that's bothering me though is what happens if a row being
merged is concurrently updated. Specifically, if a concurrent update
causes a formerly matching row to no longer match the join condition,
and there are both NOT MATCHED BY SOURCE and NOT MATCHED BY TARGET
actions, so that it's doing in full join between the source and target
relations. In this case, when the EPQ mechanism rescans the subplan
node, there will be 2 possible output tuples (one with source null,
and one with target null), and EvalPlanQual() will just return the
first one, which is a more-or-less arbitrary choice, depending on the
type of join (hash/merge), and (for a mergejoin) the values of the
inner and outer join keys. Thus, it may execute a NOT MATCHED BY
SOURCE action, or a NOT MATCHED BY TARGET action, and it's difficult
to predict which.

Arguably it's not worth worrying too much about what happens in a
corner-case concurrent update like this, when MERGE is already
inconsistent under other concurrent update scenarios, but I don't like
having unpredictable results like this, which can depend on the plan
chosen.

I think the best (and probably simplest) solution is to always opt for
a NOT MATCHED BY TARGET action in this case, so then the result is
predictable, and we can document what is expected to happen.

Regards,
Dean

Re: MERGE ... WHEN NOT MATCHED BY SOURCE

2024-01-26 Thread Dean Rasheed

n Fri, 26 Jan 2024 at 14:59, vignesh C  wrote:
>
> CFBot shows that the patch does not apply anymore as in [1]:
>

Rebased version attached.

Regards,
Dean
diff --git a/doc/src/sgml/mvcc.sgml b/doc/src/sgml/mvcc.sgml
new file mode 100644
index f8f83d4..6ef0c2b
--- a/doc/src/sgml/mvcc.sgml
+++ b/doc/src/sgml/mvcc.sgml
@@ -396,8 +396,8 @@
 originally matched appears later in the list of actions.
 On the other hand, if the row is concurrently updated or deleted so
 that the join condition fails, then MERGE will
-evaluate the condition's NOT MATCHED actions next,
-and execute the first one that succeeds.
+evaluate the condition's NOT MATCHED [BY TARGET]
+actions next, and execute the first one that succeeds.
 If MERGE attempts an INSERT
 and a unique index is present and a duplicate row is concurrently
 inserted, then a uniqueness violation error is raised;
diff --git a/doc/src/sgml/ref/merge.sgml b/doc/src/sgml/ref/merge.sgml
new file mode 100644
index 655f7dc..f421716
--- a/doc/src/sgml/ref/merge.sgml
+++ b/doc/src/sgml/ref/merge.sgml
@@ -33,7 +33,8 @@ USING dat
 and when_clause is:
 
 { WHEN MATCHED [ AND condition ] THEN { merge_update | merge_delete | DO NOTHING } |
-  WHEN NOT MATCHED [ AND condition ] THEN { merge_insert | DO NOTHING } }
+  WHEN NOT MATCHED BY SOURCE [ AND condition ] THEN { merge_update | merge_delete | DO NOTHING } |
+  WHEN NOT MATCHED [BY TARGET] [ AND condition ] THEN { merge_insert | DO NOTHING } }
 
 and merge_insert is:
 
@@ -70,7 +71,9 @@ DELETE
from data_source to
target_table_name
producing zero or more candidate change rows.  For each candidate change
-   row, the status of MATCHED or NOT MATCHED
+   row, the status of MATCHED,
+   NOT MATCHED BY SOURCE,
+   or NOT MATCHED [BY TARGET]
is set just once, after which WHEN clauses are evaluated
in the order specified.  For each candidate change row, the first clause to
evaluate as true is executed.  No more than one WHEN
@@ -226,16 +229,37 @@ DELETE
   At least one WHEN clause is required.
  
  
+  The WHEN clause may specify WHEN MATCHED,
+  WHEN NOT MATCHED BY SOURCE, or
+  WHEN NOT MATCHED [BY TARGET].
+  Note that the SQL standard only defines
+  WHEN MATCHED and WHEN NOT MATCHED
+  (which is defined to mean no matching target row).
+  WHEN NOT MATCHED BY SOURCE is an extension to the
+  SQL standard, as is the option to append
+  BY TARGET to WHEN NOT MATCHED, to
+  make its meaning more explicit.
+ 
+ 
   If the WHEN clause specifies WHEN MATCHED
-  and the candidate change row matches a row in the
+  and the candidate change row matches a row in the source to a row in the
   target_table_name,
   the WHEN clause is executed if the
   condition is
   absent or it evaluates to true.
  
  
-  Conversely, if the WHEN clause specifies
-  WHEN NOT MATCHED
+  If the WHEN clause specifies
+  WHEN NOT MATCHED BY SOURCE and the candidate change
+  row represents a row in the
+  target_table_name that does
+  not match a source row, the WHEN clause is executed
+  if the condition is
+  absent or it evaluates to true.
+ 
+ 
+  If the WHEN clause specifies
+  WHEN NOT MATCHED [BY TARGET]
   and the candidate change row does not match a row in the
   target_table_name,
   the WHEN clause is executed if the
@@ -257,7 +281,10 @@ DELETE
  
   A condition on a WHEN MATCHED clause can refer to columns
   in both the source and the target relations. A condition on a
-  WHEN NOT MATCHED clause can only refer to columns from
+  WHEN NOT MATCHED BY SOURCE clause can only refer to
+  columns from the target relation, since by definition there is no matching
+  source row. A condition on a WHEN NOT MATCHED [BY TARGET]
+  clause can only refer to columns from
   the source relation, since by definition there is no matching target row.
   Only the system attributes from the target table are accessible.
  
@@ -382,8 +409,10 @@ DELETE
   WHEN MATCHED clause, the expression can use values
   from the original row in the target table, and values from the
   data_source row.
-  If used in a WHEN NOT MATCHED clause, the
-  expression can use values from the data_source.
+  If used in a WHEN NOT MATCHED BY SOURCE clause, the
+  expression can only use values from the original row in the target table.
+  If used in a WHEN NOT MATCHED [BY TARGET] clause, the
+  expression can only use values from the data_source.
  
 

@@ -452,8 +481,9 @@ MERGE tot

 
  
-  Evaluate whether each row is MATCHED or
-  NOT MATCHED.
+  Evaluate whether each row is MATCHED,
+  NOT MATCHED BY SOURCE, or
+  NOT MATCHED [BY TARGET].
  
 
 
@@ -528,7 +558,8 @@ MERGE tot
   
If a

Re: MERGE ... WHEN NOT MATCHED BY SOURCE

2024-01-26 Thread Dean Rasheed

On Mon, 22 Jan 2024 at 02:10, Peter Smith  wrote:
>
> Hi, this patch was marked in CF as "Needs Review" [1], but there has
> been no activity on this thread for 6+ months.
>
> Is anything else planned? Can you post something to elicit more
> interest in the latest patch? Otherwise, if nothing happens then the
> CF entry will be closed ("Returned with feedback") at the end of this
> CF.
>

I think it has had a decent amount of review and all the review
comments have been addressed. I'm not quite sure from Alvaro's last
comment whether he was implying that he thought it was ready for
commit.

Looking back through the thread, the general sentiment seems to be in
favour of adding this feature, and I still think it's worth doing, but
I haven't managed to find much time to progress it recently.

Regards,
Dean

Re: psql JSON output format

2024-01-09 Thread Dean Rasheed

[cc'ing Joe]

On Tue, 9 Jan 2024 at 14:35, Christoph Berg  wrote:
>
> Getting it print numeric/boolean without quotes was actually easy, as
> well as json(b). Implemented as the attached v2 patch.
>
> But: not quoting json means that NULL and 'null'::json will both be
> rendered as 'null'. That strikes me as a pretty undesirable conflict.
> Does the COPY patch also do that?
>

Yes. Perhaps what needs to happen is for a NULL column to be omitted
entirely from the output. I think the COPY TO json patch would have to
do that if COPY FROM json were to be added later, to make it
round-trip safe.

> > OTOH, this patch outputs the Postgres string representation of the
> > object, which might be round-trip safe, but is not very convenient
> > for any other tool to read.
>
> For my use case, I need something that can be fed back into PG.
> Reassembling all the json parts back into proper values would be a
> pretty hard problem.
>

What is your use case? It seems like what you want is quite different
from the COPY patch.

Regards,
Dean

Re: psql JSON output format

2024-01-09 Thread Dean Rasheed

On Tue, 9 Jan 2024 at 09:43, Christoph Berg  wrote:
>
> I can see we probably wouldn't want two different output formats named
> json, but the general idea of "allow psql to format results as json of
> strings" makes a lot of sense, so we should try to make it work. Does
> it even have to be compatible?
>

I would say that they should be compatible, in the sense that an
external tool parsing the outputs should regard them as equal, but
maybe there are different expectations for the two features.

> I'll note that the current code uses PG's string representation of
> strings which is meant to be round-trip safe when fed back into the
> server. So quoted numeric values aren't a problem at all. (And that
> part is fixable.)
>

I'm not sure that being round-trip safe is a necessary goal here, but
again, it's about the expectations for the feature. I was imagining
that the goal was to produce something that an external tool would
parse, rather than something Postgres would read back in. So not
quoting numeric values seems desirable to produce output that better
reflects the semantic content of the data (though it doesn't affect it
being round-trip safe).

> The real problem here is that COPY json violates that by pulling json
> values up one syntax level. "Normal" cases will be fixable by just
> looking for json(b) and printing that unquoted. And composite types
> with jsonb members... are these really only half-quoted?!
>

What to do with composites is an interesting point in question. COPY
format json will turn a composite into a JSON object whose keys are
the field names. That's useful if you want to use an external tool to
parse the result and get at the individual fields, but it's not
round-trip safe. OTOH, this patch outputs the Postgres string
representation of the object, which might be round-trip safe, but is
not very convenient for any other tool to read.

Regards,
Dean

Re: Emitting JSON to file using COPY TO

2024-01-08 Thread Dean Rasheed

On Thu, 7 Dec 2023 at 01:10, Joe Conway  wrote:
>
> The attached should fix the CopyOut response to say one column.
>

Playing around with this, I found a couple of cases that generate an error:

COPY (SELECT 1 UNION ALL SELECT 2) TO stdout WITH (format json);

COPY (VALUES (1), (2)) TO stdout WITH (format json);

both of those generate the following:

ERROR:  record type has not been registered

Regards,
Dean

Re: psql JSON output format

2024-01-08 Thread Dean Rasheed

On Mon, 18 Dec 2023 at 16:34, Jelte Fennema-Nio  wrote:
>
> On Mon, 18 Dec 2023 at 16:38, Christoph Berg  wrote:
> > We'd want both patches even if they do the same thing on two different
> > levels, I'd say.
>
> Makes sense.
>

I can see the appeal in this feature. However, as it stands, this
isn't compatible with copy format json, and I think it would need to
duplicate quite a lot of the JSON output code in client-side code to
make it compatible.

Consider, for example:

CREATE TABLE foo(col json);
INSERT INTO foo VALUES ('"str_value"');

copy foo to stdout with (format json) produces this:

{"col":"str_value"}

which is as expected. However, psql -Jc "select * from foo" produces

[
{ "col": "\"str_value\"" }
]

The problem is, various datatypes such as boolean, number types, json,
and jsonb must not be quoted and escaped, since that would change them
to strings or double-encode them in the result. And then there are
domain types built on top of those types, and arrays, etc. See, for
example, the logic in json_categorize_type(). I think that trying to
duplicate that client-side is doomed to failure.

Regards,
Dean

Functions to return random numbers in a given range

2023-12-21 Thread Dean Rasheed

Attached is a patch that adds 3 SQL-callable functions to return
random integer/numeric values chosen uniformly from a given range:

  random(min int, max int) returns int
  random(min bigint, max bigint) returns bigint
  random(min numeric, max numeric) returns numeric

The return value is in the range [min, max], and in the numeric case,
the result scale equals Max(scale(min), scale(max)), so it can be used
to generate large random integers, as well as decimals.

The goal is to provide simple, easy-to-use functions that operate
correctly over arbitrary ranges, which is trickier than it might seem
using the existing random() function. The main advantages are:

1. Support for arbitrary bounds (provided that max >= min). A SQL or
PL/pgSQL implementation based on the existing random() function can
suffer from integer overflow if the difference max-min is too large.

2. Uniform results over the full range. It's easy to overlook the fact
that in a naive implementation doing something like
"((max-min)*random()+min)::int", the endpoint values will be half as
likely as any other value, since casting to integer rounds to nearest.

3. Makes better use of the underlying PRNG, not limited to the 52-bits
of double precision values.

4. Simpler and more efficient generation of random numeric values.
This is something I have commonly wanted in the past, and have usually
resorted to hacks involving multiple calls to random() to build
strings of digits, which is horribly slow, and messy.

The implementation moves the existing random functions to a new source
file, so the new functions all share a common PRNG state with the
existing random functions, and that state is kept private to that
file.

Regards,
Dean
From 0b7015668387c337114adb4b3c24fe1d8053bf9c Mon Sep 17 00:00:00 2001
From: Dean Rasheed 
Date: Fri, 25 Aug 2023 10:42:38 +0100
Subject: [PATCH v1] Add random-number-in-range functions.

This adds 3 functions:

random(min int, max int) returns int
random(min bigint, max bigint) returns bigint
random(min numeric, max numeric) returns numeric

Each returns a random number in the range [min, max].

In the numeric case, the result scale is Max(scale(min), scale(max)).
---
 doc/src/sgml/func.sgml|  39 ++-
 src/backend/utils/adt/Makefile|   1 +
 src/backend/utils/adt/float.c |  95 --
 src/backend/utils/adt/meson.build |   1 +
 src/backend/utils/adt/numeric.c   | 219 +
 src/backend/utils/adt/pseudorandomfuncs.c | 185 +++
 src/common/pg_prng.c  |  36 +++
 src/include/catalog/pg_proc.dat   |  12 +
 src/include/common/pg_prng.h  |   1 +
 src/include/utils/numeric.h   |   4 +
 src/test/regress/expected/random.out  | 360 ++
 src/test/regress/sql/random.sql   | 164 ++
 12 files changed, 1017 insertions(+), 100 deletions(-)
 create mode 100644 src/backend/utils/adt/pseudorandomfuncs.c

diff --git a/doc/src/sgml/func.sgml b/doc/src/sgml/func.sgml
index 20da3ed033..b0b65d81dc 100644
--- a/doc/src/sgml/func.sgml
+++ b/doc/src/sgml/func.sgml
@@ -1862,6 +1862,36 @@ SELECT NOT(ROW(table.*) IS NOT NULL) FROM TABLE; -- detect at least one null in

   
 
+  
+   
+
+ random
+
+random ( min integer, max integer )
+integer
+   
+   
+random ( min bigint, max bigint )
+bigint
+   
+   
+random ( min numeric, max numeric )
+numeric
+   
+   
+Return a random value in the range
+min = x = max.
+   
+   
+random(1, 10)
+7
+   
+   
+random(-0.499, 0.499)
+0.347
+   
+  
+
   

 
@@ -1906,19 +1936,18 @@ SELECT NOT(ROW(table.*) IS NOT NULL) FROM TABLE; -- detect at least one null in

 
   
-   The random() function uses a deterministic
-   pseudo-random number generator.
+   The random functions listed in 
+   use a deterministic pseudo-random number generator.
It is fast but not suitable for cryptographic
applications; see the  module for a more
secure alternative.
If setseed() is called, the series of results of
-   subsequent random() calls in the current session
+   subsequent calls to these random functions in the current session
can be repeated by re-issuing setseed() with the same
argument.
Without any prior setseed() call in the same
-   session, the first random() call obtains a seed
+   session, the first call to any of these random functions obtains a seed
from a platform-dependent source of random bits.
-   These remarks hold equally for random_normal().
   
 
   
diff --git a/src/backend/utils/adt/Makefile b/src/backend/utils/adt/Makefile
index 199eae525d..610ccf2f79 100644
--- a/src/backend/utils/adt/Makefile
+++ b/src/backend/utils/adt/Makefile
@@ -82,6 +82,7 @@ OBJS = \
 	pg_lsn.o \
 	pg_upgr

pgsql: Fix BEFORE ROW trigger handling in cross-partition MERGE update.

2023-12-21 Thread Dean Rasheed

Fix BEFORE ROW trigger handling in cross-partition MERGE update.

Fix a bug during MERGE if a cross-partition update is attempted on a
partitioned table with a BEFORE DELETE ROW trigger that returns NULL,
to prevent the update. This would cause an error to be thrown, or an
assert failure in an assert-enabled build.

This was an oversight in 9321c79c86, which failed to properly
distinguish a DELETE prevented by a trigger from one prevented by a
concurrent update. Fix by having ExecDelete() return the TM_Result
status to ExecCrossPartitionUpdate(), so that it can distinguish the
two cases, and make ExecCrossPartitionUpdate() return the TM_Result
status to ExecUpdateAct(), so that it can return the correct status
from a concurrent update.

In addition, ensure that the command tag is correctly updated by
having ExecMergeMatched() pass canSetTag to ExecUpdateAct(), rather
than passing false, so that it updates the command tag if it does a
cross-partition update, making this code path in ExecMergeMatched()
consistent with ExecUpdate().

Per bug #18238 from Alexander Lakhin. Back-patch to v15, where MERGE
was introduced.

Dean Rasheed, reviewed by Richard Guo and Jian He.

Discussion: https://postgr.es/m/18238-2f2bdc7f720180b9%40postgresql.org

Branch
--
REL_16_STABLE

Details
---
https://git.postgresql.org/pg/commitdiff/7f07384dc2697abb405e47a7447c6edf7e07d5f9

Modified Files
--
src/backend/executor/nodeModifyTable.c | 20 +---
src/test/regress/expected/merge.out| 91 ++
src/test/regress/sql/merge.sql | 64 
3 files changed, 167 insertions(+), 8 deletions(-)

pgsql: Fix BEFORE ROW trigger handling in cross-partition MERGE update.

2023-12-21 Thread Dean Rasheed

Fix BEFORE ROW trigger handling in cross-partition MERGE update.

Fix a bug during MERGE if a cross-partition update is attempted on a
partitioned table with a BEFORE DELETE ROW trigger that returns NULL,
to prevent the update. This would cause an error to be thrown, or an
assert failure in an assert-enabled build.

This was an oversight in 9321c79c86, which failed to properly
distinguish a DELETE prevented by a trigger from one prevented by a
concurrent update. Fix by having ExecDelete() return the TM_Result
status to ExecCrossPartitionUpdate(), so that it can distinguish the
two cases, and make ExecCrossPartitionUpdate() return the TM_Result
status to ExecUpdateAct(), so that it can return the correct status
from a concurrent update.

In addition, ensure that the command tag is correctly updated by
having ExecMergeMatched() pass canSetTag to ExecUpdateAct(), rather
than passing false, so that it updates the command tag if it does a
cross-partition update, making this code path in ExecMergeMatched()
consistent with ExecUpdate().

Per bug #18238 from Alexander Lakhin. Back-patch to v15, where MERGE
was introduced.

Dean Rasheed, reviewed by Richard Guo and Jian He.

Discussion: https://postgr.es/m/18238-2f2bdc7f720180b9%40postgresql.org

Branch
--
REL_15_STABLE

Details
---
https://git.postgresql.org/pg/commitdiff/7e8c6d7af6588b8a7249cc3e1eb195cc32d120a2

Modified Files
--
src/backend/executor/nodeModifyTable.c | 20 +---
src/test/regress/expected/merge.out| 91 ++
src/test/regress/sql/merge.sql | 64 
3 files changed, 167 insertions(+), 8 deletions(-)

pgsql: Fix BEFORE ROW trigger handling in cross-partition MERGE update.

2023-12-21 Thread Dean Rasheed

Fix BEFORE ROW trigger handling in cross-partition MERGE update.

Fix a bug during MERGE if a cross-partition update is attempted on a
partitioned table with a BEFORE DELETE ROW trigger that returns NULL,
to prevent the update. This would cause an error to be thrown, or an
assert failure in an assert-enabled build.

This was an oversight in 9321c79c86, which failed to properly
distinguish a DELETE prevented by a trigger from one prevented by a
concurrent update. Fix by having ExecDelete() return the TM_Result
status to ExecCrossPartitionUpdate(), so that it can distinguish the
two cases, and make ExecCrossPartitionUpdate() return the TM_Result
status to ExecUpdateAct(), so that it can return the correct status
from a concurrent update.

In addition, ensure that the command tag is correctly updated by
having ExecMergeMatched() pass canSetTag to ExecUpdateAct(), rather
than passing false, so that it updates the command tag if it does a
cross-partition update, making this code path in ExecMergeMatched()
consistent with ExecUpdate().

Per bug #18238 from Alexander Lakhin. Back-patch to v15, where MERGE
was introduced.

Dean Rasheed, reviewed by Richard Guo and Jian He.

Discussion: https://postgr.es/m/18238-2f2bdc7f720180b9%40postgresql.org

Branch
--
master

Details
---
https://git.postgresql.org/pg/commitdiff/a0ff37173d7d412d53bc299fc611068be3bc7060

Modified Files
--
src/backend/executor/nodeModifyTable.c | 20 +---
src/test/regress/expected/merge.out| 91 ++
src/test/regress/sql/merge.sql | 64 
3 files changed, 167 insertions(+), 8 deletions(-)

Adding OLD/NEW support to RETURNING

2023-12-04 Thread Dean Rasheed

I have been playing around with the idea of adding support for OLD/NEW
to RETURNING, partly motivated by the discussion on the MERGE
RETURNING thread [1], but also because I think it would be a very
useful addition for other commands (UPDATE in particular).

This was discussed a long time ago [2], but that previous discussion
didn't lead to a workable patch, and so I have taken a different
approach here.

My first thought was that this would only really make sense for UPDATE
and MERGE, since OLD/NEW are pretty pointless for INSERT/DELETE
respectively. However...

1. For an INSERT with an ON CONFLICT ... DO UPDATE clause, returning
OLD might be very useful, since it provides a way to see which rows
conflicted, and return the old conflicting values.

2. If a DELETE is turned into an UPDATE by a rule (e.g., to mark rows
as deleted, rather than actually deleting them), then returning NEW
can also be useful. (I admit, this is a somewhat obscure use case, but
it's still possible.)

3. In a MERGE, we need to be able to handle all 3 command types anyway.

4. It really isn't any extra effort to support INSERT and DELETE.

So in the attached very rough patch (no docs, minimal testing) I have
just allowed OLD/NEW in RETURNING for all command types (except, I
haven't done MERGE here - I think that's best kept as a separate
patch). If there is no OLD/NEW row in a particular context, it just
returns NULLs. The regression tests contain examples of  1 & 2 above.


Based on Robert Haas' suggestion in [2], the patch works by adding a
new "varreturningtype" field to Var nodes. This field is set during
parse analysis of the returning clause, which adds new namespace
aliases for OLD and NEW, if tables with those names/aliases are not
already present. So the resulting Var nodes have the same
varno/varattno as they would normally have had, but a different
varreturningtype.

For the most part, the rewriter and parser are then untouched, except
for a couple of places necessary to ensure that the new field makes it
through correctly. In particular, none of this affects the shape of
the final plan produced. All of the work to support the new Var
returning type is done in the executor.

This turns out to be relatively straightforward, except for
cross-partition updates, which was a little trickier since the tuple
format of the old row isn't necessarily compatible with the new row,
which is in a different partition table and so might have a different
column order.

One thing that I've explicitly disallowed is returning OLD/NEW for
updates to foreign tables. It's possible that could be added in a
later patch, but I have no plans to support that right now.


One difficult question is what names to use for the new aliases. I
think OLD and NEW are the most obvious and natural choices. However,
there is a problem - if they are used in a trigger function, they will
conflict. In PL/pgSQL, this leads to an error like the following:

ERROR:  column reference "new.f1" is ambiguous
LINE 3: RETURNING new.f1, new.f4
  ^
DETAIL:  It could refer to either a PL/pgSQL variable or a table column.

That's the same error that you'd get if a different alias name had
been chosen, and it happened to conflict with a user-defined PL/pgSQL
variable, except that in that case, the user could just change their
variable name to fix the problem, which is not possible with the
automatically-added OLD/NEW trigger variables. As a way round that, I
added a way to optionally change the alias used in the RETURNING list,
using the following syntax:

  RETURNING [ WITH ( { OLD | NEW } AS output_alias [, ...] ) ]
* | output_expression [ [ AS ] output_name ] [, ...]

for example:

  RETURNING WITH (OLD AS o) o.id, o.val, ...

I'm not sure how good a solution that is, but the syntax doesn't look
too bad to me (somewhat reminiscent of a WITH-query), and it's only
necessary in cases where there is a name conflict.

The simpler solution would be to just pick different alias names to
start with. The previous thread seemed to settle on BEFORE/AFTER, but
I don't find those names particularly intuitive or appealing. Over on
[1], PREVIOUS/CURRENT was suggested, which I prefer, but they still
don't seem as natural as OLD/NEW.

So, as is often the case, naming things turns out to be the hardest
problem, which is why I quite like the idea of letting the user pick
their own name, if they need to. In most contexts, OLD and NEW will
work, so they won't need to.

Thoughts?

Regards,
Dean

[1] 
https://www.postgresql.org/message-id/flat/CAEZATCWePEGQR5LBn-vD6SfeLZafzEm2Qy_L_Oky2=qw2w3...@mail.gmail.com
[2] https://www.postgresql.org/message-id/flat/51822C0F.5030807%40gmail.com
diff --git a/src/backend/executor/execExpr.c b/src/backend/executor/execExpr.c
new file mode 100644
index 2c62b0c..7f6f2c5
--- a/src/backend/executor/execExpr.c
+++ b/src/backend/executor/execExpr.c
@@ -55,10 +55,15 @@
 
 typedef struct ExprSetupInfo
 {
-	/* Highest attribute numbers

Re: [PATCH] psql: Add tab-complete for optional view parameters

2023-11-28 Thread Dean Rasheed

On Tue, 28 Nov 2023 at 03:42, Shubham Khanna
 wrote:
>
> I reviewed the given Patch and it is working fine.
>

Thanks for checking. Patch pushed.

Regards,
Dean

pgsql: psql: Add tab completion for view options.

2023-11-28 Thread Dean Rasheed

psql: Add tab completion for view options.

Add support for tab completion of WITH (...) options to CREATE VIEW,
and for the corresponding SET/RESET (...) options in ALTER VIEW.

Christoph Heiss, reviewed by Melih Mutlu, Vignesh C, Jim Jones,
Mikhail Gribkov, David Zhang, Shubham Khanna, and me.

Discussion: https://postgr.es/m/a2075c5a-66f9-a564-f038-9ac044b03...@c8h4.io

Branch
--
master

Details
---
https://git.postgresql.org/pg/commitdiff/cd342474890f31a7364c0d1161334546822b639c

Modified Files
--
src/bin/psql/tab-complete.c | 50 +
1 file changed, 46 insertions(+), 4 deletions(-)

Re: [PATCH] psql: Add tab-complete for optional view parameters

2023-11-23 Thread Dean Rasheed

On Mon, 14 Aug 2023 at 18:34, David Zhang  wrote:
>
> it would be great to switch the order of the 3rd and the 4th line to make a
> better match for "CREATE" and "CREATE OR REPLACE" .
>

I took a look at this, and I think it's probably neater to keep the
"AS SELECT" completion for CREATE [OR REPLACE] VIEW xxx WITH (*)
separate from the already existing support for "AS SELECT" without
WITH.

A couple of other points:

1. It looks slightly neater, and works better, to complete one word at
a time -- e.g., "WITH" then "(", instead of "WITH (", since the latter
doesn't work if the user has already typed "WITH".

2. It should also complete with "=" after the option, where appropriate.

3. CREATE VIEW should offer "local" and "cascaded" after
"check_option" (though there's no point in doing likewise for the
boolean options, since they default to true, if present, and false
otherwise).

Attached is an updated patch, incorporating those comments.

Barring any further comments, I think this is ready for commit.

Regards,
Dean
From 2f143cbfe185c723419a8fffcf5cbcd921f08ea7 Mon Sep 17 00:00:00 2001
From: Christoph Heiss 
Date: Mon, 7 Aug 2023 20:37:19 +0200
Subject: [PATCH v4] psql: Add tab completion for view options.

Add support for tab completion of WITH (...) options to CREATE VIEW,
and the corresponding SET/RESET (...) support to ALTER VIEW.

Christoph Heiss, reviewed by Melih Mutlu, Vignesh C, Jim Jones,
Mikhail Gribkov, David Zhang, and me.

Discussion: https://postgr.es/m/a2075c5a-66f9-a564-f038-9ac044b03...@c8h4.io
---
 src/bin/psql/tab-complete.c | 50 ++---
 1 file changed, 46 insertions(+), 4 deletions(-)

diff --git a/src/bin/psql/tab-complete.c b/src/bin/psql/tab-complete.c
index 006e10f5d2..049801186c 100644
--- a/src/bin/psql/tab-complete.c
+++ b/src/bin/psql/tab-complete.c
@@ -1329,6 +1329,13 @@ static const char *const table_storage_parameters[] = {
 	NULL
 };
 
+/* Optional parameters for CREATE VIEW and ALTER VIEW */
+static const char *const view_optional_parameters[] = {
+	"check_option",
+	"security_barrier",
+	"security_invoker",
+	NULL
+};
 
 /* Forward declaration of functions */
 static char **psql_completion(const char *text, int start, int end);
@@ -2216,8 +2223,7 @@ psql_completion(const char *text, int start, int end)
 		COMPLETE_WITH("TO");
 	/* ALTER VIEW  */
 	else if (Matches("ALTER", "VIEW", MatchAny))
-		COMPLETE_WITH("ALTER COLUMN", "OWNER TO", "RENAME",
-	  "SET SCHEMA");
+		COMPLETE_WITH("ALTER COLUMN", "OWNER TO", "RENAME", "RESET", "SET");
 	/* ALTER VIEW xxx RENAME */
 	else if (Matches("ALTER", "VIEW", MatchAny, "RENAME"))
 		COMPLETE_WITH_ATTR_PLUS(prev2_wd, "COLUMN", "TO");
@@ -2233,6 +2239,21 @@ psql_completion(const char *text, int start, int end)
 	/* ALTER VIEW xxx RENAME COLUMN yyy */
 	else if (Matches("ALTER", "VIEW", MatchAny, "RENAME", "COLUMN", MatchAnyExcept("TO")))
 		COMPLETE_WITH("TO");
+	/* ALTER VIEW xxx RESET ( */
+	else if (Matches("ALTER", "VIEW", MatchAny, "RESET"))
+		COMPLETE_WITH("(");
+	/* Complete ALTER VIEW xxx SET with "(" or "SCHEMA" */
+	else if (Matches("ALTER", "VIEW", MatchAny, "SET"))
+		COMPLETE_WITH("(", "SCHEMA");
+	/* ALTER VIEW xxx SET|RESET ( yyy [= zzz] ) */
+	else if (Matches("ALTER", "VIEW", MatchAny, "SET|RESET", "("))
+		COMPLETE_WITH_LIST(view_optional_parameters);
+	else if (Matches("ALTER", "VIEW", MatchAny, "SET", "(", MatchAny))
+		COMPLETE_WITH("=");
+	else if (Matches("ALTER", "VIEW", MatchAny, "SET", "(", "check_option", "="))
+		COMPLETE_WITH("local", "cascaded");
+	else if (Matches("ALTER", "VIEW", MatchAny, "SET", "(", "security_barrier|security_invoker", "="))
+		COMPLETE_WITH("true", "false");
 
 	/* ALTER MATERIALIZED VIEW  */
 	else if (Matches("ALTER", "MATERIALIZED", "VIEW", MatchAny))
@@ -3531,14 +3552,35 @@ psql_completion(const char *text, int start, int end)
 	}
 
 /* CREATE VIEW --- is allowed inside CREATE SCHEMA, so use TailMatches */
-	/* Complete CREATE [ OR REPLACE ] VIEW  with AS */
+	/* Complete CREATE [ OR REPLACE ] VIEW  with AS or WITH */
 	else if (TailMatches("CREATE", "VIEW", MatchAny) ||
 			 TailMatches("CREATE", "OR", "REPLACE", "VIEW", MatchAny))
-		COMPLETE_WITH("AS");
+		COMPLETE_WITH("AS", "WITH");
 	/* Complete "CREATE [ OR REPLACE ] VIEW  AS with "SELECT" */
 	else if (TailMatches("CREATE", "VIEW", MatchAny, "AS") ||
 			 TailMatches("CREATE", "OR", "REPLACE", "VIEW", MatchAny, "AS"))
 		COMPLETE_WITH("SELECT");
+	/* CREATE [ OR REPLACE ] VIEW  WITH ( yyy [= zzz] ) */
+	else if (TailMatches("CREATE", "VIEW", MatchAny, "WITH") ||
+			 TailMatches("CREATE", "OR", "REPLACE", "VIEW", MatchAny, "WITH"))
+		COMPLETE_WITH("(");
+	else if (TailMatches("CREATE", "VIEW", MatchAny, "WITH", "(") ||
+			 TailMatches("CREATE", "OR", "REPLACE", "VIEW", MatchAny, "WITH", "("))
+		COMPLETE_WITH_LIST(view_optional_parameters);
+	else if (TailMatches("CREATE", "VIEW", MatchAny, "WITH", "(", "check_option") ||
+			 TailMatches("CREATE", "OR", "REPLACE", "VIEW", MatchAny,

pgsql: Guard against overflow in interval_mul() and interval_div().

2023-11-18 Thread Dean Rasheed

Guard against overflow in interval_mul() and interval_div().

Commits 146604ec43 and a898b409f6 added overflow checks to
interval_mul(), but not to interval_div(), which contains almost
identical code, and so is susceptible to the same kinds of
overflows. In addition, those checks did not catch all possible
overflow conditions.

Add additional checks to the "cascade down" code in interval_mul(),
and copy all the overflow checks over to the corresponding code in
interval_div(), so that they both generate "interval out of range"
errors, rather than returning bogus results.

Given that these errors are relatively easy to hit, back-patch to all
supported branches.

Per bug #18200 from Alexander Lakhin, and subsequent investigation.

Discussion: https://postgr.es/m/18200-5ea288c7b2d504b1%40postgresql.org

Branch
--
REL_12_STABLE

Details
---
https://git.postgresql.org/pg/commitdiff/f499d2b20b42c34a3941ca284ed58b95c0ce330c

Modified Files
--
src/backend/utils/adt/timestamp.c  | 70 --
src/test/regress/expected/interval.out | 13 +++
src/test/regress/sql/interval.sql  |  8 
3 files changed, 70 insertions(+), 21 deletions(-)

pgsql: Guard against overflow in interval_mul() and interval_div().

2023-11-18 Thread Dean Rasheed

Guard against overflow in interval_mul() and interval_div().

Commits 146604ec43 and a898b409f6 added overflow checks to
interval_mul(), but not to interval_div(), which contains almost
identical code, and so is susceptible to the same kinds of
overflows. In addition, those checks did not catch all possible
overflow conditions.

Add additional checks to the "cascade down" code in interval_mul(),
and copy all the overflow checks over to the corresponding code in
interval_div(), so that they both generate "interval out of range"
errors, rather than returning bogus results.

Given that these errors are relatively easy to hit, back-patch to all
supported branches.

Per bug #18200 from Alexander Lakhin, and subsequent investigation.

Discussion: https://postgr.es/m/18200-5ea288c7b2d504b1%40postgresql.org

Branch
--
REL_14_STABLE

Details
---
https://git.postgresql.org/pg/commitdiff/2ffcebdba4cfa70fb1b6049ce85343e569304049

Modified Files
--
src/backend/utils/adt/timestamp.c  | 69 +++---
src/test/regress/expected/interval.out | 13 +++
src/test/regress/sql/interval.sql  |  8 
3 files changed, 69 insertions(+), 21 deletions(-)

pgsql: Guard against overflow in interval_mul() and interval_div().

2023-11-18 Thread Dean Rasheed

Guard against overflow in interval_mul() and interval_div().

Commits 146604ec43 and a898b409f6 added overflow checks to
interval_mul(), but not to interval_div(), which contains almost
identical code, and so is susceptible to the same kinds of
overflows. In addition, those checks did not catch all possible
overflow conditions.

Add additional checks to the "cascade down" code in interval_mul(),
and copy all the overflow checks over to the corresponding code in
interval_div(), so that they both generate "interval out of range"
errors, rather than returning bogus results.

Given that these errors are relatively easy to hit, back-patch to all
supported branches.

Per bug #18200 from Alexander Lakhin, and subsequent investigation.

Discussion: https://postgr.es/m/18200-5ea288c7b2d504b1%40postgresql.org

Branch
--
REL_13_STABLE

Details
---
https://git.postgresql.org/pg/commitdiff/428770aadca974dd08d4dd002567edd79cd0e940

Modified Files
--
src/backend/utils/adt/timestamp.c  | 70 --
src/test/regress/expected/interval.out | 13 +++
src/test/regress/sql/interval.sql  |  8 
3 files changed, 70 insertions(+), 21 deletions(-)

pgsql: Guard against overflow in interval_mul() and interval_div().

2023-11-18 Thread Dean Rasheed

Guard against overflow in interval_mul() and interval_div().

Commits 146604ec43 and a898b409f6 added overflow checks to
interval_mul(), but not to interval_div(), which contains almost
identical code, and so is susceptible to the same kinds of
overflows. In addition, those checks did not catch all possible
overflow conditions.

Add additional checks to the "cascade down" code in interval_mul(),
and copy all the overflow checks over to the corresponding code in
interval_div(), so that they both generate "interval out of range"
errors, rather than returning bogus results.

Given that these errors are relatively easy to hit, back-patch to all
supported branches.

Per bug #18200 from Alexander Lakhin, and subsequent investigation.

Discussion: https://postgr.es/m/18200-5ea288c7b2d504b1%40postgresql.org

Branch
--
master

Details
---
https://git.postgresql.org/pg/commitdiff/b218fbb7a35fcf31539bfad12732038fe082a2eb

Modified Files
--
src/backend/utils/adt/timestamp.c  | 103 +++--
src/test/regress/expected/interval.out |  13 +
src/test/regress/sql/interval.sql  |   8 +++
3 files changed, 80 insertions(+), 44 deletions(-)

pgsql: Guard against overflow in interval_mul() and interval_div().

2023-11-18 Thread Dean Rasheed

Guard against overflow in interval_mul() and interval_div().

Commits 146604ec43 and a898b409f6 added overflow checks to
interval_mul(), but not to interval_div(), which contains almost
identical code, and so is susceptible to the same kinds of
overflows. In addition, those checks did not catch all possible
overflow conditions.

Add additional checks to the "cascade down" code in interval_mul(),
and copy all the overflow checks over to the corresponding code in
interval_div(), so that they both generate "interval out of range"
errors, rather than returning bogus results.

Given that these errors are relatively easy to hit, back-patch to all
supported branches.

Per bug #18200 from Alexander Lakhin, and subsequent investigation.

Discussion: https://postgr.es/m/18200-5ea288c7b2d504b1%40postgresql.org

Branch
--
REL_15_STABLE

Details
---
https://git.postgresql.org/pg/commitdiff/2851aa7d1fc3bd3dfac00ff2250a7e029ed6499f

Modified Files
--
src/backend/utils/adt/timestamp.c  | 69 +++---
src/test/regress/expected/interval.out | 13 +++
src/test/regress/sql/interval.sql  |  8 
3 files changed, 69 insertions(+), 21 deletions(-)

pgsql: Guard against overflow in interval_mul() and interval_div().

2023-11-18 Thread Dean Rasheed

Guard against overflow in interval_mul() and interval_div().

Commits 146604ec43 and a898b409f6 added overflow checks to
interval_mul(), but not to interval_div(), which contains almost
identical code, and so is susceptible to the same kinds of
overflows. In addition, those checks did not catch all possible
overflow conditions.

Add additional checks to the "cascade down" code in interval_mul(),
and copy all the overflow checks over to the corresponding code in
interval_div(), so that they both generate "interval out of range"
errors, rather than returning bogus results.

Given that these errors are relatively easy to hit, back-patch to all
supported branches.

Per bug #18200 from Alexander Lakhin, and subsequent investigation.

Discussion: https://postgr.es/m/18200-5ea288c7b2d504b1%40postgresql.org

Branch
--
REL_16_STABLE

Details
---
https://git.postgresql.org/pg/commitdiff/72d0c135bd7a9ab4602ea1bece0054bb1e8d372d

Modified Files
--
src/backend/utils/adt/timestamp.c  | 69 +++---
src/test/regress/expected/interval.out | 13 +++
src/test/regress/sql/interval.sql  |  8 
3 files changed, 69 insertions(+), 21 deletions(-)

Re: MERGE ... RETURNING

2023-11-18 Thread Dean Rasheed

On Fri, 17 Nov 2023 at 04:30, jian he  wrote:
>
> I think it should be:
> +   You will require the SELECT privilege on any column(s)
> +   of the data_source and
> +   target_table_name referred to
> +   in any condition or expression.
>

Ah, of course. As always, I'm blind to grammatical errors in my own
text, no matter how many times I read it. Thanks for checking!

Pushed.

The v13 patch still applies on top of this, so I won't re-post it.

Regards,
Dean

pgsql: doc: improve description of privileges for MERGE and update glos

2023-11-18 Thread Dean Rasheed

doc: improve description of privileges for MERGE and update glossary.

On the MERGE page, the description of the privileges required could be
taken to imply that the SELECT privilege is required on all columns of
the data source, whereas actually it is only required on the columns
referred to by conditions or expressions in the MERGE command. Re-word
it to make that a little clearer, and mention expressions as well as
conditions.

Also, add a glossary entry for MERGE, and nearby on the glossary page,
mention MERGE in the list of commands that cannot update a
materialized view.

Noted by Jian He. Patch by me, reviewed by Jian He.

Discussion: 
https://postgr.es/m/CACJufxHuSoRXKwr0MtSFLXuT2nFVWcVfEWhxg7qdP9h%2Bs3a%2BUw%40mail.gmail.com

Branch
--
REL_16_STABLE

Details
---
https://git.postgresql.org/pg/commitdiff/3b6728910ace14c64ffa43b538a84508f4fe6408

Modified Files
--
doc/src/sgml/glossary.sgml  | 21 +++--
doc/src/sgml/ref/merge.sgml | 10 +-
2 files changed, 24 insertions(+), 7 deletions(-)

pgsql: doc: improve description of privileges for MERGE and update glos

2023-11-18 Thread Dean Rasheed

doc: improve description of privileges for MERGE and update glossary.

On the MERGE page, the description of the privileges required could be
taken to imply that the SELECT privilege is required on all columns of
the data source, whereas actually it is only required on the columns
referred to by conditions or expressions in the MERGE command. Re-word
it to make that a little clearer, and mention expressions as well as
conditions.

Also, add a glossary entry for MERGE, and nearby on the glossary page,
mention MERGE in the list of commands that cannot update a
materialized view.

Noted by Jian He. Patch by me, reviewed by Jian He.

Discussion: 
https://postgr.es/m/CACJufxHuSoRXKwr0MtSFLXuT2nFVWcVfEWhxg7qdP9h%2Bs3a%2BUw%40mail.gmail.com

Branch
--
master

Details
---
https://git.postgresql.org/pg/commitdiff/4bc8f29088f8a2e9c89e6c0819337c4c8bf5f20a

Modified Files
--
doc/src/sgml/glossary.sgml  | 21 +++--
doc/src/sgml/ref/merge.sgml | 10 +-
2 files changed, 24 insertions(+), 7 deletions(-)

1 2 3 4 5 6 7 8 9 10 >

1 - 100 of 1515 matches

Mail list logo