pg_dump: Remove obsolete trigger support

2024-01-08 Thread Peter Eisentraut
In 30e7c175b81, support for pre-9.2 servers was removed from pg_dump. 
But I found that a lot of dead code was left for supporting dumping 
triggers from those old versions, presumably because that code was not 
behind straightforward versioned "if" branches.  This patch removes the 
rest of the unneeded code.

From 4fe58a45989d8cebb05d174e5e331160bf406023 Mon Sep 17 00:00:00 2001
From: Peter Eisentraut 
Date: Mon, 8 Jan 2024 23:57:09 +0100
Subject: [PATCH] pg_dump: Remove obsolete trigger support

Remove support for dumping triggers from pre-9.2 servers.  This should have
been removed as part of 30e7c175b81.
---
 src/bin/pg_dump/pg_dump.c | 194 +-
 src/bin/pg_dump/pg_dump.h |  10 --
 2 files changed, 2 insertions(+), 202 deletions(-)

diff --git a/src/bin/pg_dump/pg_dump.c b/src/bin/pg_dump/pg_dump.c
index 22d1e6cf922..a7d840d4cc8 100644
--- a/src/bin/pg_dump/pg_dump.c
+++ b/src/bin/pg_dump/pg_dump.c
@@ -7988,18 +7988,8 @@ getTriggers(Archive *fout, TableInfo tblinfo[], int numTables)
i_oid,
i_tgrelid,
i_tgname,
-   i_tgfname,
-   i_tgtype,
-   i_tgnargs,
-   i_tgargs,
-   i_tgisconstraint,
-   i_tgconstrname,
-   i_tgconstrrelid,
-   i_tgconstrrelname,
i_tgenabled,
i_tgispartition,
-   i_tgdeferrable,
-   i_tginitdeferred,
i_tgdef;
 
/*
@@ -8038,7 +8028,6 @@ getTriggers(Archive *fout, TableInfo tblinfo[], int numTables)
 		 */
 		appendPQExpBuffer(query,
 						  "SELECT t.tgrelid, t.tgname, "
-						  "t.tgfoid::pg_catalog.regproc AS tgfname, "
 						  "pg_catalog.pg_get_triggerdef(t.oid, false) AS tgdef, "
 						  "t.tgenabled, t.tableoid, t.oid, "
 						  "t.tgparentid <> 0 AS tgispartition\n"
@@ -8062,7 +8051,6 @@ getTriggers(Archive *fout, TableInfo tblinfo[], int numTables)
 		 */
 		appendPQExpBuffer(query,
 						  "SELECT t.tgrelid, t.tgname, "
-						  "t.tgfoid::pg_catalog.regproc AS tgfname, "
 						  "pg_catalog.pg_get_triggerdef(t.oid, false) AS tgdef, "
 						  "t.tgenabled, t.tableoid, t.oid, t.tgisinternal as tgispartition\n"
 						  "FROM unnest('%s'::pg_catalog.oid[]) AS src(tbloid)\n"
@@ -8083,7 +8071,6 @@ getTriggers(Archive *fout, TableInfo tblinfo[], int numTables)
 		 */
 		appendPQExpBuffer(query,
 						  "SELECT t.tgrelid, t.tgname, "
-						  "t.tgfoid::pg_catalog.regproc AS tgfname, "
 						  "pg_catalog.pg_get_triggerdef(t.oid, false) AS tgdef, "
 						  "t.tgenabled, t.tableoid, t.oid, t.tgisinternal as tgispartition "
 						  "FROM unnest('%s'::pg_catalog.oid[]) AS src(tbloid)\n"
@@ -8102,7 +8089,6 @@ getTriggers(Archive *fout, TableInfo tblinfo[], int numTables)
 		/* See above about pretty=true in pg_get_triggerdef */
 		appendPQExpBuffer(query,
 						  "SELECT t.tgrelid, t.tgname, "
-						  "t.tgfoid::pg_catalog.regproc AS tgfname, "
 						  "pg_catalog.pg_get_triggerdef(t.oid, false) AS tgdef, "
 						  "t.tgenabled, false as tgispartition, "
 						  "t.tableoid, t.oid "
@@ -8121,18 +8107,8 @@ getTriggers(Archive *fout, TableInfo tblinfo[], int numTables)
i_oid = PQfnumber(res, "oid");
i_tgrelid = PQfnumber(res, "tgrelid");
i_tgname = PQfnumber(res, "tgname");
-   i_tgfname = PQfnumber(res, "tgfname");
-   i_tgtype = PQfnumber(res, "tgtype");
-   i_tgnargs = PQfnumber(res, "tgnargs");
-   i_tgargs = PQfnumber(res, "tgargs");
-   i_tgisconstraint = PQfnumber(res, "tgisconstraint");
-   i_tgconstrname = PQfnumber(res, "tgconstrname");
-   i_tgconstrrelid = PQfnumber(res, "tgconstrrelid");
-   i_tgconstrrelname = PQfnumber(res, "tgconstrrelname");
i_tgenabled = 

Re: Make psql ignore trailing semicolons in \sf, \ef, etc

2024-01-08 Thread Laurenz Albe
On Mon, 2024-01-08 at 15:48 -0500, Tom Lane wrote:
> Is this enough of a bug to deserve back-patching?  I'm not sure.

I like the patch, but I wouldn't back-patch it.  I'd call the current
behavior a slight inconsistency rather than an outright bug, and I think
that we should be conservative with back-patching.

Yours,
Laurenz Albe




Re: INFORMATION_SCHEMA note

2024-01-08 Thread Daniel Gustafsson
> On 9 Jan 2024, at 00:54, Tatsuo Ishii  wrote:
> 
>>> On 4 Jan 2024, at 13:39, Tatsuo Ishii  wrote:
>> 
>>>> Attached is the patch that does this.
>> 
>> I don't think the patch was attached?
>> 
>>> Any objection?
>> 
>> I didn't study the RFC in depth but as expected it seems to back up your 
>> change
>> so the change seems reasonable.
> 
> Oops. Sorry. Patch attached.

That's exactly what I expected it to be, and it LGTM.

--
Daniel Gustafsson





Re: Parallelize correlated subqueries that execute within each worker

2024-01-08 Thread vignesh C
On Tue, 4 Jul 2023 at 06:56, James Coleman  wrote:
>
> On Sun, Jun 11, 2023 at 10:23 PM James Coleman  wrote:
> >
> > ...
> > > And while trying the v9 patch I came across a crash with the query
> > > below.
> > >
> > > set min_parallel_table_scan_size to 0;
> > > set parallel_setup_cost to 0;
> > > set parallel_tuple_cost to 0;
> > >
> > > explain (costs off)
> > > select * from pg_description t1 where objoid in
> > > (select objoid from pg_description t2 where t2.description = 
> > > t1.description);
> > >QUERY PLAN
> > > 
> > >  Seq Scan on pg_description t1
> > >Filter: (SubPlan 1)
> > >SubPlan 1
> > >  ->  Gather
> > >Workers Planned: 2
> > >->  Parallel Seq Scan on pg_description t2
> > >  Filter: (description = t1.description)
> > > (7 rows)
> > >
> > > select * from pg_description t1 where objoid in
> > > (select objoid from pg_description t2 where t2.description = 
> > > t1.description);
> > > WARNING:  terminating connection because of crash of another server 
> > > process
> > >
> > > Seems something is wrong when extracting the argument from the Param in
> > > parallel worker.
> >
> > With what I'm trying to change I don't think this plan should ever be
> > generated since it means we'd have to pass a param from the outer seq
> > scan into the parallel subplan, which we can't do (currently).
> >
> > I've attached the full backtrace to the email, but as you hinted at
> > the parallel worker is trying to get the param (in this case
> > detoasting it), but the param doesn't exist on the worker, so it seg
> > faults. Looking at this further I think there's an existing test case
> > that exposes the misplanning here (the one right under the comment
> > "Parallel Append is not to be used when the subpath depends on the
> > outer param" in select_parallel.sql), but it doesn't seg fault because
> > the param is an integer, doesn't need to be detoasted, and therefore
> > (I think) we skate by (but probably with wrong results in depending on
> > the dataset).
> >
> > Interestingly this is one of the existing test queries my original
> > patch approach didn't change, so this gives me something specific to
> > work with improving the path. Thanks for testing this and bringing
> > this to my attention!
>
> Here's what I've found debugging this:
>
> There's only a single gather path ever created when planning this
> query, making it easy to know which one is the problem. That gather
> path is created with this stacktrace:
>
> frame #0: 0x000105291590
> postgres`create_gather_path(root=0x00013081ae78,
> rel=0x00013080c8e8, subpath=0x00013081c080,
> target=0x00013081c8c0, required_outer=0x,
> rows=0x) at pathnode.c:1971:2
> frame #1: 0x000105208e54
> postgres`generate_gather_paths(root=0x00013081ae78,
> rel=0x00013080c8e8, override_rows=false) at allpaths.c:3097:4
> frame #2: 0x0001052090ec
> postgres`generate_useful_gather_paths(root=0x00013081ae78,
> rel=0x00013080c8e8, override_rows=false) at allpaths.c:3241:2
> frame #3: 0x000105258754
> postgres`apply_scanjoin_target_to_paths(root=0x00013081ae78,
> rel=0x00013080c8e8, scanjoin_targets=0x00013081c978,
> scanjoin_targets_contain_srfs=0x,
> scanjoin_target_parallel_safe=true, tlist_same_exprs=true) at
> planner.c:7696:3
> frame #4: 0x0001052533cc
> postgres`grouping_planner(root=0x00013081ae78, tuple_fraction=0.5)
> at planner.c:1611:3
> frame #5: 0x000105251e9c
> postgres`subquery_planner(glob=0x0001308188d8,
> parse=0x00013080caf8, parent_root=0x00013080cc38,
> hasRecursion=false, tuple_fraction=0.5) at planner.c:1062:2
> frame #6: 0x00010526b134
> postgres`make_subplan(root=0x00013080cc38,
> orig_subquery=0x00013080ff58, subLinkType=ANY_SUBLINK,
> subLinkId=0, testexpr=0x00013080d848, isTopQual=true) at
> subselect.c:221:12
> frame #7: 0x000105268b8c
> postgres`process_sublinks_mutator(node=0x00013080d6d8,
> context=0x00016b0998f8) at subselect.c:1950:10
> frame #8: 0x000105268ad8
> postgres`SS_process_sublinks(root=0x00013080cc38,
> expr=0x00013080d6d8, isQual=true) at subselect.c:1923:9
> frame #9: 0x0001052527b8
> postgres`preprocess_expression(root=0x00013080cc38,
> expr=0x00013080d6d8, kind=0) at planner.c:1169:10
> frame #10: 0x000105252954
> postgres`preprocess_qual_conditions(root=0x00013080cc38,
> jtnode=0x00013080d108) at planner.c:1214:14
> frame #11: 0x000105251580
> postgres`subquery_planner(glob=0x0001308188d8,
> parse=0x000137010d68, parent_root=0x,
> hasRecursion=false, tuple_fraction=0) at planner.c:832:2
> frame #12: 0x00010525042c
> postgres`standard_planner(parse=0x000137010d68,
> query_string="explain (costs 

RE: speed up a logical replica setup

2024-01-08 Thread Hayato Kuroda (Fujitsu)
Dear Amit,

> > > I don't see any harm in users giving those information but we should
> > > have some checks to ensure that the server is in standby mode and is
> > > running locally. The other related point is do we need to take input
> > > for the target cluster directory from the user? Can't we fetch that
> > > information once we are connected to standby?
> >
> > I think that functions like inet_client_addr() might be usable, but it
> > returns NULL only when the connection is via a Unix-domain socket. Can we restrict
> > pg_subscriber to use such a socket?
> >
> 
> Good question. So, IIUC, this tool has a requirement to run locally
> where the standby is present because we want to write the recovery.conf file.
> I am not sure if it is a good idea to have a restriction to use only
> the unix domain socket as users need to set up the standby for that by
> configuring unix_socket_directories. It is fine if we can't ensure
> that it is running locally but we should at least ensure that the
> server is a physical standby node to avoid the problems as Shlok has
> reported.

While thinking more about it, I found that we have not defined a policy on
whether users may connect to the target while pg_subscriber is running. What
should it be? If connections should be avoided, some parameters like
listen_addresses and unix_socket_permissions should be restricted like
start_postmaster() in pg_upgrade/server.c. Also, the port number should be
changed to another value as well.

Personally, I vote for rejecting connections while pg_subscriber runs.
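
If we go that route, a minimal sketch of starting the target postmaster with
connections effectively disabled, mirroring start_postmaster() in
pg_upgrade/server.c (the variable names and buffer size are illustrative):

	char		cmd[MAXPGPATH * 2];

	/*
	 * Sketch only: allow nothing but local Unix-socket connections with
	 * restrictive permissions while pg_subscriber works, the way
	 * pg_upgrade's start_postmaster() does.
	 */
	snprintf(cmd, sizeof(cmd),
			 "\"%s/pg_ctl\" -w -l \"%s\" -D \"%s\" "
			 "-o \"-p %d -c listen_addresses='' "
			 "-c unix_socket_permissions=0700\" start",
			 bindir, logfile, datadir, port);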

> On a related point, I see that the patch stops the standby server (if
> it is running) before starting with subscriber-side steps. I was
> wondering if users can object to it that there was some important data
> replication in progress which this tool has stopped. Now, OTOH,
> anyway, once the user uses pg_subscriber, the standby server will be
> converted to a subscriber, so it may not be useful as a physical
> replica. Do you or others have any thoughts on this matter?

I assumed that connections should be closed before running pg_subscriber. If so,
it may be better to just fail the command when the physical standby has already
been started. There is no single answer as to whether data replication and user
queries should stop; users should choose the stop option based on their policy,
and then pg_subscriber can start the postmaster.
pg_upgrade does the same thing in setup().



Further comment:
According to the docs, pg_subscriber is currently listed among the client
applications. But based on the definition below, I feel it should be on the
"PostgreSQL Server Applications" page. What do you think? The definition is:

>
This part contains reference information for PostgreSQL server applications and
support utilities. These commands can only be run usefully on the host where the
database server resides. Other utility programs are listed in PostgreSQL Client
Applications.
>

Best Regards,
Hayato Kuroda
FUJITSU LIMITED



Re: Relation bulk write facility

2024-01-08 Thread vignesh C
On Sat, 25 Nov 2023 at 06:49, Heikki Linnakangas  wrote:
>
> On 19/11/2023 02:04, Andres Freund wrote:
> > On 2023-11-17 11:37:21 +0100, Heikki Linnakangas wrote:
> >> The new facility makes it easier to optimize bulk loading, as the
> >> logic for buffering, WAL-logging, and syncing the relation only needs
> >> to be implemented once. It's also less error-prone: We have had a
> >> number of bugs in how a relation is fsync'd - or not - at the end of a
> >> bulk loading operation. By centralizing that logic to one place, we
> >> only need to write it correctly once.
> >
> > One thing I'd like to use the centralized handling for is to track such
> > writes in pg_stat_io. I don't mean as part of the initial patch, just that
> > that's another reason I like the facility.
>
> Oh I didn't realize they're not counted at the moment.
>
> >> +bulkw = bulkw_start_smgr(dst, forkNum, use_wal);
> >> +
> >>  nblocks = smgrnblocks(src, forkNum);
> >>
> >>  for (blkno = 0; blkno < nblocks; blkno++)
> >>  {
> >> +Pagepage;
> >> +
> >>  /* If we got a cancel signal during the copy of the data, 
> >> quit */
> >>  CHECK_FOR_INTERRUPTS();
> >>
> >> -smgrread(src, forkNum, blkno, buf.data);
> >> +page = bulkw_alloc_buf(bulkw);
> >> +smgrread(src, forkNum, blkno, page);
> >>
> >>  if (!PageIsVerifiedExtended(page, blkno,
> >>  
> >> PIV_LOG_WARNING | PIV_REPORT_STAT))
> >> @@ -511,30 +514,9 @@ RelationCopyStorage(SMgrRelation src, SMgrRelation 
> >> dst,
> >>   * page this is, so we have to log the full page including 
> >> any unused
> >>   * space.
> >>   */
> >> -if (use_wal)
> >> -log_newpage(&dst->smgr_rlocator.locator, forkNum, blkno, page, false);
> >> -
> >> -PageSetChecksumInplace(page, blkno);
> >> -
> >> -/*
> >> - * Now write the page.  We say skipFsync = true because 
> >> there's no
> >> - * need for smgr to schedule an fsync for this write; we'll 
> >> do it
> >> - * ourselves below.
> >> - */
> >> -smgrextend(dst, forkNum, blkno, buf.data, true);
> >> +bulkw_write(bulkw, blkno, page, false);
> >
> > I wonder if bulkw_alloc_buf() is a good name - if you naively read this
> > change, it looks like it'll just leak memory. It also might be taken to be
> > valid until freed, which I don't think is the case?
>
> Yeah, I'm not very happy with this interface. The model is that you get
> a buffer to write to by calling bulkw_alloc_buf(). Later, you hand it
> over to bulkw_write(), which takes ownership of it and frees it later.
> There is no other function to free it, although currently the buffer is
> just palloc'd so you could call pfree on it.
>
> However, I'd like to not expose that detail to the callers. I'm
> imagining that in the future we might optimize further, by having a
> larger e.g. 1 MB buffer, and carve the 8kB blocks from that. Then
> opportunistically, if you fill the buffers sequentially, bulk_write.c
> could do one smgrextend() to write the whole 1 MB chunk.
>
> I renamed it to bulkw_get_buf() now, and made it return a new
> BulkWriteBuffer typedef instead of a plain Page. The point of the new
> typedef is to distinguish a buffer returned by bulkw_get_buf() from a
> Page or char[BLCKSZ] that you might palloc on your own. That indeed
> revealed some latent bugs in gistbuild.c where I had mixed up buffers
> returned by bulkw_alloc_buf() and palloc'd buffers.
>
> (The previous version of this patch called a different struct
> BulkWriteBuffer, but I renamed that to PendingWrite; see below. Don't be
> confused!)
>
> I think this helps a little, but I'm still not very happy with it. I'll
> give it some more thought after sleeping over it, but in the meanwhile,
> I'm all ears for suggestions.
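
For readers following the API discussion, the caller-facing pattern is
roughly this (a sketch assembled from the names used in this thread;
BulkWriteState and bulkw_finish() are assumed names, not the actual patch):

	BulkWriteState *bulkw;		/* struct name assumed */
	BlockNumber blkno;

	bulkw = bulkw_start_smgr(dst, forkNum, use_wal);

	for (blkno = 0; blkno < nblocks; blkno++)
	{
		BulkWriteBuffer buf = bulkw_get_buf(bulkw);

		/* ... fill the buffer with the page contents ... */

		/*
		 * Ownership of buf passes to bulk_write.c, which WAL-logs,
		 * checksums, writes, and eventually frees it.
		 */
		bulkw_write(bulkw, blkno, buf, false);
	}
	bulkw_finish(bulkw);		/* assumed: flush pending writes, fsync */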
>
> >> +/*-
> >> + *
> >> + * bulk_write.c
> >> + *Efficiently and reliably populate a new relation
> >> + *
> >> + * The assumption is that no other backends access the relation while we 
> >> are
> >> + * loading it, so we can take some shortcuts.  Alternatively, you can use 
> >> the
> >> + * buffer manager as usual, if performance is not critical, but you must 
> >> not
> >> + * mix operations through the buffer manager and the bulk loading 
> >> interface at
> >> + * the same time.
> >
> >  From "Alternatively" onward this is is somewhat confusing.
>
> Rewrote that to just "Do not mix operations through the regular buffer
> manager and the bulk loading interface!"
>
> >> + * One tricky point is that because we bypass the buffer manager, we need 
> >> to
> >> + * register the relation for fsyncing at the next checkpoint ourselves, 
> >> and
> >> + * make sure that the relation is correctly fsync by us or 

Re: SQL:2011 application time

2024-01-08 Thread vignesh C
On Sat, 6 Jan 2024 at 05:50, Paul Jungwirth  
wrote:
>
> Getting caught up on reviews from November and December:
>
> On 11/19/23 22:57, jian he wrote:
>  >
>  > I believe the following part should fail. Similar tests on
>  > src/test/regress/sql/generated.sql. line begin 347.
>  >
>  > drop table if exists gtest23a,gtest23x cascade;
>  > CREATE TABLE gtest23a (x int4range, y int4range,
>  > CONSTRAINT gtest23a_pk PRIMARY KEY (x, y WITHOUT OVERLAPS));
>  > CREATE TABLE gtest23x (a int4range, b int4range GENERATED ALWAYS AS
>  > ('empty') STORED,
>  > FOREIGN KEY (a, PERIOD b ) REFERENCES gtest23a(x, PERIOD y) ON UPDATE
>  > CASCADE);  -- should be error?
>
> Okay, I've added a restriction for temporal FKs too. But note this will
> change once the PERIODs patch (the last one here) is finished. When the
> generated column is for a PERIOD, there will be logic to "reroute" the
> updates to the constituent start/end columns instead.
>
>  > begin;
>  > drop table if exists fk, pk cascade;
>  > CREATE TABLE pk (id int4range, valid_at int4range,
>  > CONSTRAINT pk_pk PRIMARY KEY (id, valid_at WITHOUT OVERLAPS)
>  > );
>  > CREATE TABLE fk (
>  > id int4range,valid_at tsrange, parent_id int4range,
>  > CONSTRAINT fk FOREIGN KEY (parent_id, valid_at)
>  >  REFERENCES pk
>  > );
>  > rollback;
>  > --
>  > the above query will return an error: number of referencing and
>  > referenced columns for foreign key disagree.
>  > but if you look at it closely, primary key and foreign key columns both 
> are two!
>  > The error should be saying valid_at should be specified with "PERIOD".
>
> Ah okay, thanks for the clarification! This is tricky because the user
> left out the PERIOD on the fk side, and left out the entire pk side, so
> those columns are just implicit. So there is no PERIOD anywhere.
> But I agree that if the pk has WITHOUT OVERLAPS, we should expect a
> corresponding PERIOD modifier on the fk side and explain that that's
> what's missing. The attached patches include that.
>
>  > I found out other issues in v18.
>  > I first do `git apply` then  `git diff --check`, there is a white
>  > space error in v18-0005.
>
> Fixed, thanks!
>
>  > You also need to change update.sgml and delete.sgml Outputs 
> part.
>  > Since at most, it can return 'UPDATE 3' or 'DELETE 3'.
>
> This doesn't sound correct to me. An UPDATE or DELETE can target many
> rows. Also I don't think the inserted "leftovers" should be included in
> these counts. They represent the rows updated/deleted.
>
>  > --the following query should work?
>  > drop table pk;
>  > CREATE table pk(a numrange PRIMARY key,b text);
>  > insert into pk values('[1,10]');
>  > create or replace function demo1() returns void as $$
>  > declare lb numeric default 1; up numeric default 3;
>  > begin
>  >  update pk for portion of a from lb to up set b = 'lb_to_up';
>  >  return;
>  > end
>  > $$ language plpgsql;
>  > select * from demo1();
>
> Hmm this is a tough one. It is correct that the `FROM __ TO __` values cannot 
> be column references.
> They are computed up front, not per row. One reason is they are used to 
> search the table. In fact
> the standard basically allows nothing but literal strings here. See section 
> 14.14, page 971 then
> look up  on page 348 and  on page 
> 308. The most
> flexibility you get is you can add/subtract an interval to the datetime 
> literal. We are already well
> past that by allowing expressions, (certain) functions, parameters, etc.
>
> OTOH in your plpgsql example they are not really columns. They just get 
> represented as ColumnRefs
> and then passed to transformColumnRef. I'm surprised plpgsql does it that 
> way. As a workaround you
> could use `EXECUTE format(...)`, but I'd love to make that work as you show 
> instead. I'll keep
> working on this one but it's not done yet. Perhaps I can move the restriction 
> into
> analysis/planning. If anyone has any advice it'd be welcome.
>
> On 12/6/23 05:22, jian he wrote:
>  > this TODO:
>  >   * TODO: It sounds like FOR PORTION OF might need to do something here 
> too?
>  > based on comments on ExprContext. I refactor a bit, and solved this TODO.
>
> The patch looks wrong to me. We need the range targeted by `FROM __
> TO __` to live for the whole statement, not just one tuple (see just
> above). That's why it gets computed in the Init function node.
>
> I don't think that TODO is needed anymore at all. Older versions of the
> patch had more expressions besides this one, and I think it was those I
> was concerned about. So I've removed the comment here.
>
>  > trying to address the following TODO:
>  > // TODO: Need to save context->mtstate->mt_transition_capture? (See
>  > comment on ExecInsert)
>  >
>  > but failed.
>  > I also attached the trial, and also added the related test.
>  >
>  > You can also use the test to check portion update with insert trigger
>  > with "referencing old table as old_table new table as new_table"
>  > situation.
>
> Thank you for the test 

Re: Skipping schema changes in publication

2024-01-08 Thread vignesh C
On Fri, 20 Jan 2023 at 15:30, vignesh C  wrote:
>
> On Wed, 16 Nov 2022 at 15:35, vignesh C  wrote:
> >
> > On Wed, 16 Nov 2022 at 09:34, Ian Lawrence Barwick  
> > wrote:
> > >
> > > 2022年11月7日(月) 22:39 vignesh C :
> > > >
> > > > On Fri, 4 Nov 2022 at 08:19, Ian Lawrence Barwick  
> > > > wrote:
> > > > >
> > > > > Hi
> > > > >
> > > > > cfbot reports the patch no longer applies [1]. As CommitFest 2022-11 
> > > > > is
> > > > > currently underway, this would be an excellent time to update the 
> > > > > patch.
> > > > >
> > > > > [1] http://cfbot.cputube.org/patch_40_3646.log
> > > >
> > > > Here is an updated patch which is rebased on top of HEAD.
> > >
> > > Thanks for the updated patch.
> > >
> > > While reviewing the patch backlog, we have determined that this patch adds
> > > one or more TAP tests but has not added the test to the "meson.build" 
> > > file.
> >
> > Thanks, I have updated the meson.build to include the TAP test. The
> > attached patch has the changes for the same.
>
> The patch was not applying on top of HEAD, attached a rebased version.

As I did not see much interest from others, I'm withdrawing this patch
for now. But if there is any interest from others in the future, I would be
more than happy to work on this feature.

Regards,
Vignesh




Re: Speed up transaction completion faster after many relations are accessed in a transaction

2024-01-08 Thread vignesh C
On Thu, 9 Nov 2023 at 21:48, Heikki Linnakangas  wrote:
>
> On 18/09/2023 07:08, David Rowley wrote:
> > On Fri, 15 Sept 2023 at 22:37, Heikki Linnakangas  wrote:
> >>> I've added a call to LockAssertNoneHeld(false) in there.
> >>
> >> I don't see it in the patch?
> >
> > hmm. I must've git format-patch before committing that part.
> >
> > I'll try that again... see attached.
>
> This needed a rebase after my ResourceOwner refactoring. Attached.
>
> A few quick comments:
>
> - It would be nice to add a test for the issue that you fixed in patch
> v7, i.e. if you prepare a transaction while holding session-level locks.
>
> - GrantLockLocal() now calls MemoryContextAlloc(), which can fail if you
> are out of memory. Is that handled gracefully or is the lock leaked?

CFBot shows one of the test has aborted at [1] with:
[20:54:28.535] Core was generated by `postgres: subscriber: logical
replication apply worker for subscription 16397 '.
[20:54:28.535] Program terminated with signal SIGABRT, Aborted.
[20:54:28.535] #0  __GI_raise (sig=sig@entry=6) at
../sysdeps/unix/sysv/linux/raise.c:50
[20:54:28.535] Download failed: Invalid argument.  Continuing without
source file ./signal/../sysdeps/unix/sysv/linux/raise.c.
[20:54:28.627]
[20:54:28.627] Thread 1 (Thread 0x7f0ea02d1a40 (LWP 50984)):
[20:54:28.627] #0  __GI_raise (sig=sig@entry=6) at
../sysdeps/unix/sysv/linux/raise.c:50
...
...
[20:54:28.627] #2  0x5618e989d62f in ExceptionalCondition
(conditionName=conditionName@entry=0x5618e9b40f70
"dlist_is_empty(&(MyProc->myProcLocks[i]))",
fileName=fileName@entry=0x5618e9b40ec0
"../src/backend/storage/lmgr/proc.c", lineNumber=lineNumber@entry=856)
at ../src/backend/utils/error/assert.c:66
[20:54:28.627] No locals.
[20:54:28.627] #3  0x5618e95e6847 in ProcKill (code=, arg=) at ../src/backend/storage/lmgr/proc.c:856
[20:54:28.627] i = 
[20:54:28.627] proc = 
[20:54:28.627] procgloballist = 
[20:54:28.627] __func__ = "ProcKill"
[20:54:28.627] #4  0x5618e959ebcc in shmem_exit
(code=code@entry=1) at ../src/backend/storage/ipc/ipc.c:276
[20:54:28.627] __func__ = "shmem_exit"
[20:54:28.627] #5  0x5618e959ecd0 in proc_exit_prepare
(code=code@entry=1) at ../src/backend/storage/ipc/ipc.c:198
[20:54:28.627] __func__ = "proc_exit_prepare"
[20:54:28.627] #6  0x5618e959ee8e in proc_exit (code=code@entry=1)
at ../src/backend/storage/ipc/ipc.c:111
[20:54:28.627] __func__ = "proc_exit"
[20:54:28.627] #7  0x5618e94aa54d in BackgroundWorkerMain () at
../src/backend/postmaster/bgworker.c:805
[20:54:28.627] local_sigjmp_buf = {{__jmpbuf =
{94665009627112, -3865857745677845768, 0, 0, 140732736634980, 1,
3865354362587970296, 7379258256398875384}, __mask_was_saved = 1,
__saved_mask = {__val = {18446744066192964099, 94665025527920,
94665025527920, 94665025527920, 0, 94665025528120, 8192, 1,
94664997686410, 94665009627040, 94664997622076, 94665025527920, 1, 0,
0, 140732736634980
[20:54:28.627] worker = 0x5618eb37c570
[20:54:28.627] entrypt = 
[20:54:28.627] __func__ = "BackgroundWorkerMain"
[20:54:28.627] #8  0x5618e94b495c in do_start_bgworker
(rw=rw@entry=0x5618eb3b73c8) at
../src/backend/postmaster/postmaster.c:5697
[20:54:28.627] worker_pid = 
[20:54:28.627] __func__ = "do_start_bgworker"
[20:54:28.627] #9  0x5618e94b4c32 in maybe_start_bgworkers () at
../src/backend/postmaster/postmaster.c:5921
[20:54:28.627] rw = 0x5618eb3b73c8
[20:54:28.627] num_launched = 0
[20:54:28.627] now = 0
[20:54:28.627] iter = {cur = 0x5618eb3b79a8, next =
0x5618eb382a20, prev = 0x5618ea44a980 }
[20:54:28.627] #10 0x5618e94b574a in process_pm_pmsignal () at
../src/backend/postmaster/postmaster.c:5073
[20:54:28.627] __func__ = "process_pm_pmsignal"
[20:54:28.627] #11 0x5618e94b5f4a in ServerLoop () at
../src/backend/postmaster/postmaster.c:1760

[1] - https://cirrus-ci.com/task/5118173163290624?logs=cores#L51

Regards,
Vignesh




Re: Removing unneeded self joins

2024-01-08 Thread Alexander Korotkov
On Tue, Jan 9, 2024 at 6:00 AM Alexander Lakhin  wrote:
> 09.01.2024 01:09, Alexander Korotkov wrote:
> > Fixed in 30b4955a46.
>
> Thank you for fixing that!
>
> I've found another anomaly introduced with d3d55ce57. This query:
> CREATE TABLE t(a int PRIMARY KEY, b int);
> INSERT INTO t VALUES  (1, 1), (2, 1);
>
> WITH t1 AS (SELECT * FROM t)
> UPDATE t SET b = t1.b + 1 FROM t1
> WHERE t.a = t1.a RETURNING t.a, t1.b;
>
> gives "ERROR:  variable not found in subplan target lists" on d3d55ce57, but
> starting from a7928a57b it gives an incorrect result:
>   a | b
> ---+---
>   1 | 2
>   2 | 2
> (2 rows)

I see.  It seems to be not safe to apply SJE to the modify table
target relation because it could use a different snapshot for the
RETURNING clause.  I think we should just forbid SJE to involve the
modify table target relation.  I'm planning to fix this later today.
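
Conceptually, the guard could look like this (a sketch; the surrounding
variable names are assumptions, not the actual patch):

	/*
	 * Sketch only: skip self-join elimination when either side of the
	 * join is the target relation of the UPDATE/DELETE, since RETURNING
	 * may need values from the other relation under a different snapshot.
	 */
	if (inner->relid == root->parse->resultRelation ||
		outer->relid == root->parse->resultRelation)
		return;					/* leave this join pair alone */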

--
Regards,
Alexander Korotkov




Re: Emit fewer vacuum records by reaping removable tuples during pruning

2024-01-08 Thread Michael Paquier
On Mon, Jan 08, 2024 at 03:50:47PM -0500, Robert Haas wrote:
> Hmm, interesting. I haven't had time to study this fully today, but I
> think 0001 looks fine and could just be committed. Hooray for killing
> useless variables with dumb names.

I've been looking at 0001 a couple of weeks ago and thought that it
was fine because there's only one caller of lazy_scan_prune() and one
caller of lazy_scan_noprune() so all the code paths were covered.

+   /* rel truncation is unsafe */
+   if (hastup)
+   vacrel->nonempty_pages = blkno + 1;

Except for this comment that I found misleading because this is not
about the fact that truncation is unsafe, it's about correctly
tracking the last block where we have tuples to ensure a correct
truncation.  Perhaps this could just reuse "Remember the location of
the last page with nonremovable tuples"?  If people object to that,
feel free.
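
Applied to the snippet above, that would read:

	/* Remember the location of the last page with nonremovable tuples */
	if (hastup)
		vacrel->nonempty_pages = blkno + 1;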
--
Michael


signature.asc
Description: PGP signature


RE: Random pg_upgrade test failure on drongo

2024-01-08 Thread Hayato Kuroda (Fujitsu)
Dear Amit, Alexander,

> > We get the effect discussed when the background writer process decides to
> > flush a file buffer for pg_largeobject during stage 1.
> > (Thus, if a checkpoint somehow happened to occur during CREATE DATABASE,
> > the result must be the same.)
> > And another important factor is shared_buffers = 1MB (set during the test).
> > With the default setting of 128MB I couldn't see the failure.
> >
> > It can be reproduced easily (on old Windows versions) just by running
> > pg_upgrade in a loop (I've got failures on iterations 22, 37, 17 (with the
> > default cluster)).
> > If an old cluster contains dozen of databases, this increases the failure
> > probability significantly (with 10 additional databases I've got failures
> > on iterations 4, 1, 6).
> >
> 
> I don't have an old Windows environment to test but I agree with your
> analysis and theory. The question is what should we do for these new
> random BF failures? I think we should set bgwriter_lru_maxpages to 0
> and checkpoint_timeout to 1hr for these new tests. Doing some invasive
> fix as part of this doesn't sound reasonable because this is an
> existing problem and there seems to be another patch by Thomas that
> probably deals with the root cause of the existing problem [1] as
> pointed out by you.
> 
> [1] - https://commitfest.postgresql.org/40/3951/

Based on the suggestion by Amit, I have created a patch with the alternative
approach. It just adjusts GUC settings. The reported failure is only in
003_logical_slots, but the patch also includes changes for the recently added
test, 004_subscription. IIUC, there is a possibility that 004 would fail as
well.
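
Presumably the GUC changes amount to something like the following in the
tests' node configuration (values per the suggestion upthread; the exact
form in the patch may differ):

# keep the background writer and checkpointer from flushing buffers
# while pg_upgrade runs, so the Windows rename hazard cannot be hit
bgwriter_lru_maxpages = 0
checkpoint_timeout = 1h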

Per our understanding, this patch can stop random failures. Alexander, can you
test for the confirmation?

Best Regards,
Hayato Kuroda
FUJITSU LIMITED



0001-Add-GUC-settings-to-avoid-random-failures-on-Windows.patch
Description:  0001-Add-GUC-settings-to-avoid-random-failures-on-Windows.patch


Re: SQL:2011 application time

2024-01-08 Thread jian he
On Tue, Jan 9, 2024 at 2:54 AM Paul Jungwirth
 wrote:
>
> On 1/8/24 06:54, jian he wrote:
>  > On Fri, Jan 5, 2024 at 1:06 PM jian he  wrote:
>  >
>  > range_intersect returns the intersection of two ranges.
>  > I think here we are doing the opposite.
>  > names the main SQL function "range_not_intersect" and the internal
>  > function as "range_not_intersect_internal" should be fine.
>  > so people don't need to understand the meaning of "portion".
>
> Thank you for helping me figure out a name here! I realize that can be a 
> bike-sheddy kind of
> discussion, so let me share some of my principles.
>
> Range and multirange are highly mathematically "pure", and that's something I 
> value in them. It
> makes them more general-purpose, less encumbered by edge cases, easier to 
> combine, and easier to
> reason about. Preserving that close connection to math is a big goal.
>
> What I've called `without_portion` is (like) a closed form of minus (hence 
> `@-` for the operator).
> Minus isn't closed under everything (e.g. ranges), so `without_portion` adds 
> arrays---much as to
> close subtraction we add negative numbers and to close division we add 
> rationals). We get the same
> effect from multiranges, but that only buys us range support. It would be 
> awesome to support
> arbitrary types: ranges, multiranges, mdranges, boxes, polygons, inets, etc., 
> so I think an array is
> the way to go here. And then each array element is a "leftover". What do we 
> call a closed form of
> minus that returns arrays?
>
> Of course "minus" is already taken (and you wouldn't expect it to return 
> arrays anyway), which is
> why I'm thinking about names like "without" or "except". Or maybe 
> "multi-minus". I still think
> "without portion" is the closest to capturing everything above (and avoids 
> ambiguity with other SQL
> operations). And the "portion" ties the operator to `FOR PORTION OF`, which 
> is its purpose. But I
> wouldn't be surprised if there were something better.
>

Thanks for the deep explanation. I think the name
range_without_portion is better than my range_not_intersect.
I learned a lot.
I also googled " bike-sheddy". haha.

src5=# select range_without_portion(numrange(1.0,3.0,'[]'),
numrange(1.5,2.0,'(]'));
   range_without_portion
---------------------------
 {"[1.0,1.5]","(2.0,3.0]"}
(1 row)

src5=# \gdesc
        Column         |   Type
-----------------------+-----------
 range_without_portion | numeric[]
(1 row)

src5=# \df range_without_portion
                                  List of functions
   Schema   |         Name          | Result data type | Argument data types | Type
------------+-----------------------+------------------+---------------------+------
 pg_catalog | range_without_portion | anyarray         | anyrange, anyrange  | func
(1 row)

so apparently, from (anyrange, anyrange) you cannot get an anyarray whose
element type is anyrange.
I cannot find this explained in the documentation at
https://www.postgresql.org/docs/current/extend-type-system.html#EXTEND-TYPES-POLYMORPHIC

anyrange is polymorphic and anyarray is polymorphic,
but I suppose getting an anyarray whose element type is anyrange would be hard.




Re: Add a perl function in Cluster.pm to generate WAL

2024-01-08 Thread Bertrand Drouvot
Hi,

On Tue, Jan 09, 2024 at 01:59:08PM +0900, Michael Paquier wrote:
> On Mon, Jan 08, 2024 at 08:00:00PM +0300, Alexander Lakhin wrote:
> > Yes, I've added (VERBOSE) and also cut down the test to catch the failure 
> > faster.

Thanks Alexander!

> > The difference between a successful and a failed run:
> >     tuples: 1 removed, 15 remain, 0 are dead but not yet removable
> > [...]
> >     tuples: 0 removed, 16 remain, 1 are dead but not yet removable
> 
> Yep, it's clear that the horizon is not stable.

Yeap.

> 
> > With FREEZE, 10 iterations with 20 tests in parallel succeeded for me
> > (while without it, I get failures on iterations 1,2).
> > 
> > [1] 
> > https://www.postgresql.org/message-id/b2a1f7d0-7629-72c0-2da7-e9c4e336b010%40gmail.com
> 
> Alexander, does the test gain in stability once you begin using the
> patch posted on [2], mentioned by Bertrand?

Alexander, please find attached v3, which is more or less a rebased version of it.

Regards,

-- 
Bertrand Drouvot
PostgreSQL Contributors Team
RDS Open Source Databases
Amazon Web Services: https://aws.amazon.com
>From 055349da9eba780caadeb3615749b3b5453a59c4 Mon Sep 17 00:00:00 2001
From: bdrouvot 
Date: Tue, 9 Jan 2024 05:08:35 +
Subject: [PATCH v3] Fix 035_standby_logical_decoding.pl race condition

We want to ensure that vacuum was able to remove dead rows (aka no other
transactions holding back global xmin) before testing for slots invalidation on
the standby.
---
 .../t/035_standby_logical_decoding.pl | 56 +--
 1 file changed, 27 insertions(+), 29 deletions(-)
 100.0% src/test/recovery/t/

diff --git a/src/test/recovery/t/035_standby_logical_decoding.pl b/src/test/recovery/t/035_standby_logical_decoding.pl
index 8bc39a5f03..9be1ff9b58 100644
--- a/src/test/recovery/t/035_standby_logical_decoding.pl
+++ b/src/test/recovery/t/035_standby_logical_decoding.pl
@@ -238,6 +238,22 @@ sub check_for_invalidation
 	) or die "Timed out waiting confl_active_logicalslot to be updated";
 }
 
+# Launch $sql and wait for a new snapshot that has a newer horizon before
+# doing the vacuum with $vac_option on $to_vac.
+sub wait_until_vacuum_can_remove
+{
+	my ($vac_option, $sql, $to_vac) = @_;
+
+	my $xid = $node_primary->safe_psql('testdb', qq[$sql
+	select txid_current();]);
+
+	$node_primary->poll_query_until('testdb',
+		"SELECT (select txid_snapshot_xmin(txid_current_snapshot()) - $xid) > 0")
+	  or die "new snapshot does not have a newer horizon";
+
+	$node_primary->safe_psql('testdb', qq[VACUUM $vac_option verbose $to_vac;
+		  INSERT INTO flush_wal DEFAULT VALUES; -- see create table flush_wal;]);
+}
 
 # Initialize primary node
 
@@ -248,6 +264,7 @@ $node_primary->append_conf(
 wal_level = 'logical'
 max_replication_slots = 4
 max_wal_senders = 4
+autovacuum = off
 });
 $node_primary->dump_info;
 $node_primary->start;
@@ -468,13 +485,8 @@ reactive_slots_change_hfs_and_wait_for_xmins('behaves_ok_', 'vacuum_full_',
 	0, 1);
 
 # This should trigger the conflict
-$node_primary->safe_psql(
-	'testdb', qq[
-  CREATE TABLE conflict_test(x integer, y text);
-  DROP TABLE conflict_test;
-  VACUUM full pg_class;
-  INSERT INTO flush_wal DEFAULT VALUES; -- see create table flush_wal
-]);
+wait_until_vacuum_can_remove('full', 'CREATE TABLE conflict_test(x integer, y text);
+ DROP TABLE conflict_test;', 'pg_class');
 
 $node_primary->wait_for_replay_catchup($node_standby);
 
@@ -550,13 +562,8 @@ reactive_slots_change_hfs_and_wait_for_xmins('vacuum_full_', 'row_removal_',
 	0, 1);
 
 # This should trigger the conflict
-$node_primary->safe_psql(
-	'testdb', qq[
-  CREATE TABLE conflict_test(x integer, y text);
-  DROP TABLE conflict_test;
-  VACUUM pg_class;
-  INSERT INTO flush_wal DEFAULT VALUES; -- see create table flush_wal
-]);
+wait_until_vacuum_can_remove('', 'CREATE TABLE conflict_test(x integer, y text);
+			 DROP TABLE conflict_test;', 'pg_class');
 
 $node_primary->wait_for_replay_catchup($node_standby);
 
@@ -588,13 +595,8 @@ reactive_slots_change_hfs_and_wait_for_xmins('row_removal_',
 	'shared_row_removal_', 0, 1);
 
 # Trigger the conflict
-$node_primary->safe_psql(
-	'testdb', qq[
-  CREATE ROLE create_trash;
-  DROP ROLE create_trash;
-  VACUUM pg_authid;
-  INSERT INTO flush_wal DEFAULT VALUES; -- see create table flush_wal
-]);
+wait_until_vacuum_can_remove('', 'CREATE ROLE create_trash;
+			 DROP ROLE create_trash;', 'pg_authid');
 
 $node_primary->wait_for_replay_catchup($node_standby);
 
@@ -625,14 +627,10 @@ reactive_slots_change_hfs_and_wait_for_xmins('shared_row_removal_',
 	'no_conflict_', 0, 1);
 
 # This should not trigger a conflict
-$node_primary->safe_psql(
-	'testdb', qq[
-  CREATE TABLE conflict_test(x integer, y text);
-  INSERT INTO conflict_test(x,y) SELECT s, s::text FROM generate_series(1,4) s;
-  UPDATE conflict_test set x=1, y=1;
-  VACUUM conflict_test;
-  INSERT INTO flush_wal DEFAULT VALUES; -- see 

Re: verify predefined LWLocks have entries in wait_event_names.txt

2024-01-08 Thread Michael Paquier
On Tue, Jan 09, 2024 at 04:55:07AM +, Bertrand Drouvot wrote:
> Thanks! v6 looks good to me.

WFM.  Thanks for putting in place this sanity check when compiling.
--
Michael


signature.asc
Description: PGP signature


Re: Commitfest 2024-01 first week update

2024-01-08 Thread vignesh C
On Mon, 8 Jan 2024 at 22:50, Jelte Fennema-Nio  wrote:
>
> On Mon, 8 Jan 2024 at 07:22, vignesh C  wrote:
> > Here is a list of "Needs review" entries for which there has not been
> > much communication on the thread and needs help in proceeding further.
>
> Thank you for creating these lists. It's definitely helpful to see
> what to focus my reviewing effort on. I noticed they are missing one
> of my commitfest entries though:

This is just one list for this week; I will be focusing on a
different set of patches for review next week.

> Add non-blocking version of PQcancel | Jelte Fennema-Nio
>
> imho this patch is pretty much finished, but it has only received
> English spelling feedback in the last 8 months (I addressed the
> feedback).

Thanks, One of us will have a look at this patch.

Regards,
Vignesh




Re: speed up a logical replica setup

2024-01-08 Thread Amit Kapila
On Mon, Jan 8, 2024 at 12:35 PM Hayato Kuroda (Fujitsu)
 wrote:
>
> > On Fri, Jan 5, 2024 at 3:36 PM Hayato Kuroda (Fujitsu)
> >  wrote:
> > >
> > > I love your proposal, so I want to join the review. Here are my first 
> > > comments.
> > >
> > > 01.
> > > Should we restrict that `--subscriber-conninfo` must not have hostname or 
> > > IP?
> > > We want users to execute pg_subscriber on the target, right?
> > >
> >
> > I don't see any harm in users giving those information but we should
> > have some checks to ensure that the server is in standby mode and is
> > running locally. The other related point is do we need to take input
> > for the target cluster directory from the user? Can't we fetch that
> > information once we are connected to standby?
>
> I think that functions like inet_client_addr() might be usable, but it
> returns NULL only when the connection is via a Unix-domain socket. Can we restrict
> pg_subscriber to use such a socket?
>

Good question. So, IIUC, this tool has a requirement to run locally
where the standby is present because we want to write the recovery.conf file.
I am not sure if it is a good idea to have a restriction to use only
the unix domain socket as users need to set up the standby for that by
configuring unix_socket_directories. It is fine if we can't ensure
that it is running locally but we should at least ensure that the
server is a physical standby node to avoid the problems as Shlok has
reported.

On a related point, I see that the patch stops the standby server (if
it is running) before starting with subscriber-side steps. I was
wondering if users can object to it that there was some important data
replication in progress which this tool has stopped. Now, OTOH,
anyway, once the user uses pg_subscriber, the standby server will be
converted to a subscriber, so it may not be useful as a physical
replica. Do you or others have any thoughts on this matter?

> > >
> > > 05.
> > > I found that the connection string for each subscriptions have a setting
> > > "fallback_application_name=pg_subscriber". Can we remove it?
> > >
> > > ```
> > > postgres=# SELECT subconninfo FROM pg_subscription;
> > >subconninfo
> > >                            subconninfo
> > > ------------------------------------------------------------------------------
> > >  user=postgres port=5431 fallback_application_name=pg_subscriber dbname=postgres
> > > ```
> >
> > Can that help distinguish the pg_subscriber connection on the publisher?
> >
>
> Note that this connection string is used between the publisher instance and the
> subscriber instance (not the pg_subscriber client application). Also, the
> fallback_application_name would be replaced with the name of the subscriber in
> run_apply_worker()->walrcv_connect(). Actually the value would not be used.
>

Fair point. It is not clear what other purpose this can achieve;
probably Euler has something in mind for this.

-- 
With Regards,
Amit Kapila.




Re: Add a perl function in Cluster.pm to generate WAL

2024-01-08 Thread Michael Paquier
On Mon, Jan 08, 2024 at 08:00:00PM +0300, Alexander Lakhin wrote:
> Yes, I've added (VERBOSE) and also cut down the test to catch the failure 
> faster.
> The difference between a successful and a failed run:
>     tuples: 1 removed, 15 remain, 0 are dead but not yet removable
> [...]
>     tuples: 0 removed, 16 remain, 1 are dead but not yet removable

Yep, it's clear that the horizon is not stable.

> With FREEZE, 10 iterations with 20 tests in parallel succeeded for me
> (while without it, I get failures on iterations 1,2).
> 
> [1] 
> https://www.postgresql.org/message-id/b2a1f7d0-7629-72c0-2da7-e9c4e336b010%40gmail.com

Alexander, does the test gain in stability once you begin using the
patch posted on [2], mentioned by Bertrand?

(Also, perhaps we'd better move the discussion to the other thread
where the patch has been sent.)

[2]: 
https://www.postgresql.org/message-id/d40d015f-03a4-1cf2-6c1f-2b9aca860...@gmail.com
--
Michael


signature.asc
Description: PGP signature


Re: verify predefined LWLocks have entries in wait_event_names.txt

2024-01-08 Thread Bertrand Drouvot
Hi,

On Mon, Jan 08, 2024 at 04:02:12PM -0600, Nathan Bossart wrote:
> Sorry for the noise.  I spent some more time tidying this up for commit,
> which I am hoping to do in the next day or two.

Thanks! v6 looks good to me.

Regards,

-- 
Bertrand Drouvot
PostgreSQL Contributors Team
RDS Open Source Databases
Amazon Web Services: https://aws.amazon.com




Re: Adding facility for injection points (or probe points?) for more advanced tests

2024-01-08 Thread Michael Paquier
(Compiled two separate review emails into a single one)

On Tue, Jan 02, 2024 at 03:36:12PM +0530, Ashutosh Bapat wrote:
> This discussion has not been addressed in v6. I think the interface
> needs to be documented in the order below
> INJECTION_POINT - this declares an injection point - i.e. a place in
> code where an external code can be injected (and run).
> InjectionPointAttach() - this is used to associate ("attach") external
> code to an injection point.
> InjectionPointDetach() - this is used to disassociate ("detach")
> external code from an injection point.
>
> [arguments about doc organization]
>
> Even if an INJECTION_POINT is not "declared" attach would succeed but
> doesn't do anything. I think this needs to be made clear in the
> documentation. Better if we could somehow make Attach() fail if the
> specified injection point is not "declared" using INJECTION_POINT. Of
> course we don't want to bloat the hash table with all "declared"
> injection points even if they aren't being attached to and hence not
> used.

Okay, I can see your point.  I have reorganized the docs in the
following order:
- INJECTION_POINT
- Attach
- Detach
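
Concretely, the flow the docs should convey is (a sketch; the point, module,
and function names here are made up):

	/* In the backend code path under test: declare an injection point */
	INJECTION_POINT("my-injection-point");

	/* In a test module: attach external code, exercise the path, detach */
	InjectionPointAttach("my-injection-point", "my_test_module",
						 "my_test_callback");
	/* ... run the code containing the INJECTION_POINT() call ... */
	InjectionPointDetach("my-injection-point");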

> I think, exposing the current injection point strings as
> #defines and encouraging users to use these macros instead of string
> literals will be a good start.

Nah, I disagree with this one actually.  It is easy to grep for the
macro INJECTION_POINT to be able to achieve the same research job, and
this would make the code more inconsistent when callbacks are run
within extensions which don't care about a #define in a backend
header.

> With the current implementation it's possible to "declare" injection
> point with same name at multiple places. It's useful but is it
> intended?

Yes.  I would recommend not doing that, but I don't see why there
would be a point in restricting that, either.

> /* Field sizes */
> #define INJ_NAME_MAXLEN 64
> #define INJ_LIB_MAXLEN 128
> #define INJ_FUNC_MAXLEN 128
> I think these limits should be either documented or specified in the
> error messages for users to fix their code in case of
> errors/unexpected behaviour.

Adding them to the error messages when attaching is a good idea.
Done.

> This is not an array entry anymore. I think we should rename
> InjectionPointEntry as SharedInjectionPointEntry and InjectionPointArrayEntry
> as LocalInjectionPointEntry.

Yep, fixed.

> +/* utilities to handle the local array cache */
> +static void
> +injection_point_cache_add(const char *name,
> + InjectionPointCallback callback)
> +{
> ... snip ...
> +
> + entry = (InjectionPointCacheEntry *)
> + hash_search(InjectionPointCache, name, HASH_ENTER, &found);
> +
> + if (!found)
> 
> The function is called only when the injection point is not found in the local
> cache. So this condition will always be true. An Assert will help to make it
> clear and also prevent an unintended callback replacement.

Right, as coded it seems pointless to make the found check conditional.  I
think that I coded it this way when doing some earlier work with this
code, and finished with a simpler thing.
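
The simpler shape being roughly (a sketch; the cache entry's callback field
name is assumed):

	bool		found;

	entry = (InjectionPointCacheEntry *)
		hash_search(InjectionPointCache, name, HASH_ENTER, &found);

	/* we are only called for names not in the cache yet */
	Assert(!found);
	entry->callback = callback;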

> +#ifdef USE_INJECTION_POINTS
> +static bool
> +file_exists(const char *name)
> 
> There's similar function in jit.c and dfmgr.c. Can we not reuse that code?

This has been mentioned in a different comment.  Refactored as of
0001, but there is something here related to EACCES for the JIT path.
Seems weird to me that we would not fail if the JIT library cannot be
accessed when stat() fails.

> + /* Save the entry */
> + memcpy(entry_by_name->name, name, sizeof(entry_by_name->name));
> + entry_by_name->name[INJ_NAME_MAXLEN - 1] = '\0';
> + memcpy(entry_by_name->library, library, sizeof(entry_by_name->library));
> + entry_by_name->library[INJ_LIB_MAXLEN - 1] = '\0';
> + memcpy(entry_by_name->function, function, sizeof(entry_by_name->function));
> + entry_by_name->function[INJ_FUNC_MAXLEN - 1] = '\0';
> 
> Most of the code is using strncpy instead of memcpy. Why is this code 
> different?

strncpy() is less used in the backend code.  It comes down to a matter of
taste, IMO.

> + injection_callback = injection_point_cache_get(name);
> + if (injection_callback == NULL)
> + {
> + char path[MAXPGPATH];
> +
> + /* Found, so just run the callback registered */
> 
> The condition indicates that the callback was not found. Comment looks wrong.

Fixed.

> Consider case
> 
> Backend 2
> InjectionPointAttach("xyz", "abc", "pqr");
> 
> Backend 1
> INJECTION_POINT("xyz");
> 
> Backend 2
> InjectionPointDetach("xyz");
> InjectionPointAttach("xyz", "uvw", "lmn");
> 
> Backend 1
> INJECTION_POINT("xyz");
> 
> IIUC, the last INJECTION_POINT would run abc.pqr instead of uvw.lmn.
> Am I correct?

Yeah, that's an intended design choice to keep the code simpler and
faster as there is no need to track the library and function names in
the local caches or implement something similar to invalidation
messages for this facility because it would impact performance anyway
in the call paths.  In short, just 

Re: Removing unneeded self joins

2024-01-08 Thread Alexander Lakhin

09.01.2024 01:09, Alexander Korotkov wrote:

Fixed in 30b4955a46.


Thank you for fixing that!

I've found another anomaly introduced with d3d55ce57. This query:
CREATE TABLE t(a int PRIMARY KEY, b int);
INSERT INTO t VALUES  (1, 1), (2, 1);

WITH t1 AS (SELECT * FROM t)
UPDATE t SET b = t1.b + 1 FROM t1
WHERE t.a = t1.a RETURNING t.a, t1.b;

gives "ERROR:  variable not found in subplan target lists" on d3d55ce57, but
starting from a7928a57b it gives an incorrect result:
 a | b
---+---
 1 | 2
 2 | 2
(2 rows)


Best regards,
Alexander




Re: introduce dynamic shared memory registry

2024-01-08 Thread Amul Sul
On Mon, Jan 8, 2024 at 10:48 PM Nathan Bossart 
wrote:

> On Mon, Jan 08, 2024 at 11:13:42AM +0530, Amul Sul wrote:
> > +void *
> > +dsm_registry_init_or_attach(const char *key, size_t size,
> >
> > I think the name could be simple as dsm_registry_init() like we use
> > elsewhere e.g. ShmemInitHash() which doesn't say attach explicitly.
>
> That seems reasonable to me.
>
> > Similarly, I think dshash_find_or_insert() can be as simple as
> > dshash_search() and
> > accept HASHACTION like hash_search().
>
> I'm not totally sure what you mean here.  If you mean changing the dshash
> API, I'd argue that's a topic for another thread.
>

Yes, you are correct. I didn't realize that about the existing code -- not
sure why it wasn't implemented the way dynahash is. Sorry for the noise.

Regards,
Amul


Re: Random pg_upgrade test failure on drongo

2024-01-08 Thread Amit Kapila
On Mon, Jan 8, 2024 at 9:36 PM Jim Nasby  wrote:
>
> On 1/4/24 10:19 PM, Amit Kapila wrote:
> > On Thu, Jan 4, 2024 at 5:30 PM Alexander Lakhin  wrote:
> >>
> >> 03.01.2024 14:42, Amit Kapila wrote:
> >>>
> >>
>  And the internal process is ... background writer (BgBufferSync()).
> 
>  So, I tried just adding bgwriter_lru_maxpages = 0 to postgresql.conf and
>  got 20 x 10 tests passing.
> 
>  Thus, it we want just to get rid of the test failure, maybe it's enough 
>  to
>  add this to the test's config...
> 
> >>> What about checkpoints? Can't it do the same while writing the buffers?
> >>
> >> As we deal here with pg_upgrade/pg_restore, it must not be very easy to get
> >> the desired effect, but I think it's not impossible in principle.
> >> More details below.
> >> What happens during the pg_upgrade execution is essentially:
> >> 1) CREATE DATABASE "postgres" WITH TEMPLATE = template0 OID = 5 ...;
> >> -- this command flushes file buffers as well
> >> 2) ALTER DATABASE postgres OWNER TO ...
> >> 3) COMMENT ON DATABASE "postgres" IS ...
> >> 4) -- For binary upgrade, preserve pg_largeobject and index 
> >> relfilenodes
> >>   SELECT 
> >> pg_catalog.binary_upgrade_set_next_index_relfilenode('2683'::pg_catalog.oid);
> >>   SELECT 
> >> pg_catalog.binary_upgrade_set_next_heap_relfilenode('2613'::pg_catalog.oid);
> >>   TRUNCATE pg_catalog.pg_largeobject;
> >> --  ^^^ here we can get the error "could not create file "base/5/2683": 
> >> File exists"
> >> ...
> >>
> >> We get the effect discussed when the background writer process decides to
> >> flush a file buffer for pg_largeobject during stage 1.
> >> (Thus, if a checkpoint somehow happened to occur during CREATE DATABASE,
> >> the result must be the same.)
> >> And another important factor is shared_buffers = 1MB (set during the test).
> >> With the default setting of 128MB I couldn't see the failure.
> >>
> >> It can be reproduced easily (on old Windows versions) just by running
> >> pg_upgrade in a loop (I've got failures on iterations 22, 37, 17 (with the
> >> default cluster)).
> >> If an old cluster contains dozens of databases, this increases the failure
> >> probability significantly (with 10 additional databases I've got failures
> >> on iterations 4, 1, 6).
> >>
> >
> > I don't have an old Windows environment to test but I agree with your
> > analysis and theory. The question is what should we do for these new
> > random BF failures? I think we should set bgwriter_lru_maxpages to 0
> > and checkpoint_timeout to 1hr for these new tests. Doing some invasive
> > fix as part of this doesn't sound reasonable because this is an
> > existing problem and there seems to be another patch by Thomas that
> > probably deals with the root cause of the existing problem [1] as
> > pointed out by you.
> >
> > [1] - https://commitfest.postgresql.org/40/3951/
>
> Isn't this just sweeping the problem (non-POSIX behavior on SMB and
> ReFS) under the carpet? I realize that synthetic test workloads like
> pg_upgrade in a loop aren't themselves real-world scenarios, but what
> about other cases? Even if we're certain it's not possible for these
> issues to wedge a server, it's still not a good experience for users to
> get random, unexplained IO-related errors...
>

The point is that this is an existing known Windows behavior and that
too only in certain versions. The fix doesn't seem to be
straightforward, so it seems advisable to avoid random BF failures by
having an appropriate configuration.

-- 
With Regards,
Amit Kapila.




Re: [PoC] Improve dead tuple storage for lazy vacuum

2024-01-08 Thread Masahiko Sawada
On Wed, Jan 3, 2024 at 11:10 PM John Naylor  wrote:
>
> On Tue, Jan 2, 2024 at 8:01 PM Masahiko Sawada  wrote:
>
> > I agree that we expose RT_LOCK_* functions and have tidstore use them,
> > but am not sure the if (TidStoreIsShared(ts) LWLockAcquire(..., ...)"
> > calls part. I think that even if we expose them, we will still need to
> > do something like "if (TidStoreIsShared(ts))
> > shared_rt_lock_share(ts->tree.shared)", no?
>
> I'll come back to this topic separately.
>
> > I've attached a new patch set. From v47 patch, I've merged your
> > changes for radix tree, and split the vacuum integration patch into 3
> > patches: simply replaces VacDeadItems with TidsTore (0007 patch), and
> > use a simple TID array for one-pass strategy (0008 patch), and replace
> > has_lpdead_items with "num_offsets > 0" (0009 patch), while
> > incorporating your review comments on the vacuum integration patch
>
> Nice!
>
> > (sorry for making it difficult to see the changes from v47 patch).
>
> It's actually pretty clear. I just have a couple comments before
> sharing my latest cleanups:
>
> (diff'ing between v47 and v48):
>
> --   /*
> -* In the shared case, TidStoreControl and radix_tree are backed by 
> the
> -* same DSA area and rt_memory_usage() returns the value including 
> both.
> -* So we don't need to add the size of TidStoreControl separately.
> -*/
> if (TidStoreIsShared(ts))
> -   return sizeof(TidStore) + shared_rt_memory_usage(ts->tree.shared);
> +   rt_mem = shared_rt_memory_usage(ts->tree.shared);
> +   else
> +   rt_mem = local_rt_memory_usage(ts->tree.local);
>
> -   return sizeof(TidStore) + sizeof(TidStore) + local_rt_memory_usage(ts->tree.local);
> +   return sizeof(TidStore) + sizeof(TidStoreControl) + rt_mem;
>
> Upthread, I meant that I don't see the need to include the size of
> these structs *at all*. They're tiny, and the blocks/segments will
> almost certainly have some empty space counted in the total anyway.
> The returned size is already overestimated, so this extra code is just
> a distraction.

Agreed.

>
> - if (result->num_offsets + bmw_popcount(w) > result->max_offset)
> + if (result->num_offsets + (sizeof(bitmapword) * BITS_PER_BITMAPWORD) >= result->max_offset)
>
> I believe this math is wrong. We care about "result->num_offsets +
> BITS_PER_BITMAPWORD", right?
> Also, it seems if the condition evaluates to equal, we still have
> enough space, in which case ">" the max is the right condition.

Oops, you're right. Fixed.
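
For the archives, the corrected boundary check in sketch form
(num_offsets and max_offset are the fields from the quoted diff; the
grow step is hypothetical):

    /* appending up to BITS_PER_BITMAPWORD offsets; equality still fits,
     * so grow only on strict overflow */
    if (result->num_offsets + BITS_PER_BITMAPWORD > result->max_offset)
        enlarge_offsets(result);    /* hypothetical grow routine */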

>
> - if (off < 1 || off > MAX_TUPLES_PER_PAGE)
> + if (off < 1 || off > MaxOffsetNumber)
>
> This can now use OffsetNumberIsValid().

Fixed.
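
For reference, OffsetNumberIsValid() from off.h expresses exactly that
range check (usage sketch):

    /* equivalent to: off >= FirstOffsetNumber && off <= MaxOffsetNumber */
    if (!OffsetNumberIsValid(off))
        elog(ERROR, "offset number out of range");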

>
> > 0013 to 0015 patches are also updates from v47 patch.
>
> > I'm thinking that we should change the order of the patches so that
> > tidstore patch requires the patch for changing DSA segment sizes. That
> > way, we can remove the complex max memory calculation part that we no
> > longer use from the tidstore patch.
>
> I don't think there is any reason to have those calculations at all at
> this point. Every patch in every version should at least *work
> correctly*, without kludging m_w_m and without constraining max
> segment size. I'm fine with the latter remaining in its own thread,
> and I hope we can consider it an enhancement that respects the admin's
> configured limits more effectively, and not a pre-requisite for not
> breaking. I *think* we're there now, but it's hard to tell since 0015
> was at the very end. As I said recently, if something still fails, I'd
> like to know why. So for v49, I took the liberty of removing the DSA
> max segment patches for now, and squashing v48-0015.

Fair enough.

>
> In addition for v49, I have quite a few cleanups:
>
> 0001 - This hasn't been touched in a very long time, but I ran
> pgindent and clarified a comment
> 0002 - We no longer need to isolate the rightmost bit anywhere, so
> removed that part and revised the commit message accordingly.

Thanks.

>
> radix tree:
> 0003 - v48 plus squashed v48-0013
> 0004 - Removed or adjusted WIP, FIXME, TODO items. Some were outdated,
> and I fixed most of the rest.
> 0005 - Remove the RT_PTR_LOCAL macro, since it's not really useful anymore.
> 0006 - RT_FREE_LEAF only needs the allocated pointer, so pass that. A
> bit simpler.
> 0007 - Uses the same idea from a previous cleanup of RT_SET, for RT_DELETE.
> 0008 - Removes a holdover from the multi-value leaves era.
> 0009 - It occurred to me that we need to have unique names for memory
> contexts for different instantiations of the template. This is one way
> to do it, by using the configured RT_PREFIX in the context name. I
> also took an extra step to make the size class fanout show up
> correctly on different platforms, but that's probably overkill and
> undesirable, and I'll probably use only the class name next time.
> 0010/11 - Make the array functions less surprising and with more

Re: the s_lock_stuck on perform_spin_delay

2024-01-08 Thread Andy Fan


Hi,
Robert Haas  writes:

> On Sun, Jan 7, 2024 at 9:52 PM Andy Fan  wrote:
>> > I think we should add cassert-only infrastructure tracking whether we
>> > currently hold spinlocks, are in a signal handler and perhaps a few other
>> > states. That'd allow us to add assertions like:
>> ..
>> > - no lwlocks or ... while in signal handlers
>>
>> I *wish* lwlocks should *not* be held while in signal handlers since it
>> inspired me for a direction of a low-frequency internal bug where a
>> backend acuqire a LWLock when it has acuqired it before. However when I
>> read more document and code, I am thinking this should not be a
>> problem.
>
> It's not safe to acquire an LWLock in a signal handler unless we know
> that the code that the signal handler is interrupting can't already be
> doing that. Consider this code from LWLockAcquire:

Thanks for the explanation! I can follow the situation you describe
here, but I realize I asked a bad question because I didn't clarify what
"signal handlers" I was referring to, sorry about that!

In your statement, I guess you are talking about the signal handlers
from Linux. However, I *assumed* such handlers do pretty simple stuff,
like setting a 'GlobalVariable = true'. If my assumption is right, I
think that case need not be taken care of. For example:

spin_or_lwlock_acquire();
... (a Linux signal handler may be invoked here no matter what the ... code is)
spin_or_lwlock_release()

Since such Linux signal handlers are pretty simple, control comes back
to 'spin_or_lwlock_release' anyway. (However, my assumption may be
wrong; thanks for highlighting this, it is helpful for me to debug my
internal bug!)

The signal handler I was referring to is 'CHECK_FOR_INTERRUPTS'. With
respect to it, spinlocks and LWLocks act quite differently.

spin_lock_acquire();
CHECK_FOR_INTERRUPT();
spin_lock_release();

Since CHECK_FOR_INTERRUPTS usually goes into the ERROR system, which
makes it hard to get back to 'spin_lock_release()', the spinlock leaks!
So CHECK_FOR_INTERRUPTS is the place where my patch Asserts that no
*spinlock* is held, and I understood Andres to be talking about the same
thing. (Of course I could add the "Assert no spin lock is held" check to
every Linux signal handler as well.)

Based on the above, I asked my question in my previous post, where I am
not sure if we should do the same ('Assert that no LWLock is held') for
*LWLocks* in CHECK_FOR_INTERRUPTS, since LWLocks are released no matter
where CHECK_FOR_INTERRUPTS jumps to.
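
To put the contrast in code (a minimal sketch; the slot variable and the
cassert-only counter are hypothetical, not from my patch):

    SpinLockAcquire(&slot->mutex);  /* spinlock held */
    CHECK_FOR_INTERRUPTS();         /* may ereport(ERROR) and longjmp away,
                                     * never reaching the release below:
                                     * the spinlock leaks */
    SpinLockRelease(&slot->mutex);

    /* An LWLock held here would instead be released by the error-recovery
     * path (LWLockReleaseAll), which is why only the spinlock case clearly
     * needs an Assert inside CHECK_FOR_INTERRUPTS(), e.g.: */
    Assert(held_spinlock_count == 0);   /* hypothetical cassert-only state */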

-- 
Best Regards
Andy Fan





Re: add AVX2 support to simd.h

2024-01-08 Thread John Naylor
On Tue, Jan 9, 2024 at 12:37 AM Nathan Bossart  wrote:
>
> > I suspect that there could be a regression lurking for some inputs
> > that the benchmark doesn't look at: pg_lfind32() currently needs to be
> > able to read 4 vector registers worth of elements before taking the
> > fast path. There is then a tail of up to 15 elements that are now
> > checked one-by-one, but AVX2 would increase that to 31. That's getting
> > big enough to be noticeable, I suspect. It would be good to understand
> > that case (n*32 + 31), because it may also be relevant now. It's also
> > easy to improve for SSE2/NEON for v17.
>
> Good idea.  If it is indeed noticeable, we might be able to "fix" it by
> processing some of the tail with shorter vectors.  But that probably means
> finding a way to support multiple vector sizes on the same build, which
> would require some work.

What I had in mind was an overlapping pattern I've seen in various
places: do one iteration at the beginning, then subtract the
aligned-down length from the end and do all those iterations. And
one-by-one is only used if the total length is small.
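
In sketch form, that shape looks like this (scalar C standing in for the
vector ops; chunk_has_u32() is a hypothetical stand-in for a 4-wide
vector compare):

    static bool chunk_has_u32(const uint32 *p, uint32 key);    /* vector cmp stand-in */

    static bool
    contains_u32(const uint32 *base, uint32 len, uint32 key)
    {
        uint32      i = len % 4;    /* start of the end-aligned iterations */

        Assert(len >= 4);           /* short arrays go elsewhere */

        /* one possibly-overlapping iteration covers [0, 4) */
        if (chunk_has_u32(&base[0], key))
            return true;

        /* end-aligned iterations cover [i, len), landing exactly on len */
        if (i == 0)
            i = 4;                  /* [0, 4) already covered */
        for (; i < len; i += 4)
        {
            if (chunk_has_u32(&base[i], key))
                return true;
        }
        return false;
    }

The scalar one-by-one tail disappears entirely; at most one chunk is
examined twice.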




Re: Synchronizing slots from primary to standby

2024-01-08 Thread Peter Smith
Here are some review comments for patch v57-0001.

==
doc/src/sgml/protocol.sgml

1. CREATE_REPLICATION_SLOT ... FAILOVER

+   
+FAILOVER [ boolean ]
+
+ 
+  If true, the slot is enabled to be synced to the physical
+  standbys so that logical replication can be resumed after failover.
+ 
+
+   

This syntax says passing the boolean value is optional. So the default
needs to be specified here in the docs (like what the TWO_PHASE
option does).

~~~

2. ALTER_REPLICATION_SLOT ... FAILOVER

+  
+   
+FAILOVER [ boolean ]
+
+ 
+  If true, the slot is enabled to be synced to the physical
+  standbys so that logical replication can be resumed after failover.
+ 
+
+   
+  

This syntax says passing the boolean value is optional. So it needs to
be specified here in the docs that not passing a value would be the
same as passing the value true.

==
doc/src/sgml/ref/alter_subscription.sgml

3.
+   If failover
+   is enabled, you can temporarily disable it in order to execute
these commands.

/in order to/to/

~~~

4.
+ 
+  When altering the
+  slot_name,
+  the failover property of the new slot may
differ from the
+  failover
+  parameter specified in the subscription. When creating the slot,
+  ensure the slot failover property matches the
+  failover
+  parameter value of the subscription.
+ 

4a.
the failover property of the new slot may differ

Maybe it would be more clear if that said "the failover property value
of the named slot...".

~

4b.
In the "failover property matches" part should that failover also be
rendered as  like before in the same paragraph?

==
doc/src/sgml/system-views.sgml

5.
+ 
+  
+   failover bool
+  
+  
+   True if this logical slot is enabled to be synced to the physical
+   standbys so that logical replication can be resumed from the new primary
+   after failover. Always false for physical slots.
+  
+ 

/True if this logical slot is enabled.../True if this is a logical
slot enabled.../

==
src/backend/commands/subscriptioncmds.c

6. CreateSubscription

+ /*
+ * Even if failover is set, don't create the slot with failover
+ * enabled. Will enable it once all the tables are synced and
+ * ready. The intention is that if failover happens at the time of
+ * table-sync, user should re-launch the subscription instead of
+ * relying on main slot (if synced) with no table-sync data
+ * present. When the subscription has no tables, leave failover as
+ * false to allow ALTER SUBSCRIPTION ... REFRESH PUBLICATION to
+ * work.
+ */
+ if (opts.failover && !opts.copy_data && tables != NIL)
+ failover_enabled = true;

AFAICT it might be possible for this to set failover_enabled = true if
copy_data is false. So failover_enabled would be true when later
calling:
  walrcv_create_slot(wrconn, opts.slot_name, false, twophase_enabled,
failover_enabled, CRS_NOEXPORT_SNAPSHOT, NULL);

Isn't that contrary to what this comment said: "Even if failover is
set, don't create the slot with failover enabled"

~~~

7. AlterSubscription. case ALTER_SUBSCRIPTION_OPTIONS:

+ if (!sub->slotname)
+ ereport(ERROR,
+ (errcode(ERRCODE_OBJECT_NOT_IN_PREREQUISITE_STATE),
+ errmsg("cannot set failover for subscription that does not have slot name")));

/for subscription that does not have slot name/for a subscription that
does not have a slot name/

==
.../libpqwalreceiver/libpqwalreceiver.c

8.
+ if (PQresultStatus(res) != PGRES_COMMAND_OK)
+ ereport(ERROR,
+ (errcode(ERRCODE_PROTOCOL_VIOLATION),
+ errmsg("could not alter replication slot \"%s\"",
+ slotname)));

This used to display the error message like
pchomp(PQerrorMessage(conn->streamConn)) but it was removed. Is it OK?

==
src/backend/replication/logical/tablesync.c

9.
+ if (MySubscription->twophasestate == LOGICALREP_TWOPHASE_STATE_PENDING)
+ ereport(LOG,
+ /* translator: %s is a subscription option */
+ (errmsg("logical replication apply worker for subscription \"%s\"
will restart so that %s can be enabled",
+ MySubscription->name, "two_phase")));
+
+ if (MySubscription->failoverstate == LOGICALREP_FAILOVER_STATE_PENDING)
+ ereport(LOG,
+ /* translator: %s is a subscription option */
+ (errmsg("logical replication apply worker for subscription \"%s\"
will restart so that %s can be enabled",
+ MySubscription->name, "failover")));

Those errors have multiple %s, so the translator's comment should say
"the 2nd %s is a..."

~~~

10.
 void
-UpdateTwoPhaseState(Oid suboid, char new_state)
+EnableTwoPhaseFailoverTriState(Oid suboid, bool enable_twophase,
+bool enable_failover)

I felt the function name was a bit confusing. Maybe it is simpler to
call it like "EnableTriState" or "EnableSubTriState" -- the parameters
anyway specify what actual state(s) will be set.

==
src/backend/replication/logical/worker.c

11.
+ /* 

Re: Built-in CTYPE provider

2024-01-08 Thread Jeremy Schneider
On 12/28/23 6:57 PM, Jeff Davis wrote:
> 
> Attached a more complete version that fixes a few bugs, stabilizes the
> tests, and improves the documentation. I optimized the performance, too
> -- now it's beating both libc's "C.utf8" and ICU "en-US-x-icu" for both
> collation and case mapping (numbers below).
> 
> It's really nice to finally be able to have platform-independent tests
> that work on any UTF-8 database.

Thanks for all your work on this, Jeff

I didn't know about the Unicode stability policy. Since it's formal
policy, I agree this provides some assumptions we can safely build on.

I'm working my way through these patches but it's taking a little time
for me. I hadn't tracked with the "builtin" thread last summer so I'm
coming up to speed on that now too. I'm hopeful that something along
these lines gets into pg17. The pg17 cycle is going to start heating up
pretty soon.

I agree with merging the threads, even though it makes for a larger
patch set. It would be great to get a unified "builtin" provider in
place for the next major.

I also still want to parse my way through your email reply about the two
groups of callers, and what this means for user experience.

https://www.postgresql.org/message-id/7774b3a64f51b3375060c29871cf2b02b3e85dab.camel%40j-davis.com

> Let's separate it into groups.
> (1) Callers that use a collation OID or pg_locale_t:
> (2) A long tail of callers that depend on what LC_CTYPE/LC_COLLATE are
> set to, or use ad-hoc ASCII-only semantics:

In the first list it seems that some callers might be influenced by a
COLLATE clause or table definition while others always take the database
default? It still seems a bit odd to me if different providers can be
used for different parts of a single SQL. But it might not be so bad - I
haven't fully thought through it yet and I'm still kicking the tires on
my test build over here.

Is there any reason we couldn't commit the minor cleanup (patch 0001)
now? It's less than 200 lines and pretty straightforward.

I wonder if, after a year of running the builtin provider in production,
whether we might consider adding to the builtin provider a few locales
with simple but more reasonable ordering for european and asian
languages? Maybe just grabbing a current version of DUCET with no
intention of ever updating it? I don't know how bad sorting is with
plain DUCET for things like french or spanish or german, but surely it's
not as unusable as code point order? Anyone who needs truly accurate or
updated or customized linguistic sorting can always go to ICU, and take
responsibility for their ICU upgrades, but something basic and static
might meet the needs of 99% of postgres users indefinitely.

By the way - my coworker Josh (who I don't think posts much on the
hackers list here, but shares an unhealthy inability to look away from
database unicode disasters) passed along this link today which I think
is a fantastic list of surprises about programming and strings
(generally unicode).

https://jeremyhussell.blogspot.com/2017/11/falsehoods-programmers-believe-about.html#main

Make sure to click the link to show the counterexamples and discussion,
that's the best part.

-Jeremy


PS. I was joking around today that the second best part is that it's
proof that people named Jeremy are always brilliant within their field.
Josh said it's just a subset of "always trust people whose names start
with J" which seems fair. Unfortunately I can't yet think of a way to
shoehorn the rest of the amazing PG hackers on this thread into the joke.

-- 
http://about.me/jeremy_schneider





Re: Adding facility for injection points (or probe points?) for more advanced tests

2024-01-08 Thread Michael Paquier
On Fri, Jan 05, 2024 at 03:00:25PM +0530, Dilip Kumar wrote:
> Some comments in 0001, mostly cosmetics
> 
> 1.
> +/* utilities to handle the local array cache */
> +static void
> +injection_point_cache_add(const char *name,
> +   InjectionPointCallback callback)
> 
> I think the comment for this function should be more specific about
> adding an entry to the local injection_point_cache_add.  And add
> comments for other functions as well e.g. injection_point_cache_get

And it is not an array anymore.  Note that InjectionPointArrayEntry
still existed.

> 2.
> +typedef struct InjectionPointEntry
> +{
> + char name[INJ_NAME_MAXLEN]; /* hash key */
> + char library[INJ_LIB_MAXLEN]; /* library */
> + char function[INJ_FUNC_MAXLEN]; /* function */
> +} InjectionPointEntry;
> 
> Some comments would be good for the structure

Sure.  I've spent more time documenting things in injection_point.c,
addressing any inconsistencies.

> 3.
> 
> +static bool
> +file_exists(const char *name)
> +{
> + struct stat st;
> +
> + Assert(name != NULL);
> + if (stat(name, &st) == 0)
> + return !S_ISDIR(st.st_mode);
> + else if (!(errno == ENOENT || errno == ENOTDIR))
> + ereport(ERROR,
> + (errcode_for_file_access(),
> + errmsg("could not access file \"%s\": %m", name)));
> + return false;
> +}
> 
> dfmgr.c has a similar function so can't we reuse it by making that
> function external?

Yes.  Note that jit.c has an extra copy of it.  I was holding off on the
refactoring, but let's bite the bullet and have a single routine.
I've moved that into a 0001 that builds on top of the rest.

> 4.
> + if (found)
> + {
> + LWLockRelease(InjectionPointLock);
> + elog(ERROR, "injection point \"%s\" already defined", name);
> + }
> +
> ...
> +#else
> + elog(ERROR, "Injection points are not supported by this build");
> 
> Better to use similar formatting for error output, Injection vs
> injection (better not to capitalize the first letter for consistency
> pov)

Fixed.

> 5.
> + * Check first the shared hash table, and adapt the local cache
> + * depending on that as it could be possible that an entry to run
> + * has been removed.
> + */
> 
> What if the entry is removed after we have released the
> InjectionPointLock? Or this would not cause any harm?

With an entry found in the shmem table?  I don't really think that we
need to care about such cases, TBH, because the injection point would
have been found in the table to start with.  This comes down to if we
should try to hold InjectionPointLock while calling the callback, and
that may not be a good idea in some cases if you'd expect a high
concurrency on the callback running.

> 0004:
> 
> I think
> test_injection_points_wake() and test_injection_wait() can be moved as
> part of 0002

Nah.  I intend to keep the introduction of this API where it becomes
relevant.  Perhaps this could also use an isolation test?  This could
always be polished once we agree on 0001 and 0002.

(I'll post a v6 a bit later, there are more comments posted here and
there.)
--
Michael


signature.asc
Description: PGP signature


Re: add function argument names to regex* functions.

2024-01-08 Thread Dian Fay
On Mon Jan 8, 2024 at 9:26 AM EST, jian he wrote:
> On Mon, Jan 8, 2024 at 8:44 AM Dian Fay  wrote:
> > The `regexp_replace` summary in table 9.10 is mismatched and still
> > specifies the first parameter name as `string` instead of `source`.
> > Since all the other functions use `string`, should `regexp_replace` do
> > the same or is this a case where an established "standard" diverges?
>
> got it. Thanks for pointing it out.
>
> in functions-matching.html
> if I change source to
> string then
> there are now markup "string" and markup "string" next to each other,
> which is kind of confusing.
>
> So does the following refactored description of regexp_replace make sense:
>
>  The string is returned unchanged if
>  there is no match to the pattern.  If there 
> is a
>  match, the string is returned with the
>  replacement string substituted for the 
> matching
>  substring.  The replacement string can contain
>  \n, where
> n is 1
>  through 9, to indicate that the source substring matching the
>  n'th parenthesized subexpression of
> the pattern should be
>  inserted, and it can contain \ to indicate that 
> the
>  substring matching the entire pattern should be inserted.  Write
>  \\ if you need to put a literal backslash in
> the replacement
>  text.

That change makes sense to me! I'll see about the section refactoring
after this lands.




Re: pg_ctl start may return 0 even if the postmaster has been already started on Windows

2024-01-08 Thread Michael Paquier
On Fri, Jan 05, 2024 at 02:58:55PM -0500, Robert Haas wrote:
> I'm not a Windows expert, but my guess is that 0001 is a very good
> idea. I hope someone who is a Windows expert will comment on that.

I am +1 on 0001.  It is just something we've never anticipated when
these wrappers around cmd in pg_ctl were written.

> 0002 seems problematic to me. One potential issue is that it would
> break if someone renamed postgres.exe to something else -- although
> that's probably not really a serious problem.

We do a find_other_exec_or_die() on "postgres" with what could be a
custom execution path.  So we're actually sure that the binary will be
there in the start path, no?  I don't like much the hardcoded
dependency to .exe here.

> A bigger issue is that
> it seems like it would break if someone used pg_ctl to start several
> instances in different data directories on the same machine. If I'm
> understanding correctly, that currently mostly works, and this would
> break it.

Not having the guarantee that a single shell_pid is associated to a
single postgres.exe would be a problem.  Now the patch includes this
code:
+   if (ppe.th32ParentProcessID == shell_pid &&
+   strcmp("postgres.exe", ppe.szExeFile) == 0)
+   {
+   if (pm_pid != ppe.th32ProcessID && pm_pid != 0)
+   multiple_children = true;
+   pm_pid = ppe.th32ProcessID;
+   }

Which is basically giving this guarantee?  multiple_children should
never happen once the autorun part is removed.  Is that right?

+* The launcher shell might start other cmd.exe instances or programs
+* besides postgres.exe. Veryfying the program file name is essential.

With the autorun part of cmd.exe removed, what's still relevant here?
s/Veryfying/Verifying/.

Perhaps 0002 should make more efforts in documenting things like
th32ProcessID and th32ParentProcessID.
--
Michael


signature.asc
Description: PGP signature


Re: Add support for __attribute__((returns_nonnull))

2024-01-08 Thread Michael Paquier
On Mon, Jan 08, 2024 at 05:04:58PM -0600, Tristan Partin wrote:
> The idea I had in mind initially was PGLC_localeconv(), but I couldn't
> prove that anything changed with the annotation added. The second patch
> in my previous email was attempt at deriving real-world benefit, but
> nothing I did seemed to change anything. Perhaps you can test it and see
> if anything changes for you.

There are a bunch of places in the tree where we return NULL just to
keep the compiler quiet, and where we should never return NULL, as in:
return NULL; /* keep compiler quiet */

See bbstreamer_gzip.c as one example for non-gzip builds.  Perhaps you
could look at one of these places and see if the marker shows
benefits?
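
For instance, a minimal sketch of the pattern (function and caller are
hypothetical; a real patch would presumably hide the attribute behind a
pg_attribute_* macro in c.h):

    static char *make_label(void) __attribute__((returns_nonnull));

    static char *
    make_label(void)
    {
        char       *s = malloc(8);

        if (s == NULL)
            elog(ERROR, "out of memory");   /* does not return */
        return s;                           /* provably non-NULL */
    }

With the annotation, a caller's "if (make_label() == NULL) return NULL;"
branch becomes provably dead, so any codegen difference should show up
right at those keep-compiler-quiet returns.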
--
Michael


signature.asc
Description: PGP signature


Re: Make psql ignore trailing semicolons in \sf, \ef, etc

2024-01-08 Thread Tom Lane
"Tristan Partin"  writes:
> On Mon Jan 8, 2024 at 2:48 PM CST, Tom Lane wrote:
>> +(isascii((unsigned char) mybuf.data[mybuf.len - 1]) &&
>> + isspace((unsigned char) mybuf.data[mybuf.len - 1]))))

> Seems like if there was going to be any sort of casting, it would be to 
> an int, which is what the man page says for these two functions, though 
> isascii(3) explicitly mentions "unsigned char."

Casting to unsigned char is our standard pattern for using these
functions.  If "char" is signed (which is the only case in which
this changes anything) then casting to int would imply sign-extension
of the char's high-order bit, which is exactly what must not happen
in order to produce a legal value to be passed to these functions.
POSIX says:

The c argument is an int, the value of which the application shall
ensure is a character representable as an unsigned char or equal
to the value of the macro EOF. If the argument has any other
value, the behavior is undefined.

If we cast to unsigned char, then the subsequent implicit cast to int
will do zero-extension which is what we need.
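
To spell that out with a concrete value (illustrative; assumes plain
char is signed, as on x86):

    char    c = '\xE9';                 /* e.g. 'é' in LATIN1 input */
    int     bad = (int) c;              /* sign-extends to -23: neither EOF nor
                                         * a valid unsigned-char value, so
                                         * passing it to isspace() is undefined */
    int     good = (unsigned char) c;   /* zero-extends to 233: always legal */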

> Small English nit-pick: I would drop the hyphen between semi and colons.

Me too, except that it's spelled like that in nearby comments.
Shall I change them all?

regards, tom lane




Re: INFORMATION_SCHEMA note

2024-01-08 Thread Tatsuo Ishii
>> On 4 Jan 2024, at 13:39, Tatsuo Ishii  wrote:
> 
>>> Attached is the patch that does this.
> 
> I don't think the patch was attached?
> 
>> Any objection?
> 
> I didn't study the RFC in depth but as expected it seems to back up your 
> change
> so the change seems reasonable.

Oops. Sorry. Patch attached.

Best regards,
--
Tatsuo Ishii
SRA OSS LLC
English: http://www.sraoss.co.jp/index_en/
Japanese:http://www.sraoss.co.jp
diff --git a/doc/src/sgml/information_schema.sgml b/doc/src/sgml/information_schema.sgml
index 0ca7d5a9e0..9e66be4e83 100644
--- a/doc/src/sgml/information_schema.sgml
+++ b/doc/src/sgml/information_schema.sgml
@@ -697,8 +697,8 @@
   
An encoding of some character repertoire.  Most older character
repertoires only use one encoding form, and so there are no
-   separate names for them (e.g., LATIN1 is an
-   encoding form applicable to the LATIN1
+   separate names for them (e.g., LATIN2 is an
+   encoding form applicable to the LATIN2
repertoire).  But for example Unicode has the encoding forms
UTF8, UTF16, etc. (not
all supported by PostgreSQL).  Encoding forms are not exposed


Re: Make psql ignore trailing semicolons in \sf, \ef, etc

2024-01-08 Thread Tristan Partin

On Mon Jan 8, 2024 at 2:48 PM CST, Tom Lane wrote:

We had a complaint (see [1], but it's not the first IIRC) about how
psql doesn't behave very nicely if one ends \sf or allied commands
with a semicolon:

regression=# \sf sin(float8);
ERROR:  expected a right parenthesis

This is a bit of a usability gotcha, since many other backslash
commands are forgiving about trailing semicolons.  I looked at
the code and found that it's actually trying to ignore semicolons,
by passing semicolon = true to psql_scan_slash_option.  But that
fails to work because it's also passing type = OT_WHOLE_LINE,
and the whole-line code path ignores the semicolon flag.  Probably
that's just because nobody needed to use that combination back in
the day.  There's another user of OT_WHOLE_LINE, exec_command_help,
which also wants this behavior and has written its own stripping
code to get it.  That seems pretty silly, so here's a quick finger
exercise to move that logic into psql_scan_slash_option.

Is this enough of a bug to deserve back-patching?  I'm not sure.

regards, tom lane

[1] 
https://www.postgresql.org/message-id/CAEs%3D6D%3DnwX2wm0hjkaw6C_LnqR%2BNFtnnzbSzeZq-xcfi_ooKSw%40mail.gmail.com



 +			/*
 +			 * In whole-line mode, we interpret semicolon = true as stripping
 +			 * trailing whitespace as well as semi-colons; this gives the
 +			 * nearest equivalent to what semicolon = true does in normal
 +			 * mode.  Note there's no concept of quoting in this mode.
 +			 */
 +			if (semicolon)
 +			{
 +				while (mybuf.len > 0 &&
 +					   (mybuf.data[mybuf.len - 1] == ';' ||
 +						(isascii((unsigned char) mybuf.data[mybuf.len - 1]) &&
 +						 isspace((unsigned char) mybuf.data[mybuf.len - 1]))))
 +				{
 +					mybuf.data[--mybuf.len] = '\0';
 +				}
 +			}


Seems like if there was going to be any sort of casting, it would be to 
an int, which is what the man page says for these two functions, though 
isascii(3) explicitly mentions "unsigned char."


Small English nit-pick: I would drop the hyphen between semi and colons.

As for backpatching, seems useful in the sense that people can write the 
same script for all supported versions of Postgres using the relaxed 
syntax.


--
Tristan Partin
Neon (https://neon.tech)




Re: Add support for __attribute__((returns_nonnull))

2024-01-08 Thread Tristan Partin

On Sun Dec 31, 2023 at 9:29 PM CST, John Naylor wrote:

On Thu, Dec 28, 2023 at 1:20 AM Tristan Partin  wrote:
> I recently wound up in a situation where I was checking for NULL return
> values of a function that couldn't ever return NULL because the
> inability to allocate memory was always elog(ERROR)ed (aborted).

It sounds like you have a ready example to test, so what happens with the patch?

As for whether any code generation changed, I'd start by checking if
anything in a non-debug binary changed at all.


The idea I had in mind initially was PGLC_localeconv(), but I couldn't
prove that anything changed with the annotation added. The second patch
in my previous email was an attempt at deriving real-world benefit, but
nothing I did seemed to change anything. Perhaps you can test it and see
if anything changes for you.

--
Tristan Partin
Neon (https://neon.tech)




Re: Add new protocol message to change GUCs for usage with future protocol-only GUCs

2024-01-08 Thread Jelte Fennema-Nio
Okay, attempt number 5 attached. The primary changes with the previous
version are:

1. Split up commits a bit differently. I think each commit now stands
on its own and is an incremental improvement that could be committed
without any of the later ones being necessary. Descriptions of why
each commit is useful can be found in the commit message. Especially
the first 3 (and even first 4) seem rather uncontroversial to me.

2. ParameterSet now works for normal backend parameters too. For
normal parameters it works just like SET does (so in a transactional
manner). For protocol extension parameters it still works too, but
there it errors when trying to change protocol extension parameters
within a transaction. Thus (imho) elegantly avoiding confusing
situations around rolling back protocol extension parameters.

3. As Tom suggested, it now uses a PGC_PROTOCOL context for the
protocol extension GUCs, instead of using a GUC_PROTOCOL_EXTENSION
flag. This definitely seems like the cleanest way of adding "protocol
only" parameters to the current GUC system.

4. _pq_.protocol_managed_params its list of GUCs can now only contain
GUCs that have the PGC_USERSET or PGC_SUSET context.

On Fri, 5 Jan 2024 at 17:28, Robert Haas  wrote:
> First, I don't see a reason to bump the protocol version.

It's still bumping the protocol version. I think this is a necessity
with the current changeset, since ParameterSet now applies to plain
GUCs too. As I clarified in a previous email, this does **not**
break old clients, since the server silently downgrades when an older
protocol version is requested.

> Second, I don't really like the idea of selectively turning GUCs into
> protocol-managed parameters. I think there are a relatively small
> number of things that we'd ever want to manage on a protocol level,
> but this design would force us to make it work for any GUC whatsoever.

It now limits the possible GUCs that can be changed to PGC_USERSET and
PGC_SUSET. If desired, we could limit it even further by using an
allowlist of reasonable GUCs or set a flag on GUCs that can be
"upgraded" . Things that seem at least reasonable to me are "role",
"session_authorization", "client_encoding".

> If somebody makes a random GUC into a protocol-managed parameter and then
> somebody updates postgresql.conf and then the config file is reloaded,
> I guess we need to refuse to adopt the new value of that parameter?

This was actually really easy to do after changing to PGC_PROTOCOL.
This exact behaviour is needed for PGC_BACKEND parameters, so I simply
used that same small if statement for PGC_PROTOCOL.
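
For reference, a sketch of that guard as I read it (modeled on the
existing PGC_BACKEND case in set_config_option(); the PGC_PROTOCOL arm
and the exact conditions are my paraphrase, not the patch text):

    case PGC_PROTOCOL:      /* proposed: same reload behavior as PGC_BACKEND */
    case PGC_BACKEND:
        if (context == PGC_SIGHUP)
        {
            /*
             * A postgresql.conf reload must not override a value that can
             * only be set at session start (PGC_BACKEND) or via a protocol
             * message (PGC_PROTOCOL): ignore the file's value in running
             * backends.
             */
            if (IsUnderPostmaster)
                return -1;
        }
        break;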

If you still think we shouldn't do this, then the only other option I
can think of is to only allow GUCs with the GUC_DISALLOW_IN_FILE flag.
This would rule out adding client_encoding in the list though, but
using role and session_authorization would still be possible.


v5-0001-libpq-Remove-instance-of-hardcoded-protocol-versi.patch
Description: Binary data


v5-0002-libpq-Handle-NegotiateProtocolVersion-message-mor.patch
Description: Binary data


v5-0005-Bump-protocol-version-to-3.1.patch
Description: Binary data


v5-0004-libpq-Include-minor-version-number-in-PQprotocolV.patch
Description: Binary data


v5-0003-Prepare-server-code-for-addition-of-protocol-exte.patch
Description: Binary data


v5-0006-Add-protocol-message-to-change-parameters.patch
Description: Binary data


v5-0009-Add-tests-for-ParameterSet-and-_pq_.protocol_mana.patch
Description: Binary data


v5-0008-Add-_pq_.protocol_managed_params-protocol-extensi.patch
Description: Binary data


v5-0007-Add-GUC-contexts-for-protocol-extensions.patch
Description: Binary data


Re: Fix bogus Asserts in calc_non_nestloop_required_outer

2024-01-08 Thread Tom Lane
Robert Haas  writes:
> On Sat, Jan 6, 2024 at 4:08 PM Tom Lane  wrote:
>> The argument for the patch as proposed is that we should make the
>> mergejoin and hashjoin code paths do what the nestloop path is doing.
>> However, as I replied further down in that other thread, I'm not
>> exactly convinced that the nestloop path is doing the right thing.
>> I've not had time to look closer at that, though.

> I don't really understand what you were saying in your response there,
> or what you're saying here. It seems to me that if the path is
> parameterized by top relids, and the assertion is verifying that a
> certain set of non-toprelids i.e. otherrels isn't present, then
> obviously the assertion is going to pass, but it's not checking what
> it purports to be checking.

I think we're talking at cross-purposes.  What I was wondering about
(at least further down in the thread) was whether we shouldn't be
checking *both* the "real" and the "parent" relids to make sure they
don't overlap the parameterization sets.  But it's probably overkill.
After reading the code again I agree that the parent relids are more
useful to check.

However, I still don't like Richard's patch too much as-is, because
the Asserts are difficult to read/understand and even more difficult
to compare to the other code path.  I think we want to keep the
nestloop and not-nestloop paths as visually similar as possible,
so I propose we do it more like the attached (where I also borrowed
some of your wording for the comments).

A variant of this could be to ifdef out all the added code in
non-Assert builds:

    Relids      outer_paramrels = PATH_REQ_OUTER(outer_path);
    Relids      inner_paramrels = PATH_REQ_OUTER(inner_path);
    Relids      required_outer;
#ifdef USE_ASSERT_CHECKING
    Relids      innerrelids;
    Relids      outerrelids;

/*
 * Any parameterization of the input paths refers to topmost parents of
 * the relevant relations, because reparameterize_path_by_child() hasn't
 * been called yet.  So we must consider topmost parents of the relations
 * being joined, too, while checking for disallowed parameterization
 * cases.
 */
if (inner_path->parent->top_parent_relids)
innerrelids = inner_path->parent->top_parent_relids;
else
innerrelids = inner_path->parent->relids;

if (outer_path->parent->top_parent_relids)
outerrelids = outer_path->parent->top_parent_relids;
else
outerrelids = outer_path->parent->relids;

/* neither path can require rels from the other */
Assert(!bms_overlap(outer_paramrels, innerrelids));
Assert(!bms_overlap(inner_paramrels, outerrelids));
#endif
/* form the union ... */

but I'm not sure that's better.  Probably any reasonable compiler
will throw away the whole calculation upon noticing the outputs
are unused.

regards, tom lane

diff --git a/src/backend/optimizer/path/joinpath.c b/src/backend/optimizer/path/joinpath.c
index 3fd5a24fad..e9def9d540 100644
--- a/src/backend/optimizer/path/joinpath.c
+++ b/src/backend/optimizer/path/joinpath.c
@@ -730,8 +730,11 @@ try_nestloop_path(PlannerInfo *root,
 		return;
 
 	/*
-	 * Paths are parameterized by top-level parents, so run parameterization
-	 * tests on the parent relids.
+	 * Any parameterization of the input paths refers to topmost parents of
+	 * the relevant relations, because reparameterize_path_by_child() hasn't
+	 * been called yet.  So we must consider topmost parents of the relations
+	 * being joined, too, while determining parameterization of the result and
+	 * checking for disallowed parameterization cases.
 	 */
 	if (innerrel->top_parent_relids)
 		innerrelids = innerrel->top_parent_relids;
diff --git a/src/backend/optimizer/util/pathnode.c b/src/backend/optimizer/util/pathnode.c
index f7ac71087a..8dbf790e89 100644
--- a/src/backend/optimizer/util/pathnode.c
+++ b/src/backend/optimizer/util/pathnode.c
@@ -2360,6 +2360,9 @@ create_foreign_upper_path(PlannerInfo *root, RelOptInfo *rel,
  * calc_nestloop_required_outer
  *	  Compute the required_outer set for a nestloop join path
  *
+ * Note: when considering a child join, the inputs nonetheless use top-level
+ * parent relids
+ *
  * Note: result must not share storage with either input
  */
 Relids
@@ -2394,11 +2397,30 @@ calc_non_nestloop_required_outer(Path *outer_path, Path *inner_path)
 {
 	Relids		outer_paramrels = PATH_REQ_OUTER(outer_path);
 	Relids		inner_paramrels = PATH_REQ_OUTER(inner_path);
+	Relids		innerrelids PG_USED_FOR_ASSERTS_ONLY;
+	Relids		outerrelids PG_USED_FOR_ASSERTS_ONLY;
 	Relids		required_outer;
 
+	/*
+	 * Any parameterization of the input paths refers to topmost parents of
+	 * the relevant relations, because reparameterize_path_by_child() hasn't
+	 * been called yet.  So we must consider topmost parents of the relations
+	 * being joined, too, while checking for disallowed parameterization
+	 * cases.
+	 */
+	if 

Re: Removing unneeded self joins

2024-01-08 Thread Alexander Korotkov
On Mon, Jan 8, 2024 at 10:20 PM Alexander Korotkov  wrote:
> On Mon, Jan 8, 2024 at 10:00 PM Alexander Lakhin  wrote:
> > Please look at the following query which produces an incorrect result since
> > d3d55ce57:
> > CREATE TABLE t(a int PRIMARY KEY, b int);
> > INSERT INTO t VALUES  (1, 1), (2, 1);
> > SELECT * FROM t WHERE EXISTS (SELECT * FROM t t2 WHERE t2.a = t.b AND t2.b > 0);
> >
> >   a | b
> > ---+---
> >   1 | 1
> > (1 row)
> >
> > I think that the expected result is:
> >   a | b
> > ---+---
> >   1 | 1
> >   2 | 1
> > (2 rows)
>
> Thank you for your report.  I'm looking at this now.

Fixed in 30b4955a46.

--
Regards,
Alexander Korotkov




Re: verify predefined LWLocks have entries in wait_event_names.txt

2024-01-08 Thread Nathan Bossart
Sorry for the noise.  I spent some more time tidying this up for commit,
which I am hoping to do in the next day or two.

-- 
Nathan Bossart
Amazon Web Services: https://aws.amazon.com
>From 4e222c1a1fc1a2476746cb7b68c1d2a203816699 Mon Sep 17 00:00:00 2001
From: Nathan Bossart 
Date: Mon, 8 Jan 2024 15:44:44 -0600
Subject: [PATCH v6 1/1] Cross-check lists of predefined LWLocks.

Both lwlocknames.txt and wait_event_names.txt contain a list of all
the predefined LWLocks, i.e., those with predefined positions
within MainLWLockArray.  It is easy to miss one or the other,
especially since the list in wait_event_names.txt omits the "Lock"
suffix for all the LWLock wait events.  This commit adds a cross-
check of these lists to the script that generates lwlocknames.h.
If the lists do not match exactly, building will fail.

Suggested-by: Robert Haas
Reviewed-by: Robert Haas, Michael Paquier, Bertrand Drouvot
Discussion: https://postgr.es/m/20240102173120.GA1061678%40nathanxps13
---
 src/backend/Makefile  |  2 +-
 src/backend/storage/lmgr/Makefile |  4 +-
 .../storage/lmgr/generate-lwlocknames.pl  | 49 +++
 .../utils/activity/wait_event_names.txt   | 12 +
 src/include/storage/meson.build   |  4 +-
 5 files changed, 67 insertions(+), 4 deletions(-)

diff --git a/src/backend/Makefile b/src/backend/Makefile
index 816461aa17..7d2ea7d54a 100644
--- a/src/backend/Makefile
+++ b/src/backend/Makefile
@@ -130,7 +130,7 @@ $(top_builddir)/src/port/libpgport_srv.a: | submake-libpgport
 parser/gram.h: parser/gram.y
 	$(MAKE) -C parser gram.h
 
-storage/lmgr/lwlocknames.h: storage/lmgr/generate-lwlocknames.pl storage/lmgr/lwlocknames.txt
+storage/lmgr/lwlocknames.h: storage/lmgr/generate-lwlocknames.pl storage/lmgr/lwlocknames.txt utils/activity/wait_event_names.txt
 	$(MAKE) -C storage/lmgr lwlocknames.h lwlocknames.c
 
 utils/activity/wait_event_types.h: utils/activity/generate-wait_event_types.pl utils/activity/wait_event_names.txt
diff --git a/src/backend/storage/lmgr/Makefile b/src/backend/storage/lmgr/Makefile
index c48ba943c4..504480e847 100644
--- a/src/backend/storage/lmgr/Makefile
+++ b/src/backend/storage/lmgr/Makefile
@@ -39,8 +39,8 @@ s_lock_test: s_lock.c $(top_builddir)/src/common/libpgcommon.a $(top_builddir)/s
 lwlocknames.c: lwlocknames.h
 	touch $@
 
-lwlocknames.h: $(top_srcdir)/src/backend/storage/lmgr/lwlocknames.txt generate-lwlocknames.pl
-	$(PERL) $(srcdir)/generate-lwlocknames.pl $<
+lwlocknames.h: $(top_srcdir)/src/backend/storage/lmgr/lwlocknames.txt $(top_srcdir)/src/backend/utils/activity/wait_event_names.txt generate-lwlocknames.pl
+	$(PERL) $(srcdir)/generate-lwlocknames.pl $^
 
 check: s_lock_test
 	./s_lock_test
diff --git a/src/backend/storage/lmgr/generate-lwlocknames.pl b/src/backend/storage/lmgr/generate-lwlocknames.pl
index bef5e16e11..06401dbf01 100644
--- a/src/backend/storage/lmgr/generate-lwlocknames.pl
+++ b/src/backend/storage/lmgr/generate-lwlocknames.pl
@@ -15,6 +15,7 @@ my $continue = "\n";
 GetOptions('outdir:s' => \$output_path);
 
 open my $lwlocknames, '<', $ARGV[0] or die;
+open my $wait_event_names, '<', $ARGV[1] or die;
 
 # Include PID in suffix in case parallel make runs this multiple times.
 my $htmp = "$output_path/lwlocknames.h.tmp$$";
@@ -30,6 +31,40 @@ print $c $autogen, "\n";
 
 print $c "const char *const IndividualLWLockNames[] = {";
 
+#
+# First, record the predefined LWLocks listed in wait_event_names.txt.  We'll
+# cross-check those with the ones in lwlocknames.txt.
+#
+my @wait_event_lwlocks;
+my $record_lwlocks = 0;
+
+while (<$wait_event_names>)
+{
+	chomp;
+
+	# Check for end marker.
+	last if /^# END OF PREDEFINED LWLOCKS/;
+
+	# Skip comments and empty lines.
+	next if /^#/;
+	next if /^\s*$/;
+
+	# Start recording LWLocks when we find the WaitEventLWLock section.
+	if (/^Section: ClassName - WaitEventLWLock$/)
+	{
+		$record_lwlocks = 1;
+		next;
+	}
+
+	# Go to the next line if we are not yet recording LWLocks.
+	next if not $record_lwlocks;
+
+	# Record the LWLock.
+	(my $waiteventname, my $waitevendocsentence) = split(/\t/, $_);
+	push(@wait_event_lwlocks, $waiteventname . "Lock");
+}
+
+my $i = 0;
 while (<$lwlocknames>)
 {
 	chomp;
@@ -50,6 +85,15 @@ while (<$lwlocknames>)
 	die "lwlocknames.txt not in order" if $lockidx < $lastlockidx;
 	die "lwlocknames.txt has duplicates" if $lockidx == $lastlockidx;
 
+	die "$lockname defined in lwlocknames.txt but missing from "
+	  . "wait_event_names.txt"
+	  if $i >= scalar @wait_event_lwlocks;
+	die "lists of predefined LWLocks do not match (first mismatch at "
+	  . "$wait_event_lwlocks[$i] in wait_event_names.txt and $lockname in "
+	  . "lwlocknames.txt)"
+	  if $wait_event_lwlocks[$i] ne $lockname;
+	$i++;
+
 	while ($lastlockidx < $lockidx - 1)
 	{
 		++$lastlockidx;
@@ -63,6 +107,11 @@ while (<$lwlocknames>)
 	print $h "#define $lockname ([$lockidx].lock)\n";
 }
 
+die
+  "$wait_event_lwlocks[$i] defined in 

Re: Adding a pg_get_owned_sequence function?

2024-01-08 Thread Nathan Bossart
On Mon, Jan 08, 2024 at 04:58:02PM +, Dagfinn Ilmari Mannsåker wrote:
> We can't make pg_get_serial_sequence(text, text) not work on identity
> columns any more, that would break existing users, and making the new
> function not work on serial columns would make it harder for people to
> migrate to it.  If you're suggesting adding two new functions,
> pg_get_identity_sequence(regclass, name) and
> pg_get_serial_sequence(regclass, name)¹, which only work on the type of
> column corresponding to the name, I don't think that's great for
> usability or migratability either.

I think these are reasonable concerns, but with this patch, we now have the
following functions:

pg_get_identity_sequence(table regclass, column name) -> regclass
pg_get_serial_sequence(table text, column text) -> text

If we only look at the names, it sure sounds like the first one only works
for identity columns, and the second only works for serial columns.  But
both work for identity _and_ serial.  The real differences between the two
are the parameter and return types.  Granted, this is described in the
documentation updates, but IMHO this is a kind-of bizarre state to end up
in.
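
To make the asymmetry concrete (table and column names made up;
pg_get_identity_sequence is the patch's proposed function):

    -- both resolve the sequence behind a serial *or* identity column
    SELECT pg_get_serial_sequence('public.orders', 'id');              -- returns text
    SELECT pg_get_identity_sequence('public.orders'::regclass, 'id');  -- returns regclass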

-- 
Nathan Bossart
Amazon Web Services: https://aws.amazon.com




Re: Emit fewer vacuum records by reaping removable tuples during pruning

2024-01-08 Thread Robert Haas
On Fri, Jan 5, 2024 at 3:34 PM Melanie Plageman
 wrote:
> Yes, attached is a patch set which does this. My previous patchset
> already reduced the number of places we unlock the buffer and update
> the freespace map in lazy_scan_heap(). This patchset combines the
> lazy_scan_prune() and lazy_scan_noprune() FSM update cases. I also
> have a version which moves the freespace map updates into
> lazy_scan_prune() and lazy_scan_noprune() -- eliminating all of these
> from lazy_scan_heap(). This is arguably more clear. But Andres
> mentioned he wanted fewer places unlocking the buffer and updating the
> FSM.

Hmm, interesting. I haven't had time to study this fully today, but I
think 0001 looks fine and could just be committed. Hooray for killing
useless variables with dumb names.

This part of 0002 makes me very, very uncomfortable:

+   /*
+    * Update all line pointers per the record, and repair fragmentation.
+    * We always pass no_indexes as true, because we don't know whether or
+    * not this option was used when pruning. This reduces the validation
+    * done on replay in an assert build.
+    */
+   heap_page_prune_execute(buffer, true,
+                           redirected, nredirected,
+                           nowdead, ndead,
+                           nowunused, nunused);

The problem that I have with this is that we're essentially saying
that it's ok to lie to heap_page_prune_execute because we know how
it's going to use the information, and therefore we know that the lie
is harmless. But that's not how things are supposed to work. We should
either find a way to tell the truth, or change the name of the
parameter so that it's not a lie, or change the function so that it
doesn't need this parameter in the first place, or something. I can
occasionally stomach this sort of lying as a last resort when there's
no realistic way of adapting the code being called, but that's surely
not the case here -- this is a newborn parameter, and it shouldn't be
a lie on day 1. Just imagine if some future developer thought that the
no_indexes parameter meant that the relation actually didn't have
indexes (the nerve of them!).

I took a VERY fast look through the rest of the patch set and I think
that the overall direction looks like it's probably reasonable, but
that's a very preliminary conclusion which I reserve the right to
revise after studying further. @Andres: Are you planning to
review/commit any of this? Are you hoping that I'm going to do that?
Somebody else want to jump in here?

-- 
Robert Haas
EDB: http://www.enterprisedb.com




Make psql ignore trailing semicolons in \sf, \ef, etc

2024-01-08 Thread Tom Lane
We had a complaint (see [1], but it's not the first IIRC) about how
psql doesn't behave very nicely if one ends \sf or allied commands
with a semicolon:

regression=# \sf sin(float8);
ERROR:  expected a right parenthesis

This is a bit of a usability gotcha, since many other backslash
commands are forgiving about trailing semicolons.  I looked at
the code and found that it's actually trying to ignore semicolons,
by passing semicolon = true to psql_scan_slash_option.  But that
fails to work because it's also passing type = OT_WHOLE_LINE,
and the whole-line code path ignores the semicolon flag.  Probably
that's just because nobody needed to use that combination back in
the day.  There's another user of OT_WHOLE_LINE, exec_command_help,
which also wants this behavior and has written its own stripping
code to get it.  That seems pretty silly, so here's a quick finger
exercise to move that logic into psql_scan_slash_option.

Is this enough of a bug to deserve back-patching?  I'm not sure.

regards, tom lane

[1] 
https://www.postgresql.org/message-id/CAEs%3D6D%3DnwX2wm0hjkaw6C_LnqR%2BNFtnnzbSzeZq-xcfi_ooKSw%40mail.gmail.com

diff --git a/src/bin/psql/command.c b/src/bin/psql/command.c
index 9103bc3465..2f74bfdc96 100644
--- a/src/bin/psql/command.c
+++ b/src/bin/psql/command.c
@@ -1640,18 +1640,7 @@ exec_command_help(PsqlScanState scan_state, bool active_branch)
 	if (active_branch)
 	{
 		char	   *opt = psql_scan_slash_option(scan_state,
- OT_WHOLE_LINE, NULL, false);
-		size_t		len;
-
-		/* strip any trailing spaces and semicolons */
-		if (opt)
-		{
-			len = strlen(opt);
-			while (len > 0 &&
-				   (isspace((unsigned char) opt[len - 1])
-					|| opt[len - 1] == ';'))
-				opt[--len] = '\0';
-		}
+ OT_WHOLE_LINE, NULL, true);
 
 		helpSQL(opt, pset.popt.topt.pager);
 		free(opt);
@@ -3165,6 +3154,10 @@ ignore_slash_filepipe(PsqlScanState scan_state)
  * This *MUST* be used for inactive-branch processing of any slash command
  * that takes an OT_WHOLE_LINE option.  Otherwise we might consume a different
  * amount of option text in active and inactive cases.
+ *
+ * Note: although callers might pass "semicolon" as either true or false,
+ * we need not duplicate that here, since it doesn't affect the amount of
+ * input text consumed.
  */
 static void
 ignore_slash_whole_line(PsqlScanState scan_state)
diff --git a/src/bin/psql/psqlscanslash.l b/src/bin/psql/psqlscanslash.l
index 514977e59d..10a35dffc4 100644
--- a/src/bin/psql/psqlscanslash.l
+++ b/src/bin/psql/psqlscanslash.l
@@ -18,6 +18,8 @@
  */
 #include "postgres_fe.h"
 
+#include 
+
 #include "common.h"
 #include "psqlscanslash.h"
 
@@ -640,7 +642,22 @@ psql_scan_slash_option(PsqlScanState state,
 			termPQExpBuffer(&mybuf);
 			return NULL;
 		case xslashwholeline:
-			/* always okay */
+			/*
+			 * In whole-line mode, we interpret semicolon = true as stripping
+			 * trailing whitespace as well as semi-colons; this gives the
+			 * nearest equivalent to what semicolon = true does in normal
+			 * mode.  Note there's no concept of quoting in this mode.
+			 */
+			if (semicolon)
+			{
+while (mybuf.len > 0 &&
+	   (mybuf.data[mybuf.len - 1] == ';' ||
+		(isascii((unsigned char) mybuf.data[mybuf.len - 1]) &&
+		 isspace((unsigned char) mybuf.data[mybuf.len - 1]))))
+{
+	mybuf.data[--mybuf.len] = '\0';
+}
+			}
 			break;
 		default:
 			/* can't get here */
diff --git a/src/test/regress/expected/psql.out b/src/test/regress/expected/psql.out
index 5d61e4c7bb..4f3fd46420 100644
--- a/src/test/regress/expected/psql.out
+++ b/src/test/regress/expected/psql.out
@@ -5323,7 +5323,7 @@ END
  LANGUAGE sql
  IMMUTABLE PARALLEL SAFE STRICT COST 1
 1   RETURN ($2 + $1)
-\sf ts_debug(text)
+\sf ts_debug(text);
 CREATE OR REPLACE FUNCTION pg_catalog.ts_debug(document text, OUT alias text, OUT description text, OUT token text, OUT dictionaries regdictionary[], OUT dictionary regdictionary, OUT lexemes text[])
  RETURNS SETOF record
  LANGUAGE sql
diff --git a/src/test/regress/sql/psql.sql b/src/test/regress/sql/psql.sql
index f199d624d3..c997106b9f 100644
--- a/src/test/regress/sql/psql.sql
+++ b/src/test/regress/sql/psql.sql
@@ -1315,7 +1315,7 @@ drop role regress_psql_user;
 \sf information_schema._pg_index_position
 \sf+ information_schema._pg_index_position
 \sf+ interval_pl_time
-\sf ts_debug(text)
+\sf ts_debug(text);
 \sf+ ts_debug(text)
 
 -- AUTOCOMMIT


Re: Emitting JSON to file using COPY TO

2024-01-08 Thread Joe Conway

On 1/8/24 14:36, Dean Rasheed wrote:

On Thu, 7 Dec 2023 at 01:10, Joe Conway  wrote:


The attached should fix the CopyOut response to say one column.



Playing around with this, I found a couple of cases that generate an error:

COPY (SELECT 1 UNION ALL SELECT 2) TO stdout WITH (format json);

COPY (VALUES (1), (2)) TO stdout WITH (format json);

both of those generate the following:

ERROR:  record type has not been registered



Thanks -- will have a look

--
Joe Conway
PostgreSQL Contributors Team
RDS Open Source Databases
Amazon Web Services: https://aws.amazon.com





Re: Removing unneeded self joins

2024-01-08 Thread Alexander Korotkov
On Mon, Jan 8, 2024 at 10:00 PM Alexander Lakhin  wrote:
> Please look at the following query which produces an incorrect result since
> d3d55ce57:
> CREATE TABLE t(a int PRIMARY KEY, b int);
> INSERT INTO t VALUES  (1, 1), (2, 1);
> SELECT * FROM t WHERE EXISTS (SELECT * FROM t t2 WHERE t2.a = t.b AND t2.b > 0);
>
>   a | b
> ---+---
>   1 | 1
> (1 row)
>
> I think that the expected result is:
>   a | b
> ---+---
>   1 | 1
>   2 | 1
> (2 rows)

Thank you for your report.  I'm looking at this now.

--
Regards,
Alexander Korotkov




Re: verify predefined LWLocks have entries in wait_event_names.txt

2024-01-08 Thread Nathan Bossart
On Mon, Jan 08, 2024 at 07:59:10AM +, Bertrand Drouvot wrote:
> +  input: [files(
> +'../../backend/storage/lmgr/lwlocknames.txt',
> +'../../backend/utils/activity/wait_event_names.txt')],
> 
> I think the "[" and "]" are not needed here.

D'oh!  Fixed in v5.

-- 
Nathan Bossart
Amazon Web Services: https://aws.amazon.com
>From 53c492ac8b345eca1c5c245882073273ddde53c1 Mon Sep 17 00:00:00 2001
From: Nathan Bossart 
Date: Fri, 5 Jan 2024 00:07:40 -0600
Subject: [PATCH v5 1/1] verify lists of predefined LWLocks in lwlocknames.txt
 and wait_event_names.txt match

---
 src/backend/Makefile  |  2 +-
 src/backend/storage/lmgr/Makefile |  4 +-
 .../storage/lmgr/generate-lwlocknames.pl  | 48 +++
 .../utils/activity/wait_event_names.txt   | 12 +
 src/include/storage/meson.build   |  4 +-
 5 files changed, 66 insertions(+), 4 deletions(-)

diff --git a/src/backend/Makefile b/src/backend/Makefile
index 816461aa17..7d2ea7d54a 100644
--- a/src/backend/Makefile
+++ b/src/backend/Makefile
@@ -130,7 +130,7 @@ $(top_builddir)/src/port/libpgport_srv.a: | submake-libpgport
 parser/gram.h: parser/gram.y
 	$(MAKE) -C parser gram.h
 
-storage/lmgr/lwlocknames.h: storage/lmgr/generate-lwlocknames.pl storage/lmgr/lwlocknames.txt
+storage/lmgr/lwlocknames.h: storage/lmgr/generate-lwlocknames.pl storage/lmgr/lwlocknames.txt utils/activity/wait_event_names.txt
 	$(MAKE) -C storage/lmgr lwlocknames.h lwlocknames.c
 
 utils/activity/wait_event_types.h: utils/activity/generate-wait_event_types.pl utils/activity/wait_event_names.txt
diff --git a/src/backend/storage/lmgr/Makefile b/src/backend/storage/lmgr/Makefile
index c48ba943c4..504480e847 100644
--- a/src/backend/storage/lmgr/Makefile
+++ b/src/backend/storage/lmgr/Makefile
@@ -39,8 +39,8 @@ s_lock_test: s_lock.c $(top_builddir)/src/common/libpgcommon.a $(top_builddir)/s
 lwlocknames.c: lwlocknames.h
 	touch $@
 
-lwlocknames.h: $(top_srcdir)/src/backend/storage/lmgr/lwlocknames.txt generate-lwlocknames.pl
-	$(PERL) $(srcdir)/generate-lwlocknames.pl $<
+lwlocknames.h: $(top_srcdir)/src/backend/storage/lmgr/lwlocknames.txt $(top_srcdir)/src/backend/utils/activity/wait_event_names.txt generate-lwlocknames.pl
+	$(PERL) $(srcdir)/generate-lwlocknames.pl $^
 
 check: s_lock_test
 	./s_lock_test
diff --git a/src/backend/storage/lmgr/generate-lwlocknames.pl b/src/backend/storage/lmgr/generate-lwlocknames.pl
index bef5e16e11..237c3b9678 100644
--- a/src/backend/storage/lmgr/generate-lwlocknames.pl
+++ b/src/backend/storage/lmgr/generate-lwlocknames.pl
@@ -15,6 +15,7 @@ my $continue = "\n";
 GetOptions('outdir:s' => \$output_path);
 
 open my $lwlocknames, '<', $ARGV[0] or die;
+open my $wait_event_names, '<', $ARGV[1] or die;
 
 # Include PID in suffix in case parallel make runs this multiple times.
 my $htmp = "$output_path/lwlocknames.h.tmp$$";
@@ -30,6 +31,42 @@ print $c $autogen, "\n";
 
 print $c "const char *const IndividualLWLockNames[] = {";
 
+#
+# First, record the predefined LWLocks listed in wait_event_names.txt.  We'll
+# cross-check those with the ones in lwlocknames.txt.
+#
+my @wait_event_lwlocks;
+my $record_lwlocks = 0;
+
+while (<$wait_event_names>)
+{
+	chomp;
+
+	# Check for end marker.
+	last if /^# END OF PREDEFINED LWLOCKS/;
+
+	# Skip comments and empty lines.
+	next if /^#/;
+	next if /^\s*$/;
+
+	# Start recording LWLocks when we find the WaitEventLWLock section.
+	if (/^Section: ClassName(.*)/)
+	{
+		my $section_name = $_;
+		$section_name =~ s/^.*- //;
+		$record_lwlocks = 1 if $section_name eq "WaitEventLWLock";
+		next;
+	}
+
+	# Go to the next line if we are not yet recording LWLocks.
+	next if not $record_lwlocks;
+
+	# Record the LWLock.
+	(my $waiteventname, my $waitevendocsentence) = split(/\t/, $_);
+	push(@wait_event_lwlocks, $waiteventname . "Lock");
+}
+
+my $i = 0;
 while (<$lwlocknames>)
 {
 	chomp;
@@ -50,6 +87,14 @@ while (<$lwlocknames>)
 	die "lwlocknames.txt not in order" if $lockidx < $lastlockidx;
 	die "lwlocknames.txt has duplicates" if $lockidx == $lastlockidx;
 
+	die $lockname . " defined in lwlocknames.txt but missing from wait_event_names.txt"
+	  unless $i < scalar(@wait_event_lwlocks);
+	die "lists of predefined LWLocks do not match (first mismatch at " .
+		$wait_event_lwlocks[$i] . " in wait_event_names.txt and " .
+		$lockname . " in lwlocknames.txt)"
+	  unless $wait_event_lwlocks[$i] eq $lockname;
+	$i++;
+
 	while ($lastlockidx < $lockidx - 1)
 	{
 		++$lastlockidx;
@@ -63,6 +108,9 @@ while (<$lwlocknames>)
 	print $h "#define $lockname (&MainLWLockArray[$lockidx].lock)\n";
 }
 
+die $wait_event_lwlocks[$i] . " defined in wait_event_names.txt but missing from lwlocknames.txt"
+  unless scalar(@wait_event_lwlocks) eq $i;
+
 printf $c "\n};\n";
 print $h "\n";
 printf $h "#define NUM_INDIVIDUAL_LWLOCKS		%s\n", $lastlockidx + 1;
diff --git a/src/backend/utils/activity/wait_event_names.txt b/src/backend/utils/activity/wait_event_names.txt
index 

Re: Emit fewer vacuum records by reaping removable tuples during pruning

2024-01-08 Thread Robert Haas
On Fri, Jan 5, 2024 at 3:57 PM Andres Freund  wrote:
> > I will be astonished if you can make this work well enough to avoid
> > huge regressions in plausible cases. There are plenty of cases where
> > we do a very thorough job opportunistically removing index tuples.
>
> These days the AM is often involved with that, via
> table_index_delete_tuples()/heap_index_delete_tuples(). That IIRC has to
> happen before physically removing the already-marked-killed index entries. We
> can't rely on being able to actually prune the heap page at that point, there
> might be other backends pinning it, but often we will be able to. If we were
> to prune below heap_index_delete_tuples(), we wouldn't need to recheck that
> index again during "individual tuple pruning", if the to-be-marked-unused heap
> tuple is one of the tuples passed to heap_index_delete_tuples(). Which
> presumably will be very commonly the case.
>
> At least for nbtree, we are much more aggressive about marking index entries
> as killed, than about actually removing the index entries. "individual tuple
> pruning" would have to look for killed-but-still-present index entries, not
> just for "live" entries.

I don't want to derail this thread, but I don't really see what you
have in mind here. The first paragraph sounds like you're imagining
that while pruning the index entries we might jump over to the heap
and clean things up there, too, but that seems like it wouldn't work
if the table has more than one index. I thought you were talking about
starting with a heap tuple and bouncing around to every index to see
if we can find index pointers to kill in every one of them. That
*could* work out, but you only need one index to have been
opportunistically cleaned up in order for it to fail to work out.
There might well be some workloads where that's often the case, but
the regressions in the workloads where it isn't the case seem like
they would be rather substantial, because doing an extra lookup in
every index for each heap tuple visited sounds pricey.

-- 
Robert Haas
EDB: http://www.enterprisedb.com




Re: printing raw parse tree

2024-01-08 Thread Rafsun Masud

Yeah.  I think there's an unofficial policy for post-analysis parse
trees that we don't bother writing outfuncs for utility-statement
nodes (there are too many of 'em and they're not interesting enough)
but anything that can appear in or under DML commands should be
dumpable.  I'd favor the equivalent rule for raw parse trees --- if
we're missing anything DML-ish let's add it.

regards, tom lane

Is there a way to make debug_print_parse output in JSON format?
--
Regards,
Rafsun Masud




Re: Removing unneeded self joins

2024-01-08 Thread Alexander Lakhin

Hello Andrei and Alexander,

Please look at the following query which produces an incorrect result since
d3d55ce57:
CREATE TABLE t(a int PRIMARY KEY, b int);
INSERT INTO t VALUES  (1, 1), (2, 1);
SELECT * FROM t WHERE EXISTS (SELECT * FROM t t2 WHERE t2.a = t.b AND t2.b > 0);

 a | b
---+---
 1 | 1
(1 row)

I think that the expected result is:
 a | b
---+---
 1 | 1
 2 | 1
(2 rows)

Best regards,
Alexander




Re: the s_lock_stuck on perform_spin_delay

2024-01-08 Thread Robert Haas
On Sun, Jan 7, 2024 at 9:52 PM Andy Fan  wrote:
> > I think we should add cassert-only infrastructure tracking whether we
> > currently hold spinlocks, are in a signal handler and perhaps a few other
> > states. That'd allow us to add assertions like:
> ..
> > - no lwlocks or ... while in signal handlers
>
> I *wish* LWLocks were *not* held while in signal handlers, since that
> suggested a direction for a low-frequency internal bug where a
> backend acquires an LWLock it has already acquired. However, when I
> read more documentation and code, I am thinking this should not be a
> problem.

It's not safe to acquire an LWLock in a signal handler unless we know
that the code that the signal handler is interrupting can't already be
doing that. Consider this code from LWLockAcquire:

/* Add lock to list of locks held by this backend */
held_lwlocks[num_held_lwlocks].lock = lock;
held_lwlocks[num_held_lwlocks++].mode = mode;

Imagine that we're executing this code and we get to the point where
we've set held_lwlocks[num_held_lwlocks].lock = lock and
held_lwlock[num_held_lwlocks].mode = mode, but we haven't yet
incremented num_held_lwlocks. Then a signal arrives and we jump into
the signal handler, which also calls LWLockAcquire(). Hopefully you
can see that the new lock and mode will be written over the old one,
and now held_lwlocks[] is corrupted. Oops.
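
To see the hazard in isolation, here is a minimal standalone sketch
(hypothetical code for illustration, not the actual PostgreSQL source):

typedef struct
{
	void	   *lock;
	int			mode;
} HeldLockSketch;

static HeldLockSketch held_sketch[16];
static int	nheld_sketch = 0;

static void
push_lock_sketch(void *lock, int mode)
{
	held_sketch[nheld_sketch].lock = lock;	/* (1) store the lock */
	held_sketch[nheld_sketch].mode = mode;	/* (2) store the mode */
	/* a signal handler calling push_lock_sketch() here writes into the
	 * same slot, clobbering (1) and (2), because step (3) has not yet
	 * claimed it */
	nheld_sketch++;							/* (3) claim the slot */
}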

Because the PostgreSQL code relies extensively on global variables, it
has problems like this all over the place. Another example is the
error-reporting infrastructure. ereport(yadda yadda...) fills out a
bunch of global variables before really going and doing anything. If a
signal handler were to fire and start another ereport(...), chaos
would ensue, for similar reasons as here.

-- 
Robert Haas
EDB: http://www.enterprisedb.com




Re: Emitting JSON to file using COPY TO

2024-01-08 Thread Dean Rasheed
On Thu, 7 Dec 2023 at 01:10, Joe Conway  wrote:
>
> The attached should fix the CopyOut response to say one column.
>

Playing around with this, I found a couple of cases that generate an error:

COPY (SELECT 1 UNION ALL SELECT 2) TO stdout WITH (format json);

COPY (VALUES (1), (2)) TO stdout WITH (format json);

both of those generate the following:

ERROR:  record type has not been registered

Regards,
Dean




Re: SQL:2011 application time

2024-01-08 Thread Paul Jungwirth

On 1/8/24 06:54, jian he wrote:
> On Fri, Jan 5, 2024 at 1:06 PM jian he  wrote:
>
> range_intersect returns the intersection of two ranges.
> I think here we are doing the opposite.
> names the main SQL function "range_not_intersect" and the internal
> function as "range_not_intersect_internal" should be fine.
> so people don't need to understand the meaning of "portion".

Thank you for helping me figure out a name here! I realize that can be a bike-sheddy kind of 
discussion, so let me share some of my principles.


Range and multirange are highly mathematically "pure", and that's something I value in them. It 
makes them more general-purpose, less encumbered by edge cases, easier to combine, and easier to 
reason about. Preserving that close connection to math is a big goal.


What I've called `without_portion` is (like) a closed form of minus (hence `@-` for the operator). 
Minus isn't closed under everything (e.g. ranges), so `without_portion` adds arrays---much as to 
close subtraction we add negative numbers and to close division we add rationals. We get the same 
effect from multiranges, but that only buys us range support. It would be awesome to support 
arbitrary types: ranges, multiranges, mdranges, boxes, polygons, inets, etc., so I think an array is 
the way to go here. And then each array element is a "leftover". What do we call a closed form of 
minus that returns arrays?


Using "not" suggests a function that returns true/false, but `@-` returns an array of things. So 
instead of "not" let's consider "complement". I think that's what you're expressing re intersection.


But `@-` is not the same as the complement of intersection. For one thing, `@-` is not commutative. 
`old_range @- target_portion` is not the same as `target_portion @- old_range`. But 
`complement(old_range * target_portion)` *is* the same as `complement(target_portion * old_range)`. 
Or from another angle: it's true that `old_range @- target_portion = old_range @- (old_range * 
target_portion)`, but the intersection isn't "doing" anything here. It's true that intersection and 
minus both "reduce" what you put in, but minus is more accurate.
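
For concreteness, a sketch of the proposed behavior (the `@-` operator and
its array-of-ranges result are from the patch under discussion, not from
released PostgreSQL):

SELECT int4range(1, 10) @- int4range(4, 6);
--  {"[1,4)","[6,10)"}    two leftovers, returned as an array

SELECT int4range(4, 6) @- int4range(1, 10);
--  {}                    nothing left over, hence not commutative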


So I think we want a name that captures that idea of "minus". Both "not" and "intersection" are 
misleading IMO.


Of course "minus" is already taken (and you wouldn't expect it to return arrays anyway), which is 
why I'm thinking about names like "without" or "except". Or maybe "multi-minus". I still think 
"without portion" is the closest to capturing everything above (and avoids ambiguity with other SQL 
operations). And the "portion" ties the operator to `FOR PORTION OF`, which is its purpose. But I 
wouldn't be surprised if there were something better.


Yours,

--
Paul  ~{:-)
p...@illuminatedcomputing.com




Re: psql JSON output format

2024-01-08 Thread Dean Rasheed
On Mon, 18 Dec 2023 at 16:34, Jelte Fennema-Nio  wrote:
>
> On Mon, 18 Dec 2023 at 16:38, Christoph Berg  wrote:
> > We'd want both patches even if they do the same thing on two different
> > levels, I'd say.
>
> Makes sense.
>

I can see the appeal in this feature. However, as it stands, this
isn't compatible with copy format json, and I think it would need to
duplicate quite a lot of the JSON output code in client-side code to
make it compatible.

Consider, for example:

CREATE TABLE foo(col json);
INSERT INTO foo VALUES ('"str_value"');

copy foo to stdout with (format json) produces this:

{"col":"str_value"}

which is as expected. However, psql -Jc "select * from foo" produces

[
{ "col": "\"str_value\"" }
]

The problem is, various datatypes such as boolean, number types, json,
and jsonb must not be quoted and escaped, since that would change them
to strings or double-encode them in the result. And then there are
domain types built on top of those types, and arrays, etc. See, for
example, the logic in json_categorize_type(). I think that trying to
duplicate that client-side is doomed to failure.
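
For example, with a non-string type (hypothetical session, assuming the
patch's current quote-everything behavior):

CREATE TABLE bar(flag boolean, qty numeric);
INSERT INTO bar VALUES (true, 1.5);

copy bar to stdout with (format json) emits

{"flag":true,"qty":1.5}

whereas quoting every value client-side would turn both into strings:

{ "flag": "t", "qty": "1.5" }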

Regards,
Dean




Re: add AVX2 support to simd.h

2024-01-08 Thread Nathan Bossart
On Mon, Jan 08, 2024 at 02:01:39PM +0700, John Naylor wrote:
> On Thu, Nov 30, 2023 at 12:15 AM Nathan Bossart
>  wrote:
>>   writers    sse2    avx2      %
>>       256    1195    1188     -1
>>       512     928    1054    +14
>>      1024     633     716    +13
>>      2048     332     420    +27
>>      4096     162     203    +25
>>      8192     162     182    +12
> 
> There doesn't seem to be any benefit at 256 at all. Is that expected
> and/or fine?

My unverified assumption is that the linear searches make up much less of
the benchmark at these lower client counts, so any improvements we make
are unlikely to show up there.  IIRC even the hash table approach that
we originally explored for XidInMVCCSnapshot() didn't do much, if anything,
for the benchmark at lower client counts.

> Here, the peak throughput seems to be around 64 writers with or
> without the patch from a couple years ago, but the slope is shallower
> after that. It would be good to make sure that it can't regress near
> the peak, even with a "long tail" case (see next paragraph). The first
> benchmark above starts at 256, so we can't tell where the peak is. It
> might be worth it to also have a microbenchmark because the systemic
> one has enough noise to obscure what's going on unless there are a
> very large number of writers. We know what a systemic benchmark can
> tell us on extreme workloads past the peak, and the microbenchmark
> would tell us "we need to see X improvement here in order to see Y
> improvement in the system benchmark".

Yes, will do.

> I suspect that there could be a regression lurking for some inputs
> that the benchmark doesn't look at: pg_lfind32() currently needs to be
> able to read 4 vector registers worth of elements before taking the
> fast path. There is then a tail of up to 15 elements that are now
> checked one-by-one, but AVX2 would increase that to 31. That's getting
> big enough to be noticeable, I suspect. It would be good to understand
> that case (n*32 + 31), because it may also be relevant now. It's also
> easy to improve for SSE2/NEON for v17.

Good idea.  If it is indeed noticeable, we might be able to "fix" it by
processing some of the tail with shorter vectors.  But that probably means
finding a way to support multiple vector sizes on the same build, which
would require some work.
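
For reference, a simplified sketch of the loop shape in question
(illustration only, not the actual pg_lfind.h code):

static bool
lfind32_sketch(uint32 key, const uint32 *base, uint32 nelem)
{
	const uint32 nper_iter = 32;	/* 4 registers * 8 lanes with AVX2;
									 * 4 * 4 = 16 with SSE2 */
	uint32		i = 0;

	for (; i + nper_iter <= nelem; i += nper_iter)
	{
		/* vector comparison of base[i .. i + 31] against key goes here */
	}

	/* scalar tail: at most 15 leftover elements with SSE2, 31 with AVX2 */
	for (; i < nelem; i++)
	{
		if (base[i] == key)
			return true;
	}
	return false;
}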

> Also, by reading 4 registers per loop iteration, that's 128 bytes on
> AVX2. I'm not sure that matters, but we shouldn't assume it doesn't.
> Code I've seen elsewhere reads a fixed 64-byte block, and then uses 1,
> 2, or 4 registers to handle it, depending on architecture. Whether or
> not that's worth it in this case, this patch does mean future patches
> will have to wonder if they have to do anything differently depending
> on vector length, whereas now they don't. That's not a deal-breaker,
> but it is a trade-off to keep in mind.

Yeah.  Presently, this AVX2 patch just kicks the optimization down the road
a bit for the existing use-cases, so you don't start using the vector
registers until there's more data to work with, which might not even be
noticeable.  But it's conceivable that vector length could matter at some
point, even if it doesn't matter much now.

-- 
Nathan Bossart
Amazon Web Services: https://aws.amazon.com




Re: Emit fewer vacuum records by reaping removable tuples during pruning

2024-01-08 Thread Peter Geoghegan
On Fri, Jan 5, 2024 at 12:59 PM Melanie Plageman
 wrote:
> > But you can just as easily turn this argument on its head, can't you?
> > In general, except for HOT tuples, line pointers are marked dead by
> > pruning and unused by vacuum. Here you want to turn it on its head and
> > make pruning do what would normally be vacuum's responsibility.
>
> I actually think we are going to want to stop referring to these steps
> as pruning and vacuuming. It is confusing because vacuuming refers to
> the whole process of doing garbage collection on the table and also to
> the specific step of setting dead line pointers unused. If we called
> these steps say, pruning and reaping, that may be more clear.

What about index VACUUM records? Should they be renamed to REAP records, too?

> Vacuuming consists of three phases -- the first pass, index vacuuming,
> and the second pass. I don't think we should dictate what happens in
> each pass. That is, we shouldn't expect only pruning to happen in the
> first pass and only reaping to happen in the second pass.

Why not? It's not self-evident that it matters much either way. I
don't see it being worth the complexity (which is not to say that I'm
opposed to what you're trying to do here).

Note that we only need a cleanup lock for the first heap pass right now
(not the second heap pass). So if you're going to prune in the second
heap pass, you're going to have to add a mechanism that makes it safe
(by acquiring a cleanup lock once we decide that we actually want to
prune, say). Or maybe you'd just give up on the fact that we don't
need cleanup locks for the second heap pass these days instead (which
seems like a step backwards).

> For example,
> I think Andres has previously proposed doing another round of pruning
> after index vacuuming. The second pass/third phase is distinguished
> primarily by being after index vacuuming.

I think of the second pass/third phase as being very similar to index vacuuming.

Both processes/phases don't require cleanup locks (actually nbtree
VACUUM does require cleanup locks, but the reasons why are pretty
esoteric, and not shared by every index AM). And, both
processes/phases don't need to generate their own recovery conflicts.
Neither type of WAL record requires a snapshotConflictHorizon field of
its own, since we can safely assume that some PRUNE record must have
taken care of all that earlier on.

-- 
Peter Geoghegan




Re: Fix bogus Asserts in calc_non_nestloop_required_outer

2024-01-08 Thread Robert Haas
On Sat, Jan 6, 2024 at 4:08 PM Tom Lane  wrote:
> > Well, this explanation made sense to me:
> > https://www.postgresql.org/message-id/CAMbWs4-%2BGs0HJ9ouBUb%3DqwHsGCXxG%2B92eJzLOpCkedvgtOWQ%3DQ%40mail.gmail.com
>
> The argument for the patch as proposed is that we should make the
> mergejoin and hashjoin code paths do what the nestloop path is doing.
> However, as I replied further down in that other thread, I'm not
> exactly convinced that the nestloop path is doing the right thing.
> I've not had time to look closer at that, though.

I don't really understand what you were saying in your response there,
or what you're saying here. It seems to me that if the path is
parameterized by top relids, and the assertion is verifying that a
certain set of non-top relids (i.e., otherrels) isn't present, then
obviously the assertion is going to pass, but it's not checking what
it purports to be checking. The ostensible purpose of the assertion is
to check that neither side is parameterized by the other side, because
a non-nestloop can't satisfy a parameterization. But with the
assertion as currently formulated, it would pass even if one side
*were* parameterized by the other side, because outer_paramrels and
inner_paramrels would contain child relations, and the set that it was
being compared to would contain only top-parents, and so they wouldn't
overlap.

-- 
Robert Haas
EDB: http://www.enterprisedb.com




Re: Commitfest 2024-01 first week update

2024-01-08 Thread Jelte Fennema-Nio
On Mon, 8 Jan 2024 at 07:22, vignesh C  wrote:
> Here is a list of "Needs review" entries for which there has not been
> much communication on the thread and needs help in proceeding further.

Thank you for creating these lists. It's definitely helpful to see
what to focus my reviewing effort on. I noticed they are missing one
of my commitfest entries though:

Add non-blocking version of PQcancel | Jelte Fennema-Nio

imho this patch is pretty much finished, but it has only received
English spelling feedback in the last 8 months (I addressed the
feedback).




Re: introduce dynamic shared memory registry

2024-01-08 Thread Nathan Bossart
On Mon, Jan 08, 2024 at 11:13:42AM +0530, Amul Sul wrote:
> +void *
> +dsm_registry_init_or_attach(const char *key, size_t size,
> 
> I think the name could be simple as dsm_registry_init() like we use
> elsewhere e.g. ShmemInitHash() which doesn't say attach explicitly.

That seems reasonable to me.

> Similarly, I think dshash_find_or_insert() can be as simple as
> dshash_search() and
> accept HASHACTION like hash_search().

I'm not totally sure what you mean here.  If you mean changing the dshash
API, I'd argue that's a topic for another thread.

-- 
Nathan Bossart
Amazon Web Services: https://aws.amazon.com




Re: introduce dynamic shared memory registry

2024-01-08 Thread Nathan Bossart
On Mon, Jan 08, 2024 at 10:53:17AM +0530, Bharath Rupireddy wrote:
> 1. I think we need to add some notes about this new way of getting
> shared memory for external modules in the Shared Memory and
> LWLocks section in xfunc.sgml? This will at least tell there's
> another way for external modules to get shared memory, not just with
> the shmem_request_hook and shmem_startup_hook. What do you think?

Good call.  I definitely think this stuff ought to be documented.  After a
quick read, I also wonder if it'd be worth spending some time refining that
section.

> 2. FWIW, I'd like to call this whole feature "Support for named DSM
> segments in Postgres". Do you see anything wrong with this?

Why do you feel it should be renamed?  I don't see anything wrong with it,
but I also don't see any particular advantage with that name compared to
"dynamic shared memory registry."

> 3. IIUC, this feature eventually makes both shmem_request_hook and
> shmem_startup_hook pointless, no? Or put another way, what's the
> significance of shmem request and startup hooks in lieu of this new
> feature? I think it's quite possible to get rid of the shmem request
> and startup hooks (of course, not now but at some point in future to
> not break the external modules), because all the external modules can
> allocate and initialize the same shared memory via
> dsm_registry_init_or_attach and its init_callback. All the external
> modules will then need to call dsm_registry_init_or_attach in their
> _PG_init callbacks and/or in their bg worker's main functions in case
> the modules intend to start up bg workers. Am I right?

Well, modules might need to do a number of other things (e.g., adding
hooks) that can presently only be done when preloaded, in which case I
doubt there's much benefit from switching to the DSM registry.  I don't
really intend for it to replace the existing request/startup hooks, but
you're probably right that most, if not all, could use the registry
instead.  IMHO this is well beyond the scope of this thread, though.

> 4. With the understanding in comment #3, can pg_stat_statements and
> test_slru.c get rid of its shmem_request_hook and shmem_startup_hook
> and use dsm_registry_init_or_attach? It's not that this patch need to
> remove them now, but just asking if there's something in there that
> makes this new feature unusable.

It might be possible, but IIUC you'd still need a way to know whether the
library was preloaded, i.e., all the other necessary hooks were in place.
It's convenient to just be able to check whether the shared memory was set
up for this purpose.

-- 
Nathan Bossart
Amazon Web Services: https://aws.amazon.com




Re: Add a perl function in Cluster.pm to generate WAL

2024-01-08 Thread Alexander Lakhin

Hello Bertrand,

08.01.2024 10:34, Bertrand Drouvot wrote:

If one is able to reproduce, would it be possible to change the test and launch
the vacuum in verbose mode? That way, we could see if this is somehow due to [1]
(means something holding global xmin).


Yes, I've added (VERBOSE) and also cut down the test to catch the failure 
faster.
The difference between a successful and a failed run:
2024-01-08 11:58:30.679 UTC [668363] 035_standby_logical_decoding.pl INFO:  vacuuming 
"testdb.pg_catalog.pg_authid"
2024-01-08 11:58:30.679 UTC [668363] 035_standby_logical_decoding.pl INFO:  finished vacuuming 
"testdb.pg_catalog.pg_authid": index scans: 1

    pages: 0 removed, 1 remain, 1 scanned (100.00% of total)
    tuples: 1 removed, 15 remain, 0 are dead but not yet removable
    removable cutoff: 767, which was 0 XIDs old when operation ended
    new relfrozenxid: 767, which is 39 XIDs ahead of previous value
    frozen: 0 pages from table (0.00% of total) had 0 tuples frozen
    index scan needed: 1 pages from table (100.00% of total) had 1 dead 
item identifiers removed
---
2024-01-08 12:00:59.903 UTC [669119] 035_standby_logical_decoding.pl LOG:  
statement: VACUUM (VERBOSE) pg_authid;
2024-01-08 12:00:59.904 UTC [669119] 035_standby_logical_decoding.pl INFO:  vacuuming 
"testdb.pg_catalog.pg_authid"
2024-01-08 12:00:59.904 UTC [669119] 035_standby_logical_decoding.pl INFO:  finished vacuuming 
"testdb.pg_catalog.pg_authid": index scans: 0

    pages: 0 removed, 1 remain, 1 scanned (100.00% of total)
    tuples: 0 removed, 16 remain, 1 are dead but not yet removable
    removable cutoff: 766, which was 1 XIDs old when operation ended
    new relfrozenxid: 765, which is 37 XIDs ahead of previous value
    frozen: 0 pages from table (0.00% of total) had 0 tuples frozen
    index scan not needed: 0 pages from table (0.00% of total) had 0 dead 
item identifiers removed

The difference in WAL is essentially the same as I observed before [1].
rmgr: Transaction len (rec/tot): 82/    82, tx:    766, lsn: 0/0403D418, prev 0/0403D3D8, desc: COMMIT 
2024-01-08 11:58:30.679035 UTC; inval msgs: catcache 11 catcache 10
rmgr: Heap2   len (rec/tot): 57/    57, tx:  0, lsn: 0/0403D470, prev 0/0403D418, desc: PRUNE 
snapshotConflictHorizon: 766, nredirected: 0, ndead: 1, isCatalogRel: T, nunused: 0, redirected: [], dead: [16], unused: 
[], blkref #0: rel 1664/0/1260 blk 0
rmgr: Btree   len (rec/tot): 52/    52, tx:  0, lsn: 0/0403D4B0, prev 0/0403D470, desc: VACUUM ndeleted: 
1, nupdated: 0, deleted: [1], updated: [], blkref #0: rel 1664/0/2676 blk 1
rmgr: Btree   len (rec/tot): 52/    52, tx:  0, lsn: 0/0403D4E8, prev 0/0403D4B0, desc: VACUUM ndeleted: 
1, nupdated: 0, deleted: [16], updated: [], blkref #0: rel 1664/0/2677 blk 1
rmgr: Heap2   len (rec/tot): 50/    50, tx:  0, lsn: 0/0403D520, prev 0/0403D4E8, desc: VACUUM nunused: 
1, unused: [16], blkref #0: rel 1664/0/1260 blk 0
rmgr: Heap2   len (rec/tot): 64/  8256, tx:  0, lsn: 0/0403D558, prev 0/0403D520, desc: VISIBLE 
snapshotConflictHorizon: 0, flags: 0x07, blkref #0: rel 1664/0/1260 fork vm blk 0 FPW, blkref #1: rel 1664/0/1260 blk 0
rmgr: Heap    len (rec/tot):    225/   225, tx:  0, lsn: 0/0403F5B0, prev 0/0403D558, desc: INPLACE off: 26, 
blkref #0: rel 1663/16384/16410 blk 4

vs
rmgr: Transaction len (rec/tot): 82/    82, tx:    766, lsn: 0/0403F480, prev 0/0403F440, desc: COMMIT 
2024-01-08 12:00:59.901582 UTC; inval msgs: catcache 11 catcache 10
rmgr: XLOG    len (rec/tot): 49/  8241, tx:  0, lsn: 0/0403F4D8, prev 0/0403F480, desc: FPI_FOR_HINT , 
blkref #0: rel 1664/0/1260 fork fsm blk 2 FPW
rmgr: XLOG    len (rec/tot): 49/  8241, tx:  0, lsn: 0/04041528, prev 0/0403F4D8, desc: FPI_FOR_HINT , 
blkref #0: rel 1664/0/1260 fork fsm blk 1 FPW
rmgr: XLOG    len (rec/tot): 49/  8241, tx:  0, lsn: 0/04043578, prev 0/04041528, desc: FPI_FOR_HINT , 
blkref #0: rel 1664/0/1260 fork fsm blk 0 FPW
rmgr: Heap    len (rec/tot):    225/   225, tx:  0, lsn: 0/040455C8, prev 0/04043578, desc: INPLACE off: 26, 
blkref #0: rel 1663/16384/16410 blk 4


(Complete logs and the modified test are attached.)

With FREEZE, 10 iterations with 20 tests in parallel succeeded for me
(while without it, I get failures on iterations 1,2).

[1] 
https://www.postgresql.org/message-id/b2a1f7d0-7629-72c0-2da7-e9c4e336b010%40gmail.com

Best regards,
Alexander



Re: Adding a pg_get_owned_sequence function?

2024-01-08 Thread Dagfinn Ilmari Mannsåker
vignesh C  writes:

> On Tue, 24 Oct 2023 at 22:00, Nathan Bossart  wrote:
>>
>> On Tue, Sep 12, 2023 at 03:53:28PM +0100, Dagfinn Ilmari Mannsåker wrote:
>> > Tom Lane  writes:
>> >> It's possible that we could get away with just summarily changing
>> >> the argument type from text to regclass.  ISTR that we did exactly
>> >> that with nextval() years ago, and didn't get too much push-back.
>> >> But we couldn't do the same for the return type.  Also, this
>> >> approach does nothing for the concern about the name being
>> >> misleading.
>> >
>> > Maybe we should go all the way the other way, and call it
>> > pg_get_identity_sequence() and claim that "serial" is a legacy form of
>> > identity columns?
>>
>> Hm.  Could we split it into two functions, pg_get_owned_sequence() and
>> pg_get_identity_sequence()?  I see that commit 3012061 [0] added support
>> for identity columns to pg_get_serial_sequence() because folks expected
>> that to work, so maybe that's a good reason to keep them together.  If we
>> do elect to keep them combined, I'd be okay with renaming it to
>> pg_get_identity_sequence() along with your other proposed changes.

We can't make pg_get_serial_sequence(text, text) not work on identity
columns any more, that would break existing users, and making the new
function not work on serial columns would make it harder for people to
migrate to it.  If you're suggesting adding two new functions,
pg_get_identity_sequence(regclass, name) and
pg_get_serial_sequence(regclass, name)¹, which only work on the type of
column corresponding to the name, I don't think that's great for
usability or migratability either.

> I have changed the status of the patch to "Waiting on Author" as
> Nathan's comments have not yet been followed up.

Thanks for the nudge, here's an updated patch that together with the
above addresses them.  I've changed the commitfest entry back to "Needs
review".

> Regards,
> Vignesh

- ilmari

>From 75ed637ec4ed84ac92eb7385fd295b7d5450a450 Mon Sep 17 00:00:00 2001
From: =?UTF-8?q?Dagfinn=20Ilmari=20Manns=C3=A5ker?= 
Date: Fri, 9 Jun 2023 19:55:58 +0100
Subject: [PATCH v2] Add pg_get_identity_sequence function
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

The docs say `pg_get_serial_sequence` should have been called
`pg_get_owned_sequence`, but identity columns' sequences are not owned
in the sense of `ALTER SEQUENCE … SET OWNED BY`, so instead call it
`pg_get_identity_sequence`.  This gives us the opportunity to change
the return and table argument types to `regclass` and the column
argument to `name`, instead of using `text` everywhere.  This matches
what's in catalogs, and requires less explaining than the rules for
`pg_get_serial_sequence`.
---
 doc/src/sgml/func.sgml | 37 +++---
 src/backend/utils/adt/ruleutils.c  | 69 +++---
 src/include/catalog/pg_proc.dat|  3 ++
 src/test/regress/expected/identity.out |  6 +++
 src/test/regress/expected/sequence.out |  6 +++
 src/test/regress/sql/identity.sql  |  1 +
 src/test/regress/sql/sequence.sql  |  1 +
 7 files changed, 98 insertions(+), 25 deletions(-)

diff --git a/doc/src/sgml/func.sgml b/doc/src/sgml/func.sgml
index cec21e42c0..ff4fa5a0c4 100644
--- a/doc/src/sgml/func.sgml
+++ b/doc/src/sgml/func.sgml
@@ -24589,13 +24589,13 @@
   

 
- pg_get_serial_sequence
+ pg_get_identity_sequence
 
-pg_get_serial_sequence ( table text, column text )
-text
+pg_get_identity_sequence ( table regclass, column name )
+regclass


-Returns the name of the sequence associated with a column,
+Returns the sequence associated with an identity or serial column,
 or NULL if no sequence is associated with the column.
 If the column is an identity column, the associated sequence is the
 sequence internally created for that column.
@@ -24604,10 +24604,31 @@
 it is the sequence created for that serial column definition.
 In the latter case, the association can be modified or removed
 with ALTER SEQUENCE OWNED BY.
-(This function probably should have been
-called pg_get_owned_sequence; its current name
-reflects the fact that it has historically been used with serial-type
-columns.)  The first parameter is a table name with optional
+The result is suitable for passing to the sequence functions (see
+).
+   
+   
+A typical use is in reading the current value of the sequence for an
+identity or serial column, for example:
+
+SELECT currval(pg_get_identity_sequence('sometable', 'id'));
+
+   
+  
+
+  
+   
+
+ pg_get_serial_sequence
+
+pg_get_serial_sequence ( table text, column text )
+text
+   
+   
+A backwards-compatibility wrapper
+for 

Re: Wrong rows estimations with joins of CTEs slows queries by more than factor 500

2024-01-08 Thread Tom Lane
Richard Guo  writes:
> On Sun, Jan 7, 2024 at 6:41 AM Tom Lane  wrote:
>> Thanks for the report!  I guess we need something like the attached.

> +1.

Pushed, thanks for looking at it.

>> I'm surprised that this hasn't been noticed before; was the case
>> really unreachable before?

> It seems that this case is only reachable with Vars of an INSERT target
> relation, and it seems that there is no other way to reference such a
> Var other than using CTE.

I'm a little uncomfortable with that conclusion, but for the moment
I refrained from back-patching.  We can always add the patch to v16
later if we find it's not so unreachable.  (Before v16, there was
no find_base_rel here at all.)

regards, tom lane




Re: pg_stat_statements and "IN" conditions

2024-01-08 Thread Dmitry Dolgov
> On Sat, Jan 06, 2024 at 09:04:54PM +0530, vignesh C wrote:
>
> CFBot shows documentation build has failed at [1] with:
> [07:44:55.531] time make -s -j${BUILD_JOBS} -C doc
> [07:44:57.987] postgres.sgml:572: element xref: validity error : IDREF
> attribute linkend references an unknown ID
> "guc-query-id-const-merge-threshold"
> [07:44:58.179] make[2]: *** [Makefile:70: postgres-full.xml] Error 4
> [07:44:58.179] make[2]: *** Deleting file 'postgres-full.xml'
> [07:44:58.181] make[1]: *** [Makefile:8: all] Error 2
> [07:44:58.182] make: *** [Makefile:16: all] Error 2
>
> [1] - https://cirrus-ci.com/task/6688578378399744

Indeed, after moving the configuration option to pgss I forgot to update
its reference in the docs. Thanks for noticing, will update soon.




Re: Random pg_upgrade test failure on drongo

2024-01-08 Thread Jim Nasby

On 1/4/24 10:19 PM, Amit Kapila wrote:

On Thu, Jan 4, 2024 at 5:30 PM Alexander Lakhin  wrote:


03.01.2024 14:42, Amit Kapila wrote:





And the internal process is ... background writer (BgBufferSync()).

So, I tried just adding bgwriter_lru_maxpages = 0 to postgresql.conf and
got 20 x 10 tests passing.

Thus, it we want just to get rid of the test failure, maybe it's enough to
add this to the test's config...


What about checkpoints? Can't it do the same while writing the buffers?


As we deal here with pg_upgrade/pg_restore, it must not be very easy to get
the desired effect, but I think it's not impossible in principle.
More details below.
What happens during the pg_upgrade execution is essentially:
1) CREATE DATABASE "postgres" WITH TEMPLATE = template0 OID = 5 ...;
-- this command flushes file buffers as well
2) ALTER DATABASE postgres OWNER TO ...
3) COMMENT ON DATABASE "postgres" IS ...
4) -- For binary upgrade, preserve pg_largeobject and index relfilenodes
  SELECT 
pg_catalog.binary_upgrade_set_next_index_relfilenode('2683'::pg_catalog.oid);
  SELECT 
pg_catalog.binary_upgrade_set_next_heap_relfilenode('2613'::pg_catalog.oid);
  TRUNCATE pg_catalog.pg_largeobject;
--  ^^^ here we can get the error "could not create file "base/5/2683": File 
exists"
...

We get the effect discussed when the background writer process decides to
flush a file buffer for pg_largeobject during stage 1.
(Thus, if a checkpoint somehow happened to occur during CREATE DATABASE,
the result must be the same.)
And another important factor is shared_buffers = 1MB (set during the test).
With the default setting of 128MB I couldn't see the failure.

It can be reproduced easily (on old Windows versions) just by running
pg_upgrade in a loop (I've got failures on iterations 22, 37, 17 (with the
default cluster)).
If an old cluster contains dozen of databases, this increases the failure
probability significantly (with 10 additional databases I've got failures
on iterations 4, 1, 6).



I don't have an old Windows environment to test but I agree with your
analysis and theory. The question is what should we do for these new
random BF failures? I think we should set bgwriter_lru_maxpages to 0
and checkpoint_timeout to 1hr for these new tests. Doing some invasive
fix as part of this doesn't sound reasonable because this is an
existing problem and there seems to be another patch by Thomas that
probably deals with the root cause of the existing problem [1] as
pointed out by you.

[1] - https://commitfest.postgresql.org/40/3951/


Isn't this just sweeping the problem (non-POSIX behavior on SMB and 
ReFS) under the carpet? I realize that synthetic test workloads like 
pg_upgrade in a loop aren't themselves real-world scenarios, but what 
about other cases? Even if we're certain it's not possible for these 
issues to wedge a server, it's still not a good experience for users to 
get random, unexplained IO-related errors...

--
Jim Nasby, Data Architect, Austin TX





Re: brininsert optimization opportunity

2024-01-08 Thread Alvaro Herrera
On 2023-Dec-12, Tomas Vondra wrote:

> I propose we do a much simpler thing instead - allow the cache may be
> initialized / cleaned up repeatedly, and make sure it gets reset at
> convenient place (typically after index_insert calls that don't go
> through execIndexing). That'd mean the cleanup does not need to happen
> very far from the index_insert(), which makes the reasoning much easier.
> 0002 does this.

I'm not in love with this 0002 patch; I think the layering after 0001 is
correct in that the insert_cleanup call should remain in validate_index
and called after the whole thing is done, but 0002 changes things so
that now every table AM has to worry about doing this correctly; and a
bug of omission will not be detected unless you have a BRIN index on
such a table and happen to use CREATE INDEX CONCURRENTLY.  So a
developer has essentially zero chance to do things correctly, which I
think we'd rather avoid.

So I think we should do 0001 and perhaps some further tweaks to the
original brininsert optimization commit: I think the aminsertcleanup
callback should receive the indexRelation as first argument; and also I
think it's not index_insert_cleanup() job to worry about whether
ii_AmCache is NULL or not, but instead the callback should be invoked
always, and then it's aminsertcleanup job to do nothing if ii_AmCache is
NULL.  That way, the index AM API doesn't have to worry about which
parts of IndexInfo (or the indexRelation) is aminsertcleanup going to
care about.  If we decide to change this, then the docs also need a bit
of tweaking I think.

Lastly, I kinda disagree with the notion that only some of the callers
of aminsert should call aminsertcleanup, even though btree doesn't have
an aminsertcleanup and thus it can't affect TOAST or catalogs.  Maybe we
can turn index_insert_cleanup into an inline function, which can quickly
do nothing if aminsertcleanup isn't defined.  Then we no longer have the
layering violation where we assume that btree doesn't care.  But the
proposed change in this paragraph can be maybe handled separately to
avoid confusing things with the bugfix in the two paragraphs above.
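
A sketch of that last idea (assuming the indexRelation-first callback
signature suggested above; not committed code):

static inline void
index_insert_cleanup(Relation indexRelation, IndexInfo *indexInfo)
{
	/* always invoke the callback when the AM defines one; the AM itself
	 * decides whether ii_AmCache needs any cleanup */
	if (indexRelation->rd_indam->aminsertcleanup != NULL)
		indexRelation->rd_indam->aminsertcleanup(indexRelation, indexInfo);
}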

-- 
Álvaro Herrera   48°01'N 7°57'E  —  https://www.EnterpriseDB.com/
Essentially, you're proposing Kevlar shoes as a solution for the problem
that you want to walk around carrying a loaded gun aimed at your foot.
(Tom Lane)




Re: SLRU optimization - configurable buffer pool and partitioning the SLRU lock

2024-01-08 Thread Alvaro Herrera
On 2024-Jan-08, Dilip Kumar wrote:

> On Mon, Jan 8, 2024 at 4:55 PM Alvaro Herrera  wrote:
> >
> > The more I look at TransactionGroupUpdateXidStatus, the more I think
> > it's broken, and while we do have some tests, I don't have confidence
> > that they cover all possible cases.
> >
> > Or, at least, if this code is good, then it hasn't been sufficiently
> > explained.
> 
> Any thought about a case in which you think it might be broken, I mean
> any vague thought might also help where you think it might not work as
> expected so that I can also think in that direction.  It might be
> possible that I might not be thinking of some perspective that you are
> thinking and comments might be lacking from that point of view.

Eh, apologies.  This email was an unfinished draft that I had laying
around before the holidays which I intended to discard it but somehow
kept around, and just now I happened to press the wrong key combination
and it ended up being sent instead.  We had some further discussion,
after which I no longer think that there is a problem here, so please
ignore this email.

I'll come back to this patch later this week.

-- 
Álvaro Herrera PostgreSQL Developer  —  https://www.EnterpriseDB.com/
"El que vive para el futuro es un iluso, y el que vive para el pasado,
un imbécil" (Luis Adler, "Los tripulantes de la noche")




Re: Escape output of pg_amcheck test

2024-01-08 Thread Peter Eisentraut

On 08.01.24 15:04, Aleksander Alekseev wrote:

[...] so I quickly wrote some (wrong) instrumentation to try to test your patch.


Yep, it confused me too at first.

Since the encoding happens right before exit() call, maybe it's worth
changing $b in-place in order to make the code slightly more readable
for most of us :)


My patch originally had the old-style

my $b_escaped = $b;
$b_escaped =~ s/.../;

... sprintf(..., $b_escaped);

but then I learned about the newish /r modifier and thought it was 
cooler. :)





INSERT performance: less CPU when no indexes or triggers

2024-01-08 Thread Adam S
I've been thinking about INSERT performance and noticed that copyfrom.c
(COPY FROM) performs ~4 unnecessary pointer-dereferences per record in the
case when there's no indexes and no AFTER ROW INSERT triggers (i.e. when
you just want to load data really fast!).

I moved the for-loop inside the per-batch if-checks and got a little
speedup. Obviously, this only matters for CPU-bound INSERTs with very
narrow tables - if there's other overhead (including parsing), this gain
disappears into the noise. I'm not a regular contributor, apologies in
advance if I got something wrong, and no worries if this is too small to
bother. My patch below passes "make check". I'll of course post other wins
as I find them, but this one seemed easy.

My reference test comes from a conversation on HN (
https://news.ycombinator.com/item?id=38864213 ) loading 100M tiny records
from COPY TO ... BINARY on a GCP c2d-standard-8:
https://gcloud-compute.com/c2-standard-8.html (8 vCPU, 32GB, network SSD).

time sh -c "echo \"drop table if exists tbl; create unlogged table tbl(city
int2, temp int2);copy tbl FROM '/home/asah/citydata.bin' binary;\" |
./pg/bin/postgres --single -D tmp -p  postgres";

results from 3 runs:
real 0m26.488s, user 0m14.745s, sys 0m3.299s
real 0m28.978s, user 0m14.010s, sys 0m3.288s
real 0m28.920s, user 0m14.028s, sys 0m3.201s
==>
real 0m24.483s, user 0m13.280s, sys 0m3.305s
real 0m28.668s, user 0m13.095s, sys 0m3.501s
real 0m28.306s, user 0m13.032s, sys 0m3.505s


On my mac m1 air,

real 0m11.922s, user 0m10.220s, sys 0m1.302s
real 0m12.761s, user 0m10.137s, sys 0m1.401s
real 0m12.734s, user 0m10.146s, sys 0m1.376s
==>
real 0m12.173s, user 0m9.785s, sys 0m1.221s
real 0m12.462s, user 0m9.691s, sys 0m1.393s
real 0m12.266s, user 0m9.719s, sys 0m1.390s


patch: (passes "make check" - feel free to drop/replace my comments of
course)

diff --git a/src/backend/commands/copyfrom.c b/src/backend/commands/copyfrom.c
index 37836a769c..d3783678e0 100644
--- a/src/backend/commands/copyfrom.c
+++ b/src/backend/commands/copyfrom.c
@@ -421,13 +421,14 @@ CopyMultiInsertBufferFlush(CopyMultiInsertInfo *miinfo,
 							   buffer->bistate);
 	MemoryContextSwitchTo(oldcontext);
 
-	for (i = 0; i < nused; i++)
-	{
 	/*
 	 * If there are any indexes, update them for all the inserted
 	 * tuples, and run AFTER ROW INSERT triggers.
 	 */
 	if (resultRelInfo->ri_NumIndices > 0)
+	{
+		/* expensive inner loop hidden by if-check */
+		for (i = 0; i < nused; i++)
 		{
 			List	   *recheckIndexes;
 
@@ -441,6 +442,7 @@ CopyMultiInsertBufferFlush(CopyMultiInsertInfo *miinfo,
 								 cstate->transition_capture);
 			list_free(recheckIndexes);
 		}
+	}
 
 	/*
 	 * There's no indexes, but see if we need to run AFTER ROW INSERT
@@ -449,15 +451,18 @@ CopyMultiInsertBufferFlush(CopyMultiInsertInfo *miinfo,
 	else if (resultRelInfo->ri_TrigDesc != NULL &&
 			 (resultRelInfo->ri_TrigDesc->trig_insert_after_row ||
 			  resultRelInfo->ri_TrigDesc->trig_insert_new_table))
+	{
+		/* expensive inner loop hidden by if-check */
+		for (i = 0; i < nused; i++)
 		{
 			cstate->cur_lineno = buffer->linenos[i];
 			ExecARInsertTriggers(estate, resultRelInfo,
 								 slots[i], NIL,
 								 cstate->transition_capture);
-		}
 
 		ExecClearTuple(slots[i]);
 		}
+	}
 
 	/* Update the row counter and progress of the COPY command */
 	*processed += nused;

hope this helps,
adam


Re: SQL:2011 application time

2024-01-08 Thread jian he
On Fri, Jan 5, 2024 at 1:06 PM jian he  wrote:
>
> On Tue, Jan 2, 2024 at 9:59 AM Paul Jungwirth
>  wrote:
> >
> > On 12/31/23 00:51, Paul Jungwirth wrote:
> > > That's it for now.
> >
> > Here is another update. I fixed FOR PORTION OF on partitioned tables, in 
> > particular when the attnums
> > are different from the root partition.
> >
> > Rebased to cea89c93a1.
> >
>
> Hi.
>
> +/*
> + * range_without_portion_internal - Sets outputs and outputn to the ranges
> + * remaining and their count (respectively) after subtracting r2 from r1.
> + * The array should never contain empty ranges.
> + * The outputs will be ordered. We expect that outputs is an array of
> + * RangeType pointers, already allocated with two slots.
> + */
> +void
> +range_without_portion_internal(TypeCacheEntry *typcache, RangeType *r1,
> +   RangeType *r2, RangeType **outputs, int *outputn)
> I am confused with the name "range_without_portion", I think
> "range_not_overlap" would be better.
>

range_intersect returns the intersection of two ranges.
I think here we are doing the opposite.
names the main SQL function "range_not_intersect" and the internal
function as "range_not_intersect_internal" should be fine.
so people don't need to understand the meaning of "portion".




Re: add function argument names to regex* functions.

2024-01-08 Thread jian he
On Mon, Jan 8, 2024 at 8:44 AM Dian Fay  wrote:
>
> On Thu Jan 4, 2024 at 2:03 AM EST, jian he wrote:
> > On Thu, Jan 4, 2024 at 7:26 AM Jim Nasby  wrote:
> > >
> > > On 1/3/24 5:05 PM, Dian Fay wrote:
> > >
> > > Another possibility is `index`, which is relatively short and not a
> > > reserved keyword ^1. `position` is not as precise but would avoid the
> > > conceptual overloading of ordinary indices.
> > >
> > > I'm not a fan of "index" since that leaves the question of
> > > whether it's 0 or 1 based. "Position" is a bit better, but I think
> > > Jian's suggestion of "occurance" is best.
> > >
> > > We do have precedent for one-based `index` in Postgres: array types are
> > > 1-indexed by default! "Occurrence" removes that ambiguity but it's long
> > > and easy to misspell (I looked it up after typing it just now and it
> > > _still_ feels off).
> > >
> > > How's "instance"?
> > >
> > > Presumably someone referencing arguments by name would have just looked 
> > > up the names via \df or whatever, so presumably misspelling wouldn't be a 
> > > big issue. But I think "instance" is OK as well.
> > >
> > > --
> > > Jim Nasby, Data Architect, Austin TX
> >
> > regexp_instr: It has the syntax regexp_instr(string, pattern [, start
> > [, N [, endoption [, flags [, subexpr ])
> > oracle:
> > REGEXP_INSTR (source_char, pattern,  [, position [, occurrence [,
> > return_opt  [, match_param  [, subexpr ] )
> >
> > "string" and "source_char" are almost the same descriptive, so maybe
> > there is no need to change.
> > "start" is better than "position", imho.
> > "return_opt" is better than "endoption", (maybe we need change, for
> > now I didn't)
> > "flags" cannot be changed to "match_param", given it quite everywhere
> > in functions-matching.html.
> >
> > similarly for function regexp_replace, oracle uses "replace_string",
> > we use "replacement" (mentioned in the doc).
> > so I don't think we need to change to "replace_string".
> >
> > Based on how people google[0], I think `occurrence` is ok, even though
> > it's verbose.
> > to change from `N` to `occurrence`, we also need to change the doc,
> > that is why this patch is more larger.
> >
> >
> > [0]: 
> > https://www.google.com/search?q=regex+nth+match=regex+nth+match_lcrp=EgZjaHJvbWUyBggAEEUYOTIGCAEQRRg8MgYIAhBFGDzSAQc2MThqMGo5qAIAsAIA=chrome=UTF-8
>
> The `regexp_replace` summary in table 9.10 is mismatched and still
> specifies the first parameter name as `string` instead of `source`.
> Since all the other functions use `string`, should `regexp_replace` do
> the same or is this a case where an established "standard" diverges?
>

got it. Thanks for pointing it out.

in functions-matching.html,
if I change source to string,
then the markup "string" would appear in two different roles, which is
kind of slightly confusing.

So does the following refactored description of regexp_replace make sense:

 The string is returned unchanged if
 there is no match to the pattern.  If there is a
 match, the string is returned with the
 replacement string substituted for the matching
 substring.  The replacement string can contain
 \n, where n is 1
 through 9, to indicate that the source substring matching the
 n'th parenthesized subexpression of the pattern should be
 inserted, and it can contain \& to indicate that the
 substring matching the entire pattern should be inserted.  Write
 \\ if you need to put a literal backslash in the replacement
 text.

> I noticed the original documentation for some of these functions is
> rather disorganized; summaries explain `occurrence` without explaining
> the prior `start` parameter, and detailed documentation in 9.7 is
> usually a single paragraph per function running pell-mell through ifs
> and buts without section headings, so entries in table 9.10 have to
> reference the entire section 9.7.3 instead of their specific functions.
> It's out of scope here, but should I bring this up on pgsql-docs?

I got it.
in Table 9.10. Other String Functions and Operators, if we can
reference the specific function would be great.
As for now, in the browser, you need to use Ctrl+F to find the
detailed explanation in 9.7.3.
you can just bring your suggested or patch to pgsql-hack...@postgresql.org.




Re: Escape output of pg_amcheck test

2024-01-08 Thread Aleksander Alekseev
Hi,

> [...] so I quickly wrote some (wrong) instrumentation to try to test your 
> patch.

Yep, it confused me too at first.

Since the encoding happens right before exit() call, maybe it's worth
changing $b in-place in order to make the code slightly more readable
for most of us :)

-- 
Best regards,
Aleksander Alekseev




Re: weird GROUPING SETS and ORDER BY behaviour

2024-01-08 Thread David G. Johnston
On Monday, January 8, 2024, Geoff Winkless  wrote
>
>
> Mildly interesting: you can pass column positions to GROUP BY and
> ORDER BY but if you try to pass a position to GROUPING() (I wondered
> if that would help the engine somehow) it fails:
>

The symbol 1 is ambiguous - it can be the number or a column reference.  In
a compound expression it is always the number, not the column reference.
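
For example, with a hypothetical table t(a int, b int):

SELECT a, b FROM t GROUP BY GROUPING SETS ((a), (b)) ORDER BY 1;
-- works: a bare 1 in ORDER BY is a positional column reference

SELECT a, b FROM t GROUP BY GROUPING SETS ((a), (b)) ORDER BY GROUPING(1);
-- fails: inside an expression, 1 is the integer constant, and a
-- constant is not a grouping expression of the query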

David J.


Re: Escape output of pg_amcheck test

2024-01-08 Thread Mark Dilger



> On Jan 8, 2024, at 5:41 AM, Mark Dilger  wrote:
> 
> The /r modifier defeats the purpose of the patch, at least for my perl 
> version, perl 5, version 28, subversion 1 (v5.28.1).  With just the /aeg 
> modifier, it works fine.

Nevermind.  I might be wrong about that.  I didn't have a test case handy that 
would generate index corruption which would result in characters of the 
problematic class, and so I quickly wrote some (wrong) instrumentation to try 
to test your patch.

—
Mark Dilger
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company







Re: Escape output of pg_amcheck test

2024-01-08 Thread Mark Dilger




On 1/7/24 23:27, Peter Eisentraut wrote:
The pg_amcheck reports a skip message if the layout of the index does 
not match expectations.  That message includes the bytes that were 
expected and the ones that were found.  But the found ones are arbitrary 
bytes, which can have funny effects on the terminal when they are 
printed.  To avoid that, escape non-word characters before printing.



+   # escape non-word characters to avoid confusing the terminal
+   $b =~ s{(\W)}{ sprintf '\x%02x', ord($1) }aegr);


The /r modifier defeats the purpose of the patch, at least for my perl 
version, perl 5, version 28, subversion 1 (v5.28.1).  With just the /aeg 
modifier, it works fine.


--
Mark Dilger




Re: SLRU optimization - configurable buffer pool and partitioning the SLRU lock

2024-01-08 Thread Dilip Kumar
On Mon, Jan 8, 2024 at 4:55 PM Alvaro Herrera  wrote:
>
> The more I look at TransactionGroupUpdateXidStatus, the more I think
> it's broken, and while we do have some tests, I don't have confidence
> that they cover all possible cases.
>
> Or, at least, if this code is good, then it hasn't been sufficiently
> explained.

Any thought about a case in which you think it might be broken, I mean
any vague thought might also help where you think it might not work as
expected so that I can also think in that direction.  It might be
possible that I might not be thinking of some perspective that you are
thinking and comments might be lacking from that point of view.

> If we have multiple processes trying to write bits to clog, and they are
> using different banks, then the LWLockConditionalAcquire will be able to
> acquire the bank lock

Do you think there is a problem with multiple processes getting the
lock? I mean they are modifying different CLOG pages so that can be
done concurrently right?

-- 
Regards,
Dilip Kumar
EnterpriseDB: http://www.enterprisedb.com




Re: INFORMATION_SCHEMA note

2024-01-08 Thread Daniel Gustafsson
> On 4 Jan 2024, at 13:39, Tatsuo Ishii  wrote:

>> Attached is the patch that does this.

I don't think the patch was attached?

> Any objection?

I didn't study the RFC in depth but as expected it seems to back up your change
so the change seems reasonable.

--
Daniel Gustafsson





Re: Escape output of pg_amcheck test

2024-01-08 Thread Aleksander Alekseev
Hi,

> The pg_amcheck reports a skip message if the layout of the index does
> not match expectations.  That message includes the bytes that were
> expected and the ones that were found.  But the found ones are arbitrary
> bytes, which can have funny effects on the terminal when they are
> printed.  To avoid that, escape non-word characters before printing.

LGTM.

I didn't get the part about the /r modifier at first, but "man perlre" helped:

"""
r  - perform non-destructive substitution and return the new value
"""

The /a modifier requires Perl >= 5.14, which is fine [1].

[1]: https://www.postgresql.org/docs/current/install-requirements.html

-- 
Best regards,
Aleksander Alekseev




Re: weird GROUPING SETS and ORDER BY behaviour

2024-01-08 Thread Geoff Winkless
On Mon, 8 Jan 2024 at 11:12, Geoff Winkless  wrote:
> What's even more of a head-scratcher is why fixing this this then
> breaks the _first_ group's ORDERing.

Ignore that. Finger slippage - looking back I realised I forgot the
"=0" test after the GROUPING() call.

It looks like I'm going to go with

ORDER BY GROUPING(test1.n), test1.n, GROUPING(CONCAT()), CONCAT(...)

because it's easier to build the query sequentially that way than
putting all the GROUPING tests into a single ORDER, and it does seem
to work OK.
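
Spelled out as a complete query (hypothetical tables and expressions,
just to show the shape):

SELECT test1.n, concat(test2.a, test2.b)
FROM test1 JOIN test2 ON test2.n = test1.n
GROUP BY GROUPING SETS (test1.n, concat(test2.a, test2.b))
ORDER BY GROUPING(test1.n), test1.n,
         GROUPING(concat(test2.a, test2.b)), concat(test2.a, test2.b);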

Geoff




Re: [PoC] Improve dead tuple storage for lazy vacuum

2024-01-08 Thread John Naylor
On Wed, Jan 3, 2024 at 9:10 PM John Naylor  wrote:
>
> On Tue, Jan 2, 2024 at 8:01 PM Masahiko Sawada  wrote:
>
> > I agree that we expose RT_LOCK_* functions and have tidstore use them,
> > but am not sure the if (TidStoreIsShared(ts) LWLockAcquire(..., ...)"
> > calls part. I think that even if we expose them, we will still need to
> > do something like "if (TidStoreIsShared(ts))
> > shared_rt_lock_share(ts->tree.shared)", no?
>
> I'll come back to this topic separately.

To answer your question, sure, but that "if (TidStoreIsShared(ts))"
part would be pushed down into a function so that only one place has
to care about it.

However, I'm starting to question whether we even need that. Meaning,
lock the tidstore separately. To "lock the tidstore" means to take a
lock, _separate_ from the radix tree's internal lock, to control
access to two fields in a separate "control object":

+typedef struct TidStoreControl
+{
+ /* the number of tids in the store */
+ int64 num_tids;
+
+ /* the maximum bytes a TidStore can use */
+ size_t max_bytes;

I'm pretty sure max_bytes does not need to be in shared memory, and
certainly not under a lock: Thinking of a hypothetical
parallel-prune-phase scenario, one way would be for a leader process
to pass out ranges of blocks to workers, and when the limit is
exceeded, stop passing out blocks and wait for all the workers to
finish.

As for num_tids, vacuum previously put the similar count in

@@ -176,7 +179,8 @@ struct ParallelVacuumState
  PVIndStats *indstats;

  /* Shared dead items space among parallel vacuum workers */
- VacDeadItems *dead_items;
+ TidStore *dead_items;

VacDeadItems contained "num_items". What was the reason to have new
infrastructure for that count? And it doesn't seem like access to it
was controlled by a lock -- can you confirm? If we did get parallel
pruning, maybe the count would belong inside PVShared?

The number of tids is not that tightly bound to the tidstore's job. I
believe tidbitmap.c (a possible future client) doesn't care about the
global number of tids -- not only that, but AND/OR operations can
change the number in a non-obvious way, so it would not be convenient
to keep an accurate number anyway. But the lock would still be
mandatory with this patch.

If we can make vacuum work a bit closer to how it does now, it'd be a
big step up in readability, I think. Namely, getting rid of all the
locking logic inside tidstore.c and let the radix tree's locking do
the right thing. We'd need to make that work correctly when receiving
pointers to values upon lookup, and I already shared ideas for that.
But I want to see if there is any obstacle in the way of removing the
tidstore control object and its separate lock.




Re: SLRU optimization - configurable buffer pool and partitioning the SLRU lock

2024-01-08 Thread Alvaro Herrera
The more I look at TransactionGroupUpdateXidStatus, the more I think
it's broken, and while we do have some tests, I don't have confidence
that they cover all possible cases.

Or, at least, if this code is good, then it hasn't been sufficiently
explained.

If we have multiple processes trying to write bits to clog, and they are
using different banks, then the LWLockConditionalAcquire will be able to
acquire the bank lock 

-- 
Álvaro Herrera               Breisgau, Deutschland  —  https://www.EnterpriseDB.com/
"The saddest aspect of life right now is that science gathers knowledge faster
 than society gathers wisdom."  (Isaac Asimov)




Re: brininsert optimization opportunity

2024-01-08 Thread Alvaro Herrera
On 2023-Dec-21, James Wang wrote:

> Hi All, not sure how to "Specify thread msgid" - chose one which I think
> is close to my new feature request.

Hello James, based on the "Specify thread msgid" message it looks like
you were trying to request a feature using the Commitfest website?  That
won't work; the commitfest website is only intended as a tracker of
in-progress development work.  Without a Postgres code patch, that
website doesn't help you any.  What you have done amounts to hijacking
an unrelated mailing list thread, which is discouraged and frowned upon.

That said, sadly we don't have any official feature request system.
Please start a new thread by composing an entirely new message to
pgsql-hackers@lists.postgresql.org, and don't use the commitfest
website for it.

That said,

> query:
> 
> SELECT count(1) FROM table1 t1 JOIN table2 t2 ON t1.id = t2.id WHERE 
> t1.a_indexed_col='some_value' OR t2.a_indexed_col='some_vable';
> 
> can the server automatically replace the OR logic above with UNION please? 
> i.e. replace it with:
> 
> (SELECT count(1) FROM table1 t1 JOIN table2 t2 ON t1.id = t2.id WHERE 
> t1.a_indexed_col='some_value' )
> UNION
> (SELECT count(1) FROM table1 t1 JOIN table2 t2 ON t1.id = t2.id WHERE  
> t2.a_indexed_col='some_vable');

I have the feeling that this has already been discussed, but I can't
find anything useful in the mailing list archives.

-- 
Álvaro Herrera               Breisgau, Deutschland  —  https://www.EnterpriseDB.com/
"Take the book that your religion considers the right one for finding the
prayer that brings peace to your soul. Then reboot the computer
and see if it works" (Carlos Duclós)




Re: postgres_fdw fails to see that array type belongs to extension

2024-01-08 Thread David Geier

Hi,

On 12/27/23 18:38, Tom Lane wrote:

> Hmm.  It seems odd that if an extension defines a type, the type is
> listed as a member of the extension but the array type is not.
> That makes it look like the array type is an externally-created
> thing that happens to depend on the extension, when it's actually
> part of the extension.  I'm surprised we've not run across other
> misbehaviors traceable to that.

Agreed.

> Of course, fixing it like that leads to needing to change the
> contents of pg_depend, so it wouldn't be back-patchable.  But it
> seems like the best way in the long run.


The attached patch just adds a 2nd dependency between the array type and 
the extension, using recordDependencyOnCurrentExtension(). It seems like 
the other internal dependency on the element type must stay? If that 
seems reasonable I can add a test to modules/test_extensions. Can 
extensions in contrib use test extensions in their own tests? It looks 
like postgres_fdw doesn't test any of the shippability logic.
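One way to observe the problem from SQL (a sketch, using a hypothetical
extension that defines a type my_type): before the fix, only the element
type shows an 'e' (extension member) entry in pg_depend; the array type
does not.

  SELECT objid::regtype AS type, deptype
  FROM pg_depend
  WHERE classid = 'pg_type'::regclass
    AND refclassid = 'pg_extension'::regclass
    AND objid IN ('my_type'::regtype, 'my_type[]'::regtype);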


--
David Geier
(ServiceNow)
From de23a4e9f1b0620a5204594139568cdcb3d57885 Mon Sep 17 00:00:00 2001
From: David Geier 
Date: Mon, 8 Jan 2024 10:58:21 +0100
Subject: [PATCH] Fix dependency of array of type owned by extension

---
 src/backend/catalog/pg_type.c | 6 --
 1 file changed, 4 insertions(+), 2 deletions(-)

diff --git a/src/backend/catalog/pg_type.c b/src/backend/catalog/pg_type.c
index d7167108fb..3544a3af1a 100644
--- a/src/backend/catalog/pg_type.c
+++ b/src/backend/catalog/pg_type.c
@@ -724,8 +724,10 @@ GenerateTypeDependencies(HeapTuple typeTuple,
 	if (OidIsValid(typeForm->typelem))
 	{
 		ObjectAddressSet(referenced, TypeRelationId, typeForm->typelem);
-		recordDependencyOn(&myself, &referenced,
-		   isImplicitArray ? DEPENDENCY_INTERNAL : DEPENDENCY_NORMAL);
+		recordDependencyOn(&myself, &referenced, isImplicitArray ? DEPENDENCY_INTERNAL : DEPENDENCY_NORMAL);
+
+		if (makeExtensionDep)
+			recordDependencyOnCurrentExtension(&myself, rebuild);
 	}
 }
 
-- 
2.39.2



Re: Wrong rows estimations with joins of CTEs slows queries by more than factor 500

2024-01-08 Thread Richard Guo
On Sun, Jan 7, 2024 at 6:41 AM Tom Lane  wrote:

> Alexander Lakhin  writes:
> > Please look at the following query:
> > CREATE TABLE t(i int);
> > INSERT INTO t VALUES (1);
> > VACUUM ANALYZE t;
>
> > WITH ir AS (INSERT INTO t VALUES (2) RETURNING i)
> > SELECT * FROM ir WHERE i = 2;
>
> > which produces ERROR:  no relation entry for relid 1
> > starting from f7816aec2.


Nice catch.


> Thanks for the report!  I guess we need something like the attached.


+1.


> I'm surprised that this hasn't been noticed before; was the case
> really unreachable before?


It seems that this case is only reachable with Vars of an INSERT target
relation, and there seems to be no other way to reference such a Var
other than through a CTE.

Thanks
Richard


Re: weird GROUPING SETS and ORDER BY behaviour

2024-01-08 Thread Geoff Winkless
On Mon, 8 Jan 2024 at 10:23, Geoff Winkless  wrote:

> Seems there was a reason why I thought that: per the documentation:
>
> "The arguments to the GROUPING function are not actually evaluated,
> but they must exactly match expressions given in the GROUP BY clause
> of the associated query level."
>
> https://www.postgresql.org/docs/16/functions-aggregate.html#FUNCTIONS-GROUPING-TABLE

To throw a spanner in the works, it looks like it's not the test
itself that's failing: it's putting the ORDERing in a CASE at all that
fails.

... ORDER BY
  CASE WHEN GROUPING(test1.n) THEN 1 ELSE NULL END NULLS FIRST, CASE
WHEN true THEN 2 ELSE 2 END;
 n  | concat
----+--------
 n1 |
 n2 |
 n3 |
 n4 |
 n5 |
    | n3x3
    | n5x1
    | n2x4
    | n1x5
    | n4x2

but without the CASE works fine:

... ORDER BY
  CASE WHEN GROUPING(test1.n) THEN 1 ELSE NULL END NULLS FIRST, 2;
 n  | concat
----+--------
 n4 |
 n2 |
 n3 |
 n5 |
 n1 |
    | n1x5
    | n2x4
    | n3x3
    | n4x2
    | n5x1

What's even more of a head-scratcher is why fixing this then
breaks the _first_ group's ORDERing.

It _looks_ like removing the CASE altogether and ordering by the
GROUPING value for all the grouping sets first:

ORDER BY
  GROUPING(test1.n,CONCAT(test1.n, (SELECT x FROM test2 WHERE
seq=test1.seq))), 1, 2;

actually works. I'm trying to figure out if that scales up or if it's
just dumb luck that it works for my example.

Geoff




Re: weird GROUPING SETS and ORDER BY behaviour

2024-01-08 Thread Geoff Winkless
On Sat, 6 Jan 2024 at 23:27, Geoff Winkless  wrote:

> Well yes. I assumed that since it's required that a group expression is in
> the query itself, the grouping values were taken from the result set; I
> have to admit to some surprise that they're calculated twice (three times?).

Seems there was a reason why I thought that: per the documentation:

"The arguments to the GROUPING function are not actually evaluated,
but they must exactly match expressions given in the GROUP BY clause
of the associated query level."

https://www.postgresql.org/docs/16/functions-aggregate.html#FUNCTIONS-GROUPING-TABLE

Mildly interesting: you can pass column positions to GROUP BY and
ORDER BY but if you try to pass a position to GROUPING() (I wondered
if that would help the engine somehow) it fails:

SELECT
  test1.n,
  CONCAT(test1.n, (SELECT x FROM test2 WHERE seq=test1.seq))
FROM test1
GROUP BY
GROUPING SETS(
  1,
  2
)
ORDER BY
  CASE WHEN GROUPING(1)=0 THEN 1 ELSE NULL END NULLS FIRST,
  CASE WHEN GROUPING(2)=0 THEN 2 ELSE NULL END NULLS FIRST;

ERROR:  arguments to GROUPING must be grouping expressions of the
associated query level
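For contrast, positional references in GROUP BY and ORDER BY themselves
do work; a sketch against the same tables:

  SELECT
    test1.n,
    CONCAT(test1.n, (SELECT x FROM test2 WHERE seq=test1.seq))
  FROM test1
  GROUP BY GROUPING SETS(1, 2)
  ORDER BY 1, 2;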

Geoff




Re: Shared detoast Datum proposal

2024-01-08 Thread Andy Fan

Andy Fan  writes:

>>
>> One of the tests was aborted at CFBOT [1] with:
>> [09:47:00.735] dumping /tmp/cores/postgres-11-28182.core for
>> /tmp/cirrus-ci-build/build/tmp_install//usr/local/pgsql/bin/postgres
>> [09:47:01.035] [New LWP 28182]
>
> There was a bug in JIT part, here is the fix.  Thanks for taking care of
> this!

Fixed a GCC warning in cirrus-ci, hope everything is fine now.

-- 
Best Regards
Andy Fan

>From 6dc858fae29486bea9125e7a3fb43a7081e62097 Mon Sep 17 00:00:00 2001
From: "yizhi.fzh" 
Date: Wed, 27 Dec 2023 18:43:56 +0800
Subject: [PATCH v4 1/1] shared detoast feature.

---
 src/backend/executor/execExpr.c  |  65 ++-
 src/backend/executor/execExprInterp.c| 181 +++
 src/backend/executor/execTuples.c| 130 +
 src/backend/executor/execUtils.c |   5 +
 src/backend/executor/nodeHashjoin.c  |   2 +
 src/backend/executor/nodeMergejoin.c |   2 +
 src/backend/executor/nodeNestloop.c  |   1 +
 src/backend/jit/llvm/llvmjit_expr.c  |  26 +-
 src/backend/jit/llvm/llvmjit_types.c |   1 +
 src/backend/nodes/bitmapset.c|  13 +
 src/backend/optimizer/plan/createplan.c  |  73 ++-
 src/backend/optimizer/plan/setrefs.c | 529 +++
 src/include/executor/execExpr.h  |   6 +
 src/include/executor/tuptable.h  |  60 +++
 src/include/nodes/bitmapset.h|   1 +
 src/include/nodes/execnodes.h|   5 +
 src/include/nodes/plannodes.h|  50 ++
 src/test/regress/sql/shared_detoast_slow.sql |  70 +++
 18 files changed, 1114 insertions(+), 106 deletions(-)
 create mode 100644 src/test/regress/sql/shared_detoast_slow.sql

diff --git a/src/backend/executor/execExpr.c b/src/backend/executor/execExpr.c
index 91df2009be..4aeec8419f 100644
--- a/src/backend/executor/execExpr.c
+++ b/src/backend/executor/execExpr.c
@@ -932,22 +932,81 @@ ExecInitExprRec(Expr *node, ExprState *state,
 }
 else
 {
+	int			attnum;
+	Plan	   *plan = state->parent ? state->parent->plan : NULL;
+
 	/* regular user column */
 	scratch.d.var.attnum = variable->varattno - 1;
 	scratch.d.var.vartype = variable->vartype;
+	attnum = scratch.d.var.attnum;
+
 	switch (variable->varno)
 	{
 		case INNER_VAR:
-			scratch.opcode = EEOP_INNER_VAR;
+
+			if (is_join_plan(plan) &&
+bms_is_member(attnum,
+			  ((JoinState *) state->parent)->inner_pre_detoast_attrs))
+			{
+/* debug purpose. */
+if (!jit_enabled)
+{
+	elog(INFO,
+		 "EEOP_INNER_VAR_TOAST: flags = %d costs=%.2f..%.2f, attnum: %d",
+		 state->flags,
+		 plan->startup_cost,
+		 plan->total_cost,
+		 attnum);
+}
+scratch.opcode = EEOP_INNER_VAR_TOAST;
+			}
+			else
+			{
+scratch.opcode = EEOP_INNER_VAR;
+			}
 			break;
 		case OUTER_VAR:
-			scratch.opcode = EEOP_OUTER_VAR;
+			if (is_join_plan(plan) &&
+bms_is_member(attnum,
+			  ((JoinState *) state->parent)->outer_pre_detoast_attrs))
+			{
+/* debug purpose. */
+if (!jit_enabled)
+{
+	elog(INFO,
+		 "EEOP_OUTER_VAR_TOAST: flags = %u costs=%.2f..%.2f, attnum: %d",
+		 state->flags,
+		 plan->startup_cost,
+		 plan->total_cost,
+		 attnum);
+}
+scratch.opcode = EEOP_OUTER_VAR_TOAST;
+			}
+			else
+scratch.opcode = EEOP_OUTER_VAR;
 			break;
 
 			/* INDEX_VAR is handled by default case */
 
 		default:
-			scratch.opcode = EEOP_SCAN_VAR;
+			if (is_scan_plan(plan) && bms_is_member(
+	attnum,
+	((ScanState *) state->parent)->scan_pre_detoast_attrs))
+			{
+if (!jit_enabled)
+{
+	elog(INFO,
+		 "EEOP_SCAN_VAR_TOAST: flags = %u costs=%.2f..%.2f, scanId: %d, attnum: %d",
+		 state->flags,
+		 plan->startup_cost,
+		 plan->total_cost,
+		 ((Scan *) plan)->scanrelid,
+		 attnum);
+}
+scratch.opcode = EEOP_SCAN_VAR_TOAST;
+			}
+			else
+scratch.opcode = EEOP_SCAN_VAR;
 			break;
 	}
 }
diff --git a/src/backend/executor/execExprInterp.c b/src/backend/executor/execExprInterp.c
index 3c17cc6b1e..bd769fdeb6 100644
--- a/src/backend/executor/execExprInterp.c
+++ b/src/backend/executor/execExprInterp.c
@@ -57,6 +57,7 @@
 #include "postgres.h"
 
 #include "access/heaptoast.h"
+#include "access/detoast.h"
 #include "catalog/pg_type.h"
 #include "commands/sequence.h"
 #include "executor/execExpr.h"
@@ -157,6 +158,9 @@ static void ExecEvalRowNullInt(ExprState *state, ExprEvalStep *op,
 static Datum ExecJustInnerVar(ExprState *state, ExprContext *econtext, bool *isnull);
 static Datum ExecJustOuterVar(ExprState *state, ExprContext *econtext, bool *isnull);
 static Datum 

Re: Change GUC hashtable to use simplehash?

2024-01-08 Thread John Naylor
On Mon, Jan 8, 2024 at 2:24 PM Junwang Zhao  wrote:
>
> + * Portions Copyright (c) 2018-2023, PostgreSQL Global Development Group
>
> A kind reminder, it's already 2024 :)
>
> I'm also curious why the 2018, is there any convention for that?

The convention I followed was "blind copy-paste", but the first year
is supposed to be when the file entered the repo. Thanks, will fix.




Re: Multidimensional Histograms

2024-01-08 Thread Alexander Cheshev
Hi Andrei,

> Maybe my wording needed to be more precise. I didn't implement
> multidimensional histograms before, so I don't know how expensive they
> are. I meant that for dependency statistics over about six columns, we
> have a lot of combinations to compute.

Equi-Depth Histogram in a 6-dimensional case requires 6 times more
iterations. Postgres already uses Equi-Depth Histogram. Even if you
increase the number of buckets from 100 to 1000, there will be no
overhead, as the time complexity of Equi-Depth Histogram has no
dependence on the number of buckets. So, no overhead at all!

Regards,
Alexander Cheshev

On Mon, 8 Jan 2024 at 04:12, Andrei Lepikhov  wrote:
>
> On 8/1/2024 01:36, Tomas Vondra wrote:
> > On 1/7/24 18:26, Andrei Lepikhov wrote:
> >> On 7/1/2024 17:51, Tomas Vondra wrote:
> >>> On 1/7/24 11:22, Andrei Lepikhov wrote:
>  On 7/1/2024 06:54, Tomas Vondra wrote:
> > It's an interesting area for experiments, no doubt about it. And if you
> > choose to explore it, that's fine. But it's better to be aware it may
> > not end with a commit.
> > For the multi-dimensional case, I propose we first try to experiment
> > with the various algorithms, and figure out what works etc. Maybe
> > implementing them in python or something would be easier than C.
> 
>  Curiously, trying to utilize extended statistics for some problematic
>  cases, I am experimenting with auto-generating such statistics by
>  definition of indexes [1]. Doing that, I wanted to add some hand-made
>  statistics like a multidimensional histogram or just a histogram which
>  could help to perform estimation over a set of columns/expressions.
>  I realized that current hooks get_relation_stats_hook and
>  get_index_stats_hook are insufficient if I want to perform an estimation
>  over a set of ANDed quals on different columns.
>  In your opinion, is it possible to add a hook into the extended
>  statistics to allow for an extension to propose alternative estimation?
> 
>  [1] https://github.com/danolivo/pg_index_stats
> 
> >>>
> >>> No idea, I haven't thought about that very much. Presumably the existing
> >>> hooks are insufficient because they're per-attnum? I guess it would make
> >>> sense to have a hook for all the attnums of the relation, but I'm not
> >>> sure it'd be enough to introduce a new extended statistics kind ...
> >>
> >> I got stuck on the same problem Alexander mentioned: we usually have
> >> large tables with many uniformly distributed values. In this case, MCV
> >> doesn't help a lot.
> >>
> >> Usually, I face problems scanning a table with a filter containing 3-6
> >> ANDed quals. Here, Postgres multiplies selectivities and ends up with a
> >> less than 1 tuple selectivity. But such scans, in reality, mostly have
> >> some physical sense and return a bunch of tuples. It looks like the set
> >> of columns representing some value of composite type.
> >
> > Understood. That's a fairly common scenario, and it can easily end up
> > with rather terrible plan due to the underestimate.
> >
> >> Sometimes extended statistics on dependency helps well, but it expensive
> >> for multiple columns.
> >
> > Expensive in what sense? Also, how would a multidimensional histogram be
> > any cheaper?
> Maybe my wording needed to be more precise. I didn't implement
> multidimensional histograms before, so I don't know how expensive they
> are. I meant that for dependency statistics over about six columns, we
> have a lot of combinations to compute.
> >
> >> And sometimes I see that even a trivial histogram
> >> on a ROW(x1,x2,...) could predict a much more adequate value (kind of
> >> conservative upper estimation) for a clause like "x1=N1 AND x2=N2 AND
> >> ..." if somewhere in extension we'd transform it to ROW(x1,x2,...) =
> >> ROW(N1, N2,...).
> >
> > Are you guessing the histogram would help, or do you know it'd help? I
> > have no problem believing that for range queries, but I think it's far
> > less clear for simple equalities. I'm not sure a histograms would work
> > for that ...
>
> I added Teodor Sigaev to the CC of this email - He has much more user
> feedback on this technique. As I see, having a histogram over a set of
> columns, we have top selectivity estimation for the filter. I'm not sure
> how good a simple histogram is in that case, too, but according to my
> practice, it works, disallowing usage of too-optimistic query plans.
>
> > Maybe it'd be possible to track more stuff for each bucket, not just the
> > frequency, but also ndistinct for combinations of columns, and then use
> > that to do better equality estimates. Or maybe we could see how the
> > "expected" and "actual" bucket frequency compare, and use that to
> > correct the equality estimate? Not sure.
> Yes, it is in my mind, too. Having such experimental stuff as an
> extension(s) in GitHub, we could get some community feedback.
> >
> > But perhaps you have some 

Change comments of removing useless joins.

2024-01-08 Thread ywgrit
After reading the logic for removing useless joins, I think this comment
might need to be changed: "Currently, join_is_removable only succeeds
if sjinfo's right hand is a single baserel." could be changed to
"Currently, join_is_removable only succeeds if sjinfo's min_righthand is
a single baserel.", because the useless joins in the query "select t1.*
from t1 left join (t2 left join t3 on t3.a=t2.b) on t2.a=t1.a;" would
also be eliminated; that is, the query will be converted to "select t1.*
from t1;".
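A runnable sketch of that case (hypothetical tables; the inner rels need
unique constraints on their join keys so the left joins are provably
useless):

  CREATE TABLE t1 (a int);
  CREATE TABLE t2 (a int UNIQUE, b int);
  CREATE TABLE t3 (a int UNIQUE);

  EXPLAIN (COSTS OFF)
  SELECT t1.* FROM t1 LEFT JOIN (t2 LEFT JOIN t3 ON t3.a = t2.b) ON t2.a = t1.a;
  -- both left joins are removed; the plan is a plain scan of t1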
diff --git a/src/backend/optimizer/plan/analyzejoins.c b/src/backend/optimizer/plan/analyzejoins.c
index 6c02fe8908..70e0ae372f 100644
--- a/src/backend/optimizer/plan/analyzejoins.c
+++ b/src/backend/optimizer/plan/analyzejoins.c
@@ -112,7 +112,7 @@ restart:
 
 		/*
 		 * Currently, join_is_removable can only succeed when the sjinfo's
-		 * righthand is a single baserel.  Remove that rel from the query and
+		 * min_righthand is a single baserel.  Remove that rel from the query and
 		 * joinlist.
 		 */
 		innerrelid = bms_singleton_member(sjinfo->min_righthand);


Re: add AVX2 support to simd.h

2024-01-08 Thread John Naylor
On Sat, Jan 6, 2024 at 12:04 AM Nathan Bossart  wrote:

> I've been thinking about the configuration option approach.  ISTM that
> would be the most feasible strategy, at least for v17.  A couple things
> come to mind:
>
> * This option would simply map to existing compiler flags.  We already have
>   ways to provide those (-Dc_args in meson, CFLAGS in autoconf).  Perhaps
>   we'd want to provide our own shorthand for certain platforms (e.g., ARM),
>   but that will still just be shorthand for compiler flags.
>
> * Such an option would itself generate some maintenance cost.  That could
>   be worth it because it formalizes the Postgres support for those options,
>   but it's still one more thing to track.
>
> Another related option could be to simply document that we have support for
> some newer instructions that can be enabled by setting the aforementioned
> compiler flags.  That's perhaps a little less user-friendly, but it'd avoid
> the duplication and possibly reduce the maintenance cost.  I also wonder if
> it'd help prevent confusion when CFLAGS and this extra option conflict.

The last one might offer more graceful forward compatibility if the
multiple-binaries idea gets any traction some day, because at that
point the additional config options are not needed, I think.

Another consideration is which way would touch the fewest places to
work with Windows, which uses the spelling /arch:AVX2 etc.

One small thing I would hope for from the final version of this is
the ability to inline things where we currently indirect depending on
a run-time check. That seems like "just work" on top of everything
else, and I don't think it makes a case for either of the above.




Re: Oversight in reparameterize_path_by_child leading to executor crash

2024-01-08 Thread Richard Guo
Thanks for the review!

On Sat, Jan 6, 2024 at 2:36 AM Robert Haas  wrote:

> Richard, I think it could be useful to put a better commit message
> into the patch file, describing both what problem is being fixed and
> what the design of the fix is. I gather that the problem is that we
> crash if the query contains a partitionwise join and also $SOMETHING,
> and the solution is to move reparameterization to happen at
> createplan() time but with a precheck that runs during path
> generation. Presumably, that means this is more than a minimal bug
> fix, because the bug could be fixed without splitting
> can-it-be-reparameterized to reparameterize-it in this way. Probably
> that's a performance optimization, so maybe it's worth clarifying
> whether that's just an independently good idea or whether it's a part
> of making the bug fix not regress performance.


Thanks for the suggestion.  Attached is an updated patch which is added
with a commit message that tries to explain the problem and the fix.

> I think the macro names in path_is_reparameterizable_by_child could be
> better chosen. CHILD_PATH_IS_REPARAMETERIZABLE doesn't convey that the
> macro will return from the calling function if not -- it looks like it
> just returns a Boolean. Maybe REJECT_IF_PATH_NOT_REPARAMETERIZABLE and
> REJECT_IF_PATH_LIST_NOT_REPARAMETERIZABLE or some such.


Agreed.


> Another question here is whether we really want to back-patch all of
> this or whether it might be better to, as Tom proposed previously,
> back-patch a more minimal fix and leave the more invasive stuff for
> master.


Fair point.  I think we can back-patch a more minimal fix, as Tom
proposed in [1], which disallows the reparameterization if the path
contains sampling info that references the outer rel.  But I think we
need also to disallow the reparameterization of a path if it contains
restriction clauses that reference the outer rel, as such paths have
been found to cause incorrect results, or planning errors as in [2].

[1] https://www.postgresql.org/message-id/3163033.1692719009%40sss.pgh.pa.us
[2]
https://www.postgresql.org/message-id/CAMbWs4-CSR4VnZCDep3ReSoHGTA7E%2B3tnjF_LmHcX7yiGrkVfQ%40mail.gmail.com

Thanks
Richard


v9-0001-Postpone-reparameterization-of-paths-until-when-creating-plans.patch
Description: Binary data


Changing a schema's name with function1 calling function2

2024-01-08 Thread Wilma Wantren
If I want to change the name of my database schema, I call 
alter schema my_schema rename to other_schema
However, there is a problem with functions that call other functions in the 
same schema. These functions have a search_path 
alter function my_schema.function1 set search_path to my_schema
If the name of the schema is changed with "alter schema...", the search path of 
the functions is not changed, so I still have to call the following after 
renaming the schema: 
alter function other_schema.function1 set search_path to other_schema
This is worse than it seems at first glance, because I need to know which 
functions have a search_path. If my list of these functions is incomplete and I 
therefore do not change the search_path for all functions, there will be an 
error in the schema after renaming the schema.
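For illustration, a minimal runnable sketch of the failure mode
(hypothetical definitions; function1 calls function2 as in the subject):

  CREATE SCHEMA my_schema;
  SET search_path = my_schema;  -- so function2() resolves while creating function1
  CREATE FUNCTION function2() RETURNS int LANGUAGE sql AS 'SELECT 42';
  CREATE FUNCTION function1() RETURNS int LANGUAGE sql AS 'SELECT function2()';
  ALTER FUNCTION function1() SET search_path TO my_schema;

  ALTER SCHEMA my_schema RENAME TO other_schema;
  -- function1 still carries search_path = my_schema, which no longer
  -- exists, so the inner call fails at execution time:
  SELECT other_schema.function1();
  -- ERROR:  function function2() does not exist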

I am sure that in the vast majority of cases where a function has a 
search_path, this search_path specifies the schema in which the function is 
located, i.e. the function 
my_schema.function1 
has search_path 
my_schema
It would therefore be great if you could implement a "magic variable" called 
__function_schema__, which can be set as the search_path of a function and 
which is not evaluated from the outset, but is transferred unchanged to the 
metadata of the function:
Metadata of function1:
...
search_path: __function_schema__
...
Each time the function is executed, the variable value is determined. 
Therefore, the search_path is always correct: as long as the function is in the 
schema my_schema, the search_path __function_schema__ is evaluated to my_schema 
when the function is executed, and as soon as the function is in the schema 
other_schema after the schema has been renamed, the search_path 
__function_schema__ is evaluated to other_schema when the function is executed.
Of course, the implementation could cache the value of __function_schema__ for 
each function and only change it when the schema of the function changes.
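Sketched with the proposed variable (hypothetical syntax, not something
PostgreSQL implements today), the per-function fixup would become
unnecessary:

  ALTER FUNCTION my_schema.function1() SET search_path TO __function_schema__;
  -- after ALTER SCHEMA my_schema RENAME TO other_schema, the variable
  -- would evaluate to other_schema at execution time, with no
  -- per-function changes needed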

Wilma
PS Even though I wrote that I would like to have a "magic variable" called 
__function_schema__, I would of course also be very happy with a name other 
than __function_schema__.
