Re: [PoC] Improve dead tuple storage for lazy vacuum

2023-03-19 Thread Masahiko Sawada
On Fri, Mar 17, 2023 at 4:49 PM Masahiko Sawada  wrote:
>
> On Fri, Mar 17, 2023 at 4:03 PM John Naylor
>  wrote:
> >
> > On Wed, Mar 15, 2023 at 9:32 AM Masahiko Sawada  
> > wrote:
> > >
> > > On Tue, Mar 14, 2023 at 8:27 PM John Naylor
> > >  wrote:
> > > >
> > > > I wrote:
> > > >
> > > > > > > Since the block-level measurement is likely overestimating quite 
> > > > > > > a bit, I propose to simply reverse the order of the actions here, 
> > > > > > > effectively reporting progress for the *last page* and not the 
> > > > > > > current one: First update progress with the current memory usage, 
> > > > > > > then add tids for this page. If this allocated a new block, only 
> > > > > > > a small bit of that will be written to. If this block pushes it 
> > > > > > > over the limit, we will detect that up at the top of the loop. 
> > > > > > > It's kind of like our earlier attempts at a "fudge factor", but 
> > > > > > > simpler and less brittle. And, as far as OS pages we have 
> > > > > > > actually written to, I think it'll effectively respect the memory 
> > > > > > > limit, at least in the local mem case. And the numbers will make 
> > > > > > > sense.
> >
> > > > I still like my idea at the top of the page -- at least for vacuum and 
> > > > m_w_m. It's still not completely clear if it's right but I've got 
> > > > nothing better. It also ignores the work_mem issue, but I've given up 
> > > > anticipating all future cases at the moment.
> >
> > > IIUC you suggested measuring memory usage by tracking how much memory
> > > chunks are allocated within a block. If your idea at the top of the
> > > page follows this method, it still doesn't deal with the point Andres
> > > mentioned.
> >
> > Right, but that idea was orthogonal to how we measure memory use, and in 
> > fact mentions blocks specifically. The re-ordering was just to make sure 
> > that progress reporting didn't show current-use > max-use.
>
> Right. I still like your re-ordering idea. It's true that most of the
> area of the last block allocated before heap scanning stops is not
> actually used yet. I'm guessing we can just check whether the context
> memory has gone over the limit. But I'm concerned it might not work
> well on systems where memory overcommit is disabled.
>
> >
> > However, the big question remains DSA, since a new segment can be as large 
> > as the entire previous set of allocations. It seems it just wasn't designed 
> > for things where memory growth is unpredictable.

aset.c has a similar characteristic: it allocates an 8kB block upon
the first allocation in a context, and doubles that size for each
successive block request, but we can specify the initial and maximum
block sizes. This made me think of another idea: specify both sizes to
DSA, with both values calculated from m_w_m. For example, we can
create a DSA area in parallel_vacuum_init() as follows:

initial block size = min(m_w_m / 4, 1MB)
max block size = max(m_w_m / 8, 8MB)

In most cases, we can start with a 1MB initial segment, the same as
before. For small memory settings, say m_w_m = 1MB, we start with a
256kB initial segment, and heap scanning stops after DSA has allocated
1.5MB (= 256kB + 256kB + 512kB + 512kB). For larger settings, heap
scanning stops after DSA has allocated about 1.25 times more memory
than m_w_m. For example, if m_w_m = 1GB, the initial and maximum
segment sizes are 1MB and 128MB respectively, and DSA allocates
segments as follows until heap scanning stops:

2 * (1 + 2 + 4 + 8 + 16 + 32 + 64 + 128) + (128 * 5) = 1150MB

dsa_allocate() will be extended to accept the initial and maximum
block sizes, like AllocSetContextCreate().
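To sanity-check the arithmetic above, here is a small Python sketch of the proposed growth policy. It assumes, as an illustration, that DSA allocates two segments at each size before the size doubles (dsa.c's DSA_NUM_SEGMENTS_AT_EACH_SIZE); the function name and the stop condition (scanning stops once total allocation exceeds m_w_m) are mine, not part of the proposal.

```python
MB = 1024  # sizes tracked in kB

def dsa_total_before_stop(m_w_m_kb, segs_per_size=2):
    """Total DSA allocation when heap scanning stops, under the proposed
    initial/max segment sizes derived from m_w_m (all sizes in kB)."""
    init = min(m_w_m_kb // 4, 1 * MB)        # proposed initial block size
    max_seg = max(m_w_m_kb // 8, 8 * MB)     # proposed max block size
    total, size, at_size = 0, init, 0
    while total <= m_w_m_kb:                 # scan stops once over the limit
        total += size
        at_size += 1
        if at_size == segs_per_size and size < max_seg:
            size = min(size * 2, max_seg)    # double up to the cap
            at_size = 0
    return total

# m_w_m = 1MB: 256kB + 256kB + 512kB + 512kB = 1.5MB
# m_w_m = 1GB: 2 * (1 + 2 + ... + 128)MB + 5 * 128MB = 1150MB
```

For m_w_m = 1MB this returns 1536 (kB, i.e. 1.5MB), and for m_w_m = 1GB it returns 1177600 (= 1150MB), matching the figures above.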

Regards,

-- 
Masahiko Sawada
Amazon Web Services: https://aws.amazon.com




Re: Improve logging when using Huge Pages

2023-03-19 Thread Michael Paquier
On Mon, Mar 20, 2023 at 01:09:09AM -0400, Tom Lane wrote:
> I'm confused here, because Horiguchi-san is saying that that
> won't work.  I've not checked the code lately, but I think that
> "postgres -C var" prints its results before actually attempting
> to establish shared memory, so I suspect Horiguchi-san is right.

Yes, I hadn't read through it correctly.  Sorry for the noise.
--
Michael




Re: Improve logging when using Huge Pages

2023-03-19 Thread Tom Lane
Michael Paquier  writes:
> On Tue, Mar 14, 2023 at 02:02:19PM +0900, Kyotaro Horiguchi wrote:
>> I slightly prefer using a function for this, as if GUC is used, it can
>> only return "unknown" for the command "postgres -C
>> huge_page_active". However, apart from this advantage, I prefer using
>> a GUC for this information.

> The main advantage of a read-only GUC over a function is that users
> would not need to start a postmaster to know if huge pages would be
> active or not.

I'm confused here, because Horiguchi-san is saying that that
won't work.  I've not checked the code lately, but I think that
"postgres -C var" prints its results before actually attempting
to establish shared memory, so I suspect Horiguchi-san is right.

regards, tom lane




Re: Improve logging when using Huge Pages

2023-03-19 Thread Michael Paquier
On Mon, Mar 20, 2023 at 01:54:46PM +0900, Michael Paquier wrote:
> The main advantage of a read-only GUC over a function is that users
> would not need to start a postmaster to know if huge pages would be
> active or not.  This is the main reason why a GUC would be a better
> fit, in my opinion, because it makes for a cheaper check, while still
> allowing a SQL query to check the value of the GUC.

[ Should have read more carefully ]

..  Which is something you cannot do with -C, because mmap() happens
after the runtime-computed logic for postgres -C.  It does not sound
right to do the mmap() for a GUC check, so a function may indeed be
better suited than moving the mmap() call a bit earlier in the
postmaster startup sequence.
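The ordering problem can be sketched with a toy model (hedged: the function names and return values below are illustrative, not PostgreSQL's actual startup code): postgres -C prints its value and exits before shared memory is mapped, so a huge_page_active GUC could only ever report "unknown" there.

```python
def mmap_shared_memory():
    """Stub for the real mmap()-based allocation; pretend huge pages worked."""
    return "on"

def postmaster_start(print_config_var=None):
    # Runtime-computed GUC handling (postgres -C) happens first ...
    gucs = {"huge_page_active": "unknown"}
    if print_config_var is not None:
        return gucs[print_config_var]   # -C prints before mmap(), then exits
    # ... and shared memory is only established afterwards, so only a
    # running server can know the real value.
    gucs["huge_page_active"] = mmap_shared_memory()
    return gucs["huge_page_active"]

print(postmaster_start("huge_page_active"))  # -> unknown
print(postmaster_start())                    # -> on
```

This is why a function (queried against a running postmaster) can report the real value, while -C on a GUC cannot without moving the mmap() call earlier.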
--
Michael




Re: Improve logging when using Huge Pages

2023-03-19 Thread Michael Paquier
On Tue, Mar 14, 2023 at 02:02:19PM +0900, Kyotaro Horiguchi wrote:
> Regarding huge_page_active, its value remains constant throughout a
> postmaster's lifespan. In this regard, GUC may be a better fit for
> this information.  The issue with using GUC for this value is that the
> postgres command cannot report the final value via the -C option,
> which may be the reason for the third alternative "unknown".
> 
> I slightly prefer using a function for this, as if GUC is used, it can
> only return "unknown" for the command "postgres -C
> huge_page_active". However, apart from this advantage, I prefer using
> a GUC for this information.

The main advantage of a read-only GUC over a function is that users
would not need to start a postmaster to know if huge pages would be
active or not.  This is the main reason why a GUC would be a better
fit, in my opinion, because it makes for a cheaper check, while still
allowing a SQL query to check the value of the GUC.
--
Michael




Re: Initial Schema Sync for Logical Replication

2023-03-19 Thread Amit Kapila
On Thu, Mar 16, 2023 at 10:27 PM Kumar, Sachin  wrote:
>
> > Hi,
> >
> > I have a couple of questions.
> >
> > Q1.
> >
> > What happens if the subscriber already has some tables present? For
> > example, I did not see the post saying anything like "Only if the table does
> > not already exist then it will be created".
> >
> My assumption was that if the subscriber is doing initial schema sync, it
> does not have any conflicting database objects.
>

Can't we simply error out in such a case with "obj already exists"?
This would be similar to how we deal with conflicting rows with
unique/primary keys.

-- 
With Regards,
Amit Kapila.




RE: Allow logical replication to copy tables in binary format

2023-03-19 Thread Hayato Kuroda (Fujitsu)
Dear Melih,

Thank you for updating the patch.
I checked your added description about the initial data sync and I think it's OK.

A few minor comments:

01. copy_table

```
+   List   *options = NIL;
```

I found an unnecessary blank just after "List". You can remove it and
align the definition.

02. copy_table

```
+   options = lappend(options, makeDefElem("format", (Node *) makeString("binary"), -1));
```

The line seems to exceed 80 characters. What do you think about
changing it like the following?

```
options = lappend(options,
                  makeDefElem("format",
                              (Node *) makeString("binary"), -1));
```

Best Regards,
Hayato Kuroda
FUJITSU LIMITED



Re: logical decoding and replication of sequences, take 2

2023-03-19 Thread Amit Kapila
On Sat, Mar 18, 2023 at 8:49 PM Tomas Vondra
 wrote:
>
> On 3/18/23 06:35, Amit Kapila wrote:
> > On Sat, Mar 18, 2023 at 3:13 AM Tomas Vondra
> >  wrote:
> >>
> >> ...
> >>
> >> Clearly, for sequences we can't quite rely on snapshots/slots, we need
> >> to get the LSN to decide what changes to apply/skip from somewhere else.
> >> I wonder if we can just ignore the queued changes in tablesync, but I
> >> guess not - there can be queued increments after reading the sequence
> >> state, and we need to apply those. But maybe we could use the page LSN
> >> from the relfilenode - that should be the LSN of the last WAL record.
> >>
> >> Or maybe we could simply add pg_current_wal_insert_lsn() into the SQL we
> >> use to read the sequence state ...
> >>
> >
> > What if some Alter Sequence is performed before the copy starts and
> > after the copy is finished, the containing transaction rolled back?
> > Won't it copy something which shouldn't have been copied?
> >
>
> That shouldn't be possible - the alter creates a new relfilenode and
> it's invisible until commit. So either it gets committed (and then
> replicated), or it remains invisible to the SELECT during sync.
>

Okay, however, we need to ensure that such a change will later be
replicated and also need to ensure that the required WAL doesn't get
removed.

Say, if we use your first idea of page LSN from the relfilenode, then
how do we ensure that the corresponding WAL doesn't get removed when
later the sync worker tries to start replication from that LSN? I am
imagining here the sync_sequence_slot will be created before
copy_sequence but even then it is possible that the sequence has not
been updated for a long time and the LSN location will be in the past
(as compared to the slot's LSN) which means the corresponding WAL
could be removed. Now, here we can't directly start using the slot's
LSN to stream changes because there is no correlation of it with the
LSN (page LSN of the sequence's relfilenode) where we want to start
streaming.

Now, for the second idea which is to directly use
pg_current_wal_insert_lsn(), I think we won't be able to ensure that
the changes covered by in-progress transactions like the one with
Alter Sequence I have given example would be streamed later after the
initial copy. Because the LSN returned by pg_current_wal_insert_lsn()
could be an LSN after the LSN associated with Alter Sequence but
before the corresponding xact's commit.

-- 
With Regards,
Amit Kapila.




Re: Add pg_walinspect function with block info columns

2023-03-19 Thread Kyotaro Horiguchi
At Sat, 18 Mar 2023 10:08:53 +0530, Bharath Rupireddy 
 wrote in 
> On Sat, Mar 18, 2023 at 1:06 AM Peter Geoghegan  wrote:
> >
> > On Fri, Mar 17, 2023 at 12:20 AM Bharath Rupireddy
> >  wrote:
> > > +1 for pg_get_wal_block_info emitting per-record WAL info too along
> > > with block info, attached v2 patch does that. IMO, usability wins the
> > > race here.
> >
> > I think that this direction makes a lot of sense. Under this scheme,
> > we still have pg_get_wal_records_info(), which is more or less an SQL
> > interface to the information that pg_waldump presents by default.
> > That's important when the record view of things is of primary
> > importance. But now you also have a "block oriented" view of WAL
> > presented by pg_get_wal_block_info(), which is useful when particular
> > blocks are of primary interest. I think that I'll probably end up
> > using both, while primarily using pg_get_wal_block_info() for more
> > advanced analysis that focuses on what happened to particular blocks
> > over time.
> 
> Hm.

Even though I haven't explored every aspect of the view, I believe it
makes sense to have at least the record_type in the block data. I
don't know how often it will be used, but considering that, I have no
objections to adding all the record information (apart from the block
data itself) to the block data.

> > It makes sense to present pg_get_wal_block_info() immediately after
> > pg_get_wal_records_info() in the documentation under this scheme,
> > since they're closely related.
> 
> -1. I don't think we need that and even if we did, it's hard to
> maintain that ordering in future. One who knows to use these functions
> will anyway get to know how they're related.

The documentation has just one section, titled "General Functions",
which directly contains detailed explanations of four functions, making
it hard to get a clear understanding of the available functions.  I
considered breaking it down into a few subsections, but that wouldn't
look great since most of them would contain only one function.
However, I feel it would be helpful to add a list of all functions at
the beginning of the section.

> > (Checks pg_walinspect once more...)
> >
> > Actually, I now see that block_ref won't be NULL for those records
> > that have no block references at all -- it just outputs an empty
> > string.
> 
> Yes, that's unnecessary.
> 
> > But wouldn't it be better if it actually output NULL?
> 
> +1 done so in the attached 0001 patch.
> 
> > Better
> > for its own sake, but also better because doing so enables describing
> > the relationship between the two functions with reference to
> > block_ref. It seems particularly helpful to me to be able to say that
> > pg_get_wal_block_info() doesn't show anything for precisely those WAL
> > records whose block_ref is NULL according to
> > pg_get_wal_records_info().
> 
> Hm.

I agree that adding a note about this behavior would be helpful to
avoid misuse of pg_get_wal_block_info(). How about something like,
"Note that pg_get_wal_block_info() omits records that contain no
block references."?

> Attaching v3 patch set - 0001 optimizations around block references,
> 0002 enables pg_get_wal_block_info() to emit per-record info. Any
> thoughts?

+   /* Get block references, if any, otherwise continue. */
+   if (!XLogRecHasAnyBlockRefs(xlogreader))
+   continue;

I'm not sure, but the "continue" might be confusing, since the code
"continue"s if the condition is true and continues the process
otherwise.  It also reads like an explanation of what the code does. I
feel we don't need a comment there.

It is not an issue with this patch, but as I look at this version, I'm
starting to feel uneasy about the subtle differences between what
GetWALRecordsInfo and GetWALBlockInfo do. One solution might be to
have GetWALBlockInfo return a values array for each block, but that
could make things more complex than needed. Alternatively, could we
get GetWALRecordsInfo to call tuplestore_putvalues() internally? This
way, both functions can manage the temporary memory context within
themselves.



About 0002:

+   /* Reset only per-block output columns, keep per-record info as-is. */
+   memset(&values[PG_GET_WAL_BLOCK_INFO_PER_RECORD_COLS], 0,
+          PG_GET_WAL_BLOCK_INFO_PER_RECORD_COLS * sizeof(bool));
+   memset(&nulls[PG_GET_WAL_BLOCK_INFO_PER_RECORD_COLS], 0,
+          PG_GET_WAL_BLOCK_INFO_PER_RECORD_COLS * sizeof(bool));

sizeof(*values) is not sizeof(bool), but sizeof(Datum).

It seems to me that the starting element of the arrays should be
(PG_GET_WAL_BLOCK_INFO_COLS -
PG_GET_WAL_BLOCK_INFO_PER_RECORD_COLS). But I don't think simply
rewriting it that way is great.

  #define PG_GET_WAL_RECORD_INFO_COLS 11
...
+#define PG_GET_WAL_BLOCK_INFO_PER_RECORD_COLS 9

This means GetWALBlockInfo overwrites the last two columns generated
by GetWalRecordInfo, but I don't think 

Re: meson documentation build open issues

2023-03-19 Thread Andres Freund
Hi,

On 2023-03-15 20:55:33 -0700, Andres Freund wrote:
> WIP patch for that attached. There's now
>   install-doc-man
>   install-doc-html
> run targets and a
>   install-docs
> alias target.
> 
> 
> I did end up getting stuck when hacking on this, and ended up adding css
> support for nochunk and support for the website style for htmlhelp and
> nochunk, as well as obsoleting the need for copying the css files... But
> perhaps that's a bit too much.

Updated set of patches attached. This one works in older meson versions too
and adds install-world and install-quiet targets.


I also ended up getting so frustrated at the docs build speed that I started
to hack a bit on that. I attached a patch shaving a few seconds off the
buildtime.


I think we can make the docs build in parallel and incrementally, by building
the different parts of the docs in parallel, using --stringparam rootid,
e.g. building each 'part' separately.

A very very rough draft attached:

parallel with parts:
real0m10.831s
user0m58.295s
sys 0m1.402s

normal:
real0m32.215s
user0m31.876s
sys 0m0.328s

1/3 of the build time at 2x the cost is nothing to sneeze at.

Greetings,

Andres Freund
>From 9313c2938265109b3a82f96697bae26ed3f74412 Mon Sep 17 00:00:00 2001
From: Andres Freund 
Date: Wed, 15 Mar 2023 15:53:14 -0700
Subject: [PATCH v2 1/8] meson: rename html_help target to htmlhelp

Reported-by: Peter Eisentraut 
Discussion: https://postgr.es/m/3fc3bb9b-f7f8-d442-35c1-ec82280c5...@enterprisedb.com
---
 doc/src/sgml/meson.build | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/doc/src/sgml/meson.build b/doc/src/sgml/meson.build
index 38f1b8e7b17..e6fe124c7bc 100644
--- a/doc/src/sgml/meson.build
+++ b/doc/src/sgml/meson.build
@@ -123,7 +123,7 @@ if xsltproc_bin.found()
   # build multi-page html docs as part of docs target
   docs += html
 
-  html_help = custom_target('html_help',
+  htmlhelp = custom_target('htmlhelp',
 input: ['stylesheet-hh.xsl', postgres_full_xml],
 output: 'htmlhelp',
 depfile: 'htmlhelp.d',
@@ -131,7 +131,7 @@ if xsltproc_bin.found()
 command: [xsltproc, '--path', '@OUTDIR@', '-o', '@OUTDIR@/', xsltproc_flags, '@INPUT@'],
 build_by_default: false,
   )
-  alldocs += html_help
+  alldocs += htmlhelp
 
 
   # single-page HTML
-- 
2.38.0

>From 10fd8d20ae15889054e69e0b62a445e52b9afe2f Mon Sep 17 00:00:00 2001
From: Andres Freund 
Date: Wed, 15 Mar 2023 16:29:27 -0700
Subject: [PATCH v2 2/8] meson: make install_test_files more generic, rename to
 install_files

Now it supports installing directories and directory contents as well. This
will be used in a subsequent patch to install doc contents.
---
 meson.build  |  5 ++-
 src/tools/install_files  | 72 
 src/tools/install_test_files | 33 -
 3 files changed, 75 insertions(+), 35 deletions(-)
 create mode 100644 src/tools/install_files
 delete mode 100644 src/tools/install_test_files

diff --git a/meson.build b/meson.build
index 7f76a101ecf..84fe2c3d4c3 100644
--- a/meson.build
+++ b/meson.build
@@ -369,6 +369,8 @@ flex_cmd = [python, flex_wrapper,
 wget = find_program('wget', required: false, native: true)
 wget_flags = ['-O', '@OUTPUT0@', '--no-use-server-timestamps']
 
+install_files = files('src/tools/install_files')
+
 
 
 ###
@@ -2845,9 +2847,8 @@ testprep_targets += test_install_libs
 
 
 # command to install files used for tests, which aren't installed by default
-install_test_files = files('src/tools/install_test_files')
 install_test_files_args = [
-  install_test_files,
+  install_files,
   '--prefix', dir_prefix,
   '--install', contrib_data_dir, test_install_data,
   '--install', dir_lib_pkg, test_install_libs,
diff --git a/src/tools/install_files b/src/tools/install_files
new file mode 100644
index 000..0428316b67e
--- /dev/null
+++ b/src/tools/install_files
@@ -0,0 +1,72 @@
+#!/usr/bin/env python3
+
+# Helper to install additional files into the temporary installation
+# for tests, beyond those that are installed by meson/ninja install.
+
+import argparse
+import os
+import shutil
+import sys
+from pathlib import PurePath
+
+parser = argparse.ArgumentParser()
+
+parser.add_argument('--destdir', type=str,
+default=os.environ.get('DESTDIR', None))
+parser.add_argument('--prefix', type=str)
+parser.add_argument('--install', type=str, nargs='+',
+action='append', default=[])
+parser.add_argument('--install-dirs', type=str, nargs='+',
+action='append', default=[])
+parser.add_argument('--install-dir-contents', type=str, nargs='+',
+action='append', default=[])
+
+args = parser.parse_args()
+
+
+def error_exit(msg: str):
+print(msg, file=sys.stderr)
+exit(1)
+
+
+def create_target_dir(prefix: str, destdir: str, targetdir: str):
+if not os.path.isabs(targetdir):
+

Re: Allow logical replication to copy tables in binary format

2023-03-19 Thread Amit Kapila
On Mon, Mar 20, 2023 at 3:37 AM Peter Smith  wrote:
>
> There are a couple of TAP tests where the copy binary is expected to
> fail. And when it fails, you do binary=false (changing the format back
> to 'text') so the test is then expected to be able to proceed.
>
> I don't know if this happens in practice, but IIUC in theory, if the
> timing is extremely bad, the tablesync could relaunch in binary mode
> multiple times (any fail multiple times?) before your binary=false
> change takes effect.
>
> So, I was wondering if it could help to use the subscription
> 'disable_on_error=true' parameter for those cases so that the
> tablesync won't needlessly attempt to relaunch until you have set
> binary=false and then re-enabled the subscription.
>

+1. That would make tests more reliable.

-- 
With Regards,
Amit Kapila.




Re: Lock conflict

2023-03-19 Thread David Rowley
On Mon, 20 Mar 2023 at 14:58, 席冲(宜穆)  wrote:
> I think a requested lock only needs to be checked for conflicts with
> already-held locks; if there is no conflict, the lock should be granted.

That would mean that stronger locks such as AEL might never be granted
if there was never a moment when no conflicting lock was held (which is
very likely to happen on busy OLTP-type workloads).  The way it works
now makes it fair, so that weaker locks don't jump the queue.
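The fairness rule can be seen in a toy model (hedged: the two-mode conflict table and function names below are illustrative, not PostgreSQL's actual lock modes or lock-manager code): a new request is checked against queued waiters as well as holders, so a weak lock arriving after a queued strong lock waits behind it instead of starving it.

```python
def conflicts(a, b):
    # Toy conflict table: "strong" (think AccessExclusiveLock) conflicts
    # with everything; "weak" locks are compatible with each other.
    return a == "strong" or b == "strong"

def can_grant(held, waiters, req, check_waiters=True):
    """Grant req only if it conflicts with no held lock and, under the
    fairness rule, with no lock already requested by a waiter."""
    if any(conflicts(req, h) for h in held):
        return False
    if check_waiters and any(conflicts(req, w) for w in waiters):
        return False
    return True

# A "strong" request is queued behind one held "weak" lock.
# Without the waiter check, new weak locks keep jumping the queue:
print(can_grant(["weak"], ["strong"], "weak", check_waiters=False))  # True
# With it, the newcomer queues up and the strong waiter can't starve:
print(can_grant(["weak"], ["strong"], "weak", check_waiters=True))   # False
```

Checking only against held locks would grant an endless stream of weak locks, so the strong waiter might never run; checking against the queue too is what makes the queue fair.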

David




Lock conflict

2023-03-19 Thread 席冲(宜穆)
Hello all,
This PostgreSQL version is 11.9.
In LockAcquireExtended(), why must a requested lock join the wait queue
if it conflicts with locks requested by waiters? Why is the requested
lock checked for conflicts with other requested locks, rather than
directly against the already-held locks?
I think a requested lock only needs to be checked for conflicts with
already-held locks; if there is no conflict, the lock should be granted.
Best regards


Re: Commitfest 2023-03 starting tomorrow!

2023-03-19 Thread Thomas Munro
On Mon, Mar 20, 2023 at 1:10 PM Tom Lane  wrote:
> Thomas Munro  writes:
> > ... I suppose then someone might complain
> > that it should be clearer if a patch hasn't applied for a very long
> > time; suggestions for how to show that are welcome.
>
> Can you make the pop-up tooltip text read "Rebase needed since
> -MM-DD"?

Done.  It's the GMT date of the first failure to apply.




RE: Data is copied twice when specifying both child and parent table in publication

2023-03-19 Thread wangw.f...@fujitsu.com
On Fri, Mar 17, 2023 at 20:07 PM Amit Kapila  wrote:
> On Fri, Mar 17, 2023 at 11:58 AM wangw.f...@fujitsu.com
>  wrote:
> >
> > On Thu, Mar 16, 2023 at 20:25 PM Amit Kapila 
> wrote:
> > >
> >
> > Thanks for your comments.
> >
> > > + if (server_version >= 16)
> > > + {
> > > + appendStringInfo(, "SELECT DISTINCT N.nspname, C.relname,\n"
> > > + "  ( SELECT array_agg(a.attname ORDER BY a.attnum)\n"
> > > + "FROM pg_attribute a\n"
> > > + "WHERE a.attrelid = GPT.relid AND a.attnum > 0 AND\n"
> > > + "  NOT a.attisdropped AND\n"
> > > + "  (a.attnum = ANY(GPT.attrs) OR GPT.attrs IS 
> > > NULL)\n"
> > > + "  ) AS attnames\n"
> > > + " FROM pg_class C\n"
> > > + "   JOIN pg_namespace N ON N.oid = C.relnamespace\n"
> > > + "   JOIN ( SELECT (pg_get_publication_tables(VARIADIC
> > > array_agg(pubname::text))).*\n"
> > > + "  FROM pg_publication\n"
> > > + "  WHERE pubname IN ( %s )) as GPT\n"
> > > + "   ON GPT.relid = C.oid\n",
> > > + pub_names.data);
> > >
> > > The function pg_get_publication_tables()  has already handled dropped
> > > columns, so we don't need it here in this query. Also, the part to
> > > build attnames should be the same as it is in view
> > > pg_publication_tables.
> >
> > Agree. Changed.
> >
> > > Can we directly try to pass the list of
> > > pubnames to the function pg_get_publication_tables() instead of
> > > joining it with pg_publication?
> >
> > Changed.
> > I think the aim of joining it with pg_publication before is to exclude
> > non-existing publications.
> >
> 
> Okay, A comment for that would have made it clear.

Make sense. Added the comment atop the query.

> > Otherwise, we would get an error because of the call
> > to function GetPublicationByName (with 'missing_ok = false') in function
> > pg_get_publication_tables. So, I changed "missing_ok" to true. If anyone 
> > doesn't
> > like this change, I'll reconsider this in the next version.
> >
> 
> I am not sure about changing missing_ok behavior. Did you check it for
> any other similar usage in other functions?

After reviewing the pg_get_* functions in the 'pg_proc.dat' file, I think most
of them ignore incorrect input, such as the function pg_get_indexdef. However,
some functions, such as pg_get_serial_sequence and pg_get_object_address, will
report an error. So, I think it's better to discuss this in a separate thread.
Reverted this modification. And I will start a new separate thread for this
later.

> + foreach(lc, pub_elem_tables)
> + {
> + published_rel *table_info = (published_rel *) malloc(sizeof(published_rel));
> 
> Is there a reason to use malloc instead of palloc?

No. I think we need to use palloc here.
Changed.

Attach the new patch set.

Regards,
Wang Wei


HEAD_v18-0001-Fix-data-replicated-twice-when-specifying-publis.patch
HEAD_v18-0002-Fix-this-problem-for-back-branches.patch
HEAD_v18-0003-Add-clarification-for-the-behaviour-of-the-publi.patch
REL15_v18-0001-Fix-data-replicated-twice-when-specifying-publis_patch
REL14_v18-0001-Fix-data-replicated-twice-when-specifying-publis_patch


Re: ALTER TABLE SET ACCESS METHOD on partitioned tables

2023-03-19 Thread Michael Paquier
On Thu, Jun 09, 2022 at 11:21:58AM -0700, Soumyadeep Chakraborty wrote:
> Thanks. Fixed and rebased.

I think that I am OK with the concept of this patch to use a
partitioned table's relam as a reference when creating a partition
rather than relying on the default GUC, in a way similar to
tablespaces.

One worry I see is that forcing recursion to the leaves on ALTER
TABLE could silently break partition trees where multiple table AMs are
used across separate partitions, if ALTER TABLE SET ACCESS METHOD is
used on one of the parents; though it seems like this is not something
I would much worry about now that the command is an error.

A second worry is that we would just break existing creation flows
that rely on the GUC defining the default AM.  This is worth a close
look at pg_dump to make sure that we do things correctly with this
patch in place.  Did you check dump and restore flows with partition
trees and --no-table-access-method?  Perhaps there should be
some regression tests with partitioned tables?

+/*
+ * For partitioned tables, when no access method is specified, we
+ * default to the parent table's AM.
+ */
+Assert(list_length(inheritOids) == 1);
+/* XXX: should implement get_rel_relam? */
+relid = linitial_oid(inheritOids);
+tup = SearchSysCache1(RELOID, ObjectIdGetDatum(relid));
+if (!HeapTupleIsValid(tup))
+elog(ERROR, "cache lookup failed for relation %u", relid);

Having a wrapper for that could be useful, yes.  We don't have any
code paths that would directly need that now, from what I can see,
though.  This patch gives one reason to have one.
--
Michael




Re: Commitfest 2023-03 starting tomorrow!

2023-03-19 Thread Tom Lane
Thomas Munro  writes:
> ... I suppose then someone might complain
> that it should be clearer if a patch hasn't applied for a very long
> time; suggestions for how to show that are welcome.

Can you make the pop-up tooltip text read "Rebase needed since
-MM-DD"?

regards, tom lane




Re: Commitfest 2023-03 starting tomorrow!

2023-03-19 Thread Thomas Munro
On Mon, Mar 20, 2023 at 11:13 AM Thomas Munro  wrote:
> I realised that part of Alvaro's complaint was probably caused by
> cfbot's refusal to show any useful information just because it
> couldn't apply a patch the last time it tried.  A small improvement
> today: now it shows a ♲ symbol (with hover text "Rebase needed") if it
> doesn't currently apply, but you can still see the most recent CI test
> results.  And from there you can find your way to the parent commit
> ID.

And in the cases where it still shows no results, that'd be because
the patch set hasn't successfully applied in the past 46 days, ie
since 1 Feb, when cfbot started retaining history.  That visible
amnesia should gradually disappear as those patches make progress and
the history window expands.  I suppose then someone might complain
that it should be clearer if a patch hasn't applied for a very long
time; suggestions for how to show that are welcome.  I wondered about
making them gradually fade out to white, ghost memories that
eventually disappear completely after a few commitfests :-D




Re: Remove AUTH_REQ_KRB4 and AUTH_REQ_KRB5 in libpq code

2023-03-19 Thread Michael Paquier
On Sun, Mar 19, 2023 at 06:53:28PM -0400, Tom Lane wrote:
> 9.2 is still within our "supported old versions" window, so it's
> at least plausible that somebody would hit this for KRB5.  Still,
> the net effect would be that they'd get "authentication method 2
> not supported" instead of "Kerberos 5 authentication not supported".
> I lean (weakly) to the idea that it's no longer worth the translation
> maintenance effort to keep the special message.
> 
> A compromise could be to drop KRB4 but keep the KRB5 case for
> awhile yet.

Hmm.  I think that I would still drop both of them at the end, even in
v16 but I won't fight hard on that, either.  The only difference is
the verbosity of the error string generated, and there is still a
trace of what the code numbers were in pqcomm.h.

> One other thought is that I don't really like these comments
> implying that recycling these AUTH_REQ codes might be a good
> thing to do:
> 
> +/* 1 is available. It was used for Kerberos V4, not supported any more  */
> 
> I think we'd be better off treating them as permanently retired.
> It's not like there's any shortage of code space to worry about.
> More, there might be other implementations of our wire protocol
> that still have support for these codes, so that re-using them
> could cause compatibility issues.  So maybe write "reserved"
> instead of "available"?

Okay, fine by me.
--
Michael




Re: Reconcile stats in find_tabstat_entry() and get rid of PgStat_BackendFunctionEntry

2023-03-19 Thread Michael Paquier
On Thu, Mar 16, 2023 at 02:13:45PM +0100, Drouvot, Bertrand wrote:
> On 3/16/23 12:46 PM, Michael Paquier wrote:
>> There is no trace of them.
>> Perhaps the ones exposed through pg_stat_xact_all_tables are fine if
>> not listed.
> 
> I'd be tempted to add documentation for all of them, I can look at it.

I am not sure that there is any need to completely undo ddfc2d9, later
simplified by 5f2b089, so my opinion would be to just add
documentation for the functions that can be used but appear in none
of the system views.

Anyway, double-checking, I only see an inconsistency for these two,
confirming my first impression:
- pg_stat_get_xact_blocks_fetched
- pg_stat_get_xact_blocks_hit

There may be a point in having them in some of the system views, but
the non-xact flavors are only used in the statio views, which don't
really need xact versions AFAIK.  I am not sure that it makes much
sense to add them in pg_stat_xact_all_tables, either.  Another option
is to just remove them, though some could rely on them externally.  In
the end, documenting both still sounds like the best move to me.
--
Michael




Re: Option to not use ringbuffer in VACUUM, using it in failsafe mode

2023-03-19 Thread Melanie Plageman
On Wed, Mar 15, 2023 at 6:46 AM Ants Aasma  wrote:
>
> On Wed, 15 Mar 2023 at 02:29, Melanie Plageman
>  wrote:
> > As for routine vacuuming and the other buffer access strategies, I think
> > there is an argument for configurability based on operator knowledge --
> > perhaps your workload will use the data you are COPYing as soon as the
> > COPY finishes, so you might as well disable a buffer access strategy or
> > use a larger fraction of shared buffers. Also, the ring sizes were
> > selected sixteen years ago and average server memory and data set sizes
> > have changed.
>
> To be clear I'm not at all arguing against configurability. I was
> thinking that dynamic use could make the configuration simpler by self
> tuning to use no more buffers than is useful.

Yes, but I am struggling with how we would define "useful".

> > StrategyRejectBuffer() will allow bulkreads to, as you say, use more
> > buffers than the original ring size, since it allows them to kick
> > dirty buffers out of the ring and claim new shared buffers.
> >
> > Bulkwrites and vacuums, however, will inevitably dirty buffers and
> > require flushing the buffer (and thus flushing the associated WAL) when
> > reusing them. Bulkwrites and vacuum do not kick dirtied buffers out of
> > the ring, since dirtying buffers is their common case. A dynamic
> > resizing like the one you suggest would likely devolve to vacuum and
> > bulkwrite strategies always using the max size.
>
> I think it should self stabilize around the point where the WAL is
> either flushed by other commit activity, WAL writer or WAL buffers
> filling up. Writing out their own dirtied buffers will still happen,
> just the associated WAL flushes will be in larger chunks and possibly
> done by other processes.

They will have to write out any WAL associated with modifications to the
dirty buffer before flushing it, so I'm not sure I understand how this
would work.

> > As for decreasing the ring size, buffers are only "added" to the ring
> > lazily and, technically, as it is now, buffers which have been added
> > added to the ring can always be reclaimed by the clocksweep (as long as
> > they are not pinned). The buffer access strategy is more of a
> > self-imposed restriction than it is a reservation. Since the ring is
> > small and the buffers are being frequently reused, odds are the usage
> > count will be 1 and we will be the one who set it to 1, but there is no
> > guarantee. If, when attempting to reuse the buffer, its usage count is
> > > 1 (or it is pinned), we also will kick it out of the ring and go look
> > for a replacement buffer.
>
> Right, but while the buffer is actively used by the ring it is
> unlikely that clocksweep will find it at usage 0 as the ring buffer
> should cycle more often than the clocksweep. Whereas if the ring stops
> using a buffer, clocksweep will eventually come and reclaim it. And if
> the ring shrinking decision turns out to be wrong before the
> clocksweep gets around to reusing it, we can bring the same buffer
> back into the ring.

I can see what you mean about excluding a buffer from the ring being a
more effective way of allowing it to be reclaimed. However, I'm not sure
I understand the use case. If the operation, say vacuum, is actively
using the buffer and keeping its usage count at one, then what would be
the criteria for it to decide to stop using it?

Also, if vacuum used the buffer once and then didn't reuse it but, for
some reason, the vacuum isn't over, it isn't any different at that point
than some other buffer with a usage count of one. It isn't any harder
for it to be reclaimed by the clocksweep.

The argument I could see for decreasing the size even when the buffers
are being used by the operation employing the strategy is if there is
pressure from other workloads to use those buffers. But, designing a
system that would reclaim buffers when needed by other workloads is more
complicated than what is being proposed here.

> > I do think that it is a bit unreasonable to expect users to know how
> > large they would like to make their buffer access strategy ring. What we
> > want is some way of balancing different kinds of workloads and
> > maintenance tasks reasonably. If your database has no activity because
> > it is the middle of the night or it was shutdown because of transaction
> > id wraparound, there is no reason why vacuum should limit the number of
> > buffers it uses. I'm sure there are many other such examples.
>
> Ideally yes, though I am not hopeful of finding a solution that does
> this any time soon. Just to take your example, if a nightly
> maintenance job wipes out the shared buffer contents slightly
> optimizing its non time-critical work and then causes morning user
> visible load to have big latency spikes due to cache misses, that's
> not a good tradeoff either.

Yes, that is a valid concern.

- Melanie




Re: Remove AUTH_REQ_KRB4 and AUTH_REQ_KRB5 in libpq code

2023-03-19 Thread Tom Lane
Michael Paquier  writes:
> $subject has been discussed here, still seems worth its own thread for
> clarity:
> https://www.postgresql.org/message-id/4037249.1679011...@sss.pgh.pa.us

> Support for Kerberos v4 has been removed in a159ad3 (2005) and the
> same happened for v5 in 98de86e (2014, meaning that this is still
> possible with 9.2 and 9.3 backends).  Anyway, the attached seems worth
> the simplifications now?  This includes a cleanup of protocol.sgml.

9.2 is still within our "supported old versions" window, so it's
at least plausible that somebody would hit this for KRB5.  Still,
the net effect would be that they'd get "authentication method 2
not supported" instead of "Kerberos 5 authentication not supported".
I lean (weakly) to the idea that it's no longer worth the translation
maintenance effort to keep the special message.

A compromise could be to drop KRB4 but keep the KRB5 case for
awhile yet.

One other thought is that I don't really like these comments
implying that recycling these AUTH_REQ codes might be a good
thing to do:

+/* 1 is available. It was used for Kerberos V4, not supported any more  */

I think we'd be better off treating them as permanently retired.
It's not like there's any shortage of code space to worry about.
More, there might be other implementations of our wire protocol
that still have support for these codes, so that re-using them
could cause compatibility issues.  So maybe write "reserved"
instead of "available"?

regards, tom lane




Re: Option to not use ringbuffer in VACUUM, using it in failsafe mode

2023-03-19 Thread Melanie Plageman
Thanks for the review!

Attached is an updated v6.

v6 has some updates and corrections. It has two remaining TODOs in the
code: one is around what value to initialize the ring_size to in
VacuumParams, the other is around whether or not parallel vacuum index
workers should in fact stick with the default buffer access strategy
sizes.

I also separated vacuumdb into its own commit.

I also have addressed Justin's review feedback.

On Sat, Mar 18, 2023 at 2:30 PM Justin Pryzby  wrote:
>
> > Subject: [PATCH v4 3/3] add vacuum[db] option to specify ring_size and guc
>
> > +Specifies the ring buffer size to be used for a given invocation of
> > +VACUUM or instance of autovacuum. This size is
> > +converted to a number of shared buffers which will be reused as 
> > part of
>
> I'd say "specifies the size of shared_buffers to be reused as .."

I've included "shared_buffers" in the description.

> > +a Buffer Access Strategy. 0 
> > will
> > +disable use of a Buffer Access Strategy.
> > +-1 will set the size to a default of 
> > 256
> > +kB. The maximum ring buffer size is 16 
> > GB.
> > +Though you may set vacuum_buffer_usage_limit 
> > below
> > +128 kB, it will be clamped to 128
> > +kB at runtime. The default value is 
> > -1.
> > +This parameter can be set at any time.
>
> GUC docs usually also say something like
> "If this value is specified without units, it is taken as .."

I had updated this in v5 with slightly different wording, but I now am
using the wording you suggested (which does appear standard in the rest
of the docs).

>
> > +  is used to calculate a number of shared buffers which will be reused 
> > as
>
> *the* number?

updated.

>
> > +  VACUUM. The analyze stage and parallel vacuum 
> > workers
> > +  do not use this size.
>
> I think what you mean is that vacuum's heap scan stage uses the
> strategy, but the index scan/cleanup phases doesn't?

Yes, non-parallel index vacuum and cleanup will use whatever value you
specify but parallel workers make their own buffer access strategy
object. I've updated the docs to indicate that they will use the default
size for this.


>
> > +The size in kB of the ring buffer used for vacuuming. This size is 
> > used
> > +to calculate a number of shared buffers which will be reused as 
> > part of
>
> *the* number

fixed.

> > +++ b/doc/src/sgml/ref/vacuumdb.sgml
>
> The docs here duplicate the sql-vacuum docs.  It seems better to refer
> to the vacuum page for details, like --parallel does.

Good idea.

>
> Unrelated: it would be nice if the client-side options were documented
> separately from the server-side options.  Especially due to --jobs and
> --parallel.

Yes, that would be helpful.

> > + if (!parse_int(vac_buffer_size, , GUC_UNIT_KB, 
> > NULL))
> > + {
> > + ereport(ERROR,
> > + 
> > (errcode(ERRCODE_SYNTAX_ERROR),
> > + 
> > errmsg("buffer_usage_limit for a vacuum must be between -1 and %d. %s is 
> > invalid.",
> > + 
> > MAX_BAS_RING_SIZE_KB, vac_buffer_size),
> > + 
> > parser_errposition(pstate, opt->location)));
> > + }
> > +
> > + /* check for out-of-bounds */
> > + if (result < -1 || result > MAX_BAS_RING_SIZE_KB)
> > + {
> > + ereport(ERROR,
> > + 
> > (errcode(ERRCODE_SYNTAX_ERROR),
> > + 
> > errmsg("buffer_usage_limit for a vacuum must be between -1 and %d",
> > + 
> > MAX_BAS_RING_SIZE_KB),
> > + 
> > parser_errposition(pstate, opt->location)));
> > + }
>
> I think these checks could be collapsed into a single ereport().
>
> if !parse_int() || (result < -1 || result > MAX_BAS_RINGSIZE_KB):
> ereport(ERROR,
> errcode(ERRCODE_SYNTAX_ERROR),
> errmsg("buffer_usage_limit for a vacuum must be an integer 
> between -1 and %d",
> MAX_BAS_RING_SIZE_KB),
>
> There was a recent, similar, and unrelated suggestion here:
> https://www.postgresql.org/message-id/20230314.135859.260879647537075548.horikyota.ntt%40gmail.com

So, these have been updated/improved in v5. I still didn't combine them.
I see what you are saying about combining them (and I checked the link
you shared), but in this case, having them separate allows me to provide
info using the hintmsg passed to parse_int() about why it failed during
parse_int -- which could be something not related 

Re: Avoid use deprecated Windows Memory API

2023-03-19 Thread Michael Paquier
On Fri, Mar 17, 2023 at 12:19:56PM -0300, Ranier Vilela wrote:
> Rebased to latest Head.

I was looking at this thread, and echoing Daniel's and Alvaro's
arguments, I don't quite see why this patch is something we need.  I
would recommend to mark it as rejected and move on.
--
Michael




Remove AUTH_REQ_KRB4 and AUTH_REQ_KRB5 in libpq code

2023-03-19 Thread Michael Paquier
Hi all,

$subject has been discussed here, still seems worth its own thread for
clarity:
https://www.postgresql.org/message-id/4037249.1679011...@sss.pgh.pa.us

Support for Kerberos v4 has been removed in a159ad3 (2005) and the
same happened for v5 in 98de86e (2014, meaning that this is still
possible with 9.2 and 9.3 backends).  Anyway, the attached seems worth
the simplifications now?  This includes a cleanup of protocol.sgml.

Thoughts?
--
Michael
diff --git a/src/include/libpq/pqcomm.h b/src/include/libpq/pqcomm.h
index bff7dd18a2..8251639cd3 100644
--- a/src/include/libpq/pqcomm.h
+++ b/src/include/libpq/pqcomm.h
@@ -111,8 +111,8 @@ extern PGDLLIMPORT bool Db_user_namespace;
 /* These are the authentication request codes sent by the backend. */
 
 #define AUTH_REQ_OK			0	/* User is authenticated  */
-#define AUTH_REQ_KRB4		1	/* Kerberos V4. Not supported any more. */
-#define AUTH_REQ_KRB5		2	/* Kerberos V5. Not supported any more. */
+/* 1 is available. It was used for Kerberos V4, not supported any more  */
+/* 2 is available. It was used for Kerberos V5, not supported any more  */
 #define AUTH_REQ_PASSWORD	3	/* Password */
 #define AUTH_REQ_CRYPT		4	/* crypt password. Not supported any more. */
 #define AUTH_REQ_MD5		5	/* md5 password */
diff --git a/src/interfaces/libpq/fe-auth.c b/src/interfaces/libpq/fe-auth.c
index fa95f8e6e9..1ce0794f89 100644
--- a/src/interfaces/libpq/fe-auth.c
+++ b/src/interfaces/libpq/fe-auth.c
@@ -942,14 +942,6 @@ pg_fe_sendauth(AuthRequest areq, int payloadlen, PGconn *conn)
 		case AUTH_REQ_OK:
 			break;
 
-		case AUTH_REQ_KRB4:
-			libpq_append_conn_error(conn, "Kerberos 4 authentication not supported");
-			return STATUS_ERROR;
-
-		case AUTH_REQ_KRB5:
-			libpq_append_conn_error(conn, "Kerberos 5 authentication not supported");
-			return STATUS_ERROR;
-
 #if defined(ENABLE_GSS) || defined(ENABLE_SSPI)
 		case AUTH_REQ_GSS:
 #if !defined(ENABLE_SSPI)
diff --git a/doc/src/sgml/protocol.sgml b/doc/src/sgml/protocol.sgml
index 8b5e7b1ad7..ed7a3ceda1 100644
--- a/doc/src/sgml/protocol.sgml
+++ b/doc/src/sgml/protocol.sgml
@@ -271,20 +271,6 @@
   
  
 
- 
-  AuthenticationKerberosV5
-  
-   
-The frontend must now take part in a Kerberos V5
-authentication dialog (not described here, part of the
-Kerberos specification) with the server.  If this is
-successful, the server responds with an AuthenticationOk,
-otherwise it responds with an ErrorResponse. This is no
-longer supported.
-   
-  
- 
-
  
   AuthenticationCleartextPassword
   
@@ -3318,41 +3304,6 @@ psql "dbname=postgres replication=database" -c "IDENTIFY_SYSTEM;"
 

 
-   
-AuthenticationKerberosV5 (B)
-
-
- 
-  
-   Byte1('R')
-   
-
- Identifies the message as an authentication request.
-
-   
-  
-
-  
-   Int32(8)
-   
-
- Length of message contents in bytes, including self.
-
-   
-  
-
-  
-   Int32(2)
-   
-
- Specifies that Kerberos V5 authentication is required.
-
-   
-  
- 
-
-   
-

 AuthenticationCleartextPassword (B)
 




Re: Commitfest 2023-03 starting tomorrow!

2023-03-19 Thread Thomas Munro
On Sun, Mar 19, 2023 at 12:44 PM Justin Pryzby  wrote:
> On Sat, Mar 18, 2023 at 04:28:02PM -0700, Peter Geoghegan wrote:
> > On Sat, Mar 18, 2023 at 4:19 PM Justin Pryzby  wrote:
> > > The only issue with this is that cfbot has squished all the commits into
> > > one, and lost the original commit messages (if any).  I submitted
> > > patches to address that but still waiting for feedback.
> > >
> > > https://www.postgresql.org/message-id/20220623193125.gb22...@telsasoft.com
> >
> > Right. I would like to see that change. But you still need to have CF
> > tester/the CF app remember the last master branch commit that worked
> > before bitrot. And you have to provide an easy way to get that
> > information.
>
> No - the last in cfbot's repo is from the last time it successfully
> applied the patch.  You can summarily check checkout cfbot's branch to
> build (or just to git log -p it, if you dislike github's web interface).
>
> If you're curious and still wanted to know what commit it was applied
> on, it's currently the 2nd commit in "git log" (due to squishing
> all patches into one).

I realised that part of Alvaro's complaint was probably caused by
cfbot's refusal to show any useful information just because it
couldn't apply a patch the last time it tried.  A small improvement
today: now it shows a ♲ symbol (with hover text "Rebase needed") if it
doesn't currently apply, but you can still see the most recent CI test
results.  And from there you can find your way to the parent commit
ID.

The reason for the previous behaviour is that it had no memory, but I
had to give it one so that I can study flapping tests, log highlights,
statistical trends etc.  Reminds me, I also need to teach it to track
the postgres/postgres master mirror's CI results, because it's still
(rather stupidly) testing patches when master itself is failing (eg
the recent slapd commits), which ought to be easy enough to avoid
given the data...




Re: Allow logical replication to copy tables in binary format

2023-03-19 Thread Peter Smith
There are a couple of TAP tests where the copy binary is expected to
fail. And when it fails, you do binary=false (changing the format back
to 'text') so the test is then expected to be able to proceed.

I don't know if this happens in practice, but IIUC in theory, if the
timing is extremely bad, the tablesync could relaunch in binary mode
multiple times (and fail multiple times?) before your binary=false
change takes effect.

So, I was wondering if it could help to use the subscription
'disable_on_error=true' parameter for those cases so that the
tablesync won't needlessly attempt to relaunch until you have set
binary=false and then re-enabled the subscription.

--
Kind Regards,
Peter Smith.
Fujitsu Australia




Re: Fix fseek() detection of unseekable files on WIN32

2023-03-19 Thread Michael Paquier
On Sun, Mar 19, 2023 at 08:10:10PM +0100, Juan José Santamaría Flecha wrote:
> My approach was trying to make something minimal so it could be
> backpatchable. This looks fine for HEAD, but are you planning on something
> similar for the other branches?

Yes.  This is actually not invasive down to 14 as the code is
consistent for these branches.

> Doesn't pgwin32_get_file_type() fit in dirmod.c? Might be a question of
> personal taste, I don't really have strong feelings against win32common.c.

Not sure about this one.  I have considered it, but dirmod.c also
includes bits for Cygwin, while being aimed at higher-level routines
like rename(), unlink() or symlink().  This patch is only for WIN32,
and aimed at common parts of the win32*.c code, so a separate file
seemed a bit cleaner to me in the end.
--
Michael




Re: Infinite Interval

2023-03-19 Thread Joseph Koshakow
On Sun, Mar 19, 2023 at 5:13 PM Tom Lane  wrote:
>
>Did you actually write "if TIMESTAMP_IS_NOBEGIN(dt2)" and not
>"if (TIMESTAMP_IS_NOBEGIN(dt2))"?  If the former, I'm not surprised
>that pgindent gets confused.  The parentheses are required by the
>C standard.  Your code might accidentally work because the macro
>has parentheses internally, but call sites have no business
>knowing that.  For example, it would be completely legit to change
>TIMESTAMP_IS_NOBEGIN to be a plain function, and then this would be
>syntactically incorrect.

Oh duh. I've been doing too much Rust development and did this without
thinking. I've attached a patch with a fix.

- Joe Koshakow
From d3543e7c410f83cbe3f3f3df9715025bc767fc5f Mon Sep 17 00:00:00 2001
From: Joseph Koshakow 
Date: Sat, 18 Mar 2023 13:59:34 -0400
Subject: [PATCH 3/3] Add infinite interval values

This commit adds positive and negative infinite values to the interval
data type. The entire range of intervals with INT_MAX months or INT_MIN
months are reserved for infinite values. This makes checking finiteness
much simpler.

Ashutosh Bapat and Joe Koshakow and Jian He
---
 doc/src/sgml/datatype.sgml|   2 +-
 doc/src/sgml/func.sgml|   5 +-
 src/backend/utils/adt/date.c  |  32 +
 src/backend/utils/adt/datetime.c  |   2 +
 src/backend/utils/adt/formatting.c|   2 +-
 src/backend/utils/adt/selfuncs.c  |  12 +-
 src/backend/utils/adt/timestamp.c | 679 ++
 src/include/datatype/timestamp.h  |  19 +
 src/include/utils/timestamp.h |   3 +
 src/test/regress/expected/horology.out|   6 +-
 src/test/regress/expected/interval.out| 559 --
 src/test/regress/expected/timestamp.out   |  62 ++
 src/test/regress/expected/timestamptz.out |  62 ++
 src/test/regress/sql/horology.sql |   6 +-
 src/test/regress/sql/interval.sql | 170 +-
 src/test/regress/sql/timestamp.sql|  19 +
 src/test/regress/sql/timestamptz.sql  |  18 +
 17 files changed, 1454 insertions(+), 204 deletions(-)

diff --git a/doc/src/sgml/datatype.sgml b/doc/src/sgml/datatype.sgml
index faf0d74104..694af4000d 100644
--- a/doc/src/sgml/datatype.sgml
+++ b/doc/src/sgml/datatype.sgml
@@ -2321,7 +2321,7 @@ TIMESTAMP WITH TIME ZONE '2004-10-19 10:23:54+02'
  
  
   infinity
-  date, timestamp
+  date, timestamp, interval
   later than all other time stamps
  
  
diff --git a/doc/src/sgml/func.sgml b/doc/src/sgml/func.sgml
index a3a13b895f..33fa3e6670 100644
--- a/doc/src/sgml/func.sgml
+++ b/doc/src/sgml/func.sgml
@@ -9472,7 +9472,7 @@ SELECT regexp_match('abc01234xyz', '(?:(.*?)(\d+)(.*)){1,1}');
  boolean
 
 
- Test for finite interval (currently always true)
+ Test for finite interval (not +/-infinity)
 
 
  isfinite(interval '4 hours')
@@ -10369,7 +10369,8 @@ SELECT EXTRACT(YEAR FROM TIMESTAMP '2001-02-16 20:38:40');
  When the input value is +/-Infinity, extract returns
  +/-Infinity for monotonically-increasing fields (epoch,
  julian, year, isoyear,
- decade, century, and millennium).
+ decade, century, and millennium
+ for all types and hour and day just for interval).
  For other fields, NULL is returned.  PostgreSQL
  versions before 9.6 returned zero for all cases of infinite input.
 
diff --git a/src/backend/utils/adt/date.c b/src/backend/utils/adt/date.c
index a163fbb4ab..5b4ba76eed 100644
--- a/src/backend/utils/adt/date.c
+++ b/src/backend/utils/adt/date.c
@@ -2023,6 +2023,11 @@ interval_time(PG_FUNCTION_ARGS)
 	TimeADT		result;
 	int64		days;
 
+	if (INTERVAL_NOT_FINITE(span))
+		ereport(ERROR,
+(errcode(ERRCODE_DATETIME_VALUE_OUT_OF_RANGE),
+ errmsg("time out of range")));
+
 	result = span->time;
 	if (result >= USECS_PER_DAY)
 	{
@@ -2067,6 +2072,11 @@ time_pl_interval(PG_FUNCTION_ARGS)
 	Interval   *span = PG_GETARG_INTERVAL_P(1);
 	TimeADT		result;
 
+	if (INTERVAL_NOT_FINITE(span))
+		ereport(ERROR,
+(errcode(ERRCODE_DATETIME_VALUE_OUT_OF_RANGE),
+ errmsg("cannot add infinite interval to time")));
+
 	result = time + span->time;
 	result -= result / USECS_PER_DAY * USECS_PER_DAY;
 	if (result < INT64CONST(0))
@@ -2085,6 +2095,11 @@ time_mi_interval(PG_FUNCTION_ARGS)
 	Interval   *span = PG_GETARG_INTERVAL_P(1);
 	TimeADT		result;
 
+	if (INTERVAL_NOT_FINITE(span))
+		ereport(ERROR,
+(errcode(ERRCODE_DATETIME_VALUE_OUT_OF_RANGE),
+ errmsg("cannot subtract infinite interval from time")));
+
 	result = time - span->time;
 	result -= result / USECS_PER_DAY * USECS_PER_DAY;
 	if (result < INT64CONST(0))
@@ -2599,6 +2614,11 @@ timetz_pl_interval(PG_FUNCTION_ARGS)
 	Interval   *span = PG_GETARG_INTERVAL_P(1);
 	TimeTzADT  *result;
 
+	if (INTERVAL_NOT_FINITE(span))
+		ereport(ERROR,
+

Re: Infinite Interval

2023-03-19 Thread Tom Lane
Joseph Koshakow  writes:
> I must have been doing something wrong because I tried again today and
> it worked fine. However, I do get a lot of changes like the following:

>   -   if TIMESTAMP_IS_NOBEGIN(dt2)
>   -   ereport(ERROR,
>   -
> (errcode(ERRCODE_DATETIME_VALUE_OUT_OF_RANGE),
>   -errmsg("timestamp out of
> range")));
>   +   if TIMESTAMP_IS_NOBEGIN
>   +   (dt2)
>   +   ereport(ERROR,
>   +
> (errcode(ERRCODE_DATETIME_VALUE_OUT_OF_RANGE),
>   +errmsg("timestamp out of
> range")));

> Should I keep these pgindent changes or keep it the way I have it?

Did you actually write "if TIMESTAMP_IS_NOBEGIN(dt2)" and not
"if (TIMESTAMP_IS_NOBEGIN(dt2))"?  If the former, I'm not surprised
that pgindent gets confused.  The parentheses are required by the
C standard.  Your code might accidentally work because the macro
has parentheses internally, but call sites have no business
knowing that.  For example, it would be completely legit to change
TIMESTAMP_IS_NOBEGIN to be a plain function, and then this would be
syntactically incorrect.

regards, tom lane




Re: Commitfest 2023-03 starting tomorrow!

2023-03-19 Thread Tom Lane
Peter Eisentraut  writes:
> On 17.03.23 15:38, Tom Lane wrote:
>>> Simplify find_my_exec by using realpath(3)
>> The problem with this one is that Peter would like it to do something
>> other than what I think it should do.  Not sure how to resolve that.

> I have no objection to changing the internal coding of the current behavior.

Oh ... where the thread trailed off [1] was you not answering whether
you'd accept a compromise behavior.  If it's okay to stick with the
behavior we have, then I'll just do the original patch (modulo Munro's
observations about _fullpath's error reporting).

regards, tom lane

[1] https://www.postgresql.org/message-id/2319396.1664978360%40sss.pgh.pa.us




Re: Infinite Interval

2023-03-19 Thread Joseph Koshakow
On Sat, Mar 18, 2023 at 3:55 PM Tom Lane  wrote:
>
>Joseph Koshakow  writes:
>> On Sat, Mar 18, 2023 at 3:08 PM Tom Lane  wrote:
>>> More specifically, those are from running pg_indent with an obsolete
>>> typedefs list.
>
>> I must be doing something wrong because even after doing that I get
the
>> same strange formatting. Specifically from the root directory I ran
>
>Hmm, I dunno what's going on there.  When I do this:
>
>>   curl https://buildfarm.postgresql.org/cgi-bin/typedefs.pl -o
>> src/tools/pgindent/typedefs.list
>
>I end up with a plausible set of updates, notably
>
>$ git diff
>diff --git a/src/tools/pgindent/typedefs.list
b/src/tools/pgindent/typedefs.list
>index 097f42e1b3..667f8e13ed 100644
>--- a/src/tools/pgindent/typedefs.list
>+++ b/src/tools/pgindent/typedefs.list
>...
>@@ -545,10 +548,12 @@ DataDumperPtr
> DataPageDeleteStack
> DatabaseInfo
> DateADT
>+DateTimeErrorExtra
> Datum
> DatumTupleFields
> DbInfo
> DbInfoArr
>+DbLocaleInfo
> DeClonePtrType
> DeadLockState
> DeallocateStmt
>
>so it sure ought to know DateTimeErrorExtra is a typedef.
>I then tried pgindent'ing datetime.c and timestamp.c,
>and it did not want to change either file.  I do get
>diffs like

> DecodeDateTime(char **field, int *ftype, int nf,
>   int *dtype, struct pg_tm *tm, fsec_t *fsec, int *tzp,
>-  DateTimeErrorExtra *extra)
>+  DateTimeErrorExtra * extra)
> {
>int fmask = 0,
>
>if I try to pgindent datetime.c with typedefs.list as it
>stands in HEAD.  That's pretty much pgindent's normal
>behavior when it doesn't recognize a name as a typedef.

I must have been doing something wrong because I tried again today and
it worked fine. However, I do get a lot of changes like the following:

  -   if TIMESTAMP_IS_NOBEGIN(dt2)
  -   ereport(ERROR,
  -
(errcode(ERRCODE_DATETIME_VALUE_OUT_OF_RANGE),
  -errmsg("timestamp out of
range")));
  +   if TIMESTAMP_IS_NOBEGIN
  +   (dt2)
  +   ereport(ERROR,
  +
(errcode(ERRCODE_DATETIME_VALUE_OUT_OF_RANGE),
  +errmsg("timestamp out of
range")));

Should I keep these pgindent changes or keep it the way I have it?

- Joe Koshakow


Re: Add SPLIT PARTITION/MERGE PARTITIONS commands

2023-03-19 Thread stephane tachoires
The following review has been posted through the commitfest application:
make installcheck-world:  tested, passed
Implements feature:   tested, passed
Spec compliant:   tested, passed
Documentation:tested, failed

This feature is clearly missing from partition handling in PostgreSQL, so this 
patch is very welcome (as are future steps)
Code looks good, comments are explicit
Patch v14 apply nicely on 4f46f870fa56fa73d6678273f1bd059fdd93d5e6
Compilation ok with meson compile
LCOV after meson test shows good new code coverage.
Documentation is missing in v14.

Re: proposal: possibility to read dumped table's name from file

2023-03-19 Thread Pavel Stehule
On Sun, Mar 19, 2023 at 3:01 PM Justin Pryzby 
wrote:

> On Thu, Mar 16, 2023 at 01:05:41PM +0100, Pavel Stehule wrote:
> > rebase + enhancing about related option from a563c24
>
> Thanks.
>
> It looks like this doesn't currently handle extensions, which were added
> at 6568cef26e.
>
> > +   table_and_children: tables, works like
> > +   -t/--table, except that
> > +   it also includes any partitions or inheritance child
> > +   tables of the table(s) matching the
> > +   pattern.
>
> Why doesn't this just say "works like --table-and-children" ?
>
> I think as you wrote log_invalid_filter_format(), the messages wouldn't
> be translated, because they're printed via %s.  One option is to call
> _() on the message.
>
> > +ok($dump !=~ qr/^CREATE TABLE public\.bootab/m,   "exclude dumped
> children table");
>
> !=~ is being interpreted as a numeric "!=" and throwing warnings.
> It should be a !~ b, right ?
> It'd be nice if perl warnings during the tests were less easy to miss.
>
> > + * char is not alpha. The char '_' is allowed too (exclude first
> position).
>
> Why is it treated specially?  Could it be treated the same as alpha?
>
> > + log_invalid_filter_format(,
> > +
>"\"include\" table data filter is not allowed");
> > + log_invalid_filter_format(,
> > +
>"\"include\" table data and children filter is not allowed");
>
> For these, it might be better to write the literal option:
>
> > +
>"include filter for \"table_data_and_children\" is not allowed");
>
> Because the option is a literal and shouldn't be translated.
> And it's probably better to write that using %s, like:
>
> > +
>"include filter for \"%s\" is not allowed");
>
> That makes shorter and fewer strings.
>
> Find attached a bunch of other corrections as 0002.txt
>

Thank you very much - I'll recheck the mentioned points tomorrow.


>
> I also dug up what I'd started in november, trying to reduce the code
> duplication betwen pg_restore/dump/all.  This isn't done, but I might
> never finish it, so I'll at least show what I have in case you think
> it's a good idea.  This passes tests on CI, except for autoconf, due to
> using exit_nicely() differently.
>

Your implementation reduced 60 lines, but the interface and code are more
complex. I cannot say which is significantly better. Personally, in this
case, I prefer my variant, because I think it is a little bit more
readable, and possible modifications can be simpler. But this is just my
opinion, and I have no problem accepting other opinions. I can imagine
defining some configuration array like getopt's, but it looks like
over-engineering.

Regards

Pavel


> --
> Justin
>


Re: Memory leak from ExecutorState context?

2023-03-19 Thread Justin Pryzby
On Fri, Mar 17, 2023 at 05:41:11PM +0100, Tomas Vondra wrote:
> >> * Patch 2 is worth considering to backpatch
> 
> I'm not quite sure what exactly are the numbered patches, as some of the
> threads had a number of different patch ideas, and I'm not sure which
> one was/is the most promising one.

patch 2 is referring to the list of patches that was compiled
https://www.postgresql.org/message-id/20230310195114.6d0c5406%40karst




Re: Fix fseek() detection of unseekable files on WIN32

2023-03-19 Thread Juan José Santamaría Flecha
On Sun, Mar 19, 2023 at 12:45 PM Michael Paquier 
wrote:

>
> In short, I was thinking among the lines of something like the
> attached, where I have invented a pgwin32_get_file_type() that acts as
> a wrapper of GetFileType() in a new file called win32common.c, with
> all the error handling we would use between fstat(), fseeko() and
> ftello() centralized in a single code path.
>
> The refactoring with win32common.c had better be separated into its
> own patch, at the end, if using an approach like that.
>

My approach was trying to make something minimal so it could be
backpatchable. This looks fine for HEAD, but are you planning on something
similar for the other branches?

Doesn't pgwin32_get_file_type() fit in dirmod.c? Might be a question of
personal taste, I don't really have strong feelings against win32common.c.

Regards,

Juan José Santamaría Flecha


Re: Extending outfuncs support to utility statements

2023-03-19 Thread Tom Lane
Alexander Lakhin  writes:
> Please look at the function _readA_Const() (introduced in a6bc33019), which 
> fails on current master under valgrind:
> ...
> Here _readA_Const() performs:
>      union ValUnion *tmp = nodeRead(NULL, 0);

>      memcpy(&local_node->val, tmp, sizeof(*tmp));

> where sizeof(union ValUnion) = 16, but nodeRead()->makeInteger() produced 
> Integer (sizeof(Integer) = 8).

Right, so we can't get away without a switch-on-value-type like the
other functions for A_Const have.  Will fix.

regards, tom lane




Re: Should vacuum process config file reload more often

2023-03-19 Thread Melanie Plageman
On Sat, Mar 18, 2023 at 6:47 PM Melanie Plageman
 wrote:
> On Wed, Mar 15, 2023 at 1:14 AM Masahiko Sawada  wrote:
> > On Sat, Mar 11, 2023 at 8:11 AM Melanie Plageman
> >  wrote:
> > > > > Also not sure how the patch interacts with failsafe autovac and 
> > > > > parallel
> > > > > vacuum.
> > > >
> > > > Good point.
> > > >
> > > > When entering the failsafe mode, we disable the vacuum delays (see
> > > > lazy_check_wraparound_failsafe()). We need to keep disabling the
> > > > vacuum delays even after reloading the config file. One idea is to
> > > > have another global variable indicating we're in the failsafe mode.
> > > > vacuum_delay_point() doesn't update VacuumCostActive if the flag is
> > > > true.
> > >
> > > I think we might not need to do this. Other than in
> > > lazy_check_wraparound_failsafe(), VacuumCostActive is only updated in
> > > two places:
> > >
> > > 1) in vacuum() which autovacuum will call per table. And failsafe is
> > > reset per table as well.
> > >
> > > 2) in vacuum_delay_point(), but, since VacuumCostActive will already be
> > > false when we enter vacuum_delay_point() the next time after
> > > lazy_check_wraparound_failsafe(), we won't set VacuumCostActive there.
> >
> > Indeed. But does it mean that there is no code path to turn
> > vacuum-delay on, even when vacuum_cost_delay is updated from 0 to
> > non-0?
>
> Ah yes! Good point. This is true.
> I'm not sure how to cheaply allow for re-enabling delays after disabling
> them in the middle of a table vacuum.
>
> I don't see a way around checking if we need to reload the config file
> on every call to vacuum_delay_point() (currently, we are only doing this
> when we have to wait anyway). It seems expensive to do this check every
> time. If we do do this, we would update VacuumCostActive when updating
> VacuumCostDelay, and we would need a global variable keeping the
> failsafe status, as you mentioned.
>
> It could be okay to say that you can only disable cost-based delays in
> the middle of vacuuming a table (i.e. you cannot enable them if they are
> already disabled until you start vacuuming the next table). Though maybe
> it is weird that you can increase the delay but not re-enable it...

So, I thought about it some more, and I think it is a bit odd that you
can increase the delay and limit but not re-enable them if they were
disabled. And perhaps it would be okay to check ConfigReloadPending at
the top of vacuum_delay_point() instead of only after sleeping; it is
just one more branch. After checking whether we should reload (and
reloading if needed), we can test VacuumCostActive and return early if
it is false. I've implemented that in attached v6.

I added in the global we discussed for VacuumFailsafeActive. If we keep
it, we can probably remove the one in LVRelState -- as it seems
redundant. Let me know what you think.

- Melanie
From 1218c1852794a1310d25359a37b87d068282500e Mon Sep 17 00:00:00 2001
From: Melanie Plageman 
Date: Sat, 18 Mar 2023 15:42:39 -0400
Subject: [PATCH v6] [auto]vacuum reloads config file more often

Previously, VACUUM and autovacuum workers would reload the configuration
file only between vacuuming tables. This precluded user updates to
cost-based delay parameters from taking effect while vacuuming a table.

Check if a reload is pending roughly once per block now, when checking
if we need to delay.

In order for this change to have the intended effect on autovacuum,
autovacuum workers must start updating their cost delay more frequently
as well.

With this new paradigm, balancing the cost limit amongst workers also
must work differently. Previously, a worker's wi_cost_limit was set only
at the beginning of vacuuming a table, after reloading the config file.
Therefore, at the time that autovac_balance_cost() is called, workers
vacuuming tables with no table options could still have different values
for their wi_cost_limit_base and wi_cost_delay. With this change,
workers will (within some margin of error) have no reason to have
different values for cost limit and cost delay (in the absence of table
options). Thus, remove cost limit and cost delay from shared memory and
keep track only of the number of workers actively vacuuming tables with
no cost-related table options. Then, use this value to determine what
each worker's effective cost limit should be.

Reviewed-by: Masahiko Sawada 
Discussion: https://www.postgresql.org/message-id/flat/CAAKRu_buP5wzsho3qNw5o9_R0pF69FRM5hgCmr-mvXmGXwdA7A%40mail.gmail.com#5e6771d4cdca4db6efc2acec2dce0bc7
---
 src/backend/access/heap/vacuumlazy.c  |   1 +
 src/backend/commands/vacuum.c |  55 +--
 src/backend/commands/vacuumparallel.c |   1 +
 src/backend/postmaster/autovacuum.c   | 209 +++---
 src/backend/utils/init/globals.c  |   1 +
 src/include/miscadmin.h   |   1 +
 src/include/postmaster/autovacuum.h   |   2 +
 7 files changed, 142 insertions(+), 128 deletions(-)

diff --git a/src/backend/access/heap/vacuumlazy.c 

Re: buildfarm + meson

2023-03-19 Thread Andrew Dunstan


On 2023-03-18 Sa 21:32, Andrew Dunstan wrote:



On 2023-03-18 Sa 19:00, Andres Freund wrote:

Hi,

On 2023-03-18 17:53:38 -0400, Andrew Dunstan wrote:

On 2023-03-11 Sa 16:25, Andres Freund wrote:

Hi,

On 2023-03-09 18:31:10 -0500, Andrew Dunstan wrote:

Another thing: the test for uuid.h is too strict. On Fedora 36 the OSSP
header is in /usr/include, not /usr/include/ossp (I got around that for now
by symlinking it, but obviously that's a nasty hack we can't ask people to
do).

Yea, that was just wrong. It happened to work on debian and a few other OSs,
but ossp's .pc puts whatever the right directory is into the include
path. Pushed the fairly obvious fix.

Another issue: building plpython appears impossible on Windows because it's
finding meson's own python:


Program python3 found: YES (C:\Program Files\Meson\meson.exe runpython)
Could not find Python3 library 'C:\\Program
Files\\Meson\\libs\\python311.lib'

Any more details - windows CI builds with python. What python do you want to
use and where is it installed?



It's in c:/python37, which is at the front of the PATH. It fails as 
above if I add -Dplpython=enabled to the config.




Looks like the answer is not to install using the MSI installer, which 
provides its own Python, but to install meson and ninja into an existing 
python installation via pip. That's a bit sad, but manageable.



cheers


andrew

--
Andrew Dunstan
EDB:https://www.enterprisedb.com


Re: Commitfest 2023-03 starting tomorrow!

2023-03-19 Thread Peter Eisentraut

On 17.03.23 15:38, Tom Lane wrote:

  Simplify find_my_exec by using realpath(3)

The problem with this one is that Peter would like it to do something
other than what I think it should do.  Not sure how to resolve that.


I have no objection to changing the internal coding of the current behavior.




Re: proposal: possibility to read dumped table's name from file

2023-03-19 Thread Justin Pryzby
On Thu, Mar 16, 2023 at 01:05:41PM +0100, Pavel Stehule wrote:
> rebase + enhancing about related option from a563c24

Thanks.

It looks like this doesn't currently handle extensions, which were added
in commit 6568cef26e.

> +   table_and_children: tables, works like
> +   -t/--table, except that
> +   it also includes any partitions or inheritance child
> +   tables of the table(s) matching the
> +   pattern.

Why doesn't this just say "works like --table-and-children" ?

I think as you wrote log_invalid_filter_format(), the messages wouldn't
be translated, because they're printed via %s.  One option is to call
_() on the message.

> +ok($dump !=~ qr/^CREATE TABLE public\.bootab/m,   "exclude dumped children 
> table");

!=~ is being interpreted as a numeric "!=" and throwing warnings.
It should be a !~ b, right?
It'd be nice if Perl warnings during the tests were less easy to miss.

> + * char is not alpha. The char '_' is allowed too (exclude first position).

Why is it treated specially?  Could it be treated the same as alpha?

> + log_invalid_filter_format(,
> + 
>   "\"include\" table data filter is not allowed");
> + log_invalid_filter_format(,
> + 
>   "\"include\" table data and children filter is not allowed");

For these, it might be better to write the literal option:

> + 
>   "include filter for \"table_data_and_children\" is not allowed");

Because the option is a literal and shouldn't be translated.
And it's probably better to write that using %s, like:

> + 
>   "include filter for \"%s\" is not allowed");

That makes for shorter and fewer strings.

Find attached a bunch of other corrections as 0002.txt

I also dug up what I'd started in November, trying to reduce the code
duplication between pg_restore/dump/all.  This isn't done, but I might
never finish it, so I'll at least show what I have in case you think
it's a good idea.  This passes tests on CI, except for autoconf, due to
using exit_nicely() differently.

-- 
Justin
>From e86d4dca9b4754185a1d559fdea7bf4ee2de8b1a Mon Sep 17 00:00:00 2001
From: "ok...@github.com" 
Date: Thu, 16 Mar 2023 08:18:08 +0100
Subject: [PATCH 1/3] possibility to read options for dump from file

---
 doc/src/sgml/ref/pg_dump.sgml   | 110 +++
 doc/src/sgml/ref/pg_dumpall.sgml|  22 +
 doc/src/sgml/ref/pg_restore.sgml|  25 +
 src/bin/pg_dump/Makefile|   5 +-
 src/bin/pg_dump/filter.c| 489 ++
 src/bin/pg_dump/filter.h|  56 ++
 src/bin/pg_dump/meson.build |   2 +
 src/bin/pg_dump/pg_dump.c   | 117 
 src/bin/pg_dump/pg_dumpall.c|  61 +-
 src/bin/pg_dump/pg_restore.c| 114 
 src/bin/pg_dump/t/005_pg_dump_filterfile.pl | 701 
 src/tools/msvc/Mkvcbuild.pm |   1 +
 12 files changed, 1700 insertions(+), 3 deletions(-)
 create mode 100644 src/bin/pg_dump/filter.c
 create mode 100644 src/bin/pg_dump/filter.h
 create mode 100644 src/bin/pg_dump/t/005_pg_dump_filterfile.pl

diff --git a/doc/src/sgml/ref/pg_dump.sgml b/doc/src/sgml/ref/pg_dump.sgml
index d6b1faa8042..5672fbb273b 100644
--- a/doc/src/sgml/ref/pg_dump.sgml
+++ b/doc/src/sgml/ref/pg_dump.sgml
@@ -829,6 +829,102 @@ PostgreSQL documentation
   
  
 
+ 
+  --filter=filename
+  
+   
+Specify a filename from which to read patterns for objects to include
+or exclude from the dump. The patterns are interpreted according to the
+same rules as the corresponding options:
+-t/--table,
+--table-and-children,
+--exclude-table-and-children or
+-T for tables,
+-n/--schema for schemas,
+--include-foreign-data for data on foreign servers and
+--exclude-table-data,
+--exclude-table-data-and-children for table data.
+To read from STDIN, use - as the
+filename.  The --filter option can be specified in
+conjunction with the above listed options for including or excluding
+objects, and can also be specified more than once for multiple filter
+files.
+   
+
+   
+The file lists one object pattern per row, with the following format:
+
+{ include | exclude } { table | table_and_children | schema | foreign_data | table_data | table_data_and_children } PATTERN
+
+   
+
+   
+The first keyword specifies whether the objects matched by the pattern
+are to be included or excluded. The second keyword specifies the type
+of object to be filtered 

Re: pg_stat_statements and "IN" conditions

2023-03-19 Thread Dmitry Dolgov
> On Tue, Mar 14, 2023 at 08:04:32PM +0100, Dmitry Dolgov wrote:
> > On Tue, Mar 14, 2023 at 02:14:17PM -0400, Gregory Stark (as CFM) wrote:
> > So I was seeing that this patch needs a rebase according to cfbot.
>
> Yeah, folks are getting up to speed with pgss improvements recently.
> Thanks for letting me know.

Following the recent refactoring of the pg_stat_statements tests, I've
created a new test for the merging functionality in this patch. That
should resolve any conflicts.
>From 4f8ad43c1f0547913e5da8fa97fe08bccf06c91d Mon Sep 17 00:00:00 2001
From: Dmitrii Dolgov <9erthali...@gmail.com>
Date: Fri, 17 Feb 2023 10:17:55 +0100
Subject: [PATCH v14 1/2] Reusable decimalLength functions

Move out decimalLength functions to reuse in the following patch.
---
 src/backend/utils/adt/numutils.c | 50 +---
 src/include/utils/numutils.h | 67 
 2 files changed, 68 insertions(+), 49 deletions(-)
 create mode 100644 src/include/utils/numutils.h

diff --git a/src/backend/utils/adt/numutils.c b/src/backend/utils/adt/numutils.c
index 471fbb7ee6..df7418cce7 100644
--- a/src/backend/utils/adt/numutils.c
+++ b/src/backend/utils/adt/numutils.c
@@ -18,9 +18,8 @@
 #include 
 #include 
 
-#include "common/int.h"
 #include "utils/builtins.h"
-#include "port/pg_bitutils.h"
+#include "utils/numutils.h"
 
 /*
  * A table of all two-digit numbers. This is used to speed up decimal digit
@@ -38,53 +37,6 @@ static const char DIGIT_TABLE[200] =
 "80" "81" "82" "83" "84" "85" "86" "87" "88" "89"
 "90" "91" "92" "93" "94" "95" "96" "97" "98" "99";
 
-/*
- * Adapted from http://graphics.stanford.edu/~seander/bithacks.html#IntegerLog10
- */
-static inline int
-decimalLength32(const uint32 v)
-{
-	int			t;
-	static const uint32 PowersOfTen[] = {
-		1, 10, 100,
-		1000, 10000, 100000,
-		1000000, 10000000, 100000000,
-		1000000000
-	};
-
-	/*
-	 * Compute base-10 logarithm by dividing the base-2 logarithm by a
-	 * good-enough approximation of the base-2 logarithm of 10
-	 */
-	t = (pg_leftmost_one_pos32(v) + 1) * 1233 / 4096;
-	return t + (v >= PowersOfTen[t]);
-}
-
-static inline int
-decimalLength64(const uint64 v)
-{
-	int			t;
-	static const uint64 PowersOfTen[] = {
-		UINT64CONST(1), UINT64CONST(10),
-		UINT64CONST(100), UINT64CONST(1000),
-		UINT64CONST(10000), UINT64CONST(100000),
-		UINT64CONST(1000000), UINT64CONST(10000000),
-		UINT64CONST(100000000), UINT64CONST(1000000000),
-		UINT64CONST(10000000000), UINT64CONST(100000000000),
-		UINT64CONST(1000000000000), UINT64CONST(10000000000000),
-		UINT64CONST(100000000000000), UINT64CONST(1000000000000000),
-		UINT64CONST(10000000000000000), UINT64CONST(100000000000000000),
-		UINT64CONST(1000000000000000000), UINT64CONST(10000000000000000000)
-	};
-
-	/*
-	 * Compute base-10 logarithm by dividing the base-2 logarithm by a
-	 * good-enough approximation of the base-2 logarithm of 10
-	 */
-	t = (pg_leftmost_one_pos64(v) + 1) * 1233 / 4096;
-	return t + (v >= PowersOfTen[t]);
-}
-
 static const int8 hexlookup[128] = {
 	-1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1,
 	-1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1,
diff --git a/src/include/utils/numutils.h b/src/include/utils/numutils.h
new file mode 100644
index 00..876e64f2df
--- /dev/null
+++ b/src/include/utils/numutils.h
@@ -0,0 +1,67 @@
+/*-
+ *
+ * numutils.h
+ *	  Decimal length functions for numutils.c
+ *
+ *
+ * Portions Copyright (c) 1996-2023, PostgreSQL Global Development Group
+ * Portions Copyright (c) 1994, Regents of the University of California
+ *
+ * src/include/utils/numutils.h
+ *
+ *-
+ */
+#ifndef NUMUTILS_H
+#define NUMUTILS_H
+
+#include "common/int.h"
+#include "port/pg_bitutils.h"
+
+/*
+ * Adapted from http://graphics.stanford.edu/~seander/bithacks.html#IntegerLog10
+ */
+static inline int
+decimalLength32(const uint32 v)
+{
+	int			t;
+	static const uint32 PowersOfTen[] = {
+		1, 10, 100,
+		1000, 10000, 100000,
+		1000000, 10000000, 100000000,
+		1000000000
+	};
+
+	/*
+	 * Compute base-10 logarithm by dividing the base-2 logarithm by a
+	 * good-enough approximation of the base-2 logarithm of 10
+	 */
+	t = (pg_leftmost_one_pos32(v) + 1) * 1233 / 4096;
+	return t + (v >= PowersOfTen[t]);
+}
+
+static inline int
+decimalLength64(const uint64 v)
+{
+	int			t;
+	static const uint64 PowersOfTen[] = {
+		UINT64CONST(1), UINT64CONST(10),
+		UINT64CONST(100), UINT64CONST(1000),
+		UINT64CONST(10000), UINT64CONST(100000),
+		UINT64CONST(1000000), UINT64CONST(10000000),
+		UINT64CONST(100000000), UINT64CONST(1000000000),
+		UINT64CONST(10000000000), UINT64CONST(100000000000),
+		UINT64CONST(1000000000000), UINT64CONST(10000000000000),
+		UINT64CONST(100000000000000), UINT64CONST(1000000000000000),
+		UINT64CONST(10000000000000000), UINT64CONST(100000000000000000),
+		

Re: Extending outfuncs support to utility statements

2023-03-19 Thread Alexander Lakhin

Hello,

26.09.2022 17:46, Peter Eisentraut wrote:

On 22.09.22 23:21, Tom Lane wrote:

Anyway, this is a bit far afield from the stated topic of this
thread.  I think we should commit something approximately like
what I posted and then start a new thread specifically about
what we'd like to do about utility commands in new-style SQL
functions.


Right, I have committed everything and will close the CF entry.  I don't have a specific idea about how to move 
forward right now.


Please look at the function _readA_Const() (introduced in a6bc33019), which 
fails on current master under valgrind:
CPPFLAGS="-DUSE_VALGRIND -DWRITE_READ_PARSE_PLAN_TREES -Og " ./configure -q --enable-debug 
&& make -s -j8 && make check

== creating temporary instance    ==
== initializing database system   ==

pg_regress: initdb failed
Examine .../src/test/regress/log/initdb.log for the reason.

initdb.log contains:
performing post-bootstrap initialization ... ==00:00:00:02.155 3419654== 
Invalid read of size 16
==00:00:00:02.155 3419654==    at 0x448691: memcpy (string_fortified.h:29)
==00:00:00:02.155 3419654==    by 0x448691: _readA_Const (readfuncs.c:315)
==00:00:00:02.155 3419654==    by 0x44CCD2: parseNodeString 
(readfuncs.switch.c:129)
==00:00:00:02.155 3419654==    by 0x4348D6: nodeRead (read.c:338)
==00:00:00:02.155 3419654==    by 0x434879: nodeRead (read.c:452)
==00:00:00:02.155 3419654==    by 0x440E6C: _readTypeName 
(readfuncs.funcs.c:830)
==00:00:00:02.155 3419654==    by 0x44CC3A: parseNodeString 
(readfuncs.switch.c:121)
==00:00:00:02.155 3419654==    by 0x4348D6: nodeRead (read.c:338)
==00:00:00:02.155 3419654==    by 0x43D51D: _readFunctionParameter 
(readfuncs.funcs.c:2513)
==00:00:00:02.155 3419654==    by 0x44DE0C: parseNodeString 
(readfuncs.switch.c:367)
==00:00:00:02.155 3419654==    by 0x4348D6: nodeRead (read.c:338)
==00:00:00:02.155 3419654==    by 0x434879: nodeRead (read.c:452)
==00:00:00:02.155 3419654==    by 0x438A9C: _readCreateFunctionStmt 
(readfuncs.funcs.c:2499)
==00:00:00:02.155 3419654==  Address 0xf12f718 is 0 bytes inside a block of 
size 8 client-defined
==00:00:00:02.155 3419654==    at 0x6A70C3: MemoryContextAllocZeroAligned 
(mcxt.c:1109)
==00:00:00:02.155 3419654==    by 0x450C31: makeInteger (value.c:25)
==00:00:00:02.155 3419654==    by 0x434D59: nodeRead (read.c:482)
==00:00:00:02.155 3419654==    by 0x448690: _readA_Const (readfuncs.c:313)
==00:00:00:02.155 3419654==    by 0x44CCD2: parseNodeString 
(readfuncs.switch.c:129)
==00:00:00:02.155 3419654==    by 0x4348D6: nodeRead (read.c:338)
==00:00:00:02.155 3419654==    by 0x434879: nodeRead (read.c:452)
==00:00:00:02.155 3419654==    by 0x440E6C: _readTypeName 
(readfuncs.funcs.c:830)
==00:00:00:02.155 3419654==    by 0x44CC3A: parseNodeString 
(readfuncs.switch.c:121)
==00:00:00:02.155 3419654==    by 0x4348D6: nodeRead (read.c:338)
==00:00:00:02.155 3419654==    by 0x43D51D: _readFunctionParameter 
(readfuncs.funcs.c:2513)
==00:00:00:02.155 3419654==    by 0x44DE0C: parseNodeString 
(readfuncs.switch.c:367)
==00:00:00:02.155 3419654==

Here _readA_Const() performs:
    union ValUnion *tmp = nodeRead(NULL, 0);

    memcpy(&local_node->val, tmp, sizeof(*tmp));

where sizeof(union ValUnion) = 16, but nodeRead()->makeInteger() produced 
Integer (sizeof(Integer) = 8).

Best regards,
Alexander

Re: Fix fseek() detection of unseekable files on WIN32

2023-03-19 Thread Michael Paquier
On Sun, Mar 19, 2023 at 08:20:27PM +0900, Michael Paquier wrote:
> I am not sure, TBH.  As presented, the two GetFileType() calls in
> _pgftello64() and _pgfseeko64() ignore the case where it returns
> FILE_TYPE_UNKNOWN and GetLastError() has something else than
> NO_ERROR.  The code would return EINVAL for all the errors happening.
> Perhaps that's fine, though I am wondering if we should report
> something more exact, based on _dosmaperr(GetLastError())?

In short, I was thinking among the lines of something like the
attached, where I have invented a pgwin32_get_file_type() that acts as
a wrapper of GetFileType() in a new file called win32common.c, with
all the error handling we would use between fstat(), fseeko() and
ftello() centralized in a single code path.

The refactoring with win32common.c had better be separated into its
own patch, at the end, if using an approach like that.
--
Michael
From 3e778828a58751a0b562908176dcb76cabcc7945 Mon Sep 17 00:00:00 2001
From: Michael Paquier 
Date: Sun, 19 Mar 2023 20:40:22 +0900
Subject: [PATCH v3] fix fseek detection of unseekable files for WIN32

Calling fseek() on a handle to a non-seeking device such as a pipe or
a communications device is not supported, even though the fseek() may
not return an error, so harden that function with our version.
---
 src/include/port/win32_port.h | 12 --
 src/port/meson.build  |  2 +
 src/port/win32common.c| 68 
 src/port/win32fseek.c | 74 +++
 src/port/win32stat.c  | 22 ++-
 configure |  6 +++
 configure.ac  |  1 +
 src/tools/msvc/Mkvcbuild.pm   |  2 +
 8 files changed, 165 insertions(+), 22 deletions(-)
 create mode 100644 src/port/win32common.c
 create mode 100644 src/port/win32fseek.c

diff --git a/src/include/port/win32_port.h b/src/include/port/win32_port.h
index 5116c2fc06..19206ce922 100644
--- a/src/include/port/win32_port.h
+++ b/src/include/port/win32_port.h
@@ -204,15 +204,21 @@ struct itimerval
 
 int			setitimer(int which, const struct itimerval *value, struct itimerval *ovalue);
 
+/* Convenience wrapper for GetFileType() */
+extern DWORD pgwin32_get_file_type(HANDLE hFile);
+
 /*
  * WIN32 does not provide 64-bit off_t, but does provide the functions operating
- * with 64-bit offsets.
+ * with 64-bit offsets. Also, fseek() might not give an error for unseekable
+ * streams, so harden that function with our version.
  */
 #define pgoff_t __int64
 
 #ifdef _MSC_VER
-#define fseeko(stream, offset, origin) _fseeki64(stream, offset, origin)
-#define ftello(stream) _ftelli64(stream)
+extern int  	_pgfseeko64(FILE *stream, pgoff_t offset, int origin);
+extern pgoff_t	_pgftello64(FILE *stream);
+#define fseeko(stream, offset, origin) _pgfseeko64(stream, offset, origin)
+#define ftello(stream) _pgftello64(stream)
 #else
 #ifndef fseeko
 #define fseeko(stream, offset, origin) fseeko64(stream, offset, origin)
diff --git a/src/port/meson.build b/src/port/meson.build
index b174b25021..24416b9bfc 100644
--- a/src/port/meson.build
+++ b/src/port/meson.build
@@ -29,10 +29,12 @@ if host_system == 'windows'
 'kill.c',
 'open.c',
 'system.c',
+'win32common.c',
 'win32dlopen.c',
 'win32env.c',
 'win32error.c',
 'win32fdatasync.c',
+'win32fseek.c',
 'win32getrusage.c',
 'win32link.c',
 'win32ntdll.c',
diff --git a/src/port/win32common.c b/src/port/win32common.c
new file mode 100644
index 00..2fd78f7f93
--- /dev/null
+++ b/src/port/win32common.c
@@ -0,0 +1,68 @@
+/*-
+ *
+ * win32common.c
+ *	  Common routines shared among the win32*.c ports.
+ *
+ * Portions Copyright (c) 1996-2023, PostgreSQL Global Development Group
+ * Portions Copyright (c) 1994, Regents of the University of California
+ *
+ *
+ * IDENTIFICATION
+ *	  src/port/win32common.c
+ *
+ *-
+ */
+
+#ifdef FRONTEND
+#include "postgres_fe.h"
+#else
+#include "postgres.h"
+#endif
+
+#ifdef WIN32
+
+/*
+ * pgwin32_get_file_type
+ *
+ * Convenience wrapper for GetFileType() with specific error handling for all the
+ * port implementations.  Returns the file type associated with a HANDLE.
+ *
+ * On error, sets errno with FILE_TYPE_UNKNOWN as file type.
+ */
+DWORD
+pgwin32_get_file_type(HANDLE hFile)
+{
+	DWORD		fileType = FILE_TYPE_UNKNOWN;
+	DWORD		lastError;
+
+	errno = 0;
+
+	/*
+	 * When stdin, stdout, and stderr aren't associated with a stream the
+	 * special value -2 is returned:
+	 * https://learn.microsoft.com/en-us/cpp/c-runtime-library/reference/get-osfhandle
+	 */
+	if (hFile == INVALID_HANDLE_VALUE || hFile == (HANDLE) -2)
+	{
+		errno = EINVAL;
+		return FILE_TYPE_UNKNOWN;
+	}
+
+	fileType = GetFileType(hFile);
+	lastError = GetLastError();
+
+	/*
+	 * Invoke GetLastError in order to distinguish between a "valid" return 

server log inflates due to pg_logical_slot_peek_changes/pg_logical_slot_get_changes calls

2023-03-19 Thread Jeevan Ladhe
Hi,

I observed some surprising behaviour while using pg_logical_slot_peek_changes()
and pg_logical_slot_get_changes(). Whenever either of these two functions
is called to read changes using a decoder plugin, the following
messages are printed in the log for every single call.

2023-03-19 16:36:06.040 IST [30099] LOG:  starting logical decoding for
slot "test_slot1"
2023-03-19 16:36:06.040 IST [30099] DETAIL:  Streaming transactions
committing after 0/851DFD8, reading WAL from 0/851DFA0.
2023-03-19 16:36:06.040 IST [30099] STATEMENT:  SELECT data FROM
pg_logical_slot_get_changes('test_slot1', NULL, NULL, 'format-version',
'2');
2023-03-19 16:36:06.040 IST [30099] LOG:  logical decoding found consistent
point at 0/851DFA0
2023-03-19 16:36:06.040 IST [30099] DETAIL:  There are no running
transactions.
2023-03-19 16:36:06.040 IST [30099] STATEMENT:  SELECT data FROM
pg_logical_slot_get_changes('test_slot1', NULL, NULL, 'format-version',
'2');

These messages are printed on every call to the peek/get functions and
bloat the server log file considerably when the functions are called in
a loop to read changes.

IMHO, printing these messages every time we create a decoding context
for a slot via pg_logical_slot_get_changes() seems excessive.
Instead of LOG, should we demote these messages to DEBUG1 in
SnapBuildFindSnapshot() and CreateDecodingContext() respectively?
I can produce a patch if we agree.

Regards,
Jeevan Ladhe


Re: Fix fseek() detection of unseekable files on WIN32

2023-03-19 Thread Michael Paquier
On Thu, Mar 16, 2023 at 10:08:44AM +0100, Juan José Santamaría Flecha wrote:
> IDK, this is just looking for the good case, anything else we'll fail with
> ESPIPE or EINVAL anyway. If we want to get the proper file type we can call
> fstat(), which has the full logic.

I am not sure, TBH.  As presented, the two GetFileType() calls in
_pgftello64() and _pgfseeko64() ignore the case where it returns
FILE_TYPE_UNKNOWN and GetLastError() has something else than
NO_ERROR.  The code would return EINVAL for all the errors happening.
Perhaps that's fine, though I am wondering if we should report
something more exact, based on _dosmaperr(GetLastError())?
--
Michael


signature.asc
Description: PGP signature


Replace known_assigned_xids_lck by memory barrier

2023-03-19 Thread Michail Nikolaev
Hello everyone and Tom.

Tom, this is about your idea (1) from 2010 to replace the spinlock with a
memory barrier in the known-assigned-xids machinery.

You mentioned it again in (2), and in (3) we decided to extract this
change into a separate commitfest entry.

So, I'm creating the entry here with a rebased version of (4).

In a nutshell: the KnownAssignedXids array, as well as its head/tail
pointers, is modified only by the startup process, so the spinlock is
needed only to ensure that updates of the array and of the head/tail
pointers become visible to other processes in the correct order. Issuing
a write memory barrier after writing to the array (but before updating
the pointers) achieves the same result.

Best regards.

[1]: 
https://github.com/postgres/postgres/commit/2871b4618af1acc85665eec0912c48f8341504c4#diff-8879f0173be303070ab7931db7c757c96796d84402640b9e386a4150ed97b179R2408-R2412

[2]: 
https://www.postgresql.org/message-id/flat/1249332.1668553589%40sss.pgh.pa.us#19d00eb435340f5c5455e3bf259eccc8

[3]: 
https://www.postgresql.org/message-id/flat/1225350.1669757944%40sss.pgh.pa.us#23ca1956e694910fd7795a514a3bc79f

[4]: 
https://www.postgresql.org/message-id/flat/CANtu0oiPoSdQsjRd6Red5WMHi1E83d2%2B-bM9J6dtWR3c5Tap9g%40mail.gmail.com#cc4827dee902978f93278732435e8521
From d968645551412abdb3fac6b8514c3d6746e8ac3d Mon Sep 17 00:00:00 2001
From: Michail Nikolaev 
Date: Sat, 18 Mar 2023 21:28:00 +0300
Subject: [PATCH v2] Removes known_assigned_xids_lck, using the write memory
 barrier as suggested earlier.

---
 src/backend/storage/ipc/procarray.c | 41 +++--
 1 file changed, 9 insertions(+), 32 deletions(-)

diff --git a/src/backend/storage/ipc/procarray.c b/src/backend/storage/ipc/procarray.c
index ea91ce355f..95e2672f9a 100644
--- a/src/backend/storage/ipc/procarray.c
+++ b/src/backend/storage/ipc/procarray.c
@@ -61,7 +61,6 @@
 #include "port/pg_lfind.h"
 #include "storage/proc.h"
 #include "storage/procarray.h"
-#include "storage/spin.h"
 #include "utils/acl.h"
 #include "utils/builtins.h"
 #include "utils/rel.h"
@@ -82,7 +81,6 @@ typedef struct ProcArrayStruct
 	int			numKnownAssignedXids;	/* current # of valid entries */
 	int			tailKnownAssignedXids;	/* index of oldest valid element */
 	int			headKnownAssignedXids;	/* index of newest element, + 1 */
-	slock_t		known_assigned_xids_lck;	/* protects head/tail pointers */
 
 	/*
 	 * Highest subxid that has been removed from KnownAssignedXids array to
@@ -444,7 +442,6 @@ CreateSharedProcArray(void)
 		procArray->numKnownAssignedXids = 0;
 		procArray->tailKnownAssignedXids = 0;
 		procArray->headKnownAssignedXids = 0;
-		SpinLockInit(&procArray->known_assigned_xids_lck);
 		procArray->lastOverflowedXid = InvalidTransactionId;
 		procArray->replication_slot_xmin = InvalidTransactionId;
 		procArray->replication_slot_catalog_xmin = InvalidTransactionId;
@@ -4651,10 +4648,9 @@ KnownAssignedTransactionIdsIdleMaintenance(void)
  * pointer.  This wouldn't require any lock at all, except that on machines
  * with weak memory ordering we need to be careful that other processors
  * see the array element changes before they see the head pointer change.
- * We handle this by using a spinlock to protect reads and writes of the
- * head/tail pointers.  (We could dispense with the spinlock if we were to
- * create suitable memory access barrier primitives and use those instead.)
- * The spinlock must be taken to read or write the head/tail pointers unless
+ * We handle this by issuing a write memory barrier after updating the
+ * array elements and before updating the head pointer.
+ * The barrier must be issued before writing the head pointer unless
  * the caller holds ProcArrayLock exclusively.
  *
  * Algorithmic analysis:
@@ -4712,7 +4708,7 @@ KnownAssignedXidsCompress(KAXCompressReason reason, bool haveLock)
 
 	/*
 	 * Since only the startup process modifies the head/tail pointers, we
-	 * don't need a lock to read them here.
+	 * can safely read them here.
 	 */
 	head = pArray->headKnownAssignedXids;
 	tail = pArray->tailKnownAssignedXids;
@@ -4893,21 +4889,20 @@ KnownAssignedXidsAdd(TransactionId from_xid, TransactionId to_xid,
 	pArray->numKnownAssignedXids += nxids;
 
 	/*
-	 * Now update the head pointer.  We use a spinlock to protect this
+	 * Now update the head pointer.  We use a memory barrier to protect this
 	 * pointer, not because the update is likely to be non-atomic, but to
 	 * ensure that other processors see the above array updates before they
 	 * see the head pointer change.
 	 *
 	 * If we're holding ProcArrayLock exclusively, there's no need to take the
-	 * spinlock.
+	 * barrier.
 	 */
 	if (exclusive_lock)
 		pArray->headKnownAssignedXids = head;
 	else
 	{
-		SpinLockAcquire(&pArray->known_assigned_xids_lck);
+		pg_write_barrier();
 		pArray->headKnownAssignedXids = head;
-		SpinLockRelease(&pArray->known_assigned_xids_lck);
 	}
 }
 
@@ -4930,20 +4925,8 @@ KnownAssignedXidsSearch(TransactionId xid, bool remove)
 	int			tail;
 	int			result_index = -1;
 
-	if (remove)
-	{
-		/* we hold ProcArrayLock exclusively, so no need for spinlock */

Re: heapgettup refactoring

2023-03-19 Thread David Rowley
On Wed, 8 Feb 2023 at 15:09, David Rowley  wrote:
> Using the tests mentioned in [1], I tested out
> remove_HeapScanDescData_rs_inited_field.patch. It's not looking very
> promising at all.

In light of the performance regression from removing the rs_inited
field, let's just forget doing that for now.  It does not really seem
that important compared to the other work that's already been done.

If one of us gets time during the v17 cycle, then maybe we can revisit it then.

I'll mark the patch as committed in the CF app.

David




Re: MERGE ... WHEN NOT MATCHED BY SOURCE

2023-03-19 Thread Dean Rasheed
I see the PlaceHolderVar issue turned out to be a pre-existing bug after all.
Rebased version attached.

Regards,
Dean
diff --git a/doc/src/sgml/mvcc.sgml b/doc/src/sgml/mvcc.sgml
new file mode 100644
index b87ad5c..1482ede
--- a/doc/src/sgml/mvcc.sgml
+++ b/doc/src/sgml/mvcc.sgml
@@ -396,8 +396,8 @@
 originally matched appears later in the list of actions.
 On the other hand, if the row is concurrently updated or deleted so
 that the join condition fails, then MERGE will
-evaluate the condition's NOT MATCHED actions next,
-and execute the first one that succeeds.
+evaluate the condition's NOT MATCHED [BY TARGET]
+actions next, and execute the first one that succeeds.
 If MERGE attempts an INSERT
 and a unique index is present and a duplicate row is concurrently
 inserted, then a uniqueness violation error is raised;
diff --git a/doc/src/sgml/ref/merge.sgml b/doc/src/sgml/ref/merge.sgml
new file mode 100644
index 0995fe0..8ef121a
--- a/doc/src/sgml/ref/merge.sgml
+++ b/doc/src/sgml/ref/merge.sgml
@@ -33,7 +33,8 @@ USING dat
 and when_clause is:
 
 { WHEN MATCHED [ AND condition ] THEN { merge_update | merge_delete | DO NOTHING } |
-  WHEN NOT MATCHED [ AND condition ] THEN { merge_insert | DO NOTHING } }
+  WHEN NOT MATCHED BY SOURCE [ AND condition ] THEN { merge_update | merge_delete | DO NOTHING } |
+  WHEN NOT MATCHED [BY TARGET] [ AND condition ] THEN { merge_insert | DO NOTHING } }
 
 and merge_insert is:
 
@@ -70,7 +71,9 @@ DELETE
from data_source to
target_table_name
producing zero or more candidate change rows.  For each candidate change
-   row, the status of MATCHED or NOT MATCHED
+   row, the status of MATCHED,
+   NOT MATCHED BY SOURCE,
+   or NOT MATCHED [BY TARGET]
is set just once, after which WHEN clauses are evaluated
in the order specified.  For each candidate change row, the first clause to
evaluate as true is executed.  No more than one WHEN
@@ -226,16 +229,37 @@ DELETE
   At least one WHEN clause is required.
  
  
+  The WHEN clause may specify WHEN MATCHED,
+  WHEN NOT MATCHED BY SOURCE, or
+  WHEN NOT MATCHED [BY TARGET].
+  Note that the SQL standard only defines
+  WHEN MATCHED and WHEN NOT MATCHED
+  (which is defined to mean no matching target row).
+  WHEN NOT MATCHED BY SOURCE is an extension to the
+  SQL standard, as is the option to append
+  BY TARGET to WHEN NOT MATCHED, to
+  make its meaning more explicit.
+ 
+ 
   If the WHEN clause specifies WHEN MATCHED
-  and the candidate change row matches a row in the
+  and the candidate change row matches a row in the source to a row in the
   target_table_name,
   the WHEN clause is executed if the
   condition is
   absent or it evaluates to true.
  
  
-  Conversely, if the WHEN clause specifies
-  WHEN NOT MATCHED
+  If the WHEN clause specifies
+  WHEN NOT MATCHED BY SOURCE and the candidate change
+  row represents a row in the
+  target_table_name that does
+  not match a source row, the WHEN clause is executed
+  if the condition is
+  absent or it evaluates to true.
+ 
+ 
+  If the WHEN clause specifies
+  WHEN NOT MATCHED [BY TARGET]
   and the candidate change row does not match a row in the
   target_table_name,
   the WHEN clause is executed if the
@@ -257,7 +281,10 @@ DELETE
  
   A condition on a WHEN MATCHED clause can refer to columns
   in both the source and the target relations. A condition on a
-  WHEN NOT MATCHED clause can only refer to columns from
+  WHEN NOT MATCHED BY SOURCE clause can only refer to
+  columns from the target relation, since by definition there is no matching
+  source row. A condition on a WHEN NOT MATCHED [BY TARGET]
+  clause can only refer to columns from
   the source relation, since by definition there is no matching target row.
   Only the system attributes from the target table are accessible.
  
@@ -382,8 +409,10 @@ DELETE
   WHEN MATCHED clause, the expression can use values
   from the original row in the target table, and values from the
   data_source row.
-  If used in a WHEN NOT MATCHED clause, the
-  expression can use values from the data_source.
+  If used in a WHEN NOT MATCHED BY SOURCE clause, the
+  expression can only use values from the original row in the target table.
+  If used in a WHEN NOT MATCHED [BY TARGET] clause, the
+  expression can only use values from the data_source.
  
 

@@ -452,8 +481,9 @@ MERGE tot

 
  
-  Evaluate whether each row is MATCHED or
-  NOT MATCHED.
+  Evaluate whether each row is MATCHED,
+  NOT MATCHED BY SOURCE, or
+  NOT MATCHED [BY TARGET].
  
 
 
@@ -528,7 +558,8 @@ MERGE tot
   
If a WHEN clause omits an AND
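The three-way matching the patch documents can be illustrated with a small hypothetical example (table and column names invented here, not taken from the patch):

```sql
-- Hypothetical tables: id 2 exists only in the target,
-- id 3 exists only in the source.
CREATE TABLE target (id int PRIMARY KEY, balance int);
CREATE TABLE source (id int PRIMARY KEY, delta int);

INSERT INTO target VALUES (1, 10), (2, 20);
INSERT INTO source VALUES (1, 5), (3, 100);

MERGE INTO target t
USING source s ON t.id = s.id
WHEN MATCHED THEN                 -- id 1: row present on both sides
    UPDATE SET balance = t.balance + s.delta
WHEN NOT MATCHED BY SOURCE THEN   -- id 2: target-only row (the new clause)
    DELETE
WHEN NOT MATCHED BY TARGET THEN   -- id 3: source-only row
    INSERT (id, balance) VALUES (s.id, s.delta);

-- Resulting target rows: (1, 15), (3, 100)
```

Without the patch, the target-only row (id 2) could not be acted on at all, since standard MERGE only provides MATCHED and NOT MATCHED (by target) clauses.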