Re: cfbot is listing committed patches?

2023-04-10 Thread Thomas Munro
On Tue, Apr 11, 2023 at 6:16 PM Peter Smith  wrote:
> cfbot [1] is listing some already committed patches under the "Needs
> Review" category. For example, here are some of mine [2][3]. And
> because they are already committed, the 'apply' fails, so they get
> flagged by cfbot as needing a rebase.
>
> Something seems broken.

Thanks, fixed.  It was confused because after CF #42 was recently
closed, #43 became "current" but there is not yet a "next" commitfest.
I never noticed before, but I guess those are manually created.




Re: Improve logging when using Huge Pages

2023-04-10 Thread Michael Paquier
On Thu, Mar 23, 2023 at 05:25:46PM +0900, Kyotaro Horiguchi wrote:
> Wouldn't storing the value in the shared memory itself work? Though,
> that space does become almost dead for the server's lifetime...

Sure, it would work.  However, we'd still need an interface for the
extra function.  At this point, a GUC with an unknown state is kind of
OK for me.  Anyway, where would you stick this state?
--
Michael




cfbot is listing committed patches?

2023-04-10 Thread Peter Smith
cfbot [1] is listing some already committed patches under the "Needs
Review" category. For example, here are some of mine [2][3]. And
because they are already committed, the 'apply' fails, so they get
flagged by cfbot as needing a rebase.

Something seems broken.

--
[1] http://cfbot.cputube.org/next.html
[2] https://commitfest.postgresql.org/43/4246/
[3] https://commitfest.postgresql.org/43/4266/

Kind Regards,
Peter Smith.
Fujitsu Australia




Re: pg_recvlogical prints bogus error when interrupted

2023-04-10 Thread Michael Paquier
On Mon, Oct 24, 2022 at 08:15:11AM +0530, Bharath Rupireddy wrote:
> The attached patch (pg_recvlogical_graceful_interrupt.text) has a
> couple of problems, I believe. We're losing prepareToTerminate() with
> keepalive true and we're not skipping pg_log_error("unexpected
> termination of replication stream: %s" upon interrupt, after all we're
> here discussing how to avoid it.
> 
> I came up with the attached v2 patch, please have a look.

This thread has slipped through the feature freeze deadline.  Would
people be OK with doing something now on HEAD?  A backpatch is also in
order, IMO, as the current behavior looks confusing under SIGINT and
SIGTERM.
--
Michael




Re: "an SQL" vs. "a SQL"

2023-04-10 Thread Michael Paquier
On Tue, Apr 11, 2023 at 05:43:04PM +1200, David Rowley wrote:
> That alarm went off today.
> 
> There seem to be only 3 "a SQL"s in the docs to change to "an SQL".
> 
> This is a pretty old thread, so here's a link [1] to the discussion.

Good catches!
--
Michael




Add information about command path and version of flex in meson output

2023-04-10 Thread Michael Paquier
Hi all,

While doing a few things on Windows with meson, I have noticed that,
while we output some information related to bison after a setup step,
there is nothing about flex.

I think that adding something about flex in the "Programs" section
would be pretty useful, particularly on Windows, where the command used
could be "flex" as well as "win_flex.exe".

Attached is a patch to show the path to the flex command used, as well
as its version.

Opinions or thoughts?
--
Michael
diff --git a/meson.build b/meson.build
index b69aaddb1f..3615b861c5 100644
--- a/meson.build
+++ b/meson.build
@@ -361,6 +361,10 @@ bison_kw = {
 }
 
 flex_flags = []
+if flex.found()
+  flex_version = run_command(flex, '--version', check: true)
+  flex_version = flex_version.stdout().split(' ')[1].split('\n')[0]
+endif
 flex_wrapper = files('src/tools/pgflex')
 flex_cmd = [python, flex_wrapper,
   '--builddir', '@BUILD_ROOT@',
@@ -3350,6 +3354,7 @@ if meson.version().version_compare('>=0.57')
 {
   'bison': '@0@ @1@'.format(bison.full_path(), bison_version),
   'dtrace': dtrace,
+  'flex': '@0@ @1@'.format(flex.full_path(), flex_version),
 },
 section: 'Programs',
   )




Re: Fix fseek() detection of unseekable files on WIN32

2023-04-10 Thread Michael Paquier
On Mon, Mar 20, 2023 at 07:06:22AM +0900, Michael Paquier wrote:
> Not sure about this one.  I have considered it and dirmod.c includes
> also bits for cygwin, while being aimed for higher-level routines like
> rename(), unlink() or symlink().  This patch is only for WIN32, and
> aimed for common parts in win32*.c code, so a separate file seemed a
> bit cleaner to me at the end.

After going through the installation of a Windows setup with meson and
ninja under VS, I have checked by myself that this is working
correctly, so I am going to apply that.  One of the tests I have done
involved feeding a dump of the regression data through a pipe to
pg_restore, and the whole thing was able to work fine, while HEAD broke
when using a pipe.
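
For anyone skimming the thread: the attached patch boils down to checking
the handle type before trusting fseek(), roughly along these lines (a
sketch of the idea, not the patch's exact code):

/* Rough sketch: refuse to seek on non-disk handles such as pipes. */
#include <stdio.h>
#include <io.h>
#include <errno.h>
#include <windows.h>

static int
pg_fseeko64_sketch(FILE *stream, __int64 offset, int origin)
{
    HANDLE      hFile = (HANDLE) _get_osfhandle(_fileno(stream));
    DWORD       fileType = GetFileType(hFile);

    if (fileType == FILE_TYPE_DISK)
        return _fseeki64(stream, offset, origin);

    /* pipes and character devices are unseekable, even if _fseeki64() "works" */
    errno = (fileType == FILE_TYPE_PIPE || fileType == FILE_TYPE_CHAR) ?
        ESPIPE : EINVAL;
    return -1;
}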

Digressing a bit, while I don't forget..

Spoiler 1: I don't think that recommending ActivePerl in the
documentation is a good idea these days.  They no longer provide a
standalone installer that deploys the binaries you can use, and
they've made it really difficult to even access a "perl" command as it
has become necessary to use an extra command "state activate
--default" to link with a project registered in their stuff, meaning a
connection to their project.  Once this command is launched, the
terminal links to a cached state in AppData.  This is very unfriendly.
In comparison, relying on StrawberryPerl and Chocolatey feels like a
breeze..

Spoiler 2: mingw.org seems to be dead, and we have two links in the
docs referring to it.
--
Michael
From 6ce0e6c996d263d1d9d7348e55756445c80daf2d Mon Sep 17 00:00:00 2001
From: Michael Paquier 
Date: Tue, 11 Apr 2023 14:33:29 +0900
Subject: [PATCH v4] fix fseek detection of unseekable files for WIN32

Calling fseek() on a handle to a non-seeking device such as a pipe or
a communications device is not supported, even though the fseek() may
not return an error, so harden that function with our version.
---
 src/include/port/win32_port.h | 12 --
 src/port/meson.build  |  2 +
 src/port/win32common.c| 68 +++
 src/port/win32fseek.c | 75 +++
 src/port/win32stat.c  | 22 ++
 configure |  6 +++
 configure.ac  |  1 +
 src/tools/msvc/Mkvcbuild.pm   |  2 +
 8 files changed, 166 insertions(+), 22 deletions(-)
 create mode 100644 src/port/win32common.c
 create mode 100644 src/port/win32fseek.c

diff --git a/src/include/port/win32_port.h b/src/include/port/win32_port.h
index 5116c2fc06..58965e0dfd 100644
--- a/src/include/port/win32_port.h
+++ b/src/include/port/win32_port.h
@@ -204,15 +204,21 @@ struct itimerval
 
 int			setitimer(int which, const struct itimerval *value, struct itimerval *ovalue);
 
+/* Convenience wrapper for GetFileType() */
+extern DWORD pgwin32_get_file_type(HANDLE hFile);
+
 /*
  * WIN32 does not provide 64-bit off_t, but does provide the functions operating
- * with 64-bit offsets.
+ * with 64-bit offsets.  Also, fseek() might not give an error for unseekable
+ * streams, so harden that function with our version.
  */
 #define pgoff_t __int64
 
 #ifdef _MSC_VER
-#define fseeko(stream, offset, origin) _fseeki64(stream, offset, origin)
-#define ftello(stream) _ftelli64(stream)
+extern int	_pgfseeko64(FILE *stream, pgoff_t offset, int origin);
+extern pgoff_t _pgftello64(FILE *stream);
+#define fseeko(stream, offset, origin) _pgfseeko64(stream, offset, origin)
+#define ftello(stream) _pgftello64(stream)
 #else
 #ifndef fseeko
 #define fseeko(stream, offset, origin) fseeko64(stream, offset, origin)
diff --git a/src/port/meson.build b/src/port/meson.build
index b174b25021..24416b9bfc 100644
--- a/src/port/meson.build
+++ b/src/port/meson.build
@@ -29,10 +29,12 @@ if host_system == 'windows'
 'kill.c',
 'open.c',
 'system.c',
+'win32common.c',
 'win32dlopen.c',
 'win32env.c',
 'win32error.c',
 'win32fdatasync.c',
+'win32fseek.c',
 'win32getrusage.c',
 'win32link.c',
 'win32ntdll.c',
diff --git a/src/port/win32common.c b/src/port/win32common.c
new file mode 100644
index 00..2fd78f7f93
--- /dev/null
+++ b/src/port/win32common.c
@@ -0,0 +1,68 @@
+/*-
+ *
+ * win32common.c
+ *	  Common routines shared among the win32*.c ports.
+ *
+ * Portions Copyright (c) 1996-2023, PostgreSQL Global Development Group
+ * Portions Copyright (c) 1994, Regents of the University of California
+ *
+ *
+ * IDENTIFICATION
+ *	  src/port/win32common.c
+ *
+ *-
+ */
+
+#ifdef FRONTEND
+#include "postgres_fe.h"
+#else
+#include "postgres.h"
+#endif
+
+#ifdef WIN32
+
+/*
+ * pgwin32_get_file_type
+ *
+ * Convenience wrapper for GetFileType() with specific error handling for all the
+ * port implementations.  Returns the file type associated with a HANDLE.
+ *
+ * On error, sets errno with FILE_TYPE_UNKNOWN as file type.
+ */
+DWORD

Re: "an SQL" vs. "a SQL"

2023-04-10 Thread David Rowley
On Fri, 11 Jun 2021 at 13:44, David Rowley  wrote:
> Anyway, I'll set an alarm for this time next year so I can check on
> how many inconsistencies have crept back in over the development
> cycle.

That alarm went off today.

There seem to be only 3 "a SQL"s in the docs to change to "an SQL".

This is a pretty old thread, so here's a link [1] to the discussion.

David

[1] 
https://postgr.es/m/caaphdvpml27uqfxnryo1mjddskvmqoizispvsaghke_tskx...@mail.gmail.com


a_SQL_to_an_SQL.patch
Description: Binary data


Re: Minimal logical decoding on standbys

2023-04-10 Thread Noah Misch
On Fri, Apr 07, 2023 at 11:12:26AM -0700, Andres Freund wrote:
> --- /dev/null
> +++ b/src/test/recovery/t/035_standby_logical_decoding.pl
> @@ -0,0 +1,720 @@
> +# logical decoding on standby : test logical decoding,
> +# recovery conflict and standby promotion.
...
> +$node_primary->append_conf('postgresql.conf', q{
> +wal_level = 'logical'
> +max_replication_slots = 4
> +max_wal_senders = 4
> +log_min_messages = 'debug2'
> +log_error_verbosity = verbose
> +});

Buildfarm member hoverfly stopped reporting in when this test joined the tree.
It's currently been stuck here for 140 minutes:

===
$ tail -n5 regress_log_035_standby_logical_decoding
[02:57:48.390](0.100s) ok 66 - otherslot on standby not dropped

### Reloading node "standby"
# Running: pg_ctl -D 
/scratch/nm/farm/xlc64v16/HEAD/pgsql.build/src/test/recovery/tmp_check/t_035_standby_logical_decoding_standby_data/pgdata
 reload
server signaled
===

I've posted a tarball of the current logs at
https://drive.google.com/file/d/1JIZ5hSHBsKjEgU5WOGHOqXB7Z_-9XT5u/view?usp=sharing.
The test times out (PG_TEST_TIMEOUT_DEFAULT=5400), and uploading logs then
fails with 413 Request Entity Too Large.  Is the above
log_min_messages='debug2' important?  Removing that may make the logs small
enough to upload normally.




Re: longfin missing gssapi_ext.h

2023-04-10 Thread Thomas Munro
On Tue, Apr 11, 2023 at 2:53 PM Thomas Munro  wrote:
> On Tue, Apr 11, 2023 at 2:31 AM Stephen Frost  wrote:
> > Have you tried running the tests in src/test/kerberos with elver?  Or is
> > it configured to run them?  Would be awesome if it could be, or if
> > there's issues with running the tests on FBSD w/ MIT Kerberos, I'd be
> > happy to try and help work through them.

Oh, the FreeBSD CI already runs the kerberos test and it uses the krb5
package, because the image it uses installs that[1] and at least the
Meson build automatically prefers that over Heimdal.  So the CI in the
postgres/postgres account tested it[2] as soon as you committed, and
cfbot was testing all along in the commitfest.  It's not skipped and
that test would clearly BAIL_OUT if it detected Heimdal.  Is that good
enough?

I have OpenBSD and NetBSD vagrant images around, do you want me to
test those too?

As for elver, I remembered an unfortunate detail: it doesn't currently
have kerberos enabled in PG_TEST_EXTRA, because the test depends on
localhost being 127.0.0.1, which isn't quite true on this box
(container tech with an unusual network stack, long boring story) and
I hadn't got around to figuring out what to do about that.  I can look
into it if you want, or perhaps you are satisfied with CI proving that
FreeBSD likes your patch.

[1] https://github.com/anarazel/pg-vm-images/blob/main/packer/freebsd.pkr.hcl
[2] https://cirrus-ci.com/task/6378179762323456?logs=test_world#L43




RE: Support logical replication of DDLs

2023-04-10 Thread Wei Wang (Fujitsu)
On Fri, Apr 7, 2023 11:23 AM Hou, Zhijie/侯 志杰  wrote:
>

Thanks for updating the patch set.

Here are some comments:
1. The function deparse_drop_command in 0001 patch and the function
publication_deparse_ddl_command_end in 0002 patch.
```
+/*
+ * Handle deparsing of DROP commands.
+ *
+ * Verbose syntax
+ * DROP %s IF EXISTS %%{objidentity}s %{cascade}s
+ */
+char *
+deparse_drop_command(const char *objidentity, const char *objecttype,
+DropBehavior behavior)
+{
+   .
+   stmt = new_objtree_VA("DROP %{objtype}s IF EXISTS %{objidentity}s", 2,
+                         "objtype", ObjTypeString, objecttype,
+                         "objidentity", ObjTypeString, identity);
```

I think the option "IF EXISTS" here will be forced to be parsed regardless of
whether it is actually specified by user. Also, I think we seem to be missing
another option for parsing (DropStmt->concurrent).

===
For patch 0002
2. The function parse_publication_options
2.a
```
 static void
 parse_publication_options(ParseState *pstate,
                           List *options,
+                          bool for_all_tables,
                           bool *publish_given,
                           PublicationActions *pubactions,
                           bool *publish_via_partition_root_given,
-                          bool *publish_via_partition_root)
+                          bool *publish_via_partition_root,
+                          bool *ddl_type_given)
 {
```

It seems the parameter "for_all_tables" is not used anywhere. I think we
could revert this change.

2.b
```
@@ -123,7 +129,7 @@ parse_publication_options(ParseState *pstate,
pubactions->pubtruncate = false;
 
*publish_given = true;
-   publish = defGetString(defel);
+   publish = pstrdup(defGetString(defel));
 
if (!SplitIdentifierString(publish, ',', &publish_list))
ereport(ERROR,
```

I think it is fine to only invoke the function defGetString here. Is there any
special reason to invoke the function pstrdup?

Regards,
Wang wei


RE: Support logical replication of DDLs

2023-04-10 Thread Yu Shi (Fujitsu)
On Fri, Apr 7, 2023 11:23 AM houzj.f...@fujitsu.com  
wrote:
> 
> On Friday, April 7, 2023 11:13 AM houzj.f...@fujitsu.com
> 
> >
> > On Tuesday, April 4, 2023 7:35 PM shveta malik 
> > wrote:
> > >
> > > On Tue, Apr 4, 2023 at 8:43 AM houzj.f...@fujitsu.com
> > >  wrote:
> > >
> > > > Attach the new version patch set which did the following changes:
> > > >
> > >
> > > Hello,
> > >
> > > I tried below:
> > > pubnew=# ALTER PUBLICATION mypub2 SET (ddl = 'table'); ALTER
> > > PUBLICATION
> > >
> > > pubnew=# \dRp+
> > >                Publication mypub2
> > >  Owner  | All tables | All DDLs | Table DDLs |
> > > --------+------------+----------+------------+-
> > >  shveta | t          | f        | f
> > > (1 row)
> > >
> > > I still see 'Table DDLs' as false and ddl replication did not work for 
> > > this case.
> >
> > Thanks for reporting.
> >
> > Attach the new version patch which include the following changes:
> > * Fix the above bug for ALTER PUBLICATION SET.
> > * Modify the corresponding event trigger when user execute ALTER
> > PUBLICATION SET to change the ddl option.
> > * Fix a miss in pg_dump's code which causes CFbot failure.
> > * Rebase the patch due to recent commit 4826759.
> 
> Sorry, there was a miss when rebasing the patch which could cause the
> CFbot to fail and here is the correct patch set.
> 

Hi,

Thanks for your patch. Here are some comments.

1.
I saw a problem in the following case.

create type rewritetype as (a int);
alter type rewritetype add attribute b int cascade;

For the ALTER TYPE command, the deparse result is:
ALTER TYPE public.rewritetype ADD ATTRIBUTE  b pg_catalog.int4 STORAGE plain

"STORAGE" is not supported for TYPE. Besides, "CASCADE" is missed.

I think that's because in deparse_AlterRelation(), we process ALTER TYPE ADD
ATTRIBUTE the same way as ALTER TABLE ADD COLUMN. It looks we need some
modification for ALTER TYPE.

2. 
in 0001 patch
+   tmp_obj2 = new_objtree_VA("CASCADE", 1,
+                             "present", ObjTypeBool, subcmd->behavior);

Would it be better to use "subcmd->behavior == DROP_CASCADE" here?

Regards,
Shi Yu


Re: Can we rely on the ordering of paths in pathlist?

2023-04-10 Thread Andy Fan
On Tue, Apr 11, 2023 at 11:03 AM Richard Guo  wrote:

> As the comment above add_path() says, 'The pathlist is kept sorted by
> total_cost, with cheaper paths at the front.'  And it seems that
> get_cheapest_parallel_safe_total_inner() relies on this ordering
> (without being mentioned in the comments, which I think we should do).
>

I think the answer for ${subject} should be yes. Per the comments in
add_partial_path, we have

 * add_partial_path
 *
 *  As in add_path, the partial_pathlist is kept sorted with the cheapest
 *  total path in front.  This is depended on by multiple places, which
 *  just take the front entry as the cheapest path without searching.
 *
.

> I'm wondering if this is the right thing to do, as in other places
> cheapest total cost is found by compare_path_costs(), which would
> consider startup cost if paths have the same total cost, and that seems
> more sensible.
>
> Attach a trivial patch to make get_cheapest_parallel_safe_total_inner
> act this way.
>
>
Did you run into any real issue with the current coding? If we have to
"consider startup cost if paths have the same total cost", we can still
rely on the ordering and stop iterating once the total_cost becomes
larger, to avoid scanning all the paths in the pathlist.
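
A minimal sketch of that early-exit idea, assuming the pathlist really is
kept sorted by total_cost (illustrative planner-style code, not the
existing get_cheapest_parallel_safe_total_inner()):

/*
 * Sketch: return the cheapest parallel-safe, unparameterized path, using
 * the total_cost ordering of the list to stop early, and
 * compare_path_costs() to break exact total_cost ties on startup cost.
 */
static Path *
cheapest_parallel_safe_total_inner_sketch(List *pathlist)
{
    Path       *cheapest = NULL;
    ListCell   *lc;

    foreach(lc, pathlist)
    {
        Path       *path = (Path *) lfirst(lc);

        /* list is sorted by total_cost, so the rest can only be costlier */
        if (cheapest && path->total_cost > cheapest->total_cost)
            break;

        if (!path->parallel_safe || path->param_info != NULL)
            continue;

        if (cheapest == NULL ||
            compare_path_costs(path, cheapest, TOTAL_COST) < 0)
            cheapest = path;
    }

    return cheapest;
}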

But if your complaint is about the function prototype, where the pathlist
may not be presorted, I think a better way might be to change it from:
Path *get_cheapest_parallel_safe_total_inner(List *paths)  to
Path *get_cheapest_parallel_safe_total_inner(RelOptInfo *rel);
and scan rel->pathlist in the function body.

-- 
Best Regards
Andy Fan


RE: [PATCH] Support % wildcard in extension upgrade filenames

2023-04-10 Thread Regina Obe
> On Mon, Apr 03, 2023 at 09:26:25AM +0700, Yurii Rashkovskii wrote:
> > I want to chime in on the issue of lower-number releases that are
> > released after higher-number releases. The way I see this particular
> > problem is that we always put upgrade SQL files in release "packages,"
> > and they obviously become static resources.
> >
> > While I [intentionally] overlook some details here, what if (as a
> > convention, for projects where it matters) we shipped extensions with
> > non-upgrade SQL files only, and upgrades were available as separate
> > downloads? This way, we're not tying releases themselves to upgrade paths.
> > This also requires no changes to Postgres.
> 
> This is actually something that's on the plate, and we recently added a --
> disable-extension-upgrades-install configure switch and a `install-extension-
> upgrades-from-known-versions` make target in PostGIS to help going in that
> direction. I guess the ball would now be in the hands of packagers.
> 
> > I know this may be a big delivery layout departure for
> > well-established projects; I also understand that this won't solve the
> > problem of having to have these files in the first place (though in
> > many cases, they can be automatically generated once, I suppose, if they are
> trivial).
> 
> We will now also be providing a `postgis` script for administration that among
> other things will support a `install-extension-upgrades` command to install
> upgrade paths from specific old versions, but will not hard-code any list of
> "known" old versions as such a list will easily become outdated.
> 
> Note that all upgrade scripts installed by the Makefile target or by the
> `postgis` scripts will only be empty upgrade paths from the source version to
> the fake "ANY" version, as the ANY-- upgrade path will still be the
> "catch-all" upgrade script.
> 
> --strk;
> 

Sounds like a user and packaging nightmare to me.
How is a packager to know which versions a user might have installed?

And leaving that to users to run an extra command sounds painful.  They barely 
know how to run

ALTER EXTENSION postgis UPDATE;

or pg_upgrade.

I much preferred the idea of just listing all our upgrade targets in the 
control file.

The main annoyance I have left is the 1000s of files that seem to grow forever.

This isn't just postgis.  It's pgRouting, it's h3_pg, it's mobilitydb.

I don't want to have to set this up for every single extension that does 
micro-updates.
I just want a single file that can list all the target paths worst case.

We need to come up with a convention of how to describe a micro update, as it's 
really a problem with extensions that follow the pattern

major.minor.micro

In almost all cases the minor upgrade script works for all micros of that minor, 
and in postgis, yes, we have a single script that works for all cases.

Thanks,
Regina




Can we rely on the ordering of paths in pathlist?

2023-04-10 Thread Richard Guo
As the comment above add_path() says, 'The pathlist is kept sorted by
total_cost, with cheaper paths at the front.'  And it seems that
get_cheapest_parallel_safe_total_inner() relies on this ordering
(without being mentioned in the comments, which I think we should do).
I'm wondering if this is the right thing to do, as in other places
cheapest total cost is found by compare_path_costs(), which would
consider startup cost if paths have the same total cost, and that seems
more sensible.

Attach a trivial patch to make get_cheapest_parallel_safe_total_inner
act this way.

Thanks
Richard


v1-0001-Revise-get_cheapest_parallel_safe_total_inner.patch
Description: Binary data


Re: Direct I/O

2023-04-10 Thread Thomas Munro
On Tue, Apr 11, 2023 at 2:31 PM Thomas Munro  wrote:
> I tried to find out what POSIX says about this

(But of course whatever it might say is of especially limited value
when O_DIRECT is in the picture, being completely unstandardised.
Really I guess all they meant was "if you *copy* something that's
moving, who knows which bits you'll copy"... not "your data might be
incinerated with lasers".)




Re: longfin missing gssapi_ext.h

2023-04-10 Thread Thomas Munro
On Tue, Apr 11, 2023 at 2:31 AM Stephen Frost  wrote:
> Have you tried running the tests in src/test/kerberos with elver?  Or is
> it configured to run them?  Would be awesome if it could be, or if
> there's issues with running the tests on FBSD w/ MIT Kerberos, I'd be
> happy to try and help work through them.

I'm also happy to test/help/improve the animal/teach CI to do
it/whatever.  I've made a note to test out the reverted commit later
today when I'll be in front of the right computers.




Re: longfin missing gssapi_ext.h

2023-04-10 Thread Jonathan S. Katz

On 4/10/23 11:37 AM, Tom Lane wrote:

Stephen Frost  writes:

* Tom Lane (t...@sss.pgh.pa.us) wrote:

IOW, maybe it'd be okay to de-revert 3d4fa227b and add documentation
saying that --with-gssapi requires MIT Kerberos not Heimdal.



I'd be happy with that and can add the appropriate documentation noting
that we require MIT Kerberos.  Presumably the appropriate step at this
point would be to check with the RMT?


Yeah, RMT would have final say at this stage.

If you pull the trigger, a note to buildfarm-members would be
appropriate too, so people will know if they need to remove
--with-gssapi from animal configurations.


The RMT discussed this thread and agrees to a "de-revert" of the "Add 
support for Kerberos delegation" patch, provided that:


1. The appropriate documentation is added AND
2. The de-revert occurs no later than 2023-04-15 (upper bound 2023-04-16 
0:00 AoE).


There's an open item[1] for this task.

Thanks,

Jonathan

[1] https://wiki.postgresql.org/wiki/PostgreSQL_16_Open_Items#Open_Issues





Re: Direct I/O

2023-04-10 Thread Thomas Munro
On Tue, Apr 11, 2023 at 2:15 PM Andres Freund  wrote:
> And the fix has been merged into
> https://git.kernel.org/pub/scm/linux/kernel/git/kdave/linux.git/log/?h=for-next
>
> I think that means it'll have to wait for 6.4 development to open (in a few
> weeks), and then will be merged into the stable branches from there.

Great!  Let's hope/assume for now that that'll fix phenomenon #2.
That still leaves the checksum-vs-concurrent-modification thing that I
called phenomenon #1, which we've not actually hit with PostgreSQL yet
but is clearly possible and can be seen with the stand-alone
repro-program I posted upthread.  You wrote:

On Mon, Apr 10, 2023 at 2:57 PM Andres Freund  wrote:
> I think we really need to think about whether we eventually we want to do
> something to avoid modifying pages while IO is in progress. The only
> alternative is for filesystems to make copies of everything in the IO path,
> which is far from free (and obviously prevents from using DMA for the whole
> IO). The copy we do to avoid the same problem when checksums are enabled,
> shows up quite prominently in write-heavy profiles, so there's a "purely
> postgres" reason to avoid these issues too.

+1
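
(For readers following along: the copy mentioned above is along the lines
of the sketch below, i.e. snapshot the page, then checksum and write the
private copy so concurrent hint-bit setters can't change the bytes
mid-write.  See PageSetChecksumCopy() in bufpage.c for the real thing;
this sketch assumes the usual backend headers for Page, PageHeader and
pg_checksum_page().)

/* Sketch only: checksum a stable private copy instead of the shared buffer. */
static char page_copy[BLCKSZ];

static char *
checksum_stable_copy(Page shared_page, BlockNumber blkno)
{
    /* snapshot the shared buffer first ... */
    memcpy(page_copy, shared_page, BLCKSZ);

    /* ... then checksum the copy, which no other backend can scribble on */
    ((PageHeader) page_copy)->pd_checksum = pg_checksum_page(page_copy, blkno);

    return page_copy;           /* this copy is what gets handed to the write */
}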

I wonder what the other file systems that maintain checksums (see list
at [1]) do when the data changes underneath a write.  ZFS's policy is
conservative[2], while BTRFS took the demons-will-fly-out-of-your-nose
route.  I can see arguments for both approaches (ZFS can only reach
zero-copy optimum by turning off checksums completely, while BTRFS is
happy to assume that if you break this programming rule that is not
written down anywhere then you must never want to see your data ever
again).  What about ReFS?  CephFS?

I tried to find out what POSIX says about this WRT synchronous
pwrite() (as Tom suggested, maybe we're doing something POSIX doesn't
allow), but couldn't find it in my first attempt.  It *does* say it's
undefined for aio_write() (which means that my prototype
io_method=posix_aio code that uses that stuff is undefined in presense
of hintbit modifications).  I don't really see why it should vary
between synchronous and asynchronous interfaces (considering the
existence of threads, shared memory etc, the synchronous interface
only removes one thread from list of possible suspects that could flip
some bits).

But yeah, in any case, it doesn't seem great that we do that.

[1] https://en.wikipedia.org/wiki/Comparison_of_file_systems#Block_capabilities
[2] 
https://openzfs.topicbox.com/groups/developer/T950b02acdf392290/odirect-semantics-in-zfs




Re: Direct I/O

2023-04-10 Thread Andres Freund
Hi,

On 2023-04-10 18:55:26 -0700, Andres Freund wrote:
> On 2023-04-10 19:27:27 +1200, Thomas Munro wrote:
> > On Mon, Apr 10, 2023 at 2:57 PM Andres Freund  wrote:
> > > Have you tried to write a reproducer for this that doesn't involve 
> > > postgres?
> > 
> > I tried a bit.  I'll try harder soon.
> > 
> > > ... What kernel version did you repro
> > > this on Thomas?
> > 
> > Debian's 6.0.10-2 kernel (Debian 12 on a random laptop).  Here's how I
> > set up a test btrfs in case someone else wants a head start:
> > 
> > truncate -s2G 2GB.img
> > sudo losetup --show --find 2GB.img
> > sudo mkfs -t btrfs /dev/loop0 # the device name shown by losetup
> > sudo mkdir /mnt/tmp
> > sudo mount /dev/loop0 /mnt/tmp
> > sudo chown $(whoami) /mnt/tmp
> > 
> > cd /mnt/tmp
> > /path/to/initdb -D pgdata
> > ... (see instructions further up for postgres command line + queries to run)
> 
> I initially failed to repro the issue with these instructions. Turns out that
> the problem does not happen if huge pages are in used - I'd configured huge
> pages, so the default huge_pages=try succeeded. As soon as I disable
> huge_pages explicitly, it reproduces.
> 
> Another interesting bit is that if checksums are enabled, I also can't
> reproduce the issue. Together with the huge_page issue, it does suggest that
> this is somehow related to page faults. Which fits with the thread Andrea
> referenced...

The last iteration of the fix that I could find is:
https://lore.kernel.org/linux-btrfs/20230328051957.1161316-1-...@lst.de/T/#m1afdc3fe77e10a97302e7d80fed3efeaa297f0f7

And the fix has been merged into
https://git.kernel.org/pub/scm/linux/kernel/git/kdave/linux.git/log/?h=for-next

I think that means it'll have to wait for 6.4 development to open (in a few
weeks), and then will be merged into the stable branches from there.

Greetings,

Andres Freund




RE: running logical replication as the subscription owner

2023-04-10 Thread Shinoda, Noriyoshi (PN Japan FSIP)
Hi hackers,
Thank you for developing a great feature. 
The following commit added a column to the pg_subscription catalog.
 
https://git.postgresql.org/gitweb/?p=postgresql.git;a=commit;h=482675987bcdffb390ae735cfd5f34b485ae97c6
 
https://git.postgresql.org/gitweb/?p=postgresql.git;a=commit;h=c3afe8cf5a1e465bd71e48e4bc717f5bfdc7a7d6

I found that the documentation of the pg_subscription catalog was missing an 
explanation of the columns subrunasowner and subpasswordrequired, so I attached 
a patch. Please fix the patch if you have a better explanation.

Regards,
Noriyoshi Shinoda

-Original Message-
From: Robert Haas  
Sent: Wednesday, April 5, 2023 1:10 AM
To: Noah Misch 
Cc: Jeff Davis ; Jelte Fennema ; 
pgsql-hack...@postgresql.org; Andres Freund 
Subject: Re: running logical replication as the subscription owner

On Mon, Apr 3, 2023 at 10:09 PM Noah Misch  wrote:
> I gather we agree on what to do for v16, which is good.

I have committed the patches.

-- 
Robert Haas
EDB: http://www.enterprisedb.com 




pg_subscription_doc_v1.diff
Description: pg_subscription_doc_v1.diff


Re: Direct I/O

2023-04-10 Thread Andres Freund
Hi,

On 2023-04-10 19:27:27 +1200, Thomas Munro wrote:
> On Mon, Apr 10, 2023 at 2:57 PM Andres Freund  wrote:
> > Have you tried to write a reproducer for this that doesn't involve postgres?
> 
> I tried a bit.  I'll try harder soon.
> 
> > ... What kernel version did you repro
> > this on Thomas?
> 
> Debian's 6.0.10-2 kernel (Debian 12 on a random laptop).  Here's how I
> set up a test btrfs in case someone else wants a head start:
> 
> truncate -s2G 2GB.img
> sudo losetup --show --find 2GB.img
> sudo mkfs -t btrfs /dev/loop0 # the device name shown by losetup
> sudo mkdir /mnt/tmp
> sudo mount /dev/loop0 /mnt/tmp
> sudo chown $(whoami) /mnt/tmp
> 
> cd /mnt/tmp
> /path/to/initdb -D pgdata
> ... (see instructions further up for postgres command line + queries to run)

I initially failed to repro the issue with these instructions. Turns out that
the problem does not happen if huge pages are in used - I'd configured huge
pages, so the default huge_pages=try succeeded. As soon as I disable
huge_pages explicitly, it reproduces.

Another interesting bit is that if checksums are enabled, I also can't
reproduce the issue. Together with the huge_page issue, it does suggest that
this is somehow related to page faults. Which fits with the thread Andrea
referenced...

Greetings,

Andres Freund




Re: Show various offset arrays for heap WAL records

2023-04-10 Thread Peter Geoghegan
On Mon, Apr 10, 2023 at 5:23 PM Melanie Plageman
 wrote:
> If you keep the name, I'd explain it briefly in a comment above the code
> then -- for those of us who spend less time with btrees. It is a tool
> that will be often used by developers, so it is not unreasonable to
> assume they may read the code if they are confused.

Okay, I'll do something about that shortly.

Attached v2 deals with the "trailing comma and space in flags array"
heap desc issue using an approach that's along the same lines as your
suggested approach. What do you think?

-- 
Peter Geoghegan


v2-0001-Fix-heapdesc-infomask-array-output.patch
Description: Binary data


Re: Show various offset arrays for heap WAL records

2023-04-10 Thread Melanie Plageman
On Mon, Apr 10, 2023 at 04:31:44PM -0700, Peter Geoghegan wrote:
> On Mon, Apr 10, 2023 at 3:04 PM Melanie Plageman  
> wrote:
> > 
> > I will say that the prefix of p in "ptid" makes it sound like pointer to
> > a tid, which I don't believe is what you meant.
> 
> I was thinking of the symbol name "ptid" from
> _bt_delitems_delete_check() (it even appears in code comments). I
> intended "posting list TID". But "pointer to a TID" actually kinda
> works too, since these are offsets into a posting list (a simple
> ItemPointerData array) for those TIDs that we're in the process of
> removing/deleted from the tuple.

If you keep the name, I'd explain it briefly in a comment above the code
then -- for those of us who spend less time with btrees. It is a tool
that will be often used by developers, so it is not unreasonable to
assume they may read the code if they are confused.

- Melanie




Re: Should vacuum process config file reload more often

2023-04-10 Thread Melanie Plageman
On Fri, Apr 7, 2023 at 9:07 AM Melanie Plageman
 wrote:
>
> On Fri, Apr 7, 2023 at 2:53 AM Masahiko Sawada  wrote:
> >
> > On Fri, Apr 7, 2023 at 8:08 AM Daniel Gustafsson  wrote:
> > >
> > > > On 7 Apr 2023, at 00:12, Melanie Plageman  
> > > > wrote:
> > > >
> > > > On Thu, Apr 6, 2023 at 5:45 PM Daniel Gustafsson  
> > > > wrote:
> > > >>
> > > >>> On 6 Apr 2023, at 23:06, Melanie Plageman  
> > > >>> wrote:
> > > >>
> > > >>> Autovacuum workers, at the end of VacuumUpdateCosts(), check if cost
> > > >>> limit or cost delay have been changed. If they have, they assert that
> > > >>> they don't already hold the AutovacuumLock, take it in shared mode, 
> > > >>> and
> > > >>> do the logging.
> > > >>
> > > >> Another idea would be to copy the values to local temp variables while 
> > > >> holding
> > > >> the lock, and release the lock before calling elog() to avoid holding 
> > > >> the lock
> > > >> over potential IO.
> > > >
> > > > Good idea. I've done this in attached v19.
> > > > Also I looked through the docs and everything still looks correct for
> > > > balancing algo.
> > >
> > > I had another read-through and test-through of this version, and have 
> > > applied
> > > it with some minor changes to comments and whitespace.  Thanks for the 
> > > quick
> > > turnaround times on reviews in this thread!
> >
> > Cool!
> >
> > Regarding the commit 7d71d3dd08, I have one comment:
> >
> > +   /* Only log updates to cost-related variables */
> > +   if (vacuum_cost_delay == original_cost_delay &&
> > +   vacuum_cost_limit == original_cost_limit)
> > +   return;
> >
> > IIUC by default, we log not only before starting the vacuum but also
> > when changing cost-related variables. Which is good, I think, because
> > logging the initial values would also be helpful for investigation.
> > However, I think that we don't log the initial vacuum cost values
> > depending on the values. For example, if the
> > autovacuum_vacuum_cost_delay storage option is set to 0, we don't log
> > the initial values. I think that instead of comparing old and new
> > values, we can write the log only if
> > message_level_is_interesting(DEBUG2) is true. That way, we don't need
> > to acquire the lwlock unnecessarily. And the code looks cleaner to me.
> > I've attached the patch (use_message_level_is_interesting.patch)
>
> Thanks for coming up with the case you thought of with storage param for
> cost delay = 0. In that case we wouldn't print the message initially and
> we should fix that.
>
> I disagree, however, that we should condition it only on
> message_level_is_interesting().
>
> Actually, outside of printing initial values when the autovacuum worker
> first starts (before vacuuming all tables), I don't think we should log
> these values except when they are being updated. Autovacuum workers
> could vacuum tons of small tables and having this print out at least
> once per table (which I know is how it is on master) would be
> distracting. Also, you could be reloading the config to update some
> other GUCs and be oblivious to an ongoing autovacuum and get these
> messages printed out, which I would also find distracting.
>
> You will have to stare very hard at the logs to tell if your changes to
> vacuum cost delay and limit took effect when you reload config. I think
> with our changes to update the values more often, we should take the
> opportunity to make this logging more useful by making it happen only
> when the values are changed.
>
> I would be open to elevating the log level to DEBUG1 for logging only
> updates and, perhaps, having an option if you set log level to DEBUG2,
> for example, to always log these values in VacuumUpdateCosts().
>
> I'd even argue that, potentially, having the cost-delay related
> parameters printed at the beginning of vacuuming could be interesting to
> regular VACUUM as well (even though it doesn't benefit from config
> reload while in progress).
>
> To fix the issue you mentioned and ensure the logging is printed when
> autovacuum workers start up before vacuuming tables, we could either
> initialize vacuum_cost_delay and vacuum_cost_limit to something invalid
> that will always be different than what they are set to in
> VacuumUpdateCosts() (not sure if this poses a problem for VACUUM using
> these values since they are set to the defaults for VACUUM). Or, we
> could duplicate this logging message in do_autovacuum().
>
> Finally, one other point about message_level_is_interesting(). I liked
> the idea of using it a lot, since log level DEBUG2 will not be the
> common case. I thought of it but hesitated because all other users of
> message_level_is_interesting() are avoiding some memory allocation or
> string copying -- not avoiding take a lock. Making this conditioned on
> log level made me a bit uncomfortable. I can't think of a situation when
> it would be a problem, but it felt a bit off.
>
> > Also, while testing the autovacuum delay with relopt
> > autovacu

Re: When to drop src/tools/msvc support

2023-04-10 Thread Jonathan S. Katz

On 4/10/23 4:50 PM, Tom Lane wrote:

Magnus Hagander  writes:

Thus, +1 on actually keeping it up and dropping it immediately as v17
opens, giving them a year of advantage. And probably updating the docs
(if anybody were to read them.. but at least then we tried) stating
that it's deprecated and will be removed in v17.


Yeah, I think that's the only feasible answer at this point.
Maybe a month or two back we could have done differently,
but there's not a lot of runway now.

Once we do drop src/tools/msvc from HEAD, we should make a point
of reminding -packagers about it, in hopes that they'll work on
the transition sooner than next May.


[personal opinion, not RMT]

The last point would be my reasoning for "why not now" given deadlines 
are a pretty good motivator to get things done.


That said, if the plan is to do this "shortly thereafter" for v17 and it 
makes the transition easier, I'm all for that.


Jonathan




Re: When to drop src/tools/msvc support

2023-04-10 Thread Michael Paquier
On Mon, Apr 10, 2023 at 03:32:19PM -0700, Andres Freund wrote:
> On IM Thomas made some point about CI - I wonder if we should add building 16
> with src/tools/msvc as an optional CI task? We can't enable it by default
> (yet), because we'd not have enough resources to also run that for cfbot. Once
> 16 forked, we could then set it to run automatically in the 16 branch, as cfbot
> won't run those.

Getting a CI job able to do some validation for MSVC would be indeed
nice.  What's the plan in the buildfarm with this coverage?  Would all
the animals switch to meson (Chocolatey + StrawberryPerl, I assume)
for the job or will there still be some coverage with MSVC for v16
there?
--
Michael




Re: Show various offset arrays for heap WAL records

2023-04-10 Thread Peter Geoghegan
On Mon, Apr 10, 2023 at 3:04 PM Melanie Plageman
 wrote:
> I took a look at the first patch even though you've pushed the bugfix
> part. Any reason you didn't use array_desc() for the inner array (of
> "ptids")? I find that following the pattern of using array_desc (when it
> is correct, of course!) helps me to quickly identify: "okay, this is an
> array of x" without having to stare at the loop too much.

It was fairly arbitrary. I was thinking "we can't use array_desc for
this", which wasn't 100% true, but seemed close enough. It helped that
this allowed me to remove uint16_elem_desc(), which likely wouldn't
have been reused later on.

> I will say that the prefix of p in "ptid" makes it sound like pointer to
> a tid, which I don't believe is what you meant.

I was thinking of the symbol name "ptid" from
_bt_delitems_delete_check() (it even appears in code comments). I
intended "posting list TID". But "pointer to a TID" actually kinda
works too, since these are offsets into a posting list (a simple
ItemPointerData array) for those TIDs that we're in the process of
removing/deleted from the tuple.

> I like the new guidelines you proposed (in the patch).
> They are well-written and clear.

Thanks. The guidelines might well become stricter in the future. Right
now I'd be happy if everybody could at least be in rough agreement
about the most basic things.

> I recognized that the output doesn't look nice, but I hadn't exactly
> thought of it as malformed. Perhaps you are right.

It does seem like an annoying thing to have to handle if you actually
want to parse the array. It requires a different approach to every
other array, which seems bad.

> I will say and I am still not a fan of the "if (first) else" logic in
> your attached patch.

I agree that my approach there was pretty ugly.

> How about we have the flags use a trailing comma and space and then
> overwrite the last one with something like this:
>
> if (infobits & XLHL_KEYS_UPDATED)
> appendStringInfoString(buf, "KEYS_UPDATED, ");
> buf->data[buf->len -= strlen(", ")] = '\0';

I'll try something like that instead.

> -offsets = (OffsetNumber *) &plans[xlrec->nplans];
> +offsets = (OffsetNumber *) ((char *) plans +
> +(xlrec->nplans *
> +sizeof(xl_heap_freeze_plan)));
>  appendStringInfoString(buf, ", plans:");
>  array_desc(buf, plans, sizeof(xl_heap_freeze_plan), 
> xlrec->nplans,
>&plan_elem_desc, &offsets);

I thought that it made sense to match the FREEZE_PAGE REDO routine.

Another fairly arbitrary change, to be honest.

> > Note that the patch makes many individual (say) HOT_UPDATE records
> > have descriptions that look like this:
> >
> > ... old_infobits: [], ...
> >
> > This differs from HEAD, where the output is totally suppressed because
> > there are no flag bits to show. I think that this behavior is more
> > logical and consistent overall.
>
> Yea, I think it is better to include things and show that they are empty
> then omit them. I find it more clear.

Right. It makes sense for something like this, because generally
speaking the structures aren't nested in any real sense. They're also
very static -- WAL records have a fixed structure. So it's unlikely
that anybody is going to try to parse the description before knowing
which particular WAL record type (or perhaps types, plural) are
involved.

-- 
Peter Geoghegan




Re: Add index scan progress to pg_stat_progress_vacuum

2023-04-10 Thread Andres Freund
Hi,

On 2023-04-10 08:14:18 +0900, Michael Paquier wrote:
> On Fri, Apr 07, 2023 at 12:01:17PM -0700, Andres Freund wrote:
> > Why would it mean that? Parallel workers are updated together with the 
> > leader,
> > so there's no compatibility issue?
> 
> My point is that the callback system would still need to be maintained
> in a stable branch, and, while useful, it could be used for much more
> than it is originally written.  I guess that this could be used in
> custom nodes with their own custom parallel nodes.

Hm, I'm somewhat doubtful that that's something we should encourage. And
doubtful we'd get it right without a concrete use case at hand to verify the
design.

Greetings,

Andres Freund




Re: When to drop src/tools/msvc support

2023-04-10 Thread Andres Freund
Hi,

On 2023-04-10 16:50:20 -0400, Tom Lane wrote:
> Yeah, I think that's the only feasible answer at this point.
> Maybe a month or two back we could have done differently,
> but there's not a lot of runway now.
> 
> Once we do drop src/tools/msvc from HEAD, we should make a point
> of reminding -packagers about it, in hopes that they'll work on
> the transition sooner than next May.

So the plan is:

- add note to docs in HEAD that the src/tools/msvc style of build is
  deprecated
- give -packagers a HEADS up, once the deprecation notice has been added to
  the docs
- have a patch ready to drop src/tools/msvc from HEAD once 16 has branched off
  (i.e. a polished version of what I posted upthread)

On IM Thomas made some point about CI - I wonder if we should add building 16
with src/tools/msvc as an optional CI task? We can't enable it by default
(yet), because we'd not have enough resources to also run that for cfbot. Once
16 forked, we could then set it to run automatically in the 16 branch, as cfbot
won't run those.

Greetings,

Andres Freund




Re: When to drop src/tools/msvc support

2023-04-10 Thread Andres Freund
Hi,

On 2023-04-10 19:55:35 +0100, Dave Page wrote:
> Projects other than the EDB installers use the MSVC build system - e.g.
> pgAdmin uses its own builds of libpq and other tools (psql, pg_dump etc)
> that are pretty heavily baked into a fully automated build system (even the
> build servers and all their requirements are baked into Ansible).
> 
> Changing that lot would be non-trivial, though certainly possible, and I
> suspect we’re not the only ones doing that sort of thing.

Do you have a link to the code for that, if it's open? Just to get an
impression for how hard it'd be to switch over?

Greetings,

Andres Freund




Re: Show various offset arrays for heap WAL records

2023-04-10 Thread Melanie Plageman
On Sun, Apr 9, 2023 at 8:12 PM Peter Geoghegan  wrote:
>
> On Fri, Apr 7, 2023 at 4:46 PM Peter Geoghegan  wrote:
> > Pushed that one too.
>
> I noticed that the nbtree VACUUM and DELETE record types have their
> update/xl_btree_update arrays output incorrectly. We cannot use the
> generic array_desc() approach with xl_btree_update elements, because
> they're variable-width elements. The problem is that array_desc() only deals
> with fixed-width elements.

You are right. I'm sorry for the rather egregious oversight.

I took a look at the first patch even though you've pushed the bugfix
part. Any reason you didn't use array_desc() for the inner array (of
"ptids")? I find that following the pattern of using array_desc (when it
is correct, of course!) helps me to quickly identify: "okay, this is an
array of x" without having to stare at the loop too much.

I will say that the prefix of p in "ptid" makes it sound like pointer to
a tid, which I don't believe is what you meant.

> I also changed some of the details around whitespace in arrays in the
> fixup patch (though I didn't do the same with objects). It doesn't
> seem useful to use so much whitespace for long arrays of integers
> (really page offset numbers). And I brought a few nbtree desc routines
> that still used ";" characters as punctuation in line with the new
> convention.

Cool.

> Finally, the patch revises the guidelines written for rmgr desc
> routine authors. I don't think that we need to describe how to handle
> outputting whitespace in detail. It'll be quite natural for other
> rmgrs to use existing facilities such as array_desc() themselves,
> which makes whitespace type inconsistencies unlikely. I've tried to
> make the limits of the guidelines clear. The main goal is to avoid
> gratuitous inconsistencies, and to provide a standard way of doing
> things that many different rmgrs are likely to want to do, again and
> again. But individual rmgrs still have a certain amount of discretion,
> which seems like a good thing to me (the alternative requires that we
> fix at least a couple of things in nbtdesc.c and in heapdesc.c, which
> doesn't seem useful to me).

I like the new guidelines you proposed (in the patch).
They are well-written and clear.


On Mon, Apr 10, 2023 at 3:18 PM Peter Geoghegan  wrote:
>
> On Sun, Apr 9, 2023 at 5:12 PM Peter Geoghegan  wrote:
> > I noticed that the nbtree VACUUM and DELETE record types have their
> > update/xl_btree_update arrays output incorrectly. We cannot use the
> > generic array_desc() approach with xl_btree_update elements, because
> > they're variable-width elements. The problem is that array_desc() only deals
> > with fixed-width elements.
>
> I pushed this fix just now, though without the updates to the
> guidelines (or only minimal updates).
>
> A remaining problem with arrays appears in "infobits" fields for
> record types such as LOCK. Here's an example of the problem:
>
> off: 34, xid: 3, flags: 0x00, infobits: [, LOCK_ONLY, EXCL_LOCK ]
>
> Clearly the punctuation from the array is malformed.

So, I did do this on purpose -- because I didn't want to have to do the
gymnastics to determine which flag was hit first (though it looks like I
mistakenly omitted the comma prepending IS_MULTI -- that was not
intentional).
I recognized that the output doesn't look nice, but I hadn't exactly
thought of it as malformed. Perhaps you are right.

I will say and I am still not a fan of the "if (first) else" logic in
your attached patch.

I've put my suggestion for how to do it instead inline with the code
diff below for clarity.

diff --git a/src/backend/access/rmgrdesc/heapdesc.c
b/src/backend/access/rmgrdesc/heapdesc.c
index 3bd083875..a64d14c2c 100644
--- a/src/backend/access/rmgrdesc/heapdesc.c
+++ b/src/backend/access/rmgrdesc/heapdesc.c
@@ -18,29 +18,75 @@
 #include "access/rmgrdesc_utils.h"

 static void
-out_infobits(StringInfo buf, uint8 infobits)
+infobits_desc(StringInfo buf, uint8 infobits, const char *keyname)
...
 if (infobits & XLHL_KEYS_UPDATED)
-appendStringInfoString(buf, ", KEYS_UPDATED");
+{
+if (first)
+appendStringInfoString(buf, "KEYS_UPDATED");
+else
+appendStringInfoString(buf, ", KEYS_UPDATED");
+first = false;
+}

How about we have the flags use a trailing comma and space and then
overwrite the last one with something like this:

if (infobits & XLHL_KEYS_UPDATED)
appendStringInfoString(buf, "KEYS_UPDATED, ");
buf->data[buf->len -= strlen(", ")] = '\0';


@@ -230,7 +271,9 @@ heap2_desc(StringInfo buf, XLogReaderState *record)
 OffsetNumber *offsets;

I don't prefer this to what I had, which is also correct, right?

 plans = (xl_heap_freeze_plan *)
XLogRecGetBlockData(record, 0, NULL);
-offsets = (OffsetNumber *) &plans[xlrec->nplans];
+offsets = (OffsetNumber *) ((char *) plans +
+(xlrec->nplans *
+ 

Re: When to drop src/tools/msvc support

2023-04-10 Thread Tom Lane
Magnus Hagander  writes:
> Thus, +1 on actually keeping it up and dropping it immediately as v17
> opens, giving them a year of advantage. And probably updating the docs
> (if anybody were to read them.. but at least then we tried) stating
> that it's deprecated and will be removed in v17.

Yeah, I think that's the only feasible answer at this point.
Maybe a month or two back we could have done differently,
but there's not a lot of runway now.

Once we do drop src/tools/msvc from HEAD, we should make a point
of reminding -packagers about it, in hopes that they'll work on
the transition sooner than next May.

regards, tom lane




Re: When to drop src/tools/msvc support

2023-04-10 Thread Magnus Hagander
On Mon, Apr 10, 2023 at 6:56 PM Tom Lane  wrote:
>
> Robert Haas  writes:
> > However, if this is the direction we're going, we probably need to
> > give pgsql-packagers a heads up ASAP, because anybody who is still
> > relying on the MSVC system to build Windows binaries is presumably
> > going to need some time to adjust. If we rip out the build system
> > somebody is using a couple of weeks before beta, that might make it
> > difficult for that person to get the beta out promptly. And I think
> > there's probably more than just EDB who would be in that situation.
>
> Oh ... that's a good point.  Is there anyone besides EDB shipping
> MSVC-built executables?  Would it even be practical to switch to
> meson with a month-or-so notice?  Seems kind of tight, and it's
> not like the packagers volunteered to make this switch.
>
> Maybe we have to bite the bullet and keep MSVC for v16.
> If we drop it as soon as v17 opens, there's probably not that
> much incremental work involved compared to dropping for v16.

Not involved with any such build tasks anymore, but I think we can
definitely assume there are others than EDB who do that. It's also
used by people who build add-on modules to be loaded in the
EDB-installer-installed systems, I'm sure.

It seems a bit aggressive to those people to drop the entire build system
just before beta.

Thus, +1 on actually keeping it up and dropping it immediately as v17
opens, giving them a year of advantage. And probably updating the docs
(if anybody were to read them.. but at least then we tried) stating
that it's deprecated and will be removed in v17.

-- 
 Magnus Hagander
 Me: https://www.hagander.net/
 Work: https://www.redpill-linpro.com/




Re: Add index scan progress to pg_stat_progress_vacuum

2023-04-10 Thread Imseih (AWS), Sami
> + case 'P': /* Parallel progress reporting */

I kept this comment as-is, but inside the case code block I added 
more comments. This is to avoid cluttering up the one-liner comment.

> + * Increase and report the number of index scans. Also, we reset the progress
> + * counters.


> The counters reset are the two index counts, perhaps this comment
> should mention this fact.

Yes, since we are using the multi_param API here, it makes sense to 
mention the progress fields being reset in the comments.


+ /* update progress */
+ int index = pq_getmsgint(msg, 4);
+ int incr = pq_getmsgint(msg, 1);
[...]
+ pq_beginmessage(&progress_message, 'P');
+ pq_sendint32(&progress_message, index);
+ pq_sendint64(&progress_message, incr);
+ pq_endmessage(&progress_message);


> It seems to me that the receiver side is missing one pq_getmsgend()?
Yes. I added this.

> incr is defined and sent as an int64 on the sender side, hence the
> receiver should use pq_getmsgint64(), no? pq_getmsgint(msg, 1) means
> to receive only one byte, see pqformat.c. 

Ah, correct, incr is an int64, so what we need is:

int64 incr = pq_getmsgint64(msg);

I also added the pq_getmsgend call.


> And the order is reversed?

I don't think so. The index and then incr are sent, and they are
read back in the same order. Testing the patch shows the value
increments correctly.
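
Putting those pieces together, the corrected leader-side handling would
look roughly like the sketch below (illustrative only, not necessarily the
exact hunk in v28):

/*
 * Sketch of the leader handling a 'P' (parallel progress) message:
 * read the fields in the order the worker sent them, incr as an int64,
 * then pq_getmsgend(), then apply the increment.
 */
static void
handle_parallel_progress_message(StringInfo msg)
{
    int         index = pq_getmsgint(msg, 4);   /* 4-byte progress index */
    int64       incr = pq_getmsgint64(msg);     /* 8-byte increment */

    pq_getmsgend(msg);

    /* pgstat_progress_incr_param() already exists for this kind of update */
    pgstat_progress_incr_param(index, incr);
}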


See v28 addressing the comments.

Regards,

Sami Imseih 
AWS (Amazon Web Services)




v28-0001-Report-index-vacuum-progress.patch
Description: v28-0001-Report-index-vacuum-progress.patch


Re: Show various offset arrays for heap WAL records

2023-04-10 Thread Peter Geoghegan
On Sun, Apr 9, 2023 at 5:12 PM Peter Geoghegan  wrote:
> I noticed that the nbtree VACUUM and DELETE record types have their
> update/xl_btree_update arrays output incorrectly. We cannot use the
> generic array_desc() approach with xl_btree_update elements, because
> they're variable-width elements. The problem is that array_desc() only deals
> with fixed-width elements.

I pushed this fix just now, though without the updates to the
guidelines (or only minimal updates).

A remaining problem with arrays appears in "infobits" fields for
record types such as LOCK. Here's an example of the problem:

off: 34, xid: 3, flags: 0x00, infobits: [, LOCK_ONLY, EXCL_LOCK ]

Clearly the punctuation from the array is malformed.

A second issue (related to the first) is the name of the key itself,
"infobits". While "infobits" actually seems fine in this particular
example, I don't think that we want to do the same for record types
such as HEAP_UPDATE, since such records require that the description
show information about flags whose underlying field in the WAL record
struct is actually called "old_infobits_set". I think that we should
be outputting "old_infobits: [ ... ] " in the description of
HEAP_UPDATE records, which isn't the case right now.

A third issue is present in the nearby handling of xl_heap_truncate
status flags. It's the same basic array punctuation issue again, so
arguably this is the same issue as the first one.

Attached patch fixes all of these issues, and overhauls the guidelines
in the way originally proposed by the nbtree fix patch (since I didn't
keep that part of the nbtree patch when I pushed it today).

Note that the patch makes many individual (say) HOT_UPDATE records
have descriptions that look like this:

... old_infobits: [], ...

This differs from HEAD, where the output is totally suppressed because
there are no flag bits to show. I think that this behavior is more
logical and consistent overall.

-- 
Peter Geoghegan


v1-0001-Fix-heapdesc-infomask-array-output.patch
Description: Binary data


Re: When to drop src/tools/msvc support

2023-04-10 Thread Dave Page
On Mon, 10 Apr 2023 at 18:34, Robert Haas  wrote:

> On Mon, Apr 10, 2023 at 12:56 PM Tom Lane  wrote:
> > Robert Haas  writes:
> > > However, if this is the direction we're going, we probably need to
> > > give pgsql-packagers a heads up ASAP, because anybody who is still
> > > relying on the MSVC system to build Windows binaries is presumably
> > > going to need some time to adjust. If we rip out the build system
> > > somebody is using a couple of weeks before beta, that might make it
> > > difficult for that person to get the beta out promptly. And I think
> > > there's probably more than just EDB who would be in that situation.
> >
> > Oh ... that's a good point.  Is there anyone besides EDB shipping
> > MSVC-built executables?  Would it even be practical to switch to
> > meson with a month-or-so notice?  Seems kind of tight, and it's
> > not like the packagers volunteered to make this switch.
>
> I can't really speak to those questions with confidence.
>
> Perhaps instead of telling pgsql-packagers what we're doing, we could
> instead ask them if it would work for them if we did XYZ. Then we
> could use that information to inform our decision-making.


Projects other than the EDB installers use the MSVC build system - e.g.
pgAdmin uses its own builds of libpq and other tools (psql, pg_dump, etc.)
that are pretty heavily baked into a fully automated build system (even the
build servers and all their requirements are baked into Ansible).

Changing that lot would be non-trivial, though certainly possible, and I
suspect we’re not the only ones doing that sort of thing.

-- 
-- 
Dave Page
https://pgsnake.blogspot.com

EDB Postgres
https://www.enterprisedb.com


Re: [PATCH] Allow Postgres to pick an unused port to listen

2023-04-10 Thread Robert Haas
On Fri, Apr 7, 2023 at 5:34 PM Yurii Rashkovskii  wrote:
> I'm trying to understand what's wrong with reading the port from the pid file (if
> Postgres writes information there, it's surely so that somebody can read it,
> otherwise, why write it in the first place)? The proposed solution uses the
> operating system's functionality to achieve collision-free mechanics with
> none of the complexity introduced in the commit.

I agree. We don't document the exact format of the postmaster.pid file
to my knowledge, but storage.sgml lists all the things it contains,
and runtime.sgml documents that the first line contains the postmaster
PID, so this is clearly not some totally opaque file that nobody
should ever touch. Consequently, I don't agree with Tom's statement
that this would be a "a horrid way to find out what was picked." There
is some question in my mind about whether this is a feature that we
want PostgreSQL to have, and if we do want it, there may be some room
for debate about how it's implemented, but I reject the idea that
extracting the port number from postmaster.pid is intrinsically a
terrible plan. It seems like a perfectly reasonable plan.

-- 
Robert Haas
EDB: http://www.enterprisedb.com




Re: When to drop src/tools/msvc support

2023-04-10 Thread Robert Haas
On Mon, Apr 10, 2023 at 12:56 PM Tom Lane  wrote:
> Robert Haas  writes:
> > However, if this is the direction we're going, we probably need to
> > give pgsql-packagers a heads up ASAP, because anybody who is still
> > relying on the MSVC system to build Windows binaries is presumably
> > going to need some time to adjust. If we rip out the build system
> > somebody is using a couple of weeks before beta, that might make it
> > difficult for that person to get the beta out promptly. And I think
> > there's probably more than just EDB who would be in that situation.
>
> Oh ... that's a good point.  Is there anyone besides EDB shipping
> MSVC-built executables?  Would it even be practical to switch to
> meson with a month-or-so notice?  Seems kind of tight, and it's
> not like the packagers volunteered to make this switch.

I can't really speak to those questions with confidence.

Perhaps instead of telling pgsql-packagers what we're doing, we could
instead ask them if it would work for them if we did XYZ. Then we
could use that information to inform our decision-making.

-- 
Robert Haas
EDB: http://www.enterprisedb.com




Re: When to drop src/tools/msvc support

2023-04-10 Thread Tom Lane
Robert Haas  writes:
> However, if this is the direction we're going, we probably need to
> give pgsql-packagers a heads up ASAP, because anybody who is still
> relying on the MSVC system to build Windows binaries is presumably
> going to need some time to adjust. If we rip out the build system
> somebody is using a couple of weeks before beta, that might make it
> difficult for that person to get the beta out promptly. And I think
> there's probably more than just EDB who would be in that situation.

Oh ... that's a good point.  Is there anyone besides EDB shipping
MSVC-built executables?  Would it even be practical to switch to
meson with a month-or-so notice?  Seems kind of tight, and it's
not like the packagers volunteered to make this switch.

Maybe we have to bite the bullet and keep MSVC for v16.
If we drop it as soon as v17 opens, there's probably not that
much incremental work involved compared to dropping for v16.

regards, tom lane




Re: is_superuser is not documented

2023-04-10 Thread Robert Haas
On Sat, Apr 8, 2023 at 10:54 AM Joseph Koshakow  wrote:
> is_superuser feels a little out of place in this file. All of
> the options here apply to the entire PostgreSQL server, while
> is_superuser only applies to the current session. The description of
> this file says :
>
> > These options report various aspects of PostgreSQL behavior that
> > might be of interest to certain applications, particularly
> > administrative front-ends. Most of them are determined when
> > PostgreSQL is compiled or when it is installed.
>
> Which doesn't seem to apply to is_superuser. It doesn't affect
> the behavior of PostgreSQL, only what the current session is allowed to
> do.

I'm not sure I agree with that. I mean, maybe the phrasing could be
improved somehow, but "PostgreSQL behavior" as a category seems to
include whether or not it lets you do certain things. And, for
example, psql will show a > or # in the prompt based on whether you're
a superuser. I find "administrative front-end" to be a somewhat odd
turn of phrase, but I guess it means general purpose frontends like
psql or pgAdmin or whatever that you might use to administer the
system, as opposed to applications.
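
For what it's worth, a client can read the reported value at any time with
either of these (nothing here is specific to this discussion, it's just the
standard GUC machinery):

SHOW is_superuser;
SELECT current_setting('is_superuser');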

-- 
Robert Haas
EDB: http://www.enterprisedb.com




Re: When to drop src/tools/msvc support

2023-04-10 Thread Robert Haas
On Sat, Apr 8, 2023 at 3:30 PM Tom Lane  wrote:
> I guess I'd vote for pulling the trigger in v16 if we can get that
> done by the end of April.  Once we're close to beta I think it
> must wait for v17 to open.

I think that sounds reasonable. It would be to the project's advantage
not to have to maintain three build systems for an extra year, but we
can't still be whacking things around right up until the moment we
expect to ship a beta.

However, if this is the direction we're going, we probably need to
give pgsql-packagers a heads up ASAP, because anybody who is still
relying on the MSVC system to build Windows binaries is presumably
going to need some time to adjust. If we rip out the build system
somebody is using a couple of weeks before beta, that might make it
difficult for that person to get the beta out promptly. And I think
there's probably more than just EDB who would be in that situation.

-- 
Robert Haas
EDB: http://www.enterprisedb.com




Re: CREATE SUBSCRIPTION -- add missing tab-completes

2023-04-10 Thread Robert Haas
On Fri, Apr 7, 2023 at 9:29 AM Masahiko Sawada  wrote:
> I think that parameters used by the backend process when performing
> CREATE SUBSCRIPTION belong to the first category. And other parameters
> used by apply workers and tablesync workers belong to the second
> category. Since slot_name is used by both I'm not sure it should be in
> the second category, but password_required seems to be used only by
> apply workers and tablesync workers, so it should be in the second
> category.

I agree. I think actually the current division is quite odd. The only
parameters that strictly affect the CREATE SUBSCRIPTION command are
"connect" and "create_slot". "enabled" and "slot_name" clearly control
later behavior, because you can alter both of them later, with ALTER
SUBSCRIPTION! The "enabled" parameter is changed using different
syntax, ALTER SUBSCRIPTION .. ENABLE | DISABLE instead of ALTER
SUBSCRIPTION ... SET (enabled = true | false), which is possibly not
the best choice, but regardless of that, these parameters clearly
affect behavior later, not just at CREATE SUBSCRIPTION time.
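
To make that concrete (subscription and slot names invented for
illustration), both of the following change behavior well after CREATE
SUBSCRIPTION has finished:

ALTER SUBSCRIPTION mysub DISABLE;
ALTER SUBSCRIPTION mysub SET (slot_name = 'other_slot');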

Probably we ought to just collapse the sections together somehow, and
use the text to clarify the exact behavior as required. I definitely
disagree with the idea of moving the new parameters to the other
section -- that's clearly wrong.

-- 
Robert Haas
EDB: http://www.enterprisedb.com




Re: PGDOCS - function pg_get_publication_tables is not documented?

2023-04-10 Thread Tom Lane
"Yu Shi (Fujitsu)"  writes:
> On Fri, Mar 24, 2023 6:26 AM Tom Lane  wrote:
>> I do see a docs change that I think would be worth making: get
>> rid of the explicit mention of it in create_subscription.sgml
>> in favor of using that view.

> I agree and I tried to modify the query to use the view.
> Please see the attached patch.

Ah, now I see why it was written like that: it's kind of annoying
to join to pg_subscription_rel without having access to the relation
OID.  Still, this is more pedagogically correct, so pushed.
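
For anyone following along, the view can also be queried directly, e.g.
(publication name invented):

SELECT pubname, schemaname, tablename
FROM pg_publication_tables
WHERE pubname = 'mypub';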

regards, tom lane




Re: min/max aggregation for jsonb

2023-04-10 Thread Daneel Yaitskov
Nonetheless, PostgreSQL's min/max aggregate functions don't work with JSON,
while array_agg(DISTINCT ...) does!

I was working on an experimental napkin audit feature.
It rewrites a chain of SQL queries to thread through metadata about all
computations that contributed to every column.
Every data column gets a meta column with JSON.
Calculating the meta column for a non-aggregated column is trivial, because
the relation between the new column and the columns used to compute it is
1:1. But the history of an aggregated column is composed of a set of values
(each value potentially has a different history, though usually it is the
same). So for an aggregated column I had to somehow collapse a set of JSON
values into a few.

Original aggregating query:
SELECT max(a) AS max_a FROM t

The query with audit meta data embedded:

SELECT
    max(a) AS max_a,
    jsonb_build_object(
        'q', 'SELECT max(a) AS max_a FROM t',
        'o', jsonb_build_object(
            'a', cast(array_to_json(array_agg(DISTINCT _meta_a)) AS "jsonb")))
        AS _meta_max_a
FROM t
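
For comparison, the cast-to-text workaround suggested in the quoted reply
below can be sketched like this (same table and column names as above;
whether text ordering is meaningful for JSON is a separate question):

SELECT min(_meta_a::text)::jsonb AS min_meta,
       max(_meta_a::text)::jsonb AS max_meta
FROM t;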



On Fri, Mar 3, 2023 at 5:41 AM David Rowley  wrote:

> On Fri, 3 Mar 2023 at 23:17, Daneel Yaitskov  wrote:
> > I wanted to use min/max aggregation functions for jsonb type and noticed
> > there is no functions for this type, meanwhile string/array types are
> supported.
>
> It's not really clear to me how you'd want these to sort. If you just
> want to sort by what the output that you see from the type's output
> function then you might get what you need by casting to text.
>
> > Is there a concern about implementing support for jsonb in min/max?
>
> I imagine a lack of any meaningful way of comparing two jsonb values
> to find out which is greater than the other is of some concern.
>
> David
>


-- 

Best regards,
Daniil Iaitskov


Re: Non-superuser subscription owners

2023-04-10 Thread Robert Haas
On Sat, Apr 8, 2023 at 1:35 AM Amit Kapila  wrote:
> Do we need to have a check for this new option "password_required" in
> maybe_reread_subscription() where we "Exit if any parameter that
> affects the remote connection was changed."? This new option is
> related to the remote connection so I thought it is worth considering
> whether we want to exit and restart the apply worker when this option
> is changed.

Hmm, good question. I think that's probably a good idea. If the
current connection is already working, the only possible result of
getting rid of it and trying to create a new one is that it might now
fail instead, but someone might want that behavior. Otherwise, they'd
instead find the failure at a later, maybe less convenient, time.
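
For example (subscription name invented), with that change a command like
the following would now cause the apply worker to exit and re-establish its
connection:

ALTER SUBSCRIPTION mysub SET (password_required = false);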

-- 
Robert Haas
EDB: http://www.enterprisedb.com




Re: longfin missing gssapi_ext.h

2023-04-10 Thread Tom Lane
Stephen Frost  writes:
> * Tom Lane (t...@sss.pgh.pa.us) wrote:
>> IOW, maybe it'd be okay to de-revert 3d4fa227b and add documentation
>> saying that --with-gssapi requires MIT Kerberos not Heimdal.

> I'd be happy with that and can add the appropriate documentation noting
> that we require MIT Kerberos.  Presumably the appropriate step at this
> point would be to check with the RMT?

Yeah, RMT would have final say at this stage.

If you pull the trigger, a note to buildfarm-members would be
appropriate too, so people will know if they need to remove
--with-gssapi from animal configurations.

regards, tom lane




Re: longfin missing gssapi_ext.h

2023-04-10 Thread Stephen Frost
Greetings,

* Tom Lane (t...@sss.pgh.pa.us) wrote:
> Stephen Frost  writes:
> > Yeah, I wouldn't be the least bit surprised if many folks running
> > FreeBSD with any interest in Kerberos have MIT Kerberos installed given
> > that Heimdal doesn't seem to be under any kind of ongoing active
> > development and is just in this maintenance mode.
> 
> Yeah, that's a pretty scary situation for security-critical software.

Agreed.

> Maybe we should just desupport Heimdal, rather than investing effort
> to the contrary?

As this is for a new major PG release, I'd be in support of that.  I
would like to get the kerberos tests working on a FreeBSD buildfarm
animal with MIT Kerberos installed, if possible.

> Also, the core-code versions of Heimdal in these BSDen are even scarier
> than the upstream releases, so I'm thinking that the fact that we
> currently compile against them is more a net negative than a positive.
> (Same logic as for macOS, really.)

Agreed.  Still, I wouldn't go and break it for minor releases, but for a
new major version saying we no longer support Heimdal seems reasonable.
Then folks have the usual 5-ish years (if they want to delay as long as
possible) to move to MIT Kerberos.

> IOW, maybe it'd be okay to de-revert 3d4fa227b and add documentation
> saying that --with-gssapi requires MIT Kerberos not Heimdal.

I'd be happy with that and can add the appropriate documentation noting
that we require MIT Kerberos.  Presumably the appropriate step at this
point would be to check with the RMT?

Thanks!

Stephen


signature.asc
Description: PGP signature


Re: longfin missing gssapi_ext.h

2023-04-10 Thread Tom Lane
Stephen Frost  writes:
> Yeah, I wouldn't be the least bit surprised if many folks running
> FreeBSD with any interest in Kerberos have MIT Kerberos installed given
> that Heimdal doesn't seem to be under any kind of ongoing active
> development and is just in this maintenance mode.

Yeah, that's a pretty scary situation for security-critical software.
Maybe we should just desupport Heimdal, rather than investing effort
to the contrary?

Also, the core-code versions of Heimdal in these BSDen are even scarier
than the upstream releases, so I'm thinking that the fact that we
currently compile against them is more a net negative than a positive.
(Same logic as for macOS, really.)

IOW, maybe it'd be okay to de-revert 3d4fa227b and add documentation
saying that --with-gssapi requires MIT Kerberos not Heimdal.

regards, tom lane




Re: longfin missing gssapi_ext.h

2023-04-10 Thread Stephen Frost
Greetings,

* Thomas Munro (thomas.mu...@gmail.com) wrote:
> On Sun, Apr 9, 2023 at 6:40 AM Tom Lane  wrote:
> > The exact same thing applies to FreeBSD, except that their in-core
> > Heimdal is ancient (1.5.2).  Also, they do have MIT Kerberos
> > available as a package [1].  I'd been misled by the lack of a hit
> > on "kerberos", but "krb5" finds it.  Our code does compile against
> > that version of Heimdal, but src/test/kerberos/ refuses to try to
> > run.
> 
> FWIW my FBSD animal elver has krb5 installed.  Sorry it wasn't running
> when the relevant commit landed.  Stupid network cable wriggled out.

Yeah, I wouldn't be the least bit surprised if many folks running
FreeBSD with any interest in Kerberos have MIT Kerberos installed given
that Heimdal doesn't seem to be under any kind of ongoing active
development and is just in this maintenance mode.

Have you tried running the tests in src/test/kerberos with elver?  Or is
it configured to run them?  Would be awesome if it could be, or if
there's issues with running the tests on FBSD w/ MIT Kerberos, I'd be
happy to try and help work through them.

Thanks!

Stephen


signature.asc
Description: PGP signature


Re: Fix the missed consideration of tuple_fraction during add_paths_to_append_rel

2023-04-10 Thread Zhang Mingli
Hi,


On Apr 10, 2023, 16:35 +0800, Andy Fan , wrote:
> While working on "Pushing limit into subqueries of a union" [1], I
> found we already have great infrastructure to support this. For a
> query like
>
> subquery-1 UNION ALL subquery-2 LIMIT 3;
>
> we already take root->tuple_fraction into account when planning the
> subqueries, without adding an extra Limit node as overhead. But the
> reason it doesn't work in my real case is that flatten_simple_union_all
> flattens the UNION ALL subqueries into an append relation, and we don't
> handle root->tuple_fraction during add_paths_to_append_rel.
>
> Given the below query for example:
> explain analyze
> (select * from tenk1 order by hundred)
> union all
> (select * from tenk2 order by hundred)
> limit 3;
>
> Without the patch: Execution Time: 7.856 ms
> with the patch:  Execution Time: 0.224 ms
>
> Any suggestion is welcome.
>
> [1] https://www.postgresql.org/message-id/11228.1118365833%40sss.pgh.pa.us
>
> --
> Best Regards
> Andy Fan

There is a stray indent at the else if.

-   if (childrel->pathlist != NIL &&
+   if (cheapest_startup_path && cheapest_startup_path->param_info 
== NULL)
+   accumulate_append_subpath(cheapest_startup_path,
+     
&subpaths, NULL);
+   else if (childrel->pathlist != NIL &&
    childrel->cheapest_total_path->param_info == NULL)
    accumulate_append_subpath(childrel->cheapest_total_path,
      
&subpaths, NULL);

Could we also consider tuple_fraction in partial_pathlist for parallel append?


Regards,
Zhang Mingli


Re: Unnecessary confirm work on logical replication

2023-04-10 Thread Ashutosh Bapat
On Fri, Apr 7, 2023 at 11:06 PM Emre Hasegeli  wrote:
>
> I was reading the logical replication code and found a little
> unnecessary work we are doing.
>
> The confirmed_flushed_lsn cannot reasonably be ahead of the
> current_lsn, so there is no point of calling
> LogicalConfirmReceivedLocation() every time we update the candidate
> xmin or restart_lsn.

In fact, the WAL sender always starts reading WAL from restart_lsn,
which in turn is always <= confirmed_flush_lsn. While reading WAL, the WAL
sender may read an XLOG_RUNNING_XACTS WAL record with an LSN <=
confirmed_flush_lsn. While processing that XLOG_RUNNING_XACTS record it may
update its restart_lsn and catalog_xmin with current_lsn = the LSN of the
XLOG_RUNNING_XACTS record. In this situation current_lsn <=
confirmed_flush_lsn.

Does that make sense?
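
As a quick illustration, both positions are visible per slot, and
restart_lsn normally trails confirmed_flush_lsn:

SELECT slot_name, restart_lsn, confirmed_flush_lsn
FROM pg_replication_slots
WHERE slot_type = 'logical';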

-- 
Best Wishes,
Ashutosh Bapat




Re: Support logical replication of DDLs

2023-04-10 Thread Amit Kapila
On Fri, Apr 7, 2023 at 8:52 AM houzj.f...@fujitsu.com
 wrote:
>
> Sorry, there was a mistake when rebasing the patch which could cause the
> CFbot to fail; here is the correct patch set.
>

I see the following note in the patch: "Note: For ATTACH/DETACH
PARTITION, we haven't added extra logic on the subscriber to handle
the case where the table on the publisher is a PARTITIONED TABLE while
the target table on the subscriber side is a NORMAL table. We will
research this more and improve it later." and wonder what we should do
about it. I can think of the following possibilities: (a) convert the
non-partitioned table to a partitioned one and then attach the
partition; (b) add the partition as a separate new table; (c) give an
error that the table types mismatch. For DETACH PARTITION, I don't see
much possibility other than giving an error that no such partition
exists, or something like that. Even for the ATTACH operation, I prefer
(c), as the other options don't seem logical to me and may add more
complexity to this work.
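
To make the scenario concrete (table names invented): the publisher runs
something like the DDL below while the subscriber's t_part is a plain
table, and with option (c) the subscriber would simply refuse to apply it:

ALTER TABLE t_part ATTACH PARTITION t_part_2023
    FOR VALUES FROM ('2023-01-01') TO ('2024-01-01');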

Thoughts?

-- 
With Regards,
Amit Kapila.




RE: Support logical replication of DDLs

2023-04-10 Thread Zhijie Hou (Fujitsu)
On Friday, April 7, 2023 11:23 AM houzj.f...@fujitsu.com
 wrote:
> 
> On Friday, April 7, 2023 11:13 AM houzj.f...@fujitsu.com
> 
> >
> > On Tuesday, April 4, 2023 7:35 PM shveta malik
> > 
> > wrote:
> > >
> > > On Tue, Apr 4, 2023 at 8:43 AM houzj.f...@fujitsu.com
> > >  wrote:
> > >
> > > > Attach the new version patch set which did the following changes:
> > > >
> > >
> > > Hello,
> > >
> > > I tried below:
> > > pubnew=# ALTER PUBLICATION mypub2 SET (ddl = 'table');
> > > ALTER PUBLICATION
> > >
> > > pubnew=# \dRp+
> > >            Publication mypub2
> > >  Owner  | All tables | All DDLs | Table DDLs
> > > --------+------------+----------+------------
> > >  shveta | t          | f        | f
> > > (1 row)
> > >
> > > I still see 'Table DDLs' as false and ddl replication did not work for 
> > > this case.
> >
> > Thanks for reporting.
> >
> > Attach the new version patch which include the following changes:
> > * Fix the above bug for ALTER PUBLICATION SET.
> > * Modify the corresponding event trigger when user execute ALTER
> > PUBLICATION SET to change the ddl option.
> > * Fix a miss in pg_dump's code which causes CFbot failure.
> > * Rebase the patch due to recent commit 4826759.

Another thing that I think might need to be improved is pg_dump's handling
of the built-in event triggers. Currently, we skip dumping the event
triggers that are used for DDL replication based on their names
(pg_deparse_trig_%s_%u), because they will be created along with the CREATE
PUBLICATION command. Other built-in triggers (e.g. the foreign key
triggers) have a tgisinternal catalog column which can be used to skip
dumping them.

Personally, I think we can follow this style and add an isinternal column
to pg_event_trigger and use it to skip the dump. This could also save some
handling code in pg_dump.
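
For reference, that would mirror how internal triggers are already
distinguishable in pg_trigger, e.g.:

SELECT tgname, tgrelid::regclass
FROM pg_trigger
WHERE NOT tgisinternal;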

Best Regards,
Hou zj


Re: Using each rel as both outer and inner for JOIN_ANTI

2023-04-10 Thread Richard Guo
On Fri, Apr 7, 2023 at 3:28 PM Richard Guo  wrote:

> On Tue, Aug 2, 2022 at 3:13 PM Richard Guo  wrote:
>
>> On Sun, Jul 31, 2022 at 12:07 AM Tom Lane  wrote:
>>
>>> [ wanders away wondering if JOIN_RIGHT_SEMI should become a thing ... ]
>>
>> Maybe this is something we can do. Currently for the query below:
>>
>> # explain select * from foo where a in (select c from bar);
>>QUERY PLAN
>> -
>>  Hash Semi Join  (cost=154156.00..173691.29 rows=10 width=8)
>>Hash Cond: (foo.a = bar.c)
>>->  Seq Scan on foo  (cost=0.00..1.10 rows=10 width=8)
>>->  Hash  (cost=72124.00..72124.00 rows=500 width=4)
>>  ->  Seq Scan on bar  (cost=0.00..72124.00 rows=500 width=4)
>> (5 rows)
>>
>> I believe we can get a cheaper plan if we are able to swap the outer and
>> inner for SEMI JOIN and use the smaller 'foo' as inner rel.
>>
> It may not be easy for MergeJoin and NestLoop though, as we do not have
> a way to know if an inner tuple has been already matched or not.  But
> the benefit of swapping inputs for MergeJoin and NestLoop seems to be
> small, so I think it's OK to ignore them.
>

Hmm.  Actually, we can do it for MergeJoin by avoiding restoring the inner
scan to the marked tuple in EXEC_MJ_TESTOUTER, in the case when the new
outer tuple == the marked tuple.  But I'm not sure how much benefit we can
get from a Merge Right Semi Join.

For HashJoin, though, there are cases that can surely benefit from a Hash
Right Semi Join.  So I went ahead and had a try at it, as attached.

Thanks
Richard


v1-0001-Support-Right-Semi-Join.patch
Description: Binary data


RE: [PoC] pg_upgrade: allow to upgrade publisher node

2023-04-10 Thread Hayato Kuroda (Fujitsu)
Dear Julien,

Thank you for the idea! I have analyzed it.

> > > If yes, how does this work if some subscriber node isn't connected
> > > when the publisher node is stopped?  I guess you could add a check
> > > in pg_upgrade to make sure that all logical slot are indeed caught
> > > up and fail if that's not the case rather than assuming that a clean
> > > shutdown implies it.  It would be good to cover that in the TAP
> > > test, and also cover some corner cases, like any new row added on
> > > the publisher node after the pg_upgrade but before the subscriber is
> > > reconnected is also replicated as expected.
> >
> > Hmm, good point. Current patch could not be handled the case because
> > walsenders for the such slots do not exist. I have tested your
> > approach, however, I found that CHECKPOINT_SHUTDOWN record were
> > generated twice when publisher was shutted down and started. It led
> > that the confirmed_lsn of slots always was behind from WAL insert
> > location and failed to upgrade every time.
> > Now I do not have good idea to solve it... Do anyone have for this?
>
> I'm wondering if we could just check that each slot's LSN is exactly
> sizeof(CHECKPOINT_SHUTDOWN) ago or something like that?  That's
> hackish, but if pg_upgrade can run it means it was a clean shutdown so
> it should be safe to assume that what's the last record in the WAL
> was.  For the double shutdown checkpoint, I'm not sure that I get the
> problem.  The check should only be done at the very beginning of
> pg_upgrade, so there should have been only one shutdown checkpoint
> done right?

I have analyzed this point, but it seems to be difficult. This is because
some additional records like the following may be inserted. PSA the script
used for testing. Note that the "double CHECKPOINT_SHUTDOWN" issue might be
wrong, so I would like to withdraw it for now. Sorry for the noise.

* HEAP/HEAP2 records. These records may be inserted by the checkpointer.

IIUC, if there are tuples which have not been flushed yet when shutdown is
requested, the checkpointer writes all of them back to the heap files. At
that time many WAL records are generated. I think we cannot predict the
number of records beforehand.

* INVALIDATION(S) records. These records may be inserted by VACUUM.

There is a possibility that autovacuum runs and generates WAL records. I
think we cannot predict the number of records beforehand because it depends
on the number of objects.

* RUNNING_XACTS record

It might be a timing issue, but I found that sometimes the background
writer generates an XLOG_RUNNING_XACTS record. According to the function
BackgroundWriterMain(), it will be generated when 15 seconds have passed
since the last logging and there are important records. I think it is
difficult to predict whether this record will appear or not.

Best Regards,
Hayato Kuroda
FUJITSU LIMITED



test.sh
Description: test.sh


RE: [PoC] pg_upgrade: allow to upgrade publisher node

2023-04-10 Thread Hayato Kuroda (Fujitsu)
Dear Julien,

> Well, even if physical replication slots were eventually preserved during
> pg_upgrade, maybe users would like to only keep one kind or the other, so
> having both options could make sense.

You meant to say that we can rename the options like "logical-*" and later
add a new option for physical slots if needed, right? PSA the new patch
which handles the comment.

> That being said, I have a hard time believing that we could actually preserve
> physical replication slots.  I don't think that pg_upgrade final state is 
> fully
> reproducible:  not all object oids are preserved, and the various pg_restore
> are run in parallel so you're very likely to end up with small physical
> differences that would be incompatible with physical replication.  Even if we
> could make it totally reproducible, it would probably be at the cost of making
> pg_upgrade orders of magnitude slower.  And since many people are already
> complaining that it's too slow, that doesn't seem like something we would 
> want.

Your point made sense to me. Thank you for giving your opinion.

Best Regards,
Hayato Kuroda
FUJITSU LIMITED



v3-0001-pg_upgrade-Add-include-logical-replication-slots-.patch
Description:  v3-0001-pg_upgrade-Add-include-logical-replication-slots-.patch


Fix the missed consideration of tuple_fraction during add_paths_to_append_rel

2023-04-10 Thread Andy Fan
While working on "Pushing limit into subqueries of a union" [1], I found
we already have great infrastructure to support this. For a query like

subquery-1 UNION ALL subquery-2 LIMIT 3;

we already take root->tuple_fraction into account when planning the
subqueries, without adding an extra Limit node as overhead. But the reason
it doesn't work in my real case is that flatten_simple_union_all flattens
the UNION ALL subqueries into an append relation, and we don't handle
root->tuple_fraction during add_paths_to_append_rel.

Given the below query for example:
explain analyze
(select * from tenk1 order by hundred)
union all
(select * from tenk2 order by hundred)
limit 3;

Without the patch: Execution Time: 7.856 ms
with the patch:  Execution Time: 0.224 ms

Any suggestion is welcome.

[1] https://www.postgresql.org/message-id/11228.1118365833%40sss.pgh.pa.us

-- 
Best Regards
Andy Fan


v1-0001-Considering-root-tuple_fraction-during-create_app.patch
Description: Binary data


ecpg -C ORACLE breaks data

2023-04-10 Thread Kyotaro Horiguchi
Hello, we encountered unexpected behavior from an ECPG program
compiled with the -C ORACLE option.  The program executes the
following query:

 SELECT 123::numeric(3,0), 't'::char(2);

Compilation and execution steps:

$ ecpg -C ORACLE ecpgtest.pgc (attached)
$ gcc -g -o ecpgtest ecpgtest.c -L `pg_config --libdir` -I `pg_config --includedir` -lecpg -lpgtypes
$ ./ecpgtest
type: numeric : data: "120"
type: bpchar  : data: "t"

The expected numeric value is "123".


The code below is the direct cause of the unanticipated data
modification.

interfaces/ecpg/ecpglib/data.c:581 (whitespace is squeezed)
> if (varcharsize == 0 || varcharsize > size)
> {
>   /*
>* compatibility mode, blank pad and null
>* terminate char array
>*/
> if (ORACLE_MODE(compat) && (type == ECPGt_char || type == 
> ECPGt_unsigned_char))
> {
>   memset(str, ' ', varcharsize);
>   memcpy(str, pval, size);
>   str[varcharsize - 1] = '\0';

This results in overwriting str[-1], in this case the last byte of the
preceding numeric, with 0x00, which represents the digit '0'. When
callers of ecpg_get_data() explicitly pass zero as varcharsize, they
provide storage that precisely fits the data. However, it remains
uncertain whether this assumption is valid when ecpg_store_result()
passes var->varcharsize, which is also zero. Consequently, the current
fix presumes exact-fit storage when varcharsize is zero.

I haven't fully checked, but at least versions back to 10 have this issue.

Thoughts?

regards.

-- 
Kyotaro Horiguchi
NTT Open Source Software Center
#include 
#include 
#include 
#include 
#include 
#include 
#include 

EXEC SQL include sqlda.h;

int
main(void)
{
sqlda_t *sqlda = NULL;
char name_buf[16] = {0};
char var_buf[16] = {0};

EXEC SQL CONNECT TO 'postgres@/tmp:5432' AS conn1;
EXEC SQL PREPARE stmt1 FROM "SELECT 123::numeric(3,0), 't'::varchar(2)";
EXEC SQL DECLARE cur1 CURSOR FOR stmt1;
EXEC SQL OPEN cur1;

EXEC SQL FETCH NEXT FROM cur1 INTO DESCRIPTOR sqlda;

if (sqlda == NULL) { fprintf(stderr, "No data\n"); exit(1); }

for (int i = 0 ; i < sqlda->sqld ; i++)
{
sqlvar_t v = sqlda->sqlvar[i];
char *sqldata = v.sqldata;

if (v.sqltype == ECPGt_numeric)
  sqldata =
    PGTYPESnumeric_to_asc((numeric*)sqlda->sqlvar[i].sqldata, -1);

printf("type: %-8s: data: \"%s\"\n", v.sqlname.data, sqldata);
}

EXEC SQL CLOSE cur1;
EXEC SQL COMMIT;
EXEC SQL DISCONNECT ALL;

return 0;
}
>From 6e5d3a513209cb3df581b4d565537eae1013cf63 Mon Sep 17 00:00:00 2001
From: Kyotaro Horiguchi 
Date: Mon, 10 Apr 2023 16:38:43 +0900
Subject: [PATCH 1/2] test

---
 .../ecpg/test/compat_oracle/char_array.pgc|  33 ++-
 .../test/expected/compat_oracle-char_array.c  | 226 +-
 .../expected/compat_oracle-char_array.stderr  | 198 ---
 .../expected/compat_oracle-char_array.stdout  |   5 +
 4 files changed, 318 insertions(+), 144 deletions(-)

diff --git a/src/interfaces/ecpg/test/compat_oracle/char_array.pgc b/src/interfaces/ecpg/test/compat_oracle/char_array.pgc
index 6a5d383d4e..a39dc8f0c0 100644
--- a/src/interfaces/ecpg/test/compat_oracle/char_array.pgc
+++ b/src/interfaces/ecpg/test/compat_oracle/char_array.pgc
@@ -2,6 +2,10 @@
 #include 
 #include 
 
+#include 
+
+EXEC SQL INCLUDE sqlda.h;
+
 EXEC SQL INCLUDE ../regression;
 
 static void warn(void)
@@ -20,6 +24,8 @@ int main() {
 
   const char *p = "X";
   int loopcount;
+  sqlda_t *sqlda = NULL;
+
   EXEC SQL BEGIN DECLARE SECTION;
   char shortstr[5];
   char bigstr[11];
@@ -53,11 +59,34 @@ int main() {
 
   EXEC SQL CLOSE C;
   EXEC SQL DROP TABLE strdbase;
+  EXEC SQL COMMIT WORK;
 
+  /* SQLDA handling */
+  EXEC SQL WHENEVER SQLWARNING SQLPRINT;
+  EXEC SQL WHENEVER NOT FOUND STOP;
+  EXEC SQL PREPARE stmt1 FROM "SELECT 123::numeric(3,0), 't'::varchar(2)";
+  EXEC SQL DECLARE cur1 CURSOR FOR stmt1;
+  EXEC SQL OPEN cur1;
+  EXEC SQL FETCH NEXT FROM cur1 INTO DESCRIPTOR sqlda;
+
+  printf("\n-\ntype: data\n");
+  for (int i = 0 ; i < sqlda->sqld ; i++)
+  {
+	  sqlvar_t v = sqlda->sqlvar[i];
+	  char *sqldata = v.sqldata;
+
+	  if (v.sqltype == ECPGt_numeric)
+		  sqldata =
+			  PGTYPESnumeric_to_asc((numeric*)sqlda->sqlvar[i].sqldata, -1);
+
+	  printf("%-8s: \"%s\"\n", v.sqlname.data, sqldata);
+  }
+
+  EXEC SQL CLOSE cur1;
+  EXEC SQL COMMIT WORK;
+  
   printf("\nGOOD-BYE!!\n\n");
 
-  EXEC SQL COMMIT WORK;
-
   EXEC SQL DISCONNECT ALL;
 
   return 0;
diff --git a/src/interfaces/ecpg/test/expected/compat_oracle-char_array.c b/src/interfaces/ecpg/test/expected/compat_oracle-char_array.c
index 04d4e1969e..b82e45a090 100644
--- a/src/interfaces/ecpg/test/expected/compat_oracle-char_array.c
+++ b/src/interfaces/ecpg/test/expected/compat_oracle-char_array.c
@@ -11,6 +11,32 @@
 #include 
 #include 
 
+#i

Re: pg_stat_bgwriter.buffers_backend is pretty meaningless (and more?)

2023-04-10 Thread Pavel Luzanov

On 05.04.2023 03:41, Melanie Plageman wrote:

On Tue, Apr 4, 2023 at 4:35 PM Pavel Luzanov  wrote:


After a little thought... I'm not sure about the term 'bootstrap
process'. I can't find this term in the documentation.

There are various mentions of "bootstrap" peppered throughout the docs
but no concise summary of what it is. For example, initdb docs mention
the "bootstrap backend" [1].

Interestingly, 910cab820d0 added "Bootstrap superuser" in November. This
doesn't really cover what bootstrapping is itself, but I wonder if that
is useful? If so, you could propose a glossary entry for it?
(preferably in a new thread)


I'm not sure this is reason enough to add a new entry to the glossary.


Do I understand correctly that this is a postmaster? If so, then the
postmaster process is not shown in pg_stat_activity.

No, bootstrap process is for initializing the template database. You
will not be able to see pg_stat_activity when it is running.


Oh, it's clear to me now. Thank you for the explanation.


You can query pg_stat_activity from single user mode, so it is relevant
to pg_stat_activity also. I take your point that bootstrap mode isn't
relevant for pg_stat_activity, but I am hesitant to add that distinction
to the pg_stat_io docs since the reason you won't see it in
pg_stat_activity is because it is ephemeral and before a user can access
the database and not because stats are not tracked for it.

Can you think of a way to convey this?


See my attempt attached.
I'm not sure about the wording. But I think we can avoid the term 
'bootstrap process'
by replacing it with "database cluster initialization", which should be 
clear to everyone.
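
For context, this is the value one would see when querying the view from
single-user mode, for example:

SELECT backend_type FROM pg_stat_activity;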


--
Pavel Luzanov
Postgres Professional: https://postgrespro.com
From ff76b81a9d52581f6fdaf9c1f3885e8272d2ae3c Mon Sep 17 00:00:00 2001
From: Pavel Luzanov 
Date: Mon, 10 Apr 2023 10:25:52 +0300
Subject: [PATCH v2] [PATCH v2] Document standalone backend type in
 pg_stat_activity

Reported-by: Pavel Luzanov 
Discussion: https://www.postgresql.org/message-id/fcbe2851-f1fb-9863-54bc-a95dc7a0d946%40postgrespro.ru
---
 doc/src/sgml/monitoring.sgml | 3 +++
 1 file changed, 3 insertions(+)

diff --git a/doc/src/sgml/monitoring.sgml b/doc/src/sgml/monitoring.sgml
index 3f33a1c56c..45e20efbfb 100644
--- a/doc/src/sgml/monitoring.sgml
+++ b/doc/src/sgml/monitoring.sgml
@@ -991,6 +991,9 @@ postgres   27093  0.0  0.0  30096  2752 ?Ss   11:34   0:00 postgres: ser
archiver,
startup, walreceiver,
walsender and walwriter.
+   The special type standalone backend is used
+   when initializing a database cluster by 
+   and when running in the .
In addition, background workers registered by extensions may have
additional types.
   
-- 
2.34.1



Re: Direct I/O

2023-04-10 Thread Thomas Munro
On Mon, Apr 10, 2023 at 7:27 PM Thomas Munro  wrote:
> Debian's 6.0.10-2 kernel (Debian 12 on a random laptop).

Realising I hadn't updated for a bit, I did so and it still reproduces on:

$ uname -a
Linux x1 6.1.0-7-amd64 #1 SMP PREEMPT_DYNAMIC Debian 6.1.20-1
(2023-03-19) x86_64 GNU/Linux




Re: Direct I/O

2023-04-10 Thread Thomas Munro
On Mon, Apr 10, 2023 at 2:57 PM Andres Freund  wrote:
> Have you tried to write a reproducer for this that doesn't involve postgres?

I tried a bit.  I'll try harder soon.

> ... What kernel version did you repro
> this on Thomas?

Debian's 6.0.10-2 kernel (Debian 12 on a random laptop).  Here's how I
set up a test btrfs in case someone else wants a head start:

truncate -s2G 2GB.img
sudo losetup --show --find 2GB.img
sudo mkfs -t btrfs /dev/loop0 # the device name shown by losetup
sudo mkdir /mnt/tmp
sudo mount /dev/loop0 /mnt/tmp
sudo chown $(whoami) /mnt/tmp

cd /mnt/tmp
/path/to/initdb -D pgdata
... (see instructions further up for postgres command line + queries to run)




Re: Direct I/O

2023-04-10 Thread Andrea Gelmini
On Mon, Apr 10, 2023 at 04:58 Andres Freund  wrote:
> We should definitely let the brtfs folks know of this issue... It's possible
> that this bug was recently introduced even. What kernel version did you repro
> this on Thomas?

These days on the btrfs mailing list they are discussing Direct I/O data
corruption. There is no patch at the moment; they are still discussing how
to address it:
https://lore.kernel.org/linux-btrfs/aa1fb69e-b613-47aa-a99e-a0a2c9ed2...@app.fastmail.com/

Ciao,
Gelma