Fix regression tests to work with REGRESS_OPTS=--no-locale

2023-06-14 Thread Michael Paquier
Hi all,

While doing some tests with the tree, I have noticed that the unaccent
tests lack the arrangement we have elsewhere (test_regex, fuzzystrmatch,
now hstore, collation tests, etc.) to keep the tests portable when they
include UTF-8 characters but the regression database cannot support
that.

It took some time to notice that, which makes me wonder how relevant
this stuff is these days.  Anyway, I would like to do the same as the
others and fix it, so I am proposing the attached.

Thoughts?
--
Michael
From 4dd02315941db6c114950601cc9a0e6837e5d6e0 Mon Sep 17 00:00:00 2001
From: Michael Paquier 
Date: Thu, 15 Jun 2023 15:50:43 +0900
Subject: [PATCH] Fix regression tests of unaccent to be portable across more
 setups

---
 contrib/unaccent/expected/unaccent.out   | 16 +---
 contrib/unaccent/expected/unaccent_1.out |  8 
 contrib/unaccent/sql/unaccent.sql| 14 +++---
 3 files changed, 28 insertions(+), 10 deletions(-)
 create mode 100644 contrib/unaccent/expected/unaccent_1.out

diff --git a/contrib/unaccent/expected/unaccent.out b/contrib/unaccent/expected/unaccent.out
index ee0ac71a1c..f080707c4a 100644
--- a/contrib/unaccent/expected/unaccent.out
+++ b/contrib/unaccent/expected/unaccent.out
@@ -1,11 +1,13 @@
+/*
+ * This test must be run in a database with UTF-8 encoding,
+ * because other encodings don't support all the characters used.
+ */
+SELECT getdatabaseencoding() <> 'UTF8'
+   AS skip_test \gset
+\if :skip_test
+\quit
+\endif
 CREATE EXTENSION unaccent;
--- must have a UTF8 database
-SELECT getdatabaseencoding();
- getdatabaseencoding 
--
- UTF8
-(1 row)
-
 SET client_encoding TO 'UTF8';
 SELECT unaccent('foobar');
  unaccent 
diff --git a/contrib/unaccent/expected/unaccent_1.out b/contrib/unaccent/expected/unaccent_1.out
new file mode 100644
index 00..37aead89c0
--- /dev/null
+++ b/contrib/unaccent/expected/unaccent_1.out
@@ -0,0 +1,8 @@
+/*
+ * This test must be run in a database with UTF-8 encoding,
+ * because other encodings don't support all the characters used.
+ */
+SELECT getdatabaseencoding() <> 'UTF8'
+   AS skip_test \gset
+\if :skip_test
+\quit
diff --git a/contrib/unaccent/sql/unaccent.sql b/contrib/unaccent/sql/unaccent.sql
index 3fc0c706be..663646c1ac 100644
--- a/contrib/unaccent/sql/unaccent.sql
+++ b/contrib/unaccent/sql/unaccent.sql
@@ -1,7 +1,15 @@
-CREATE EXTENSION unaccent;
+/*
+ * This test must be run in a database with UTF-8 encoding,
+ * because other encodings don't support all the characters used.
+ */
 
--- must have a UTF8 database
-SELECT getdatabaseencoding();
+SELECT getdatabaseencoding() <> 'UTF8'
+   AS skip_test \gset
+\if :skip_test
+\quit
+\endif
+
+CREATE EXTENSION unaccent;
 
 SET client_encoding TO 'UTF8';
 
-- 
2.40.1



signature.asc
Description: PGP signature


Re: Allow pg_archivecleanup to remove backup history files

2023-06-14 Thread Kyotaro Horiguchi
At Wed, 14 Jun 2023 00:49:39 +0900, torikoshia  
wrote in 
> On 2023-06-12 16:33, Michael Paquier wrote:
> > On Fri, Jun 09, 2023 at 12:32:15AM +0900, Fujii Masao wrote:
> Thanks for reviewing!
> 
> >>printf(_(" -n, --dry-run dry run, show the names of the files that would be removed\n"));
> >> + printf(_(" -b, --clean-backup-history clean up files including backup history files\n"));
> >> Shouldn't -b option be placed in alphabetical order?
> > +1.
> 
> Modified the place.

-   printf(_("  -d generate debug output (verbose mode)\n"));
-   printf(_("  -n dry run, show the names of the files that 
would be removed\n"));
-   printf(_("  -V, --version  output version information, then exit\n"));
-   printf(_("  -x EXT clean up files if they have this 
extension\n"));
-   printf(_("  -?, --help show this help, then exit\n"));
+   printf(_("  -d, --debug generate debug output (verbose 
mode)\n"));
+   printf(_("  -n, --dry-run   dry run, show the names of the 
files that would be removed\n"));
+   printf(_("  -V, --version   output version information, 
then exit\n"));
+   printf(_("  -x, --strip-extension=EXT   strip this extention before 
identifying files fo clean up\n"));
+   printf(_("  -?, --help  show this help, then exit\n"));

After this change, some of these lines cross the boundary of the
80-column width.  (Is that policy still viable nowadays? I usually work
in terminal windows of that width..) It's somewhat unrelated to this
patch, but a help line a few lines further down also exceeds the width.
We could shorten it by removing the "/mnt/server" portion, but I'm not
sure if it's worth doing.


> Or for use as a standalone archive cleaner:
> e.g.
>   pg_archivecleanup /mnt/server/archiverdir 00010010.0020.backup


+   printf(_("  -x, --strip-extension=EXT   strip this extention before 
identifying files fo clean up\n"));

s/fo/for/ ?


regards.

-- 
Kyotaro Horiguchi
NTT Open Source Software Center




Re: Initial Schema Sync for Logical Replication

2023-06-14 Thread Peter Smith
On Thu, Jun 8, 2023 at 1:24 PM Masahiko Sawada  wrote:
>
...

> We also need to research how to integrate the initial schema
> synchronization with tablesync workers.  We have a PoC patch[2].
>
> Regards,
>
> [1] 
> https://wiki.postgresql.org/wiki/Logical_replication_of_DDLs#Initial_Schema_Sync
> [2] 
> https://www.postgresql.org/message-id/CAD21AoCdfg506__qKz%2BHX8vqfdyKgQ5qeehgqq9bi1L-6p5Pwg%40mail.gmail.com
>

FYI -- the PoC patch fails to apply using HEAD fetched today.

git apply ../patches_misc/0001-Poc-initial-table-structure-synchronization-in-logic.patch
error: patch failed: src/backend/replication/logical/tablesync.c:1245
error: src/backend/replication/logical/tablesync.c: patch does not apply

--
Kind Regards,
Peter Smith.
Fujitsu Australia




Support to define custom wait events for extensions

2023-06-14 Thread Masahiro Ikeda

Hi,

Currently, only one PG_WAIT_EXTENSION event can be used as a
wait event for extensions. Therefore, in environments where multiple
extensions are installed, it can take time to identify which
extension is the bottleneck.

So, I'd like to add new APIs to define custom wait events
for extensions, as discussed in [1].

I made patches to realize it. Although I still have some TODOs,
I would like to hear your feedback. Please feel free to comment.


# Implementation of new APIs

I implemented 2 new APIs for extensions.
* RequestNamedExtensionWaitEventTranche()
* GetNamedExtensionWaitEventTranche()

Extensions can request custom wait events by calling
RequestNamedExtensionWaitEventTranche(). After that, use
GetNamedExtensionWaitEventTranche() to get the wait event information.

An example of how an extension would use the APIs follows.

```
shmem_request_hook = shmem_request;
shmem_startup_hook = shmem_startup;

static void
shmem_request(void)
{
    /* request a custom wait event */
    RequestNamedExtensionWaitEventTranche("custom_wait_event");
}

static void
shmem_startup(void)
{
    /* get the wait event information */
    custom_wait_event = GetNamedExtensionWaitEventTranche("custom_wait_event");
}

void
extension_function()
{
    (void) WaitLatch(MyLatch,
                     WL_LATCH_SET | WL_TIMEOUT | WL_EXIT_ON_PM_DEATH,
                     10L * 1000,
                     custom_wait_event);  /* notify core with custom wait event */
    ResetLatch(MyLatch);
}
```


# Implementation overview

I referenced the implementation of
RequestNamedLWLockTranche()/GetNamedLWLockTranche().
(0001-Support-to-define-custom-wait-events-for-extensions.patch)

Extensions call RequestNamedExtensionWaitEventTranche() in
shmem_request_hook() to request wait events to be used by each
extension.

In the core, the requested wait events are dynamically registered in
shared memory. The extension then obtains the wait event information
with GetNamedExtensionWaitEventTranche() and uses the value to notify
the core that it is waiting.

When the string representation of a wait event is requested, the name
defined by the RequestNamedExtensionWaitEventTranche() call is
returned.



# PoC extension

I created the PoC extension and you can use it, as shown here:
(0002-Add-a-extension-to-test-custom-wait-event.patch)

1. start PostgreSQL with the following configuration
shared_preload_libraries = 'inject_wait_event'

2. check wait events periodically
psql-1=# SELECT query, wait_event_type, wait_event FROM pg_stat_activity WHERE backend_type = 'client backend' AND pid != pg_backend_pid();

psql-1=# \watch

3. execute a function to inject a wait event
psql-2=# CREATE EXTENSION inject_wait_event;
psql-2=# SELECT inject_wait_event();

4. check the custom wait event
You can see the following results of psql-1.

(..snip..)

            query            | wait_event_type | wait_event
-----------------------------+-----------------+------------
 SELECT inject_wait_event(); | Extension       | Extension
(1 row)

(..snip..)

(...about 10 seconds later ..)


            query            | wait_event_type |     wait_event
-----------------------------+-----------------+--------------------
 SELECT inject_wait_event(); | Extension       | custom_wait_event    # requested wait event by the extension!
(1 row)

(..snip..)


# TODOs

* tests on Windows (since I tested only on Ubuntu 20.04)
* add custom wait events for existing contrib modules (ex. postgres_fdw)
* add regression code (but, it seems to be difficult)
* others? (Please let me know)


[1] 
https://www.postgresql.org/message-id/81290a48-b25c-22a5-31a6-3feff5864fe3%40gmail.com


Regards,

--
Masahiro Ikeda
NTT DATA CORPORATION
From 59a118402e5e59685fb9e0fb086872e25a405736 Mon Sep 17 00:00:00 2001
From: Masahiro Ikeda 
Date: Thu, 15 Jun 2023 12:57:29 +0900
Subject: [PATCH 2/3] Support to define custom wait events for extensions.

Currently, only one PG_WAIT_EXTENSION event can be used as a
wait event for extensions. Therefore, in environments where multiple
extensions are installed, it could take time to identify bottlenecks.

This commit allows defining custom wait events for extensions and
introduces a new API called RequestNamedExtensionWaitEventTranche()/
GetNamedExtensionWaitEventTranche().

These refer to RequestNamedLWLockTranche()/GetNamedLWLockTranche(),
but do not require as much flexibility as LWLock and can be implemented
more simply.

The extension calls RequestNamedExtensionWaitEventTranche() in
shmem_request_hook() to request wait events to be used by each extension.
In the core, the requested wait events are dynamically registered in shared
memory. The extension then obtains the wait event information with
GetNamedExtensionWaitEventTranche() and uses the value to notify the core
that it is waiting.
---
 src/backend/postmaster/postmaster.c |   6 +
 src/backend/storage/ipc/ipci.c  |  

Re: add non-option reordering to in-tree getopt_long

2023-06-14 Thread Kyotaro Horiguchi
At Wed, 14 Jun 2023 15:46:08 -0700, Nathan Bossart  
wrote in 
> On Wed, Jun 14, 2023 at 03:11:54PM -0700, Noah Misch wrote:
> > Here's some output from this program (on AIX 7.1, same output when compiled
> > 32-bit or 64-bit):
> > 
> > $ ./a.out a b c d e f
> > f e d c b a ./a.out
> 
> Thanks again.
> 
> > Interesting discussion here, too:
> > https://github.com/libuv/libuv/pull/1187
> 
> Hm.  IIUC modifying the argv pointers on AIX will modify the process title,
> which could cause 'ps' to temporarily show duplicate/missing arguments
> during option parsing.  That doesn't seem too terrible, but if pointer
> assignments aren't atomic, maybe 'ps' could be sent off to another part of
> memory, which does seem terrible.

Hmm, the discussion seems to be based on the assumption that argv[0]
can be safely redirected to a different memory location. If that's the
case, we can prpbably rearrange the array, even if there's a small
window where ps might display a confusing command line, right?

regards.

-- 
Kyotaro Horiguchi
NTT Open Source Software Center




Re: Bypassing shared_buffers

2023-06-14 Thread Thomas Munro
On Thu, Jun 15, 2023 at 2:51 PM Vladimir Churyukin
 wrote:
> Do you foresee any difficulties in implementation of the "unwarm" operation? 
> It requires a cache flush operation,
> so I'm curious how complicated that is (probably there is a reason this is 
> not supported by Postgres by now? mssql and oracle support stuff like that 
> for a long time)

If they have a way to kick individual relations out of the buffer
pool, then I suspect they have an efficient way to find the relevant
buffers.  We'd have to scan the entire buffer pool, or (for small
relations), probe for blocks 0..n (when we know that n isn't too
high).  We'll probably eventually get something tree-based, like
operating system kernels and perhaps those other databases use for
their own buffer pools, which is useful for I/O merging and for faster
DROP, but until then you'll face the same problem while implementing
unwarm, and you'd probably have to understand a lot of details about
bufmgr.c and add some new interfaces.

As Tom says, in the end it's going to work out much like restarting,
which requires a pleasing zero lines of new code, perhaps explaining
why no one has tried this before...  Though of course you can be more
selective about which tables are zapped.

> Cluster restart is not an option for us unfortunately, as it will be required 
> for each query pretty much, and there are a lot of them.
> An ideal solution would be, if it's possible, to test it in parallel with 
> other activities...
> Evicting all the other stuff using pg_prewarm is an interesting idea though 
> (if a large prewarm operation really evicts all the previously stored data 
> reliably).
> It's a bit hacky, but thanks, I think it's possible to make this work with 
> some effort.
> It will require exclusive access just for that testing, which is not ideal 
> but may work for us.

You can use pg_buffercache to check the current contents of the buffer
pool, to confirm that a relation you're interested in is gone.

https://www.postgresql.org/docs/current/pgbuffercache.html#PGBUFFERCACHE-COLUMNS
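
For instance, a quick check along these lines (sketch only; assumes the
pg_buffercache extension is installed and 'my_table' is the relation of
interest) shows how many buffers, and how many dirty ones, are still
cached for it in the current database:

```
-- Count cached (and dirty) buffers for one relation in the current database
SELECT count(*) AS buffers,
       count(*) FILTER (WHERE isdirty) AS dirty_buffers
  FROM pg_buffercache
 WHERE relfilenode = pg_relation_filenode('my_table')
   AND reldatabase = (SELECT oid FROM pg_database
                       WHERE datname = current_database());
```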

I guess another approach if you really want to write code to do this
would be to introduce a function that takes a buffer ID and
invalidates it, and then you could use queries of pg_buffercache to
drive it.  It would simplify things greatly if you only supported
invalidating clean buffers, and then you could query pg_buffercache to
see if any dirty buffers are left and if so run a checkpoint and try
again or something like that...

Another thing I have wondered about while hacking on I/O code is
whether pg_prewarm should also have an unwarm-the-kernel-cache thing.
There is that drop_cache thing, but that's holus bolus and Linux-only.
Perhaps POSIX_FADV_WONTNEED could be used for this, though that would
seem to require a double decker bus-sized layering violation.




Re: Order changes in PG16 since ICU introduction

2023-06-14 Thread Jeff Davis
On Mon, 2023-06-12 at 23:04 +0200, Peter Eisentraut wrote:
> I object to adding a new provider for PG16 (patch 0001).

Added to July CF for 17.

> > 2. Patch 0004 is possibly out of scope for 16

> Also clearly a new feature.

Added to July CF for 17.

Regards,
Jeff Davis





[17] CREATE COLLATION default provider

2023-06-14 Thread Jeff Davis
Currently, CREATE COLLATION always defaults the provider to libc.

The attached patch causes it to default to libc if LC_COLLATE/LC_CTYPE
are specified, otherwise default to the current database default
collation's provider.

That way, the provider choice at initdb time then becomes the default
for "CREATE DATABASE ... TEMPLATE template0", which then becomes the
default provider for "CREATE COLLATION (LOCALE='...')".
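
For example (a sketch of the intended behavior, with hypothetical
collation names), in a database whose default collation uses the ICU
provider:

```
-- Defaults to the database default collation's provider (icu here):
CREATE COLLATION german (locale = 'de-DE');

-- Still defaults to libc, because lc_collate/lc_ctype are specified:
CREATE COLLATION german_libc (lc_collate = 'de_DE.utf8',
                              lc_ctype   = 'de_DE.utf8');
```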


-- 
Jeff Davis
PostgreSQL Contributor Team - AWS


From 329e32bfe5e1883a2cfd6e224c1d512b67256870 Mon Sep 17 00:00:00 2001
From: Jeff Davis 
Date: Wed, 24 May 2023 09:53:02 -0700
Subject: [PATCH v11] CREATE COLLATION default provider.

If the LC_COLLATE or LC_CTYPE are specified for a new collation, the
default provider is libc. Otherwise, the default provider is the same
as the provider for the database default collation.

Previously, the default provider was always libc.
---
 contrib/citext/expected/create_index_acl.out |  2 +-
 contrib/citext/sql/create_index_acl.sql  |  2 +-
 doc/src/sgml/ref/create_collation.sgml   | 14 ++
 src/backend/commands/collationcmds.c |  7 ++-
 src/test/regress/expected/collate.linux.utf8.out | 10 +-
 src/test/regress/sql/collate.linux.utf8.sql  | 10 +-
 6 files changed, 28 insertions(+), 17 deletions(-)

diff --git a/contrib/citext/expected/create_index_acl.out b/contrib/citext/expected/create_index_acl.out
index 33be13a92d..765867d36d 100644
--- a/contrib/citext/expected/create_index_acl.out
+++ b/contrib/citext/expected/create_index_acl.out
@@ -43,7 +43,7 @@ REVOKE ALL ON FUNCTION s.index_row_if FROM PUBLIC;
 -- Even for an empty table, CREATE INDEX checks ii_Predicate permissions.
 GRANT EXECUTE ON FUNCTION s.index_row_if TO regress_minimal;
 -- Non-extension, non-function objects.
-CREATE COLLATION s.coll (LOCALE="C");
+CREATE COLLATION s.coll (PROVIDER=libc, LOCALE="C");
 CREATE TABLE s.x (y s.citext);
 ALTER TABLE s.x OWNER TO regress_minimal;
 -- Empty-table DefineIndex()
diff --git a/contrib/citext/sql/create_index_acl.sql b/contrib/citext/sql/create_index_acl.sql
index 10b5225569..e338ac8799 100644
--- a/contrib/citext/sql/create_index_acl.sql
+++ b/contrib/citext/sql/create_index_acl.sql
@@ -45,7 +45,7 @@ REVOKE ALL ON FUNCTION s.index_row_if FROM PUBLIC;
 -- Even for an empty table, CREATE INDEX checks ii_Predicate permissions.
 GRANT EXECUTE ON FUNCTION s.index_row_if TO regress_minimal;
 -- Non-extension, non-function objects.
-CREATE COLLATION s.coll (LOCALE="C");
+CREATE COLLATION s.coll (PROVIDER=libc, LOCALE="C");
 CREATE TABLE s.x (y s.citext);
 ALTER TABLE s.x OWNER TO regress_minimal;
 -- Empty-table DefineIndex()
diff --git a/doc/src/sgml/ref/create_collation.sgml b/doc/src/sgml/ref/create_collation.sgml
index f6353da5c1..fd6c679694 100644
--- a/doc/src/sgml/ref/create_collation.sgml
+++ b/doc/src/sgml/ref/create_collation.sgml
@@ -121,10 +121,16 @@ CREATE COLLATION [ IF NOT EXISTS ] name FROM 
Specifies the provider to use for locale services associated with this
collation.  Possible values are
-   icuICU
-   (if the server was built with ICU support) or libc.
-   libc is the default.  See  for details.
+   icuICU (if
+   the server was built with ICU support) or libc. See
+for details.
+  
+  
+   If lc_colllate or
+   lc_ctype is specified, the default is
+   libc; otherwise, the default is the same as the
+   provider for the database default collation (see ).
   
  
 
diff --git a/src/backend/commands/collationcmds.c b/src/backend/commands/collationcmds.c
index 2969a2bb21..20f976a3f8 100644
--- a/src/backend/commands/collationcmds.c
+++ b/src/backend/commands/collationcmds.c
@@ -226,7 +226,12 @@ DefineCollation(ParseState *pstate, List *names, List *parameters, bool if_not_e
 collproviderstr)));
 		}
 		else
-			collprovider = COLLPROVIDER_LIBC;
+		{
+			if (lccollateEl || lcctypeEl)
+collprovider = COLLPROVIDER_LIBC;
+			else
+collprovider = default_locale.provider;
+		}
 
 		if (localeEl)
 		{
diff --git a/src/test/regress/expected/collate.linux.utf8.out b/src/test/regress/expected/collate.linux.utf8.out
index 01664f7c1b..588198d13e 100644
--- a/src/test/regress/expected/collate.linux.utf8.out
+++ b/src/test/regress/expected/collate.linux.utf8.out
@@ -1026,7 +1026,7 @@ CREATE SCHEMA test_schema;
 -- We need to do this this way to cope with varying names for encodings:
 do $$
 BEGIN
-  EXECUTE 'CREATE COLLATION test0 (locale = ' ||
+  EXECUTE 'CREATE COLLATION test0 (provider = libc, locale = ' ||
   quote_literal((SELECT datcollate FROM pg_database WHERE datname = current_database())) || ');';
 END
 $$;
@@ -1034,7 +1034,7 @@ CREATE COLLATION test0 FROM "C"; -- fail, duplicate name
 ERROR:  collation "test0" already exists
 CREATE COLLATION IF NOT EXISTS test0 FROM "C"; -- ok, skipped
 NOTICE:  collation "test0" already exists, skipping
-CREATE COLLATION IF NOT EXIS

Re: Add a perl function in Cluster.pm to generate WAL

2023-06-14 Thread Kyotaro Horiguchi
Mmm. It seems like the email I thought I'd sent failed to go out.

At Sun, 11 Jun 2023 07:14:54 +0530, Bharath Rupireddy 
 wrote in 
> On Wed, Aug 24, 2022 at 6:42 AM Kyotaro Horiguchi
>  wrote:
> >
> > At Tue, 16 Aug 2022 18:40:49 +0200, Alvaro Herrera 
> >  wrote in
> > > On 2022-Aug-16, Andrew Dunstan wrote:
> > > Yeah, I agree with that for advance_wal.  Regarding find_in_log, that
> > > one seems general enough to warrant being in Cluster.pm -- consider
> > > issues_sql_like, which also slurps_file($log).  That could be unified a
> > > little bit, I think.
> >
> > +1
> 
> Now that the generalized find_in_log() function has been added as part
> of 
> https://git.postgresql.org/gitweb/?p=postgresql.git;a=commitdiff;h=e25e5f7fc6b74c9d4ce82627e9145ef5537412e2,
> I'm proposing a generalized function for advance_wal(). Please find
> the attached patch.
> 
> I tried to replace the existing tests with the new cluster function
> advance_wal(). Please let me know if I'm missing any other tests.
> Also, this new function can be used by an in-progress feature -
> https://commitfest.postgresql.org/43/3663/.
> 
> Thoughts?

Thanks!

+   "CREATE TABLE tt (); DROP TABLE tt; SELECT 
pg_switch_wal();");

At least since 11, we can utilize pg_logical_emit_message() for this
purpose. It's more lightweight and seems appropriate, not only because
it doesn't cause any side effects but also because we don't have to
worry about name conflicts.
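
For example, each iteration could run something like this instead of
the CREATE/DROP TABLE dance (untested sketch):

```
-- Emit a small WAL record, then switch to the next segment
SELECT pg_logical_emit_message(false, 'advance_wal', 'dummy');
SELECT pg_switch_wal();
```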


-SELECT 'finished';",
-   timeout => $PostgreSQL::Test::Utils::timeout_default));
-is($result[1], 'finished', 'check if checkpoint command is not blocked');
-
+$node_primary2->advance_wal(1);
+$node_primary2->safe_psql('postgres', 'CHECKPOINT;');

This test anticipates that the checkpoint could get blocked. Shouldn't
we keep the timeout?


-$node_primary->safe_psql(
-   'postgres', "create table retain_test(a int);
-select pg_switch_wal();
-insert into retain_test values(1);
-checkpoint;");
+$node_primary->advance_wal(1);
+$node_primary->safe_psql('postgres', "checkpoint;");

The original test generated some WAL after the segment switch, which
appears to be a significant characteristic of the test.

regards.

-- 
Kyotaro Horiguchi
NTT Open Source Software Center


Re: Do we want a hashset type?

2023-06-14 Thread jian he
On Thu, Jun 15, 2023 at 5:04 AM Joel Jacobson  wrote:

> On Wed, Jun 14, 2023, at 15:16, Tomas Vondra wrote:
> > On 6/14/23 14:57, Joel Jacobson wrote:
> >> Would it be feasible to teach the planner to utilize the internal hash
> table of
> >> hashset directly? In the case of arrays, the hash table construction is
> an
> ...
> > It's definitely something I'd leave out of v0, personally.
>
> OK, thanks for guidance, I'll stay away from it.
>
> I've been doing some preparatory work on this todo item:
>
> > 3) support for other types (now it only works with int32)
>
> I've renamed the type from "hashset" to "int4hashset",
> and the SQL-functions are now prefixed with "int4"
> when necessary. The overloaded functions with
> int4hashset as input parameters don't need to be prefixed,
> e.g. hashset_add(int4hashset, int).
>
> Other changes since last update (4e60615):
>
> * Support creation of empty hashset using '{}'::hashset
> * Introduced a new function hashset_capacity() to return the current
> capacity
>   of a hashset.
> * Refactored hashset initialization:
>   - Replaced hashset_init(int) with int4hashset() to initialize an empty
> hashset
> with zero capacity.
>   - Added int4hashset_with_capacity(int) to initialize a hashset with
> a specified capacity.
> * Improved README.md and testing
>
> As a next step, I'm planning on adding int8 support.
>
> Looks and sounds good?
>
> /Joel

I am not sure the following results are correct.
with cte as (
select hashset(x) as x
,hashset_capacity(hashset(x))
,hashset_count(hashset(x))
from generate_series(1,10) g(x))
select *
,'|' as delim
, hashset_add(x,1::int)
,hashset_capacity(hashset_add(x,1::int))
,hashset_count(hashset_add(x,1::int))
from cte \gx


results:
-[ RECORD 1 ]----+--------------------------
x                | {8,1,10,3,9,4,6,2,1,5,7}
hashset_capacity | 64
hashset_count    | 10
delim            | |
hashset_add      | {8,1,10,3,9,4,6,2,1,5,7}
hashset_capacity | 64
hashset_count    | 11

but:
with cte as(select '{1,2}'::int4hashset as x)   select
x,hashset_add(x,3::int)  from cte;

returns
   x   | hashset_add
-------+-------------
 {1,2} | {3,1,2}
(1 row)
The last simple query seems more sensible to me.


Re: allow granting CLUSTER, REFRESH MATERIALIZED VIEW, and REINDEX

2023-06-14 Thread Nathan Bossart
On Thu, Jun 15, 2023 at 09:46:33AM +0900, Michael Paquier wrote:
> The result after 0001 is applied is that a couple of
> object_ownercheck() calls that existed before ff9618e are removed from
> some ACL checks in the REINDEX, CLUSTER and VACUUM paths.  Is that OK
> for shared relations and shouldn't cluster_is_permitted_for_relation()
> include that?  vacuum_is_permitted_for_relation() is consistent on
> this side.

These object_ownercheck() calls were removed because they were redundant,
as owners have all privileges by default.  Privileges can be revoked from
the owner, so an extra ownership check would effectively bypass the
relation's ACL in that case.  I looked around and didn't see any other
examples of a combined ownership and ACL check like we were doing for
MAINTAIN.  The only thing that gives me pause is that the docs call out
ownership as sufficient for some maintenance commands.  With these patches,
that's only true as long as no one revokes privileges from the owner.  IMO
we should update the docs and leave out the ownership checks since MAINTAIN
is now a grantable privilege like any other.  WDYT?
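
To illustrate the point (hypothetical role and table names; behavior as
proposed in these patches, not current HEAD): once MAINTAIN is treated
as an ordinary grantable privilege, an owner who has had it revoked
would fail the ACL check instead of being waved through by an ownership
check:

```
CREATE TABLE t (a int);
ALTER TABLE t OWNER TO alice;
REVOKE MAINTAIN ON t FROM alice;
-- With the patches applied, running "VACUUM t" as alice would now be
-- skipped with a permission warning rather than allowed via ownership.
```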

-- 
Nathan Bossart
Amazon Web Services: https://aws.amazon.com




Re: Bypassing shared_buffers

2023-06-14 Thread Vladimir Churyukin
It could be cheaper, if the testing is done for many SELECT queries
sequentially - you need to flush dirty buffers just once pretty much.

-Vladimir Churyukin

On Wed, Jun 14, 2023 at 7:43 PM Tom Lane  wrote:

> Thomas Munro  writes:
> > There are two levels of cache.  If you're on Linux you can ask it to
> > drop its caches by writing certain values to /proc/sys/vm/drop_caches.
> > For PostgreSQL's own buffer pool, it would be nice if someone would
> > extend the pg_prewarm extension to have a similar 'unwarm' operation,
> > for testing like that.  But one thing you can do is just restart the
> > database cluster, or use pg_prewarm to fill its buffer pool up with
> > other stuff (and thus kick out the stuff you didn't want in there).
>
> But that'd also have to push out any dirty buffers.  I'm skeptical
> that it'd be noticeably cheaper than stopping and restarting the
> server.
>
> regards, tom lane
>


Re: Bypassing shared_buffers

2023-06-14 Thread Vladimir Churyukin
Do you foresee any difficulties in implementation of the "unwarm"
operation? It requires a cache flush operation,
so I'm curious how complicated that is (probably there is a reason this is
not supported by Postgres by now? mssql and oracle support stuff like that
for a long time)
Cluster restart is not an option for us unfortunately, as it will be
required for each query pretty much, and there are a lot of them.
An ideal solution would be, if it's possible, to test it in parallel with
other activities...
Evicting all the other stuff using pg_prewarm is an interesting idea though
(if a large prewarm operation really evicts all the previously stored data
reliably).
It's a bit hacky, but thanks, I think it's possible to make this work with
some effort.
It will require exclusive access just for that testing, which is not ideal
but may work for us.

-Vladimir Churyukin


On Wed, Jun 14, 2023 at 7:29 PM Thomas Munro  wrote:

> On Thu, Jun 15, 2023 at 1:37 PM Vladimir Churyukin
>  wrote:
> > Ok, got it, thanks.
> > Is there any alternative approach to measuring the performance as if the
> cache was empty?
>
> There are two levels of cache.  If you're on Linux you can ask it to
> drop its caches by writing certain values to /proc/sys/vm/drop_caches.
> For PostgreSQL's own buffer pool, it would be nice if someone would
> extend the pg_prewarm extension to have a similar 'unwarm' operation,
> for testing like that.  But one thing you can do is just restart the
> database cluster, or use pg_prewarm to fill its buffer pool up with
> other stuff (and thus kick out the stuff you didn't want in there).
>


Re: [DOC] Update ALTER SUBSCRIPTION documentation v3

2023-06-14 Thread Amit Kapila
On Wed, Jun 14, 2023 at 6:52 PM Peter Eisentraut  wrote:
>
> On 14.06.23 05:09, Amit Kapila wrote:
> > The latest version looks good to me as well. I think we should
> > backpatch this change as this is a user-facing message change in docs
> > and code. What do you guys think?
>
> Isn't that a reason *not* to backpatch it?
>

I wanted to backpatch the following change which provides somewhat
accurate information about what a user needs to do when it faces an
error.
To proceed in
-   this situation, disassociate the subscription from the replication slot by
-   executing ALTER SUBSCRIPTION ... SET (slot_name = NONE).
+   this situation, first DISABLE the subscription, and then
+   disassociate it from the replication slot by executing
+   ALTER SUBSCRIPTION ... SET (slot_name = NONE).
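
So the documented sequence becomes, for a subscription named sub (name
hypothetical):

```
ALTER SUBSCRIPTION sub DISABLE;
ALTER SUBSCRIPTION sub SET (slot_name = NONE);
DROP SUBSCRIPTION sub;
```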

Now, along with this change, there is a change in errhint as well,
which I am not sure whether to backpatch or not. I think we have the
following options:
(a) commit both the doc and code change in HEAD
(b) commit both the doc and code change in v17 when the next version branch opens
(c) backpatch the doc change and commit the code change in HEAD only
(d) backpatch the doc change and commit the code change in v17
(e) backpatch both the doc and code change

What do you think?

-- 
With Regards,
Amit Kapila.




Re: Bypassing shared_buffers

2023-06-14 Thread Tom Lane
Thomas Munro  writes:
> There are two levels of cache.  If you're on Linux you can ask it to
> drop its caches by writing certain values to /proc/sys/vm/drop_caches.
> For PostgreSQL's own buffer pool, it would be nice if someone would
> extend the pg_prewarm extension to have a similar 'unwarm' operation,
> for testing like that.  But one thing you can do is just restart the
> database cluster, or use pg_prewarm to fill its buffer pool up with
> other stuff (and thus kick out the stuff you didn't want in there).

But that'd also have to push out any dirty buffers.  I'm skeptical
that it'd be noticeably cheaper than stopping and restarting the
server.

regards, tom lane




Consistent coding for the naming of LR workers

2023-06-14 Thread Peter Smith
Hi,

There are different types of Logical Replication workers -- e.g.
tablesync workers, apply workers, and parallel apply workers.

The logging and errors often name these worker types, but during a
recent code review, I noticed some inconsistency in the way this is
done:
a) there is a common function get_worker_name() to return the name for
the worker type,  -- OR --
b) the worker name is just hardcoded in the message/error

I think it is not ideal to cut/paste the same hardwired strings over
and over. IMO it just introduces an unnecessary risk of subtle naming
differences creeping in.

~~

It is better to have a *single* point where these worker names are
defined, so then all output uses identical LR worker nomenclature.

PSA a small patch to modify the code accordingly. This is not intended
to be a functional change - just a code cleanup.

Thoughts?

--
Kind Regards,
Peter Smith.
Fujitsu Australia


v1-0001-Consistent-naming-of-LR-workers.patch
Description: Binary data


Re: Bypassing shared_buffers

2023-06-14 Thread Thomas Munro
On Thu, Jun 15, 2023 at 1:37 PM Vladimir Churyukin
 wrote:
> Ok, got it, thanks.
> Is there any alternative approach to measuring the performance as if the 
> cache was empty?

There are two levels of cache.  If you're on Linux you can ask it to
drop its caches by writing certain values to /proc/sys/vm/drop_caches.
For PostgreSQL's own buffer pool, it would be nice if someone would
extend the pg_prewarm extension to have a similar 'unwarm' operation,
for testing like that.  But one thing you can do is just restart the
database cluster, or use pg_prewarm to fill its buffer pool up with
other stuff (and thus kick out the stuff you didn't want in there).
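
For the crowd-out approach, something like this is enough (sketch;
assumes pg_prewarm is installed and 'padding_table' is at least
shared_buffers in size):

```
CREATE EXTENSION IF NOT EXISTS pg_prewarm;
-- Reading an unrelated large relation evicts most of what was cached before
SELECT pg_prewarm('padding_table');
```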




Re: Do we want a hashset type?

2023-06-14 Thread jian he
On Thu, Jun 15, 2023 at 5:04 AM Joel Jacobson  wrote:

> On Wed, Jun 14, 2023, at 15:16, Tomas Vondra wrote:
> > On 6/14/23 14:57, Joel Jacobson wrote:
> >> Would it be feasible to teach the planner to utilize the internal hash
> table of
> >> hashset directly? In the case of arrays, the hash table construction is
> an
> ...
> > It's definitely something I'd leave out of v0, personally.
>
> OK, thanks for guidance, I'll stay away from it.
>
> I've been doing some preparatory work on this todo item:
>
> > 3) support for other types (now it only works with int32)
>
> I've renamed the type from "hashset" to "int4hashset",
> and the SQL-functions are now prefixed with "int4"
> when necessary. The overloaded functions with
> int4hashset as input parameters don't need to be prefixed,
> e.g. hashset_add(int4hashset, int).
>
> Other changes since last update (4e60615):
>
> * Support creation of empty hashset using '{}'::hashset
> * Introduced a new function hashset_capacity() to return the current
> capacity
>   of a hashset.
> * Refactored hashset initialization:
>   - Replaced hashset_init(int) with int4hashset() to initialize an empty
> hashset
> with zero capacity.
>   - Added int4hashset_with_capacity(int) to initialize a hashset with
> a specified capacity.
> * Improved README.md and testing
>
> As a next step, I'm planning on adding int8 support.
>
> Looks and sounds good?
>
> /Joel


still playing around with hashset-0.0.1-a8a282a.patch.

I think "postgres.h" should be on the top, (someone have said it on another
email thread, I forgot who said that)

In my
local /home/jian/postgres/pg16/include/postgresql/server/libpq/pqformat.h:

> /*
>  * Append a binary integer to a StringInfo buffer
>  *
>  * This function is deprecated; prefer use of the functions above.
>  */
> static inline void
> pq_sendint(StringInfo buf, uint32 i, int b)


So I changed to pq_sendint32.

Leading, trailing, and in-between whitespace should be stripped. The
following C example seems OK for now, but I am not sure; I don't know
how to glue it into hashset_in.

forgive me the patch name

/*
gcc /home/jian/Desktop/regress_pgsql/strip_white_space.c && ./a.out
*/

#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <stdbool.h>

/*
 * array_isspace() --- a non-locale-dependent isspace()
 *
 * We used to use isspace() for parsing array values, but that has
 * undesirable results: an array value might be silently interpreted
 * differently depending on the locale setting.  Now we just hard-wire
 * the traditional ASCII definition of isspace().
 */
static bool
array_isspace(char ch)
{
    if (ch == ' ' ||
        ch == '\t' ||
        ch == '\n' ||
        ch == '\r' ||
        ch == '\v' ||
        ch == '\f')
        return true;
    return false;
}

int main(void)
{
    long   *temp = malloc(10 * sizeof(long));
    memset(temp, 0, 10 * sizeof(long));
    char    source[5][50] = {{0}};
    snprintf(source[0], sizeof(source[0]), "%s", "  { 1   ,   20  }");
    snprintf(source[1], sizeof(source[0]), "%s", "   { 1  ,20 ,   30 ");
    snprintf(source[2], sizeof(source[0]), "%s", "   {1  ,20 ,   30 ");
    snprintf(source[3], sizeof(source[0]), "%s", "   {1  ,  20 ,   30  }");
    snprintf(source[4], sizeof(source[0]), "%s", "   {1  ,  20 ,   30  }   ");
    /* Make a modifiable copy of the input */
    char   *p;
    char    string_save[50];

    for (int j = 0; j < 5; j++)
    {
        snprintf(string_save, sizeof(string_save), "%s", source[j]);
        p = string_save;

        int i = 0;
        while (array_isspace(*p))
            p++;
        if (*p != '{')
        {
            printf("line: %d should be {\n", __LINE__);
            exit(EXIT_FAILURE);
        }

        for (;;)
        {
            char   *q;

            if (*p == '{')
                p++;
            temp[i] = strtol(p, &q, 10);
            printf("temp[j=%d] [%d]=%ld\n", j, i, temp[i]);

            if (*q == '}' && (*(q + 1) == '\0'))
            {
                printf("all works ok now exit\n");
                break;
            }
            if (!array_isspace(*q) && *q != ',')
            {
                printf("wrong format. program will exit\n");
                exit(EXIT_FAILURE);
            }
            while (array_isspace(*q))
                q++;
            if (*q != ',')
                break;
            else
                p = q + 1;
            i++;
        }
    }
}
diff --git a/hashset.c b/hashset.c
index d62e0491..e71764c5 100644
--- a/hashset.c
+++ b/hashset.c
@@ -3,15 +3,13 @@
  *
  * Copyright (C) Tomas Vondra, 2019
  */
+#include "postgres.h"
 
-#include 
 #include 
-#include 
 #include 
 #include 
 #include 
 
-#include "postgres.h"
 #include "libpq/pqformat.h"
 #include "nodes/memnodes.h"
 #include "utils/array.h"
@@ -255,10 +253,10 @@ hashset_send(PG_FUNCTION_ARGS)
 	pq_begintypsend(&buf);
 
 	/* Send the non-data fields */
-	pq_sendint(&buf, set->flags, 4);
-	pq_sendint(&buf, set->maxelements, 4);
-	pq_sendint(&buf, set->nelements, 4);
-	pq_sendint(&buf, set->hashfn_id, 4);
+	pq_sendint32(&buf,set->flags)

Re: Shouldn't cost_append() also scale the partial path's cost?

2023-06-14 Thread Kyotaro Horiguchi
At Wed, 14 Jun 2023 14:36:54 +0200, Antonin Houska  wrote in 
> Like in cost_seqscan(), I'd expect the subpath cost to be divided among
> parallel workers. The patch below shows what I mean. Am I right?

If I've got it right, the total cost of a partial seqscan path
comprises a distributed CPU run cost and an undistributed disk run
cost. If we want to adjust for a different worker number, we should
only tweak the CPU component of the total cost. By default, if one
page contains 100 rows (I guess a moderate ratio), these two costs are
balanced at a 1:1 ratio and the CPU run cost and disk run cost in a
partial seqscan path is 1:n (n = #workers).  If we adjust the run cost
in the proposed manner, it adjusts the CPU run cost correctly but in
turn the disk run cost gets wrong (by a larger error in this premise).

In short, it will get wrong in a different way.

Actually it looks strange that rows are adjusted but cost is not, so
we might want to add an explanation on this aspect.

regards.

-- 
Kyotaro Horiguchi
NTT Open Source Software Center




Re: Replace (GUC_UNIT_MEMORY | GUC_UNIT_TIME) with GUC_UNIT in guc.c

2023-06-14 Thread Masahiko Sawada
On Wed, Jun 14, 2023 at 4:47 PM Michael Paquier  wrote:
>
> On Wed, Jun 14, 2023 at 03:38:10PM +0800, Richard Guo wrote:
> > +1.  BTW, it seems that GUC_UNIT_TIME is not used anywhere except in
> > GUC_UNIT.  I was wondering if we can retire it, but maybe we'd better
> > not.  It still indicates that we need to use time units table.
>
> Some out-of-core code declaring custom GUCs could rely on that, so
> it is better not to remove it.

+1 not to remove it.

I've attached the patch to fix the (GUC_UNIT_MEMORY | GUC_UNIT_TIME)
thing, and am going to push it later today to the master branch only.

Regards,

-- 
Masahiko Sawada
Amazon Web Services: https://aws.amazon.com


v1-0001-Replace-GUC_UNIT_MEMORY-GUC_UNIT_TIME-with-GUC_UN.patch
Description: Binary data


Re: Bypassing shared_buffers

2023-06-14 Thread Vladimir Churyukin
Ok, got it, thanks.
Is there any alternative approach to measuring the performance as if the
cache was empty?
The goal is basically to calculate the max possible I/O time for a query,
to get a range between min and max timing.
It's ok if it's done during EXPLAIN ANALYZE call only, not for regular
executions.
One thing I can think of is even if the data in storage might be stale,
issue read calls from it anyway, for measuring purposes.
For EXPLAIN ANALYZE it should be fine as it doesn't return real data anyway.
Is it possible that some pages do not exist in storage at all? Is there a
different way to simulate something like that?

-Vladimir Churyukin

On Wed, Jun 14, 2023 at 6:22 PM Tom Lane  wrote:

> Vladimir Churyukin  writes:
> > There is often a need to test particular queries executed in the
> worst-case
> > scenario, i.e. right after a server restart or with no or minimal amount
> of
> > data in shared buffers. In Postgres it's currently hard to achieve (other
> > than to restart the server completely to run a single query, which is not
> > practical). Is there a simple way to introduce a GUC variable that makes
> > queries bypass shared_buffers and always read from storage? It would make
> > testing like that orders of magnitude simpler. I mean, are there serious
> > technical obstacles or any other objections to that idea in principle?
>
> It's a complete non-starter.  Pages on disk are not necessarily up to
> date; but what is in shared buffers is.
>
> regards, tom lane
>


Re: Bypassing shared_buffers

2023-06-14 Thread Tom Lane
Vladimir Churyukin  writes:
> There is often a need to test particular queries executed in the worst-case
> scenario, i.e. right after a server restart or with no or minimal amount of
> data in shared buffers. In Postgres it's currently hard to achieve (other
> than to restart the server completely to run a single query, which is not
> practical). Is there a simple way to introduce a GUC variable that makes
> queries bypass shared_buffers and always read from storage? It would make
> testing like that orders of magnitude simpler. I mean, are there serious
> technical obstacles or any other objections to that idea in principle?

It's a complete non-starter.  Pages on disk are not necessarily up to
date; but what is in shared buffers is.

regards, tom lane




Re: Bypassing shared_buffers

2023-06-14 Thread Vladimir Churyukin
To be clear, I'm talking about bypassing shared buffers for reading data /
indexes only, not about disabling it completely (which I guess is
impossible anyway).

-Vladimir Churyukin

On Wed, Jun 14, 2023 at 5:57 PM Vladimir Churyukin 
wrote:

> Hello,
>
> There is often a need to test particular queries executed in the
> worst-case scenario, i.e. right after a server restart or with no or
> minimal amount of data in shared buffers. In Postgres it's currently hard
> to achieve (other than to restart the server completely to run a single
> query, which is not practical). Is there a simple way to introduce a GUC
> variable that makes queries bypass shared_buffers and always read from
> storage? It would make testing like that orders of magnitude simpler. I
> mean, are there serious technical obstacles or any other objections to that
> idea in principle?
>
>  Thanks,
> -Vladimir Churyukin
>


Bypassing shared_buffers

2023-06-14 Thread Vladimir Churyukin
Hello,

There is often a need to test particular queries executed in the worst-case
scenario, i.e. right after a server restart or with no or minimal amount of
data in shared buffers. In Postgres it's currently hard to achieve (other
than to restart the server completely to run a single query, which is not
practical). Is there a simple way to introduce a GUC variable that makes
queries bypass shared_buffers and always read from storage? It would make
testing like that orders of magnitude simpler. I mean, are there serious
technical obstacles or any other objections to that idea in principle?

 Thanks,
-Vladimir Churyukin


Re: allow granting CLUSTER, REFRESH MATERIALIZED VIEW, and REINDEX

2023-06-14 Thread Michael Paquier
On Wed, Jun 14, 2023 at 11:17:11AM -0700, Nathan Bossart wrote:
> On Tue, Jun 13, 2023 at 04:54:42PM -0700, Nathan Bossart wrote:
> > On Wed, Jun 14, 2023 at 08:16:15AM +0900, Michael Paquier wrote:
> >> So, yes, agreed about the removal of has_partition_ancestor_privs().
> >> I am adding an open item assigned to you and Jeff.
> > 
> > Thanks.  I suspect there's more discussion incoming, but I'm hoping to
> > close this item one way or another by 16beta2.
> 
> Concretely, I am proposing something like the attached patches.

The result after 0001 is applied is that a couple of
object_ownercheck() calls that existed before ff9618e are removed from
some ACL checks in the REINDEX, CLUSTER and VACUUM paths.  Is that OK
for shared relations and shouldn't cluster_is_permitted_for_relation()
include that?  vacuum_is_permitted_for_relation() is consistent on
this side.

Here are the paths that now differ:
cluster_rel
get_tables_to_cluster
get_tables_to_cluster_partitioned
RangeVarCallbackForReindexIndex
ReindexMultipleTables

0002 looks OK to retain the skip check for toast relations in the
VACUUM case.
--
Michael


signature.asc
Description: PGP signature


Re: [PATCH] Missing dep on Catalog.pm in meson rules

2023-06-14 Thread Tristan Partin
On Wed Jun 14, 2023 at 2:32 PM CDT, Andres Freund wrote:
> Hi,
>
> On 2023-06-09 11:43:54 -0700, Andres Freund wrote:
> > On 2023-06-02 10:13:44 -0500, Tristan Partin wrote:
> > > On Fri Jun 2, 2023 at 8:47 AM CDT, Andres Freund wrote:
> > > > Hi,
> > > >
> > > > On 2023-06-02 08:10:43 -0500, Tristan Partin wrote:
> > > > > > I wonder if we instead could just make perl output the files it 
> > > > > > loads and
> > > > > > handle dependencies automatically that way? But that's more work, 
> > > > > > so it's
> > > > > > probably the right thing to go for the manual path for now.
> > > > > 
> > > > > I am not familar with Perl enough (at all haha) to know if that is
> > > > > possible. I don't know exactly what these Perl files do, but perhaps 
> > > > > it
> > > > > might make sense to have some global lookup table that is setup near 
> > > > > the
> > > > > beginning of the script.
> > > >
> > > > It'd be nice to have something more general - there are other perl 
> > > > modules we
> > > > load, e.g.
> > > > ./src/backend/catalog/Catalog.pm
> > > > ./src/backend/utils/mb/Unicode/convutils.pm
> > > > ./src/tools/PerfectHash.pm
> > > >
> > > >
> > > > > perl_files = {
> > > > >   'Catalog.pm': files('path/to/Catalog.pm'),
> > > > >   ...
> > > > > }
> > > >
> > > > I think you got it, but just to make sure: I was thinking of generating 
> > > > a
> > > > depfile from within perl. Something like what you propose doesn't quite 
> > > > seems
> > > > like a sufficient improvement.
> > > 
> > > Whatever I am proposing is definitely subpar to generating a depfile. So
> > > if that can be done, that is the best option!
> > 
> > I looked for a bit, but couldn't find an easy way to do so. I would still 
> > like
> > to pursue going towards dep files for the perl scripts, even if that 
> > requires
> > explicit support in the perl scripts, but that's a change for later.
>
> Took a second look - sure looks like just using values %INC should suffice?
>
> Ilmari, you're the perl expert, is there an issue with that?
>
> Tristan, any chance you're interested hacking that up for a bunch of the
> scripts? Might be worth adding a common helper for, I guess?
>
> Something like
>
> for (values %INC)
> {
>   print STDERR "$kw_def_file: $_\n";
> }
>
> seems to roughly do the right thing for gen_keywordlist.pl. Of course for
> something real it'd need an option where to put that data, instead of printing
> to stderr.

I would need to familiarize myself with perl, but since you've written
probably all or almost all that needs to be written, I can probably
scrape by :).

I definitely have the bandwidth to make this change though pending what
Ilmari says.

-- 
Tristan Partin
Neon (https://neon.tech)




Re: Do we want a hashset type?

2023-06-14 Thread Tom Dunstan
On Wed, 14 Jun 2023 at 19:14, Tomas Vondra 
wrote:

> > ...So we'd want the same index usage as
> > =ANY(array) but would like faster row checking than we get with an array
> > when other indexes are used.
>
> We kinda already do this since PG14 (commit 50e17ad281), actually. If
> the list is long enough (9 values or more), we'll build a hash table
> during query execution. So pretty much exactly what you're asking for.
>

Ha! That is great. Unfortunately we can't rely on it as we have customers
using versions back to 12. But good to know that it's available when we
bump the required versions.

Thanks

Tom


Re: Inconsistent results with libc sorting on Windows

2023-06-14 Thread Thomas Munro
On Wed, Jun 14, 2023 at 10:50 PM Juan José Santamaría Flecha
 wrote:
> Yes, I think the situation is quite similar to what you describe, with its 
> WIN32 peculiarities. Take for example the attached program, it'll output:
>
> s1 = s2
> s2 = s3
> s1 > s3
> c1 > c2
> c2 > c3
> c1 > c3
>
> As you can see the test for CompareStringEx() is broken, but we get a sane 
> answer with LCMapStringEx().

Given that the documented behaviour is that ".. the sort key produces
the same order as when the source string is used in CompareString or
CompareStringEx"[1], this seems like a reportable bug, unless perhaps
your test program is hiding an error with that default case you have.

[1] 
https://learn.microsoft.com/en-us/windows/win32/intl/handling-sorting-in-your-applications




Re: [17] collation provider "builtin"

2023-06-14 Thread Thomas Munro
On Thu, Jun 15, 2023 at 10:55 AM Jeff Davis  wrote:
> The locale "C" (and equivalently, "POSIX") is not really a libc locale;
> it's implemented internally with memcmp for collation and
> pg_ascii_tolower, etc., for ctype.
>
> The attached patch implements a new collation provider, "builtin",
> which only supports "C" and "POSIX". It does not change the initdb
> default provider, so it must be requested explicitly. The user will be
> guaranteed that collations with provider "builtin" will never change
> semantics; therefore they need no version and indexes are not at risk
> of corruption. See previous discussion[1].

I haven't studied the details yet but +1 for this idea.  It models
what we are actually doing.




[17] collation provider "builtin"

2023-06-14 Thread Jeff Davis
The locale "C" (and equivalently, "POSIX") is not really a libc locale;
it's implemented internally with memcmp for collation and
pg_ascii_tolower, etc., for ctype.

The attached patch implements a new collation provider, "builtin",
which only supports "C" and "POSIX". It does not change the initdb
default provider, so it must be requested explicitly. The user will be
guaranteed that collations with provider "builtin" will never change
semantics; therefore they need no version and indexes are not at risk
of corruption. See previous discussion[1].

(Caveat: the "C" locale ordering may depend on the specific encoding.
For UTF-8, memcmp is equivalent to code point order, but that may not
be true of other encodings. Encodings can't change during pg_upgrade,
so indexes are not at risk; but the encoding can change during
dump/reload so results may change.)

This built-in provider is just here to support "C" and "POSIX" using
memcmp/pg_ascii_*, and no other locales. It is not intended as a
general license to take on the problem of maintaining locales. We may
support some other locale name to mean "code point order", but like
UCS_BASIC, that would just be an alias for locale "C" that also checks
that the encoding is UTF-8.
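
Usage would look roughly like this (a sketch based on the description
above; exact syntax per the attached patch):

```
-- A collation guaranteed to sort by plain byte/code-point order forever
CREATE COLLATION pinned_c (provider = builtin, locale = 'C');
CREATE TABLE t (s text COLLATE pinned_c);

-- Or as the database default at creation time (assuming the patch
-- extends CREATE DATABASE's LOCALE_PROVIDER option accordingly):
CREATE DATABASE dbc TEMPLATE template0 LOCALE_PROVIDER builtin LOCALE 'C';
```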

Motivation:

Why not just use the "C" locale with the libc provider?

1. It's more clear to the user what's going on: Postgres is managing
the provider; we aren't passing it on to libc at all. With the libc
provider, something like C.UTF-8 leaves room for confusion[2]; with the
built-in provider, "C.UTF-8" is not a supported locale and the user
will get an error if it's requested.

2. The libc provider conflates LC_COLLATE/LC_CTYPE with the default
collation; whereas in the icu and built-in providers, they are separate
concepts. With ICU and builtin, you can set LC_COLLATE and LC_CTYPE for
a database to whatever you want at creation time.

3. If you use libc with locale "C", then future CREATE DATABASE
commands will default to the libc provider (because that would be the
provider for template0), which is not what the user wants if the
purpose is to avoid problems with external collation providers. If you
use the built-in provider instead, then future CREATE DATABASE commands
will only succeed if the user either specifies locale C or explicitly
chooses a new provider; which will allow them a chance to prepare for
any challenges.

4. It makes it easier to document the trade-offs between various
providers without confusing special cases around the C locale.


[1]
https://www.postgresql.org/message-id/87sfb4gwgv.fsf%40news-spur.riddles.org.uk
[2]
https://www.postgresql.org/message-id/8a3dc06f-9b9d-4ed7-9a12-2070d8b01...@manitou-mail.org


-- 
Jeff Davis
PostgreSQL Contributor Team - AWS


From 065cdf57239280ef121b51d2616c0729946af9dd Mon Sep 17 00:00:00 2001
From: Jeff Davis 
Date: Mon, 1 May 2023 15:38:29 -0700
Subject: [PATCH v11] Introduce collation provider "builtin".

Only supports locale "C" (or equivalently, "POSIX"). Provides
locale-unaware semantics that are implemented as fast byte operations
in Postgres, independent of the operating system or any provider
libraries.

Equivalent (in semantics and implementation) to the libc provider with
locale "C", except that LC_COLLATE and LC_CTYPE can be set
independently.

Use provider "builtin" for built-in collation "ucs_basic" instead of
libc.

Discussion: https://postgr.es/m/ab925f69-5f9d-f85e-b87c-bd2a44798...@joeconway.com
---
 doc/src/sgml/charset.sgml  |  89 +
 doc/src/sgml/ref/create_collation.sgml |  11 ++-
 doc/src/sgml/ref/create_database.sgml  |   8 +-
 doc/src/sgml/ref/createdb.sgml |   2 +-
 doc/src/sgml/ref/initdb.sgml   |   7 +-
 src/backend/catalog/pg_collation.c |   7 +-
 src/backend/commands/collationcmds.c   | 103 +
 src/backend/commands/dbcommands.c  |  69 ++---
 src/backend/utils/adt/pg_locale.c  |  27 ++-
 src/backend/utils/init/postinit.c  |  10 ++-
 src/bin/initdb/initdb.c|  24 --
 src/bin/initdb/t/001_initdb.pl |  47 +++
 src/bin/pg_dump/pg_dump.c  |  49 +---
 src/bin/pg_upgrade/t/002_pg_upgrade.pl |  35 +++--
 src/bin/psql/describe.c|   4 +-
 src/bin/scripts/createdb.c |   2 +-
 src/bin/scripts/t/020_createdb.pl  |  56 ++
 src/include/catalog/pg_collation.dat   |   3 +-
 src/include/catalog/pg_collation.h |   3 +
 src/test/icu/t/010_database.pl |   9 +++
 src/test/regress/expected/collate.out  |  25 +-
 src/test/regress/sql/collate.sql   |  10 +++
 22 files changed, 518 insertions(+), 82 deletions(-)

diff --git a/doc/src/sgml/charset.sgml b/doc/src/sgml/charset.sgml
index ed84465996..b38cf82f83 100644
--- a/doc/src/sgml/charset.sgml
+++ b/doc/src/sgml/charset.sgml
@@ -342,22 +342,14 @@ initdb --locale=sv_SE
Locale Providers
 

-PostgreSQL supports multiple locale
-providers.  This sp

Re: add non-option reordering to in-tree getopt_long

2023-06-14 Thread Nathan Bossart
On Wed, Jun 14, 2023 at 03:11:54PM -0700, Noah Misch wrote:
> Here's some output from this program (on AIX 7.1, same output when compiled
> 32-bit or 64-bit):
> 
> $ ./a.out a b c d e f
> f e d c b a ./a.out

Thanks again.

> Interesting discussion here, too:
> https://github.com/libuv/libuv/pull/1187

Hm.  IIUC modifying the argv pointers on AIX will modify the process title,
which could cause 'ps' to temporarily show duplicate/missing arguments
during option parsing.  That doesn't seem too terrible, but if pointer
assignments aren't atomic, maybe 'ps' could be sent off to another part of
memory, which does seem terrible.

-- 
Nathan Bossart
Amazon Web Services: https://aws.amazon.com




Re: Let's make PostgreSQL multi-threaded

2023-06-14 Thread James Addison
On Tue, 13 Jun 2023 at 07:55, Konstantin Knizhnik  wrote:
>
>
>
> On 12.06.2023 3:23 PM, Pavel Borisov wrote:
> > Is the following true or not?
> >
> > 1. If we switch processes to threads but leave the amount of session
> > local variables unchanged, there would be hardly any performance gain.
> > 2. If we move some backend's local variables into shared memory then
> > the performance gain would be very near to what we get with threads
> > having equal amount of session-local variables.
> >
> > In other words, the overall goal in principle is to gain from less
> > memory copying wherever it doesn't add the burden of locks for
> > concurrent variables access?
> >
> > Regards,
> > Pavel Borisov,
> > Supabase
> >
> >
> IMHO both statements are not true.
> Switching to threads will cause less context-switch overhead (because
> all threads share the same memory space and so preserve the TLB).
> How big will this advantage be? In my prototype I got ~10%. But maybe
> it is possible to find workloads where it is larger.

Hi Konstantin - do you have code/links that you can share for the
prototype and benchmarks used to gather those results?




Re: Let's make PostgreSQL multi-threaded

2023-06-14 Thread James Addison
On Wed, 14 Jun 2023 at 20:48, Robert Haas  wrote:
>
> On Wed, Jun 14, 2023 at 3:16 PM James Addison  wrote:
> > I think that they're practical performance-related questions about the
> > benefits of performing a technical migration that could involve
> > significant development time, take years to complete, and uncover
> > problems that cause reliability issues for a stable, proven database
> > management system.
>
> I don't. I think they're reflecting confusion about what the actual,
> practical path forward is.

Ok.  My concern is that the balance between the downstream ecosystem
impact (people and processes that use PIDs to identify, monitor, and
manage query and background processes, for example) and the benefits
(performance improvement for some -- but what kind of? -- workloads)
seems unclear, and if it's unclear, it's less likely to be compelling.

Pavel's message and questions seem to poke at some of the potential
limitations of the performance improvements, and Andres' response
mentions reduced complexity and reduced context-switching.  Elsewhere
I also see that TLB (translation lookaside buffer?) lookups in
particular should see improvements.  Those are good, but somewhat
unquantified.

The benefits are less of an immediate concern if there's going to be a
migration/transition phase where both the process model and the thread
model are available.  But again, if the benefits of the threading
model aren't clear, people are unlikely to want to switch, and I don't
think that the cost for people and systems to migrate from tooling and
methods built around processes will be zero.  That could lead to a bad
outcome, where the codebase includes both models and yet is unable to
plan to simplify to one.




Re: add non-option reordering to in-tree getopt_long

2023-06-14 Thread Noah Misch
On Wed, Jun 14, 2023 at 02:28:16PM -0700, Nathan Bossart wrote:
> On Tue, Jun 13, 2023 at 05:17:37PM -0700, Noah Misch wrote:
> > If you have a test program to be run, I can run it on AIX.
> 
> Thanks.  The patch above [0] adjusts 040_createuser.pl to test modifying
> argv, so that's one test program.  And here's a few lines for reversing
> argv:
> 
>   #include <stdio.h>
> 
>   int
>   main(int argc, char *argv[])
>   {
>   for (int i = 0; i < argc / 2; i++)
>   {
>   char   *tmp = argv[i];
> 
>   argv[i] = argv[argc - i - 1];
>   argv[argc - i - 1] = tmp;
>   }
> 
>   for (int i = 0; i < argc; i++)
>   printf("%s ", argv[i]);
>   printf("\n");
>   }

Here's some output from this program (on AIX 7.1, same output when compiled
32-bit or 64-bit):

$ ./a.out a b c d e f
f e d c b a ./a.out

Interesting discussion here, too:
https://github.com/libuv/libuv/pull/1187




Re: add non-option reordering to in-tree getopt_long

2023-06-14 Thread Nathan Bossart
On Tue, Jun 13, 2023 at 05:17:37PM -0700, Noah Misch wrote:
> If you have a test program to be run, I can run it on AIX.

Thanks.  The patch above [0] adjusts 040_createuser.pl to test modifying
argv, so that's one test program.  And here's a few lines for reversing
argv:

#include <stdio.h>

int
main(int argc, char *argv[])
{
for (int i = 0; i < argc / 2; i++)
{
char   *tmp = argv[i];

argv[i] = argv[argc - i - 1];
argv[argc - i - 1] = tmp;
}

for (int i = 0; i < argc; i++)
printf("%s ", argv[i]);
printf("\n");
}

[0] 
https://postgr.es/m/attachment/147420/v1-0001-Teach-in-tree-getopt_long-to-move-non-options-to-.patch

-- 
Nathan Bossart
Amazon Web Services: https://aws.amazon.com




Re: Do we want a hashset type?

2023-06-14 Thread Joel Jacobson
On Wed, Jun 14, 2023, at 15:16, Tomas Vondra wrote:
> On 6/14/23 14:57, Joel Jacobson wrote:
>> Would it be feasible to teach the planner to utilize the internal hash table 
>> of
>> hashset directly? In the case of arrays, the hash table construction is an
...
> It's definitely something I'd leave out of v0, personally.

OK, thanks for guidance, I'll stay away from it.

I've been doing some preparatory work on this todo item:

> 3) support for other types (now it only works with int32)

I've renamed the type from "hashset" to "int4hashset",
and the SQL-functions are now prefixed with "int4"
when necessary. The overloaded functions with
int4hashset as input parameters don't need to be prefixed,
e.g. hashset_add(int4hashset, int).

Other changes since last update (4e60615):

* Support creation of empty hashset using '{}'::hashset
* Introduced a new function hashset_capacity() to return the current capacity
  of a hashset.
* Refactored hashset initialization:
  - Replaced hashset_init(int) with int4hashset() to initialize an empty hashset
with zero capacity.
  - Added int4hashset_with_capacity(int) to initialize a hashset with
a specified capacity.
* Improved README.md and testing

As a next step, I'm planning on adding int8 support.

Looks and sounds good?

/Joel

hashset-0.0.1-48e29eb
Description: Binary data


Re: trying again to get incremental backup

2023-06-14 Thread Andres Freund
Hi,

On 2023-06-14 16:10:38 -0400, Robert Haas wrote:
> On Wed, Jun 14, 2023 at 3:47 PM Andres Freund  wrote:
> > Could we just recompute the WAL summary for the [redo, end of chunk] for the
> > relevant summary file?
> 
> I'm not understanding how that would help. If we were going to compute
> a WAL summary on the fly rather than waiting for one to show up on
> disk, what we'd want is [end of last WAL summary that does exist on
> disk, redo].

Oh, right.


> But I'm not sure that's a great approach, because that LSN gap might be
> large and then we're duplicating a lot of work that the summarizer has
> probably already done most of.

I guess that really depends on what the summary granularity is. If you create
a separate summary every 32MB or so, recomputing just the required range
shouldn't be too bad.


> > FWIW, I like the idea of a special WAL record at that point, independent of
> > this feature. It wouldn't be a meaningful overhead compared to the cost of a
> > checkpoint, and it seems like it'd be quite useful for debugging. But I can
> > see uses going beyond that - we occasionally have been discussing 
> > associating
> > additional data with redo points, and that'd be a lot easier to deal with
> > during recovery with such a record.
> >
> > I don't really see a performance and concurrency angle right now - what are
> > you wondering about?
> 
> I'm not really sure. I expect Dilip would be happy to post his patch,
> and if you'd be willing to have a look at it and express your concerns
> or lack thereof, that would be super valuable.

Will do. Adding me to CC: might help, I have a backlog unfortunately :(.


Greetings,

Andres Freund




Re: trying again to get incremental backup

2023-06-14 Thread Matthias van de Meent
On Wed, 14 Jun 2023 at 20:47, Robert Haas  wrote:
>
> A few years ago, I sketched out a design for incremental backup, but
> no patch for incremental backup ever got committed. Instead, the whole
> thing evolved into a project to add backup manifests, which are nice,
> but not as nice as incremental backup would be. So I've decided to
> have another go at incremental backup itself. Attached are some WIP
> patches.

Nice, I like this idea.

> Let me summarize the design and some open questions and
> problems with it that I've discovered. I welcome problem reports and
> test results from others, as well.

Skimming through the 7th patch, I see claims that FSM is not fully
WAL-logged and thus shouldn't be tracked, and so it indeed doesn't
track those changes.
I disagree with that decision: we now have support for custom resource
managers, which may use the various forks for other purposes than
those used in PostgreSQL right now. It would be a shame if data is
lost because of the backup tool ignoring forks because the PostgreSQL
project itself doesn't have post-recovery consistency guarantees in
that fork. So, unless we document that WAL-logged changes in the FSM
fork are actually not recoverable from backup, regardless of the type
of contents, we should still keep track of the changes in the FSM fork
and include the fork in our backups or only exclude those FSM updates
that we know are safe to ignore.

Kind regards,

Matthias van de Meent
Neon, Inc.




Re: trying again to get incremental backup

2023-06-14 Thread Robert Haas
On Wed, Jun 14, 2023 at 3:47 PM Andres Freund  wrote:
> I assume this is "solely" required for keeping the incremental backups as
> small as possible, rather than being required for correctness?

I believe so. I want to spend some more time thinking about this to
make sure I'm not missing anything.

> Could we just recompute the WAL summary for the [redo, end of chunk] for the
> relevant summary file?

I'm not understanding how that would help. If we were going to compute
a WAL summary on the fly rather than waiting for one to show up on
disk, what we'd want is [end of last WAL summary that does exist on
disk, redo]. But I'm not sure that's a great approach, because that
LSN gap might be large and then we're duplicating a lot of work that
the summarizer has probably already done most of.

> FWIW, I like the idea of a special WAL record at that point, independent of
> this feature. It wouldn't be a meaningful overhead compared to the cost of a
> checkpoint, and it seems like it'd be quite useful for debugging. But I can
> see uses going beyond that - we occasionally have been discussing associating
> additional data with redo points, and that'd be a lot easier to deal with
> during recovery with such a record.
>
> I don't really see a performance and concurrency angle right now - what are
> you wondering about?

I'm not really sure. I expect Dilip would be happy to post his patch,
and if you'd be willing to have a look at it and express your concerns
or lack thereof, that would be super valuable.

> > Another thing that I'm not too sure about is: what happens if we find
> > a relation file on disk that doesn't appear in the backup_manifest for
> > the previous backup and isn't mentioned in the WAL summaries either?
>
> Wouldn't that commonly happen for unlogged relations at least?
>
> I suspect there's also other ways to end up with such additional files,
> e.g. by crashing during the creation of a new relation.

Yeah, this needs some more careful thought.

> > A few less-serious problems with the patch:
> >
> > - We don't have an incremental JSON parser, so if you have a
> > backup_manifest>1GB, pg_basebackup --incremental is going to fail.
> > That's also true of the existing code in pg_verifybackup, and for the
> > same reason. I talked to Andrew Dunstan at one point about adapting
> > our JSON parser to support incremental parsing, and he had a patch for
> > that, but I think he found some problems with it and I'm not sure what
> > the current status is.
>
> As a stopgap measure, can't we just use the relevant flag to allow larger
> allocations?

I'm not sure that's a good idea, but theoretically, yes. We can also
just choose to accept the limitation that your data directory can't be
too darn big if you want to use this feature. But getting incremental
JSON parsing would be better.

Not having the manifest in JSON would be an even better solution, but
regrettably I did not win that argument.

> That seems like a feature for the future...

Sure.

> I don't know the tar format well, but my understanding is that it doesn't have
> a "central metadata" portion. I.e. doing something like this would entail
> scanning the tar file sequentially, skipping file contents?  And wouldn't you
> have to create an entirely new tar file for the modified output? That kind of
> makes it not so incremental ;)
>
> IOW, I'm not sure it's worth bothering about this ever, and certainly doesn't
> seem worth bothering about now. But I might just be missing something.

Oh, yeah, it's just an idle thought. I'll get to it when I get to it,
or else I won't.

-- 
Robert Haas
EDB: http://www.enterprisedb.com




Re: Let's make PostgreSQL multi-threaded

2023-06-14 Thread Robert Haas
On Wed, Jun 14, 2023 at 3:46 PM Hannu Krosing  wrote:
> I remember a few times when memory leaks in some PostGIS packages
> cause slow memory exhaustion and the simple fix was limiting
> connection lifetime to something between 15 min and an hour.
>
> The main problem here is that PostGIS uses a few tens of other GPL GIS
> related packages which are all changing independently and thus it is
> quite hard to be sure that none of these have developed a leak. And
> you also likely can not just stop upgrading these as they also contain
> security fixes.
>
> I have no idea what the fix could be in case of threaded server.

Presumably, when a thread exits, we
MemoryContextDelete(TopMemoryContext). If the leak is into any memory
context managed by PostgreSQL, this still frees the memory. But it
might not be. Right now, if a library does a malloc() that it doesn't
free() every once in a while, it's no big deal. If it does it too
often, it's a problem now, too. But if it does it only every now and
then, process exit will prevent accumulation over time. In a threaded
model, that isn't true any longer: those allocations will accumulate
until we OOM.
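
For illustration, a minimal sketch of that failure mode (hypothetical
third-party code, not from anything posted in this thread):

#include <stdlib.h>

/*
 * Hypothetical library call that leaks a little raw malloc'd memory on
 * each invocation; nothing here is tracked by a PostgreSQL MemoryContext,
 * so MemoryContextDelete(TopMemoryContext) at thread exit won't free it.
 */
static void
leaky_library_call(void)
{
	void	   *scratch = malloc(64);

	(void) scratch;				/* never free()d */
}

Today, process exit puts a ceiling on how much such a call can leak per
connection; with threads, that ceiling goes away.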

And IMHO that's definitely a very significant downside of this
direction. I don't think it should be dispositive because such
problems are, hopefully, fixable, whereas some of the problems caused
by the process model are basically unfixable except by not using it
any more. However, if we lived in a world where both models were
supported and a particular user said, "hey, I'm sticking with the
process model because I don't trust my third-party libraries not to
leak," I would be like "yep, I totally get it."

-- 
Robert Haas
EDB: http://www.enterprisedb.com




Re: Should heapam_estimate_rel_size consider fillfactor?

2023-06-14 Thread Andres Freund
Hi,

On 2023-06-11 14:41:27 +0200, Tomas Vondra wrote:
> While testing some stuff, I noticed heapam_estimate_rel_size (or rather
> table_block_relation_estimate_size it calls) ignores fillfactor, so that
> for a table without statistics it ends up with reltuple estimate much
> higher than reality. For example, with fillfactor=10 the estimate is
> about 10x higher.
> 
> I ran into this while doing some tests with hash indexes, where I use
> fillfactor to make the table look bigger (as if the tuples were wider),
> and I ran into this:
> 
>   drop table hash_test;
>   create table hash_test (a int, b text) with (fillfactor=10);
>   insert into hash_test select 1 + ((i - 1) / 1), md5(i::text)
>  from generate_series(1, 100) s(i);
>   -- analyze hash_test;
>   create index hash_test_idx on hash_test using hash (a);
> 
>   select pg_size_pretty(pg_relation_size('hash_test_idx'));
> 
> If you run it like this (without the analyze), the index will be 339MB.
> With the analyze, it's 47MB.
> 
> This only happens if there are no relpages/reltuples statistics yet, in
> which case table_block_relation_estimate_size estimates density from
> tuple width etc.
> 
> So it seems the easiest "fix" is to do ANALYZE before creating the index
> (and not after it, as I had in my scripts). But I wonder how many people
> fail to realize this - it sure took me a while to realize the indexes
> are too large and even longer what is causing it. I wouldn't be very
> surprised if many people had bloated hash indexes after bulk loads.
> 
> So maybe we should make table_block_relation_estimate_size smarter to
> also consider the fillfactor in the "no statistics" branch, per the
> attached patch.

Seems like a good idea - I can't think of a reason why we shouldn't do so.

Greetings,

Andres Freund




Re: Let's make PostgreSQL multi-threaded

2023-06-14 Thread Andres Freund
Hi,

On 2023-06-13 16:55:12 +0900, Kyotaro Horiguchi wrote:
> At Tue, 13 Jun 2023 09:55:36 +0300, Konstantin Knizhnik  
> wrote in 
> > Postgres backend is "thick" not because of large number of local
> > variables.
> > It is because of local caches: catalog cache, relation cache, prepared
> > statements cache,...
> > If they are not rewritten, then backend still may consume a lot of
> > memory even if it will be thread rather then process.
> > But threads simplify development of global caches, although it can be
> > done with DSM.
> 
> With the process model, that local stuff are flushed out upon
> reconnection. If we switch to the thread model, we will need an
> expiration mechanism for those stuff.

Isn't that just doing something like MemoryContextDelete(TopMemoryContext) at
the end of proc_exit() (or its thread equivalent)?
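
For concreteness, the thread-exit hook could look roughly like this
(sketch only; the hook name and where it gets registered are made up,
while MemoryContextDelete() and TopMemoryContext are the existing APIs):

#include "postgres.h"
#include "utils/memutils.h"

/* Hypothetical per-thread teardown, mirroring what proc_exit() gives us today. */
static void
backend_thread_exit(void)
{
	/*
	 * Drops the whole per-backend context tree: catalog/relation caches,
	 * prepared statements, and anything else palloc'd under it.
	 */
	MemoryContextDelete(TopMemoryContext);
}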

Greetings,

Andres Freund




Re: On login trigger: take three

2023-06-14 Thread Mikhail Gribkov
Hi hackers,

The attached v40 patch is a fresh rebase on master branch to actualize the
state before the upcoming commitfest.
Nothing has changed beyond rebasing.

And just for convenience, here is a link to the exact message of the thread
describing the current approach:
https://www.postgresql.org/message-id/CAMEv5_vg4aJOoUC74XJm%2B5B7%2BTF1nT-Yhtg%2BawtBOESXG5Grfg%40mail.gmail.com

--
 best regards,
Mikhail A. Gribkov

e-mail: youzh...@gmail.com
*http://www.flickr.com/photos/youzhick/albums
*
http://www.strava.com/athletes/5085772
phone: +7(916)604-71-12
Telegram: @youzhick



On Wed, Mar 22, 2023 at 10:38 PM Daniel Gustafsson  wrote:

> > On 22 Mar 2023, at 18:54, Robert Haas  wrote:
>
> > Basically, I think 0001 is a good idea -- I'm much more nervous about
> > 0002. I think we should get 0001 polished up and committed.
>
> Correct me if I'm wrong, but I believe you commented on v27-0001 of the
> login
> event trigger patch series?  Sorry about the confusion if so, this is a
> very
> old and lengthy thread with many twists and turns.  This series was closed
> downthread when the original authors of login EVT left, and the 0001 GUC
> patch
> extracted into its own thread.  That patch now lives at:
>
> https://commitfest.postgresql.org/42/4013/
>
> This thread was then later revived by Mikhail Gribkov but without 0001
> instead
> referring to the above patch for that part.
>
> The new patch for 0001 is quite different, and I welcome your eyes on that
> since I agree with you that it would be a good idea.
>
> --
> Daniel Gustafsson
>
>
>
>


v40-On_client_login_event_trigger.patch
Description: Binary data


Re: Let's make PostgreSQL multi-threaded

2023-06-14 Thread Robert Haas
On Wed, Jun 14, 2023 at 3:16 PM James Addison  wrote:
> I think that they're practical performance-related questions about the
> benefits of performing a technical migration that could involve
> significant development time, take years to complete, and uncover
> problems that cause reliability issues for a stable, proven database
> management system.

I don't. I think they're reflecting confusion about what the actual,
practical path forward is.

For a first cut at this, all of our global variables become
thread-local. Every single last one of them. So there's no savings of
the type described in that email. We do each and every thing just as
we do it today, except that it's all in different parts of a single
address space instead of different address spaces with a chunk of
shared memory mapped into each one. Syscaches don't change, catcaches
don't change, memory copying is not reduced, literally nothing
changes. The coding model is just as it is today. Except for
decorating global variables, virtually no backend code needs to notice
or care about the transition. There are a few exceptions. For
instance, TopMemoryContext would need to be deleted explicitly, and
the FD caching stuff would have to be revised, because it uses up all
the FDs that the process can open, and having many threads doing that
in a single process isn't going to work. There's probably some other
things that I'm forgetting, but the typical effect on the average bit
of backend code should be very, very low. If it isn't, we're doing it
wrong.
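
To make "decorating global variables" concrete, the mechanical change per
variable could be as small as this (variable names invented, and the exact
macro spelling we would settle on is not decided here; the plain C11
keyword is shown):

/* today: one copy per backend process */
static int	cached_plan_count = 0;

/* threaded first cut: one copy per backend thread */
static _Thread_local int cached_plan_count_tl = 0;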

So, I think saying "oh, this is going to destabliize PostgreSQL for
years" is just fear-mongering. If someone proposes a patch that we
think is going to have that effect, we should (and certainly will)
reject it. But I see no reason why we can't have a good patch for this
where most code changes only in mechanical ways that are easy to
validate.

> This reads like a code quality argument: that's worthwhile, but I
> don't see how it supports your 'False' assertions.  Do two queries
> running in separate processes spend much time allocating and waiting
> on resources that could be shared within a single thread?

I don't have any idea what this has to do with what Andres was talking
about, honestly. However, there certainly are cases of the thing
you're talking about here. Having many backends separately open the
same file means we've got a whole bunch of different file descriptors
accessing the same file instead of just one. That does have a
meaningful cost on some workloads. Passing tuples between cooperating
processes that are jointly executing a parallel query is costly in the
current scheme, too. There might be ways to improve on that somewhat
even without threads, but if you don't think that the process model
made getting parallel query working harder and less efficient, I'm
here as the guy who wrote a lot of that code to tell you that it very
much did.

> That seems valid.  Even so, I would expect that for many queries, I/O
> access and row processing time is the bulk of the work, and that
> context-switches to/from other query processes is relatively
> negligible.

That's completely true, but there are ALSO many OTHER situations in
which the overhead of frequent context switching is absolutely
crushing. You might as well argue that umbrellas don't need to exist
because there are lots of sunny days.

-- 
Robert Haas
EDB: http://www.enterprisedb.com




Re: trying again to get incremental backup

2023-06-14 Thread Andres Freund
Hi,

On 2023-06-14 14:46:48 -0400, Robert Haas wrote:
> A few years ago, I sketched out a design for incremental backup, but
> no patch for incremental backup ever got committed. Instead, the whole
> thing evolved into a project to add backup manifests, which are nice,
> but not as nice as incremental backup would be. So I've decided to
> have another go at incremental backup itself. Attached are some WIP
> patches. Let me summarize the design and some open questions and
> problems with it that I've discovered. I welcome problem reports and
> test results from others, as well.

Cool!


> I originally had the idea of summarizing a certain number of MB of WAL
> per WAL summary file, and so I added a GUC wal_summarize_mb for that
> purpose. But then I realized that actually, you really want WAL
> summary file boundaries to line up with possible redo points, because
> when you do an incremental backup, you need a summary that stretches
> from the redo point of the checkpoint written at the start of the
> prior backup to the redo point of the checkpoint written at the start
> of the current backup. The block modifications that happen in that
> range of WAL records are the ones that need to be included in the
> incremental.

I assume this is "solely" required for keeping the incremental backups as
small as possible, rather than being required for correctness?


> Unfortunately, there's no indication in the WAL itself
> that you've reached a redo point, but I wrote code that tries to
> notice when we've reached the redo point stored in shared memory and
> stops the summary there. But I eventually realized that's not good
> enough either, because if summarization zooms past the redo point
> before noticing the updated redo point in shared memory, then the
> backup sat around waiting for the next summary file to be generated so
> it had enough summaries to proceed with the backup, while the
> summarizer was in no hurry to finish up the current file and just sat
> there waiting for more WAL to be generated. Eventually the incremental
> backup would just time out. I tried to fix that by making it so that
> if somebody's waiting for a summary file to be generated, they can let
> the summarizer know about that and it can write a summary file ending
> at the LSN up to which it has read and then begin a new file from
> there. That seems to fix the hangs, but now I've got three
> overlapping, interconnected systems for deciding where to end the
> current summary file, and maybe that's OK, but I have a feeling there
> might be a better way.

Could we just recompute the WAL summary for the [redo, end of chunk] for the
relevant summary file?


> Dilip had an interesting potential solution to this problem, which was
> to always emit a special WAL record at the redo pointer. That is, when
> we fix the redo pointer for the checkpoint record we're about to
> write, also insert a WAL record there. That way, when the summarizer
> reaches that sentinel record, it knows it should stop the summary just
> before. I'm not sure whether this approach is viable, especially from
> a performance and concurrency perspective, and I'm not sure whether
> people here would like it, but it does seem like it would make things
> a whole lot simpler for this patch set.

FWIW, I like the idea of a special WAL record at that point, independent of
this feature. It wouldn't be a meaningful overhead compared to the cost of a
checkpoint, and it seems like it'd be quite useful for debugging. But I can
see uses going beyond that - we occasionally have been discussing associating
additional data with redo points, and that'd be a lot easier to deal with
during recovery with such a record.

I don't really see a performance and concurrency angle right now - what are
you wondering about?


> Another thing that I'm not too sure about is: what happens if we find
> a relation file on disk that doesn't appear in the backup_manifest for
> the previous backup and isn't mentioned in the WAL summaries either?

Wouldn't that commonly happen for unlogged relations at least?

I suspect there's also other ways to end up with such additional files,
e.g. by crashing during the creation of a new relation.


> A few less-serious problems with the patch:
> 
> - We don't have an incremental JSON parser, so if you have a
> backup_manifest>1GB, pg_basebackup --incremental is going to fail.
> That's also true of the existing code in pg_verifybackup, and for the
> same reason. I talked to Andrew Dunstan at one point about adapting
> our JSON parser to support incremental parsing, and he had a patch for
> that, but I think he found some problems with it and I'm not sure what
> the current status is.

As a stopgap measure, can't we just use the relevant flag to allow larger
allocations?


> - The patch does support differential backup, aka an incremental atop
> another incremental. There's no particular limit to how long a chain
> of backups can be. However, pg_combinebackup currently 

Re: Let's make PostgreSQL multi-threaded

2023-06-14 Thread Hannu Krosing
On Tue, Jun 13, 2023 at 9:55 AM Kyotaro Horiguchi
 wrote:
>
> At Tue, 13 Jun 2023 09:55:36 +0300, Konstantin Knizhnik  
> wrote in
> > Postgres backend is "thick" not because of large number of local
> > variables.
> > It is because of local caches: catalog cache, relation cache, prepared
> > statements cache,...
> > If they are not rewritten, then backend still may consume a lot of
> > memory even if it will be thread rather then process.
> > But threads simplify development of global caches, although it can be
> > done with DSM.
>
> With the process model, that local stuff are flushed out upon
> reconnection. If we switch to the thread model, we will need an
> expiration mechanism for those stuff.

The part that can not be so easily solved is that "the local stuff"
can include some leakage that is not directly controlled by us.

I remember a few times when memory leaks in some PostGIS packages
caused slow memory exhaustion, and the simple fix was limiting
connection lifetime to something between 15 min and an hour.

The main problem here is that PostGIS uses a few tens of other GPL GIS
related packages which are all changing independently and thus it is
quite hard to be sure that none of these have developed a leak. And
you also likely can not just stop upgrading these as they also contain
security fixes.

I have no idea what the fix could be in case of threaded server.




Re: [PATCH] Missing dep on Catalog.pm in meson rules

2023-06-14 Thread Andres Freund
Hi,

On 2023-06-09 11:43:54 -0700, Andres Freund wrote:
> On 2023-06-02 10:13:44 -0500, Tristan Partin wrote:
> > On Fri Jun 2, 2023 at 8:47 AM CDT, Andres Freund wrote:
> > > Hi,
> > >
> > > On 2023-06-02 08:10:43 -0500, Tristan Partin wrote:
> > > > > I wonder if we instead could just make perl output the files it loads 
> > > > > and
> > > > > handle dependencies automatically that way? But that's more work, so 
> > > > > it's
> > > > > probably the right thing to go for the manual path for now.
> > > > 
> > > > I am not familar with Perl enough (at all haha) to know if that is
> > > > possible. I don't know exactly what these Perl files do, but perhaps it
> > > > might make sense to have some global lookup table that is setup near the
> > > > beginning of the script.
> > >
> > > It'd be nice to have something more general - there are other perl 
> > > modules we
> > > load, e.g.
> > > ./src/backend/catalog/Catalog.pm
> > > ./src/backend/utils/mb/Unicode/convutils.pm
> > > ./src/tools/PerfectHash.pm
> > >
> > >
> > > > perl_files = {
> > > >   'Catalog.pm': files('path/to/Catalog.pm'),
> > > >   ...
> > > > }
> > >
> > > I think you got it, but just to make sure: I was thinking of generating a
> > > depfile from within perl. Something like what you propose doesn't quite 
> > > seems
> > > like a sufficient improvement.
> > 
> > Whatever I am proposing is definitely subpar to generating a depfile. So
> > if that can be done, that is the best option!
> 
> I looked for a bit, but couldn't find an easy way to do so. I would still like
> to pursue going towards dep files for the perl scripts, even if that requires
> explicit support in the perl scripts, but that's a change for later.

Took a second look - sure looks like just using values %INC should suffice?

Ilmari, you're the perl expert, is there an issue with that?

Tristan, any chance you're interested hacking that up for a bunch of the
scripts? Might be worth adding a common helper for, I guess?

Something like

for (values %INC)
{
print STDERR "$kw_def_file: $_\n";
}

seems to roughly do the right thing for gen_keywordlist.pl. Of course, for
something real it'd need an option for where to put that data, instead of
printing to stderr.

Greetings,

Andres Freund




Re: Let's make PostgreSQL multi-threaded

2023-06-14 Thread James Addison
On Mon, 12 Jun 2023 at 20:24, Andres Freund  wrote:
>
> Hi,
>
> On 2023-06-12 16:23:14 +0400, Pavel Borisov wrote:
> > Is the following true or not?
> >
> > 1. If we switch processes to threads but leave the amount of session
> > local variables unchanged, there would be hardly any performance gain.
>
> False.
>
>
> > 2. If we move some backend's local variables into shared memory then
> > the performance gain would be very near to what we get with threads
> > having equal amount of session-local variables.
>
> False.
>
>
> > In other words, the overall goal in principle is to gain from less
> > memory copying wherever it doesn't add the burden of locks for
> > concurrent variables access?
>
> False.
>
> Those points seems pretty much unrelated to the potential gains from switching
> to a threading model. The main advantages are:

I think that they're practical performance-related questions about the
benefits of performing a technical migration that could involve
significant development time, take years to complete, and uncover
problems that cause reliability issues for a stable, proven database
management system.

> 1) We'd gain from being able to share state more efficiently (using normal
>pointers) and more dynamically (not needing to pre-allocate). That'd remove
>a good amount of complexity. As an example, consider the work we need to do
>to ferry tuples from one process to another. Even if we just continue to
>use shm_mq, in a threading world we could just put a pointer in the queue,
>but have the tuple data be shared between the processes etc.
>
>Eventually this could include removing the 1:1 connection<->process/thread
>model. That's possible to do with processes as well, but considerably
>harder.

This reads like a code quality argument: that's worthwhile, but I
don't see how it supports your 'False' assertions.  Do two queries
running in separate processes spend much time allocating and waiting
on resources that could be shared within a single thread?

> 2) Making context switches cheaper / sharing more resources at the OS and
>hardware level.

That seems valid.  Even so, I would expect that for many queries, I/O
access and row processing time is the bulk of the work, and that
context-switches to/from other query processes is relatively
negligible.




Re: Should heapam_estimate_rel_size consider fillfactor?

2023-06-14 Thread Corey Huinker
>
> So maybe we should make table_block_relation_estimate_size smarter to
> also consider the fillfactor in the "no statistics" branch, per the
> attached patch.
>

I like this a lot. The reasoning is obvious, the fix is simple, it doesn't
upset any make-check-world tests, and in order to get a performance
regression we'd need a table whose fillfactor has been changed after the
data was loaded but before an analyze happens, and that's a narrow enough
case to accept.

My only nitpick is to swap

(usable_bytes_per_page * fillfactor / 100) / tuple_width

with

(usable_bytes_per_page * fillfactor) / (tuple_width * 100)


as this will eliminate the extra remainder truncation, and it also gets the
arguments "in order" algebraically.
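
For a sense of the scale involved, plugging assumed numbers into that
expression (8168 usable bytes per page and a 40-byte tuple width are
made-up example values) shows the roughly 10x overestimate reported
upthread when fillfactor=10 is ignored:

#include <stdio.h>

int
main(void)
{
	int			usable_bytes_per_page = 8168;
	int			tuple_width = 40;
	int			fillfactor = 10;

	/* no-statistics density estimate, without and with fillfactor */
	printf("ignoring fillfactor: %d tuples/page\n",
		   usable_bytes_per_page / tuple_width);
	printf("with fillfactor=%d:  %d tuples/page\n",
		   fillfactor,
		   (usable_bytes_per_page * fillfactor / 100) / tuple_width);
	return 0;
}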


Re: allow granting CLUSTER, REFRESH MATERIALIZED VIEW, and REINDEX

2023-06-14 Thread Jeff Davis
On Tue, 2023-06-13 at 14:12 -0700, Nathan Bossart wrote:
> I've been reviewing ff9618e lately, and I'm wondering whether it has
> the
> same problem that 19de0ab solved.  Specifically, ff9618e introduces
> has_partition_ancestor_privs(), which is used to check whether a user
> has
> MAINTAIN on any partition ancestors.  This involves syscache lookups,
> and
> presently this function does not take any relation locks.  I did
> spend some
> time trying to induce cache lookup errors, but I didn't have any
> luck.
> However, unless this can be made safe without too much trouble, I
> think I'm
> inclined to partially revert ff9618e, leaving the TOAST-related parts
> intact.

Agreed. Having it work on partition hierarchies is a nice-to-have, but
not central to the usability of the feature. If it's causing problems,
best to take that out and reconsider in 17 if worthwhile.

Regards,
Jeff Davis





Re: allow granting CLUSTER, REFRESH MATERIALIZED VIEW, and REINDEX

2023-06-14 Thread Nathan Bossart
On Tue, Jun 13, 2023 at 04:54:42PM -0700, Nathan Bossart wrote:
> On Wed, Jun 14, 2023 at 08:16:15AM +0900, Michael Paquier wrote:
>> So, yes, agreed about the removal of has_partition_ancestor_privs().
>> I am adding an open item assigned to you and Jeff.
> 
> Thanks.  I suspect there's more discussion incoming, but I'm hoping to
> close this item one way or another by 16beta2.

Concretely, I am proposing something like the attached patches.

-- 
Nathan Bossart
Amazon Web Services: https://aws.amazon.com
>From 8e93be7a0b4218a31f83b9fca53aa224bc9f5788 Mon Sep 17 00:00:00 2001
From: Nathan Bossart 
Date: Wed, 14 Jun 2023 10:54:05 -0700
Subject: [PATCH v1 1/2] partial revert of ff9618e82a

---
 doc/src/sgml/ref/analyze.sgml |  5 +---
 doc/src/sgml/ref/cluster.sgml |  6 +---
 doc/src/sgml/ref/lock.sgml|  5 +---
 doc/src/sgml/ref/reindex.sgml |  6 +---
 doc/src/sgml/ref/vacuum.sgml  |  5 +---
 src/backend/commands/cluster.c| 10 ++-
 src/backend/commands/indexcmds.c  | 19 +---
 src/backend/commands/lockcmds.c   |  8 -
 src/backend/commands/tablecmds.c  | 29 +--
 src/backend/commands/vacuum.c |  6 +---
 src/include/commands/tablecmds.h  |  1 -
 .../expected/cluster-conflict-partition.out   | 14 -
 .../specs/cluster-conflict-partition.spec |  7 +++--
 src/test/regress/expected/cluster.out |  3 +-
 src/test/regress/expected/vacuum.out  | 18 
 15 files changed, 49 insertions(+), 93 deletions(-)

diff --git a/doc/src/sgml/ref/analyze.sgml b/doc/src/sgml/ref/analyze.sgml
index 20c6f9939f..30a893230e 100644
--- a/doc/src/sgml/ref/analyze.sgml
+++ b/doc/src/sgml/ref/analyze.sgml
@@ -190,10 +190,7 @@ ANALYZE [ VERBOSE ] [ table_and_columnsANALYZE can only be performed by superusers and roles
-   with privileges of pg_maintain.)  If a role has
-   permission to ANALYZE a partitioned table, it is also
-   permitted to ANALYZE each of its partitions, regardless
-   of whether the role has the aforementioned privileges on the partition.
+   with privileges of pg_maintain.)
ANALYZE will skip over any tables that the calling user
does not have permission to analyze.
   
diff --git a/doc/src/sgml/ref/cluster.sgml b/doc/src/sgml/ref/cluster.sgml
index 29f0f1fd90..c768175499 100644
--- a/doc/src/sgml/ref/cluster.sgml
+++ b/doc/src/sgml/ref/cluster.sgml
@@ -137,11 +137,7 @@ CLUSTER [VERBOSE]
 on the table or be the table's owner, a superuser, or a role with
 privileges of the
 pg_maintain
-role.  If a role has permission to CLUSTER a partitioned
-table, it is also permitted to CLUSTER each of its
-partitions, regardless of whether the role has the aforementioned
-privileges on the partition.  CLUSTER will skip over any
-tables that the calling user does not have permission to cluster.
+role.

 

diff --git a/doc/src/sgml/ref/lock.sgml b/doc/src/sgml/ref/lock.sgml
index 5b3b2b793a..8524182211 100644
--- a/doc/src/sgml/ref/lock.sgml
+++ b/doc/src/sgml/ref/lock.sgml
@@ -177,10 +177,7 @@ LOCK [ TABLE ] [ ONLY ] name [ * ]
 MODE (or a less-conflicting mode as described in ) is permitted. If a user has
 SELECT privileges on the table, ACCESS SHARE
-MODE is permitted.  If a role has permission to lock a
-partitioned table, it is also permitted to lock each of its partitions,
-regardless of whether the role has the aforementioned privileges on the
-partition.
+MODE is permitted.

 

diff --git a/doc/src/sgml/ref/reindex.sgml b/doc/src/sgml/ref/reindex.sgml
index 71455dfdc7..23f8c7630b 100644
--- a/doc/src/sgml/ref/reindex.sgml
+++ b/doc/src/sgml/ref/reindex.sgml
@@ -306,11 +306,7 @@ REINDEX [ ( option [, ...] ) ] { DA
indexes on shared catalogs will be skipped unless the user owns the
catalog (which typically won't be the case), has privileges of the
pg_maintain role, or has the MAINTAIN
-   privilege on the catalog.  If a role has permission to
-   REINDEX a partitioned table, it is also permitted to
-   REINDEX each of its partitions, regardless of whether the
-   role has the aforementioned privileges on the partition.  Of course,
-   superusers can always reindex anything.
+   privilege on the catalog.  Of course, superusers can always reindex anything.
   
 
   
diff --git a/doc/src/sgml/ref/vacuum.sgml b/doc/src/sgml/ref/vacuum.sgml
index 57bc4c23ec..445325e14c 100644
--- a/doc/src/sgml/ref/vacuum.sgml
+++ b/doc/src/sgml/ref/vacuum.sgml
@@ -452,10 +452,7 @@ VACUUM [ FULL ] [ FREEZE ] [ VERBOSE ] [ ANALYZE ] [ relname)));
 
 	/* Check permissions */
-	if (pg_class_aclcheck(relId, GetUserId(), ACL_MAINTAIN) != ACLCHECK_OK &&
-		!has_partition_ancestor_privs(relId, GetUserId(), ACL_MAINTAIN))
+	if (pg_class_aclcheck(relId, GetUserId(), ACL_MAINTAIN) != ACLCHECK_OK)
 		aclcheck_error(ACLCHECK_NOT_OWNER, OB

Re: remap the .text segment into huge pages at run time

2023-06-14 Thread Andres Freund
Hi,

On 2023-06-14 12:40:18 +0700, John Naylor wrote:
> On Sat, Nov 5, 2022 at 3:27 PM Andres Freund  wrote:
> 
> > /*
> >  * Make huge pages out of it. Requires at least linux 6.1.  We could
> >  * fall back to MADV_HUGEPAGE if it fails, but it doesn't do all that
> >  * much in older kernels.
> >  */
> > #define MADV_COLLAPSE 25
> > r = madvise(addr, advlen, MADV_COLLAPSE);
> > if (r != 0)
> >     fprintf(stderr, "MADV_COLLAPSE failed: %m\n");
> >
> > A real version would have to open /proc/self/maps and do this for at least
> > postgres' r-xp mapping. We could do it for libraries too, if they're suitably
> > aligned (both in memory and on-disk).
> 
> Hi Andres, my kernel has been new enough for a while now, and since TLBs
> and context switches came up in the thread on... threads, I'm swapping this
> back in my head.

Cool - I think we have some real potential for substantial wins around this.


> For the postmaster, it should be simple to have a function that just takes
> the address of itself, then parses /proc/self/maps to find the boundaries
> within which it lies. I haven't thought about libraries much. Though with
> just the postmaster it seems that would give us the biggest bang for the
> buck?

I think that is the main bit, yes. We could just try to do this for the
libraries, but accept failure to do so?
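
For the postmaster part, a rough sketch of that lookup (untested,
Linux-only; the MADV_COLLAPSE value is taken from the snippet quoted
above, and the function and variable names are invented):

#include <inttypes.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>
#include <sys/mman.h>

#ifndef MADV_COLLAPSE
#define MADV_COLLAPSE 25		/* linux >= 6.1 */
#endif

/*
 * Find the executable mapping that contains 'probe' (e.g. the address of
 * a function in the postgres binary) and ask the kernel to collapse it
 * into huge pages.  No alignment handling, no MADV_HUGEPAGE fallback.
 */
static int
collapse_own_text(const void *probe)
{
	FILE	   *maps = fopen("/proc/self/maps", "r");
	char		line[512];
	int			rc = -1;

	if (maps == NULL)
		return -1;

	while (fgets(line, sizeof(line), maps))
	{
		uintptr_t	start,
					end;
		char		perms[5];

		if (sscanf(line, "%" SCNxPTR "-%" SCNxPTR " %4s", &start, &end, perms) != 3)
			continue;
		if (strchr(perms, 'x') == NULL)
			continue;
		if ((uintptr_t) probe < start || (uintptr_t) probe >= end)
			continue;

		rc = madvise((void *) start, end - start, MADV_COLLAPSE);
		break;
	}

	fclose(maps);
	return rc;
}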

Greetings,

Andres Freund




Re: RFC: Logging plan of the running query

2023-06-14 Thread James Coleman
On Tue, Jun 13, 2023 at 11:53 AM James Coleman  wrote:
>
> ...
> I'm going to re-run tests with my patch version + resetting the flag
> on SIGINT (and any other error condition) to be certain that the issue
> you uncovered (where backends get stuck after a SIGINT not responding
> to the requested plan logging) wasn't masking any other issues.
>
> As long as that run is clean also then I believe the patch is safe
> as-is even without the re-entrancy guard.
>
> I'll report back with the results of that testing.

The tests have been running since last night, but have apparently been
hung for many hours now. I haven't been able to fully look into it,
but so far I know the hung (100% CPU) backend last logged this:

2023-06-14 02:00:30.045 UTC client backend[84461]
pg_regress/updatable_views LOG:  query plan running on backend with
PID 84461 is:
Query Text: SELECT table_name, column_name, is_updatable
  FROM information_schema.columns
 WHERE table_name LIKE E'r_\\_view%'
 ORDER BY table_name, ordinal_position;

The last output from the regression test harness was:

# parallel group (5 tests):  index_including create_view
index_including_gist create_index create_index_spgist
ok 66+ create_index36508 ms
ok 67+ create_index_spgist 38588 ms
ok 68+ create_view  1394 ms
ok 69+ index_including   654 ms
ok 70+ index_including_gist 1701 ms
# parallel group (16 tests):  errors create_cast drop_if_exists
create_aggregate roleattributes constraints hash_func typed_table
infinite_recurse

Attaching gdb to the hung backend shows this:

#0  0x5601ab1f9529 in ProcLockWakeup
(lockMethodTable=lockMethodTable@entry=0x5601ab6484e0
, lock=lock@entry=0x7f5325c913f0) at proc.c:1655
#1  0x5601ab1e99dc in CleanUpLock (lock=lock@entry=0x7f5325c913f0,
proclock=proclock@entry=0x7f5325d40d60,
lockMethodTable=lockMethodTable@entry=0x5601ab6484e0
,
hashcode=hashcode@entry=573498161, wakeupNeeded=)
at lock.c:1673
#2  0x5601ab1e9e21 in LockRefindAndRelease
(lockMethodTable=lockMethodTable@entry=0x5601ab6484e0
, proc=,
locktag=locktag@entry=0x5601ac3d7998, lockmode=lockmode@entry=1,
decrement_strong_lock_count=decrement_strong_lock_count@entry=false)
at lock.c:3150
#3  0x5601ab1edb27 in LockReleaseAll
(lockmethodid=lockmethodid@entry=1, allLocks=false) at lock.c:2295
#4  0x5601ab1f8599 in ProcReleaseLocks
(isCommit=isCommit@entry=true) at proc.c:781
#5  0x5601ab37f1f4 in ResourceOwnerReleaseInternal
(owner=, phase=phase@entry=RESOURCE_RELEASE_LOCKS,
isCommit=isCommit@entry=true, isTopLevel=isTopLevel@entry=true) at
resowner.c:618
#6  0x5601ab37f7b7 in ResourceOwnerRelease (owner=,
phase=phase@entry=RESOURCE_RELEASE_LOCKS,
isCommit=isCommit@entry=true, isTopLevel=isTopLevel@entry=true) at
resowner.c:494
#7  0x5601aaec1d84 in CommitTransaction () at xact.c:2334
#8  0x5601aaec2b22 in CommitTransactionCommand () at xact.c:3067
#9  0x5601ab200a66 in finish_xact_command () at postgres.c:2783
#10 0x5601ab20338f in exec_simple_query (
query_string=query_string@entry=0x5601ac3b0858 "SELECT table_name,
column_name, is_updatable\n  FROM information_schema.columns\n WHERE
table_name LIKE E'r__view%'\n ORDER BY table_name,
ordinal_position;") at postgres.c:1300

I am unable to connect to the regression test Postgres instance --
psql just hangs, so the lock seems to have affected the postmaster
also.

I'm wondering if this might represent a bug in the current patch.

Regards,
James Coleman




Re: Avoid unncessary always true test (src/backend/storage/buffer/bufmgr.c)

2023-06-14 Thread Gurjeet Singh
On Wed, Jun 14, 2023 at 5:12 AM Ranier Vilela  wrote:
>
> On Wed, Jun 14, 2023 at 06:51, Richard Guo
> wrote:
>>
>>
>> On Tue, Jun 13, 2023 at 3:39 PM Kyotaro Horiguchi  
>> wrote:
>>>
>>> Gurjeet has mentioned that eb.rel cannot be modified by another
>>> process since the value or memory is in the local stack, and I believe
>>> he's correct.
>>>
>>> If the pointed Relation had been blown out, eb.rel would be left
>>> dangling, not nullified. However, I don't believe this situation
>>> happens (or it shouldn't happen) as the entire relation should already
>>> be locked.
>>
>>
>> Yeah, Gurjeet is right.  I had a thinko here.  eb.rel should not be NULL
>> pointer in any case.  And as we've acquired the lock for it, it should
>> not have been closed.  So I think we can remove the check for eb.rel in
>> the two places.
>
> Ok,
> As there is a consensus on removing the tests and the comment is still 
> relevant,
> here is a new version for analysis.

LGTM.

Best regards,
Gurjeet
http://Gurje.et




Re: Use COPY for populating all pgbench tables

2023-06-14 Thread Tristan Partin
Again, I forgot to actually attach. Holy guacamole.

-- 
Tristan Partin
Neon (https://neon.tech)
From 8c4eb4849b1282f1a0947ddcf3f599e384a5a428 Mon Sep 17 00:00:00 2001
From: Tristan Partin 
Date: Tue, 23 May 2023 09:21:55 -0500
Subject: [PATCH v2 1/2] Move constant into format string

If we are always going to write a 0, just put the 0 in the format
string.
---
 src/bin/pgbench/pgbench.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/src/bin/pgbench/pgbench.c b/src/bin/pgbench/pgbench.c
index 1d1670d4c2..320d348a0f 100644
--- a/src/bin/pgbench/pgbench.c
+++ b/src/bin/pgbench/pgbench.c
@@ -4947,8 +4947,8 @@ initGenerateDataClientSide(PGconn *con)
 
 		/* "filler" column defaults to blank padded empty string */
 		printfPQExpBuffer(&sql,
-		  INT64_FORMAT "\t" INT64_FORMAT "\t%d\t\n",
-		  j, k / naccounts + 1, 0);
+		  INT64_FORMAT "\t" INT64_FORMAT "\t0\t\n",
+		  j, k / naccounts + 1);
 		if (PQputline(con, sql.data))
 			pg_fatal("PQputline failed");
 
-- 
Tristan Partin
Neon (https://neon.tech)

From 8c573cbede228a5b5570d3ca37055bd40d4a4025 Mon Sep 17 00:00:00 2001
From: Tristan Partin 
Date: Tue, 23 May 2023 11:48:16 -0500
Subject: [PATCH v2 2/2] Use COPY instead of INSERT for populating all tables

COPY is a better interface for bulk loading and/or high latency
connections. Previously COPY was only used for the pgbench_accounts
table because the amount of data was so much larger than the other
tables. However, as already said there are also gains to be had in the
high latency case, such as loading data across continents.
---
 src/bin/pgbench/pgbench.c | 152 ++
 1 file changed, 87 insertions(+), 65 deletions(-)

diff --git a/src/bin/pgbench/pgbench.c b/src/bin/pgbench/pgbench.c
index 320d348a0f..9c57f7e4a7 100644
--- a/src/bin/pgbench/pgbench.c
+++ b/src/bin/pgbench/pgbench.c
@@ -4869,17 +4869,46 @@ initTruncateTables(PGconn *con)
 	 "pgbench_tellers");
 }
 
-/*
- * Fill the standard tables with some data generated and sent from the client
- */
+typedef void initRow(PGconn *con, PQExpBufferData *sql, int64 curr);
+
 static void
-initGenerateDataClientSide(PGconn *con)
+initBranch(PGconn *con, PQExpBufferData *sql, int64 curr)
+{
+	/* "filler" column defaults to NULL */
+	printfPQExpBuffer(sql,
+	  INT64_FORMAT "\t0\t\n",
+	  curr + 1);
+}
+
+static void
+initTeller(PGconn *con, PQExpBufferData *sql, int64 curr)
+{
+	/* "filler" column defaults to NULL */
+	printfPQExpBuffer(sql,
+	  INT64_FORMAT "\t" INT64_FORMAT "\t0\t\n",
+	  curr + 1, curr / ntellers + 1);
+}
+
+static void
+initAccount(PGconn *con, PQExpBufferData *sql, int64 curr)
+{
+	/* "filler" column defaults to blank padded empty string */
+	printfPQExpBuffer(sql,
+	  INT64_FORMAT "\t" INT64_FORMAT "\t0\t\n",
+	  curr + 1, curr / naccounts + 1);
+}
+
+static void
+initPopulateTable(PGconn *con, const char *table, int64 base, initRow *init_row)
 {
+	int			n;
+	int			k;
+	int			chars = 0;
+	PGresult	*res;
 	PQExpBufferData sql;
-	PGresult   *res;
-	int			i;
-	int64		k;
-	char	   *copy_statement;
+	char 		copy_statement[256];
+	const char *copy_statement_fmt;
+	const int64 total = base * scale;
 
 	/* used to track elapsed time and estimate of the remaining time */
 	pg_time_usec_t start;
@@ -4888,50 +4917,19 @@ initGenerateDataClientSide(PGconn *con)
 	/* Stay on the same line if reporting to a terminal */
 	char		eol = isatty(fileno(stderr)) ? '\r' : '\n';
 
-	fprintf(stderr, "generating data (client-side)...\n");
-
-	/*
-	 * we do all of this in one transaction to enable the backend's
-	 * data-loading optimizations
-	 */
-	executeStatement(con, "begin");
-
-	/* truncate away any old data */
-	initTruncateTables(con);
-
 	initPQExpBuffer(&sql);
 
-	/*
-	 * fill branches, tellers, accounts in that order in case foreign keys
-	 * already exist
-	 */
-	for (i = 0; i < nbranches * scale; i++)
-	{
-		/* "filler" column defaults to NULL */
-		printfPQExpBuffer(&sql,
-		  "insert into pgbench_branches(bid,bbalance) values(%d,0)",
-		  i + 1);
-		executeStatement(con, sql.data);
-	}
-
-	for (i = 0; i < ntellers * scale; i++)
-	{
-		/* "filler" column defaults to NULL */
-		printfPQExpBuffer(&sql,
-		  "insert into pgbench_tellers(tid,bid,tbalance) values (%d,%d,0)",
-		  i + 1, i / ntellers + 1);
-		executeStatement(con, sql.data);
-	}
-
-	/*
-	 * accounts is big enough to be worth using COPY and tracking runtime
-	 */
-
 	/* use COPY with FREEZE on v14 and later without partitioning */
 	if (partitions == 0 && PQserverVersion(con) >= 14)
-		copy_statement = "copy pgbench_accounts from stdin with (freeze on)";
+		copy_statement_fmt = "copy %s from stdin with (freeze on)";
 	else
-		copy_statement = "copy pgbench_accounts from stdin";
+		copy_statement_fmt = "copy %s from stdin";
+
+	n = pg_snprintf(copy_statement, sizeof(copy_statement), copy_statement_fmt, table);
+	if (n >= sizeof(copy_statement))
+		pg_fatal("invalid buffer 

Re: TRAP: FailedAssertion("prev_first_lsn < cur_txn->first_lsn", File: "reorderbuffer.c", Line: 927, PID: 568639)

2023-06-14 Thread Tomas Vondra



On 6/14/23 15:39, Zhijie Hou (Fujitsu) wrote:
> On Wednesday, June 14, 2023 5:05 PM Tomas Vondra 
>  wrote:
>> ...
>>
>> Also, can you try if we still stream the partial transaction with
>> create_logical_replication_slot building a full snapshot?
> 
> Yes, it can fix this problem, because forcing create_logical_replication_slot
> to build a full snapshot avoids restoring the snapshot. But I am not sure if
> this is the best fix, for the same reason (it's costly) mentioned by
> Amit [1].
> 

Costly compared to the current behavior? Sure, but considering the
current behavior is incorrect/broken, that seems rather irrelevant.
Incorrect behavior can be infinitely faster.

I doubt it's significantly costlier than just setting the "full
snapshot" flag when building the initial snapshot - sure, it will take
more time than now, but that's kinda the whole point. It seems to me the
problem is exactly that it *doesn't* wait long enough.

I may be misunderstanding the solution you proposed, but this:

One idea to fix the partial change stream problem would be that we
record all the running transaction's xid when restoring the snapshot
in SnapBuildFindSnapshot(), and in the following decoding, we skip
decoding changes for the recorded transaction

sounds pretty close to what building a correct snapshot actually does.

But maybe I'm wrong - ultimately, the way to compare those approaches
seems to be to prototype this proposal, and do some tests.

There's also the question of back branches, and it seems way simpler to
just flip a flag and disable broken optimization than doing fairly
complex stuff to save it.

I'd also point out that (a) this only affects the initial snapshot, not
every time we start the decoding context, and (b) the slots created from
walsender already do that with (unless when copy_data=false).

So if needs_full_snapshot=true fixes the issue, I'd just do that as the
first step - in master and backpatches. And then if we want to salvage
the optimization, we can try fixing it in master.


regards

-- 
Tomas Vondra
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: Use COPY for populating all pgbench tables

2023-06-14 Thread Tristan Partin
Here is a v2. It cleans up the output when printing to a tty. The
last "x of y tuples" line gets overwritten now, so the final output
looks like:

dropping old tables...
creating tables...
generating data (client-side)...
vacuuming...
creating primary keys...
done in 0.14 s (drop tables 0.01 s, create tables 0.01 s, client-side generate 
0.05 s, vacuum 0.03 s, primary keys 0.03 s).

-- 
Tristan Partin
Neon (https://neon.tech)




Re: Do we want a hashset type?

2023-06-14 Thread Andrew Dunstan


On 2023-06-14 We 05:44, Tomas Vondra wrote:


The thing is, adding stuff to core is not free - it means the community
becomes responsible for maintenance, testing, fixing issues, etc. It's
not feasible (or desirable) to have all extensions in core, and cloud
vendors generally do have ways to support some pre-vetted extensions
that they deem useful enough. Granted, it means vetting/maintenance for
them, but that's kinda the point of managed services. And it'd not be
free for us either.



I agree it's a judgement call. But the code burden here seems pretty 
small, far smaller than, say, the SQL/JSON patches. And I think the 
range of applications that could benefit is quite significant.



cheers


andrew

--
Andrew Dunstan
EDB:https://www.enterprisedb.com


Re: Replace (GUC_UNIT_MEMORY | GUC_UNIT_TIME) with GUC_UNIT in guc.c

2023-06-14 Thread Japin Li


On Wed, 14 Jun 2023 at 17:52, Richard Guo  wrote:
> On Wed, Jun 14, 2023 at 3:47 PM Michael Paquier  wrote:
>
>> On Wed, Jun 14, 2023 at 03:38:10PM +0800, Richard Guo wrote:
>> > +1.  BTW, it seems that GUC_UNIT_TIME is not used anywhere except in
>> > GUC_UNIT.  I was wondering if we can retire it, but maybe we'd better
>> > not.  It still indicates that we need to use time units table.
>>
>> Some out-of-core code declaring custom GUCs could rely on that, so
>> it is better not to remove it.
>
>
> I see.  Thanks for pointing that out.
>

Thanks for all of your reviews.  Agreed with Michael that we should not touch GUC_UNIT_TIME.

-- 
Regrads,
Japin Li.





RE: TRAP: FailedAssertion("prev_first_lsn < cur_txn->first_lsn", File: "reorderbuffer.c", Line: 927, PID: 568639)

2023-06-14 Thread Zhijie Hou (Fujitsu)
On Wednesday, June 14, 2023 5:05 PM Tomas Vondra 
 wrote:
> On 6/14/23 05:15, Zhijie Hou (Fujitsu) wrote:
> > On Tuesday, June 13, 2023 12:19 PM Zhijie Hou (Fujitsu)
>  wrote:
> >>
> >> On Tuesday, June 13, 2023 12:04 PM Amit Kapila
> >> 
> >> wrote:
> >>>
> >>> On Wed, Jun 7, 2023 at 6:02 PM Tomas Vondra
> >>>  wrote:
> 
> 
>  Well, I think the issue is pretty clear - we end up with an initial
>  snapshot that's in between the ASSIGNMENT and NEW_CID, and because
>  NEW_CID has both xact and subxact XID it fails because we add two
>  TXNs with the same LSN, not realizing one of them is subxact.
> 
>  That's obviously wrong, although somewhat benign in production
>  because it only fails because of hitting an assert.
> 
> >>>
> >>> Doesn't this indicate that we can end up decoding a partial
> >>> transaction when we restore a snapshot? Won't that be a problem even
> >>> for
> >> production?
> >>
> >> Yes, I think it can cause the problem that only partial changes of a
> >> transaction are streamed.
> >> I tried to reproduce this and here are the steps. Note, to make sure
> >> the test won't be affected by other running_xact WALs, I changed
> >> LOG_SNAPSHOT_INTERVAL_MS to a bigger number.
> >>
> >> session 1:
> >> -
> >> create table test(a int);
> >> SELECT 'init' FROM
> >> pg_create_logical_replication_slot('isolation_slot_1',
> >> 'test_decoding');
> >> -
> >>
> >> session 2:
> >> -
> >> - Start a transaction
> >> BEGIN;
> >> INSERT INTO test VALUES(1);
> >> -
> >>
> >> session 3:
> >> -
> >> - Create another slot isolation_slot_2, it should choose a
> >> restart_point which is
> >> - after the changes that happened in session 2. Note, to let the
> >> current slot
> >> - restore another snapshot, we need to use gdb to block the current
> >> backend at
> >> - SnapBuildFindSnapshot(), the backend should have logged the
> >> running_xacts WAL
> >> - before reaching SnapBuildFindSnapshot.
> >>
> >> SELECT 'init' FROM
> >> pg_create_logical_replication_slot('isolation_slot_2',
> >> 'test_decoding');
> >> -
> >>
> >> session 1:
> >> -
> >> - Since there is a running_xacts which session 3 logged, the current
> >> backend will
> >> - serialize the snapshot when decoding the running_xacts WAL, and the
> >> snapshot
> >> - can be used by other slots (e.g. isolation_slot_2)
> >>
> >> SELECT data FROM pg_logical_slot_get_changes('isolation_slot_1',
> >> NULL, NULL, 'skip-empty-xacts', '1', 'include-xids', '0');
> >> -
> >>
> >> session 2:
> >> -
> >> - Insert some different data and commit the transaction.
> >>
> >> INSERT INTO test VALUES(2);
> >> INSERT INTO test VALUES(3);
> >> INSERT INTO test VALUES(4);
> >> COMMIT
> >> -
> >>
> >> session 3:
> >> -
> >> - Release the process and try to stream the changes, since the
> >> restart point is
> >> - at the middle of the transaction, it will stream partial changes of
> >> the
> >> - transaction which was committed in session 2:
> >>
> >> SELECT data FROM pg_logical_slot_get_changes('isolation_slot_2',
> >> NULL, NULL, 'skip-empty-xacts', '1', 'include-xids', '0');
> >> -
> >>
> >> Results (partial streamed changes):
> >> postgres=# SELECT data FROM
> >> pg_logical_slot_get_changes('isolation_slot_2',
> >> NULL, NULL, 'skip-empty-xacts', '1', 'include-xids', '0');
> >>   data
> >> -
> >>  BEGIN
> >>  table public.test: INSERT: a[integer]:2  table public.test: INSERT:
> >> a[integer]:3 table public.test: INSERT: a[integer]:4  COMMIT
> >> (5 rows)
> >>
> >
> > One idea to fix the partial change stream problem would be that we
> > record all the running transaction's xid when restoring the snapshot
> > in SnapBuildFindSnapshot(), and in the following decoding, we skip
> > decoding changes for the recorded transaction. Or we can do similar to
> > 7f13ac8(serialize the information of running xacts if any)
> >
> 
> We need to think about how to fix this in backbranches, and the idea with
> serializing running transactions seems rather unbackpatchable (as it changes
> on-disk state).
> 
> > But one point I am not very sure is that we might retore snapshot in
> > SnapBuildSerializationPoint() as well where we don't have running
> > transactions information. Although SnapBuildSerializationPoint() is
> > invoked for XLOG_END_OF_RECOVERY and XLOG_CHECKPOINT_SHUTDOWN
> records
> > which seems no running transaction will be there when logging. But I
> > am not 100% sure if the problem can happen in this case as well.
> >
...
> 
> Also, can you try if we still stream the partial transaction with
> create_logical_replication_slot building a full snapshot?

Yes, it can fix this problem, because forcing create_logical_replication_slot
to build a full snapshot avoids restoring the snapshot. But I am not sure
this is the best fix, for the same reason (it's costly) mentioned by
Amit [1].

[1] 
https://www.postgresql.org/message-id/CAA4eK1Kr

Re: [DOC] Update ALTER SUBSCRIPTION documentation v3

2023-06-14 Thread Peter Eisentraut

On 14.06.23 05:09, Amit Kapila wrote:

The latest version looks good to me as well. I think we should
backpatch this change as this is a user-facing message change in docs
and code. What do you guys think?


Isn't that a reason *not* to backpatch it?




Re: Do we want a hashset type?

2023-06-14 Thread Tomas Vondra



On 6/14/23 14:57, Joel Jacobson wrote:
> On Wed, Jun 14, 2023, at 11:44, Tomas Vondra wrote:
>>> Perspective from a potential user: I'm currently working on something
>>> where an array-like structure with fast membership test performance
>>> would be very useful. The main type of query is doing an =ANY(the set)
>>> filter, where the set could contain anywhere from very few to thousands
>>> of entries (ints in our case). So we'd want the same index usage as
>>> =ANY(array) but would like faster row checking than we get with an array
>>> when other indexes are used.
>>>
>>
>> We kinda already do this since PG14 (commit 50e17ad281), actually. If
>> the list is long enough (9 values or more), we'll build a hash table
>> during query execution. So pretty much exactly what you're asking for.
> 
> Would it be feasible to teach the planner to utilize the internal hash table 
> of
> hashset directly? In the case of arrays, the hash table construction is an
> ad hoc operation, whereas with hashset, the hash table already exists, which
> could potentially lead to a faster execution.
> 
> Essentially, the aim would be to support:
> 
> =ANY(hashset)
> 
> Instead of the current:
> 
> =ANY(hashset_to_array(hashset))
> 
> Thoughts?

That should be possible, but probably only when hashset is a built-in
data type (maybe polymorphic).

I don't know if it'd be worth it; the general idea is that building the
hash table is way cheaper than repeated lookups in an array. Yeah, it
might save something, but likely only a tiny fraction of the runtime.

It's definitely something I'd leave out of v0, personally.

=ANY(set) should probably work with an implicit ARRAY cast, I believe.
It'll do the ad hoc build, ofc.
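
Just to make that concrete, a minimal sketch (the table, column, and the
assumed implicit cast from the set type to int4[] are all made up here, not
something the prototype necessarily provides yet):

SELECT * FROM t WHERE t.id = ANY (my_set);                    -- implicit cast to int4[]
SELECT * FROM t WHERE t.id = ANY (hashset_to_array(my_set));  -- explicit equivalent

Either way the executor would do its usual ad hoc hash build for long lists.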


regards

-- 
Tomas Vondra
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: ERROR: wrong varnullingrels (b 3) (expected (b)) for Var 2/1

2023-06-14 Thread Tom Lane
Richard Guo  writes:
> I just realized that we may still have holes in this area.  Until now
> we're mainly focusing on LATERAL subquery, in which case the lateral
> reference Vars are copied into rel->subplan_params and we've already
> adjusted the nulling bitmaps there.  But what about the lateral
> reference Vars in other cases?

Ugh.

> So it seems that we need to do nullingrel adjustments in a more common
> place.

I agree: this suggests that we fixed it in the wrong place.

> I think it exposes a new issue.  It seems that we extract a problematic
> lateral_relids from lateral references within PlaceHolderVars in
> create_lateral_join_info.  I doubt that we should use ph_lateral
> directly.  It seems more reasonable to me that we strip outer-join
> relids from ph_lateral and then use that for lateral_relids.

Hmm.  I don't have time to think hard about this today, but this
does feel similar to our existing decision that parameterized paths
should be generated with minimal nullingrels bits on their outer
references.  We only thought about pushed-down join clauses when we
did that.  But a lateral ref necessarily gives rise to parameterized
path(s), and what we seem to be seeing is that those need to be
handled just the same as ones generated by pushing down join clauses.
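
(To make the shape concrete, a made-up example rather than the query from
the report: something like

SELECT *
FROM t1
LEFT JOIN LATERAL
  (SELECT t2.a FROM t2 WHERE t2.a = t1.a) ss ON true;

forces the scan of ss to be parameterized by t1, just as a pushed-down join
clause would.)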

regards, tom lane




Re: Do we want a hashset type?

2023-06-14 Thread Joel Jacobson
On Wed, Jun 14, 2023, at 11:44, Tomas Vondra wrote:
>> Perspective from a potential user: I'm currently working on something
>> where an array-like structure with fast membership test performance
>> would be very useful. The main type of query is doing an =ANY(the set)
>> filter, where the set could contain anywhere from very few to thousands
>> of entries (ints in our case). So we'd want the same index usage as
>> =ANY(array) but would like faster row checking than we get with an array
>> when other indexes are used.
>> 
>
> We kinda already do this since PG14 (commit 50e17ad281), actually. If
> the list is long enough (9 values or more), we'll build a hash table
> during query execution. So pretty much exactly what you're asking for.

Would it be feasible to teach the planner to utilize the internal hash table of
hashset directly? In the case of arrays, the hash table construction is an
ad hoc operation, whereas with hashset, the hash table already exists, which
could potentially lead to faster execution.

Essentially, the aim would be to support:

=ANY(hashset)

Instead of the current:

=ANY(hashset_to_array(hashset))

Thoughts?

/Joel




Shouldn't cost_append() also scale the partial path's cost?

2023-06-14 Thread Antonin Houska
Like in cost_seqscan(), I'd expect the subpath cost to be divided among
parallel workers. The patch below shows what I mean. Am I right?

-- 
Antonin Houska
Web: https://www.cybertec-postgresql.com

diff --git a/src/backend/optimizer/path/costsize.c b/src/backend/optimizer/path/costsize.c
index ef475d95a1..5427822e0e 100644
--- a/src/backend/optimizer/path/costsize.c
+++ b/src/backend/optimizer/path/costsize.c
@@ -2313,19 +2313,29 @@ cost_append(AppendPath *apath)
 			 * Apply parallel divisor to subpaths.  Scale the number of rows
 			 * for each partial subpath based on the ratio of the parallel
 			 * divisor originally used for the subpath to the one we adopted.
-			 * Also add the cost of partial paths to the total cost, but
-			 * ignore non-partial paths for now.
+			 * Also add the scaled cost of partial paths to the total cost,
+			 * but ignore non-partial paths for now.
 			 */
 			if (i < apath->first_partial_path)
 apath->path.rows += subpath->rows / parallel_divisor;
 			else
 			{
 double		subpath_parallel_divisor;
+double		scale_factor;
+Cost		run_cost;
 
 subpath_parallel_divisor = get_parallel_divisor(subpath);
-apath->path.rows += subpath->rows * (subpath_parallel_divisor /
-	 parallel_divisor);
-apath->path.total_cost += subpath->total_cost;
+scale_factor = subpath_parallel_divisor / parallel_divisor;
+apath->path.rows += subpath->rows * scale_factor;
+/*
+ * XXX run_cost includes both CPU cost, which is divided among
+ * workers, and disk cost, which is not. Unfortunately we
+ * don't have enough information to separate the two, so scale
+ * the whole run_cost.
+ */
+run_cost = subpath->total_cost - subpath->startup_cost;
+apath->path.total_cost += subpath->startup_cost +
+	run_cost * scale_factor;
 			}
 
 			apath->path.rows = clamp_row_est(apath->path.rows);
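
As a purely hypothetical illustration of the effect (the numbers are made
up): if a partial subpath was originally costed with
subpath_parallel_divisor = 2.4 while the Append ends up using
parallel_divisor = 4.8, then scale_factor = 0.5, and a subpath with
startup_cost = 10 and total_cost = 110 would contribute
10 + (110 - 10) * 0.5 = 60 to the Append's total cost, rather than the full
110 that the unpatched code adds.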


Re: Views no longer in rangeTabls?

2023-06-14 Thread David Steele

On 6/14/23 12:51, Amit Langote wrote:


Ah, did think it might be moderation.  Thanks for the confirmation,
Michael.

It’s out now:

https://www.postgresql.org/message-id/E1q9Gms-001h5g-8Q%40gemulon.postgresql.org 



Thank you for committing this and congratulations on your first commit!

Regards,
-David




Re: [PATCH] Add native windows on arm64 support

2023-06-14 Thread Niyas Sait

Hello,

Just wanted to give a heads-up that I am moving to a new role outside
Linaro and as a result won't be able to continue contributing.


Anthony Roberts from Linaro is going to support the enablement work.

--
Niyas




Re: Avoid unncessary always true test (src/backend/storage/buffer/bufmgr.c)

2023-06-14 Thread Ranier Vilela
Em qua., 14 de jun. de 2023 às 06:51, Richard Guo 
escreveu:

>
> On Tue, Jun 13, 2023 at 3:39 PM Kyotaro Horiguchi 
> wrote:
>
>> Gurjeet has mentioned that eb.rel cannot be modified by another
>> process since the value or memory is in the local stack, and I believe
>> he's correct.
>>
>> If the pointed Relation had been blown out, eb.rel would be left
>> dangling, not nullified. However, I don't believe this situation
>> happens (or it shouldn't happen) as the entire relation should already
>> be locked.
>
>
> Yeah, Gurjeet is right.  I had a thinko here.  eb.rel should not be NULL
> pointer in any case.  And as we've acquired the lock for it, it should
> not have been closed.  So I think we can remove the check for eb.rel in
> the two places.
>
OK, as there is a consensus on removing the tests and the comment is still
relevant, here is a new version for analysis.

regards,
Ranier Vilela


v3-0001-Remove-always-true-checks.patch
Description: Binary data


Re: [PATCH] Slight improvement of worker_spi.c example

2023-06-14 Thread Aleksander Alekseev
Hi Nathan,

> > That being said, I still don't understand why you focus on this tiny and not
> > really important detail while the module itself is actually broken (for 
> > dynamic
> > bgworker without s_p_l) and also has some broken behaviors with regards to 
> > the
> > naptime that are way more likely to hurt third party code that was written
> > using this module as an example.
>
> Are you or Aleksander interested in helping improve this module?  I'm happy
> to help review and/or write patches.

Unfortunately I'm not familiar with the naptime problem Julien is referring
to. If you know what this problem is and how to fix it, go for it; I'll
review and test the code then. I can write the part of the patch that fixes
the dynamic-workers part if necessary.

-- 
Best regards,
Aleksander Alekseev




Re: Setting restrictedtoken in pg_regress

2023-06-14 Thread Andrew Dunstan


On 2023-06-12 Mo 19:43, Nathan Bossart wrote:

On Tue, Jun 13, 2023 at 08:29:19AM +0900, Michael Paquier wrote:

I am actually a bit confused with the return value of
CreateRestrictedProcess() on failures in restricted_token.c.  Wouldn't
it be cleaner to return INVALID_HANDLE_VALUE rather than 0 in these
cases?

My suspicion is that this was chosen to align with CreateProcess and to
allow things like

if (!CreateRestrictedProcess(...))



Probably, it's been a while. I doubt it's worth changing at this point, 
and we could just change pg_regress.c to use a boolean test like the above.



cheers


andrew

--
Andrew Dunstan
EDB: https://www.enterprisedb.com


Re: Pluggable toaster

2023-06-14 Thread Aleksander Alekseev
Hi,

> > I have a question for the community - why was this patch silently put to a 
> > "rejected" status?
> > Should there no be some kind of explanation?

I wouldn't say that it happened "silently" nor that the reason is so
mysterious [1].

[1]: 
https://www.postgresql.org/message-id/flat/CAM-w4HPjg7NwEWBtXn1empgAg3fqJHifHo_nhgqFWopiYaNxYg%40mail.gmail.com

-- 
Best regards,
Aleksander Alekseev




Re: Views no longer in rangeTabls?

2023-06-14 Thread Amit Langote
On Wed, Jun 14, 2023 at 15:49 Amit Langote  wrote:

> On Wed, Jun 14, 2023 at 15:44 Michael Paquier  wrote:
>
>> On Wed, Jun 14, 2023 at 02:34:56PM +0900, Amit Langote wrote:
>> > This being my first commit, I intently looked to check if everything’s
>> set
>> > up correctly. While it seemed to have hit gitweb and GitHub, it didn’t
>> > pgsql-committers for some reason.
>>
>> It seems to me that the email of pgsql-committers is just being held
>> in moderation.  Your first commit is in the tree, so this worked fine
>> seen from here.  Congrats!
>
>
> Ah, did think it might be moderation.  Thanks for the confirmation,
> Michael.
>

It’s out now:

https://www.postgresql.org/message-id/E1q9Gms-001h5g-8Q%40gemulon.postgresql.org


-- 
Thanks, Amit Langote
EDB: http://www.enterprisedb.com


Re: Inconsistent results with libc sorting on Windows

2023-06-14 Thread Juan José Santamaría Flecha
On Wed, Jun 14, 2023 at 4:13 AM Peter Geoghegan  wrote:

> On Tue, Jun 13, 2023 at 5:59 PM Thomas Munro 
> wrote:
> > Trying to follow along here... you're doing the moral equivalent of
> > strxfrm(), so sort keys have the transitive property but direct string
> > comparisons don't?  Or is this because LCIDs reach a different
> > algorithm somehow (or otherwise why do you need to use LCIDs for this,
> > when there is a non-LCID version of that function, with a warning not
> > to use the older LCID version[1]?)
>
> I'm reminded of the fact that the abbreviated keys strxfrm() debacle
> (back when 9.5 was released) was caused by a bug in strcoll() -- not a
> bug in strxfrm() itself. From our point of view the problem was that
> strxfrm() failed to be bug compatible with strcoll() due to a buggy
> strcoll() optimization.
>
> I believe that strxfrm() is generally less likely to have bugs than
> strcoll(). There are far fewer opportunities to dodge unnecessary work
> in the case of strxfrm()-like algorithms (offering something like
> ICU's pg_strnxfrm_prefix_icu() prefix optimization is the only one).
> On the other hand, collation library implementers are likely to
> heavily optimize strcoll() for typical use-cases such as sorting and
> binary search. Using strxfrm() for everything is discouraged [1].
>

Yes, I think the situation is quite similar to what you describe, with its
own WIN32 peculiarities. Take, for example, the attached program; it'll output:

s1 = s2
s2 = s3
s1 > s3
c1 > c2
c2 > c3
c1 > c3

As you can see the test for CompareStringEx() is broken, but we get a sane
answer with LCMapStringEx().

Regards,

Juan José Santamaría Flecha


compare_transitivity.c
Description: Binary data


Re:Re: [DOC] Update ALTER SUBSCRIPTION documentation v3

2023-06-14 Thread 刘庄
Hi:
  The PostgreSQL ODBC driver psqlodbcw.so supports Unicode. You can check
this via the SQL_ATTR_ANSI_APP connection attribute: if it is SQL_AA_FALSE,
Unicode is supported; if it is SQL_AA_TRUE, ANSI is supported.

At 2023-06-14 11:54:46, "Peter Smith"  wrote:
>On Wed, Jun 14, 2023 at 1:10 PM Amit Kapila  wrote:
>>
>> On Wed, May 17, 2023 at 11:57 AM Peter Smith  wrote:
>> >
>> > On Wed, May 17, 2023 at 2:53 PM Robert Sjöblom
>> >  wrote:
>> > >
>> > > Attached is the revised version.
>> > >
>> >
>> > v4 looks good to me.
>> >
>>
>> The latest version looks good to me as well. I think we should
>> backpatch this change as this is a user-facing message change in docs
>> and code. What do you guys think?
>>
>
>I do not know the exact criteria for deciding to back-patch, but I am
>not sure back-patching is so important for this one.
>
>It is not a critical bug-fix, and despite being a user-facing change,
>there is no functional change. Also, IIUC the previous docs existed
>for 6 years without problem.
>
>--
>Kind Regards,
>Peter Smith.
>Fujitsu Australia
>


Re: Pluggable toaster

2023-06-14 Thread Pavel Borisov
On Wed, 14 Jun 2023 at 14:05, Nikita Malakhov  wrote:
>
> Hi!
>
> I have a question for the community - why was this patch silently put to a 
> "rejected" status?
> Should there no be some kind of explanation?
>
> During this discussion I got the impression that for some reason some members 
> of the community
> do not want the TOAST functionality, which has such drawbacks that make it 
> really a curse for
> in many ways very good DBMS, to be touched. We cannot get rid of it because 
> of backwards
> compatibility, so the best way is to make it more adaptable and extensible - 
> this is what this thread
> is about. We proposed our vision on how to extend the TOAST Postgres-way, 
> like Pluggable
> Storage some time before.
>
> There are some very complex subjects left in this topic that really need a 
> community' attention.
> I've mentioned them above, but there was no feedback on them.
>
> Pavel, we've already had an update implementation for TOAST. But it is a part 
> of a Pluggable
> TOAST and it hardly would be here without it. I've started another thread on 
> extending the TOAST
> pointer, maybe you would want to participate there [1].
>
> We still would be grateful for feedback.
>
> [1] Extending the TOAST Pointer
I don't see a clear reason why it was rejected, other than that technically
it has been Waiting on Author since January. If that is a mistake and the
patch is up to date, you can set an appropriate status.

Regards,
Pavel Borisov,
Supabase.




Re: Pluggable toaster

2023-06-14 Thread Nikita Malakhov
Hi!

I have a question for the community: why was this patch silently put into
"rejected" status?
Shouldn't there be some kind of explanation?

During this discussion I got the impression that, for some reason, some
members of the community do not want the TOAST functionality to be touched,
even though its drawbacks make it a real burden for an otherwise very good
DBMS. We cannot get rid of it because of backwards compatibility, so the
best way forward is to make it more adaptable and extensible; that is what
this thread is about. We proposed our vision of how to extend TOAST the
Postgres way, like Pluggable Storage, some time ago.

There are some very complex subjects left in this topic that really need
the community's attention.
I mentioned them above, but there was no feedback on them.

Pavel, we already have an UPDATE implementation for TOAST, but it is part
of Pluggable TOAST and would hardly exist without it. I've started another
thread on extending the TOAST pointer; maybe you would like to participate
there [1].

We still would be grateful for feedback.

[1] Extending the TOAST Pointer


-- 
Regards,
Nikita Malakhov
Postgres Professional
https://postgrespro.ru/


Re: Replace (GUC_UNIT_MEMORY | GUC_UNIT_TIME) with GUC_UNIT in guc.c

2023-06-14 Thread Richard Guo
On Wed, Jun 14, 2023 at 3:47 PM Michael Paquier  wrote:

> On Wed, Jun 14, 2023 at 03:38:10PM +0800, Richard Guo wrote:
> > +1.  BTW, it seems that GUC_UNIT_TIME is not used anywhere except in
> > GUC_UNIT.  I was wondering if we can retire it, but maybe we'd better
> > not.  It still indicates that we need to use time units table.
>
> Some out-of-core code declaring custom GUCs could rely on that, so
> it is better not to remove it.


I see.  Thanks for pointing that out.

Thanks
Richard


Re: Avoid unncessary always true test (src/backend/storage/buffer/bufmgr.c)

2023-06-14 Thread Richard Guo
On Tue, Jun 13, 2023 at 3:39 PM Kyotaro Horiguchi 
wrote:

> Gurjeet has mentioned that eb.rel cannot be modified by another
> process since the value or memory is in the local stack, and I believe
> he's correct.
>
> If the pointed Relation had been blown out, eb.rel would be left
> dangling, not nullified. However, I don't believe this situation
> happens (or it shouldn't happen) as the entire relation should already
> be locked.


Yeah, Gurjeet is right.  I had a thinko here.  eb.rel should not be NULL
pointer in any case.  And as we've acquired the lock for it, it should
not have been closed.  So I think we can remove the check for eb.rel in
the two places.

Thanks
Richard


Re: Do we want a hashset type?

2023-06-14 Thread Tomas Vondra
On 6/14/23 06:31, Tom Dunstan wrote:
> On Mon, 12 Jun 2023 at 22:37, Tomas Vondra
> mailto:tomas.von...@enterprisedb.com>>
> wrote:
> 
> Perhaps. So you're proposing to have this as a regular built-in type?
> It's hard for me to judge how popular this feature would be, but I guess
> people often use arrays while they actually want set semantics ...
> 
> 
> Perspective from a potential user: I'm currently working on something
> where an array-like structure with fast membership test performance
> would be very useful. The main type of query is doing an =ANY(the set)
> filter, where the set could contain anywhere from very few to thousands
> of entries (ints in our case). So we'd want the same index usage as
> =ANY(array) but would like faster row checking than we get with an array
> when other indexes are used.
> 

We kinda already do this since PG14 (commit 50e17ad281), actually. If
the list is long enough (9 values or more), we'll build a hash table
during query execution. So pretty much exactly what you're asking for.
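
For example (a toy illustration, the table name is made up), on PG14 and
later a filter such as

SELECT count(*)
FROM some_table
WHERE id = ANY (ARRAY[1,2,3,4,5,6,7,8,9,10]);

can use a hash table for the = ANY check while the clause is evaluated as a
filter (it doesn't change anything when the clause is pushed into an index
scan), so the plan shape stays the same as with a short list but the
per-row check is cheaper.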

> Our app runs connecting to either an embedded postgres database that we
> control or an external one controlled by customers - this is typically
> RDS or some other cloud vendor's DB. Having such a type as a separate
> extension would make it unusable for us until all relevant cloud vendors
> decided that it was popular enough to include - something that may never
> happen, or even if it did, now any time soon.
> 

Understood, but that's really a problem / choice of the cloud vendors.

The thing is, adding stuff to core is not free - it means the community
becomes responsible for maintenance, testing, fixing issues, etc. It's
not feasible (or desirable) to have all extensions in core, and cloud
vendors generally do have ways to support some pre-vetted extensions
that they deem useful enough. Granted, it means vetting/maintenance for
them, but that's kinda the point of managed services. And it'd not be
free for us either.

Anyway, that's mostly irrelevant, as PG14 already builds the hash table
for this kind of query. And I'm not strictly against adding some of
this into core, if it ends up being useful enough.


regards

-- 
Tomas Vondra
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: [PATCH] Using named captures in Catalog::ParseHeader()

2023-06-14 Thread Dagfinn Ilmari Mannsåker
Michael Paquier  writes:

> On Thu, Jun 01, 2023 at 01:12:22PM +0100, Dagfinn Ilmari Mannsåker wrote:
>> While I was rewriting the regexes I noticed that they were inconsistent
>> about whether they accepted whitespace in the parameter lists, so I took
>> the liberty to make them consistently allow whitespace after the opening
>> paren and the commas, which is what most of them already did.
>
> That's the business with \s* in CATALOG.  Is that right?  Indeed,
> that's more consistent.

Yes, \s* means "zero or more whitespace characters".

>> I've verified that the generated postgres.bki is identical to before,
>> and all tests pass.
>
> I find that pretty cool.  Nice.  Patch looks OK from here.

Thanks for the review!

- ilmari




Re: TRAP: FailedAssertion("prev_first_lsn < cur_txn->first_lsn", File: "reorderbuffer.c", Line: 927, PID: 568639)

2023-06-14 Thread Tomas Vondra



On 6/14/23 05:15, Zhijie Hou (Fujitsu) wrote:
> On Tuesday, June 13, 2023 12:19 PM Zhijie Hou (Fujitsu) 
>  wrote:
>>
>> On Tuesday, June 13, 2023 12:04 PM Amit Kapila 
>> wrote:
>>>
>>> On Wed, Jun 7, 2023 at 6:02 PM Tomas Vondra
>>>  wrote:


 Well, I think the issue is pretty clear - we end up with an initial
 snapshot that's in between the ASSIGNMENT and NEW_CID, and because
 NEW_CID has both xact and subxact XID it fails because we add two
 TXNs with the same LSN, not realizing one of them is subxact.

 That's obviously wrong, although somewhat benign in production
 because it only fails because of hitting an assert.

>>>
>>> Doesn't this indicate that we can end up decoding a partial
>>> transaction when we restore a snapshot? Won't that be a problem even for
>> production?
>>
>> Yes, I think it can cause the problem that only partial changes of a 
>> transaction
>> are streamed.
>> I tried to reproduce this and here are the steps. Note, to make sure the test
>> won't be affected by other running_xact WALs, I changed
>> LOG_SNAPSHOT_INTERVAL_MS to a bigger number.
>>
>> session 1:
>> -
>> create table test(a int);
>> SELECT 'init' FROM pg_create_logical_replication_slot('isolation_slot_1',
>> 'test_decoding');
>> -
>>
>> session 2:
>> -
>> - Start a transaction
>> BEGIN;
>> INSERT INTO test VALUES(1);
>> -
>>
>> session 3:
>> -
>> - Create another slot isolation_slot_2, it should choose a restart_point 
>> which is
>> - after the changes that happened in session 2. Note, to let the current slot
>> - restore another snapshot, we need to use gdb to block the current backend
>> at
>> - SnapBuildFindSnapshot(), the backend should have logged the running_xacts
>> WAL
>> - before reaching SnapBuildFindSnapshot.
>>
>> SELECT 'init' FROM pg_create_logical_replication_slot('isolation_slot_2',
>> 'test_decoding');
>> -
>>
>> session 1:
>> -
>> - Since there is a running_xacts which session 3 logged, the current backend
>> will
>> - serialize the snapshot when decoding the running_xacts WAL, and the
>> snapshot
>> - can be used by other slots (e.g. isolation_slot_2)
>>
>> SELECT data FROM pg_logical_slot_get_changes('isolation_slot_1', NULL, NULL,
>> 'skip-empty-xacts', '1', 'include-xids', '0');
>> -
>>
>> session 2:
>> -
>> - Insert some different data and commit the transaction.
>>
>> INSERT INTO test VALUES(2);
>> INSERT INTO test VALUES(3);
>> INSERT INTO test VALUES(4);
>> COMMIT
>> -
>>
>> session 3:
>> -
>> - Release the process and try to stream the changes, since the restart point 
>> is
>> - at the middle of the transaction, it will stream partial changes of the
>> - transaction which was committed in session 2:
>>
>> SELECT data FROM pg_logical_slot_get_changes('isolation_slot_2', NULL, NULL,
>> 'skip-empty-xacts', '1', 'include-xids', '0');
>> -
>>
>> Results (partial streamed changes):
>> postgres=# SELECT data FROM pg_logical_slot_get_changes('isolation_slot_2',
>> NULL, NULL, 'skip-empty-xacts', '1', 'include-xids', '0');
>>   data
>> -
>>  BEGIN
>>  table public.test: INSERT: a[integer]:2  table public.test: INSERT: 
>> a[integer]:3
>> table public.test: INSERT: a[integer]:4  COMMIT
>> (5 rows)
>>
> 
> One idea to fix the partial change stream problem would be that we record all
> the running transaction's xid when restoring the snapshot in
> SnapBuildFindSnapshot(), and in the following decoding, we skip decoding
> changes for the recorded transaction. Or we can do similar to 
> 7f13ac8(serialize the
> information of running xacts if any)
> 

We need to think about how to fix this in backbranches, and the idea
with serializing running transactions seems rather unbackpatchable (as
it changes on-disk state).

> But one point I am not very sure is that we might retore snapshot in
> SnapBuildSerializationPoint() as well where we don't have running transactions
> information. Although SnapBuildSerializationPoint() is invoked for
> XLOG_END_OF_RECOVERY and XLOG_CHECKPOINT_SHUTDOWN records which seems no
> running transaction will be there when logging. But I am not 100% sure if the
> problem can happen in this case as well.
> 

So, is the problem that we grab an existing snapshot in SnapBuildRestore
when called from SnapBuildFindSnapshot? If so, would it be possible to
just skip this while building the initial snapshot?

I tried that (by commenting out the block in SnapBuildFindSnapshot), but
it causes some output changes in test_decoding regression tests. I
haven't investigated why exactly.

Also, can you try if we still stream the partial transaction with
create_logical_replication_slot building a full snapshot?


regards

-- 
Tomas Vondra
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: Do we want a hashset type?

2023-06-14 Thread Joel Jacobson
On Wed, Jun 14, 2023, at 06:31, Tom Dunstan wrote:
> On Mon, 12 Jun 2023 at 22:37, Tomas Vondra 
>  wrote:
>> Perhaps. So you're proposing to have this as a regular built-in type?
>> It's hard for me to judge how popular this feature would be, but I guess
>> people often use arrays while they actually want set semantics ...
>
> Perspective from a potential user: I'm currently working on something 
> where an array-like structure with fast membership test performance 
> would be very useful. The main type of query is doing an =ANY(the set) 
> filter, where the set could contain anywhere from very few to thousands 
> of entries (ints in our case). So we'd want the same index usage as 
> =ANY(array) but would like faster row checking than we get with an 
> array when other indexes are used.

Thanks for providing an interesting use-case.

If you would like to help, one thing that would be helpful would be a
complete runnable SQL script that demonstrates exactly the various
array-based queries you currently use, with random data that resembles
reality as closely as possible, i.e. the same number of rows in the
tables, a similar distribution of values, etc.

This would be helpful in terms of documentation, as I think it would be
good to provide usage examples that are based on real-life scenarios.

It would also be helpful to create realistic benchmarks when
evaluating and optimising the performance.
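
For instance, something along these lines, where the schema, row counts and
distributions are entirely made up and only show the kind of script meant:

CREATE TABLE events (id bigserial, user_id int, payload text);
INSERT INTO events (user_id, payload)
SELECT (random() * 100000)::int, md5(g::text)
FROM generate_series(1, 1000000) g;
CREATE INDEX ON events (user_id);
ANALYZE events;

-- the workload: membership filter with anywhere from a few to thousands of ints
SELECT count(*)
FROM events
WHERE user_id = ANY (ARRAY(SELECT (random() * 100000)::int
                           FROM generate_series(1, 2000)));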

> Our app runs connecting to either an embedded postgres database that we 
> control or an external one controlled by customers - this is typically 
> RDS or some other cloud vendor's DB. Having such a type as a separate 
> extension would make it unusable for us until all relevant cloud 
> vendors decided that it was popular enough to include - something that 
> may never happen, or even if it did, now any time soon.

Good point.




Re: Replace (GUC_UNIT_MEMORY | GUC_UNIT_TIME) with GUC_UNIT in guc.c

2023-06-14 Thread Michael Paquier
On Wed, Jun 14, 2023 at 03:38:10PM +0800, Richard Guo wrote:
> +1.  BTW, it seems that GUC_UNIT_TIME is not used anywhere except in
> GUC_UNIT.  I was wondering if we can retire it, but maybe we'd better
> not.  It still indicates that we need to use time units table.

Some out-of-core code declaring custom GUCs could rely on that, so
it is better not to remove it.
--
Michael


signature.asc
Description: PGP signature


Re: Replace (GUC_UNIT_MEMORY | GUC_UNIT_TIME) with GUC_UNIT in guc.c

2023-06-14 Thread Richard Guo
On Wed, Jun 14, 2023 at 1:07 PM Masahiko Sawada 
wrote:

> On Wed, Jun 14, 2023 at 12:33 PM Japin Li  wrote:
> > Hi, hackers
> >
> > We use (GUC_UNIT_MEMORY | GUC_UNIT_TIME) instead of GUC_UNIT even though
> we
> > already define it in guc.h.
> >
> > Maybe using GUC_UNIT is better?  Here is a patch to fix it.
>
> Yeah, it seems more consistent with other places in guc.c. I'll push
> it, barring any objections.


+1.  BTW, it seems that GUC_UNIT_TIME is not used anywhere except in
GUC_UNIT.  I was wondering if we can retire it, but maybe we'd better
not.  It still indicates that we need to use the time units table.

Thanks
Richard


Re: pgindent vs. pgperltidy command-line arguments

2023-06-14 Thread Peter Eisentraut

On 25.05.23 15:20, Tom Lane wrote:

Peter Eisentraut  writes:

Until PG15, calling pgindent without arguments would process the whole
tree.  Now you get
No files to process at ./src/tools/pgindent/pgindent line 372.
Is that intentional?


It was intentional, cf b16259b3c and the linked discussion.


Also, pgperltidy accepts no arguments and always processes the whole
tree.  It would be nice if there were a way to process individual files
or directories, like pgindent can.


+1, although I wonder if we shouldn't follow pgindent's new lead
and require some argument(s).


That makes sense to me.  Here is a small update with this behavior 
change and associated documentation update.
From 44f7bcdcc0849a55459d4c2da27ee1976c704933 Mon Sep 17 00:00:00 2001
From: Peter Eisentraut 
Date: Wed, 14 Jun 2023 09:33:03 +0200
Subject: [PATCH v2] Allow and require passing files on command line of
 pgperltidy

pgperltidy as well as pgperlcritic and pgperlsyncheck now allow
passing files and directories on the command line, like pgindent does.
(Previously, they would always operate on the whole tree.)

Also, for consistency with pgindent's new behavior (as of b16259b3c1),
passing an argument is now required.  To get the previous default
behavior, use "pgperltidy ." for example.

Discussion: 
https://www.postgresql.org/message-id/flat/45aacd8a-5265-d9da-8df2-b8e2c0cf6a07%40eisentraut.org
---
 src/tools/perlcheck/find_perl_files | 8 ++--
 src/tools/perlcheck/pgperlcritic| 2 +-
 src/tools/perlcheck/pgperlsyncheck  | 2 +-
 src/tools/pgindent/README   | 2 +-
 src/tools/pgindent/pgperltidy   | 2 +-
 5 files changed, 10 insertions(+), 6 deletions(-)

diff --git a/src/tools/perlcheck/find_perl_files 
b/src/tools/perlcheck/find_perl_files
index fd99dab83b..20dceb800d 100644
--- a/src/tools/perlcheck/find_perl_files
+++ b/src/tools/perlcheck/find_perl_files
@@ -3,11 +3,15 @@
 # shell function to find all perl files in the source tree
 
 find_perl_files () {
+   if [ $# -eq 0 ]; then
+   echo 'No files to process' 1>&2
+   return
+   fi
 {
# take all .pl and .pm files
-   find . -type f -name '*.p[lm]' -print
+   find "$@" -type f -name '*.p[lm]' -print
# take executable files that file(1) thinks are perl files
-   find . -type f -perm -100 -exec file {} \; -print |
+   find "$@" -type f -perm -100 -exec file {} \; -print |
egrep -i ':.*perl[0-9]*\>' |
cut -d: -f1
} | sort -u | grep -v '^\./\.git/'
diff --git a/src/tools/perlcheck/pgperlcritic b/src/tools/perlcheck/pgperlcritic
index 1c2f787580..2ec6f20de3 100755
--- a/src/tools/perlcheck/pgperlcritic
+++ b/src/tools/perlcheck/pgperlcritic
@@ -14,7 +14,7 @@ PERLCRITIC=${PERLCRITIC:-perlcritic}
 
 . src/tools/perlcheck/find_perl_files
 
-find_perl_files | xargs $PERLCRITIC \
+find_perl_files "$@" | xargs $PERLCRITIC \
  --quiet \
  --program-extensions .pl \
  --profile=src/tools/perlcheck/perlcriticrc
diff --git a/src/tools/perlcheck/pgperlsyncheck 
b/src/tools/perlcheck/pgperlsyncheck
index 730f5927cd..da59c9727c 100755
--- a/src/tools/perlcheck/pgperlsyncheck
+++ b/src/tools/perlcheck/pgperlsyncheck
@@ -13,4 +13,4 @@ set -e
 # for zsh
 setopt shwordsplit 2>/dev/null || true
 
-find_perl_files | xargs -L 1 perl $INCLUDES -cw 2>&1 | grep -v OK
+find_perl_files "$@" | xargs -L 1 perl $INCLUDES -cw 2>&1 | grep -v OK
diff --git a/src/tools/pgindent/README b/src/tools/pgindent/README
index b2b134ee6a..f5fdfc5d2f 100644
--- a/src/tools/pgindent/README
+++ b/src/tools/pgindent/README
@@ -45,7 +45,7 @@ DOING THE INDENT RUN:
 
 4) Indent the Perl code using perltidy:
 
-   src/tools/pgindent/pgperltidy
+   src/tools/pgindent/pgperltidy .
 
If you want to use some perltidy version that's not in your PATH,
first set the PERLTIDY environment variable to point to it.
diff --git a/src/tools/pgindent/pgperltidy b/src/tools/pgindent/pgperltidy
index 5e704119eb..6af27d21d5 100755
--- a/src/tools/pgindent/pgperltidy
+++ b/src/tools/pgindent/pgperltidy
@@ -9,4 +9,4 @@ PERLTIDY=${PERLTIDY:-perltidy}
 
 . src/tools/perlcheck/find_perl_files
 
-find_perl_files | xargs $PERLTIDY --profile=src/tools/pgindent/perltidyrc
+find_perl_files "$@" | xargs $PERLTIDY --profile=src/tools/pgindent/perltidyrc

base-commit: 0f8cfaf8921fed35f0b92d918ce95eec7b46ff05
-- 
2.41.0



Re: pg_waldump: add test for coverage

2023-06-14 Thread Peter Eisentraut

On 06.09.22 07:57, Peter Eisentraut wrote:

I wrote a test for coverage.
Unfortunately, it seems to take quite a while to run the test.
I want to improve these execution times, but I don't know exactly what 
to do.

Therefore, I want to hear feedback from many people.


I think having some more test coverage for pg_waldump would be good, so 
I encourage you to continue working on this.


I made an updated patch that incorporates many of your ideas and code, 
just made it a bit more compact, and added more tests for various 
command-line options.  This moves the test coverage of pg_waldump from 
"bloodbath" to "mixed fruit salad", which I think is pretty good 
progress.  And now there is room for additional patches if someone wants 
to figure out, e.g., how to get more complete coverage in gindesc.c or 
whatever.
From 67d63a87cf9ea8de69801127607101b3a515fb42 Mon Sep 17 00:00:00 2001
From: Peter Eisentraut 
Date: Wed, 14 Jun 2023 08:46:47 +0200
Subject: [PATCH v2] Add more pg_waldump tests

This adds tests for most command-line options and tests for most
rmgrdesc routines.

Discussion: 
https://www.postgresql.org/message-id/flat/CAAcByaKM7zyJSDkPWv6_%2BapupY4fXXM3q3SRXas9MMNVPUGcsQ%40mail.gmail.com
---
 src/bin/pg_waldump/t/001_basic.pl | 191 ++
 1 file changed, 191 insertions(+)

diff --git a/src/bin/pg_waldump/t/001_basic.pl 
b/src/bin/pg_waldump/t/001_basic.pl
index 492a8cadd8..d4056e1b95 100644
--- a/src/bin/pg_waldump/t/001_basic.pl
+++ b/src/bin/pg_waldump/t/001_basic.pl
@@ -3,6 +3,7 @@
 
 use strict;
 use warnings;
+use PostgreSQL::Test::Cluster;
 use PostgreSQL::Test::Utils;
 use Test::More;
 
@@ -10,4 +11,194 @@
 program_version_ok('pg_waldump');
 program_options_handling_ok('pg_waldump');
 
+# wrong number of arguments
+command_fails_like([ 'pg_waldump', ], qr/error: no arguments/, 'no arguments');
+command_fails_like([ 'pg_waldump', 'foo', 'bar', 'baz' ], qr/error: too many 
command-line arguments/, 'too many arguments');
+
+# invalid option arguments
+command_fails_like([ 'pg_waldump', '--block', 'bad' ], qr/error: invalid block 
number/, 'invalid block number');
+command_fails_like([ 'pg_waldump', '--fork', 'bad' ], qr/error: invalid fork 
name/, 'invalid fork name');
+command_fails_like([ 'pg_waldump', '--limit', 'bad' ], qr/error: invalid 
value/, 'invalid limit');
+command_fails_like([ 'pg_waldump', '--relation', 'bad' ], qr/error: invalid 
relation/, 'invalid relation specification');
+command_fails_like([ 'pg_waldump', '--rmgr', 'bad' ], qr/error: resource 
manager .* does not exist/, 'invalid rmgr name');
+command_fails_like([ 'pg_waldump', '--start', 'bad' ], qr/error: invalid WAL 
location/, 'invalid start LSN');
+command_fails_like([ 'pg_waldump', '--end', 'bad' ], qr/error: invalid WAL 
location/, 'invalid end LSN');
+
+# rmgr list: If you add one to the list, consider also adding a test
+# case exercising the new rmgr below.
+command_like([ 'pg_waldump', '--rmgr=list'], qr/^XLOG
+Transaction
+Storage
+CLOG
+Database
+Tablespace
+MultiXact
+RelMap
+Standby
+Heap2
+Heap
+Btree
+Hash
+Gin
+Gist
+Sequence
+SPGist
+BRIN
+CommitTs
+ReplicationOrigin
+Generic
+LogicalMessage$/,
+   'rmgr list');
+
+
+my $node = PostgreSQL::Test::Cluster->new('main');
+$node->init;
+$node->append_conf('postgresql.conf', q{
+autovacuum = off
+checkpoint_timeout = 1h
+
+# for standbydesc
+archive_mode=on
+archive_command=''
+
+# for XLOG_HEAP_TRUNCATE
+wal_level=logical
+});
+$node->start;
+
+my ($start_lsn, $start_walfile) = split /\|/, $node->safe_psql('postgres', 
q{SELECT pg_current_wal_insert_lsn(), 
pg_walfile_name(pg_current_wal_insert_lsn())});
+
+$node->safe_psql('postgres', q{
+-- heap, btree, hash, sequence
+CREATE TABLE t1 (a int GENERATED ALWAYS AS IDENTITY, b text);
+CREATE INDEX i1a ON t1 USING btree (a);
+CREATE INDEX i1b ON t1 USING hash (b);
+INSERT INTO t1 VALUES (default, 'one'), (default, 'two');
+DELETE FROM t1 WHERE b = 'one';
+TRUNCATE t1;
+
+-- abort
+START TRANSACTION;
+INSERT INTO t1 VALUES (default, 'three');
+ROLLBACK;
+
+-- unlogged/init fork
+CREATE UNLOGGED TABLE t2 (x int);
+CREATE INDEX i2 ON t2 USING btree (x);
+INSERT INTO t2 SELECT generate_series(1, 10);
+
+-- gin
+CREATE TABLE gin_idx_tbl (id bigserial PRIMARY KEY, data jsonb);
+CREATE INDEX gin_idx ON gin_idx_tbl USING gin (data);
+INSERT INTO gin_idx_tbl
+WITH random_json AS (
+SELECT json_object_agg(key, trunc(random() * 10)) as json_data
+FROM unnest(array['a', 'b', 'c']) as u(key))
+  SELECT generate_series(1,500), json_data FROM random_json;
+
+-- gist, spgist
+CREATE TABLE gist_idx_tbl (p point);
+CREATE INDEX gist_idx ON gist_idx_tbl USING gist (p);
+CREATE INDEX spgist_idx ON gist_idx_tbl USING spgist (p);
+INSERT INTO gist_idx_tbl (p) VALUES (point '(1, 1)'), (point '(3, 2)'), (point 
'(6, 3)');
+
+-- brin
+CREATE TABLE brin_idx_tbl (col1 int, col2 text, col3 text );
+CREATE INDEX brin_idx ON brin_idx_tbl USING brin (col1, col2, col3) WITH 
(autosum

Re: Let's make PostgreSQL multi-threaded

2023-06-14 Thread Andreas Karlsson

On 6/14/23 09:01, Kyotaro Horiguchi wrote:

At Wed, 14 Jun 2023 08:46:05 +0300, Konstantin Knizhnik  
wrote in

But I do not think that it is somehow related with using threads
instead of process.
The question whether to use private or shared cache is not directly
related to threads vs. process choice.


Yeah, I unconsciously conflated the two things. We can use per-thread
cache on multithreading.


For sure, and we can drop the cache when dropping the memory context. 
And in the first versions of an imagined threaded PostgreSQL I am sure 
that is how things will work.


Then later someone will have to investigate which caches are worth 
making shared and what the eviction/expiration strategy should be.


Andreas




Re: Let's make PostgreSQL multi-threaded

2023-06-14 Thread Kyotaro Horiguchi
At Wed, 14 Jun 2023 08:46:05 +0300, Konstantin Knizhnik  
wrote in 
> But I do not think that it is somehow related with using threads
> instead of process.
> The question whether to use private or shared cache is not directly
> related to threads vs. process choice.

Yeah, I unconsciously conflated the two things. We can use per-thread
cache on multithreading.

> Yes, threads make implementation of a shared cache much easier. But it
> can also be done using dynamic memory segments. Definitely a shared
> cache has its pros and cons; first of all it requires synchronization,
> which may have a negative impact on performance.

True.

> I have made an attempt to combine both caches: use relatively small
> per-backend local cache
> and large shared cache.
> I wonder what people think about the idea to make backends less thick
> by using shared cache.

I remember a relatively old thread about that.

https://www.postgresql.org/message-id/4E72940DA2BF16479384A86D54D0988A567B9245%40G01JPEXMBKW04

regards.

-- 
Kyotaro Horiguchi
NTT Open Source Software Center