Re: printf %s with NULL pointer (was Re: BUG #17098: Assert failed on composing an error message when adding a type to an extension being dropped)

2021-07-13 Thread Laurenz Albe
On Tue, 2021-07-13 at 14:26 -0400, Tom Lane wrote:
> Did you see my followup?  The vast majority of live systems do not do
> that, so we are accomplishing nothing of value by insisting it's a
> crash-worthy bug.
> 
> I flat out don't agree that "crash on debug builds but it's okay on
> production" is a useful way to define this.  I spend way too much
> time already on bug reports that only manifest with asserts enabled.

You convinced me that printing "(null)" is better than crashing.
Having a "(null)" show up in a weird place is certainly a minor inconvenience.

But I don't buy your second point: if it is like that, why do we have
Asserts at all?

Yours,
Laurenz Albe





Re: row filtering for logical replication

2021-07-13 Thread Amit Kapila
On Wed, Jul 14, 2021 at 12:51 AM Alvaro Herrera  wrote:
>
> On 2021-Jul-13, Tomas Vondra wrote:
>
> > On 7/13/21 5:44 PM, Jeff Davis wrote:
>
> > > * Andres also mentioned that the function should not leak memory.
> > > * One use case for this feature is when sharding a table, so the
> > > expression should allow things like "hashint8(x) between ...". I'd
> > > really like to see this problem solved, as well.
> >
..
> >
> > Not sure about the memory leaks - I suppose we'd free memory for each row,
> > so this shouldn't be an issue I guess ...
>
> I'm not sure we need to be terribly strict about expression evaluation
> not leaking any memory here.   I'd rather have a memory context that can
> be reset per row.
>

I also think that should be sufficient here, and if I am reading it
correctly the patch already evaluates the expression in a per-tuple context
and resets it for each tuple. Jeff, do you or Andres have something
else in mind?
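
For illustration, a minimal sketch of the per-tuple reset pattern being
described, using the standard executor APIs (the function name and surrounding
plumbing are assumptions, not code from the patch):

    /*
     * Sketch only: evaluate a row filter in the per-tuple memory context and
     * reset that context afterwards, so anything the expression evaluation
     * leaks is reclaimed after each row.
     */
    #include "postgres.h"
    #include "executor/executor.h"

    static bool
    row_filter_matches(EState *estate, ExprState *filter, TupleTableSlot *slot)
    {
        ExprContext *econtext = GetPerTupleExprContext(estate);
        bool         result;

        econtext->ecxt_scantuple = slot;

        /* Allocations made during evaluation go into ecxt_per_tuple_memory. */
        result = ExecQual(filter, econtext);

        /* Free whatever the evaluation allocated for this row. */
        ResetExprContext(econtext);

        return result;
    }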

-- 
With Regards,
Amit Kapila.




Re: [PATCH] document

2021-07-13 Thread Laurenz Albe
On Wed, 2021-07-14 at 14:43 +0900, Ian Lawrence Barwick wrote:
> Hi
> 
> The description for "pg_database" [1] mentions the function
> "pg_encoding_to_char()", but this is not described anywhere in the docs. Given
> that it (and the corresponding "pg_char_to_encoding()") have been around
> since 7.0 [2], it's probably not a burning issue, but it seems not entirely
> unreasonable to add short descriptions for both (and link from "pg_conversion"
> while we're at it); see attached patch. "System Catalog Information Functions"
> seems the most logical place to put these.
> 
> [1] https://www.postgresql.org/docs/current/catalog-pg-database.html
> [2] 
> https://git.postgresql.org/gitweb/?p=postgresql.git;a=commit;h=5eb1d0deb15f2b7cd0051bef12f3e091516c723b
> 
> Will add to the next commitfest.

+1

Yours,
Laurenz Albe





Re: [PATCH] psql: \dn+ to show size of each schema..

2021-07-13 Thread Pavel Stehule
st 14. 7. 2021 v 7:42 odesílatel Laurenz Albe 
napsal:

> On Wed, 2021-07-14 at 14:05 +0900, Ian Lawrence Barwick wrote:
> > 2021年7月14日(水) 12:07 Justin Pryzby :
> > > \db+ and \l+ show sizes of tablespaces and databases, so I was surprised
> > > in the past that \dn+ didn't show sizes of schemas.  I would find that
> > > somewhat convenient, and I assume other people would find it even more
> > > useful.
> >
> > It's something which would be useful to have. But see this previous proposal:
> >
> > https://www.postgresql.org/message-id/flat/2d6d2ebf-4dbc-4f74-17d8-05461f4782e2%40dalibo.com
>
> Right, I would not like to cause a lot of I/O activity just to look at the
> permissions on a schema...
>
> Besides, schemas are not physical, but logical containers.  So I see a point
> in measuring the storage used in a certain tablespace, but not so much by
> all objects in a certain schema.  It might be useful for accounting
> purposes, though.  But I don't expect it to be in frequent enough demand to
> add a psql command.
>
> What about inventing a function pg_schema_size(regnamespace)?
>

+1 good idea

Pavel


> Yours,
> Laurenz Albe
>
>
>
>


[PATCH] document

2021-07-13 Thread Ian Lawrence Barwick
Hi

The description for "pg_database" [1] mentions the function
"pg_encoding_to_char()", but this is not described anywhere in the docs. Given
that it (and the corresponding "pg_char_to_encoding()") have been around
since 7.0 [2], it's probably not a burning issue, but it seems not entirely
unreasonable to add short descriptions for both (and link from "pg_conversion"
while we're at it); see attached patch. "System Catalog Information Functions"
seems the most logical place to put these.

[1] https://www.postgresql.org/docs/current/catalog-pg-database.html
[2] 
https://git.postgresql.org/gitweb/?p=postgresql.git;a=commit;h=5eb1d0deb15f2b7cd0051bef12f3e091516c723b

Will add to the next commitfest.

Regards

Ian Barwick

-- 
EnterpriseDB: https://www.enterprisedb.com
commit b8617bc6e6ca551ce51d661de10e6d3830e80694
Author: Ian Barwick 
Date:   Wed Jul 14 12:44:58 2021 +0900

document pg_encoding_to_char() and pg_char_to_encoding()

diff --git a/doc/src/sgml/catalogs.sgml b/doc/src/sgml/catalogs.sgml
index 0f5d25b948..1ee07854ed 100644
--- a/doc/src/sgml/catalogs.sgml
+++ b/doc/src/sgml/catalogs.sgml
@@ -2819,7 +2819,8 @@ SCRAM-SHA-256$iteration count:
conforencoding int4
   
   
-   Source encoding ID
+   Source encoding ID (pg_encoding_to_char()
+   can translate this number to the encoding name)
   
  
 
@@ -2828,7 +2829,8 @@ SCRAM-SHA-256$iteration count:
contoencoding int4
   
   
-   Destination encoding ID
+   Destination encoding ID (pg_encoding_to_char()
+   can translate this number to the encoding name)
   
  
 
@@ -2927,7 +2929,7 @@ SCRAM-SHA-256$iteration count:
   
   
Character encoding for this database
-   (pg_encoding_to_char() can translate
+   (pg_encoding_to_char() can translate
this number to the encoding name)
   
  
diff --git a/doc/src/sgml/func.sgml b/doc/src/sgml/func.sgml
index 6388385edc..af0488fed7 100644
--- a/doc/src/sgml/func.sgml
+++ b/doc/src/sgml/func.sgml
@@ -22987,6 +22987,36 @@ SELECT pg_type_is_visible('myschema.widget'::regtype);

   
 
+  
+   
+
+ pg_char_to_encoding
+
+pg_char_to_encoding ( encoding name )
+int
+   
+   
+Converts the supplied encoding name into an integer representing the
+internal identifier used in some system catalog tables.
+Returns -1 if an unknown encoding name is provided.
+   
+  
+
+  
+   
+
+ pg_encoding_to_char
+
+pg_encoding_to_char ( encoding int )
+name
+   
+   
+Converts the integer used as the internal identifier of an encoding in some
+system catalog tables into a human-readable string.
+Returns an empty string if an invalid encoding number is provided.
+   
+  
+
   

 


Re: [PATCH] psql: \dn+ to show size of each schema..

2021-07-13 Thread Laurenz Albe
On Wed, 2021-07-14 at 14:05 +0900, Ian Lawrence Barwick wrote:
> 2021年7月14日(水) 12:07 Justin Pryzby :
> > \db+ and \l+ show sizes of tablespaces and databases, so I was surprised in
> > the past that \dn+ didn't show sizes of schemas.  I would find that somewhat
> > convenient, and I assume other people would find it even more useful.
> 
> It's something which would be useful to have. But see this previous proposal:
> 
>
> https://www.postgresql.org/message-id/flat/2d6d2ebf-4dbc-4f74-17d8-05461f4782e2%40dalibo.com

Right, I would not like to cause a lot of I/O activity just to look at the
permissions on a schema...

Besides, schemas are not physical, but logical containers.  So I see a point in
measuring the storage used in a certain tablespace, but not so much by all
objects in a certain schema.  It might be useful for accounting purposes, though.
But I don't expect it to be in frequent enough demand to add a psql command.

What about inventing a function pg_schema_size(regnamespace)?
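
For illustration, a rough SQL-level sketch of what such a function could do
(an assumption for discussion, not a concrete proposal for a C implementation):
sum the per-fork sizes of every relation belonging to the given schema:

    CREATE FUNCTION pg_schema_size(regnamespace) RETURNS bigint
    LANGUAGE sql STABLE AS $$
        SELECT COALESCE(sum(pg_relation_size(c.oid, f.fork)), 0)::bigint
        FROM pg_catalog.pg_class c,
             (VALUES ('main'), ('fsm'), ('vm'), ('init')) AS f(fork)
        WHERE c.relnamespace = $1
    $$;

    -- usage: SELECT pg_schema_size('public');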

Yours,
Laurenz Albe





Re: row filtering for logical replication

2021-07-13 Thread Amit Kapila
On Wed, Jul 14, 2021 at 6:28 AM Euler Taveira  wrote:
>
> On Tue, Jul 13, 2021, at 6:06 PM, Alvaro Herrera wrote:
>
> 1. if you use REPLICA IDENTITY FULL, then the expressions would work
> even if they use any other column with DELETE.  Maybe it would be
> reasonable to test for this in the code and raise an error if the
> expression requires a column that's not part of the replica identity.
> (But that could be relaxed if the publication does not publish
> updates/deletes.)
>

+1.

> I thought about it but came to the conclusion that it isn't worth it.  Even
> with REPLICA IDENTITY FULL, the expression evaluates to false if the column
> allows NULL values. Besides that, REPLICA IDENTITY is changed via another DDL
> command (ALTER TABLE), and you have to make sure you don't allow changing the
> REPLICA IDENTITY while some row filter uses a column you want to remove from it.
>

Yeah, that is required but is it not feasible to do so?

> 2. For UPDATE, does the expression apply to the old tuple or to the new
> tuple?  You say it's the new tuple, but from the user point of view I
> think it would make more sense that it would apply to the old tuple.
> (Of course, if you're thinking that the R.I. is the PK and the PK is
> never changed, then you don't really care which one it is, but I bet
> that some people would not like that assumption.)
>
> New tuple. The main reason is that new tuple is always there for UPDATEs.
>

I am not sure if that is a very good reason to use a new tuple.

-- 
With Regards,
Amit Kapila.




Re: [PATCH] improve the pg_upgrade error message

2021-07-13 Thread Suraj Kharage
Thanks Jeevan for working on this.
Overall patch looks good to me.

+ pg_fatal("All non-template0 databases must allow connections, i.e.
their\n"
+ "pg_database.datallowconn must be true. Your installation contains\n"
+ "non-template0 databases with their pg_database.datallowconn set to\n"
+ "false. Consider allowing connection for all non-template0 databases\n"
+ "using:\n"
+ "UPDATE pg_catalog.pg_database SET datallowconn='true' WHERE datname
NOT LIKE 'template0';\n"
+ "A list of databases with the problem is given in the file:\n"
+ "%s\n\n", output_path);

Instead of giving a suggestion about updating the pg_database catalog directly,
can we give an "ALTER DATABASE <dbname> ALLOW_CONNECTIONS true;" command?
Also, it would be good if we used two spaces after a full stop in the error
message.
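
For illustration (this is not part of the patch), the per-database alternative
could also be scripted from the list of offending databases, e.g.:

    SELECT format('ALTER DATABASE %I ALLOW_CONNECTIONS true;', datname)
    FROM pg_catalog.pg_database
    WHERE NOT datallowconn AND datname <> 'template0';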

On Tue, Jul 13, 2021 at 6:57 PM Jeevan Ladhe 
wrote:

>
> The admin might fix DB123, restart their upgrade procedure, spend 5 or 15
>> minutes with that, only to have it then fail on DB1234.
>>
>
> Agree with this observation.
>
> Here is a patch that writes to a file the list of all the databases other than
> template0 that have their pg_database.datallowconn set to false. A similar
> approach is seen in other functions like check_for_data_types_usage() etc.
> Thanks Suraj Kharage for the offline suggestion.
>
> PFA patch.
>
> For experiment, here is how it turns out after the fix.
>
> postgres=# update pg_database set datallowconn='false' where datname in
> ('mydb', 'mydb1', 'mydb2');
> UPDATE 3
>
> $ pg_upgrade -d /tmp/v96/data -D /tmp/v13/data -b $HOME/v96/install/bin -B
> $HOME/v13/install/bin
> Performing Consistency Checks
> -
> Checking cluster versions   ok
> Checking database user is the install user  ok
> Checking database connection settings   fatal
>
> All non-template0 databases must allow connections, i.e. their
> pg_database.datallowconn must be true. Your installation contains
> non-template0 databases with their pg_database.datallowconn set to
> false. Consider allowing connection for all non-template0 databases
> using:
> UPDATE pg_catalog.pg_database SET datallowconn='true' WHERE datname
> NOT LIKE 'template0';
> A list of databases with the problem is given in the file:
> databases_with_datallowconn_false.txt
>
> Failure, exiting
>
> $ cat databases_with_datallowconn_false.txt
> mydb
> mydb1
> mydb2
>
>
> Regards,
> Jeevan Ladhe
>


-- 
--

Thanks & Regards,
Suraj kharage,



edbpostgres.com


Re: [PATCH] psql: \dn+ to show size of each schema..

2021-07-13 Thread Ian Lawrence Barwick
Hi

2021年7月14日(水) 12:07 Justin Pryzby :
>
> \db+ and \l+ show sizes of tablespaces and databases, so I was surprised in
> the past that \dn+ didn't show sizes of schemas.  I would find that somewhat
> convenient, and I assume other people would find it even more useful.

It's something which would be useful to have. But see this previous proposal:

   
https://www.postgresql.org/message-id/flat/2d6d2ebf-4dbc-4f74-17d8-05461f4782e2%40dalibo.com


Regards

Ian Barwick

-- 
EnterpriseDB: https://www.enterprisedb.com




Re: proposal: possibility to read dumped table's name from file

2021-07-13 Thread Pavel Stehule
Hi


> You're right- no one followed up on that.  Instead, one group continues
> to push for 'simple' and to just accept what's been proposed, while
> another group counters that we should be looking at the broader design
> question and work towards a solution which will work for us down the
> road, and not just right now.
>
> One thing remains clear- there's no consensus here.
>

I think there must be some misunderstanding about the target of this
patch, and I am afraid there cannot be consensus, because people are
speaking about two very different features. It is not possible to push
them into one thing; I am afraid it cannot work.

1. The main target of this patch is to solve the problem of an overly long
pg_dump command line when there are a lot of dumped objects. You need to
call pg_dump only once to ensure the dump runs in one transaction. And sometimes
it is not possible to use wildcards effectively, because the state of the
objects lives in different databases. Raising the command-line length limit
is not safe, and there are other production issues. In this case you need
a very simple format - just because you want to use pg_dump in a pipe. This
format should be line oriented - and usually it will contain just "dump
this table, dump second table". Nothing else. Nobody will read this format,
nobody will edit this format. Because the main platform for this format is
probably the UNIX shell, the format should be simple. I really don't see
any joy in generating JSON and then parsing JSON later. These data will be
processed locally. This is a single-purpose format, and it is not
designed for holding configuration. For this purpose a complex format has
no advantage. There is no problem with parsing JSON or other
formats on the pg_dump side, but it is pretty hard to generate valid JSON
from a bash script. For a UNIX shell we need the simplest possible format.
Theoretically this format (this file) could hold any pg_dump option, but
for usual streaming processing only the filter options will be there.
Originally this feature had the name "filter file". There are a lot of
examples of successful filter file formats in the UNIX world, and I think
nobody doubts their sense and usability. Probably there is a consensus
that filter files are not config files.

The format of the filter file can look like "+d tablename" or "include data
tablename". If we find a consensus that the filter file is a good thing, then
the format design and implementation are easy work. It is no problem to invent
comment lines.
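
For illustration only - the exact syntax is what is under discussion here, so
the "+d" lines and the --filter option name below are placeholders - a generated
filter file and its use in a shell pipeline might look like:

    $ psql -Atc "SELECT '+d ' || quote_ident(schemaname) || '.' || quote_ident(tablename) FROM pg_tables WHERE schemaname = 'public'" mydb > filter.txt
    $ cat filter.txt
    +d public.orders
    +d public.orders_2021
    $ pg_dump --filter=filter.txt mydb > dump.sql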

2. It is true that there is only a small step from a filter file to an options
file. I rewrote this patch in this direction. The advantage is universality
- it can support any option without the need to modify related code.
Still, this format is not difficult for producers, and it is simple to
parse. Now, the format would follow the command-line format: "-t
tablename" or "--table tablename" or "table tablename". There can be issues
related to different parsers in the shell and in the implemented code, but that
can be solved. It is no problem to introduce comment lines. The big advantage
is simplicity of usage and simplicity of implementation - moreover, the
implementation is generic.

3. But the options file is just a small step away from a config file. I can
imagine somebody wanting to store a typical configuration (and usual options)
for psql, pg_dump, pg_restore, pgAdmin, ... somewhere. Config files are
very different creatures from filter files. Although they can be
generated, they are usually edited and can be very complex. There can be shared
parts for all applications, specific sections for psql, and specific
sections for every database. Config files can be brutally complex. A
simple text format is not good for this purpose. Some people prefer
YAML, some people hate that format. Other people prefer XML or JSON or
anything else. Sometimes the complexity of config files grows too big, and
people prefer startup scripting instead.

Although there is an intersection between filter files and config files,
I see very big differences in usage. Filter files are usually temporary,
generated, and not shared. Config files are persistent, usually
manually modified, and can be shared. The requirements are different, and the
solutions should be different too. I don't propose any config-file related
features, and my proposal doesn't block the introduction of a config
file in any format in the future. I think these features are very different
and should be implemented differently. A filter file or options file
would be a pretty ugly config file, and a config file would be a pretty
impractical filter file.

So can we talk about the implementation of a filter file or options file here,
and talk about the implementation of config files in a separate topic? Without
that, I am afraid there is no possibility of finding an agreement and
moving forward.

Regards

Pavel















> Thanks,
>
> Stephen
>


Re: Feature improvement: can we add queryId for pg_catalog.pg_stat_activity view?

2021-07-13 Thread Peter Eisentraut

On 13.07.21 10:58, Magnus Hagander wrote:

Maybe "each distinct database id, each distinct user id, each distinct
query id, and whether it is a top level statement or not"?

Or maybe "each distinct combination of database id, user id, query id
and whether it's a top level statement or not"?


Okay, now I understand what is meant here.  The second one sounds good 
to me.





Re: Detecting File Damage & Inconsistencies

2021-07-13 Thread Amit Kapila
On Tue, Jul 13, 2021 at 8:29 PM Simon Riggs
 wrote:
>
> On Tue, Jul 6, 2021 at 4:21 AM Amit Kapila  wrote:
> >
> > >
> > > > If you don't think the sorts of use cases I presented are worth the 
> > > > trouble that's fair enough. I'm not against adding it on the commit 
> > > > record. It's just that with logical decoding moving toward a streaming 
> > > > model I suspect only having it at commit time may cause us some pain 
> > > > later.
> > >
> > > I think you have some good ideas about how to handle larger
> > > transactions with streaming. As a separate patch it might be worth
> > > keeping track of transaction size and logging something when a
> > > transaction gets too large.
> > >
> >
> > If we want this additional information for streaming mode in logical
> > replication then can't we piggyback it on the very first record
> > written for a transaction when this info is required? Currently, we do
> > something similar for logging top_level_xid for subtransaction in
> > XLogRecordAssemble().
>
> It's possible, but I'm struggling to believe anybody would accept that
> as an approach because it breaks simplicity, modularity and makes it
> harder to search for this info in the WAL.
>
> I was imagining that we'd keep track of amount of WAL written by a
> transaction and when it reaches a certain size generate a "streaming
> info" record as an early warning that we have a big transaction coming
> down the pipe.
>

I am not sure if that satisfies Craig's requirement of making it
available during the streaming of in-progress xacts during logical
replication. It is quite possible that by the time we decide to start
streaming a transaction this information won't be logged yet.

> I'm feeling that a simple patch is expanding well beyond its original
> scope and timeline. How can we do this simply?
>

The patch is simple, but its use doesn't seem to be very clear. You
have mentioned its use for future PITR patches, and Craig mentioned
some use cases in logical decoding; it appears to me that to
support the use cases mentioned by Craig, it is important to log this
earlier than at commit time. As there are no details about how it will
be used for PITR patches, or whether such patch ideas would be accepted, it
is harder to judge the value of this patch.

I think if we would have patches (even at WIP/POC stage) for the ideas
you and Craig have in mind, it would have been much easier to see the
value of this patch.

-- 
With Regards,
Amit Kapila.




[PATCH] psql: \dn+ to show size of each schema..

2021-07-13 Thread Justin Pryzby
\db+ and \l+ show sizes of tablespaces and databases, so I was surprised in the
past that \dn+ didn't show sizes of schemas.  I would find that somewhat
convenient, and I assume other people would find it even more useful.

\db+ and \l+ seem to walk the filesystem, which distinguishes this case from
those.  (Also, schemas are per-DB, not global.)

Maybe it's an issue if \dn+ is slow and expensive, since that's how one
displays the ACLs.  But \db+ has the same issue.  Maybe there should be a
\db++ and \dn++, to allow \dn+ to show the ACL but not the size.

pg_relation_size() only includes one fork, and the other size functions include
TOAST, which should be in its own separate schema, so the size has to be summed
across forks.
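
For reference, a standalone form of the same size calculation (summing the
main/fsm/vm/init forks of every relation in each schema, as the patch below
does inside \dn+):

    SELECT n.nspname,
           pg_catalog.pg_size_pretty(sum(pg_relation_size(c.oid, f.fork))) AS size
    FROM pg_catalog.pg_class c
         JOIN pg_catalog.pg_namespace n ON n.oid = c.relnamespace,
         (VALUES ('main'),('fsm'),('vm'),('init')) AS f(fork)
    GROUP BY n.nspname
    ORDER BY n.nspname;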

postgres=# \dnS+
                                        List of schemas
        Name        |  Owner   |  Access privileges   |           Description            |  Size
--------------------+----------+----------------------+----------------------------------+---------
 child              | postgres |                      |                                  | 946 MB
 information_schema | postgres | postgres=UC/postgres+|                                  | 88 kB
                    |          | =U/postgres          |                                  |
 pg_catalog         | postgres | postgres=UC/postgres+| system catalog schema            | 42 MB
                    |          | =U/postgres          |                                  |
 pg_toast           | postgres |                      | reserved schema for TOAST tables | 3908 MB
 public             | postgres | postgres=UC/postgres+| standard public schema           | 5627 MB
                    |          | =UC/postgres         |                                  |


>From c2d68eb54f785c759253d4100460aa1af9cbc676 Mon Sep 17 00:00:00 2001
From: Justin Pryzby 
Date: Tue, 13 Jul 2021 21:25:48 -0500
Subject: [PATCH] psql: \dn+ to show size of each schema..

See also: 358a897fa, 528ac10c7
---
 src/bin/psql/describe.c | 5 +
 1 file changed, 5 insertions(+)

diff --git a/src/bin/psql/describe.c b/src/bin/psql/describe.c
index 2abf255798..6b9b6ea34a 100644
--- a/src/bin/psql/describe.c
+++ b/src/bin/psql/describe.c
@@ -5036,6 +5036,11 @@ listSchemas(const char *pattern, bool verbose, bool showSystem)
 		appendPQExpBuffer(&buf,
 						  ",\n  pg_catalog.obj_description(n.oid, 'pg_namespace') AS \"%s\"",
 						  gettext_noop("Description"));
+
+		appendPQExpBuffer(&buf,
+						  ",\n  (SELECT pg_catalog.pg_size_pretty(sum(pg_relation_size(oid,fork))) FROM pg_catalog.pg_class c,\n"
+						  " (VALUES('main'),('fsm'),('vm'),('init')) AS fork(fork) WHERE c.relnamespace = n.oid) AS \"%s\"",
+						  gettext_noop("Size"));
 	}
 
 	appendPQExpBufferStr(&buf,
-- 
2.17.0




Re: [Proposal] Global temporary tables

2021-07-13 Thread Ming Li
Hi Wenjing,

Some suggestions may help:

1) It seems that no test case covers the following scenario: 2 sessions attach
the same GTT and insert/update/select concurrently. It is better to use
the test framework in src/test/isolation, like the code changes in
https://commitfest.postgresql.org/24/2233/ (a minimal spec sketch follows below).

2) CREATE GLOBAL TEMP SEQUENCE also need to be supported
in src/bin/psql/tab-complete.c
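
For illustration, a minimal isolation spec sketch of the kind of coverage meant
in point 1 (the file name, table name, and steps are assumptions; the GLOBAL
TEMP syntax is the patch's):

    # two_sessions_gtt.spec -- sketch only
    setup
    {
        CREATE GLOBAL TEMP TABLE gtt1 (a int, b text);
    }

    teardown
    {
        DROP TABLE gtt1;
    }

    session "s1"
    step "s1i" { INSERT INTO gtt1 VALUES (1, 'one'); }
    step "s1s" { SELECT * FROM gtt1; }

    session "s2"
    step "s2i" { INSERT INTO gtt1 VALUES (2, 'two'); }
    step "s2u" { UPDATE gtt1 SET b = 'TWO' WHERE a = 2; }
    step "s2s" { SELECT * FROM gtt1; }

    permutation "s1i" "s2i" "s2u" "s1s" "s2s"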


On Wed, Jul 14, 2021 at 10:36 AM wenjing  wrote:

> Rebase code based on the latest version.
>
> Regards,
> wenjing
>
>


Re: Introduce pg_receivewal gzip compression tests

2021-07-13 Thread Michael Paquier
On Tue, Jul 13, 2021 at 11:16:06AM +, gkokola...@pm.me wrote:
> Agreed. For the record that is why I said v6 :)

Okay, thanks.
--
Michael


signature.asc
Description: PGP signature


RE: Partition Check not updated when insert into a partition

2021-07-13 Thread houzj.f...@fujitsu.com
On Tuesday, July 13, 2021 2:52 AM Alvaro Herrera  
wrote:
> On 2021-Jun-23, houzj.f...@fujitsu.com wrote:
> 
> > For a multi-level partition, for example: table 'A' is partition of
> > table 'B', and 'B' is also partition of table 'C'. After I 'ALTER
> > TABLE C DETACH B', I thought partition constraint check of table 'C'
> > does not matter anymore if INSERT INTO table 'A'. But it looks like
> > the relcache of 'A' is not invalidated after detaching 'B'. And the
> > relcache::rd_partcheck still include the partition constraint of table
> > 'C'. Note If I invalidate the table 'A''s relcache manually, then next
> > time the relcache::rd_partcheck will be updated to the expected one which
> does not include partition constraint check of table 'C'.
> 
> Hmm, if I understand correctly, this means that we need to invalidate relcache
> for all partitions of the partition being detached.  Maybe like in the 
> attached
> WIP ("XXX VERY CRUDE XXX DANGER EATS DATA") patch, which solves what
> you complained about, but I didn't run any other tests.
> (Also, in the concurrent case I think this should be done during the first
> transaction, so this patch is wrong for it.)
> 
> Did you have a misbehaving test for the ATTACH case?

Thanks for the response.

Yes, I think the following example of ATTACH doesn't work as expected.

---
create table parttable1 (a int, b int, c int) partition by list(a);
create table parttable2 (a int, b int, c int) partition by list(b);
create table parttable3 (a int, b int, c int);
alter table parttable2 attach partition parttable3 for values in (1);

--
-- INSERT a tuple into parttable3
-- Cache the partition check in relcache::rd_partcheck
--
insert into parttable3 values(1, 1, 0);

-- Attach a new top parent
alter table parttable1 attach partition parttable2 for values in (1);

--
-- INSERT a tuple which doesn't satisfy the new top parent (parttable1)'s
-- partition check.
-- But the INSERT will succeed, which does not look as expected.
--
insert into parttable3 values(999, 1, 0);

--
-- And when I reconnect to clean the cache,
-- INSERT a tuple which doesn't satisfy the new top parent (parttable1)'s
-- partition check.
-- The INSERT will fail due to a partition check violation.
--
insert into parttable3 values(999, 1, 0);

Best regards,
Hou zhijie


Re: Add proper planner support for ORDER BY / DISTINCT aggregates

2021-07-13 Thread Ranier Vilela
Em ter., 13 de jul. de 2021 às 22:15, David Rowley 
escreveu:

> On Tue, 13 Jul 2021 at 23:45, Ranier Vilela  wrote:
> > The question not answered is if *argno* can '>=' that
> pertrans->numTransInputs,
> > before entering the loop?
> > If *can*, the loop might be useless in that case.
> >
> >>
> >>
> >> Note that we're doing argno++ inside the loop.
> >
> > Another question is, if *argno* can '>' that pertrans->numTransInputs,
> > before the loop, the test will fail?
> > if (argno == pertrans->numTransInputs)
>
> argno is *always* 0 before the loop starts.
>
Good. Thanks.

regards,
Ranier Vilela


Re: [PATCH] Pull general SASL framework out of SCRAM

2021-07-13 Thread Michael Paquier
On Tue, Jul 13, 2021 at 10:41:01PM +, Jacob Champion wrote:
> Just to make sure -- do we want to export the fe-auth-sasl.h header as
> opposed to forward-declaring the pg_fe_sasl_mech struct?

Installing fe-auth-sasl.h has the advantage of making the internals of
the callbacks available to applications playing with the internals.
For SASL, it makes it easier to define new mechanisms out of
core.

> Is the use
> case for libpq-int.h just "here, have at the internals, and if you
> break it then it's on you"?

Yes, it can be useful for applications willing to use the internals of
libpq, like in forks.  There is no guarantee that this will not break
across major version upgrades, so that's up to the user to fix things
once they play with the internals.
--
Michael


signature.asc
Description: PGP signature


Re: Incorrect usage of strtol, atoi for non-numeric junk inputs

2021-07-13 Thread Kyotaro Horiguchi
Thanks for the discussion.

At Tue, 13 Jul 2021 09:28:30 +0900, Michael Paquier  wrote 
in 
> On Fri, Jul 09, 2021 at 04:50:28PM +0900, Kyotaro Horiguchi wrote:
> > At Fri, 9 Jul 2021 10:29:07 +0900, Michael Paquier  
> > wrote in 
> >> Er, wait.  We've actually allowed negative values for pg_ctl
> >> --timeout or the subcommand kill!?
> >
> > --timeout accepts values less than 1, which values cause the command
> > ends with the timeout error before checking for the ending state. Not
> > destructive but useless as a behavior.  We have two choices here. One
> > is simply inhibiting zero or negative timeouts, and another is
> > allowing zero as timeout and giving it the same meaning to
> > --no-wait. The former is strictly right behavior but the latter is
> > casual and convenient. I took the former way in this version.
> 
> Yeah, that's not useful.

^^; Ok, I'm fine with taking the second way. Changed it.

"-t 0" means "-W" in the attached and that behavior is described in
the doc part. The environment variable PGCTLTIMEOUT accepts the same
range of values. I added a warning that notifies that -t 0 overrides
explicit -w .

> $ pg_ctl -w -t 0 start
> pg_ctl: WARNING: -w is ignored because timeout is set to 0
> server starting


> >> This one in pg_upgrade is incomplete.  Perhaps the missing comment
> >> should tell that negative job values are checked later on?
> > 
> > Zero or negative job numbers mean non-parallel mode and don't lead to
> > an error.  If we don't need to preserve that behavior (I personally
> > don't think it is ether useful and/or breaks someone's existing
> > usage.), it is sensible to do range-check here.
> 
> Hmm.  It would be good to preserve some compatibility here, but I can
> see more benefits in being consistent between all the tools, and make
> people aware that such commands are not generated more carefully.

Ageed.

> >  case 'j':
> > -opts.jobs = atoi(optarg);
> > -if (opts.jobs < 1)
> > +errno = 0;
> > +opts.jobs = strtoint(optarg, &endptr, 10);
> > +if (*endptr || errno == ERANGE || opts.jobs < 1)
> >  {
> >  pg_log_error("number of parallel jobs must be at least 
> > 1");
> >  exit(1);
> 
> specifying a value that triggers ERANGE could be confusing for values
> higher than INT_MAX with pg_amcheck --jobs:
> $ pg_amcheck --jobs=40
> pg_amcheck: error: number of parallel jobs must be at least 1
> I think that this should be reworded as "invalid number of parallel
> jobs \"$s\"", or "number of parallel jobs must be in range %d..%d".
> Perhaps we could have a combination of both?  Using the first message
> is useful with junk, non-numeric values or trailing characters.  The
> second is useful to make users aware that the value is numeric, but
> not quite right.

Yeah, I noticed that but ignored as a kind of impossible, or
something-needless-to-say:p

> "invalid number of parallel jobs \"$s\""
> "number of parallel jobs must be in range %d..%d"

The resulting combined message looks like this:

> "number of parallel jobs must be an integer in range 1..2147483647"

I don't think it's great that the message explicitly suggests a
limit like INT_MAX, which is far above the practical limit.  So (even
though I had avoided doing that before), in the attached I changed my mind and
split most of the errors into two messages to avoid suggesting such
impractical limits.

$ pg_amcheck -j -1
$ pg_amcheck -j 1x
$ pg_amcheck -j 10x
pg_amcheck: error: number of parallel jobs must be an integer greater than 
zero: ""
$ pg_amcheck -j 10
pg_amcheck: error: number of parallel jobs out of range: "10"

If you feel it's too much or pointless to split them that way, I'll be happy
to change it to the "%d..%d" form.


Still I used the "%d..%d" notation for some parameters that has a
finite range by design.  They look like the following:
(%d..%d doesn't work well for real numbers.)

"sampling rate must be an real number between 0.0 and 1.0: \"%s\""

I'm not sure what to do for numWorkers of pg_dump/restore.  The limit
for numWorkers is lowered on Windows to a quite low value, which would
be worth showing, but otherwise the limit is INT_MAX. The attached
makes pg_dump respond to -j 100 with the following error message, which
doesn't mention the finite limit of 64 on Windows.

$ pg_dump -j 100
pg_dump: error: number of parallel jobs out of range: "100"
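
For reference, a minimal self-contained sketch of the endptr/errno validation
pattern these call sites converge on (plain strtol() here to keep it standalone;
the patch itself uses strtoint() from src/port and pg_log_error()):

    #include <errno.h>
    #include <limits.h>
    #include <stdio.h>
    #include <stdlib.h>

    /* Parse a --jobs style option; reject junk and out-of-range values. */
    static int
    parse_jobs_option(const char *arg)
    {
        char   *endptr;
        long    val;

        errno = 0;
        val = strtol(arg, &endptr, 10);

        if (endptr == arg || *endptr != '\0' || val < 1)
        {
            fprintf(stderr, "error: number of parallel jobs must be an integer greater than zero: \"%s\"\n", arg);
            exit(1);
        }
        if (errno == ERANGE || val > INT_MAX)
        {
            fprintf(stderr, "error: number of parallel jobs out of range: \"%s\"\n", arg);
            exit(1);
        }
        return (int) val;
    }

    int
    main(int argc, char **argv)
    {
        if (argc > 1)
            printf("jobs = %d\n", parse_jobs_option(argv[1]));
        return 0;
    }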


> > @@ -587,8 +602,10 @@ main(int argc, char **argv)
> >  
> >  case 8:
> >  have_extra_float_digits = true;
> > -extra_float_digits = atoi(optarg);
> > -if (extra_float_digits < -15 || extra_float_digits > 3)
> > +errno = 0;
> > +extra_float_digits = strtoint(optarg, &endptr, 10);
> > +if (*endptr || errno == ERANGE ||
> > +extra_float_digits < -15 || extra_float_digits > 

Re: Add proper planner support for ORDER BY / DISTINCT aggregates

2021-07-13 Thread David Rowley
On Tue, 13 Jul 2021 at 23:45, Ranier Vilela  wrote:
> The question not answered is if *argno* can '>=' that 
> pertrans->numTransInputs,
> before entering the loop?
> If *can*, the loop might be useless in that case.
>
>>
>>
>> Note that we're doing argno++ inside the loop.
>
> Another question is, if *argno* can '>' that pertrans->numTransInputs,
> before the loop, the test will fail?
> if (argno == pertrans->numTransInputs)

argno is *always* 0 before the loop starts.

David




Re: row filtering for logical replication

2021-07-13 Thread Euler Taveira
On Tue, Jul 13, 2021, at 6:06 PM, Alvaro Herrera wrote:
> 1. if you use REPLICA IDENTITY FULL, then the expressions would work
> even if they use any other column with DELETE.  Maybe it would be
> reasonable to test for this in the code and raise an error if the
> expression requires a column that's not part of the replica identity.
> (But that could be relaxed if the publication does not publish
> updates/deletes.)
I thought about it but came to the conclusion that it isn't worth it.  Even
with REPLICA IDENTITY FULL, the expression evaluates to false if the column
allows NULL values. Besides that, REPLICA IDENTITY is changed via another DDL
command (ALTER TABLE), and you have to make sure you don't allow changing the
REPLICA IDENTITY while some row filter uses a column you want to remove from it.

> 2. For UPDATE, does the expression apply to the old tuple or to the new
> tuple?  You say it's the new tuple, but from the user point of view I
> think it would make more sense that it would apply to the old tuple.
> (Of course, if you're thinking that the R.I. is the PK and the PK is
> never changed, then you don't really care which one it is, but I bet
> that some people would not like that assumption.)
New tuple. The main reason is that the new tuple is always there for UPDATEs.
Hence, the row filter might succeed even if it contains a column that
is not part of the PK or REPLICA IDENTITY. pglogical also chooses to use the new
tuple when it is available (e.g. for INSERT and UPDATE). If you don't like this
approach, we can (a) create a new publication option to choose between the old
tuple and the new tuple for UPDATEs, or (b) qualify columns using a special
reference (such as NEW.id or OLD.foo). Both options provide flexibility, but
(a) is simpler.

> I think it is sensible that it's the old tuple that is matched, not the
> new; consider what happens if you change the PK in the update and the
> replica already has that tuple.  If you match on the new tuple and it
> doesn't match the expression (so you filter out the update), but the old
> tuple does match the expression, then the replica will retain the
> mismatching tuple forever.
> 
> 3. You say that a NULL value in any of those columns causes the
> expression to become false and thus the tuple is not published.  This
> seems pretty unfriendly, but maybe it would be useful to have examples
> of the behavior.  Does ExecInitCheck() handle things in the other way,
> and if so does using a similar trick give more useful behavior?
ExecInitCheck() is designed for CHECK constraints, and the SQL standard requires
that NULL constraint conditions are not treated as errors. This feature uses a
WHERE clause and behaves like one: a NULL result does not return the
row. See ExecQual().


--
Euler Taveira
EDB   https://www.enterprisedb.com/


Re: enable_resultcache confusion

2021-07-13 Thread David Rowley
On Tue, 13 Jul 2021 at 12:01, David Rowley  wrote:
> I plan on pushing the patch to master and PG14 in 24 hours time.  If
> anyone is still on the fence or wishes to object to the name, please
> let it be known before then.

Renaming complete.   Result Cache is gone. Welcome Memoize.

David




Re: Remove repeated calls to PQserverVersion

2021-07-13 Thread Alvaro Herrera
On 2021-Jul-14, Peter Smith wrote:

> But I never made any claims about performance; my motivation for this
> trivial patch was more like just "code tidy" or "refactor", so
> applying performance as the only worthiness criteria for a "code tidy"
> patch seemed like a misrepresentation here.
> 
> Of course you can judge the patch is still not worthwhile for other
> reasons. So be it.

Personally, I like the simplicity of the function call in those places,
because when reading just that line one immediately knows where the
value is coming from.  If you assign it to a variable, the line is not
standalone and I have to find out where the assignment is, and verify
that there hasn't been any other assignment in between.

-- 
Álvaro Herrera   39°49'30"S 73°17'W  —  https://www.EnterpriseDB.com/
"Pido que me den el Nobel por razones humanitarias" (Nicanor Parra)




Re: proposal: possibility to read dumped table's name from file

2021-07-13 Thread Stephen Frost
Greetings,

* Alvaro Herrera (alvhe...@2ndquadrant.com) wrote:
> On 2021-Jul-13, Stephen Frost wrote:
> > The simplest possible format isn't going to work with all the different
> > pg_dump options and it still isn't going to be 'simple' since it needs
> > to work with the flexibility that we have in what we support for object
> > names,
> 
> That's fine.  If people want a mechanism that allows changing the other
> pg_dump options that are not related to object filtering, they can
> implement a configuration file for that.

It's been said multiple times that people *do* want that and that they
want it to all be part of this one file, and specifically that they
don't want to end up with a file structure that actively works against
allowing other options to be added to it.

> > I don't know that the options that I suggested previously would
> > definitely work or not but they at least would allow other projects like
> > pgAdmin to leverage existing code for parsing and generating these
> > config files.
> 
> Keep in mind that this patch is not intended to help pgAdmin
> specifically.  It would be great if pgAdmin uses the functionality
> implemented here, but if they decide not to, that's not terrible.  They
> have survived decades without a pg_dump configuration file; they still
> can.

The adding of a config file for pg_dump should specifically be looking
at pgAdmin as the exact use-case for having such a capability.

> There are several votes in this thread for pg_dump to gain functionality
> to filter objects based on a simple specification -- particularly one
> that can be written using shell pipelines.  This patch gives it.

And several votes for having a config file that supports, or at least
can support in the future, the various options which pg_dump supports-
and active voices against having a new file format that doesn't allow
for that.

> > I'm not completely against inventing something new, but I'd really
> > prefer that we at least try to make something existing work first
> > before inventing something new that everyone is going to have to deal
> > with.
> 
> That was discussed upthread and led nowhere.

You're right- no one followed up on that.  Instead, one group continues
to push for 'simple' and to just accept what's been proposed, while
another group counters that we should be looking at the broader design
question and work towards a solution which will work for us down the
road, and not just right now.

One thing remains clear- there's no consensus here.

Thanks,

Stephen


signature.asc
Description: PGP signature


Re: Remove repeated calls to PQserverVersion

2021-07-13 Thread Peter Smith
On Wed, Jul 14, 2021 at 12:15 AM Tom Lane  wrote:
>
> Michael Paquier  writes:
> > On Tue, Jul 13, 2021 at 07:02:27PM +1000, Peter Smith wrote:
> >> I found a few functions making unnecessary repeated calls to
> >> PQserverVersion(conn); instead of just calling once and assigning to a
> >> local variable.
>
> > Does it really matter?  PQserverVersion() does a simple lookup at the
> > internals of PGconn.
>
> Yeah, it'd have to be mighty hot code to be worth caring about that,
> and none of these spots look like it could be worth it.

I agree there would be no observable performance improvements.

But I never made any claims about performance; my motivation for this
trivial patch was more like just "code tidy" or "refactor", so
applying performance as the only worthiness criteria for a "code tidy"
patch seemed like a misrepresentation here.

Of course you can judge the patch is still not worthwhile for other
reasons. So be it.

--
Kind Regards,
Peter Smith.
Fujitsu Australia




Re: [PATCH] Pull general SASL framework out of SCRAM

2021-07-13 Thread Jacob Champion
On Tue, 2021-07-13 at 22:41 +, Jacob Champion wrote:
> On Tue, 2021-07-13 at 19:31 +0900, Michael Paquier wrote:
> > On Tue, Jul 13, 2021 at 12:41:27PM +0300, Mikhail Kulagin wrote:
> > > 
> > > I think the new fe-auth-sasl.h file should be installed too.
> > > Correction proposal in the attached file (but I'm not sure that fix
> > > of Install.pm is correct). 
> > 
> > That looks correct to me.  I'll check that tomorrow.
> 
> Looks right to me too. I'm currently rebuilding my Windows dev
> environment so I haven't been able to double-check that piece of it.

(Confirmed that this patch works for me on Windows.)

Thanks,
--Jacob


Re: [HACKERS] Preserving param location

2021-07-13 Thread Tom Lane
Julien Rouhaud  writes:
> On Sun, Jun 27, 2021 at 11:31:53AM +0800, Julien Rouhaud wrote:
>> On Sat, Mar 11, 2017 at 11:09:32PM +0100, Julien Rouhaud wrote:
>>> When a query contains parameters, the original param node contains the token
>>> location.  However, this information is lost when the Const node is 
>>> generated,
>>> this one will only contain position -1 (unknown).
>>> 
>>> FWIW, we do have a use case for this (custom extension that tracks quals
>>> statistics, which among other thing is used to regenerate query string from 
>>> a
>>> pgss normalized query, thus needing the original param location).

I looked at this for a bit.  It's certainly a simple and
harmless-looking patch, but I'm having a hard time buying that it's
actually useful in isolation.  In particular, even if we keep the
Param's location in the immediately-resulting Const, what happens
when that node is further mashed by planner transformations?
As an example, given

regression=# prepare foo(int) as select $1 + 1;
PREPARE
regression=# explain verbose execute foo(42);
QUERY PLAN
--
 Result  (cost=0.00..0.01 rows=1 width=4)
   Output: 43
(2 rows)

this patch isn't gonna do anything to help you figure out where the
42 went.

I thought for a bit about also having evaluate_expr() inject the
exprLocation(expr) into its result Const, but in this example
that would likely result in having the 43 labeled with the offset
of the $1 (given exprLocation's penchant for preferring the leftmost
token location).  That could be more confusing not less, especially
when you consider that the variant case "1 + $1" would not be
traceable to the Param at all.

I also fear that this is still just the tip of the iceberg, in terms
of when rewriting and planning preserve useful info about the source
location of modified Nodes.  There's probably other places to touch
if we want to have consistent results.

So I'm not really convinced that there's a fully-baked use case
here, and would like more detail about how you hope to use the
location value.

regards, tom lane




Re: proposal: possibility to read dumped table's name from file

2021-07-13 Thread Alvaro Herrera
On 2021-Jul-13, Stephen Frost wrote:

> The simplest possible format isn't going to work with all the different
> pg_dump options and it still isn't going to be 'simple' since it needs
> to work with the flexibility that we have in what we support for object
> names,

That's fine.  If people want a mechanism that allows changing the other
pg_dump options that are not related to object filtering, they can
implement a configuration file for that.

> and is still going to require people write a new parser and
> generator for it instead of using something existing.

Sure.  That's not part of this patch.

> I don't know that the options that I suggested previously would
> definitely work or not but they at least would allow other projects like
> pgAdmin to leverage existing code for parsing and generating these
> config files.

Keep in mind that this patch is not intended to help pgAdmin
specifically.  It would be great if pgAdmin uses the functionality
implemented here, but if they decide not to, that's not terrible.  They
have survived decades without a pg_dump configuration file; they still
can.

There are several votes in this thread for pg_dump to gain functionality
to filter objects based on a simple specification -- particularly one
that can be written using shell pipelines.  This patch gives it.

> I'm not completely against inventing something new, but I'd really
> prefer that we at least try to make something existing work first
> before inventing something new that everyone is going to have to deal
> with.

That was discussed upthread and led nowhere.

-- 
Álvaro Herrera   39°49'30"S 73°17'W  —  https://www.EnterpriseDB.com/
"Nunca se desea ardientemente lo que solo se desea por razón" (F. Alexandre)




Re: [PATCH] Pull general SASL framework out of SCRAM

2021-07-13 Thread Jacob Champion
On Tue, 2021-07-13 at 19:31 +0900, Michael Paquier wrote:
> On Tue, Jul 13, 2021 at 12:41:27PM +0300, Mikhail Kulagin wrote:
> > I got an error while building one of the extensions.
> > /home/mkulagin/pg-install/postgresql-master/include/internal/libpq-int.h:44:10:
> >  fatal error: fe-auth-sasl.h: No such file or directory
> >   #include "fe-auth-sasl.h"
> >   ^~~~
> 
> Right.  I overlooked the fact that libpq-int.h is installed.

Thanks for catching that Mikhail.

> > I think the new fe-auth-sasl.h file should be installed too.
> > Correction proposal in the attached file (but I'm not sure that fix
> > of Install.pm is correct). 
> 
> That looks correct to me.  I'll check that tomorrow.

Looks right to me too. I'm currently rebuilding my Windows dev
environment so I haven't been able to double-check that piece of it.

Just to make sure -- do we want to export the fe-auth-sasl.h header as
opposed to forward-declaring the pg_fe_sasl_mech struct? Is the use
case for libpq-int.h just "here, have at the internals, and if you
break it then it's on you"?

--Jacob


Re: proposal: possibility to read dumped table's name from file

2021-07-13 Thread Stephen Frost
Greetings,

* Tomas Vondra (tomas.von...@enterprisedb.com) wrote:
> On 7/13/21 10:55 PM, Stephen Frost wrote:
> >On Tue, Jul 13, 2021 at 16:44 Daniel Gustafsson  >> wrote:
> >
> > > On 13 Jul 2021, at 18:14, Tomas Vondra
> > >> wrote:
> >
> > > FWIW I don't understand why would they need to write parsers.
> >
> >It's quite common to write unit tests for VM recipes/playbooks wheen
> >using
> >tools like Chef etc, parsing and checking the installed/generated
> >files is part
> >of that. This would be one very real use case for writing a parser.
>
> >Consider pgAdmin and the many other tools which essentially embed pg_dump
> >and pg_restore.  There’s no shortage of use cases for a variety of tools
> >to be able to understand, read, parse, generate, rewrite, and probably do
> >more, with such a pg_dump/restore config file.
> 
> Sure. Which is why I'm advocating for the simplest possible format (and not
> expanding the scope of this patch beyond filtering), because that makes this
> kind of processing simpler.

The simplest possible format isn't going to work with all the different
pg_dump options and it still isn't going to be 'simple' since it needs
to work with the flexibility that we have in what we support for object
names, and is still going to require people write a new parser and
generator for it instead of using something existing.

I don't know that the options that I suggested previously would
definitely work or not but they at least would allow other projects like
pgAdmin to leverage existing code for parsing and generating these
config files.  I'm not completely against inventing something new, but
I'd really prefer that we at least try to make something existing work
first before inventing something new that everyone is going to have to
deal with.  If we do invent a new thing for $reasons, then we should
really look at what exists today and try to design it properly instead
of just throwing something together and formally document it because
it's absolutely going to become a standard of sorts that people are
going to almost immediately write their own parsers/generators in
various languages for.

Thanks,

Stephen


signature.asc
Description: PGP signature


Re: psql - factor out echo code

2021-07-13 Thread Tom Lane
Fabien COELHO  writes:
> Attached v4 simplifies the format and fixes this one.

I think this goes way way overboard in terms of invasiveness.
There's no need to identify individual call sites of PSQLexec.
We didn't have anything like that level of detail before, and
there has been no field demand for it either.  What I had
in mind was basically to identify the call sites of echoQuery,
ie distinguish user commands from psql-generated commands
with labels like "QUERY:" vs "INTERNAL QUERY:".  We don't
need to change the APIs of existing functions, I don't think.

It also looks like a mess from the translatibility standpoint.
You can't expect "%s QUERY" to be a useful thing for translators.

regards, tom lane




Re: closing heap relation

2021-07-13 Thread Zhihong Yu
Hi,

On Tue, Jul 13, 2021 at 3:13 PM Zhihong Yu  wrote:

> Hi,
> I was looking at index_drop() in PG 11 branch.
> In if (concurrent)block, the heap and index relations are overwritten
> since they were opened a few lines above the concurrent check.
>
> Shouldn't the two relations be closed first ?
>
> thanks
>
> diff --git a/src/backend/catalog/index.c b/src/backend/catalog/index.c
> index 9d8f873944..625b72ae85 100644
> --- a/src/backend/catalog/index.c
> +++ b/src/backend/catalog/index.c
> @@ -1641,6 +1641,9 @@ index_drop(Oid indexId, bool concurrent)
>  * conflicts with existing predicate locks, so now is the
> time to move
>  * them to the heap relation.
>  */
> +   heap_close(userHeapRelation, NoLock);
> +   index_close(userIndexRelation, NoLock);
> +
> userHeapRelation = heap_open(heapId,
> ShareUpdateExclusiveLock);
> userIndexRelation = index_open(indexId,
> ShareUpdateExclusiveLock);
> TransferPredicateLocksToHeapRelation(userIndexRelation);
>
Please disregard the above.

The relations were closed a bit earlier.


closing heap relation

2021-07-13 Thread Zhihong Yu
Hi,
I was looking at index_drop() in PG 11 branch.
In if (concurrent)block, the heap and index relations are overwritten since
they were opened a few lines above the concurrent check.

Shouldn't the two relations be closed first ?

thanks

diff --git a/src/backend/catalog/index.c b/src/backend/catalog/index.c
index 9d8f873944..625b72ae85 100644
--- a/src/backend/catalog/index.c
+++ b/src/backend/catalog/index.c
@@ -1641,6 +1641,9 @@ index_drop(Oid indexId, bool concurrent)
 * conflicts with existing predicate locks, so now is the
time to move
 * them to the heap relation.
 */
+   heap_close(userHeapRelation, NoLock);
+   index_close(userIndexRelation, NoLock);
+
userHeapRelation = heap_open(heapId,
ShareUpdateExclusiveLock);
userIndexRelation = index_open(indexId,
ShareUpdateExclusiveLock);
TransferPredicateLocksToHeapRelation(userIndexRelation);


Re: proposal: possibility to read dumped table's name from file

2021-07-13 Thread Tomas Vondra

On 7/13/21 10:55 PM, Stephen Frost wrote:

Greetings,

On Tue, Jul 13, 2021 at 16:44 Daniel Gustafsson wrote:

 > On 13 Jul 2021, at 18:14, Tomas Vondra <tomas.von...@enterprisedb.com> wrote:

 > FWIW I don't understand why would they need to write parsers.

It's quite common to write unit tests for VM recipes/playbooks when using
tools like Chef etc, parsing and checking the installed/generated files is part
of that. This would be one very real use case for writing a parser.


Consider pgAdmin and the many other tools which essentially embed 
pg_dump and pg_restore.  There’s no shortage of use cases for a variety 
of tools to be able to understand, read, parse, generate, rewrite, and 
probably do more, with such a pg_dump/restore config file.




Sure. Which is why I'm advocating for the simplest possible format (and 
not expanding the scope of this patch beyond filtering), because that 
makes this kind of processing simpler.



 > I think the case when the filter file needs to be modified is
rather rare - it certainly is not what the original use case Pavel
tried to address needs. (I know that customer and the filter would
be generated and used for a single dump.)

I'm not convinced that basing design decisions on a single customer
reference
who only want to use the code once is helpful. 



Agreed.



I wasn't really basing this on a single customer - that was merely an
example, of course. FWIW, Justin Pryzby already stated that having to use some
more complex format would likely mean they would not use the feature, so
that's another data point to consider.


FWIW I believe it's clear what my opinions on this topic are. Repeating 
that seems a bit pointless, so I'll step aside and let this thread move 
forward in whatever direction.



regards

--
Tomas Vondra
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




RE: Transactions involving multiple postgres foreign servers, take 2

2021-07-13 Thread r.takahash...@fujitsu.com
Hi,


> Wouldn't it be better to explicitly initialize the pointer with NULL?

Thank you for your advice.
You are correct.

Anyway, I fixed it and re-ran the performance test; it of course does not
affect tps.

Regards,
Ryohei Takahashi


Re: row filtering for logical replication

2021-07-13 Thread Alvaro Herrera
On 2021-Jul-13, Euler Taveira wrote:

> +  
> +   The WHERE clause should contain only columns that are
> +   part of the primary key or be covered  by REPLICA
> +   IDENTITY otherwise, DELETE operations will 
> not
> +   be replicated. That's because old row is used and it only contains primary
> +   key or columns that are part of the REPLICA IDENTITY; 
> the
> +   remaining columns are NULL. For 
> INSERT
> +   and UPDATE operations, any column might be used in the
> +   WHERE clause. New row is used and it contains all
> +   columns. A NULL value causes the expression to evaluate
> +   to false; avoid using columns without not-null constraints in the
> +   WHERE clause. The WHERE clause does
> +   not allow functions and user-defined operators.
> +  

There's a couple of points in this paragraph ..

1. if you use REPLICA IDENTITY FULL, then the expressions would work
even if they use any other column with DELETE.  Maybe it would be
reasonable to test for this in the code and raise an error if the
expression requires a column that's not part of the replica identity.
(But that could be relaxed if the publication does not publish
updates/deletes.)

2. For UPDATE, does the expression apply to the old tuple or to the new
tuple?  You say it's the new tuple, but from the user point of view I
think it would make more sense that it would apply to the old tuple.
(Of course, if you're thinking that the R.I. is the PK and the PK is
never changed, then you don't really care which one it is, but I bet
that some people would not like that assumption.)

I think it is sensible that it's the old tuple that is matched, not the
new; consider what happens if you change the PK in the update and the
replica already has that tuple.  If you match on the new tuple and it
doesn't match the expression (so you filter out the update), but the old
tuple does match the expression, then the replica will retain the
mismatching tuple forever.

3. You say that a NULL value in any of those columns causes the
expression to become false and thus the tuple is not published.  This
seems pretty unfriendly, but maybe it would be useful to have examples
of the behavior.  Does ExecInitCheck() handle things in the other way,
and if so does using a similar trick give more useful behavior?


 The WHERE clause may only contain references to columns that are part
 of the table's replica identity.
 If DELETE or UPDATE operations are published, this
 restriction can be bypassed by making the replica identity be the whole
 row with ALTER TABLE .. SET REPLICA IDENTITY FULL.
 The WHERE clause does not allow functions or
 user-defined operators.


-- 
Álvaro Herrera  Valdivia, Chile  —  https://www.EnterpriseDB.com/
"The Gord often wonders why people threaten never to come back after they've
been told never to return" (www.actsofgord.com)




Re: row filtering for logical replication

2021-07-13 Thread Euler Taveira
On Tue, Jul 13, 2021, at 4:07 PM, Tomas Vondra wrote:
> On 7/13/21 5:44 PM, Jeff Davis wrote:
> > On Tue, 2021-07-13 at 10:24 +0530, Amit Kapila wrote:
8<

> >> (c) the columns
> >> referred to in the filter should be part of PK or Replica Identity.
> > 
> > Why?
> > 
> 
> I'm not sure either.
This patch uses the old row for DELETE operations and new row for INSERT and
UPDATE operations. Since we usually don't use REPLICA IDENTITY FULL, all
columns in an old row that are not part of the PK or REPLICA IDENTITY are NULL.
The row filter evaluates NULL to false. Documentation says

  
   The WHERE clause should contain only columns that are
   part of the primary key or be covered  by REPLICA
   IDENTITY otherwise, DELETE operations will not
   be replicated. That's because old row is used and it only contains primary
   key or columns that are part of the REPLICA IDENTITY; the
   remaining columns are NULL. For INSERT
   and UPDATE operations, any column might be used in the
   WHERE clause. New row is used and it contains all
   columns. A NULL value causes the expression to evaluate
   to false; avoid using columns without not-null constraints in the
   WHERE clause. The WHERE clause does
   not allow functions and user-defined operators.
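
As a minimal sketch of the DELETE restriction (hypothetical table, the
proposed FOR TABLE ... WHERE syntax, and the default replica identity of
just the primary key):

CREATE TABLE orders (id int PRIMARY KEY, region text NOT NULL);
CREATE PUBLICATION pub_emea FOR TABLE orders WHERE (region = 'emea');

-- Only the old row is available for DELETE, and with the default
-- replica identity it carries just "id"; "region" is NULL there, the
-- filter evaluates to NULL (treated as false), and the DELETE is not
-- replicated.
DELETE FROM orders WHERE id = 10;

-- Making the whole old row available avoids that:
ALTER TABLE orders REPLICA IDENTITY FULL;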
  


--
Euler Taveira
EDB   https://www.enterprisedb.com/


Re: proposal: possibility to read dumped table's name from file

2021-07-13 Thread Stephen Frost
Greetings,

On Tue, Jul 13, 2021 at 16:44 Daniel Gustafsson  wrote:

> > On 13 Jul 2021, at 18:14, Tomas Vondra 
> wrote:
>
> > FWIW I don't understand why would they need to write parsers.
>
> It's quite common to write unit tests for VM recipes/playbooks when using
> tools like Chef etc, parsing and checking the installed/generated files is
> part
> of that. This would be one very real use case for writing a parser.


Consider pgAdmin and the many other tools which essentially embed pg_dump
and pg_restore.  There’s no shortage of use cases for a variety of tools to
be able to understand, read, parse, generate, rewrite, and probably do
more, with such a pg_dump/restore config file.

> I think the case when the filter file needs to be modified is rather rare
> - it certainly is not what the original use case Pavel tried to address
> needs. (I know that customer and the filter would be generated and used for
> a single dump.)
>
> I'm not convinced that basing design decisions on a single customer
> reference
> who only want to use the code once is helpful.


Agreed.

Thanks,

Stephen



Re: proposal: possibility to read dumped table's name from file

2021-07-13 Thread Daniel Gustafsson
> On 13 Jul 2021, at 00:59, Alvaro Herrera  wrote:
> 
> On 2021-Jul-13, Tomas Vondra wrote:
> 
>> I'm not going to fight against some sort of versioning, but I think keeping
>> the scope as narrow as possible would make it unnecessary. That is, let's
>> stick to the original goal to allow passing filtering rules that would not
>> fit on the command-line, and maybe let's make it a bit more flexible to
>> support other object types etc.
>> 
>> IMHO the filtering rules are simple enough to not really need elaborate
>> versioning, and if a more advanced rule is proposed in the future it can be
>> supported in the existing format (extra field, ...).
> 
> I don't understand why is versioning needed for this file.  Surely we
> can just define some line-based grammar that's accepted by the current
> pg_dump[1] and that would satisfy the current need as well as allowing
> for extending the grammar in the future; even JSON or Windows-INI format
> (ugh?) if that's necessary to tailor the output file in some other way
> not covered by that.

I wasn't expressing myself very well; by "versioning" I mean a way to be able
to add to/change/fix the format and still be able to deterministically parse it
without having to resort to ugly heuristics and hacks.  Whether that's achieved
by an explicit version number or it's an inherent characteristic of the format
doesn't really matter (to me).  My worry is that the very simple proposed
format might not fit that bill, but since I don't know what the future of the
feature might bring it's (mostly) a gut feeling.

--
Daniel Gustafsson   https://vmware.com/





Re: proposal: possibility to read dumped table's name from file

2021-07-13 Thread Daniel Gustafsson
> On 13 Jul 2021, at 18:14, Tomas Vondra  wrote:

> FWIW I don't understand why would they need to write parsers.

It's quite common to write unit tests for VM recipes/playbooks when using
tools like Chef etc, parsing and checking the installed/generated files is part
of that. This would be one very real use case for writing a parser.

> I think the case when the filter file needs to be modified is rather rare - 
> it certainly is not what the original use case Pavel tried to address needs. 
> (I know that customer and the filter would be generated and used for a single 
> dump.)

I'm not convinced that basing design decisions on a single customer reference
who only wants to use the code once is helpful.  I hear what you're saying, but
I think this will see more diverse use cases than what we can foresee here.

--
Daniel Gustafsson   https://vmware.com/





Re: BUG #16583: merge join on tables with different DB collation behind postgres_fdw fails

2021-07-13 Thread Tom Lane
Ibrar Ahmed  writes:
> The patch is failing the regression, @Tom Lane  can you
> please take a look at that.

Seems to just need an update of the expected-file to account for test
cases added recently.  (I take no position on whether the new results
are desirable; some of these might be breaking the intent of the case.
But this should quiet the cfbot anyway.)

regards, tom lane

diff --git a/contrib/postgres_fdw/deparse.c b/contrib/postgres_fdw/deparse.c
index 31919fda8c..d68720423e 100644
--- a/contrib/postgres_fdw/deparse.c
+++ b/contrib/postgres_fdw/deparse.c
@@ -321,7 +321,9 @@ foreign_expr_walker(Node *node,
 
 /*
  * If the Var is from the foreign table, we consider its
- * collation (if any) safe to use.  If it is from another
+ * collation (if any) safe to use, *unless* it's
+ * DEFAULT_COLLATION_OID.  We treat that as meaning "we don't
+ * know which collation this is".  If it is from another
  * table, we treat its collation the same way as we would a
  * Param's collation, ie it's not safe for it to have a
  * non-default collation.
@@ -343,7 +345,12 @@ foreign_expr_walker(Node *node,
 
 	/* Else check the collation */
 	collation = var->varcollid;
-	state = OidIsValid(collation) ? FDW_COLLATE_SAFE : FDW_COLLATE_NONE;
+	if (collation == InvalidOid)
+		state = FDW_COLLATE_NONE;
+	else if (collation == DEFAULT_COLLATION_OID)
+		state = FDW_COLLATE_UNSAFE;
+	else
+		state = FDW_COLLATE_SAFE;
 }
 else
 {
@@ -814,8 +821,24 @@ foreign_expr_walker(Node *node,
 
 	/*
 	 * Now, merge my collation information into my parent's state.
+	 *
+	 * If one branch of an expression derives a non-default collation safely
+	 * (that is, from a foreign Var) and another one derives the same
+	 * collation unsafely, we can consider the expression safe overall.  This
+	 * allows cases such as "foreign_var = ('foo' COLLATE x)" where x is the
+	 * same collation the foreign_var has anyway.  Note that we will not ship
+	 * any explicit COLLATE clause to the remote, but rely on it to re-derive
+	 * the correct collation based on the foreign_var.
 	 */
-	if (state > outer_cxt->state)
+	if (collation == outer_cxt->collation &&
+		((state == FDW_COLLATE_UNSAFE &&
+		  outer_cxt->state == FDW_COLLATE_SAFE) ||
+		 (state == FDW_COLLATE_SAFE &&
+		  outer_cxt->state == FDW_COLLATE_UNSAFE)))
+	{
+		outer_cxt->state = FDW_COLLATE_SAFE;
+	}
+	else if (state > outer_cxt->state)
 	{
 		/* Override previous parent state */
 		outer_cxt->collation = collation;
diff --git a/contrib/postgres_fdw/expected/postgres_fdw.out b/contrib/postgres_fdw/expected/postgres_fdw.out
index e5bbf3b0af..4408543f55 100644
--- a/contrib/postgres_fdw/expected/postgres_fdw.out
+++ b/contrib/postgres_fdw/expected/postgres_fdw.out
@@ -32,29 +32,29 @@ CREATE SCHEMA "S 1";
 CREATE TABLE "S 1"."T 1" (
 	"C 1" int NOT NULL,
 	c2 int NOT NULL,
-	c3 text,
+	c3 text collate "C",
 	c4 timestamptz,
 	c5 timestamp,
-	c6 varchar(10),
-	c7 char(10),
+	c6 varchar(10) collate "C",
+	c7 char(10) collate "C",
 	c8 user_enum,
 	CONSTRAINT t1_pkey PRIMARY KEY ("C 1")
 );
 CREATE TABLE "S 1"."T 2" (
 	c1 int NOT NULL,
-	c2 text,
+	c2 text collate "C",
 	CONSTRAINT t2_pkey PRIMARY KEY (c1)
 );
 CREATE TABLE "S 1"."T 3" (
 	c1 int NOT NULL,
 	c2 int NOT NULL,
-	c3 text,
+	c3 text collate "C",
 	CONSTRAINT t3_pkey PRIMARY KEY (c1)
 );
 CREATE TABLE "S 1"."T 4" (
 	c1 int NOT NULL,
 	c2 int NOT NULL,
-	c3 text,
+	c3 text collate "C",
 	CONSTRAINT t4_pkey PRIMARY KEY (c1)
 );
 -- Disable autovacuum for these tables to avoid unexpected effects of that
@@ -94,16 +94,18 @@ ANALYZE "S 1"."T 3";
 ANALYZE "S 1"."T 4";
 -- ===
 -- create foreign tables
+-- Note: to ensure stable regression results, all collatable columns
+-- in these tables must have explicitly-specified collations.
 -- ===
 CREATE FOREIGN TABLE ft1 (
 	c0 int,
 	c1 int NOT NULL,
 	c2 int NOT NULL,
-	c3 text,
+	c3 text collate "C",
 	c4 timestamptz,
 	c5 timestamp,
-	c6 varchar(10),
-	c7 char(10) default 'ft1',
+	c6 varchar(10) collate "C",
+	c7 char(10) default 'ft1' collate "C",
 	c8 user_enum
 ) SERVER loopback;
 ALTER FOREIGN TABLE ft1 DROP COLUMN c0;
@@ -111,28 +113,28 @@ CREATE FOREIGN TABLE ft2 (
 	c1 int NOT NULL,
 	c2 int NOT NULL,
 	cx int,
-	c3 text,
+	c3 text collate "C",
 	c4 timestamptz,
 	c5 timestamp,
-	c6 varchar(10),
-	c7 char(10) default 'ft2',
+	c6 varchar(10) collate "C",
+	c7 char(10) default 'ft2' collate "C",
 	c8 user_enum
 ) SERVER loopback;
 ALTER FOREIGN TABLE ft2 DROP COLUMN cx;
 CREATE FOREIGN TABLE ft4 (
 	c1 int NOT NULL,
 	c2 int NOT NULL,
-	c3 text
+	c3 text collate "C"
 ) SERVER loopback OPTIONS (schema_name 'S 1', table_name 'T 3');
 CREATE FOREIGN TABLE ft5 (
 	c1 int NOT NULL,
 	c2 int NOT NULL,
-	c3 text
+	c3 text collate "C"
 ) SERVER loopback OPTIONS 

Re: row filtering for logical replication

2021-07-13 Thread Euler Taveira
On Sun, Jul 11, 2021, at 8:09 PM, Tomas Vondra wrote:
> I took a look at this patch, which seems to be in CF since 2018. I have 
> only some basic comments and observations at this point:
Tomas, thanks for reviewing this patch again.

> 1) alter_publication.sgml
> 
> I think "expression is executed" sounds a bit strange, perhaps 
> "evaluated" would be better?
Fixed.

> 2) create_publication.sgml
> 
> Why is the patch changing publish_via_partition_root docs? That seems 
> like a rather unrelated bit.
Removed. I will submit a separate patch for this.

> The WHERE clause should probably contain only
> columns that are part of the primary key or be covered by
> REPLICA ...
> 
> I'm not sure what exactly is this trying to say. What does "should 
> probably ..." mean in practice for the users? Does that mean something 
> bad will happen for other columns, or what? I'm sure this wording will 
> be quite confusing for users.
Reading again it seems "probably" is confusing. Let's remove it.

> It may also be unclear whether the condition is evaluated on the old or 
> new row, so perhaps add an example illustrating that & more detailed 
> comment, or something. E.g. what will happen with
> 
> UPDATE departments SET active = false WHERE active;
Yeah. I avoided mentioning this internal detail about the old/new row, but it
seems better to be clear. How about the following paragraph?

  
   The WHERE clause should contain only columns that are
   part of the primary key or be covered  by REPLICA
   IDENTITY otherwise, DELETE operations will not
   be replicated. That's because old row is used and it only contains primary
   key or columns that are part of the REPLICA IDENTITY; the
   remaining columns are NULL. For INSERT
   and UPDATE operations, any column might be used in the
   WHERE clause. New row is used and it contains all
   columns. A NULL value causes the expression to evaluate
   to false; avoid using columns without not-null constraints in the
   WHERE clause. The WHERE clause does
   not allow functions and user-defined operators.
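
A minimal sketch of the NULL behavior mentioned above, with a hypothetical
table and the proposed syntax:

CREATE TABLE departments (id int PRIMARY KEY, active bool);
CREATE PUBLICATION pub_active FOR TABLE departments WHERE (active);

-- "active" is NULL here, so the filter evaluates to NULL (treated as
-- false) and the row is not published.  Writing the filter as
-- WHERE (active IS TRUE), or declaring the column NOT NULL, makes the
-- intent explicit.
INSERT INTO departments VALUES (1, NULL);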
  

> 3) publication_add_relation
> 
> Does this need to build the parse state even for whereClause == NULL?
No. Fixed.

> 4) AlterPublicationTables
> 
> I wonder if this new reworked code might have issues with subscriptions 
> containing many tables, but I haven't tried.
This piece of code is already complicated. Amit complained about it too [1].
Are you envisioning any specific issue (other than opening thousands of
relations, doing some stuff, and closing them all)? IMO opening and closing the
relations should be postponed for as long as possible.

> 5) OpenTableList
> 
> I really dislike that the list can have two different node types 
> (Relation and PublicationTable). In principle we don't actually need the 
> extra flag, we can simply check the node type directly by IsA() and act 
> based on that. However, I think it'd be better to just use a single node 
> type from all places.
Amit complained about having a runtime test for ALTER PUBLICATION ... DROP
TABLE in case the user provides a WHERE clause [2]. I did it that way (runtime
test) because it simplified the code. I would tend to avoid moving a grammar
task into runtime; that's why I agreed to change it. I didn't like the multi-node
argument handling for OpenTableList() (mainly because of the extra argument in
the function signature) but with your suggestion (IsA()) maybe it is
acceptable. What do you think? I included IsA() in v19.

> I don't see why not to set whereClause every time, I don't think the 
> extra if saves anything, it's just a bit more complex.
See runtime test in [2].

> 
> 5) CloseTableList
> 
> The comment about node types seems pointless, this function has no flag 
> and the element type does not matter.
Fixed.

> 6) parse_agg.c
> 
> ... are not allowed in publication WHERE expressions
> 
> I think all similar cases use "WHERE conditions" instead.
No. Policy, index, statistics, partition, column generation use expressions.
COPY and trigger use conditions. It is also referred to as an expression in the
synopsis.

> 7) transformExprRecurse
> 
> The check at the beginning seems rather awkward / misplaced - it's way 
> too specific for this location (there are no other p_expr_kind 
> references in this function). Wouldn't transformFuncCall (or maybe 
> ParseFuncOrColumn) be a more appropriate place?
Probably. I have to try the multiple possibilities to make sure it forbids all
cases.

> Initially I was wondering why not to allow function calls in WHERE 
> conditions, but I see that was discussed in the past as problematic. But 
> that reminds me that I don't see any docs describing what expressions 
> are allowed in WHERE conditions - maybe we should explicitly list what 
> expressions are allowed?
I started to investigate how to safely allow built-in functions. There is a
long discussion about using functions in a logical decoding context. As I said
during the last CF for v14, I prefer this to be a separate feature. I 

Re: row filtering for logical replication

2021-07-13 Thread Euler Taveira
On Tue, Jul 13, 2021, at 12:25 AM, Peter Smith wrote:
> I have reviewed the latest v18 patch. Below are some more review
> comments and patches.
Peter, thanks for quickly checking the new patch. I'm attaching a new patch (v19)
that addresses (a) this new review, (b) Tomas' review and (c) Greg's review. I
also included the copy/equal node support for the new node (PublicationTable)
mentioned by Tomas in another email.

> 1. Commit comment - wording
8<

> =>
> 
> I think this means to say: "Rows that don't satisfy an optional WHERE
> clause will be filtered out."
Agreed.

> 2. Commit comment - wording
> 
> "The row filter is per table, which allows different row filters to be
> defined for different tables."
> 
> =>
> 
> I think all that is the same as just saying: "The row filter is per table."
Agreed.

> 3. PG docs - independent improvement
> 
> You wrote (ref [1] point 3):
> 
> "I agree it can be confusing. BTW, CREATE PUBLICATION does not mention that 
> the
> root partitioned table is used. We should improve that sentence too."
> 
> I agree, but that PG docs improvement is independent of your RowFilter
> patch; please make another thread for that idea.
I will. And I will also include the next item that I removed from the patch.

> 4. doc/src/sgml/ref/create_publication.sgml - independent improvement
> 
> @@ -131,9 +135,9 @@ CREATE PUBLICATION  class="parameter">name
>on its partitions) contained in the publication will be published
>using the identity and schema of the partitioned table rather than
>that of the individual partitions that are actually changed; the
> -  latter is the default.  Enabling this allows the changes to be
> -  replicated into a non-partitioned table or a partitioned table
> -  consisting of a different set of partitions.
> +  latter is the default (false).  Enabling this
> +  allows the changes to be replicated into a non-partitioned table 
> or a
> +  partitioned table consisting of a different set of partitions.
>   
> 
> I think that Tomas wrote (ref [2] point 2) that this change seems
> unrelated to your RowFilter patch.
> 
> I agree; I liked the change, but IMO you need to propose this one in
> another thread too.
Reverted.

> 5. doc/src/sgml/ref/create_subscription.sgml - wording
8<

> I felt that the sentence: "If any table in the publications has a
> WHERE clause, data synchronization does not use it
> if the subscriber is a PostgreSQL version
> before 15."
> 
> Could be expressed more simply like: "If the subscriber is a
> PostgreSQL version before 15 then any row
> filtering is ignored."
Agreed.

> 6. src/backend/commands/publicationcmds.c - wrong function comment
8<

> /*
>   * Close all relations in the list.
> + *
> + * Publication node can have a different list element, hence, pub_drop_table
> + * indicates if it has a Relation (true) or PublicationTable (false).
>   */
> static void
> CloseTableList(List *rels)
> 
> =>
> 
> The 2nd parameter does not exist in v18, so that comment about
> pub_drop_table seems to be a cut/paste error from the OpenTableList.
Oops. Removed.

> src/backend/replication/logical/tablesync.c - bug ?
> 
> @@ -829,16 +883,23 @@ copy_table(Relation rel)
>   relmapentry = logicalrep_rel_open(lrel.remoteid, NoLock);
>   Assert(rel == relmapentry->localrel);
> 
> + /* List of columns for COPY */
> + attnamelist = make_copy_attnamelist(relmapentry);
> +
>   /* Start copy on the publisher. */
> =>
> 
> I did not understand the above call to make_copy_attnamelist. The
> result seems unused before it is overwritten later in this same
> function (??)
Good catch. This seems to be a leftover from an ancient version.

> 7. src/backend/replication/logical/tablesync.c  -
> fetch_remote_table_info enhancement
> 
> + /* Get relation qual */
> + if (walrcv_server_version(LogRepWorkerWalRcvConn) >= 15)
> + {
> + resetStringInfo();
> + appendStringInfo(,
> + "SELECT pg_get_expr(prqual, prrelid) "
> + "  FROM pg_publication p "
> + "  INNER JOIN pg_publication_rel pr "
> + "   ON (p.oid = pr.prpubid) "
> + " WHERE pr.prrelid = %u "
> + "   AND p.pubname IN (", lrel->remoteid);
> 
> =>
> 
> I think a small improvement is possible in this SQL.
> 
> If we change that to "SELECT DISTINCT pg_get_expr(prqual, prrelid)"...
> then it avoids the copy SQL from having multiple WHERE clauses which
> are all identical. This could happen when subscribed to multiple
> publications which had the same filter for the same table.
Good catch!
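
For reference, the adjusted query would look roughly like this (publication
names and the relation OID are placeholders; prqual is the
pg_publication_rel column added by this patch):

SELECT DISTINCT pg_get_expr(prqual, prrelid)
  FROM pg_publication p
  INNER JOIN pg_publication_rel pr ON (p.oid = pr.prpubid)
 WHERE pr.prrelid = 16384
   AND p.pubname IN ('pub_a', 'pub_b');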

> 8. src/backend/replication/pgoutput/pgoutput.c - qual member is redundant
> 
> @@ -99,6 +108,9 @@ typedef struct RelationSyncEntry
> 
>   bool replicate_valid;
>   PublicationActions pubactions;
> + List*qual; /* row filter */
> + List*exprstate; /* ExprState for row filter */
> + TupleTableSlot *scantuple; /* tuple table slot for row filter */
> 
> =>
> 
> Now that the exprstate is introduced I think that the other member
> "qual" is 

Re: Reducing memory consumption for pending inval messages

2021-07-13 Thread Tom Lane
I wrote:
> It turns out that the existing implementation in inval.c is quite
> inefficient when a lot of individual commands each register just
> a few invalidations --- but a few invalidations per command is
> pretty typical.

Per the cfbot, here's a rebase over 3788c6678 (actually just
undoing its effects on inval.c, since that code is removed here).

regards, tom lane

diff --git a/src/backend/utils/cache/inval.c b/src/backend/utils/cache/inval.c
index 9c79775725..9352c68090 100644
--- a/src/backend/utils/cache/inval.c
+++ b/src/backend/utils/cache/inval.c
@@ -71,11 +71,6 @@
  *	manipulating the init file is in relcache.c, but we keep track of the
  *	need for it here.
  *
- *	The request lists proper are kept in CurTransactionContext of their
- *	creating (sub)transaction, since they can be forgotten on abort of that
- *	transaction but must be kept till top-level commit otherwise.  For
- *	simplicity we keep the controlling list-of-lists in TopTransactionContext.
- *
  *	Currently, inval messages are sent without regard for the possibility
  *	that the object described by the catalog tuple might be a session-local
  *	object such as a temporary table.  This is because (1) this code has
@@ -106,7 +101,6 @@
 #include "catalog/catalog.h"
 #include "catalog/pg_constraint.h"
 #include "miscadmin.h"
-#include "port/pg_bitutils.h"
 #include "storage/sinval.h"
 #include "storage/smgr.h"
 #include "utils/catcache.h"
@@ -121,35 +115,86 @@
 
 
 /*
- * To minimize palloc traffic, we keep pending requests in successively-
- * larger chunks (a slightly more sophisticated version of an expansible
- * array).  All request types can be stored as SharedInvalidationMessage
- * records.  The ordering of requests within a list is never significant.
+ * Pending requests are stored as ready-to-send SharedInvalidationMessages.
+ * We keep the messages themselves in arrays in TopTransactionContext
+ * (there are separate arrays for catcache and relcache messages).  Control
+ * information is kept in a chain of TransInvalidationInfo structs, also
+ * allocated in TopTransactionContext.  (We could keep a subtransaction's
+ * TransInvalidationInfo in its CurTransactionContext; but that's more
+ * wasteful not less so, since in very many scenarios it'd be the only
+ * allocation in the subtransaction's CurTransactionContext.)
+ *
+ * We can store the message arrays densely, and yet avoid moving data around
+ * within an array, because within any one subtransaction we need only
+ * distinguish between messages emitted by prior commands and those emitted
+ * by the current command.  Once a command completes and we've done local
+ * processing on its messages, we can fold those into the prior-commands
+ * messages just by changing array indexes in the TransInvalidationInfo
+ * struct.  Similarly, we need distinguish messages of prior subtransactions
+ * from those of the current subtransaction only until the subtransaction
+ * completes, after which we adjust the array indexes in the parent's
+ * TransInvalidationInfo to include the subtransaction's messages.
+ *
+ * The ordering of the individual messages within a command's or
+ * subtransaction's output is not considered significant, although this
+ * implementation happens to preserve the order in which they were queued.
+ * (Previous versions of this code did not preserve it.)
+ *
+ * For notational convenience, control information is kept in two-element
+ * arrays, the first for catcache messages and the second for relcache
+ * messages.
  */
-typedef struct InvalidationChunk
+#define CatCacheMsgs 0
+#define RelCacheMsgs 1
+
+/* Pointers to main arrays in TopTransactionContext */
+typedef struct InvalMessageArray
 {
-	struct InvalidationChunk *next; /* list link */
-	int			nitems;			/* # items currently stored in chunk */
-	int			maxitems;		/* size of allocated array in this chunk */
-	SharedInvalidationMessage msgs[FLEXIBLE_ARRAY_MEMBER];
-} InvalidationChunk;
+	SharedInvalidationMessage *msgs;	/* palloc'd array (can be expanded) */
+	int			maxmsgs;		/* current allocated size of array */
+} InvalMessageArray;
 
-typedef struct InvalidationListHeader
+static InvalMessageArray InvalMessageArrays[2];
+
+/* Control information for one logical group of messages */
+typedef struct InvalidationMsgsGroup
 {
-	InvalidationChunk *cclist;	/* list of chunks holding catcache msgs */
-	InvalidationChunk *rclist;	/* list of chunks holding relcache msgs */
-} InvalidationListHeader;
+	int			firstmsg[2];	/* first index in relevant array */
+	int			nextmsg[2];		/* last+1 index */
+} InvalidationMsgsGroup;
+
+/* Macros to help preserve InvalidationMsgsGroup abstraction */
+#define SetSubGroupToFollow(targetgroup, priorgroup, subgroup) \
+	do { \
+		(targetgroup)->firstmsg[subgroup] = \
+			(targetgroup)->nextmsg[subgroup] = \
+			(priorgroup)->nextmsg[subgroup]; \
+	} while (0)
+
+#define SetGroupToFollow(targetgroup, priorgroup) \
+	do { \
+		

Re: row filtering for logical replication

2021-07-13 Thread Alvaro Herrera
On 2021-Jul-13, Tomas Vondra wrote:

> On 7/13/21 5:44 PM, Jeff Davis wrote:

> > * Andres also mentioned that the function should not leak memory.
> > * One use case for this feature is when sharding a table, so the
> > expression should allow things like "hashint8(x) between ...". I'd
> > really like to see this problem solved, as well.
> 
> I think built-in functions should be fine, because they generally don't get
> dropped etc. (And if you drop a built-in function, well - sorry.)
> 
> Not sure about the memory leaks - I suppose we'd free memory for each row,
> so this shouldn't be an issue I guess ...

I'm not sure we need to be terribly strict about expression evaluation
not leaking any memory here.   I'd rather have a memory context that can
be reset per row.

-- 
Álvaro Herrera PostgreSQL Developer  —  https://www.EnterpriseDB.com/




Re: row filtering for logical replication

2021-07-13 Thread Tomas Vondra

On 7/13/21 12:57 PM, Amit Kapila wrote:

On Tue, Jul 13, 2021 at 10:24 AM Amit Kapila  wrote:


On Mon, Jul 12, 2021 at 3:01 PM Tomas Vondra
 wrote:


In terms of implementation, I think there are two basic options - either
we can define a new "expression" type in gram.y, which would be a subset
of a_expr etc. Or we can do it as some sort of expression walker, kinda
like what the transform* functions do now.



I think it is better to use some form of walker here rather than
extending the grammar for this. However, the question is do we need
some special kind of expression walker here or can we handle all
required cases via transformWhereClause() call as the patch is trying
to do. AFAIU, the main things we want to prohibit in the filter are:
(a) it doesn't refer to any relation other than catalog in where
clause, (b) it doesn't use UDFs in any way (in expressions, in
user-defined operators, user-defined types, etc.), (c) the columns
referred to in the filter should be part of PK or Replica Identity.
Now, if all such things can be detected by the approach patch has
taken then why do we need a special kind of expression walker? OTOH,
if we can't detect some of this then probably we can use a special
walker.

I think in the long run one idea to allow UDFs is probably by
explicitly allowing users to specify whether the function is
publication predicate safe and if so, then we can allow such functions
in the filter clause.



Another idea here could be to read the publication-related catalog
with the latest snapshot instead of a historic snapshot. If we do that
then if the user faces problems as described by Petr [1] due to
missing dependencies via UDFs then she can Alter the Publication to
remove/change the filter clause and after that, we would be able to
recognize the updated filter clause and the system will be able to
move forward.

I might be missing something but reading publication catalogs with
non-historic snapshots shouldn't create problems, as historic
snapshots are only required to decode WAL.



IMHO the best option for v1 is to just restrict the filters to 
known-safe expressions. That is, just built-in operators, no UDFs etc. 
Yes, it's not great, but both alternative proposals (allowing UDFs or 
using current snapshot) are problematic for various reasons.


Even with those restrictions the row filtering seems quite useful, and 
we can relax those restrictions later if we find acceptable compromise 
and/or decide it's worth the risk. Seems better than having to introduce 
new restrictions later.
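
As a sketch of what that restriction would mean in practice (hypothetical
table and function names, and the FOR TABLE ... WHERE syntax proposed by
the patch):

-- fine under that restriction: built-in operators only
CREATE PUBLICATION pub_big FOR TABLE orders WHERE (amount > 100);

-- would be rejected: my_custom_check() is a user-defined function whose
-- dependencies the decoder cannot see, so dropping one of them could
-- break replication
CREATE PUBLICATION pub_custom FOR TABLE orders WHERE (my_custom_check(note));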



I think the problem described by Petr[1] is also possible today if the
user drops the publication and there is a corresponding subscription;
basically, the system will be stuck with the error: "ERROR:  publication
"mypub" does not exist". I think allowing the use of non-historic snapshots
just for publications will resolve that problem as well.

[1] - 
https://www.postgresql.org/message-id/92e5587d-28b8-5849-2374-5ca3863256f1%402ndquadrant.com



That seems like a completely different problem, TBH. For example the 
slot is dropped too, which means the WAL is likely gone etc.



regards

--
Tomas Vondra
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: row filtering for logical replication

2021-07-13 Thread Tomas Vondra

On 7/13/21 5:44 PM, Jeff Davis wrote:

On Tue, 2021-07-13 at 10:24 +0530, Amit Kapila wrote:

to do. AFAIU, the main things we want to prohibit in the filter are:
(a) it doesn't refer to any relation other than catalog in where
clause,


Right, because the walsender is using a historical snapshot.


(b) it doesn't use UDFs in any way (in expressions, in
user-defined operators, user-defined types, etc.),


Is this a reasonable requirement? Postgres has a long history of
allowing UDFs nearly everywhere that a built-in is allowed. It feels
wrong to make built-ins special for this feature.



Well, we can either prohibit UDFs or introduce a massive foot-gun.

The problem with functions in general (let's ignore SQL functions) is 
that they're black boxes, so we don't know what's inside. And if the 
function gets broken after an object gets dropped, the replication is 
broken and the only way to fix it is to recover the subscription.


And this is not hypothetical issue, we've seen this repeatedly :-(

So as much as I'd like to see support for UDFs here, I think it's better 
to disallow them - at least for now. And maybe relax that restriction 
later, if possible.



(c) the columns
referred to in the filter should be part of PK or Replica Identity.


Why?



I'm not sure either.



Also:

* Andres also mentioned that the function should not leak memory.
* One use case for this feature is when sharding a table, so the
expression should allow things like "hashint8(x) between ...". I'd
really like to see this problem solved, as well.



I think built-in functions should be fine, because they generally don't get 
dropped etc. (And if you drop a built-in function, well - sorry.)


Not sure about the memory leaks - I suppose we'd free memory for each 
row, so this shouldn't be an issue I guess ...



I think in the long run one idea to allow UDFs is probably by
explicitly allowing users to specify whether the function is
publication predicate safe and if so, then we can allow such
functions
in the filter clause.


This sounds like a better direction. We probably need some kind of
catalog information here to say what functions/operators are "safe" for
this kind of purpose. There are a couple questions:



Not sure. It's true it's a bit like volatile/stable/immutable categories 
where we can't guarantee those labels are correct, and it's up to the 
user to keep the pieces if they pick the wrong category.


But we can achieve the same goal by introducing a simple GUC called 
dangerous_allow_udf_in_decoding, I think.




regards

--
Tomas Vondra
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: printf %s with NULL pointer (was Re: BUG #17098: Assert failed on composing an error message when adding a type to an extension being dropped)

2021-07-13 Thread Ranier Vilela
On Tue, Jul 13, 2021 at 3:26 PM Tom Lane wrote:

> Ranier Vilela  writes:
> > On Tue, Jul 13, 2021 at 11:29 AM Tom Lane wrote:
> >> I think you're missing my main point, which is that it seems certain
> that
> >> there are corner cases that do this *now*.  I'm proposing that we
> redefine
> >> this as not being a crash case, full stop.
>
> > I agree with Laurenz Albe, that on Debug builds, *printf with NULL, must
> > crash.
>
> Did you see my followup?

I am trying.


>   The vast majority of live systems do not do
> that, so we are accomplishing nothing of value by insisting it's a
> crash-worthy bug.
>
I agreed.


> I flat out don't agree that "crash on debug builds but it's okay on
> production" is a useful way to define this.  I spend way too much
> time already on bug reports that only manifest with asserts enabled.
>
I understand.
Bug reports will decrease because of that; people will lose interest and
motivation to report,
because with "(null)" it doesn't seem like a serious error and my server
didn't crash.

It's a tricky tradeoff.

regards,
Ranier Vilela


Re: printf %s with NULL pointer (was Re: BUG #17098: Assert failed on composing an error message when adding a type to an extension being dropped)

2021-07-13 Thread Tom Lane
Ranier Vilela  writes:
> Em ter., 13 de jul. de 2021 às 11:29, Tom Lane  escreveu:
>> I think you're missing my main point, which is that it seems certain that
>> there are corner cases that do this *now*.  I'm proposing that we redefine
>> this as not being a crash case, full stop.

> I agree with Laurenz Albe, that on Debug builds, *printf with NULL, must
> crash.

Did you see my followup?  The vast majority of live systems do not do
that, so we are accomplishing nothing of value by insisting it's a
crash-worthy bug.

I flat out don't agree that "crash on debug builds but it's okay on
production" is a useful way to define this.  I spend way too much
time already on bug reports that only manifest with asserts enabled.

regards, tom lane




Re: proposal - psql - use pager for \watch command

2021-07-13 Thread Pavel Stehule
On Tue, Jul 13, 2021 at 7:50 PM Tom Lane wrote:

> Thomas Munro  writes:
> > Pushed, after retesting on macOS (with the fixed pspg that has by now
> > arrived in MacPorts), FreeBSD and Linux.  Thanks!
>
> After playing with this along the way to fixing the sigwait issues,
> I have a gripe/suggestion.  If I hit control-C while the thing
> is waiting between queries, eg
>
> regression=# select now() \watch
> Tue Jul 13 13:44:44 2021 (every 2s)
>
>   now
> ---
>  2021-07-13 13:44:44.396565-04
> (1 row)
>
> Tue Jul 13 13:44:46 2021 (every 2s)
>
>   now
> ---
>  2021-07-13 13:44:46.396572-04
> (1 row)
>
> ^Cregression=#
>
> then as you can see I get nothing but the "^C" echo before the next
> psql prompt.  The problem with this is that now libreadline is
> misinformed about the cursor position, messing up any editing I
> might try to do on the next line of input.  So I think it would
> be a good idea to have some explicit final output when the \watch
> command terminates, along the line of
>
> ...
> Tue Jul 13 13:44:46 2021 (every 2s)
>
>   now
> ---
>  2021-07-13 13:44:46.396572-04
> (1 row)
>
> ^C\watch cancelled
> regression=#
>
> This strikes me as a usability improvement even without the
> readline-confusion angle.
>
>
I'll look at this issue.

Pavel




> regards, tom lane
>


Re: printf %s with NULL pointer (was Re: BUG #17098: Assert failed on composing an error message when adding a type to an extension being dropped)

2021-07-13 Thread Ranier Vilela
On Tue, Jul 13, 2021 at 11:29 AM Tom Lane wrote:

> Laurenz Albe  writes:
> > On Mon, 2021-07-12 at 13:20 -0400, Tom Lane wrote:
> >> So my feeling about this is that switching snprintf.c's behavior
> >> would produce some net gain in robustness for v12 and up, while
> >> not making things any worse for the older branches.  I still hold
> >> to the opinion that we've already flushed out all the cases of
> >> passing NULL that we're likely to find via ordinary testing.
>
> > New cases could be introduced in the future and might remain undetected.
> > What about adding an Assert that gags on NULLs, but still printing them
> > as "(null)"?  That would help find such problems in a debug build.
>
> I think you're missing my main point, which is that it seems certain that
> there are corner cases that do this *now*.  I'm proposing that we redefine
> this as not being a crash case, full stop.
>
I agree with Laurenz Albe that on debug builds, *printf with NULL must
crash.
On production builds, printing "(null)" is fine.
This will put a little more pressure on support ("Hey, what does (null)
in my logs mean?"),
but it's ok.

regards,
Ranier Vilela


Re: More time spending with "delete pending"

2021-07-13 Thread Alexander Lakhin
Hello Michael,
12.07.2021 08:52, Michael Paquier wrote:
> On Mon, Jul 12, 2021 at 02:09:41PM +0900, Michael Paquier wrote:
>> And fairywren, that uses MinGW, is unhappy:
>> https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=fairywren=2021-07-12%2004%3A47%3A13
>> Looking at it now.
> I am not sure what to do here to cool down MinGW, so the patch has
> been reverted for now:
> - Plain stat() is not enough to do a proper detection of the files
> pending for deletion on MSVC, so reverting only the part with
> microsoft_native_stat() is not going to help.
> - We are going to need an evaluation of the stat() implementation in
> MinGW and check if is it possible to rely on it more.  If yes, we
> could get away with more tweaks based on __MINGW32__/__MINGW64__.
I've managed to adapt open/win32stat for MinGW by preventing the stat
override at the implementation level. Thus, the code that calls stat()
will use the overridden struct/function(s), but inside open/win32stat
we can access the original stat and reference __pgstat64 where we want
to use ours.
Tested on MSVC, MinGW64 and MinGW32 — the fixed code compiles everywhere.

But when using perl from msys (not ActivePerl) while testing I've got
the same test failure due to another error condition:



In this case CreateFile() fails with ERROR_SHARING_VIOLATION (not
ERROR_ACCESS_DENIED) and it leads to the same "Permission denied" error.
This condition is handled in pgwin32_open() but not in _pgstat64() yet.

I think I should develop another test for the MSVC environment that will
demonstrate such a failure too.
But as it's not related to the DELETE_PENDING state, I wonder whether
this should be fixed in a separate patch.

Best regards,
Alexander
diff --git a/src/include/port/win32_port.h b/src/include/port/win32_port.h
index 05c5a53442..d72950880a 100644
--- a/src/include/port/win32_port.h
+++ b/src/include/port/win32_port.h
@@ -53,11 +53,13 @@
 #undef near
 
 /* needed before sys/stat hacking below: */
+#if !defined(NO_STAT_OVERRIDE)
 #define fstat microsoft_native_fstat
 #define stat microsoft_native_stat
 #include <sys/stat.h>
 #undef fstat
 #undef stat
+#endif
 
 /* Must be here to avoid conflicting with prototype in windows.h */
 #define mkdir(a,b)	mkdir(a)
@@ -253,7 +255,7 @@ typedef int pid_t;
  * The struct stat is 32 bit in MSVC, so we redefine it as a copy of
  * struct __stat64.  This also fixes the struct size for MINGW builds.
  */
-struct stat		/* This should match struct __stat64 */
+struct _pgstat64	/* This should match struct __stat64 */
 {
 	_dev_t		st_dev;
 	_ino_t		st_ino;
@@ -268,12 +270,14 @@ struct stat		/* This should match struct __stat64 */
 	__time64_t	st_ctime;
 };
 
-extern int	_pgfstat64(int fileno, struct stat *buf);
-extern int	_pgstat64(const char *name, struct stat *buf);
+extern int	_pgfstat64(int fileno, struct _pgstat64 *buf);
+extern int	_pgstat64(const char *name, struct _pgstat64 *buf);
 
+#if !defined(NO_STAT_OVERRIDE)
 #define fstat(fileno, sb)	_pgfstat64(fileno, sb)
-#define stat(path, sb)		_pgstat64(path, sb)
+#define stat	_pgstat64
 #define lstat(path, sb)		_pgstat64(path, sb)
+#endif
 
 /* These macros are not provided by older MinGW, nor by MSVC */
 #ifndef S_IRUSR
diff --git a/src/port/open.c b/src/port/open.c
index 14c6debba9..09144351a4 100644
--- a/src/port/open.c
+++ b/src/port/open.c
@@ -13,6 +13,7 @@
 
 #ifdef WIN32
 
+#define NO_STAT_OVERRIDE
 #ifndef FRONTEND
 #include "postgres.h"
 #else
diff --git a/src/port/win32stat.c b/src/port/win32stat.c
index 2ad8ee1359..1c9c01cc98 100644
--- a/src/port/win32stat.c
+++ b/src/port/win32stat.c
@@ -15,54 +15,10 @@
 
 #ifdef WIN32
 
+#define NO_STAT_OVERRIDE
 #include "c.h"
 #include <windows.h>
-
-/*
- * In order to support MinGW and MSVC2013 we use NtQueryInformationFile as an
- * alternative for GetFileInformationByHandleEx. It is loaded from the ntdll
- * library.
- */
-#if _WIN32_WINNT < 0x0600
-#include 
-
-#if !defined(__MINGW32__) && !defined(__MINGW64__)
-/* MinGW includes this in , but it is missing in MSVC */
-typedef struct _FILE_STANDARD_INFORMATION
-{
-	LARGE_INTEGER AllocationSize;
-	LARGE_INTEGER EndOfFile;
-	ULONG		NumberOfLinks;
-	BOOLEAN		DeletePending;
-	BOOLEAN		Directory;
-} FILE_STANDARD_INFORMATION;
-#define FileStandardInformation 5
-#endif			/* !defined(__MINGW32__) &&
- * !defined(__MINGW64__) */
-
-typedef NTSTATUS (NTAPI * PFN_NTQUERYINFORMATIONFILE)
-			(IN HANDLE FileHandle,
-			 OUT PIO_STATUS_BLOCK IoStatusBlock,
-			 OUT PVOID FileInformation,
-			 IN ULONG Length,
-			 IN FILE_INFORMATION_CLASS FileInformationClass);
-
-static PFN_NTQUERYINFORMATIONFILE _NtQueryInformationFile = NULL;
-
-static HMODULE ntdll = NULL;
-
-/*
- * Load DLL file just once regardless of how many functions we load/call in it.
- */
-static void
-LoadNtdll(void)
-{
-	if (ntdll != NULL)
-		return;
-	ntdll = LoadLibraryEx("ntdll.dll", NULL, 0);
-}
-
-#endif			/* _WIN32_WINNT < 0x0600 */
+#include 
 
 
 /*
@@ -112,7 +68,7 @@ fileattr_to_unixmode(int 

Re: proposal - psql - use pager for \watch command

2021-07-13 Thread Tom Lane
Thomas Munro  writes:
> Pushed, after retesting on macOS (with the fixed pspg that has by now
> arrived in MacPorts), FreeBSD and Linux.  Thanks!

After playing with this along the way to fixing the sigwait issues,
I have a gripe/suggestion.  If I hit control-C while the thing
is waiting between queries, eg

regression=# select now() \watch
Tue Jul 13 13:44:44 2021 (every 2s)

  now  
---
 2021-07-13 13:44:44.396565-04
(1 row)

Tue Jul 13 13:44:46 2021 (every 2s)

  now  
---
 2021-07-13 13:44:46.396572-04
(1 row)

^Cregression=# 

then as you can see I get nothing but the "^C" echo before the next
psql prompt.  The problem with this is that now libreadline is
misinformed about the cursor position, messing up any editing I
might try to do on the next line of input.  So I think it would
be a good idea to have some explicit final output when the \watch
command terminates, along the line of

...
Tue Jul 13 13:44:46 2021 (every 2s)

  now  
---
 2021-07-13 13:44:46.396572-04
(1 row)

^C\watch cancelled
regression=# 

This strikes me as a usability improvement even without the
readline-confusion angle.

regards, tom lane




Re: [PATCH] Use optimized single-datum tuplesort in ExecSort

2021-07-13 Thread Ranier Vilela
On Tue, Jul 13, 2021 at 2:42 PM Ranier Vilela wrote:

> On Tue, Jul 13, 2021 at 9:44 AM Ranier Vilela wrote:
>
>> On Tue, Jul 13, 2021 at 9:24 AM David Rowley wrote:
>>
>>> On Wed, 14 Jul 2021 at 00:06, Ranier Vilela  wrote:
>>> >
>>> > On Tue, Jul 13, 2021 at 4:19 AM Ronan Dunklau <ronan.dunk...@aiven.io> wrote:
>>> >> I would be
>>> >> surprised the check adds that much to the whole execution though.
>>> >
>>> > I think this branch is a misprediction.
>>>
>>> It could be.  I wondered that myself when I saw Ronan's results were
>>> better than mine for 2,4 and 7.  However, I think Ronan had quite a
>>> bit of noise in his results as there's no reason for the speedup in
>>> tests 2,4 and 7.
>>
>>
>>> > In most cases is it not datumSort?
>>>
>>> who knows.  Maybe someone's workload always requires the datum sort.
>>>
>>> > That's why I would like to use unlikely.
>>>
>>> We really only use unlikely() in cases where we want to move code out
>>> of line to a cold area because it's really never executed under normal
>>> circumstances. We tend to do that for ERROR cases as we don't ever
>>> really want to optimise for errors. We also sometimes do it when some
>>> function has a branch to initialise something during the first call.
>>> The case in question here does not fit for either of those two cases.
>>>
>> Hum, I understand the usage cases now.
>> Thanks for the hint.
>>
>>
>>>
>>> > IMO all the tests should all be to verify past behavior first.
>>>
>>> I'm not quite sure what you mean there.
>>>
>> I'm saying we could help the branch by keeping the same testing logic as
>> before and not reversing it.
>> Attached is a version to demonstrate this; I don't intend it to be v7.
>>
>> I couldn't find a good phrase to the contrary:
>> "are we *not* using the single value optimization ?"
>>
>> I don't have time to take the tests right now.
>>
> Finally I had time to benchmark (David's benchsort.sh)
>
> ubuntu 64 bits (20.04) 8gb ram SSD 256GB.
> Table with the best results of each.
>
>
        HEAD        v6          v7          v7b         v6 vs master  v7 vs v6  v7b vs v6
Test1   288,149636  449,018541  469,757169  550,48505   155,83%       104,62%   122,60%
Test2    94,766955   95,451406   94,556249   94,718982  100,72%        99,06%    99,23%
Test3   190,521319  260,279802  259,597067  278,115296  136,61%        99,74%   106,85%
Test4    78,779344   78,253455   78,114068   77,941482   99,33%        99,82%    99,60%
Test5   131,362614  142,662223  136,436347  149,639041  108,60%        95,64%   104,89%
Test6   112,884298  124,181671  115,528328  127,58497   110,01%        93,03%   102,74%
Test7    69,308587   68,643067   66,10195    69,087544   99,04%        96,30%   100,65%
Test8   243,674171  364,681142  371,928453  419,259703  149,66%       101,99%   114,97%
>
> I have no idea why v7 failed with test6?
> v6 slowdown with test4 and test7.
> v7b slowdown with test2 and test4, in relation with v7.
>
 v7b slowdown with test2 and test4, in relation with *v6*.


> If field struct datumSort is not absolutely necessary, I think that v7
> will be better.
>
 *v7b* will be better.

Sorry for the noise.

regards,
Ranier Vilela


Re: [PATCH] Use optimized single-datum tuplesort in ExecSort

2021-07-13 Thread Ranier Vilela
Em ter., 13 de jul. de 2021 às 09:44, Ranier Vilela 
escreveu:

> Em ter., 13 de jul. de 2021 às 09:24, David Rowley 
> escreveu:
>
>> On Wed, 14 Jul 2021 at 00:06, Ranier Vilela  wrote:
>> >
>> > Em ter., 13 de jul. de 2021 às 04:19, Ronan Dunklau <
>> ronan.dunk...@aiven.io> escreveu:
>> >> I would be
>> >> surprised the check adds that much to the whole execution though.
>> >
>> > I think this branch is a misprediction.
>>
>> It could be.  I wondered that myself when I saw Ronan's results were
>> better than mine for 2,4 and 7.  However, I think Ronan had quite a
>> bit of noise in his results as there's no reason for the speedup in
>> tests 2,4 and 7.
>
>
>> > In most cases is it not datumSort?
>>
>> who knows.  Maybe someone's workload always requires the datum sort.
>>
>> > That's why I would like to use unlikely.
>>
>> We really only use unlikely() in cases where we want to move code out
>> of line to a cold area because it's really never executed under normal
>> circumstances. We tend to do that for ERROR cases as we don't ever
>> really want to optimise for errors. We also sometimes do it when some
>> function has a branch to initialise something during the first call.
>> The case in question here does not fit for either of those two cases.
>>
> Hum, I understand the usage cases now.
> Thanks for the hint.
>
>
>>
>> > IMO all the tests should all be to verify past behavior first.
>>
>> I'm not quite sure what you mean there.
>>
> I'm saying we could help the branch by keeping the same testing logic as
> before and not reversing it.
> Attached is a version to demonstrate this; I don't intend it to be v7.
>
> I couldn't find a good phrase to the contrary:
> "are we *not* using the single value optimization ?"
>
> I don't have time to take the tests right now.
>
Finally I had time to benchmark (David's benchsort.sh)

ubuntu 64 bits (20.04) 8gb ram SSD 256GB.
Table with the best results of each.


        HEAD        v6          v7          v7b         v6 vs master  v7 vs v6  v7b vs v6
Test1   288,149636  449,018541  469,757169  550,48505   155,83%       104,62%   122,60%
Test2    94,766955   95,451406   94,556249   94,718982  100,72%        99,06%    99,23%
Test3   190,521319  260,279802  259,597067  278,115296  136,61%        99,74%   106,85%
Test4    78,779344   78,253455   78,114068   77,941482   99,33%        99,82%    99,60%
Test5   131,362614  142,662223  136,436347  149,639041  108,60%        95,64%   104,89%
Test6   112,884298  124,181671  115,528328  127,58497   110,01%        93,03%   102,74%
Test7    69,308587   68,643067   66,10195    69,087544   99,04%        96,30%   100,65%
Test8   243,674171  364,681142  371,928453  419,259703  149,66%       101,99%   114,97%

I have no idea why v7 failed with test6?
v6 slowdown with test4 and test7.
v7b slowdown with test2 and test4, in relation with v7.

If field struct datumSort is not absolutely necessary, I think that v7 will
be better.
Attached are the patches and the results file.
Benchmarks datumSort:

1) head

a)
Test1
tps = 286.571700 (without initial connection time)
tps = 285.650960 (without initial connection time)
tps = 287.069988 (without initial connection time)
Test2
tps = 94.565228 (without initial connection time)
tps = 94.690370 (without initial connection time)
tps = 94.741829 (without initial connection time)
Test3
tps = 190.581533 (without initial connection time)
tps = 190.405579 (without initial connection time)
tps = 190.222865 (without initial connection time)
Test4
tps = 78.001599 (without initial connection time)
tps = 78.163404 (without initial connection time)
tps = 78.325974 (without initial connection time)
Test5
tps = 130.276589 (without initial connection time)
tps = 128.854218 (without initial connection time)
tps = 129.062907 (without initial connection time)
Test6
tps = 112.629915 (without initial connection time)
tps = 113.179220 (without initial connection time)
tps = 112.395572 (without initial connection time)
Test7
tps = 69.109489 (without initial connection time)
tps = 68.973755 (without initial connection time)
tps = 69.204028 (without initial connection time)
Test8
tps = 242.364808 (without initial connection time)
tps = 242.448441 (without initial connection time)
tps = 243.191950 (without initial connection time)


b)
Test1
tps = 288.149636 (without initial connection time)
tps = 287.761581 (without initial connection time)
tps = 286.865262 (without initial connection time)
Test2
tps = 94.766955 (without initial connection time)
tps = 94.361680 (without initial connection time)
tps = 94.636953 (without initial connection time)
Test3
tps = 190.394402 (without initial connection time)
tps = 190.521319 (without initial connection time)
tps = 190.446202 (without initial connection time)
Test4
tps = 78.096603 (without initial connection time)
tps = 78.576600 (without initial connection time)
tps = 78.779344 (without initial connection time)
Test5
tps = 131.131458 (without initial connection time)
tps = 131.344875 (without initial connection time)
tps = 131.362614 (without initial connection time)
Test6
tps = 112.884298 

Re: printf %s with NULL pointer (was Re: BUG #17098: Assert failed on composing an error message when adding a type to an extension being dropped)

2021-07-13 Thread Tom Lane
I wrote:
> Now, what we don't have control of is what will happen in pre-v12
> branches on platforms where we use the system's *printf.  However,
> note what I wrote in the log for 0c62356cc:
> Per commit e748e902d, we appear to have little or no coverage in the
> buildfarm of machines that will dump core when asked to printf a
> null string pointer.
> Thus it appears that a large fraction of the world is already either
> using glibc or following glibc's lead on this point.

Further to that point: I just ran around and verified that the system
printf prints "(null)" rather than crashing on FreeBSD 12.2, NetBSD 8.99,
OpenBSD 6.8, macOS 11.4, and Solaris 11.3.  AIX 7.2 and HPUX 10.20 print
"", but still don't crash.  If we change snprintf.c then we will also be
okay on Windows, because we've always used our own snprintf on that
platform.  In short, the only place I can find where there is actually
any hazard is Solaris 10 [1].  I do not think we should let the risk of
obscure bugs in pre-v12 versions on one obsolete OS drive our
decision-making about this.

regards, tom lane

[1] Per experimentation locally and on the GCC compile farm, using
the attached.
#include <stdio.h>

int
main(int argc, char **argv)
{
	printf("NULL prints as \"%s\"\n", (char *) NULL);
	return 0;
}


Re: Add option --drop-cascade for pg_dump/restore

2021-07-13 Thread Wu Haotian
> 2) I felt pg_dump will include the cascade option for plain format and
> pg_restore will include the cascade option from pg_restore for other
> formats. If my understanding is correct, should we document this?

I may not understand it correctly: are you saying that
pg_dump will include the cascade option only for the plain format, or that
pg_dump will enable the cascade option for the plain format by default?

> 4) Is it possible to add a few tests for this?

Looks like tests should be added to
`src/bin/pg_dump/t/002_pg_dump.pl`, I'll try to add some.

On Tue, Jul 13, 2021 at 10:23 PM vignesh C wrote:
>
> On Fri, Jul 2, 2021 at 12:11 PM Haotian Wu  wrote:
> >
> > Hi,
> >
> > I agree that --drop-cascade does not make sense for pg_dumpall, so I removed
> > them.
> >
> > > are we expecting more things to appear after the semi-colon?
> >
> > No, I was just trying to “reuse” the original statement as much as possible.
> > Appending “\n” manually should also do the job, and I’ve updated the patch as
> > you suggest.
>
> 1) This change is not required as it is not supported for pg_dumpall
> +++ b/doc/src/sgml/ref/pg_dumpall.sgml
> @@ -289,6 +289,16 @@ PostgreSQL documentation
>
>   
>
> + 
> +  --drop-cascade
> +  
> +   
> +Use CASCADE to drop database objects.
> +This option is not valid unless --clean is
> also specified.
> +   
> +  
> + 
> +
>
> 2) I felt pg_dump will include the cascade option for plain format and
> pg_restore will include the cascade option from pg_restore for other
> formats. If my understanding is correct, should we document this?
>
> 3) This change is not required
>
> destroyPQExpBuffer(ftStmt);
> pg_free(dropStmtOrig);
> }
> +
> }
>
> 4) Is it possible to add a few tests for this?
>
> Regards,
> Vignesh




Re: row filtering for logical replication

2021-07-13 Thread Jeff Davis
On Tue, 2021-07-13 at 10:24 +0530, Amit Kapila wrote:
> to do. AFAIU, the main things we want to prohibit in the filter are:
> (a) it doesn't refer to any relation other than catalog in where
> clause,

Right, because the walsender is using a historical snapshot.

> (b) it doesn't use UDFs in any way (in expressions, in
> user-defined operators, user-defined types, etc.),

Is this a reasonable requirement? Postgres has a long history of
allowing UDFs nearly everywhere that a built-in is allowed. It feels
wrong to make built-ins special for this feature.

> (c) the columns
> referred to in the filter should be part of PK or Replica Identity.

Why?


Also:

* Andres also mentioned that the function should not leak memory.
* One use case for this feature is when sharding a table, so the
expression should allow things like "hashint8(x) between ...". I'd
really like to see this problem solved, as well.
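
For instance, a sharded setup might want per-node publications roughly
like the following sketch (made-up names, and the FOR TABLE ... WHERE
syntax proposed by the patch); hashint8() is the built-in bigint hash
function:

-- publish one of four hash ranges of orders.order_id (a bigint)
CREATE PUBLICATION shard_0 FOR TABLE orders
    WHERE (hashint8(order_id) BETWEEN -2147483648 AND -1073741825);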

> I think in the long run one idea to allow UDFs is probably by
> explicitly allowing users to specify whether the function is
> publication predicate safe and if so, then we can allow such
> functions
> in the filter clause.

This sounds like a better direction. We probably need some kind of
catalog information here to say what functions/operators are "safe" for
this kind of purpose. There are a couple questions:

1. Should this notion of safety be specific to this feature, or should
we try to generalize it so that other areas of the system might benefit
as well?

2. Should this marking be superuser-only, or user-specified?

3. Should it be related to the IMMUTABLE/STABLE/VOLATILE designation,
or completely separate?

Regards,
Jeff Davis










Re: DROP relation IF EXISTS Docs and Tests - Bug Fix

2021-07-13 Thread David G. Johnston
On Tue, Jul 13, 2021 at 3:30 AM Ibrar Ahmed  wrote:

>
>
> On Tue, Mar 9, 2021 at 9:01 PM David Steele  wrote:
>
>> On 3/9/21 10:08 AM, David G. Johnston wrote:
>> >
>> > On Tuesday, March 9, 2021, David Steele > > > wrote:
>> >
>> > Further, I think we should close this entry at the end of the CF if
>> > it does not attract committer interest. Tom is not in favor of the
>> > patch and it appears Alexander decided not to commit it.
>> >
>> > Pavel re-reviewed it and was fine with ready-to-commit so that status
>> > seems fine.
>>
>> Ah yes, that was my mistake.
>>
>> Regards,
>> --
>> -David
>> da...@pgmasters.net
>>
>>
>>
> The status of the patch is "Needs Review", which was previously "Ready for
> Committer". After @David G
> and @David Steele's comments, it's not clear whether
> it should be "Ready for Committer" or "Needs Review".
>
>
I changed it to Ready to Commit based on the same logic as my reply to
David quoted above.

David J.


Re: PG 14 release notes, first draft

2021-07-13 Thread Simon Riggs
On Fri, Jul 2, 2021 at 12:50 AM Bruce Momjian  wrote:
>
> On Thu, Jul  1, 2021 at 03:13:30PM +0100, Simon Riggs wrote:
> > On Wed, Jun 30, 2021 at 11:20 PM Bruce Momjian  wrote:
> > > > * "row expiration" is a term not currently used in PG docs, so we
> > > > should probably look for something else.
> > >
> > > Yeah, I changed that to "removing dead rows" which seems to be our
> > > standard text.
> >
> > What you have now looks great for this feature, thanks.
>
> Good.
>
> > > > There are 2 important features here, so the 2nd feature is worth
> > > > mentioning also:
> > > >
> > > > Avoid spurious waits in concurrent indexing
> > > >
> > > > Previously, multiple concurrent index operations could deadlock or
> > > > cause long waits.
> > > > Waits are avoided except for indexes with expressions, or WHERE 
> > > > predicates.
> > >
> > > OK, I added text to the bottom to try and capture that;  new patch
> > > attached, now with UTF8 encoding.
> >
> > The text from "This also avoids..." tries to explain this, but they
> > are two separate features, each important in its own right.
> >
> > So regrettably, this part doesn't capture it, for me.
>
> I see what you mean.  This is in the VACUUM section, and this feature,
> though from the same commits, has nothing to do with vacuum.  Attached
> is an updated patch.

Perfect, many thanks.

-- 
Simon Riggs                http://www.EnterpriseDB.com/




Re: Printing backtrace of postgres processes

2021-07-13 Thread vignesh C
On Wed, May 12, 2021 at 2:27 AM Robert Haas  wrote:
>
> On Thu, May 6, 2021 at 3:31 PM Tom Lane  wrote:
> > Andres Freund  writes:
> > > On 2021-05-06 14:56:09 -0400, Tom Lane wrote:
> > >> If we think it's worth having a predefined role for, OK.  However,
> > >> I don't like the future I see us heading towards where there are
> > >> hundreds of random predefined roles.  Is there an existing role
> > >> that it'd be reasonable to attach this ability to?
> >
> > > It does seem like it'd be good to group it in with something
> > > else. There's nothing fitting 100% though.
> >
> > I'd probably vote for pg_read_all_data, considering that much of
> > the concern about this has to do with the possibility of exposure
> > of sensitive data.  I'm not quite sure what the security expectations
> > are for pg_monitor.
>
> Maybe we should have a role that is specifically for server debugging
> type things. This kind of overlaps with Mark Dilger's proposal to try
> to allow SET for security-sensitive GUCs to be delegated via
> predefined roles. The exact way to divide that up is open to question,
> but it wouldn't seem crazy to me if the same role controlled the
> ability to do this plus the ability to set the GUCs
> backtrace_functions, debug_invalidate_system_caches_always,
> wal_consistency_checking, and maybe a few other things.

+1 for the idea of having a new role for this. Currently I have
implemented this feature to be supported only for the superuser. If we
are ok with having a new role to handle debugging features, I will
make a 002 patch to handle this.

Regards,
Vignesh




Re: Bogus HAVE_DECL_FOO entries in msvc/Solution.pm

2021-07-13 Thread Peter Eisentraut

On 13.07.21 09:53, Michael Paquier wrote:

I was thinking to just do the easiest move and fix this issue down to
13, not bothering about older branches :p

Looking at the commit, a backpatch is not that complicated and it is
possible to check the generation of pg_config.h on non-MSVC
environments if some objects are missing.  Still, I think that it
would be better to be careful and test this code properly on Windows
with a real build.  It means that..  Err...  Andrew or I should look
at that.  I am not sure that the potential maintenance gain is worth
poking at the stable branches, to be honest.


We have lived with the previous system for a decade, so I think 
backpatching this would be a bit excessive.





Re: Detecting File Damage & Inconsistencies

2021-07-13 Thread Simon Riggs
On Tue, Jul 6, 2021 at 4:21 AM Amit Kapila  wrote:
>
> On Fri, Jul 2, 2021 at 8:29 PM Simon Riggs  
> wrote:
> >
> > On Fri, Jul 2, 2021 at 5:34 AM Craig Ringer
> >  wrote:
> > >
> >
> > > If you don't think the sorts of use cases I presented are worth the 
> > > trouble that's fair enough. I'm not against adding it on the commit 
> > > record. It's just that with logical decoding moving toward a streaming 
> > > model I suspect only having it at commit time may cause us some pain 
> > > later.
> >
> > I think you have some good ideas about how to handle larger
> > transactions with streaming. As a separate patch it might be worth
> > keeping track of transaction size and logging something when a
> > transaction gets too large.
> >
>
> If we want this additional information for streaming mode in logical
> replication then can't we piggyback it on the very first record
> written for a transaction when this info is required? Currently, we do
> something similar for logging top_level_xid for subtransaction in
> XLogRecordAssemble().

It's possible, but I'm struggling to believe anybody would accept that
as an approach because it breaks simplicity, modularity and makes it
harder to search for this info in the WAL.

I was imagining that we'd keep track of amount of WAL written by a
transaction and when it reaches a certain size generate a "streaming
info" record as an early warning that we have a big transaction coming
down the pipe.

I'm feeling that a simple patch is expanding well beyond its original
scope and timeline. How can we do this simply?

-- 
Simon Riggs                http://www.EnterpriseDB.com/




Re: Column Filtering in Logical Replication

2021-07-13 Thread Rahila Syed
Hi Tomas,

Thank you for your comments.


>
> >
> > Currently, this capability is not included in the patch. If the table on
> > the subscriber
> > server has lesser attributes than that on the publisher server, it
> > throws an error at the
> > time of CREATE SUBSCRIPTION.
> >
>
> That's a bit surprising, to be honest. I do understand the patch simply
> treats the filtered columns as "unchanged" because that's the simplest
> way to filter the *data* of the columns. But if someone told me we can
> "filter columns" I'd expect this to work without the columns on the
> subscriber.
>

OK, I will look into adding this.


>
> > However, need to carefully consider situations in which a server
> > subscribes to multiple
> > publications,  each publishing a different subset of columns of a table.

> Isn't that pretty much the same situation as for multiple subscriptions
> each with a different set of I/U/D operations? IIRC we simply merge
> those, so why not to do the same thing here and merge the attributes?
>
>
Yeah, I agree with the solution of merging the attributes, similar to how
operations are merged. My concern was also from an implementation point of
view, i.e. whether it would be a very drastic change. I have now had a look at
how remote relation attributes are acquired for comparison with local
attributes on the subscriber side. It seems that the publisher will need to
send the information about the filtered columns for each publication specified
during CREATE SUBSCRIPTION. This will be read on the subscriber side, which in
turn updates its cache accordingly. Currently, the subscriber expects all
attributes of a published relation to be present. I will add code for this in
the next version of the patch.
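To spell out the merge case I have in mind (using the column-list syntax from
the current patch; the table and column names are made up):

CREATE PUBLICATION pub_basic   FOR TABLE customers (id, name);
CREATE PUBLICATION pub_billing FOR TABLE customers (id, balance);

CREATE SUBSCRIPTION sub_all
    CONNECTION 'host=pubhost dbname=postgres'
    PUBLICATION pub_basic, pub_billing;

Here the subscriber should effectively receive the merged column set
(id, name, balance), analogous to how the published operations are merged.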

>  To nitpick, I find "Bitmapset *att_list" a bit annoying, because it's
> not really a list ;-)

I will make this change with the next version.



>  FWIW "make check" fails for me with this version, due to segfault in
> OpenTableLists. Apparently there's some confusion - the code expects the
> list to contain PublicationTable nodes, and tries to extract the
> RangeVar from the elements. But the list actually contains RangeVar, so
> this crashes and burns. See the attached backtrace.
>
>
Thank you for the report. This is fixed in the attached version; now all
publication function calls accept the PublicationTableInfo list.

Thank you,
Rahila Syed


v2-0001-Add-column-filtering-to-logical-replication.patch
Description: Binary data


Re: Enhanced error message to include hint messages for redundant options error

2021-07-13 Thread vignesh C
On Tue, Jul 13, 2021 at 4:25 PM Dean Rasheed  wrote:
>
> On Mon, 12 Jul 2021 at 17:39, vignesh C  wrote:
> >
> > Thanks for your comments, I have made the changes for the same in the
> > V10 patch attached.
> > Thoughts?
> >
>
> I'm still not happy about changing so many error messages.
>
> Some of the changes might be OK, but aren't strictly necessary. For example:
>
>  COPY x from stdin (force_not_null (a), force_not_null (b));
> -ERROR:  conflicting or redundant options
> +ERROR:  option "force_not_null" specified more than once
>  LINE 1: COPY x from stdin (force_not_null (a), force_not_null (b));
> ^
>
> I actually prefer the original primary error message, for consistency
> with other similar cases, and I think the error position indicator is
> sufficient to identify the problem. If it were to include the
> "specified more than once" text, I would put that in DETAIL.
>
> Other changes are wrong though. For example:
>
>  COPY x from stdin (format CSV, FORMAT CSV);
> -ERROR:  conflicting or redundant options
> +ERROR:  redundant options specified
>  LINE 1: COPY x from stdin (format CSV, FORMAT CSV);
> ^
>
> The problem here is that the code that throws this error throws the
> same error if the second format is different, which would make it a
> conflicting option, not a redundant one. And I don't think we should
> add more code to test whether it's conflicting or redundant, so again,
> I think we should just keep the original error message.
>
> Similarly, this error is wrong:
>
> CREATE OR REPLACE FUNCTION foo() RETURNS int AS $$ SELECT 1 $$ STABLE 
> IMMUTABLE;
> ERROR:  redundant options specified
> LINE 1: ...NCTION foo() RETURNS int AS $$ SELECT 1 $$ STABLE IMMUTABLE;
>  ^
>
> And even this error:
>
> CREATE OR REPLACE FUNCTION foo() RETURNS int AS $$ SELECT 1 $$ STRICT STRICT;
> ERROR:  redundant options specified
> LINE 1: ... FUNCTION foo() RETURNS int AS $$ SELECT 1 $$ STRICT STRICT;
> ^
>
> which looks OK, is actually problematic because the same code also
> handles the alternate syntax, which leads to this (which is now wrong
> because it's conflicting not redundant):
>
> CREATE OR REPLACE FUNCTION foo() RETURNS int AS $$ SELECT 1 $$ STRICT
> CALLED ON NULL INPUT;
> ERROR:  redundant options specified
> LINE 1: ...NCTION foo() RETURNS int AS $$ SELECT 1 $$ STRICT CALLED ON ...
>  ^
>
> The problem is it's actually quite hard to decide in each case whether
> the option is redundant or conflicting. Sometimes, it might look
> obvious in the code, but actually be much more subtle, due to an
> earlier transformation of the grammar. Likewise redundant doesn't
> necessarily mean literally specified more than once.
>
> Also, most of these don't have regression test cases, and I'm very
> reluctant to change them without proper testing, and that would make
> the patch much bigger. To me, this patch is already attempting to
> change too much in one go, which is causing problems.
>
> So I suggest a more incremental approach, starting by keeping the
> original error message, but improving it where possible with the error
> position. Then maybe move on to look at specific cases that can be
> further improved with additional detail (keeping the same primary
> error message, for consistency).

I'm fine with this approach as we do not have tests to cover all the
error conditions, also I'm not sure if it is worth adding tests for
all the error conditions and as the patch changes a large number of
error conditions, an incremental approach is better.

> Here is an updated version, following that approach. It does the following:
>
> 1). Keeps the same primary error message ("conflicting or redundant
> options") in all cases.
>
> 2). Uses errorConflictingDefElem() to throw it, to ensure consistency
> and reduce the executable size.
>
> 3). Includes your enhancements to make the ParseState available in
> more places, so that the error position indicator is shown to indicate
> the cause of the error.
>
> IMO, this makes for a much safer incremental change, that is more committable.
>
> As it turns out, there are 110 cases of this error that now use
> errorConflictingDefElem(), and of those, just 10 (in 3 functions)
> don't have a ParseState readily available to them:
>
> - ATExecSetIdentity()
> - parse_output_parameters() x5
> - parseCreateReplSlotOptions() x4
>
> It might be possible to improve those (and possibly some of the others
> too) by adding some appropriate DETAIL to the error, but as I said, I
> suggest doing that in a separate follow-on patch, and only with
> careful analysis and testing of each case.
>
> As it stands, the improvements from (3) seem quite worthwhile. Also,
> the patch saves a couple of hundred lines of code, and for me 

Re: printf %s with NULL pointer (was Re: BUG #17098: Assert failed on composing an error message when adding a type to an extension being dropped)

2021-07-13 Thread Tom Lane
Laurenz Albe  writes:
> On Mon, 2021-07-12 at 13:20 -0400, Tom Lane wrote:
>> So my feeling about this is that switching snprintf.c's behavior
>> would produce some net gain in robustness for v12 and up, while
>> not making things any worse for the older branches.  I still hold
>> to the opinion that we've already flushed out all the cases of
>> passing NULL that we're likely to find via ordinary testing.

> New cases could be introduced in the future and might remain undetected.
> What about adding an Assert that gags on NULLs, but still printing them
> as "(null)"?  That would help find such problems in a debug build.

I think you're missing my main point, which is that it seems certain that
there are corner cases that do this *now*.  I'm proposing that we redefine
this as not being a crash case, full stop.

Now, what we don't have control of is what will happen in pre-v12
branches on platforms where we use the system's *printf.  However,
note what I wrote in the log for 0c62356cc:

Per commit e748e902d, we appear to have little or no coverage in the
buildfarm of machines that will dump core when asked to printf a
null string pointer.

Thus it appears that a large fraction of the world is already either
using glibc or following glibc's lead on this point.  If we do likewise,
it will remove some crash cases and not introduce any new ones.

In hindsight I feel like 0c62356cc was an overreaction to the unusual
property e748e902d's bug had, namely that "(null)" was getting printed
in a place where it would not show up in any visible output.  Since
we certainly wouldn't consider that behavior OK if we saw it, you'd
really have to assume that there are more lurking bugs with that same
property in order to conclude that the Assert is worth its keep.

regards, tom lane




Re: unnesting multirange data types

2021-07-13 Thread Alvaro Herrera
On 2021-Jul-13, Alexander Korotkov wrote:

> > To be clear, do you mean with or without this hunk ?
> >
> > -  oprrest => 'multirangesel', oprjoin => 'scalargtjoinsel' },
> > +  oprrest => 'multirangesel', oprjoin => 'scalarltjoinsel' },
> 
> I mean with this hunk unless I hear objection to it.

+1 for pushing with that hunk, no catversion bump.

-- 
Álvaro Herrera   39°49'30"S 73°17'W  —  https://www.EnterpriseDB.com/
"The Gord often wonders why people threaten never to come back after they've
been told never to return" (www.actsofgord.com)




Re: Add option --drop-cascade for pg_dump/restore

2021-07-13 Thread vignesh C
On Fri, Jul 2, 2021 at 12:11 PM Haotian Wu  wrote:
>
> Hi,
>
> I agree that —drop-cascade does not make sense for pg_dumpall, so I removed 
> them.
>
> > are we expecting more things to appear after the semi-colon?
>
> No, I was just trying to “reuse” original statement as much as possible. 
> Append “\n” manually should also do the job, and I’ve updated the patch as 
> you suggests.

1) This change is not required as it is not supported for pg_dumpall
+++ b/doc/src/sgml/ref/pg_dumpall.sgml
@@ -289,6 +289,16 @@ PostgreSQL documentation
   
  

+ 
+  --drop-cascade
+  
+   
+Use CASCADE to drop database objects.
+This option is not valid unless --clean is
also specified.
+   
+  
+ 
+

2) I felt pg_dump will include the cascade option for plain format and
pg_restore will include the cascade option from pg_restore for other
formats. If my understanding is correct, should we document this?

3) This change is not required

destroyPQExpBuffer(ftStmt);
pg_free(dropStmtOrig);
}
+
}

4) Is it possible to add a few tests for this?

Regards,
Vignesh




Early Sort/Group resjunk column elimination.

2021-07-13 Thread Ronan Dunklau
Hello,

I would like to know if there is any interest in working to reduce the usage 
and propagation of resjunk columns in the planner.

I think this topic is worth investigating, because most of the time when we 
request a sorted path without ever needing the sort key afterwards, we still 
carry the sort key to the final tuple, where the JunkFilter will finally get rid of it.

Rationale


This would allow several optimizations.

1) Index not needing to output the column 

This one was mentioned as far back as 2009 [1]  and is still relevant today. 
If we query SELECT a FROM t1 ORDER BY b; and we have an index on b, we 
shouldn't output b at all, since we don't need it in the upper nodes. This 
might not look like a huge problem by itself, but as noted in [1] it becomes 
very expensive in the case of a functional index. This is alleviated for 
IndexOnlyScan because it is capable of fetching the value from the index 
itself, but it is still a problem.

Take this query as an example:

regression=# explain (verbose) select two from tenk2 order by hundred;
                                      QUERY PLAN
--------------------------------------------------------------------------------------
 Index Scan using tenk2_hundred on public.tenk2  (cost=0.29..1574.20 rows=1 width=8)
   Output: two, hundred
(2 rows)


We should be able to transform it into:

regression=# explain (verbose) select two from tenk2 order by hundred;
                                      QUERY PLAN
--------------------------------------------------------------------------------------
 Index Scan using tenk2_hundred on public.tenk2  (cost=0.29..1574.20 rows=1 width=4)
   Output: two
(2 rows)
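Coming back to the functional-index point above, the case where carrying the
sort key really hurts is something like this (a sketch, assuming some
expensive function f):

CREATE INDEX ON t1 (f(b));
SELECT a FROM t1 ORDER BY f(b);

Today f(b) is kept around as a resjunk output column, so unless an index-only
scan is used the expression may be recomputed for every row, even though the
caller only ever asked for a.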


2) Other nodes

Other nodes might benefit from it, for example in FDWs. Right now the sort key 
is always returned from the underlying FDW, but when the remote side can do the 
sorting, not having to return the key could be a net win.

3) Incremental Sort

While working on the patch to allow Sort nodes to use the datumOnly 
optimization, a suggestion came up to also use it in the IncrementalSort. This 
is not possible today because even if we don't need the previously-sorted 
columns anymore, we still need to output them as resjunk columns.

4) Narrower tuples in dynamic shared memory.

DSM bandwidth is quite expensive, so if we can avoid exchanging some 
attributes here it could be a net win.


Proposal
===

I've been trying to test this idea using a very simple approach. If that is of 
interest, I can clean up my branch and post a simple patch to discuss 
specifics, but I'd like to keep the discussion at a high level first.

The idea would be to:
  - "tag" resjunk TargetEntries according to why they were added, so a column 
    added as a sort/group clause would be tagged as such and be recognisable;
  - in the planner, instead of using the whole processed target list to build 
    the finaltarget, remove the resjunk entries we don't actually need (those 
    added only as sort/group clauses as of now, but there may be other kinds of 
    resjunk entries we can safely omit);
  - inject those columns only when generating the input targets needed for 
    sorting, grouping, window functions and the like.

Using only this already allows optimization number 1), because if no Sort node 
needs to be added, the pathtarget just cascades to the bottom of the path.

There is one big downside to this: it introduces a mismatch between the 
finaltarget and the output of the previous node (for example sort). This adds a 
costly Result node everywhere, performing an expensive projection instead of 
the much simpler JunkFilter we currently have:

regression=# explain (verbose) select two from tenk2 order by four;
                              QUERY PLAN
----------------------------------------------------------------------
 Result  (cost=1109.39..1234.39 rows=1 width=4)
   Output: two
   ->  Sort  (cost=1109.39..1134.39 rows=1 width=8)
         Output: two, four
         Sort Key: tenk2.four
         ->  Seq Scan on public.tenk2  (cost=0.00..445.00 rows=1 width=8)
               Output: two, four
(7 rows)


I think this is something that could be solved fairly easily. One option is to 
teach some nodes to do simple projections, consisting only of removing / 
reordering some attributes. This would match what ExecJunkFilter does, 
generalized to any kind of "subset of attributes" projection.

Alternatively, we could also perform that at the Result level, leaving 
individual nodes alone, by implementing a simpler result node using the 
JunkFilter mechanism when it's possible (either with a full-blown 
"SimpleResult" specific node, or a different execprocnode in the Result).

If the idea seems worthy, I'll keep working on it and send you a patch 
demonstrating the idea.

[1] https://www.postgresql.org/message-id/flat/9957.1250956747%40sss.pgh.pa.us

Best regards,

-- 
Ronan 

Re: Remove repeated calls to PQserverVersion

2021-07-13 Thread Tom Lane
Michael Paquier  writes:
> On Tue, Jul 13, 2021 at 07:02:27PM +1000, Peter Smith wrote:
>> I found a few functions making unnecessary repeated calls to
>> PQserverVersion(conn); instead of just calling once and assigning to a
>> local variable.

> Does it really matter?  PQserverVersion() does a simple lookup at the
> internals of PGconn.

Yeah, it'd have to be mighty hot code to be worth caring about that,
and none of these spots look like it could be worth it.

regards, tom lane




Re: unnesting multirange data types

2021-07-13 Thread Alexander Korotkov
On Tue, Jul 13, 2021 at 5:07 PM Justin Pryzby  wrote:
> On Tue, Jul 13, 2021 at 03:11:16PM +0300, Alexander Korotkov wrote:
> > On Sun, Jul 11, 2021 at 1:20 AM Justin Pryzby  wrote:
> > > On Sun, Jul 11, 2021 at 01:00:27AM +0300, Alexander Korotkov wrote:
> > > > On Sat, Jul 10, 2021 at 7:34 PM Alvaro Herrera 
> > > >  wrote:
> > > > > On 2021-Jun-27, Alexander Korotkov wrote:
> > > > >
> > > > > > BTW, I found some small inconsistencies in the declaration of
> > > > > > multirange operators in the system catalog.  Nothing critical, but 
> > > > > > if
> > > > > > we decide to bump catversion in beta3, this patch is also nice to
> > > > > > push.
> > > > >
> > > > > Hmm, I think you should push this and not bump catversion.  That way,
> > > > > nobody is forced to initdb if we end up not having a catversion bump 
> > > > > for
> > > > > some other reason; but also anybody who initdb's with beta3 or later
> > > > > will get the correct descriptions.
> > > > >
> > > > > If you don't push it, everybody will have the wrong descriptions.
> > > >
> > > > True, but I'm a bit uncomfortable about user instances with different
> > > > catalogs but the same catversions.  On the other hand, initdb's with
> > > > beta3 or later will be the vast majority among pg14 instances.
> > > >
> > > > Did we have similar precedents in the past?
> > >
> > > It seems so.
> > >
> > > Note in particular 74ab96a45, which adds a new function with no bump.
> > > Although that one may not be a good precedent to follow, or one that's 
> > > been
> > > followed recently.
> >
> > Justin, thank you very much for the summary.
> >
> > Given we have similar precedents in the past, I'm going to push the
> > patch [1] to master and pg14 if no objections.
>
> To be clear, do you mean with or without this hunk ?
>
> -  oprrest => 'multirangesel', oprjoin => 'scalargtjoinsel' },
> +  oprrest => 'multirangesel', oprjoin => 'scalarltjoinsel' },

I mean with this hunk unless I hear objection to it.

The implementations of scalarltjoinsel and scalargtjoinsel are the
same.  And I don't think they are going to be changed on pg14.
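For anyone who wants to double-check, the selectivity functions attached to
the multirange operators can be listed with something like this (just a
sketch):

SELECT oprname, oprleft::regtype, oprright::regtype, oprrest, oprjoin
FROM pg_operator
WHERE oprrest = 'multirangesel'::regproc
ORDER BY oprname;

which makes any odd-looking oprjoin entry easy to spot.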

--
Regards,
Alexander Korotkov




Re: proposal: possibility to read dumped table's name from file

2021-07-13 Thread Tomas Vondra




On 7/13/21 3:40 PM, Stephen Frost wrote:

Greetings,

* Tom Lane (t...@sss.pgh.pa.us) wrote:

Alvaro Herrera  writes:

[1] your proposal of "[+-] OBJTYPE OBJIDENT" plus empty lines allowed
 plus lines starting with # are comments, seems plenty.  Any line not
 following that format would cause an error to be thrown.


I'd like to see some kind of keyword on each line, so that we could extend
the command set by adding new keywords.  As this stands, I fear we'd end
up using random punctuation characters in place of [+-], which seems
pretty horrid from a readability standpoint.


I agree that it'd end up being bad with single characters.



The [+-] format is based on what rsync does, so there's at least some 
precedent for that, and IMHO it's fairly readable. I agree the rest of 
the rule (object type, ...) may be a bit more verbose.



I think that this file format should be designed with an eye to allowing
every, or at least most, pg_dump options to be written in the file rather
than on the command line.  I don't say we have to *implement* that right
now; but if the format spec is incapable of being extended to meet
requests like that one, I think we'll regret it.  This line of thought
suggests that the initial commands ought to match the existing
include/exclude switches, at least approximately.


I agree that we want to have an actual config file that allows just
about every pg_dump option.  I'm also fine with saying that we don't
have to implement that initially but the format should be one which can
be extended to allow that.



I understand the desire to have a config file that may contain all 
pg_dump options, but I really don't see why we'd want to mix that with 
the file containing filter rules.


I think those should be separate, one of the reasons being that I find 
it desirable to be able to "include" the filter rules into different 
pg_dump configs. That also means the format for the filter rules can be 
much simpler.


It's also not clear to me whether the single-file approach would allow 
filtering not supported by actual pg_dump option, for example.



Hence I suggest

include table PATTERN
exclude table PATTERN

which ends up being the above but with words not [+-].



Work for me.


Which ends up inventing yet-another-file-format which people will end up
writing generators and parsers for.  Which is exactly what I was arguing
we really should be trying to avoid doing.



People will have to write generators *in any case* because how else 
would you use this? Unless we also provide tools to manipulate that file 
(which seems rather futile), they'll have to do that. Even if we used 
JSON/YAML/TOML/... they'd still need to deal with the semantics of the 
file format.


FWIW I don't understand why they would need to write parsers. That's 
something we'd need to do to process the file. I think the case when the 
filter file needs to be modified is rather rare - it certainly is not 
what the original use case Pavel tried to address needs. (I know that 
customer and the filter would be generated and used for a single dump.)


My opinion is that the best solution (to make both generators and 
parsers simple) is to keep the format itself as simple as possible. 
Which is exactly why I'm arguing for only addressing the filtering, not 
trying to invent a "universal" pg_dump config file format.
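To illustrate what I mean by simple, under the keyword form discussed above a
complete filter file could be as small as this (the patterns are made up):

    # monthly sales partitions, except the one still being loaded
    include table sales_2021_*
    exclude table sales_2021_12

which is trivial both to generate from a script and to read back.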



I definitely feel that we should have a way to allow anything that can
be created as an object in the database to be explicitly included in the
file and that means whatever we do need to be able to handle objects
that have names that span multiple lines, etc.  It's not clear how the
above would.  As I recall, the proposed patch didn't have anything for
handling that, which was one of the issues I had with it and is why I
bring it up again.



I really don't understand why you think the current format can't do 
escaping/quoting or handle names spanning multiple lines. The fact that 
the original patch did not handle that correctly is a bug, but it does 
not mean the format can't handle that.



regards

--
Tomas Vondra
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: RFC: Logging plan of the running query

2021-07-13 Thread Masahiro Ikeda
On Tue, Jun 22, 2021 at 8:00 AM torikoshia  
wrote:

Updated the patch.


Hi, torikoshi-san

Thanks for your great work! I'd like to use this feature in v15.
I confirmed that it works with the queries I tried and that make check-world
reports no errors.


When I tried this feature, I noticed two things, so I'll share them.

(1) About output contents

    The format of the query plan is the same as when FORMAT TEXT
    and VERBOSE are used in the EXPLAIN command.

    For example:

I think the above needs to add the COSTS and SETTINGS options too, and it's
better to use an example where the SETTINGS option actually appears, like the
following.

```
2021-07-13 21:59:56 JST 69757 [client backend] LOG:  plan of the query running on backend with PID 69757 is:
        Query Text: PREPARE query2 AS SELECT COUNT(*) FROM pgbench_accounts t1, pgbench_accounts t2;
        Aggregate  (cost=3750027242.84..3750027242.86 rows=1 width=8)
          Output: count(*)
          ->  Nested Loop  (cost=0.84..3125027242.84 rows=2500 width=0)
                ->  Index Only Scan using pgbench_accounts_pkey on public.pgbench_accounts t1  (cost=0.42..12996.42 rows=50 width=0)
                      Output: t1.aid
                ->  Materialize  (cost=0.42..15496.42 rows=50 width=0)
                      ->  Index Only Scan using pgbench_accounts_pkey on public.pgbench_accounts t2  (cost=0.42..12996.42 rows=50 width=0)
        Settings: effective_cache_size = '8GB', work_mem = '16MB'
```

(2) About the EXPLAIN "BUFFERS" option

When I checked the EXPLAIN options, I found there is another option, "BUFFERS",
which can be used without the "ANALYZE" option.

I'm not sure how useful it is here, because your target use case is analyzing a
long-running query, not its planning phase, and in that case the planning
buffer usage is not of much interest. But since the overhead of outputting
buffer usage is low and this feature is aimed at debugging use cases, it might
not be a bad idea to output buffer usage too. Thoughts?
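For example, something like

EXPLAIN (BUFFERS) SELECT count(*) FROM pgbench_accounts;

already works without ANALYZE and, as far as I can tell, reports only the
buffers touched while planning the query.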

Regards,
--
Masahiro Ikeda
NTT DATA CORPORATION




Re: unnesting multirange data types

2021-07-13 Thread Justin Pryzby
On Tue, Jul 13, 2021 at 03:11:16PM +0300, Alexander Korotkov wrote:
> On Sun, Jul 11, 2021 at 1:20 AM Justin Pryzby  wrote:
> > On Sun, Jul 11, 2021 at 01:00:27AM +0300, Alexander Korotkov wrote:
> > > On Sat, Jul 10, 2021 at 7:34 PM Alvaro Herrera  
> > > wrote:
> > > > On 2021-Jun-27, Alexander Korotkov wrote:
> > > >
> > > > > BTW, I found some small inconsistencies in the declaration of
> > > > > multirange operators in the system catalog.  Nothing critical, but if
> > > > > we decide to bump catversion in beta3, this patch is also nice to
> > > > > push.
> > > >
> > > > Hmm, I think you should push this and not bump catversion.  That way,
> > > > nobody is forced to initdb if we end up not having a catversion bump for
> > > > some other reason; but also anybody who initdb's with beta3 or later
> > > > will get the correct descriptions.
> > > >
> > > > If you don't push it, everybody will have the wrong descriptions.
> > >
> > > True, but I'm a bit uncomfortable about user instances with different
> > > catalogs but the same catversions.  On the other hand, initdb's with
> > > beta3 or later will be the vast majority among pg14 instances.
> > >
> > > Did we have similar precedents in the past?
> >
> > It seems so.
> >
> > Note in particular 74ab96a45, which adds a new function with no bump.
> > Although that one may not be a good precedent to follow, or one that's been
> > followed recently.
> 
> Justin, thank you very much for the summary.
> 
> Given we have similar precedents in the past, I'm going to push the
> patch [1] to master and pg14 if no objections.

To be clear, do you mean with or without this hunk ?

-  oprrest => 'multirangesel', oprjoin => 'scalargtjoinsel' },
+  oprrest => 'multirangesel', oprjoin => 'scalarltjoinsel' },

-- 
Justin




Re: Bogus HAVE_DECL_FOO entries in msvc/Solution.pm

2021-07-13 Thread Tom Lane
Michael Paquier  writes:
> On Tue, Jul 13, 2021 at 12:25:06AM -0400, Tom Lane wrote:
>> That's easy enough in v13 and up, which have 8f4fb4c64 so that
>> Solution.pm looks like this.  We could make it consistent in older
>> branches by manually hacking pg_config.h.win32 ... but I'm wondering
>> if the smarter plan wouldn't be to back-patch 8f4fb4c64.

> ... I am not sure that the potential maintenance gain is worth
> poking at the stable branches, to be honest.

Fair enough.  I wasn't very eager to do the legwork on that, either,
given that the issue is (so far) only cosmetic.

regards, tom lane




Re: Replace remaining castNode(…, lfirst(…)) and friends calls with l*_node()

2021-07-13 Thread Peter Eisentraut



On 08.07.21 20:17, Alvaro Herrera wrote:

diff --git a/src/backend/rewrite/rewriteSearchCycle.c 
b/src/backend/rewrite/rewriteSearchCycle.c
index 599fe8e735..c50ebdba24 100644
--- a/src/backend/rewrite/rewriteSearchCycle.c
+++ b/src/backend/rewrite/rewriteSearchCycle.c
@@ -307,8 +307,8 @@ rewriteSearchAndCycle(CommonTableExpr *cte)
                                   list_nth_oid(cte->ctecolcollations, i),
                                   0);
         tle = makeTargetEntry((Expr *) var, i + 1, strVal(list_nth(cte->ctecolnames, i)), false);
-        tle->resorigtbl = castNode(TargetEntry, list_nth(rte1->subquery->targetList, i))->resorigtbl;
-        tle->resorigcol = castNode(TargetEntry, list_nth(rte1->subquery->targetList, i))->resorigcol;
+        tle->resorigtbl = list_nth_node(TargetEntry, rte1->subquery->targetList, i)->resorigtbl;
+        tle->resorigcol = list_nth_node(TargetEntry, rte1->subquery->targetList, i)->resorigcol;

This seems a bit surprising to me.  I mean, clearly we trust our List
implementation to be so good that we can just fetch the same node twice,
one for each member of the same struct, and the compiler will optimize
everything so that it's a single access to the n'th list entry.  Is this
true?  I would have expected there to be a single fetch of the struct,
followed by an access of each of the two struct members.


Lists are arrays now internally, so accessing an element by number is 
pretty cheap.





Re: proposal: possibility to read dumped table's name from file

2021-07-13 Thread Stephen Frost
Greetings,

* Tom Lane (t...@sss.pgh.pa.us) wrote:
> Alvaro Herrera  writes:
> > [1] your proposal of "[+-] OBJTYPE OBJIDENT" plus empty lines allowed
> > plus lines starting with # are comments, seems plenty.  Any line not
> > following that format would cause an error to be thrown.
> 
> I'd like to see some kind of keyword on each line, so that we could extend
> the command set by adding new keywords.  As this stands, I fear we'd end
> up using random punctuation characters in place of [+-], which seems
> pretty horrid from a readability standpoint.

I agree that it'd end up being bad with single characters.

> I think that this file format should be designed with an eye to allowing
> every, or at least most, pg_dump options to be written in the file rather
> than on the command line.  I don't say we have to *implement* that right
> now; but if the format spec is incapable of being extended to meet
> requests like that one, I think we'll regret it.  This line of thought
> suggests that the initial commands ought to match the existing
> include/exclude switches, at least approximately.

I agree that we want to have an actual config file that allows just
about every pg_dump option.  I'm also fine with saying that we don't
have to implement that initially but the format should be one which can
be extended to allow that.

> Hence I suggest
> 
>   include table PATTERN
>   exclude table PATTERN
> 
> which ends up being the above but with words not [+-].

Which ends up inventing yet-another-file-format which people will end up
writing generators and parsers for.  Which is exactly what I was arguing
we really should be trying to avoid doing.

I definitely feel that we should have a way to allow anything that can
be created as an object in the database to be explicitly included in the
file and that means whatever we do need to be able to handle objects
that have names that span multiple lines, etc.  It's not clear how the
above would.  As I recall, the proposed patch didn't have anything for
handling that, which was one of the issues I had with it and is why I
bring it up again.

Thanks,

Stephen


signature.asc
Description: PGP signature


Re: [PATCH] improve the pg_upgrade error message

2021-07-13 Thread Jeevan Ladhe
> The admin might fix DB123, restart their upgrade procedure, spend 5 or 15
> minutes with that, only to have it then fail on DB1234.
>

Agree with this observation.

Here is a patch that writes to a file the list of all databases other than
template0 that have pg_database.datallowconn set to false. A similar approach
is used in other functions such as check_for_data_types_usage(). Thanks to
Suraj Kharage for the offline suggestion.

PFA patch.

As an experiment, here is how it turns out after the fix.

postgres=# update pg_database set datallowconn='false' where datname in
('mydb', 'mydb1', 'mydb2');
UPDATE 3

$ pg_upgrade -d /tmp/v96/data -D /tmp/v13/data -b $HOME/v96/install/bin -B
$HOME/v13/install/bin
Performing Consistency Checks
-
Checking cluster versions   ok
Checking database user is the install user  ok
Checking database connection settings   fatal

All non-template0 databases must allow connections, i.e. their
pg_database.datallowconn must be true. Your installation contains
non-template0 databases with their pg_database.datallowconn set to
false. Consider allowing connection for all non-template0 databases
using:
UPDATE pg_catalog.pg_database SET datallowconn='true' WHERE datname NOT
LIKE 'template0';
A list of databases with the problem is given in the file:
databases_with_datallowconn_false.txt

Failure, exiting

$ cat databases_with_datallowconn_false.txt
mydb
mydb1
mydb2
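As a side note, the same list can be produced by hand before running
pg_upgrade with a query along these lines (just a sketch, not part of the
patch):

SELECT datname
FROM pg_catalog.pg_database
WHERE NOT datallowconn AND datname <> 'template0';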


Regards,
Jeevan Ladhe


v2-0001-Improve-the-pg_upgrade-error-message.patch
Description: Binary data


Re: [PATCH] Use optimized single-datum tuplesort in ExecSort

2021-07-13 Thread Ranier Vilela
On Tue, Jul 13, 2021 at 09:24, David Rowley wrote:

> On Wed, 14 Jul 2021 at 00:06, Ranier Vilela  wrote:
> >
> > On Tue, Jul 13, 2021 at 04:19, Ronan Dunklau <ronan.dunk...@aiven.io> wrote:
> >> I would be
> >> surprised the check adds that much to the whole execution though.
> >
> > I think this branch is a misprediction.
>
> It could be.  I wondered that myself when I saw Ronan's results were
> better than mine for 2,4 and 7.  However, I think Ronan had quite a
> bit of noise in his results as there's no reason for the speedup in
> tests 2,4 and 7.


> > In most cases is it not datumSort?
>
> who knows.  Maybe someone's workload always requires the datum sort.
>
> > That's why I would like to use unlikely.
>
> We really only use unlikely() in cases where we want to move code out
> of line to a cold area because it's really never executed under normal
> circumstances. We tend to do that for ERROR cases as we don't ever
> really want to optimise for errors. We also sometimes do it when some
> function has a branch to initialise something during the first call.
> The case in question here does not fit for either of those two cases.
>
Hum, I understand the usage cases now.
Thanks for the hint.


>
> > IMO all the tests should all be to verify past behavior first.
>
> I'm not quite sure what you mean there.
>
I'm saying we could help the branch by keeping the same testing logic as
before and not reversing it.
Attached is a version to demonstrate this; I don't claim it to be v7.

I couldn't find a good phrase to the contrary:
"are we *not* using the single value optimization ?"

I don't have time to run the tests right now.

regards,
Ranier Vilela


allow_single_datum_sort.patch
Description: Binary data


Re: [PATCH] Use optimized single-datum tuplesort in ExecSort

2021-07-13 Thread David Rowley
On Wed, 14 Jul 2021 at 00:06, Ranier Vilela  wrote:
>
> On Tue, Jul 13, 2021 at 04:19, Ronan Dunklau wrote:
>> I would be
>> surprised the check adds that much to the whole execution though.
>
> I think this branch is a misprediction.

It could be.  I wondered that myself when I saw Ronan's results were
better than mine for 2,4 and 7.  However, I think Ronan had quite a
bit of noise in his results as there's no reason for the speedup in
tests 2,4 and 7.

> In most cases is it not datumSort?

who knows.  Maybe someone's workload always requires the datum sort.

> That's why I would like to use unlikely.

We really only use unlikely() in cases where we want to move code out
of line to a cold area because it's really never executed under normal
circumstances. We tend to do that for ERROR cases as we don't ever
really want to optimise for errors. We also sometimes do it when some
function has a branch to initialise something during the first call.
The case in question here does not fit for either of those two cases.

> IMO all the tests should all be to verify past behavior first.

I'm not quite sure what you mean there.

David




Re: unnesting multirange data types

2021-07-13 Thread Alexander Korotkov
On Sun, Jul 11, 2021 at 1:20 AM Justin Pryzby  wrote:
> On Sun, Jul 11, 2021 at 01:00:27AM +0300, Alexander Korotkov wrote:
> > On Sat, Jul 10, 2021 at 7:34 PM Alvaro Herrera  
> > wrote:
> > > On 2021-Jun-27, Alexander Korotkov wrote:
> > >
> > > > BTW, I found some small inconsistencies in the declaration of
> > > > multirange operators in the system catalog.  Nothing critical, but if
> > > > we decide to bump catversion in beta3, this patch is also nice to
> > > > push.
> > >
> > > Hmm, I think you should push this and not bump catversion.  That way,
> > > nobody is forced to initdb if we end up not having a catversion bump for
> > > some other reason; but also anybody who initdb's with beta3 or later
> > > will get the correct descriptions.
> > >
> > > If you don't push it, everybody will have the wrong descriptions.
> >
> > True, but I'm a bit uncomfortable about user instances with different
> > catalogs but the same catversions.  On the other hand, initdb's with
> > beta3 or later will be the vast majority among pg14 instances.
> >
> > Did we have similar precedents in the past?
>
> It seems so.
>
> Note in particular 74ab96a45, which adds a new function with no bump.
> Although that one may not be a good precedent to follow, or one that's been
> followed recently.

Justin, thank you very much for the summary.

Given we have similar precedents in the past, I'm going to push the
patch [1] to master and pg14 if no objections.

Links
1. 
https://www.postgresql.org/message-id/CAPpHfdv9OZEuZDqOQoUKpXhq%3Dmc-qa4gKCPmcgG5Vvesu7%3Ds1w%40mail.gmail.com

--
Regards,
Alexander Korotkov




Re: a misbehavior of partition row movement (?)

2021-07-13 Thread Amit Langote
Hi Ibrar, Sawada-san,

On Tue, Jul 13, 2021 at 20:25 Ibrar Ahmed  wrote:

>
>
> On Fri, Apr 2, 2021 at 6:09 PM Amit Langote 
> wrote:
>
>> On Thu, Apr 1, 2021 at 10:56 AM Masahiko Sawada 
>> wrote:
>> > On Tue, Mar 23, 2021 at 6:27 PM Amit Langote 
>> wrote:
>> > > Actually, I found a big hole in my assumptions around deferrable
>> > > foreign constraints, invalidating the approach I took in 0002 to use a
>> > > query-lifetime tuplestore to record root parent tuples.  I'm trying to
>> > > find a way to make the tuplestore transaction-lifetime so that the
>> > > patch still works.
>> > >
>> > > In the meantime, I'm attaching an updated set with 0001 changed per
>> > > your comments.
>> >
>> > 0001 patch conflicts with 71f4c8c6f74. Could you please rebase the
>> patchset?
>>
>> Thanks for the heads up.
>>
>> I still don't have a working patch to address the above mentioned
>> shortcoming of the previous approach, but here is a rebased version in
>> the meantime.
>>
>>
>> --
>> Amit Langote
>> EDB: http://www.enterprisedb.com
>>
>
>
> @Amit, the patch is not applying successfully; can you please rebase it?
>

Thanks for the reminder.

> Masahiko Sawada, it's been a while since you reviewed the patch; are you
> still interested in reviewing it?
>

Unfortunately, I don’t think I’ll have time in this CF to solve some very
fundamental issues I found in the patch during the last cycle.  I’m fine
with either marking this as RwF for now or moving it to the next CF.

> --
Amit Langote
EDB: http://www.enterprisedb.com


Re: [PATCH] Use optimized single-datum tuplesort in ExecSort

2021-07-13 Thread Ranier Vilela
On Tue, Jul 13, 2021 at 04:19, Ronan Dunklau wrote:

> > I've now pushed that bug fix so it's fine to remove the change to
>
>

> I would be
> surprised the check adds that much to the whole execution though.
>
I think this branch is a misprediction.
In most cases is it not datumSort?
That's why I would like to use unlikely.

IMO all the tests should all be to verify past behavior first.

regards,
Ranier Vilela


Re: psql \copy from sends a lot of packets

2021-07-13 Thread Aleksander Alekseev
The following review has been posted through the commitfest application:
make installcheck-world:  tested, passed
Implements feature:   tested, passed
Spec compliant:   tested, passed
Documentation:tested, passed

The patch was marked as needing review and doesn't currently have
a reviewer, so I decided to take a look. The patch was tested on macOS against
master `e0271d5f`. It works fine and doesn't seem to contradict the current
documentation.

The future COPY TO patch may require some changes in the docs, as Tom pointed
out. I also wonder whether it may affect any 3rd-party applications and whether
we care about that, but I suggest we discuss this if and when a corresponding
patch is proposed.

The new status of this patch is: Ready for Committer


Re: Add proper planner support for ORDER BY / DISTINCT aggregates

2021-07-13 Thread Ranier Vilela
On Tue, Jul 13, 2021 at 01:44, David Rowley wrote:

> Thanks for having a look at this.
>
> On Tue, 13 Jul 2021 at 11:04, Ranier Vilela  wrote:
> >> 0001 Adds planner support for ORDER BY aggregates.
> >
> > /* Normal transition function without ORDER BY / DISTINCT. */
> > Is it possible to avoid entering to initialize args if 'argno >=
> pertrans->numTransInputs'?
> > Like this:
> >
> > if (!pertrans->aggsortrequired && argno < pertrans->numTransInputs)
> >
> > And if argos is '>' that pertrans->numTransInputs?
> > The test shouldn't be, inside the loop?
> >
> > + /*
> > + * Don't initialize args for any ORDER BY clause that might
> > + * exist in a presorted aggregate.
> > + */
> > + if (argno >= pertrans->numTransInputs)
> > + break;
>
> The idea is to stop the loop before processing any Aggref arguments
> that might belong to the ORDER BY clause.

Yes, I get the idea.

> We must still process other
> arguments up to the ORDER BY args though,

I may have misunderstood, but aren't the other arguments below
pertrans->numTransInputs?


> so we can't skip this loop.
>
The question that remains unanswered is whether *argno* can be '>='
pertrans->numTransInputs before entering the loop.
If it *can*, the loop might be useless in that case.


>
> Note that we're doing argno++ inside the loop.

Another question: if *argno* can be '>' pertrans->numTransInputs
before the loop, will this test fail?
if (argno == pertrans->numTransInputs)


>
> > I think that or can reduce the scope of variable 'sortlist' or simply
> remove it?
>
> I've adjusted the scope of this.  I didn't want to remove it because
> it's kinda useful to have it that way otherwise the 0002 patch would
> need to add it.
>
Nice.


> >> 0002 is a WIP patch for DISTINCT support.  This still lacks JIT
> >> support and I'm still not certain of the best where to store the
> >> previous value or tuple to determine if the current one is distinct
> >> from it.
> >
> > In the patch 0002, I think that can reduce the scope of variable
> 'aggstate'?
> >
> > + EEO_CASE(EEOP_AGG_PRESORTED_DISTINCT_SINGLE)
>
> Yeah, that could be done.
>
> I've attached the updated patches.
>
Thanks.

regards,
Ranier Vilela


Re: POC: GROUP BY optimization

2021-07-13 Thread Ibrar Ahmed
On Wed, Mar 10, 2021 at 4:05 AM Tomas Vondra 
wrote:

> Hi,
>
> I take a look at the patch today. Attached is the v12 and a separate
> patch with some comment tweaks and review comments.
>
>
> 1) I see the patch results in some plan changes in postgres_fdw. I
> assume it's somehow related to the sort costing changes, but I find it a
> bit suspicious. Why should the plan change from this:
>
>   Foreign Scan
> Remote SQL: ... ORDER BY ... OFFSET 100 LIMIT 10;
>
> to
>
>   Limit
> Foreign Scan
>   Remote SQL: ... ORDER BY ...
>
> But even if this is due to changes to order by costing, postponing the
> limit seems like a net loss - the sort happens before the limit, and by
> postponing the limit after foreign scan we need to transfer 100 more
> rows. So what's causing this?
>
> The difference in plan cost seems fairly substantial (old: 196.52, new:
> 241.94). But maybe that's OK and expected.
>
>
> 2) I wonder how much more expensive (in terms of CPU) is the new sort
> costing code. It's iterative, and it's calling functions to calculate
> number of groups etc. I wonder what's the impact simple queries?
>
>
> 3) It's not clear to me why we need "fake var" protection? Why would the
> estimate_num_groups() fail for that?
>
>
> 4) In one of the places a comment says DEFAULT_NUM_DISTINCT is not used
> because we're dealing with multiple columns. That makes sense, but maybe
> we should use DEFAULT_NUM_DISTINCT at least to limit the range. That is,
> with N columns we should restrict the nGroups estimate by:
>
> min = DEFAULT_NUM_DISTINCT
> max = Min(pow(DEFAULT_NUM_DISTINCT, ncolumns), tuples)
>
> Also, it's not clear what's the logic behind the formula:
>
> nGroups = ceil(2.0 + sqrt(tuples) *
>list_length(pathkeyExprs) / list_length(pathkeys));
>
> A comment explaining that would be helpful.
>
>
> 5) The compute_cpu_sort_cost comment includes two references (in a quite
> mixed-up way), and then there's another reference in the code. I suggest
> we list all of them in the function comment.
>
>
> 6) There's a bunch of magic values (various multipliers etc.). It's not
> clear which values are meant to be equal, etc. Let's introduce nicer
> constants with reasonable names.
>
>
> 7) The comment at get_cheapest_group_keys_order is a bit misleading,
> because it talks about "uniqueness" - but that's not what we're looking
> at here (I mean, is the column unique or not). We're looking at number
> of distinct values in the column, which is a different thing. Also, it'd
> be good to roughly explain what get_cheapest_group_keys_order does - how
> it calculates the order (permutations up to 4 values, etc.).
>
>
> 8) The comment at compute_cpu_sort_cost should also explain basics of
> the algorithm. I tried doing something like that, but I'm not sure I got
> all the details right and it probably needs further improvements.
>
>
> 9) The main concern I have is still about the changes in planner.c, and
> I think I've already shared it before. The code takes the grouping
> pathkeys, and just reshuffles them to what it believes is a better order
> (cheaper, ...). That is, there's on input pathkey list, and it gets
> transformed into another pathkey list. The problem is that even if this
> optimizes the sort, it may work against some optimization in the upper
> part of the plan.
>
> Imagine a query that does something like this:
>
>SELECT a, b, count(*) FROM (
>   SELECT DISTINCT a, b, c, d FROM x
>) GROUP BY a, b;
>
> Now, from the costing perspective, it may be cheaper to do the inner
> grouping by "c, d, a, b" for example. But that works directly against
> the second grouping, which would benefit from having the input sorted by
> "a, b". How exactly would this behaves depends on the number of distinct
> values in the columns, how expensive the comparisons are for each
> column, and so on. But you get the idea.
>
> I admit I haven't managed to construct a reasonably query that'd be
> significantly harmed by this, but I haven't spent a lot of time on it.
>
> I'm pretty sure I used this trick in the past, when doing aggregations
> on large data sets, where it was much better to "pipe" the data through
> multiple GroupAggregate nodes.
>
> For this reason I think the code generating paths should work a bit like
> get_useful_pathkeys_for_relation and generate_useful_gather_paths.
>
> * group_keys_reorder_by_pathkeys should not produce just one list of
>   pathkeys, but multiple interesting options. At the moment it should
>   produce at least the input grouping pathkeys (as specified in the
>   query), and the "optimal" pathkeys (by lowest cost).
>
> * the places calling group_keys_reorder_by_pathkeys should loop on the
>   result, and generate separate path for each option.
>
> I'd guess in the future we'll "peek forward" in the plan and see if
> there are other interesting pathkeys (that's the expectation for
> get_useful_pathkeys_for_relation).
>
>
> regards
>
> --
> Tomas Vondra
> 

Re: a misbehavior of partition row movement (?)

2021-07-13 Thread Ibrar Ahmed
On Fri, Apr 2, 2021 at 6:09 PM Amit Langote  wrote:

> On Thu, Apr 1, 2021 at 10:56 AM Masahiko Sawada 
> wrote:
> > On Tue, Mar 23, 2021 at 6:27 PM Amit Langote 
> wrote:
> > > Actually, I found a big hole in my assumptions around deferrable
> > > foreign constraints, invalidating the approach I took in 0002 to use a
> > > query-lifetime tuplestore to record root parent tuples.  I'm trying to
> > > find a way to make the tuplestore transaction-lifetime so that the
> > > patch still works.
> > >
> > > In the meantime, I'm attaching an updated set with 0001 changed per
> > > your comments.
> >
> > 0001 patch conflicts with 71f4c8c6f74. Could you please rebase the
> patchset?
>
> Thanks for the heads up.
>
> I still don't have a working patch to address the above mentioned
> shortcoming of the previous approach, but here is a rebased version in
> the meantime.
>
>
> --
> Amit Langote
> EDB: http://www.enterprisedb.com
>


@Amit, the patch is not applying successfully; can you please rebase it?

Masahiko Sawada, it's been a while since you reviewed the patch; are you
still interested in reviewing it?

-- 
Ibrar Ahmed


Re: Transactions involving multiple postgres foreign servers, take 2

2021-07-13 Thread Ranier Vilela
On Tue, Jul 13, 2021 at 01:14, r.takahash...@fujitsu.com <r.takahash...@fujitsu.com> wrote:

> Hi Sawada-san,
>
>
> Thank you for your reply.
>
> > Not sure but it might be possible to keep holding an xlogreader for
> > reading PREPARE WAL records even after the transaction commit. But I
> > wonder how much open() for wal segment file accounts for the total
> > execution time of 2PC. 2PC requires 2 network round trips for each
> > participant. For example, if it took 500ms in total, we would not get
> > benefits much from the point of view of 2PC performance even if we
> > improved it from 14ms to 1ms.
>
> I made the patch based on your advice and re-run the test on the new
> machine.
> (The attached patch is just for test purpose.)
>
Wouldn't it be better to explicitly initialize the pointer with NULL?
I think it's common in Postgres.

static XLogReaderState *xlogreader = NULL;


>
> * foreign_twophase_commit = disabled
> 2686tps
>
> * foreign_twophase_commit = required (It is necessary to set -R ${RATE} as
> Ikeda-san said)
> 311tps
>
> * foreign_twophase_commit = required with attached patch (It is not
> necessary to set -R ${RATE})
> 2057tps
>
Nice results.

regards,
Ranier Vilela


Re: Introduce pg_receivewal gzip compression tests

2021-07-13 Thread gkokolatos



‐‐‐ Original Message ‐‐‐

On Tuesday, July 13th, 2021 at 12:26, Michael Paquier  
wrote:

> On Tue, Jul 13, 2021 at 08:28:44AM +, gkokola...@pm.me wrote:
> > Sounds great. Let me cook up v6 for this.
>
> Thanks. Could you use v5 I posted upthread as a base? There were
> some improvements in the variable names, the comments and the test
> descriptions.

Agreed. For the record that is why I said v6 :)

Cheers,
//Georgios

> -
> Michael




Re: BUG #16583: merge join on tables with different DB collation behind postgres_fdw fails

2021-07-13 Thread Ibrar Ahmed
On Wed, Mar 3, 2021 at 1:42 PM Neil Chen 
wrote:

> The following review has been posted through the commitfest application:
> make installcheck-world:  tested, passed
> Implements feature:   tested, passed
> Spec compliant:   not tested
> Documentation:tested, passed
>
> Greetings,
> I learned about the patch and read your discussions. I'm not sure why this
> patch is not being discussed at the moment. In short, I think it's beneficial
> to submit it as a temporary solution.
> Another thing I want to know is whether these codes can be simplified:
> -   if (state > outer_cxt->state)
> +   if (collation == outer_cxt->collation &&
> +   ((state == FDW_COLLATE_UNSAFE &&
> + outer_cxt->state == FDW_COLLATE_SAFE) ||
> +(state == FDW_COLLATE_SAFE &&
> + outer_cxt->state == FDW_COLLATE_UNSAFE)))
> +   {
> +   outer_cxt->state = FDW_COLLATE_SAFE;
> +   }
> +   else if (state > outer_cxt->state)
>
> If the state is determined by the collation, then when the collations are
> equal, do we just need to check that the state is not FDW_COLLATE_NONE?


The patch is failing the regression tests; @Tom Lane, can you please take a
look at that?

https://cirrus-ci.com/task/4593497492684800

== running regression test queries ==
test postgres_fdw ... FAILED 2782 ms
== shutting down postmaster ==
==
1 of 1 tests failed.
==
The differences that caused some tests to fail can be viewed in the
file "/tmp/cirrus-ci-build/contrib/postgres_fdw/regression.diffs". A copy
of the test summary that you see
above is saved in the file
"/tmp/cirrus-ci-build/contrib/postgres_fdw/regression.out".


-- 
Ibrar Ahmed


Re: row filtering for logical replication

2021-07-13 Thread Amit Kapila
On Tue, Jul 13, 2021 at 10:24 AM Amit Kapila  wrote:
>
> On Mon, Jul 12, 2021 at 3:01 PM Tomas Vondra
>  wrote:
>
> > In terms of implementation, I think there are two basic options - either
> > we can define a new "expression" type in gram.y, which would be a subset
> > of a_expr etc. Or we can do it as some sort of expression walker, kinda
> > like what the transform* functions do now.
> >
>
> I think it is better to use some form of walker here rather than
> extending the grammar for this. However, the question is do we need
> some special kind of expression walker here or can we handle all
> required cases via transformWhereClause() call as the patch is trying
> to do. AFAIU, the main things we want to prohibit in the filter are:
> (a) it doesn't refer to any relation other than catalog in where
> clause, (b) it doesn't use UDFs in any way (in expressions, in
> user-defined operators, user-defined types, etc.), (c) the columns
> referred to in the filter should be part of PK or Replica Identity.
> Now, if all such things can be detected by the approach patch has
> taken then why do we need a special kind of expression walker? OTOH,
> if we can't detect some of this then probably we can use a special
> walker.
>
> I think in the long run one idea to allow UDFs is probably by
> explicitly allowing users to specify whether the function is
> publication predicate safe and if so, then we can allow such functions
> in the filter clause.
>
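
To make the "special kind of expression walker" idea above a bit more
concrete, here is a minimal hypothetical sketch (not from any posted patch; it
only handles the FuncExpr case, and uses FirstNormalObjectId as a crude
built-in vs. user-defined boundary):

#include "postgres.h"

#include "access/transam.h"
#include "nodes/nodeFuncs.h"
#include "nodes/primnodes.h"

/* Hypothetical: reject user-defined functions in a row-filter expression. */
static bool
rowfilter_walker(Node *node, void *context)
{
	if (node == NULL)
		return false;

	if (IsA(node, FuncExpr))
	{
		FuncExpr   *fexpr = (FuncExpr *) node;

		if (fexpr->funcid >= FirstNormalObjectId)
			ereport(ERROR,
					(errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
					 errmsg("user-defined functions are not allowed in row filter expressions")));
	}

	/* recurse into child nodes */
	return expression_tree_walker(node, rowfilter_walker, context);
}

The same walker could similarly inspect OpExpr (via its underlying function
OID), casts, and Vars (given relation info in the context) to cover the other
restrictions (a) and (c).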

Another idea here could be to read the publication-related catalog
with the latest snapshot instead of a historic snapshot. If we do that
then if the user faces problems as described by Petr [1] due to
missing dependencies via UDFs then she can Alter the Publication to
remove/change the filter clause and after that, we would be able to
recognize the updated filter clause and the system will be able to
move forward.

I might be missing something, but reading publication catalogs with
non-historic snapshots shouldn't create problems, as historic snapshots
are required only for decoding WAL.

I think the problem described by Petr [1] is also possible today if the user
drops the publication while there is a corresponding subscription; basically,
the system will be stuck with the error: ERROR:  publication "mypub" does not
exist. I think allowing the use of non-historic snapshots just for
publications would resolve that problem as well.

[1] - 
https://www.postgresql.org/message-id/92e5587d-28b8-5849-2374-5ca3863256f1%402ndquadrant.com

-- 
With Regards,
Amit Kapila.




Re: Enhanced error message to include hint messages for redundant options error

2021-07-13 Thread Dean Rasheed
On Mon, 12 Jul 2021 at 17:39, vignesh C  wrote:
>
> Thanks for your comments, I have made the changes for the same in the
> V10 patch attached.
> Thoughts?
>

I'm still not happy about changing so many error messages.

Some of the changes might be OK, but aren't strictly necessary. For example:

 COPY x from stdin (force_not_null (a), force_not_null (b));
-ERROR:  conflicting or redundant options
+ERROR:  option "force_not_null" specified more than once
 LINE 1: COPY x from stdin (force_not_null (a), force_not_null (b));
^

I actually prefer the original primary error message, for consistency
with other similar cases, and I think the error position indicator is
sufficient to identify the problem. If it were to include the
"specified more than once" text, I would put that in DETAIL.

Other changes are wrong though. For example:

 COPY x from stdin (format CSV, FORMAT CSV);
-ERROR:  conflicting or redundant options
+ERROR:  redundant options specified
 LINE 1: COPY x from stdin (format CSV, FORMAT CSV);
^

The problem here is that the code that throws this error throws the
same error if the second format is different, which would make it a
conflicting option, not a redundant one. And I don't think we should
add more code to test whether it's conflicting or redundant, so again,
I think we should just keep the original error message.

Similarly, this error is wrong:

CREATE OR REPLACE FUNCTION foo() RETURNS int AS $$ SELECT 1 $$ STABLE IMMUTABLE;
ERROR:  redundant options specified
LINE 1: ...NCTION foo() RETURNS int AS $$ SELECT 1 $$ STABLE IMMUTABLE;
 ^

And even this error:

CREATE OR REPLACE FUNCTION foo() RETURNS int AS $$ SELECT 1 $$ STRICT STRICT;
ERROR:  redundant options specified
LINE 1: ... FUNCTION foo() RETURNS int AS $$ SELECT 1 $$ STRICT STRICT;
^

which looks OK, is actually problematic because the same code also
handles the alternate syntax, which leads to this (which is now wrong
because it's conflicting not redundant):

CREATE OR REPLACE FUNCTION foo() RETURNS int AS $$ SELECT 1 $$ STRICT
CALLED ON NULL INPUT;
ERROR:  redundant options specified
LINE 1: ...NCTION foo() RETURNS int AS $$ SELECT 1 $$ STRICT CALLED ON ...
 ^

The problem is it's actually quite hard to decide in each case whether
the option is redundant or conflicting. Sometimes, it might look
obvious in the code, but actually be much more subtle, due to an
earlier transformation of the grammar. Likewise redundant doesn't
necessarily mean literally specified more than once.

Also, most of these don't have regression test cases, and I'm very
reluctant to change them without proper testing, and that would make
the patch much bigger. To me, this patch is already attempting to
change too much in one go, which is causing problems.

So I suggest a more incremental approach, starting by keeping the
original error message, but improving it where possible with the error
position. Then maybe move on to look at specific cases that can be
further improved with additional detail (keeping the same primary
error message, for consistency).

Here is an updated version, following that approach. It does the following:

1). Keeps the same primary error message ("conflicting or redundant
options") in all cases.

2). Uses errorConflictingDefElem() to throw it, to ensure consistency
and reduce the executable size (a rough sketch of what such a helper
might look like follows below).

3). Includes your enhancements to make the ParseState available in
more places, so that the error position indicator is shown to indicate
the cause of the error.

IMO, this makes for a much safer incremental change, that is more committable.
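
For reference, a sketch of what such a shared helper might look like (the
actual function in the patch may differ in details):

/* Sketch only; the patch's version may differ. */
void
errorConflictingDefElem(DefElem *defel, ParseState *pstate)
{
	ereport(ERROR,
			(errcode(ERRCODE_SYNTAX_ERROR),
			 errmsg("conflicting or redundant options"),
			 parser_errposition(pstate, defel->location)));
}

Callers that have a ParseState get the error position indicator for free;
callers that don't can pass NULL, in which case parser_errposition() emits no
position.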

As it turns out, there are 110 cases of this error that now use
errorConflictingDefElem(), and of those, just 10 (in 3 functions)
don't have a ParseState readily available to them:

- ATExecSetIdentity()
- parse_output_parameters() x5
- parseCreateReplSlotOptions() x4

It might be possible to improve those (and possibly some of the others
too) by adding some appropriate DETAIL to the error, but as I said, I
suggest doing that in a separate follow-on patch, and only with
careful analysis and testing of each case.

As it stands, the improvements from (3) seem quite worthwhile. Also,
the patch saves a couple of hundred lines of code, and for me an
optimised executable is around 30 kB smaller, which is more than I
expected.

Thoughts?

Regards,
Dean
diff --git a/src/backend/catalog/aclchk.c b/src/backend/catalog/aclchk.c
new file mode 100644
index 5339241..89792b1
--- a/src/backend/catalog/aclchk.c
+++ b/src/backend/catalog/aclchk.c
@@ -59,6 +59,7 @@
 #include "catalog/pg_ts_template.h"
 #include "catalog/pg_type.h"
 #include "commands/dbcommands.h"
+#include "commands/defrem.h"
 #include "commands/event_trigger.h"
 #include 

Re: Remove repeated calls to PQserverVersion

2021-07-13 Thread Michael Paquier
On Tue, Jul 13, 2021 at 07:02:27PM +1000, Peter Smith wrote:
> I found a few functions making unnecessary repeated calls to
> PQserverVersion(conn); instead of just calling once and assigning to a
> local variable.

Does it really matter?  PQserverVersion() does a simple lookup at the
internals of PGconn.
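
For reference, the function is (roughly, from memory) just a couple of sanity
checks plus a field access:

int
PQserverVersion(const PGconn *conn)
{
	if (!conn)
		return 0;
	if (conn->status == CONNECTION_BAD)
		return 0;
	return conn->sversion;
}

so hoisting it into a local variable is mostly a readability question rather
than a performance one.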
--
Michael




Re: [PATCH] Pull general SASL framework out of SCRAM

2021-07-13 Thread Michael Paquier
On Tue, Jul 13, 2021 at 12:41:27PM +0300, Mikhail Kulagin wrote:
> I got an error while building one of the extensions.
> /home/mkulagin/pg-install/postgresql-master/include/internal/libpq-int.h:44:10:
>  fatal error: fe-auth-sasl.h: No such file or directory
>   #include "fe-auth-sasl.h"
>   ^~~~

Right.  I overlooked the fact that libpq-int.h is installed.

> I think the new fe-auth-sasl.h file should be installed too.
> A proposed correction is in the attached file (but I'm not sure that the
> fix to Install.pm is correct).

That looks correct to me.  I'll check that tomorrow.
--
Michael




Re: DROP relation IF EXISTS Docs and Tests - Bug Fix

2021-07-13 Thread Ibrar Ahmed
On Tue, Mar 9, 2021 at 9:01 PM David Steele  wrote:

> On 3/9/21 10:08 AM, David G. Johnston wrote:
> >
> > On Tuesday, March 9, 2021, David Steele wrote:
> >
> > Further, I think we should close this entry at the end of the CF if
> > it does not attract committer interest. Tom is not in favor of the
> > patch and it appears Alexander decided not to commit it.
> >
> > Pavel re-reviewed it and was fine with ready-to-commit so that status
> > seems fine.
>
> Ah yes, that was my mistake.
>
> Regards,
> --
> -David
> da...@pgmasters.net
>
>
>
The status of the patch is "Needs Review", which was previously "Ready for
Committer". After @David G. Johnston's and @David Steele's comments, it's not
clear whether it should be "Ready for Committer" or "Needs Review".

-- 
Ibrar Ahmed


Re: Introduce pg_receivewal gzip compression tests

2021-07-13 Thread Michael Paquier
On Tue, Jul 13, 2021 at 08:28:44AM +, gkokola...@pm.me wrote:
> Sounds great. Let me cook up v6 for this.

Thanks.  Could you use v5 I posted upthread as a base?  There were
some improvements in the variable names, the comments and the test
descriptions.
--
Michael




Re: Parallel scan with SubTransGetTopmostTransaction assert coredump

2021-07-13 Thread Pavel Borisov
>
> > So yeah, I think this is due to confusion with two snapshots and
> > failing
> > to consider both of them when calculating TransactionXmin.
> >
> > But I think removing one of the snapshots (as the v2 does it) is rather
> > strange too. I very much doubt having both the transaction and active
> > snapshots in the parallel worker is not intentional, and Pavel may very
> > well be right this breaks isolation levels that use the xact snapshot
> > (i.e. REPEATABLE READ and SERIALIZABLE). I haven't checked, though.
> >
> > So I think we need to keep both snapshots, but fix TransactionXmin to
> > consider both of them - I suppose ParallelWorkerMain could simply look
> > at the two snapshots, and use the minimum. Which is what [1] (already
> > linked by Pavel) talks about, although that's related to concerns about
> > one of the processes dying, which is not what's happening here.
> >
> >
> > I'm wondering what consequences this may have on production systems,
> > though. We've only seen this failing because of the assert, so what
> > happens when the build has asserts disabled?

> > Looking at SubTransGetTopmostTransaction, it seems the while loop ends
> > immediately (because it's pretty much the opposite of the assert), so we
> > just return the "xid" as topmost XID. But will that produce incorrect
> > query results, or what?
>
I haven't seen anything incorrect on production systems with asserts turned
off, but I should note that this assertion is not immediately reproducible, so
it is not that easy to catch possible logical inconsistencies in parallel scan
results. As I've noticed, the subxids cache is used in most cases, so even
with asserts this failure is rare in parallel scans. Maybe that is why we
lived with this bug unnoticed for so long.

> https://www.postgresql.org/message-id/20150208002027.GH9201%40alap3.anarazel.de
>
>
> PFA v4 patch based on the ideas above.
>
> In principle I see little difference with v3. But I agree it is more
> correct.
>
> I did test this patch on Linux and MacOS using testing algo above and
> got no error. On master branch before the patch I still see this error.
>

I've tested the patch and the error doesn't appear anymore.

--
Best regards,
Pavel Borisov

Postgres Professional: http://postgrespro.com 


Re: Fix comments of heap_prune_chain()

2021-07-13 Thread Masahiro Ikeda



On 2021/07/13 10:22, Kyotaro Horiguchi wrote:
> (This is out of topic)
> 
> At Mon, 12 Jul 2021 20:17:55 -0400, Alvaro Herrera  
> wrote in 
>> Oh, apologies, I didn't realize there was an attachment.  That seems
>> specific enough :-)
>>
>> In my defense, the archives don't show the attachment either:
>> https://www.postgresql.org/message-id/5CB29811-2B1D-4244-8DE2-B1E02495426B%40oss.nttdata.com
>> I think we've seen this kind of problem before -- the MIME structure of
>> the message is quite unusual, which is why neither my MUA nor the
>> archives show it.
> 
> The same for me. The multipart structure of that mail looks odd.
> 
> multipart/alternative
>   text/plain                 - mail body (quoted-printable)
>   multipart/mixed
>     text/html                - HTML alternative body
>     application/octet-stream - the patch
>     text/html                - garbage
> 
> 
> I found an issue in bugzilla about this behavior
> 
> https://bugzilla.mozilla.org/show_bug.cgi?id=1362539
> 
> The primary issue is Apple-mail's strange mime-composition.
> I'm not sure whether it is avoidable by some settings.
> 
> (I don't think the alternative HTML body is useful, at least for this
>  mailing list.)

Thanks for replying, and sorry for the above.
The reason is that I sent it from a MacBook, as Horiguchi-san said.

I have changed my email client and confirmed that I could send an
email with a new patch. So, please check it:
https://www.postgresql.org/message-id/1aa07e2a-b715-5649-6c62-4fff96304d18%40oss.nttdata.com

Regards,
-- 
Masahiro Ikeda
NTT DATA CORPORATION




Re: Parallel scan with SubTransGetTopmostTransaction assert coredump

2021-07-13 Thread Maxim Orlov

On 2021-07-09 20:36, Tomas Vondra wrote:

Hi,

I took a quick look on this - I'm no expert in the details of snapshots,
so take my comments with a grain of salt.

AFAICS both Greg Nancarrow and Pavel Borisov are kinda right. I think
Greg is right the v3 patch does not seem like the right (or correct)
solution, for a couple reasons:


1) It fixes the symptoms, not the cause. If we set TransactionXmin to a
bogus value, this only fixes it locally in SubTransGetTopmostTransaction,
but I'd be surprised if other places did not have the same issue.


2) The xid/TransactionXmin comparison in the v2 fix:

  xid = xid > TransactionXmin ? xid : TransactionXmin;

seems broken, because it compares the XIDs directly, but that's not
going to work after generating enough transactions.


3) But even if this uses TransactionIdFollowsOrEquals, it seems very
strange because the "xid" comes from

  XidInMVCCSnapshot(HeapTupleHeaderGetRawXmin(tuple), snapshot))

i.e. it's the raw xmin from the tuple, so why should we be setting it to
TransactionXmin? That seems pretty strange, TBH.


So yeah, I think this is due to confusion with two snapshots and failing
to consider both of them when calculating TransactionXmin.

But I think removing one of the snapshots (as the v2 does it) is rather
strange too. I very much doubt having both the transaction and active
snapshots in the parallel worker is not intentional, and Pavel may very
well be right this breaks isolation levels that use the xact snapshot
(i.e. REPEATABLE READ and SERIALIZABLE). I haven't checked, though.

So I think we need to keep both snapshots, but fix TransactionXmin to
consider both of them - I suppose ParallelWorkerMain could simply look
at the two snapshots, and use the minimum. Which is what [1] (already
linked by Pavel) talks about, although that's related to concerns about
one of the processes dying, which is not what's happening here.


I'm wondering what consequences this may have on production systems,
though. We've only seen this failing because of the assert, so what
happens when the build has asserts disabled?

Looking at SubTransGetTopmostTransaction, it seems the while loop ends
immediately (because it's pretty much the opposite of the assert), so 
we

just return the "xid" as topmost XID. But will that produce incorrect
query results, or what?



regards


[1]
https://www.postgresql.org/message-id/20150208002027.GH9201%40alap3.anarazel.de


PFA v4 patch based on the ideas above.

In principle I see little difference with v3. But I agree it is more correct.

I did test this patch on Linux and MacOS using the testing algo above and
got no error. On the master branch before the patch I still see this error.


---
Best regards,
Maxim Orlov.

diff --git a/src/backend/access/transam/parallel.c b/src/backend/access/transam/parallel.c
index 3550ef13baa..6443d492501 100644
--- a/src/backend/access/transam/parallel.c
+++ b/src/backend/access/transam/parallel.c
@@ -34,6 +34,7 @@
 #include "pgstat.h"
 #include "storage/ipc.h"
 #include "storage/predicate.h"
+#include "storage/procarray.h"
 #include "storage/sinval.h"
 #include "storage/spin.h"
 #include "tcop/tcopprot.h"
@@ -45,6 +46,7 @@
 #include "utils/snapmgr.h"
 #include "utils/typcache.h"
 
+
 /*
  * We don't want to waste a lot of memory on an error queue which, most of
  * the time, will process only a handful of small messages.  However, it is
@@ -1261,6 +1263,7 @@ ParallelWorkerMain(Datum main_arg)
 	char	   *uncommittedenumsspace;
 	StringInfoData msgbuf;
 	char	   *session_dsm_handle_space;
+	Snapshot	asnap;
 
 	/* Set flag to indicate that we're initializing a parallel worker. */
 	InitializingParallelWorker = true;
@@ -1415,7 +1418,15 @@ ParallelWorkerMain(Datum main_arg)
 
 	/* Restore active snapshot. */
 	asnapspace = shm_toc_lookup(toc, PARALLEL_KEY_ACTIVE_SNAPSHOT, false);
-	PushActiveSnapshot(RestoreSnapshot(asnapspace));
+	asnap = RestoreSnapshot(asnapspace);
+	PushActiveSnapshot(asnap);
+
+	/*
+	 * We may have different xmins in active and transaction snapshots in
+	 * parallel workers. So, use minimum for TransactionXmin.
+	 */
+	if (TransactionIdFollows(TransactionXmin, asnap->xmin))
+		ProcArrayInstallRestoredXmin(asnap->xmin, fps->parallel_leader_pgproc);
 
 	/*
 	 * We've changed which tuples we can see, and must therefore invalidate


Re: Fix comments of heap_prune_chain()

2021-07-13 Thread Masahiro Ikeda


On 2021/07/13 5:57, Matthias van de Meent wrote:
> 
> 
> On Mon, 12 Jul 2021 at 13:14, Masahiro Ikeda wrote:
>>
>> Hi,
>>
>> While I’m reading source codes related to vacuum, I found comments which
>> don’t seem to fit the reality. I think the commit[1] just forgot to fix them.
>> What do you think?
> 
> Hmm, yes, those are indeed some leftovers.
> 
> Some comments on the suggested changes:
> 
> 
> - * caused by HeapTupleSatisfiesVacuum.  We just add entries to the arrays in
> + * caused by heap_prune_satisfies_vacuum.  We just add entries to the arrays 
> in
> 
> I think that HeapTupleSatisfiesVacuumHorizon might be an alternative correct
> replacement here.
> 
> 
> -                elog(ERROR, "unexpected HeapTupleSatisfiesVacuum result");
> +                elog(ERROR, "unexpected heap_prune_satisfies_vacuum result");
> 
> The type of the value is HTSV_Result; where HTSV stands for
> HeapTupleSatisfiesVacuum, so if we were to replace this, I'd go for
> "unexpected result from heap_prune_satisfies_vacuum" as a message instead.

Thanks for your comments. I agree with your suggestions.

I also updated prstate->vistest to heap_prune_satisfies_vacuum in the v1 patch,
because heap_prune_satisfies_vacuum() tests not only prstate->vistest but
also prstate->old_snap_xmin. I think that is a more accurate representation.

Regards,
-- 
Masahiro Ikeda
NTT DATA CORPORATION
diff --git a/src/backend/access/heap/pruneheap.c b/src/backend/access/heap/pruneheap.c
index 15ca1b304a..d2f4c30baa 100644
--- a/src/backend/access/heap/pruneheap.c
+++ b/src/backend/access/heap/pruneheap.c
@@ -488,17 +488,15 @@ heap_prune_satisfies_vacuum(PruneState *prstate, HeapTuple tup, Buffer buffer)
  * the HOT chain is pruned by removing all DEAD tuples at the start of the HOT
  * chain.  We also prune any RECENTLY_DEAD tuples preceding a DEAD tuple.
  * This is OK because a RECENTLY_DEAD tuple preceding a DEAD tuple is really
- * DEAD, the OldestXmin test is just too coarse to detect it.
+ * DEAD, heap_prune_satisfies_vacuum() is just too coarse to detect it.
  *
  * The root line pointer is redirected to the tuple immediately after the
  * latest DEAD tuple.  If all tuples in the chain are DEAD, the root line
  * pointer is marked LP_DEAD.  (This includes the case of a DEAD simple
  * tuple, which we treat as a chain of length 1.)
  *
- * OldestXmin is the cutoff XID used to identify dead tuples.
- *
  * We don't actually change the page here, except perhaps for hint-bit updates
- * caused by HeapTupleSatisfiesVacuum.  We just add entries to the arrays in
+ * caused by HeapTupleSatisfiesVacuumHorizon.  We just add entries to the arrays in
  * prstate showing the changes to be made.  Items to be redirected are added
  * to the redirected[] array (two entries per redirection); items to be set to
  * LP_DEAD state are added to nowdead[]; and items to be set to LP_UNUSED
@@ -680,7 +678,7 @@ heap_prune_chain(Buffer buffer, OffsetNumber rootoffnum, PruneState *prstate)
 break;
 
 			default:
-elog(ERROR, "unexpected HeapTupleSatisfiesVacuum result");
+elog(ERROR, "unexpected result from heap_prune_satisfies_vacuum");
 break;
 		}
 

