Re: Estimating HugePages Requirements?
On Sat, Aug 28, 2021 at 11:00:11AM +0900, Michael Paquier wrote:
> On Fri, Aug 27, 2021 at 08:16:40PM +, Bossart, Nathan wrote:
> > On 8/27/21, 12:39 PM, "Andres Freund" wrote:
> >> One thing I wonder is if this wouldn't better be dealt with in a more
> >> generic way. While this is the most problematic runtime-computed GUC,
> >> it's not the only one. What if we introduced a new shared_memory_size
> >> GUC, and made --describe-config output it? Perhaps adding
> >> --describe-config=guc-name?
> >>
> >> I also wonder if we should output the number of hugepages needed instead
> >> of the "raw" bytes of shared memory. The whole business about figuring
> >> out the huge page size, dividing the shared memory size by that and then
> >> rounding up could be removed in that case. Due to huge_page_size it's
> >> not even immediately obvious which huge page size one should use...
> >
> > I like both of these ideas.
>
> That pretty much looks like -C in concept, doesn't it? Except that you
> cannot get the actual total shared memory value, because we'd do this
> operation before loading shared_preload_libraries and so miss any amount
> asked for by extensions. There is a similar problem when attempting to do
> postgres -C data_checksums, for example, which would output an incorrect
> value even if the cluster has data checksums enabled.

Since we don't want to try to allocate the huge pages, and we also don't
want to compute based on shared_buffers alone, did anyone consider whether
pg_controldata is the right place to put this?  It already includes a lot
of related settings:

max_connections setting:          100
max_worker_processes setting:     8    (added in 2013: 6bc8ef0b7f1f1df3998745a66e1790e27424aa0c)
max_wal_senders setting:          10
max_prepared_xacts setting:       2
max_locks_per_xact setting:       64

I'm not sure if there's any reason these aren't also shown (?):

autovacuum_max_workers - added in 2007: e2a186b03
max_predicate_locks_per_xact - added in 2011: dafaa3efb
max_logical_replication_workers
max_replication_slots

-- 
Justin
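For reference, the computation Andres describes -- divide the shared memory
size by the huge page size and round up -- amounts to the following sketch
(not from any posted patch; both inputs are assumed to come from elsewhere,
e.g. the huge_page_size GUC or the Hugepagesize line in /proc/meminfo):

    /* Sketch only: number of huge pages needed to back shmem_bytes of
     * shared memory, given a huge page size in bytes.  Rounds up so a
     * partially used page is still counted. */
    static long
    huge_pages_needed(size_t shmem_bytes, size_t huge_page_size)
    {
        return (long) ((shmem_bytes + huge_page_size - 1) / huge_page_size);
    }

For example, 180 MB of shared memory with 2 MB huge pages yields 90 pages;
181 MB yields 91.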
Re: Remove Value node struct
On Wed, Aug 25, 2021 at 9:49 PM Robert Haas wrote:
>
> On Wed, Aug 25, 2021 at 9:20 AM Peter Eisentraut wrote:
> > This change removes the Value struct and node type and replaces them
> > with separate Integer, Float, String, and BitString node types that are
> > proper node types and structs of their own and behave mostly like
> > normal node types.
>
> +1. I noticed this years ago and never thought of doing anything about
> it. I'm glad you did think of it...

+1, it also bothered me in the past.
Re: Estimating HugePages Requirements?
On Fri, Aug 27, 2021 at 08:16:40PM +, Bossart, Nathan wrote:
> On 8/27/21, 12:39 PM, "Andres Freund" wrote:
>> One thing I wonder is if this wouldn't better be dealt with in a more
>> generic way. While this is the most problematic runtime-computed GUC,
>> it's not the only one. What if we introduced a new shared_memory_size
>> GUC, and made --describe-config output it? Perhaps adding
>> --describe-config=guc-name?
>>
>> I also wonder if we should output the number of hugepages needed instead
>> of the "raw" bytes of shared memory. The whole business about figuring
>> out the huge page size, dividing the shared memory size by that and then
>> rounding up could be removed in that case. Due to huge_page_size it's
>> not even immediately obvious which huge page size one should use...
>
> I like both of these ideas.

That pretty much looks like -C in concept, doesn't it? Except that you
cannot get the actual total shared memory value, because we'd do this
operation before loading shared_preload_libraries and so miss any amount
asked for by extensions. There is a similar problem when attempting to do
postgres -C data_checksums, for example, which would output an incorrect
value even if the cluster has data checksums enabled.

-- 
Michael
Re: [PATCH] Disable bgworkers during servers start in pg_upgrade
On Sat, Aug 28, 2021 at 9:40 AM Michael Paquier wrote:
>
> On Fri, Aug 27, 2021 at 12:28:42PM -0700, Andres Freund wrote:
> > Isn't that just going to end up with extension code erroring out and/or
> > blocking waiting for a bgworker to start?
>
> Well, that's the point: to block things during an upgrade. Do you have
> a list of requirements you'd like to see satisfied here? POWA would
> be fine with blocking the execution of bgworkers AFAIK (Julien, feel
> free to correct me here if necessary).

Yes, no problem at all, whether the bgworker isn't registered or is never
launched. The bgworker hasn't even been mandatory for a few years now, as
we introduced an external daemon to collect metrics on a distant database.
Re: [PATCH] Disable bgworkers during servers start in pg_upgrade
On Sat, Aug 28, 2021 at 3:28 AM Andres Freund wrote:
>
> > On Fri, Aug 27, 2021 at 7:31 AM Michael Paquier wrote:
> > >
> > > Wouldn't it be better to block things at the source, as of
> > > RegisterBackgroundWorker()? And that would keep track of the control
> > > we have on bgworkers in a single place. I also think that we'd better
> > > document something about that either in bgworker.sgml or pg_upgrade's
> > > page.
>
> Isn't that just going to end up with extension code erroring out and/or
> blocking waiting for a bgworker to start?

But there's no API to wait for the start of a non-dynamic bgworker, or to
check for it, right? So I don't see how the extension code could wait or
error out. As far as I know, the only thing you can do is call
RegisterBackgroundWorker() in your _PG_init() code and hope that the
server is correctly configured.

The only thing that third-party code could do, I think, is try to check
whether the bgworker was successfully registered or not, as I mentioned
in [1]. Maybe extra-paranoid code could add such a check in every executor
hook, but the overhead would be so terrible that no one would use such an
extension anyway.

[1] https://www.postgresql.org/message-id/CAOBaU_ZtR88x3Si6XwprqGo8UZNLncAQrD_-wc67sC=acO3g=w...@mail.gmail.com
Re: [PATCH] Disable bgworkers during servers start in pg_upgrade
On Fri, Aug 27, 2021 at 12:28:42PM -0700, Andres Freund wrote:
> Isn't that just going to end up with extension code erroring out and/or
> blocking waiting for a bgworker to start?

Well, that's the point: to block things during an upgrade. Do you have
a list of requirements you'd like to see satisfied here? POWA would
be fine with blocking the execution of bgworkers AFAIK (Julien, feel
free to correct me here if necessary).

It could be that preventing extension code from executing this way would
prevent hazards as well. The idea from upthread to prevent any writes
and/or WAL activity is not really different, as extensions may still
generate an error because of the pg_upgrade safety measures we'd put in,
no?

-- 
Michael
Re: Async-unsafe functions in signal handlers
> On 28 Aug 2021, at 07:05, Andres Freund wrote:
>
> However, we have a
> bandaid that deals with possible hangs, by SIGKILLing when processes don't
> shut down (at that point things have already gone quite south, so that's
> not an issue).

Thanks for the explanation. I can see that the child-process SIGKILL
machinery was introduced by commit 82233ce7ea42d6ba519aaec63008aff49da6c7af
to fix a malloc() deadlock in the quickdie() signal handler. As a result,
all child processes that take too long to die are killed in ServerLoop()
with SIGKILL. But bgworker_die() is a problem, as we initialize bgworkers
right before ServerLoop(). So we can face a malloc() deadlock on
postmaster startup (before ServerLoop() has started). Maybe we should
simply use write() and exit() instead of ereport() for bgworker_die()?

Best regards,
Denis Smirnov | Developer
s...@arenadata.io
Arenadata | Godovikova 9-17, Moscow 129085 Russia
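A minimal sketch of that idea, using only async-signal-safe calls; the
function name, message text, and exit code here are illustrative, not from
any posted patch:

    #include <unistd.h>

    /* Hypothetical async-signal-safe replacement for bgworker_die():
     * write() and _exit() are both async-signal-safe, unlike ereport(),
     * which may end up in malloc(). */
    static void
    bgworker_die_safe(int signo)
    {
        static const char msg[] =
            "terminating background worker due to administrator command\n";

        (void) write(STDERR_FILENO, msg, sizeof(msg) - 1);
        _exit(1);
    }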
Re: automatic analyze: readahead - add "IO read time" log message
Greetings,

* Stephen Frost (sfr...@snowman.net) wrote:
> * Egor Rogov (e.ro...@postgrespro.ru) wrote:
> > On 11.02.2021 01:10, Stephen Frost wrote:
> > > * Heikki Linnakangas (hlinn...@iki.fi) wrote:
> > > > On 05/02/2021 23:22, Stephen Frost wrote:
> > > > > Unless there's anything else on this, I'll commit these sometime
> > > > > next week.
> > > > One more thing: Instead of using 'effective_io_concurrency' GUC
> > > > directly, should call get_tablespace_maintenance_io_concurrency().
> > > Ah, yeah, of course.
> > >
> > > Updated patch attached.
> >
> > I happened to notice that get_tablespace_io_concurrency() is called
> > instead of get_tablespace_maintenance_io_concurrency(). It doesn't look
> > right, no?
>
> Hah, indeed you're right. Will fix.

Found this under a bit of a pile in my todo list. :)  Fix pushed.

Thanks!

Stephen
Re: Delegating superuser tasks to new security roles (Was: Granting control of SUSET gucs to non-superusers)
> On Aug 23, 2021, at 1:46 PM, Stephen Frost wrote:
>
> I'd much rather we go down the path that Robert had suggested where we
> find a way to make a connection between the tenant role and everything
> that they create, and leave everything that is outside of that box on
> the other side of the 'wall'.

I am coming around to this way of thinking.

The main difficulty here stems (as you know) from how CREATEROLE was
implemented. You and Tom had conversations about that back in 2005 [1],
and Tom even suggested that perhaps roles should have owners:

> * Tom Lane (tgl(at)sss(dot)pgh(dot)pa(dot)us) wrote:
> Possibly for 8.2 we could invent a notion of roles having owners.
> Offhand I don't see any harm in letting non-CREATEROLE users create
> non-login roles, and manipulate the membership of roles they have
> created (or that have been assigned to them by a superuser). On the
> other hand, it could be that the WITH ADMIN OPTION feature is already
> sufficient for this. This really needs some thought ...

Making roles owners of roles they create, and giving them the power to
manipulate objects which belong to roles they own (recursively), seems to
solve most of the problems we have been discussing.

The remaining problem is that roles without createrole or superuser cannot
create other roles. We don't want tenants to need either of those things,
at least not as they are currently defined. We could either modify the
createrole privilege to be far less powerful, or create a new privilege.

If role owners can alter and drop roles they own (and ones those roles
own, etc.), then we could redefine CREATEROLE to really just mean the
ability to create new roles. The ability to alter or drop roles would not
stem from having CREATEROLE, but rather from owning the role. For setups
where one admin role has CREATEROLE and creates all other roles (except
the superuser which created the admin), nothing changes. In setups with
multiple admins, where none own the others, each admin would have its own
fiefdom, managing everything downstream from itself, but having no special
privilege over the other fiefdoms. I think that setup wasn't implemented
for 8.1 more for lack of time than because it was unwanted.

Alternately, we could just create a new privilege parallel to CREATEROLE,
but that seems more confusing than helpful.

Thoughts?

[1] https://www.postgresql.org/message-id/17554.1120258001%40sss.pgh.pa.us

—
Mark Dilger
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
Re: pgsql: Deduplicate choice of horizon for a relation procarray.c.
Greetings,

* Andres Freund (and...@anarazel.de) wrote:
> As the code in question was only introduced in dc7420c2c92 it seems worth
> backpatching this change as well, otherwise 14 will look different from
> all other branches.

Interestingly, these patches ended up actually introducing a difference
between 14 and master in the form of:

 GlobalVisTestFor(Relation rel)
-	GlobalVisState *state;
+	GlobalVisState *state = NULL;

being done on master but not in the 14 stable branch, leading to, at least
for me:

.../src/backend/storage/ipc/procarray.c: In function ‘GlobalVisTestFor’:
.../src/backend/storage/ipc/procarray.c:4054:9: warning: ‘state’ may be used uninitialized in this function [-Wmaybe-uninitialized]
 4054 |         return state;
      |         ^

Seems like we should include that change in 14 too, to get rid of the
above warning and to make that bit of code the same too..?

Thanks!

Stephen
Re: New predefined roles- 'pg_read/write_all_data'
Greetings,

* Michael Banck (michael.ba...@credativ.de) wrote:
> On Thu, Apr 01, 2021 at 04:00:06PM -0400, Stephen Frost wrote:
> > diff --git a/doc/src/sgml/user-manag.sgml b/doc/src/sgml/user-manag.sgml
> > index d171b13236..fe0bdb7599 100644
> > --- a/doc/src/sgml/user-manag.sgml
> > +++ b/doc/src/sgml/user-manag.sgml
> > @@ -518,6 +518,24 @@ DROP ROLE doomed_role;
> >
> > +
> > + pg_read_all_data
> > + Read all data (tables, views, sequences), as if having SELECT
> > + rights on those objects, and USAGE rights on all schemas, even
> > + without having it explicitly. This role does not have the role
> > + attribute BYPASSRLS set. If RLS is being used, an administrator
> > + may wish to set BYPASSRLS on roles which this role is
> > + GRANTed to.
> > +
> > +
> > + pg_write_all_data
> > + Write all data (tables, views, sequences), as if having INSERT,
> > + UPDATE, and DELETE rights on those objects, and USAGE rights on all
> > + schemas, even without having it explicitly. This role does not have
> > + the role attribute BYPASSRLS set. If RLS is being used,
> > + an administrator may wish to set BYPASSRLS on roles
> > + which this role is GRANTed to.
> > +
>
> Shouldn't those "SELECT", "INSERT" etc. be wrapped in tags?

Yeah, good point, fixed.

Thanks!

Stephen
Re: Async-unsafe functions in signal handlers
Hi,

On 2021-08-27 23:51:27 +0300, Denis Smirnov wrote:
> > On 26 Aug 2021, at 23:39, Tom Lane wrote:
> >
> > (BTW, I think it's pretty silly to imagine that adding backtrace()
> > calls inside ereport is making things any more dangerous. ereport
> > has pretty much always carried a likelihood of calling malloc(),
> > for example.)
>
> I have taken a look through the signal handlers and found out that many
> of them use malloc() via ereport() and elog(). Here is the list:
>
> SIGUSR1
> - procsignal_sigusr1_handler(): autoprewarm, autovacuum, bgworker,
>   bgwriter, checkpointer, pgarch, startup, walwriter, walreceiver,
>   walsender

There shouldn't be meaningful uses of elog/ereport() inside
procsignal_sigusr1_handler(). The exception I found was an elog(FATAL)
for unreachable code.

> - sigusr1_handler(): postmaster
> SIGHUP:
> - SIGHUP_handler(): postmaster
> SIGCHLD:
> - reaper(): postmaster

I think these run in a very controlled set of circumstances, because most
of postmaster runs with signals masked.

> SIGFPE:
> - FloatExceptionHandler(): autovacuum, bgworker, postgres, plperl

Yep, although as discussed this might not be a "real" problem, because it
should only run during an instruction triggering an FPE.

> SIGQUIT:
> - quickdie(): postgres

Yes, this is an issue. I've previously argued for handling this via
write() and _exit(), instead of the full ereport() machinery. However, we
have a bandaid that deals with possible hangs, by SIGKILLing when
processes don't shut down (at that point things have already gone quite
south, so that's not an issue).

> SIGTERM:
> - bgworker_die(): bgworker

Bad.

> SIGALRM:
> - handle_sig_alarm(): autovacuum, bgworker, postmaster, startup,
>   walsender, postgres

I don't think there are reachable elogs in there. I'm not concerned about
e.g.

	elog(FATAL, "timeout index %d out of range 0..%d", index,
		 num_active_timeouts - 1);

because that's not something that should ever be reachable in a production
scenario. If it is, there are bigger problems.

Perhaps we ought to have a version of Assert() that's enabled in
production builds as well, and that outputs the error messages via
write() and then _exit()s?

Greetings,

Andres Freund
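A rough sketch of what such an always-on assertion could look like, under
the constraint that it must stay async-signal-safe; the macro name and exit
code are illustrative:

    #include <unistd.h>

    /* Illustrative only: report via write() and terminate via _exit(),
     * both async-signal-safe; no elog(), no malloc(). */
    #define ProductionAssert(condition) \
        do { \
            if (!(condition)) \
            { \
                static const char m_[] = \
                    "assertion failed: " #condition "\n"; \
                (void) write(STDERR_FILENO, m_, sizeof(m_) - 1); \
                _exit(1); \
            } \
        } while (0)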
Re: Async-unsafe functions in signal handlers
> On 26 Aug 2021, at 23:39, Tom Lane wrote:
>
> (BTW, I think it's pretty silly to imagine that adding backtrace()
> calls inside ereport is making things any more dangerous. ereport
> has pretty much always carried a likelihood of calling malloc(),
> for example.)

I have taken a look through the signal handlers and found out that many of
them use malloc() via ereport() and elog(). Here is the list:

SIGUSR1:
- procsignal_sigusr1_handler(): autoprewarm, autovacuum, bgworker,
  bgwriter, checkpointer, pgarch, startup, walwriter, walreceiver,
  walsender
- sigusr1_handler(): postmaster
SIGFPE:
- FloatExceptionHandler(): autovacuum, bgworker, postgres, plperl
SIGHUP:
- SIGHUP_handler(): postmaster
SIGCHLD:
- reaper(): postmaster
SIGQUIT:
- quickdie(): postgres
SIGTERM:
- bgworker_die(): bgworker
SIGALRM:
- handle_sig_alarm(): autovacuum, bgworker, postmaster, startup,
  walsender, postgres

I suspect there are lots of potential ways to deadlock on malloc() inside
any of these handlers. An interesting question is why there is still no
evidence of such deadlocks.

Best regards,
Denis Smirnov | Developer
s...@arenadata.io
Arenadata | Godovikova 9-17, Moscow 129085 Russia
Re: log_autovacuum in Postgres 14 -- ordering issue
On Fri, Aug 27, 2021 at 12:55 PM Stephen Frost wrote:
> > Okay. Plan is now to push these two patches together, later on. The
> > second patch concerns this separate track_io_timing issue. It's pretty
> > straightforward.
> >
> > (No change to the first patch in the series, relative to the v2 from
> > earlier.)
>
> Looks alright to me.

Pushed both patches -- thanks.

-- 
Peter Geoghegan
Re: Queries that should be canceled will get stuck on secure_write function
Hi,

On Fri, Aug 27, 2021, at 13:07, Robert Haas wrote:
> On Fri, Aug 27, 2021 at 3:24 PM Andres Freund wrote:
> > I wonder if we could improve the situation somewhat by using the
> > non-blocking pqcomm functions in a few select places. E.g. if elog.c's
> > send_message_to_frontend() sent its message via a new
> > pq_endmessage_noblock() (which'd use the existing
> > pq_putmessage_noblock()) and used pq_flush_if_writable() instead of
> > pq_flush(), we'd a) not block sending to the client before
> > AbortCurrentTransaction(), b) be able to queue further error messages
> > safely.
>
> pq_flush_if_writable() could succeed in sending only part of the data,
> so I don't see how this works.

All the data is buffered though, so I don't see what problem that causes?

Andres
Re: Queries that should be canceled will get stuck on secure_write function
On Fri, Aug 27, 2021 at 3:24 PM Andres Freund wrote:
> I wonder if we could improve the situation somewhat by using the
> non-blocking pqcomm functions in a few select places. E.g. if elog.c's
> send_message_to_frontend() sent its message via a new
> pq_endmessage_noblock() (which'd use the existing pq_putmessage_noblock())
> and used pq_flush_if_writable() instead of pq_flush(), we'd a) not block
> sending to the client before AbortCurrentTransaction(), b) be able to
> queue further error messages safely.

pq_flush_if_writable() could succeed in sending only part of the data,
so I don't see how this works.

-- 
Robert Haas
EDB: http://www.enterprisedb.com
Re: log_autovacuum in Postgres 14 -- ordering issue
Greetings,

* Peter Geoghegan (p...@bowt.ie) wrote:
> On Fri, Aug 27, 2021 at 12:30 PM Stephen Frost wrote:
> > I don't particularly care for that explain rule, ultimately, but it's
> > been around longer than I have and so I guess it wins. I'm fine with
> > always showing the read/write for VACUUM and ANALYZE.
> >
> > Including 'ms' and lower-casing 'Timings' to 'timings' still strikes me
> > as something that should be consistent for all of these, but that's
> > independent of this and I'm not going to stress over it, particularly
> > since that's pre-existing.
>
> Okay. Plan is now to push these two patches together, later on. The
> second patch concerns this separate track_io_timing issue. It's pretty
> straightforward.
>
> (No change to the first patch in the series, relative to the v2 from
> earlier.)

Looks alright to me.

Thanks,

Stephen
Re: log_autovacuum in Postgres 14 -- ordering issue
On Fri, Aug 27, 2021 at 12:30 PM Stephen Frost wrote:
> I don't particularly care for that explain rule, ultimately, but it's
> been around longer than I have and so I guess it wins. I'm fine with
> always showing the read/write for VACUUM and ANALYZE.
>
> Including 'ms' and lower-casing 'Timings' to 'timings' still strikes me
> as something that should be consistent for all of these, but that's
> independent of this and I'm not going to stress over it, particularly
> since that's pre-existing.

Okay. Plan is now to push these two patches together, later on. The
second patch concerns this separate track_io_timing issue. It's pretty
straightforward.

(No change to the first patch in the series, relative to the v2 from
earlier.)

-- 
Peter Geoghegan

v3-0002-Show-zero-values-in-track_io_timing-log-output.patch
Description: Binary data

v3-0001-Reorder-log_autovacuum_min_duration-log-output.patch
Description: Binary data
Re: RFC: Improve CPU cache locality of syscache searches
Hi,

On 2021-08-19 19:10:37 -0400, John Naylor wrote:
> I've made a small step in this direction in the attached. It uses a
> template approach to generate type-specific SearchCatCache* functions, for
> now only the 4-key ones. Since there's only a few of those, it's
> manageable to invoke the template and change the call sites by hand. To
> get this to scale up to all syscaches would require some scripting, but
> this much is enough to get feedback on the approach. One possible concern
> in starting down this path is that hundreds of call sites would have to
> be changed. (If there's a good way around that, it hasn't occurred to me.)

I was thinking of avoiding the need for that via a macro. I.e. have
SearchSysCache4(AMOPSTRATEGY, ...) be mapped to
SearchSysCache4AMOPSTRATEGY(...). That way callers wouldn't need to care,
and we could continue to evolve the internals without continually doing
large-scale code changes?

> diff --git a/src/include/utils/catcache_search_template.h b/src/include/utils/catcache_search_template.h
> new file mode 100644
> index 00..6f5dc2573c
> --- /dev/null
> +++ b/src/include/utils/catcache_search_template.h
> @@ -0,0 +1,176 @@
> +/*-
> + * catcache_search_template.h
> + *
> + * A template for type-specialized SearchCatCache functions
> + *
> + * Usage notes:
> + *
> + * To generate functions specialized for a set of catcache keys,
> + * the following parameter macros should be #define'd before this
> + * file is included.
> + *
> + * - CC_SEARCH - the name of the search function to be generated
> + * - CC_NKEYS - the number of search keys for the tuple
> + * - FASTEQFUNCx (x in 1,2,3,4) - type-specific equality function(s)
> + * - HASHFUNCx (x in 1,2,3,4) - type-specific hash function(s)

It's not clear to me we need such a complicated interface here. Can't we
just have a pg_attribute_always_inline function with a bunch more
parameters?

Greetings,

Andres Freund
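A sketch of the macro-based dispatch described above; the name
SearchSysCache4AMOPSTRATEGY stands in for whatever the template machinery
would emit:

    /* Paste the cache id onto the specialized function's name, so existing
     * call sites like SearchSysCache4(AMOPSTRATEGY, k1, k2, k3, k4) keep
     * compiling unchanged while resolving to the generated function. */
    #define SearchSysCache4(cacheId, key1, key2, key3, key4) \
        SearchSysCache4##cacheId(key1, key2, key3, key4)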
Re: Estimating HugePages Requirements?
Hi,

On 2021-08-27 19:27:18 +, Bossart, Nathan wrote:
> +
> +  --output-shmem
> +
> +
> +    Prints the amount of shared memory required for the given
> +    configuration and exits. This can be used on a running server, but
> +    the return value reflects the amount of shared memory needed based
> +    on the current invocation. It does not return the amount of shared
> +    memory in use by the running server. This must be the first
> +    argument on the command line.
> +
> +
> +    This option is useful for determining the number of huge pages
> +    needed for the server. For more information, see
> +    .
> +
> +

One thing I wonder is if this wouldn't better be dealt with in a more
generic way. While this is the most problematic runtime-computed GUC, it's
not the only one. What if we introduced a new shared_memory_size GUC, and
made --describe-config output it? Perhaps adding
--describe-config=guc-name?

I also wonder if we should output the number of hugepages needed instead
of the "raw" bytes of shared memory. The whole business about figuring out
the huge page size, dividing the shared memory size by that and then
rounding up could be removed in that case. Due to huge_page_size it's not
even immediately obvious which huge page size one should use...

> diff --git a/src/backend/main/main.c b/src/backend/main/main.c
> index 3a2a0d598c..c141ae3d1c 100644
> --- a/src/backend/main/main.c
> +++ b/src/backend/main/main.c
> @@ -182,9 +182,11 @@ main(int argc, char *argv[])
>  	 */
>
>  	if (argc > 1 && strcmp(argv[1], "--check") == 0)
> -		BootstrapModeMain(argc, argv, true);
> +		BootstrapModeMain(argc, argv, true, false);
> +	else if (argc > 1 && strcmp(argv[1], "--output-shmem") == 0)
> +		BootstrapModeMain(argc, argv, false, true);
>  	else if (argc > 1 && strcmp(argv[1], "--boot") == 0)
> -		BootstrapModeMain(argc, argv, false);
> +		BootstrapModeMain(argc, argv, false, false);
>  #ifdef EXEC_BACKEND
>  	else if (argc > 1 && strncmp(argv[1], "--fork", 6) == 0)
>  		SubPostmasterMain(argc, argv);

help() needs an update too.

> diff --git a/src/backend/storage/ipc/ipci.c b/src/backend/storage/ipc/ipci.c
> index 3e4ec53a97..b225b1ee70 100644
> --- a/src/backend/storage/ipc/ipci.c
> +++ b/src/backend/storage/ipc/ipci.c
> @@ -75,6 +75,87 @@ RequestAddinShmemSpace(Size size)
>  	total_addin_request = add_size(total_addin_request, size);
>  }
>
> +/*
> + * CalculateShmemSize
> + *		Calculates the amount of shared memory and number of semaphores needed.
> + *
> + * If num_semaphores is not NULL, it will be set to the number of semaphores
> + * required.
> + *
> + * Note that this function freezes the additional shared memory request size
> + * from loadable modules.
> + */
> +Size
> +CalculateShmemSize(int *num_semaphores)
> +{

Can you split this into a separate commit? It feels fairly
uncontroversial to me, so I think we could just apply it soon?

Greetings,

Andres Freund
\dP and \dX use ::regclass without "pg_catalog."
I noticed that for \dP+ since 1c5d9270e, regclass is written without
"pg_catalog."  (Alvaro and I failed to notice it in 421a2c483, too).

+       if (showNested || pattern)
+               appendPQExpBuffer(&buf,
+                                 ",\n c3.oid::regclass as \"%s\"",
+                                 gettext_noop("Parent name"));
+
+       if (showIndexes)
+               appendPQExpBuffer(&buf,
+                                 ",\n c2.oid::regclass as \"%s\"",
+                                 gettext_noop("On table"));

\dX is new in v14, and introduced the same issue in ad600bba0 (a4d75c86
modified it but did not fix it).

I searched for issues like this, which finds all 4 errors with 1 false
positive in psql/describe.c:

|time grep -wF "$(grep -oE 'pg_catalog\.[_[:alpha:]]+' src/bin/psql/describe.c |sed -r 's/^pg_catalog\.//; /^(char|oid|text|trigger)$/d' )" src/bin/psql/describe.c |grep -Ev 'pg_catalog\.|^ *[/ ]\*'
|#include "catalog/pg_am.h"
|                                 ",\n inh.inhparent::regclass as \"%s\"",
|                                 ",\n c2.oid::regclass as \"%s\"",
|                                 " es.stxrelid::regclass) AS \"%s\"",
|                                 "es.stxrelid::regclass) AS \"%s\"",

Tom informs me that this is not considered to be interesting as a security
patch.

-- 
Justin

>From 70eb6d65084104fa54965a349474cda8db6ed90d Mon Sep 17 00:00:00 2001
From: Justin Pryzby
Date: Fri, 27 Aug 2021 13:52:39 -0500
Subject: [PATCH 1/2] psql \dX: reference regclass with "pg_catalog." prefix

Should backpatch to v14
See similar issue fixed at 1d21ba8a9b8cb784f927a2c9c5818f8ff6779c0b.
---
 src/bin/psql/describe.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/src/bin/psql/describe.c b/src/bin/psql/describe.c
index 8333558bda..772d881c2f 100644
--- a/src/bin/psql/describe.c
+++ b/src/bin/psql/describe.c
@@ -4735,7 +4735,7 @@ listExtendedStats(const char *pattern)
 		appendPQExpBuffer(&buf,
 						  "pg_catalog.format('%%s FROM %%s', \n"
 						  "  pg_get_statisticsobjdef_columns(es.oid), \n"
-						  "  es.stxrelid::regclass) AS \"%s\"",
+						  "  es.stxrelid::pg_catalog.regclass) AS \"%s\"",
 						  gettext_noop("Definition"));
 	else
 		appendPQExpBuffer(&buf,
@@ -4746,7 +4746,7 @@ listExtendedStats(const char *pattern)
 						  "   ON (es.stxrelid = a.attrelid \n"
 						  "   AND a.attnum = s.attnum \n"
 						  "   AND NOT a.attisdropped)), \n"
-						  "es.stxrelid::regclass) AS \"%s\"",
+						  "es.stxrelid::pg_catalog.regclass) AS \"%s\"",
 						  gettext_noop("Definition"));

 	appendPQExpBuffer(&buf,
-- 
2.17.0

>From 9e243f51a095ec2ca5949d711483be2546e46d0e Mon Sep 17 00:00:00 2001
From: Justin Pryzby
Date: Fri, 27 Aug 2021 13:52:15 -0500
Subject: [PATCH 2/2] psql \dP: reference regclass with "pg_catalog." prefix

Should backpatch to v12
---
 src/bin/psql/describe.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/src/bin/psql/describe.c b/src/bin/psql/describe.c
index 772d881c2f..30fb17123e 100644
--- a/src/bin/psql/describe.c
+++ b/src/bin/psql/describe.c
@@ -4275,12 +4275,12 @@ listPartitionedTables(const char *reltypes, const char *pattern, bool verbose)

 	if (showNested || pattern)
 		appendPQExpBuffer(&buf,
-						  ",\n inh.inhparent::regclass as \"%s\"",
+						  ",\n inh.inhparent::pg_catalog.regclass as \"%s\"",
 						  gettext_noop("Parent name"));

 	if (showIndexes)
 		appendPQExpBuffer(&buf,
-						  ",\n c2.oid::regclass as \"%s\"",
+						  ",\n c2.oid::pg_catalog.regclass as \"%s\"",
 						  gettext_noop("Table"));

 	if (verbose)
-- 
2.17.0
Re: log_autovacuum in Postgres 14 -- ordering issue
Greetings,

* Peter Geoghegan (p...@bowt.ie) wrote:
> On Fri, Aug 27, 2021 at 11:35 AM Stephen Frost wrote:
> > > BTW, I noticed one thing about the track_io_time stuff. Sometimes it
> > > looks like this:
> > >
> > > I/O timings:
> > >
> > > i.e., it doesn't show anything at all after the colon.
> > Reporting zeros seems
> > valuable to me in that it shows that we did actually track the io timing
> > and there simply wasn't any time spent doing that- if we didn't include
> > the line at all then it wouldn't be clear if there wasn't any time spent
> > in i/o or if track io timing wasn't enabled.
>
> The principle that we don't show things that are all-zeroes is unique
> to text-format EXPLAIN output -- any other EXPLAIN format doesn't
> treat all-zeroes as a special case. And so the most consistent and
> correct thing seems to be this: show both all-zero "read:" and
> "write:" (both in vacuumlazy.c and in analyze.c), without making any
> other changes (i.e., no changes to EXPLAIN output are needed).

I suppose.

> You seem to be almost sold on that plan anyway. But this text format
> EXPLAIN rule seems like it decides the question for us.

I don't particularly care for that explain rule, ultimately, but it's
been around longer than I have and so I guess it wins. I'm fine with
always showing the read/write for VACUUM and ANALYZE.

Including 'ms' and lower-casing 'Timings' to 'timings' still strikes me
as something that should be consistent for all of these, but that's
independent of this and I'm not going to stress over it, particularly
since that's pre-existing.

Thanks,

Stephen
Re: [PATCH] Disable bgworkers during servers start in pg_upgrade
Hi,

On 2021-08-27 09:34:24 +0800, Julien Rouhaud wrote:
> On Fri, Aug 27, 2021 at 7:31 AM Michael Paquier wrote:
> >
> > Indeed, there is some history here with autovacuum. I have not been
> > careful enough to check that. Still, putting a check on
> > IsBinaryUpgrade in bgworker_should_start_now() would mean that we
> > still keep track of the set of bgworkers registered in shared memory.
> > That shouldn't lead to any problem right?
> >
> > Wouldn't it be better to block things at the source, as of
> > RegisterBackgroundWorker()? And that would keep track of the control
> > we have on bgworkers in a single place. I also think that we'd better
> > document something about that either in bgworker.sgml or pg_upgrade's
> > page.
>
> I'm fine with that approach too.

Isn't that just going to end up with extension code erroring out and/or
blocking waiting for a bgworker to start?

Greetings,

Andres Freund
Re: Queries that should be canceled will get stuck on secure_write function
Hi,

On 2021-08-27 08:27:38 -0400, Robert Haas wrote:
> On Tue, Aug 24, 2021 at 9:58 PM Fujii Masao wrote:
> > to report an error to a client, and then calls AbortCurrentTransaction()
> > to abort the transaction. If secure_write() called by EmitErrorReport()
> > gets stuck, a backend gets stuck inside the transaction block and the
> > locks keep being held unnecessarily. Isn't this problematic? Can we
> > change the order of them?
> ...
> More generally, I think it's hopeless to try to improve the situation
> very much by changing the place where secure_write() happens. It
> happens in so many different places, and it is clearly not going to be
> possible to make it so that in none of those places do we hold any
> important server resources. Therefore I think the only solution is to
> fix secure_write() itself ... and the trick is what to do there given
> that we have to be very careful not to do anything that might try to
> write another message while we are already in the middle of writing a
> message.

I wonder if we could improve the situation somewhat by using the
non-blocking pqcomm functions in a few select places. E.g. if elog.c's
send_message_to_frontend() sent its message via a new
pq_endmessage_noblock() (which'd use the existing pq_putmessage_noblock())
and used pq_flush_if_writable() instead of pq_flush(), we'd a) not block
sending to the client before AbortCurrentTransaction(), b) be able to
queue further error messages safely.

I think this'd not violate the goal of putting pq_flush() into
send_message_to_frontend():

	/*
	 * This flush is normally not necessary, since postgres.c will flush out
	 * waiting data when control returns to the main loop. But it seems best
	 * to leave it here, so that the client has some clue what happened if
	 * the backend dies before getting back to the main loop ... error/notice
	 * messages should not be a performance-critical path anyway, so an extra
	 * flush won't hurt much ...
	 */
	pq_flush();

because the only situations where we'd not send the data out immediately
would be when the socket buffer is already full. In which case the client
wouldn't get the error immediately anyway?

Greetings,

Andres Freund
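For concreteness, the proposed tail of send_message_to_frontend() would
look roughly like the sketch below; pq_endmessage_noblock() does not exist
today and is the hypothetical counterpart of pq_endmessage() built on the
existing pq_putmessage_noblock(), which grows the send buffer instead of
blocking:

    /* Sketch only: queue the complete message without blocking, then send
     * whatever the socket will accept right now.  A partial send is fine
     * because the whole message already sits in the (enlarged) buffer. */
    static void
    send_message_tail_sketch(StringInfo msgbuf)
    {
        pq_endmessage_noblock(msgbuf);
        pq_flush_if_writable();
    }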
Re: log_autovacuum in Postgres 14 -- ordering issue
On Fri, Aug 27, 2021 at 11:35 AM Stephen Frost wrote:
> > BTW, I noticed one thing about the track_io_time stuff. Sometimes it
> > looks like this:
> >
> > I/O timings:
> >
> > i.e., it doesn't show anything at all after the colon.
> Reporting zeros seems
> valuable to me in that it shows that we did actually track the io timing
> and there simply wasn't any time spent doing that- if we didn't include
> the line at all then it wouldn't be clear if there wasn't any time spent
> in i/o or if track io timing wasn't enabled.

The principle that we don't show things that are all-zeroes is unique
to text-format EXPLAIN output -- any other EXPLAIN format doesn't
treat all-zeroes as a special case. And so the most consistent and
correct thing seems to be this: show both all-zero "read:" and
"write:" (both in vacuumlazy.c and in analyze.c), without making any
other changes (i.e., no changes to EXPLAIN output are needed).

You seem to be almost sold on that plan anyway. But this text format
EXPLAIN rule seems like it decides the question for us.

-- 
Peter Geoghegan
Re: Queries that should be canceled will get stuck on secure_write function
Hi,

On 2021-08-23 10:13:03 -0400, Robert Haas wrote:
> On Mon, Aug 23, 2021 at 4:15 AM 蔡梦娟(玊于) wrote:
> > I want to know why the interrupt is only handled when ProcDiePending is
> > true; I think a query which is supposed to be canceled also should
> > respond to the signal.
>
> Well, if we're halfway through sending a message to the client and we
> abort the write, we have no way of re-establishing protocol sync,
> right? It's OK to die without sending any more data to the client,
> since then the connection is closed anyway and the loss of sync
> doesn't matter, but continuing the session doesn't seem workable.

Right.

> Your proposed patch actually seems to dodge this problem and I think
> perhaps we could consider something along those lines. It would be
> interesting to hear what Andres thinks about this idea, since I think
> he was the last one to rewrite that code.

I think it's a reasonable idea. I have some quibbles with the
implementation (new code should be in ProcessClientWriteInterrupt(), not
secure_write()), and I suspect we should escalate more quickly to killing
the connection, but those seem like details.

I think that if we want to tackle this, we need to do the same for
secure_read() as well. secure_read() will process interrupts normally
while DoingCommandRead, but if we're already in a command, we'd just as
well be prevented from processing a !ProcDiePending interrupt.

Greetings,

Andres Freund
Re: log_autovacuum in Postgres 14 -- ordering issue
On Thu, Aug 26, 2021 at 10:28 PM Peter Geoghegan wrote:
> I'll commit this in a day or two, backpatching to 14. Barring any
> objections.

Actually, we also need to make the corresponding lines for ANALYZE follow
the same convention -- those really should be consistent. As in the
attached revision.

I haven't tried to address the issue with "I/O timings:" that I just
brought to Stephen's attention. We can handle that question separately.

-- 
Peter Geoghegan

v2-0001-Reorder-log_autovacuum_min_duration-log-output.patch
Description: Binary data
Re: Estimating HugePages Requirements?
On 2021-08-27 16:40:27 +0200, Magnus Hagander wrote:
> On Fri, Aug 27, 2021 at 8:46 AM Michael Paquier wrote:
> I'd say a lot more than just handy. I don't think the workaround is
> really all that useful.

+1
Re: log_autovacuum in Postgres 14 -- ordering issue
Greetings,

* Peter Geoghegan (p...@bowt.ie) wrote:
> On Wed, Aug 25, 2021 at 2:07 PM Stephen Frost wrote:
> > I generally like the idea though I'm not sure about changing things in
> > v13 as there's likely code out there that's already parsing that data
> > and it might suddenly break if this was changed.
>
> Agreed -- I won't backpatch anything to v13.

Ok.

> > Given that such code would need to be adjusted for v14 anyway, I don't
> > really see changing it in v14 as being an issue (nor do I feel that
> > it's even a big concern at this point in the release cycle, though
> > perhaps others feel differently).
>
> BTW, I noticed one thing about the track_io_time stuff. Sometimes it
> looks like this:
>
> I/O timings:
>
> i.e., it doesn't show anything at all after the colon.

Hrmpf. That's an interesting point.

> I now wonder if we should just unconditionally report both things
> (both "read:" and "write:"), without regard for whether or not they're
> non-zero. (We'd do the same thing with ANALYZE's equivalent code too,
> if we actually did this -- same issue there.)

So, it was done that way to match how we report I/O Timings from explain
analyze, around src/backend/commands/explain.c:3574 (which I note is now
slightly different from what VACUUM/ANALYZE do due to f4f4a64...). The
intent was to be consistent in all of these places and I generally still
feel that's a worthwhile thing to strive for.

I don't have any particular problem with just always reporting it. Sure
looks odd to have the line there w/o anything after it. Perhaps we should
also address that in the explain analyze case though, and make the same
changes there that were done in f4f4a64?

Reporting zeros seems valuable to me in that it shows that we did actually
track the io timing and there simply wasn't any time spent doing that --
if we didn't include the line at all then it wouldn't be clear if there
wasn't any time spent in i/o or if track io timing wasn't enabled.

Thanks,

Stephen
Re: [PATCH] Proof of concept for GUC improvements
On Fri, Aug 27, 2021 at 1:19 AM Michael Paquier wrote:
>
> On Thu, Aug 19, 2021 at 03:58:57PM -0700, David G. Johnston wrote:
> > I'm at -0.5 as to whether such a patch would actually be an improvement
> > or whether the added possibilities would just be confusing and, because
> > it is all optional, indefinitely so.
>
> FWIW, I find this proposition of introducing a set of optional
> synonyms to map with some special-case values we have in the
> configurations a bit confusing, as that's basically introducing
> enum-like options into GUCs that already have a type assigned.

It does use enum-like mappings, but that is just because I needed to tie
together name + value and just reused the already-similar data structure.
That could be changed if the code itself is less understandable based on
the struct names.

> The patch, with its set of options like special_disabled0,
> special_disabled_all is not really easy to parse either, so that's just
> a recipe to make the set of synonyms grow on a per-GUC basis.

Yes, when I started out on this, I expected maybe 2-3 different
interpretations at most, with more common overlap. I am not tied to
making *every* GUC support this; maybe we support special_disabled or
special_disabled0 with differing names.

> What I am wondering, though, is if there are cases in the existing
> GUCs, with their existing types, where the situation of a default or
> disabled value could be improved, though, to make the overall picture
> more consistent.

I think this would be possible, although the benefit of what I've written
is that it doesn't change the interpretation of the value anywhere else,
just in GUC parsing (and optionally GUC display). The parsing was where I
felt this improved understanding; I'm less tied to outputting in the
"canonical" way.
Re: log_autovacuum in Postgres 14 -- ordering issue
On Wed, Aug 25, 2021 at 2:07 PM Stephen Frost wrote:
> I generally like the idea though I'm not sure about changing things in
> v13 as there's likely code out there that's already parsing that data
> and it might suddenly break if this was changed.

Agreed -- I won't backpatch anything to v13.

> Given that such code would need to be adjusted for v14 anyway, I don't
> really see changing it in v14 as being an issue (nor do I feel that
> it's even a big concern at this point in the release cycle, though
> perhaps others feel differently).

BTW, I noticed one thing about the track_io_time stuff. Sometimes it
looks like this:

I/O timings:

i.e., it doesn't show anything at all after the colon. This happens
because the instrumentation indicates that no time was spent on either
read I/O or write I/O.

I now wonder if we should just unconditionally report both things (both
"read:" and "write:"), without regard for whether or not they're non-zero.
(We'd do the same thing with ANALYZE's equivalent code too, if we actually
did this -- same issue there.)

-- 
Peter Geoghegan
Re: Fwd: Big Performance drop of Exceptions in UDFs between V11.2 and 13.4
Andrew Dunstan writes:
> First, this apparently only affects builds done with NLS. And in fact
> even on release 11 the performance is much better when run on a non-NLS
> build. So there's lots of work to do here.

Wow ... it would not have occurred to me to check that. Testing that
angle using HEAD on Linux (RHEL8), here are the times I see for the OP's
slow query:

Non-NLS build, C locale:
Time: 12452.062 ms (00:12.452)

NLS build, en_US.utf8 locale:
Time: 13596.114 ms (00:13.596)

NLS build, after SET lc_messages TO 'es_ES.utf8':
Time: 15190.689 ms (00:15.191)

So there is a cost for translating the error messages on Linux too, but
it's not nearly as awful as on Windows. I wonder if this boils down to a
performance bug in the particular gettext version you're using?

			regards, tom lane
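For context on where the per-exception cost comes from: each raised
error's message strings are translated at ereport() time, along the lines
of this simplification of elog.c's err_gettext() (the message-catalog
domain name below is illustrative):

    #include <libintl.h>

    /* Simplified view: every exception pays a message-catalog lookup,
     * so a tight PL/pgSQL exception loop multiplies the gettext cost. */
    static const char *
    err_gettext_sketch(const char *msgid)
    {
        return dgettext("postgres-14", msgid);
    }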
Re: [PATCH] pgbench: add multiconnect option
> >> Good. I was thinking of adding such capability, possibly for handling
> >> connection errors and reconnecting…
> >
> > round-robin and random make sense. I am wondering how round-robin
> > would work with -C, though? Would you just reuse the same connection
> > string as the one chosen at the starting point.
>
> Well, not necessarily, but this is debatable.

My expectation for such a behavior would be that it would reconnect to a
random connstring each time; otherwise what's the point of using this with
-C? If we needed to forbid some option combinations, that is also an
option.

> >> I was thinking of allowing a list of conninfo strings with repeated
> >> options, eg --conninfo "foo" --conninfo "bla"…
> >
> > That was my first thought when reading the subject of this thread:
> > create a list of connection strings and pass one of them to
> > doConnect() to grab the properties looked for. That's a bit confusing
> > though as pgbench does not support directly connection strings,
>
> They are supported because libpq silently assumes that "dbname" can be a
> full connection string.
>
> > and we should be careful to keep fallback_application_name intact.
>
> Hmmm. See attached patch; ISTM that it does the right thing.

I guess the multiple --conninfo approach is fine; I personally liked
having the list come from a file, as you could benchmark different
groups/clusters based on a file much more easily than by constructing
multiple pgbench invocations. I can see an argument for both approaches.

The PGSERVICEFILE was an idea I'd had to store easily indexed groups of
connection information in a way that I didn't need to know all the
details, could easily parse, and could later pass in the ENV so libpq
could just pull out the information.

> >> Your approach using PGSERVICEFILE also makes sense!
> >
> > I am not sure that's actually needed here, as it is possible to pass
> > down a service name within a connection string. I think that you'd
> > better leave libpq do all the work related to a service file, if
> > specified. pgbench does not need to know any of that.
>
> Yes, this is an inconvenience with this approach: part of the libpq
> machinery is more or less replicated in pgbench, which is quite
> annoying, and less powerful.

There is some small fraction reproduced here just to pull out the named
sections; no other parsing should be done though.

> Attached is my work-in-progress version, with a few open issues (eg
> probably not thread safe), but comments about the provided feature are
> welcome.
>
> I borrowed the "strategy" option, renamed policy, from the initial
> patch. Pgbench just accepts several connection strings as parameters,
> eg:
>
>    pgbench ... "service=db1" "service=db2" "service=db3"
>
> The next stage is to map scripts to connection types and connections
> to connection types, so that pgbench could run W transactions against a
> primary and R transactions against a hot standby, for instance. I have
> some design for that, but nothing is implemented.
>
> There is also the combination with the error handling patch to consider:
> if a connection fails, a connection to a replica could be issued
> instead.

I'll see if I can take a look at your latest patch.

I was also wondering about how we should handle `pgbench -i` with multiple
connection strings; currently it would only initialize with the first DSN
it gets, but it probably makes sense to run initialize against all of the
databases (or at least attempt to). Maybe this is one argument for the
multiple --conninfo handling, since you could explicitly pass the
databases you want. (Not that it is hard to just loop over connection
info and `pgbench -i` with ENV, or any other number of ways to accomplish
the same thing.)

Best,

David
Fwd: Big Performance drop of Exceptions in UDFs between V11.2 and 13.4
Somehow -hackers got left off the cc:

On 8/22/21 6:11 PM, Andrew Dunstan wrote:
> On 8/22/21 5:59 PM, l...@laurent-hasson.com wrote:
> > -Original Message-
> > From: Andrew Dunstan
> > Sent: Sunday, August 22, 2021 17:27
> > To: Tom Lane ; l...@laurent-hasson.com
> > Cc: Justin Pryzby ; Ranier Vilela ; pgsql-performa...@postgresql.org
> > Subject: Re: Big Performance drop of Exceptions in UDFs between V11.2
> > and 13.4
> >
> > On 8/22/21 4:11 PM, Tom Lane wrote:
> > > "l...@laurent-hasson.com" writes:
> > > > I do have a Linux install of 13.3, and things work beautifully, so
> > > > this is definitely a Windows thing here that started in V12.
> > > It's good to have a box around it, but that's still a pretty large
> > > box :-(.
> > >
> > > I'm hoping that one of our Windows-using developers will see if they
> > > can reproduce this, and if so, try to bisect where it started.
> > > Not sure how to make further progress without that.
> >
> > Can do. Assuming the assertion that it started in release 12 is
> > correct, I should be able to find it by bisecting between the branch
> > point for 12 and the tip of that branch. That's a little over 20
> > probes by my calculation.
> >
> > cheers
> >
> > andrew
> >
> > --
> > Andrew Dunstan
> > EDB: https://www.enterprisedb.com
> >
> > I tried it on 11.13 and 12.3. Is there a place where I could download
> > 12.1 and 12.2 and test that? Is it worth it, or do you think you have
> > all you need?
>
> I think I have everything I need.
>
> Step one will be to verify that the difference exists between the branch
> point and the tip of release 12. Once that's done it will be a matter of
> probing until the commit at fault is identified.

OK, here's what we know. First, this apparently only affects builds done
with NLS. And in fact, even on release 11 the performance is much better
when run on a non-NLS build. So there's lots of work to do here.

I can't yet pinpoint the place where it got disastrously bad, because I
can't build with VS2017 back past commit a169155453 on the REL_13_STABLE
branch. That commit fixed an issue with VS2015 and newer. The machine that
runs bowerbird has some older VS installations, and choco has VS2013
packages, so there are opportunities to explore further. I'll get back to
this in a couple of days.

Thanks to my EDB colleagues Sandeep Thakkar and Tushar Ahuja for helping
to identify the cause of the issue.

cheers

andrew

--
Andrew Dunstan
EDB: https://www.enterprisedb.com
Re: [PATCH] pgbench: add multiconnect option
Bonjour Michaël,

>>> Good. I was thinking of adding such capability, possibly for handling
>>> connection errors and reconnecting…
>>
>> round-robin and random make sense. I am wondering how round-robin
>> would work with -C, though? Would you just reuse the same connection
>> string as the one chosen at the starting point.

Well, not necessarily, but this is debatable.

>>> I was thinking of allowing a list of conninfo strings with repeated
>>> options, eg --conninfo "foo" --conninfo "bla"…
>>
>> That was my first thought when reading the subject of this thread:
>> create a list of connection strings and pass one of them to
>> doConnect() to grab the properties looked for. That's a bit confusing
>> though as pgbench does not support directly connection strings,

They are supported because libpq silently assumes that "dbname" can be a
full connection string.

>> and we should be careful to keep fallback_application_name intact.

Hmmm. See attached patch; ISTM that it does the right thing.

>>> Your approach using PGSERVICEFILE also makes sense!
>>
>> I am not sure that's actually needed here, as it is possible to pass
>> down a service name within a connection string. I think that you'd
>> better leave libpq do all the work related to a service file, if
>> specified. pgbench does not need to know any of that.

Yes, this is an inconvenience with this approach: part of the libpq
machinery is more or less replicated in pgbench, which is quite annoying,
and less powerful.

Attached is my work-in-progress version, with a few open issues (eg
probably not thread safe), but comments about the provided feature are
welcome.

I borrowed the "strategy" option, renamed policy, from the initial patch.
Pgbench just accepts several connection strings as parameters, eg:

   pgbench ... "service=db1" "service=db2" "service=db3"

The next stage is to map scripts to connection types and connections to
connection types, so that pgbench could run W transactions against a
primary and R transactions against a hot standby, for instance. I have
some design for that, but nothing is implemented.

There is also the combination with the error handling patch to consider:
if a connection fails, a connection to a replica could be issued instead.

-- 
Fabien.

diff --git a/doc/src/sgml/ref/pgbench.sgml b/doc/src/sgml/ref/pgbench.sgml
index 0c60077e1f..7b99344c90 100644
--- a/doc/src/sgml/ref/pgbench.sgml
+++ b/doc/src/sgml/ref/pgbench.sgml
@@ -29,7 +29,7 @@ PostgreSQL documentation
    pgbench
    option
-   dbname
+   dbname or conninfo

@@ -160,6 +160,9 @@ pgbench options d
 not specified, the environment variable
 PGDATABASE is used. If that is not set, the user name
 specified for the connection is used.
+Alternatively, the dbname can be
+a standard connection information string.
+Several connections can be provided.

@@ -840,6 +843,21 @@ pgbench options d

+
+ --connection-policy=policy
+
+
+Set the connection policy when multiple connections are provided.
+Default is round-robin (ro).
+Possible values are:
+first (f),
+random (ra),
+round-robin (ro),
+working (w).
+
+
+
 
  -h hostname
  --host=hostname

diff --git a/src/bin/pgbench/pgbench.c b/src/bin/pgbench/pgbench.c
index b0e20c46ae..95e58f0573 100644
--- a/src/bin/pgbench/pgbench.c
+++ b/src/bin/pgbench/pgbench.c
@@ -277,13 +277,39 @@
 bool		is_connect;			/* establish connection for each transaction */
 bool		report_per_command; /* report per-command latencies */
 int			main_pid;			/* main process id used in log filename */
+char	   *logfile_prefix = NULL;
+
+/* main connection definition */
 const char *pghost = NULL;
 const char *pgport = NULL;
 const char *username = NULL;
-const char *dbName = NULL;
-char	   *logfile_prefix = NULL;
 const char *progname;

+/* multi connections */
+typedef enum mc_policy_t
+{
+	MC_UNKNOWN = 0,
+	MC_FIRST,
+	MC_RANDOM,
+	MC_ROUND_ROBIN,
+	MC_WORKING
+} mc_policy_t;
+
+/* connection info list */
+typedef struct connection_t
+{
+	const char *connection;		/* conninfo or dbname */
+	int			errors;			/* number of connection errors */
+} connection_t;
+
+static int	n_connections = 0;
+static connection_t *connections = NULL;
+static mc_policy_t mc_policy = MC_ROUND_ROBIN;
+
+/* last used connection */
+// FIXME per thread?
+static int	current_connection = 0;
+
 #define WSEP '@'				/* weight separator */

 volatile bool timer_exceeded = false;	/* flag from signal handler */
@@ -701,7 +727,7 @@ usage(void)
 {
 	printf("%s is a benchmarking tool for PostgreSQL.\n\n"
 		   "Usage:\n"
-		   "  %s [OPTION]... [DBNAME]\n"
+		   "  %s [OPTION]... [DBNAME or CONNINFO ...]\n"
 		   "\nInitialization options:\n"
 		   "  -i, --initialize         invokes initialization mode\n"
 		   "  -I, --init-steps=[" ALL_INIT_STEPS "]+ (default \"" DEFAULT_INIT_STEPS "\")\n"
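For what it's worth, the round-robin policy over that list reduces to
something like this sketch, reusing the variables from the posted WIP
patch (single-threaded, matching its "probably not thread safe" caveat):

    /* Pick the next conninfo in round-robin order over connections[]. */
    static const char *
    next_connection(void)
    {
        const char *conn = connections[current_connection].connection;

        current_connection = (current_connection + 1) % n_connections;
        return conn;
    }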
Re: Avoid repeated PQfnumber() in pg_dump
On 8/27/21, 7:27 AM, "Daniel Gustafsson" wrote:
> I took another look at this today and pushed it after verifying with a
> pgindent run. Thanks!

Thank you!

Nathan
Re: Partition Check not updated when insert into a partition
I have reviewed the patch and it looks good to me. However, I have one
comment.

+	foreach(l, attachrel_children)
+	{
+		Oid			partOid = lfirst_oid(l);
+
+		CacheInvalidateRelcacheByRelid(partOid);
+	}

Can we avoid using the extra variable 'partOid' and directly pass
'lfirst_oid(l)' to CacheInvalidateRelcacheByRelid()?

Thanks & Regards,
Nitin Jadhav

On Fri, Aug 27, 2021 at 2:50 PM Amit Langote wrote:
>
> On Thu, Aug 5, 2021 at 11:32 AM Amit Langote wrote:
> > On Wed, Jul 14, 2021 at 11:16 AM houzj.f...@fujitsu.com wrote:
> > > On Tuesday, July 13, 2021 2:52 AM Alvaro Herrera wrote:
> > > > Did you have a misbehaving test for the ATTACH case?
> > >
> > > Thanks for the response.
> >
> > Thank you both.
> >
> > > Yes, I think the following example of ATTACH doesn't work as expected.
> >
> > Yeah, need the fix for the ATTACH case too.
> >
> > Couple more things:
> >
> > * We must invalidate not just the "direct" partitions of the table
> > being attached/detached, but also any indirect ones, because all of
> > their partition constraints would need to contain (or no longer
> > contain) the root parent's partition constraint.
> >
> > * I think we should lock the partitions before sending the
> > invalidation. The ATTACH code already locks the descendants for a
> > different purpose, but DETACH doesn't, so the latter needs to be fixed
> > to match.
> >
> > I've updated Alvaro's patch to address these points. Maybe, we should
> > also add these cases to the regression and isolation suites?
>
> Apparently, I had posted a version of the patch that didn't even compile.
>
> I have fixed that in the attached and also added regression tests.
>
> Adding this to the next CF.
>
> --
> Amit Langote
> EDB: http://www.enterprisedb.com
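For clarity, the suggested simplification is just this sketch:

    /* Drop the local variable and pass the OID straight through. */
    foreach(l, attachrel_children)
        CacheInvalidateRelcacheByRelid(lfirst_oid(l));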
Re: Estimating HugePages Requirements?
On Fri, Aug 27, 2021 at 8:46 AM Michael Paquier wrote:
>
> On Wed, Aug 11, 2021 at 11:23:52PM +, Bossart, Nathan wrote:
> > I think BootstrapModeMain() makes the most sense. It fits in nicely
> > with the --check logic that's already there. With v3, the following
> > command can be used to retrieve the amount of shared memory required.
> >
> > postgres --output-shmem -D dir
> >
> > While testing this new option, I noticed that you can achieve similar
> > results today with the following command, although this one will
> > actually try to create the shared memory, too.
>
> That may not be the best option.

I would say that can be a disastrous option. First of all, it would
probably not work if you already have something running -- especially
when using huge pages. And if it does work, in that or other scenarios,
it can potentially have a significant impact on a running cluster to
suddenly allocate many GB of additional memory...

> > IMO the new option is still handy, but I can see the argument that it
> > might not be necessary.
>
> A separate option looks handy. Wouldn't it be better to document it
> in postgres-ref.sgml then?

I'd say a lot more than just handy. I don't think the workaround is
really all that useful.

(I haven't looked at the actual patch yet, just commenting on the
principle.)

-- 
Magnus Hagander
Me: https://www.hagander.net/
Work: https://www.redpill-linpro.com/
Re: Avoid repeated PQfnumber() in pg_dump
> On 23 Jul 2021, at 01:39, Bossart, Nathan wrote: > The patch looks good to me. I am marking it as ready-for-committer. I took another look at this today and pushed it after verifying with a pgindent run. Thanks! -- Daniel Gustafsson https://vmware.com/
Re: [PATCH] Allow multiple recursive self-references
I tested the following query (from SQLite documentation):

CREATE TABLE edge(aa INT, bb INT);

WITH RECURSIVE nodes(x) AS (
    SELECT 59
    UNION
    SELECT aa FROM edge JOIN nodes ON bb=x
    UNION
    SELECT bb FROM edge JOIN nodes ON aa=x
)
SELECT x FROM nodes;

ERROR:  42P19: recursive reference to query "nodes" must not appear within its non-recursive term
LINE 4:     SELECT aa FROM edge JOIN nodes ON bb=x
                           ^
LOCATION:  checkWellFormedRecursionWalker, parse_cte.c:960

This well-formedness check apparently needs to be enhanced to allow for more than two branches in the union.
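For comparison, the same traversal can be written today with a single recursive reference (an illustrative workaround under the current restriction, not part of the patch):

WITH RECURSIVE nodes(x) AS (
    SELECT 59
    UNION
    SELECT v.x
    FROM edge e
         JOIN nodes n ON n.x IN (e.aa, e.bb)
         CROSS JOIN LATERAL (VALUES (e.aa), (e.bb)) AS v(x)
)
SELECT x FROM nodes;

The point of the patch is that users coming from SQLite should not have to perform this rewrite by hand.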
Re: [PATCH] Add OAUTH2_SCOPE variable for scope configuration
> On 27 Aug 2021, at 14:15, Nico Rikken wrote: > I haven't yet tested this, as I'm still in the process of setting up a local > development environment. I hope somebody else here can help me with the > quality > assurance. This is the mailing list for the core Postgres server; for pgAdmin development please see the below URL for an appropriate list: https://www.pgadmin.org/support/list/ I'm sure someone there will be able to help. -- Daniel Gustafsson https://vmware.com/
[PATCH] Add OAUTH2_SCOPE variable for scope configuration
In certain cases, like with OpenID Connect, a different scope is needed. This patch adds an additional variable `OAUTH2_SCOPE` that can be used to configure the appropriate scope for the deployment. There are already runtime checks to ensure that the email claim is included in the user profile, so there is no need for similar checks on the configuration. This commit does introduce a check in oauth2.py for whether a value for OAUTH2_SCOPE is set, to prevent a breaking change.

Related issue: https://redmine.postgresql.org/issues/6627
OIDC docs: https://openid.net/specs/openid-connect-core-1_0.html#ScopeClaims

I haven't yet tested this, as I'm still in the process of setting up a local development environment. I hope somebody else here can help me with the quality assurance.

Signed-off-by: Nico Rikken
---
 docs/en_US/oauth2.rst                                 | 1 +
 web/config.py                                         | 3 +++
 web/pgadmin/authenticate/oauth2.py                    | 6 +-
 web/pgadmin/browser/tests/test_oauth2_with_mocking.py | 1 +
 4 files changed, 10 insertions(+), 1 deletion(-)

diff --git a/docs/en_US/oauth2.rst b/docs/en_US/oauth2.rst
index 8947b509e..4cc2628f5 100644
--- a/docs/en_US/oauth2.rst
+++ b/docs/en_US/oauth2.rst
@@ -30,6 +30,7 @@ and modify the values for the following parameters:
     "OAUTH2_AUTHORIZATION_URL", "Endpoint for user authorization"
     "OAUTH2_API_BASE_URL", "Oauth2 base URL endpoint to make requests simple, ex: *https://api.github.com/*"
     "OAUTH2_USERINFO_ENDPOINT", "User Endpoint, ex: *user* (for github) and *userinfo* (for google)"
+    "OAUTH2_SCOPE", "Oauth scope, ex: 'openid email profile'. Note that an 'email' claim is required in the resulting profile."
     "OAUTH2_ICON", "The Font-awesome icon to be placed on the oauth2 button, ex: fa-github"
     "OAUTH2_BUTTON_COLOR", "Oauth2 button color"
     "OAUTH2_AUTO_CREATE_USER", "Set the value to *True* if you want to automatically
diff --git a/web/config.py b/web/config.py
index d797e26f7..e932d17fc 100644
--- a/web/config.py
+++ b/web/config.py
@@ -711,6 +711,9 @@ OAUTH2_CONFIG = [
     # Name of the Endpoint, ex: user
     'OAUTH2_USERINFO_ENDPOINT': None,
     # Font-awesome icon, ex: fa-github
+    'OAUTH2_SCOPE': None,
+    # Oauth scope, ex: 'openid email profile'
+    # Note that an 'email' claim is required in the resulting profile
     'OAUTH2_ICON': None,
     # UI button colour, ex: #ff
     'OAUTH2_BUTTON_COLOR': None,
diff --git a/web/pgadmin/authenticate/oauth2.py b/web/pgadmin/authenticate/oauth2.py
index 91903165a..5e60d35dd 100644
--- a/web/pgadmin/authenticate/oauth2.py
+++ b/web/pgadmin/authenticate/oauth2.py
@@ -104,7 +104,11 @@ class OAuth2Authentication(BaseAuthentication):
             access_token_url=oauth2_config['OAUTH2_TOKEN_URL'],
             authorize_url=oauth2_config['OAUTH2_AUTHORIZATION_URL'],
             api_base_url=oauth2_config['OAUTH2_API_BASE_URL'],
-            client_kwargs={'scope': 'email profile'}
+            # Resort to previously hardcoded scope 'email profile' in case
+            # no OAUTH2_SCOPE is provided. This prevents a breaking change.
+            client_kwargs={'scope':
+                           oauth2_config.get('OAUTH2_SCOPE',
+                                             'email profile')}
         )

     def get_source_name(self):
diff --git a/web/pgadmin/browser/tests/test_oauth2_with_mocking.py b/web/pgadmin/browser/tests/test_oauth2_with_mocking.py
index b170720a8..71706ebe6 100644
--- a/web/pgadmin/browser/tests/test_oauth2_with_mocking.py
+++ b/web/pgadmin/browser/tests/test_oauth2_with_mocking.py
@@ -58,6 +58,7 @@ class Oauth2LoginMockTestCase(BaseTestGenerator):
                 'https://github.com/login/oauth/authorize',
             'OAUTH2_API_BASE_URL': 'https://api.github.com/',
             'OAUTH2_USERINFO_ENDPOINT': 'user',
+            'OAUTH2_SCOPE': 'email profile',
             'OAUTH2_ICON': 'fa-github',
             'OAUTH2_BUTTON_COLOR': '#3253a8',
         }
--
2.25.1
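By way of illustration, a deployment would then set something like the following in config_local.py (everything below is a made-up example provider; only OAUTH2_SCOPE is the new knob from this patch):

OAUTH2_CONFIG = [{
    'OAUTH2_NAME': 'example-oidc',
    'OAUTH2_DISPLAY_NAME': 'Example OIDC',
    'OAUTH2_CLIENT_ID': 'pgadmin',
    'OAUTH2_CLIENT_SECRET': 'change-me',
    'OAUTH2_TOKEN_URL': 'https://idp.example.com/token',
    'OAUTH2_AUTHORIZATION_URL': 'https://idp.example.com/authorize',
    'OAUTH2_API_BASE_URL': 'https://idp.example.com/',
    'OAUTH2_USERINFO_ENDPOINT': 'userinfo',
    'OAUTH2_SCOPE': 'openid email profile',  # new in this patch
}]

Omitting OAUTH2_SCOPE would keep the old hardcoded 'email profile' behavior, per the .get() fallback above.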
Re: Supply restore_command to pg_rewind via CLI argument
On Fri, Aug 27, 2021 at 10:05 AM Andrey Borodin wrote:
> There is a small bug
> +	/*
> +	 * Take restore_command from the postgresql.conf only if it is not already
> +	 * provided as a command line option.
> +	 */
> +	if (!restore_wal && restore_command == NULL)
> 		return;
>
> I think we should use condition (!restore_wal || restore_command != NULL).

Yes, you are right, thanks. Attached is a fixed version. Tests were passing since PostgresNode->enable_restoring is adding restore_command to the postgresql.conf anyway.

> Besides this patch looks good and is ready for committer IMV.

--
Alexey Kondratov

v2-0001-Allow-providing-restore_command-as-a-command-line.patch
Description: Binary data
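Spelled out, the corrected early return reads as follows (a sketch; the surrounding pg_rewind function context is assumed):

	/*
	 * Look up restore_command in postgresql.conf only when WAL restore was
	 * requested and no command was given on the command line.
	 */
	if (!restore_wal || restore_command != NULL)
		return;

That way -C/--target-restore-command takes precedence over any value found in the configuration file.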
Re: UNIQUE null treatment option
On Fri, Aug 27, 2021 at 3:38 PM Peter Eisentraut < peter.eisentr...@enterprisedb.com> wrote: > In the SQL:202x draft, this > has been formalized by making this implementation-defined and adding > an option on unique constraint definitions UNIQUE [ NULLS [NOT] > DISTINCT ] to choose a behavior explicitly. > > The CREATE UNIQUE INDEX syntax extension is not from the standard, > it's my own invention. > For the unique index syntax, should this be selectable per column/expression, rather than for the entire index as a whole? .m
Re: Added schema level support for publication.
On Fri, Aug 27, 2021 at 4:57 PM Amit Kapila wrote: > > On Fri, Aug 27, 2021 at 11:43 AM vignesh C wrote: > > > > On Wed, Aug 25, 2021 at 3:07 PM vignesh C wrote: > > > > I have implemented this in the 0003 patch, I have kept it separate to > > reduce the testing effort and also it will be easier if someone > > disagrees with the syntax. I will merge it to the main patch later > > based on the feedback. Attached v22 patch has the changes for the > > same. > > > > Few comments on v22-0001-Added-schema-level-support-for-publication: > > 1. Why in publication_add_schema(), you are registering invalidation > for all schema relations? It seems this is to allow rebuilding the > publication info for decoding sessions. But that is not required as > you are registering rel_sync_cache_publication_cb for the > publication_schema relation. In rel_sync_cache_publication_cb, we are > marking replicate_valid as false for each entry, which will allow the > publication info to be rebuilt in get_rel_sync_entry. > > I see that it is done for a single relation in the current code in > function publication_add_relation, but I guess that is also not > required. You can test this once to confirm. If you still think it is > required, can you please explain to me why, and then we can add some > comments in the patch accordingly.

I felt this is required for handling the following concurrency scenario:

create schema sch1;
create table sch1.t1(c1 int);
insert into sch1.t1 values(10);
update sch1.t1 set c1 = 11;
# The update will be successful, and the relation cache will record the
# publication actions based on the current state, i.e. no publication.

create publication pub1 for all tables in schema sch1;
# After the above publication is created, the relations present in this
# schema should be invalidated so that the next update fails. If the
# relations are not invalidated, the updates will be successful based on
# the previous publication actions.

update sch1.t1 set c1 = 11;

I will add comments to mention the above details. Thoughts?

> Peter E., Sawada-San, can you please let me know if I am missing > something in this regard? In the code, I see a comment "/* Invalidate > relcache so that publication info is rebuilt. */" in function > publication_add_relation(), but given my explanation above, I don't > understand why it is required.
>
> 2.
> + * Publication object type
> + */
> +typedef enum PublicationObjSpecType
> +{
> +	PUBLICATIONOBJ_TABLE,		/* Table type */
> +	PUBLICATIONOBJ_SCHEMA,		/* Schema type */
> +	PUBLICATIONOBJ_SEQUENCE,	/* Sequence type */
>
> Why add anything related to the sequence in this patch?

I will handle this in my next version.

> 3.
> +get_object_address_publication_schema(List *object, bool missing_ok)
> +{
> +	ObjectAddress address;
> +	char	   *pubname;
> +	Publication *pub;
> +	char	   *schemaname;
> +	Oid			schemaoid;
> +
> +	ObjectAddressSet(address, PublicationSchemaRelationId, InvalidOid);
> +
> +	/* Fetch schema name and publication name from input list */
> +	schemaname = strVal(linitial(object));
> +	pubname = strVal(lsecond(object));
> +
> +	schemaoid = get_namespace_oid(schemaname, false);
> +
> +	/* Now look up the pg_publication tuple */
> +	pub = GetPublicationByName(pubname, missing_ok);
> +	if (!pub)
> +		return address;
>
> Why is the schema name handling different from the publication name > handling? Why can't we pass missing_ok to the schema API and handle it > similarly to the publication API?

I will handle this in my next version.

> 4. In getPublicationSchemaInfo(), why is the missing_ok flag not used > in get_publication_name() whereas it is used for all other syscache > searches in that function?

I will handle this in my next version.

> 5. Don't we need to expose a view for publication schemas similar to > pg_publication_tables?

pg_publication_tables is a common view for "FOR TABLE", "FOR ALL TABLES" and "FOR ALL TABLES IN SCHEMA"; this view will internally access pg_publication_rel and pg_publication_schema to get the corresponding tables. I felt we don't need a separate view for publication schemas. Thoughts?

> 6.
> publication_add_schema()
> {
> ..
> +	/* Can't be system namespace */
> +	if (IsCatalogNamespace(schemaoid) || IsToastNamespace(schemaoid))
> +		ereport(ERROR,
> +				(errcode(ERRCODE_INVALID_PARAMETER_VALUE),
> +				 errmsg("\"%s\" is a system schema",
> +						get_namespace_name(schemaoid)),
> +				 errdetail("System schemas cannot be added to publications.")));
> +
> +	/* Can't be temporary namespace */
> +	if (isAnyTempNamespace(schemaoid))
> +		ereport(ERROR,
> +				(errcode(ERRCODE_INVALID_PARAMETER_VALUE),
> +				 errmsg("\"%s\" is a temporary schema",
> +						get_namespace_name(schemaoid)),
> +				 errdetail("Temporary schemas cannot be added to publications.")));
> ..
> }
>
> Can we change the first detail message to: "This operation is not > supported for system schemas." and the second detail message to: > "Temporary schemas cannot be replicated."? This is to make these > messages similar to c
UNIQUE null treatment option
The SQL standard has been ambiguous about whether null values in unique constraints should be considered equal or not. Different implementations have different behaviors. In the SQL:202x draft, this has been formalized by making this implementation-defined and adding an option on unique constraint definitions UNIQUE [ NULLS [NOT] DISTINCT ] to choose a behavior explicitly. This patch adds this option to PostgreSQL. The default behavior remains UNIQUE NULLS DISTINCT. Making this happen in the btree code is apparently pretty easy; most of the patch is just to carry the flag around to all the places that need it. The CREATE UNIQUE INDEX syntax extension is not from the standard, it's my own invention. (I named all the internal flags, catalog columns, etc. in the negative ("nulls not distinct") so that the default PostgreSQL behavior is the default if the flag is false. But perhaps the double negatives make some code harder to read.) From 14bd23b4f164c4298262e7fbfec1a49292d16e27 Mon Sep 17 00:00:00 2001 From: Peter Eisentraut Date: Fri, 27 Aug 2021 14:31:46 +0200 Subject: [PATCH] Add UNIQUE null treatment option The SQL standard has been ambiguous about whether null values in unique constraints should be considered equal or not. Different implementations have different behaviors. In the SQL:202x draft, this has been formalized by making this implementation-defined and adding an option on unique constraint definitions UNIQUE [ NULLS [NOT] DISTINCT ] to choose a behavior explicitly. This patch adds this option to PostgreSQL. The default behavior remains UNIQUE NULLS DISTINCT. Making this happen in the btree code is pretty easy; most of the patch is just to carry the flag around to all the places that need it. The CREATE UNIQUE INDEX syntax extension is not from the standard, it's my own invention. XXX I named all the internal flags, catalog columns, etc. in the negative ("nulls not distinct") so that the default PostgreSQL behavior is the default if the flag is false. But perhaps the double negatives make some code harder to read. 
--- doc/src/sgml/catalogs.sgml | 13 ++ doc/src/sgml/ddl.sgml | 29 +--- doc/src/sgml/information_schema.sgml | 12 + doc/src/sgml/ref/alter_table.sgml | 4 +- doc/src/sgml/ref/create_index.sgml | 13 ++ doc/src/sgml/ref/create_table.sgml | 11 ++--- src/backend/access/nbtree/nbtinsert.c | 10 ++--- src/backend/access/nbtree/nbtsort.c| 15 ++- src/backend/catalog/index.c| 7 +++ src/backend/catalog/information_schema.sql | 9 +++- src/backend/catalog/sql_features.txt | 1 + src/backend/catalog/toasting.c | 1 + src/backend/commands/indexcmds.c | 3 +- src/backend/nodes/copyfuncs.c | 2 + src/backend/nodes/equalfuncs.c | 2 + src/backend/nodes/makefuncs.c | 3 +- src/backend/nodes/outfuncs.c | 2 + src/backend/parser/gram.y | 47 src/backend/parser/parse_utilcmd.c | 3 ++ src/backend/utils/adt/ruleutils.c | 23 +++--- src/backend/utils/cache/relcache.c | 1 + src/backend/utils/sort/tuplesort.c | 8 +++- src/bin/pg_dump/pg_dump.c | 9 +++- src/bin/pg_dump/pg_dump.h | 1 + src/bin/psql/describe.c| 30 ++--- src/include/catalog/pg_index.h | 1 + src/include/nodes/execnodes.h | 1 + src/include/nodes/makefuncs.h | 2 +- src/include/nodes/parsenodes.h | 2 + src/include/utils/tuplesort.h | 1 + src/test/regress/expected/create_index.out | 51 ++ src/test/regress/input/constraints.source | 14 ++ src/test/regress/output/constraints.source | 23 ++ src/test/regress/sql/create_index.sql | 36 +++ 34 files changed, 332 insertions(+), 58 deletions(-) diff --git a/doc/src/sgml/catalogs.sgml b/doc/src/sgml/catalogs.sgml index 2b2c70a26e..fd49738d4f 100644 --- a/doc/src/sgml/catalogs.sgml +++ b/doc/src/sgml/catalogs.sgml @@ -4256,6 +4256,19 @@ pg_index Columns + + + indnullsnotdistinct bool + + + This value is only used for unique indexes. If false, this unique + index will consider null values distinct (so the index can contain + multiple null values in a column, the default PostgreSQL behavior). If + it is true, it will consider null values to be equal (so the index can + only contain one null value in a column). + + + indisprimary bool diff --git a/doc/src/sgml/ddl.sgml b/doc/src/sgml/ddl.sgml index e0ffb020bf..815a2e23f9 100644 --- a/doc/src/sgml/ddl.sgml +++ b/doc/src/sgml/ddl.sgml @@ -759,14 +759,33 @@ Unique Constraints In general, a unique constraint is violated if there is more than on
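To make the proposed behavior concrete, a session under the patch might look like this (a sketch using the syntax described above, not output copied from the patch's regression tests):

-- New: treat nulls as equal in the unique constraint
CREATE TABLE t (a int UNIQUE NULLS NOT DISTINCT);
INSERT INTO t VALUES (NULL);   -- ok
INSERT INTO t VALUES (NULL);   -- now a unique violation

-- The default remains the historical behavior
CREATE TABLE u (a int UNIQUE);           -- i.e. UNIQUE NULLS DISTINCT
INSERT INTO u VALUES (NULL), (NULL);     -- both rows accepted

-- The (non-standard) index-level spelling
CREATE TABLE v (a int);
CREATE UNIQUE INDEX v_a_idx ON v (a) NULLS NOT DISTINCT;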
Re: Queries that should be canceled will get stuck on secure_write function
On Tue, Aug 24, 2021 at 9:58 PM Fujii Masao wrote: > I was thinking that it's valid even if secure_write() doesn't react to > pg_cancel_backend() because it's basically called outside transaction block, > i.e., there is no longer transaction to cancel in that case. But there can be > some cases where secure_write() is called inside transaction block, > for example, when the query generates NOTICE message. In these cases, > secure_write() might need to react to the cancel request. Yeah. I think we could also be sending tuple data. > BTW, when an error happens, I found that a backend calls EmitErrorReport() > to report an error to a client, and then calls AbortCurrentTransaction() > to abort the transaction. If secure_write() called by EmitErrorReport() > gets stuck, a backend gets stuck inside transaction block and the locks > keep being held unnecessarily. Isn't this problematic? Can we change > the order of them? I think there might be problems with that, like perhaps the ErrorData object can have pointers into the memory contexts that we'd be destroying in AbortCurrentTransaction(). More generally, I think it's hopeless to try to improve the situation very much by changing the place where secure_write() happens. It happens in so many different places, and it is clearly not going to be possible to make it so that in none of those places do we hold any important server resources. Therefore I think the only solution is to fix secure_write() itself ... and the trick is what to do there given that we have to be very careful not to do anything that might try to write another message while we are already in the middle of writing a message. -- Robert Haas EDB: http://www.enterprisedb.com
Re: Added schema level support for publication.
On Fri, Aug 27, 2021 at 11:43 AM vignesh C wrote: > > On Wed, Aug 25, 2021 at 3:07 PM vignesh C wrote: > > I have implemented this in the 0003 patch, I have kept it separate to > reduce the testing effort and also it will be easier if someone > disagrees with the syntax. I will merge it to the main patch later > based on the feedback. Attached v22 patch has the changes for the > same. >

Few comments on v22-0001-Added-schema-level-support-for-publication:

1. Why in publication_add_schema(), you are registering invalidation for all schema relations? It seems this is to allow rebuilding the publication info for decoding sessions. But that is not required as you are registering rel_sync_cache_publication_cb for the publication_schema relation. In rel_sync_cache_publication_cb, we are marking replicate_valid as false for each entry, which will allow the publication info to be rebuilt in get_rel_sync_entry.

I see that it is done for a single relation in the current code in function publication_add_relation, but I guess that is also not required. You can test this once to confirm. If you still think it is required, can you please explain to me why, and then we can add some comments in the patch accordingly.

Peter E., Sawada-San, can you please let me know if I am missing something in this regard? In the code, I see a comment "/* Invalidate relcache so that publication info is rebuilt. */" in function publication_add_relation(), but given my explanation above, I don't understand why it is required.

2.
+ * Publication object type
+ */
+typedef enum PublicationObjSpecType
+{
+	PUBLICATIONOBJ_TABLE,		/* Table type */
+	PUBLICATIONOBJ_SCHEMA,		/* Schema type */
+	PUBLICATIONOBJ_SEQUENCE,	/* Sequence type */

Why add anything related to the sequence in this patch?

3.
+get_object_address_publication_schema(List *object, bool missing_ok)
+{
+	ObjectAddress address;
+	char	   *pubname;
+	Publication *pub;
+	char	   *schemaname;
+	Oid			schemaoid;
+
+	ObjectAddressSet(address, PublicationSchemaRelationId, InvalidOid);
+
+	/* Fetch schema name and publication name from input list */
+	schemaname = strVal(linitial(object));
+	pubname = strVal(lsecond(object));
+
+	schemaoid = get_namespace_oid(schemaname, false);
+
+	/* Now look up the pg_publication tuple */
+	pub = GetPublicationByName(pubname, missing_ok);
+	if (!pub)
+		return address;

Why is the schema name handling different from the publication name handling? Why can't we pass missing_ok to the schema API and handle it similarly to the publication API?

4. In getPublicationSchemaInfo(), why is the missing_ok flag not used in get_publication_name() whereas it is used for all other syscache searches in that function?

5. Don't we need to expose a view for publication schemas similar to pg_publication_tables?

6.
publication_add_schema()
{
..
+	/* Can't be system namespace */
+	if (IsCatalogNamespace(schemaoid) || IsToastNamespace(schemaoid))
+		ereport(ERROR,
+				(errcode(ERRCODE_INVALID_PARAMETER_VALUE),
+				 errmsg("\"%s\" is a system schema",
+						get_namespace_name(schemaoid)),
+				 errdetail("System schemas cannot be added to publications.")));
+
+	/* Can't be temporary namespace */
+	if (isAnyTempNamespace(schemaoid))
+		ereport(ERROR,
+				(errcode(ERRCODE_INVALID_PARAMETER_VALUE),
+				 errmsg("\"%s\" is a temporary schema",
+						get_namespace_name(schemaoid)),
+				 errdetail("Temporary schemas cannot be added to publications.")));
..
}

Can we change the first detail message to: "This operation is not supported for system schemas." and the second detail message to: "Temporary schemas cannot be replicated."? This is to make these messages similar to the corresponding messages for relations in function check_publication_add_relation(). Can we move these checks to a separate function?

--
With Regards,
Amit Kapila.
Re: Skipping logical replication transactions on subscriber side
On Fri, Aug 27, 2021 at 1:36 PM Amit Kapila wrote: > > On Thu, Aug 26, 2021 at 6:24 PM Masahiko Sawada wrote: > > > > On Thu, Aug 26, 2021 at 9:11 PM Amit Kapila wrote: > > > > > > > > > Okay, changed accordingly. Additionally, I have changed the code which > > > sets timestamp to (unset) when it is 0 so that it won't display the > > > timestamp in that case. I have made few other cosmetic changes in the > > > attached patch. See and let me know what you think of it? > > > > Thank you for the patch! > > > > Agreed with these changes. The patch looks good to me. > > > > Pushed, feel free to rebase and send the remaining patch set. Thanks! I'll post the updated version patch on Monday. Regards, -- Masahiko Sawada EDB: https://www.enterprisedb.com/
Re: Failure of subscription tests with topminnow
On Fri, Aug 27, 2021 at 3:29 PM Amit Kapila wrote: > > > I think the fix is correct but similar changes are required in > 022_twophase_cascade.pl as well (search for $oldpid in tests). I am > not completely sure but I think it is better to make this test change > in back branches as well to make it stable and reduce such random > build farm failures. I have made the changes in 022_twophase_cascade.pl for HEAD as well as patches for older branches. regards, Ajin Cherian Fujitsu Australia HEAD-v2-0001-fix-for-tap-test-failure-seen-in-001_rep_changes.patch Description: Binary data REL-13-STABLE-v1-0001-fix-for-tap-test-failure-seen-in-001_rep_changes.patch Description: Binary data REL-12-STABLE-v1-0001-fix-for-tap-test-failure-seen-in-001_rep_changes.patch Description: Binary data REL-10-STABLE-v1-0001-fix-for-tap-test-failure-seen-in-001_rep_changes.patch Description: Binary data REL-11-STABLE-v1-0001-fix-for-tap-test-failure-seen-in-001_rep_changes.patch Description: Binary data REL-14-STABLE-v1-0001-fix-for-tap-test-failure-seen-in-001_rep_changes.patch Description: Binary data
Re: Next Steps with Hash Indexes
On Fri, Aug 13, 2021 at 11:40 AM Dilip Kumar wrote: > > On Fri, Aug 13, 2021 at 9:31 AM Amit Kapila wrote: > > > > On Thu, Aug 12, 2021 at 8:30 PM Robert Haas wrote: > > > > > > On Thu, Aug 12, 2021 at 12:22 AM Amit Kapila > > > wrote: > > > > The design of the patch has changed since the initial proposal. It > > > > tries to perform unique inserts by holding a write lock on the bucket > > > > page to avoid duplicate inserts. > > > > > > Do you mean that you're holding a buffer lwlock while you search the > > > whole bucket? If so, that's surely not OK. > > > > > > > I think here you are worried that after holding lwlock we might > > perform reads of overflow pages which is not a good idea. I think > > there are fewer chances of having overflow pages for unique indexes so > > we don't expect such cases in common > > I think if we identify the bucket based on the hash value of all the > columns then there should be fewer overflow buckets, but IIUC, in > this patch the bucket is identified based on the hash value of the first > column only, so there could be a lot of duplicates on the first column.

IMHO, as discussed above, other databases also have the limitation that if you create a multi-column hash index, the index cannot be used unless all the key columns are used in the search condition. So my point is that users might be using the hash index with this limitation, and their use case might be that they want the best performance in this particular case; they might not be looking for the kind of flexibility we provide in btree. For reference:
https://dev.mysql.com/doc/refman/8.0/en/index-btree-hash.html
https://docs.microsoft.com/en-us/sql/relational-databases/in-memory-oltp/indexes-for-memory-optimized-tables?view=sql-server-ver15

We already know that performance will be better with a single hash for multiple columns, but I still wanted to check the performance difference in PG. This might help us decide the approach we need to go for. With a quick POC of both ideas, I have observed a major performance advantage with a single combined hash for multi-key columns.

Performance Test Details: (Used PGBENCH Tool)
Initialize cmd: "./pgbench -i -s 100 -d postgres"

postgres=# \d+ pgbench_accounts
                                Table "public.pgbench_accounts"
  Column  |     Type      | Collation | Nullable | Default | Storage  | Compression | Stats target | Description
----------+---------------+-----------+----------+---------+----------+-------------+--------------+-------------
 aid      | integer       |           | not null |         | plain    |             |              |
 bid      | integer       |           |          |         | plain    |             |              |
 abalance | integer       |           |          |         | plain    |             |              |
 filler   | character(84) |           |          |         | extended |             |              |
Indexes:
    "pgbench_accounts_idx" hash (aid, bid)
Access method: heap
Options: fillfactor=100

Test Command: "./pgbench -j 1 postgres -C -M prepared -S -T 300"

Performance Test Results:

Idea-1: Single hash value for multiple key columns
TPS = ~292

Idea-2: Separate hash values for each key column, using only the first one to search the bucket; the other hash values are used as payload to get to the matching tuple before going to the heap.
TPS = ~212

Note: We got nearly 25% better performance with the single combined hash approach, with only TWO key columns. If we go for separate hash values for all the key columns, there will be a performance dip, and storage will also be relatively higher as the number of key columns grows.

These are just separate POC patches done to get the performance results above; many other scenarios, like the split case, still need to be taken care of.
Attaching the POC patches here just for reference… Thanks & Regards SadhuPrasad EnterpriseDB: http://www.enterprisedb.com Single-Hash_MultiKeyColumns.patch Description: Binary data Separate-Hash_MultiKeyColumns.patch Description: Binary data
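For readers following along, the restriction being weighed here would look roughly like this (hypothetical planner behavior under the multi-column hash patch; stock PostgreSQL does not accept multi-column hash indexes at all):

-- Multi-column hash index from the test setup above (patch required)
CREATE INDEX pgbench_accounts_idx ON pgbench_accounts USING hash (aid, bid);

-- With the single-combined-hash design, only queries that constrain
-- every key column can use the index:
SELECT abalance FROM pgbench_accounts WHERE aid = 1 AND bid = 1;  -- index usable
SELECT abalance FROM pgbench_accounts WHERE aid = 1;              -- no index use

This mirrors the MySQL and SQL Server hash-index behavior linked above.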
Re: [PATCH] Tab completion for ALTER TABLE … ADD …
Michael Paquier writes: > On Tue, Aug 03, 2021 at 12:48:38PM +0100, Dagfinn Ilmari Mannsåker wrote: >> The other day I noticed that there's no tab completion after ALTER TABLE >> … ADD, so here's a patch. In addition to COLUMN and all the table >> constraint types, it also completes with the list of unique indexes on >> the table after ALTER TABLE … ADD … USING INDEX. > > I was reading this patch (not actually tested), and that's a clear > improvement. One extra thing that could be done here is to complete > with types for an ALTER TABLE ADD COLUMN foo. That was easy enough to add (just a bit of extra fiddling to handle COLUMN being optional), done in the attached v2 patch. > We could as well have a list of columns after UNIQUE or PRIMARY KEY, > but that feels like extra cream on top of the cake. Doing a list of arbitrarily many comma-separated names is more complicated, so that can be the subject for another patch. > In short I am fine with what you have here. Thanks for the review.

- ilmari

>From 0bdf91f2a66bf9393e6900db904bda1961c03350 Mon Sep 17 00:00:00 2001
From: =?UTF-8?q?Dagfinn=20Ilmari=20Manns=C3=A5ker?=
Date: Tue, 3 Aug 2021 12:23:07 +0100
Subject: [PATCH v2] =?UTF-8?q?Add=20tab=20completion=20for=20ALTER=20TABLE?=
 =?UTF-8?q?=20=E2=80=A6=20ADD=20=E2=80=A6?=
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

Complete with COLUMN plus the various table constraint types, list
unique indexes on the table after ADD (UNIQUE|PRIMARY KEY) USING INDEX,
and data types after ADD [COLUMN].
---
 src/bin/psql/tab-complete.c | 49 +
 1 file changed, 49 insertions(+)

diff --git a/src/bin/psql/tab-complete.c b/src/bin/psql/tab-complete.c
index 7cdfc7c637..bb7c3fc5cf 100644
--- a/src/bin/psql/tab-complete.c
+++ b/src/bin/psql/tab-complete.c
@@ -776,6 +776,10 @@ static const SchemaQuery Query_for_list_of_collations = {
 "   and pg_catalog.quote_ident(c1.relname)='%s'"\
 "   and pg_catalog.pg_table_is_visible(c2.oid)"

+#define Query_for_unique_index_of_table \
+Query_for_index_of_table \
+" and i.indisunique"
+
 /* the silly-looking length condition is just to eat up the current word */
 #define Query_for_constraint_of_table \
 "SELECT pg_catalog.quote_ident(conname) "\
@@ -2019,6 +2023,51 @@ psql_completion(const char *text, int start, int end)
 					  "OWNER TO", "SET", "VALIDATE CONSTRAINT", "REPLICA IDENTITY",
 					  "ATTACH PARTITION", "DETACH PARTITION", "FORCE ROW LEVEL SECURITY");
+	/* ALTER TABLE xxx ADD */
+	else if (Matches("ALTER", "TABLE", MatchAny, "ADD"))
+		COMPLETE_WITH("COLUMN", "CONSTRAINT", "CHECK", "UNIQUE", "PRIMARY KEY",
+					  "EXCLUDE", "FOREIGN KEY");
+	/* ALTER TABLE xxx ADD CONSTRAINT yyy */
+	else if (Matches("ALTER", "TABLE", MatchAny, "ADD", "CONSTRAINT", MatchAny))
+		COMPLETE_WITH("CHECK", "UNIQUE", "PRIMARY KEY", "EXCLUDE", "FOREIGN KEY");
+	/* ALTER TABLE xxx ADD (CONSTRAINT yyy)? FOREIGN|PRIMARY */
+	else if (Matches("ALTER", "TABLE", MatchAny, "ADD", "FOREIGN|PRIMARY") ||
+			 Matches("ALTER", "TABLE", MatchAny, "ADD", "CONSTRAINT", MatchAny, "FOREIGN|PRIMARY"))
+		COMPLETE_WITH("KEY");
+	/* ALTER TABLE xxx ADD (CONSTRAINT yyy)? (PRIMARY KEY|UNIQUE) */
+	else if (Matches("ALTER", "TABLE", MatchAny, "ADD", "PRIMARY", "KEY") ||
+			 Matches("ALTER", "TABLE", MatchAny, "ADD", "UNIQUE") ||
+			 Matches("ALTER", "TABLE", MatchAny, "ADD", "CONSTRAINT", MatchAny, "PRIMARY", "KEY") ||
+			 Matches("ALTER", "TABLE", MatchAny, "ADD", "CONSTRAINT", MatchAny, "UNIQUE"))
+		COMPLETE_WITH("(", "USING INDEX");
+	/* ALTER TABLE xxx ADD (CONSTRAINT yyy)? ... USING INDEX */
+	else if (Matches("ALTER", "TABLE", MatchAny, "ADD", "PRIMARY", "KEY", "USING", "INDEX"))
+	{
+		completion_info_charp = prev6_wd;
+		COMPLETE_WITH_QUERY(Query_for_unique_index_of_table);
+	}
+	else if (Matches("ALTER", "TABLE", MatchAny, "ADD", "UNIQUE", "USING", "INDEX"))
+	{
+		completion_info_charp = prev5_wd;
+		COMPLETE_WITH_QUERY(Query_for_unique_index_of_table);
+	}
+	else if (Matches("ALTER", "TABLE", MatchAny, "ADD", "CONSTRAINT", MatchAny,
+					 "PRIMARY", "KEY", "USING", "INDEX"))
+	{
+		completion_info_charp = prev8_wd;
+		COMPLETE_WITH_QUERY(Query_for_unique_index_of_table);
+	}
+	else if (Matches("ALTER", "TABLE", MatchAny, "ADD", "CONSTRAINT", MatchAny,
+					 "UNIQUE", "USING", "INDEX"))
+	{
+		completion_info_charp = prev7_wd;
+		COMPLETE_WITH_QUERY(Query_for_unique_index_of_table);
+	}
+	/* ADD column_name must be last of the ALTER TABLE xxx ADD matches */
+	else if (Matches("ALTER", "TABLE", MatchAny, "ADD", "COLUMN", MatchAny) ||
+			 (Matches("ALTER", "TABLE", MatchAny, "ADD", MatchAny) &&
+			  !Matches("ALTER", "TABLE", MatchAny, "ADD", "COLUMN")))
+		COMPLETE_WITH_SCHEMA_QUERY(Query_for_list_of_datatypes, NULL);
 	/* ALTER TABLE xxx ENABLE */
 	else if (Matches("ALTER", "TABLE", MatchAny, "ENABLE"))
 		COMPLETE_WITH("ALWAYS", "REPLICA", "ROW LEVEL SECURITY", "RULE",
--
2.30.2
Re: create table like: ACCESS METHOD
On 3/23/21 1:39 AM, Justin Pryzby wrote: > On Tue, Jan 19, 2021 at 03:03:31PM -0600, Justin Pryzby wrote: >> On Wed, Dec 30, 2020 at 12:33:56PM +, Simon Riggs wrote: >>> There are no tests for the new functionality, please could you add some? >> >> Did you look at the most recent patch? >> >> +CREATE ACCESS METHOD heapdup TYPE TABLE HANDLER heap_tableam_handler; >> +CREATE TABLE likeam() USING heapdup; >> +CREATE TABLE likeamlike(LIKE likeam INCLUDING ALL); >>
>> It seems like this should error to me:

CREATE ACCESS METHOD heapdup TYPE TABLE HANDLER heap_tableam_handler;
CREATE TABLE likeam1() USING heap;
CREATE TABLE likeam2() USING heapdup;

CREATE TABLE likeamlike(
    LIKE likeam1 INCLUDING ACCESS METHOD,
    LIKE likeam2 INCLUDING ACCESS METHOD
);

At the very least, the documentation should say that the last one wins.
--
Vik Fearing
Re: perlcritic: prohibit map and grep in void context
Michael Paquier writes: > On Wed, Jul 28, 2021 at 01:26:23PM +0200, Daniel Gustafsson wrote: >> I'm fine with increasing this policy, but I don't have strong feelings. If >> we >> feel the perlcritic policy change is too much, I would still prefer to go >> ahead >> with the map rewrite part of the patch though. > > I have no issue either about the rewrite part of the patch, so I'd > tend to just do this part and move on. Daniel, would you like to > apply that? Why the resistance to the perlcritic part? That one case is the only violation in the tree today, and it's a pattern we don't want to let back in (I will surely object every time I see it when reviewing patches), so why not automate it? - ilmari
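For the record, wiring the check in would presumably amount to something like this in src/tools/perlcheck/perlcriticrc (a sketch; the two policy names are real Perl::Critic core policies, but the severity bump mirroring how the tree promotes other policies is my assumption):

# prohibit map and grep in void context
[BuiltinFunctions::ProhibitVoidMap]
severity = 5

[BuiltinFunctions::ProhibitVoidGrep]
severity = 5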
Re: Mark all GUC variable as PGDLLIMPORT
On Fri, 27 Aug 2021 at 08:59, Julien Rouhaud wrote: > On Fri, Aug 27, 2021 at 3:42 AM Magnus Hagander > wrote: > > > > Yeah, but that does move the problem to the other side doesn't it? So > > if you (as a pure test of course) were to remove the variable > > completely from the included header and just declare it manually with > > PGDLLSPEC in your file, it should work? > > > > Ugly as it is, I wonder if there's a chance we could just process all > > the headers at install time and inject the PGDLLIMPORT. We know which > > symbols it is on account of what we're getting in the DEF file. > > > > Not saying that's not a very ugly solution, but it might work? > > It's apparently not enough. I tried with the autovacuum_max_workers GUC, > and it still errors out. > If I add a PGDLLIMPORT, there's a link error when trying to access the > variable: > error LNK2019: unresolved external symbol __imp_autovacuum_max_workers > referenced in function... > > If I use PGDLLEXPORT I simply get: > error LNK2001: unresolved external symbol autovacuum_max_workers

It works, but you can't use PGDLLIMPORT; you have to implement the import yourself without the help of __declspec(dllimport). Where you want autovacuum_max_workers you must instead write *((int*)__imp_autovacuum_max_workers)

Here's the comment I wrote on the topic in something I was working on:

/*
 * On Windows, a symbol is not accessible outside the executable or shared
 * library (PE object) that it is defined in unless explicitly exported in
 * the DLL interface.
 *
 * It must then be explicitly imported by objects that use it; Windows doesn't
 * do ELF-style fixups.
 *
 * The export part is usually accomplished by a __declspec(dllexport)
 * annotation on the symbol or a .DEF file supplied as linker input. Postgres
 * uses the .DEF approach, auto-exporting all symbols using
 * src\tools\msvc\gendef.pl . Internally this hides "symname" from the DLL
 * interface and instead generates an export symbol "__imp_symname" that is a
 * pointer to the value of "symname".
 *
 * The import part is usually done with the __declspec(dllimport) annotation on
 * the symbol. The PGDLLIMPORT macro expands to __declspec(dllimport) when
 * postgres headers are included during extension compilation. But not all the
 * symbols that pglogical needs are annotated with PGDLLIMPORT. Attempting to
 * access a symbol that is not so-annotated will fail at link time with an
 * error like
 *
 *     error LNK2001: unresolved external symbol criticalSharedRelcachesBuilt
 *
 * Because of gendefs.pl, postgres still exports the symbol even if it isn't
 * annotated PGDLLIMPORT. So we can just do the shorthand that
 * __declspec(dllimport) does for us in the preprocessor instead: replace each
 * symbol with its __imp_symbol indirection and dereference it.
 *
 * There's one wrinkle in this though. MSVC doesn't generate a definition for a
 * global data symbol that is neither initialized nor explicitly marked
 * __declspec(dllexport). gendefs.pl will think such symbols are references to
 * a symbol defined in another object file and will skip them without emitting
 * a DATA entry for them in the DEF file, so no __imp_ stub is generated in the
 * DLL interface. We can't use (*__imp_symbolname) if there's no import stub.
 *
 * If they're GUCs, we can round-trip them through their text values to read
 * them. Nothing should ever be assigning to GUC storage and there's no
 * reason to take the address of GUC storage, so this should work fine, albeit
 * slower. If we find any that aren't GUCs we're in trouble, but so far there
 * haven't been any.
 *
 * See also:
 *
 * - https://docs.microsoft.com/en-us/cpp/build/importing-data-using-declspec-dllimport
 * - https://docs.microsoft.com/en-us/cpp/build/importing-using-def-files
 * - https://docs.microsoft.com/en-us/cpp/build/exporting-from-a-dll-using-def-files
 * - https://docs.microsoft.com/en-us/cpp/build/determining-which-exporting-method-to-use
 */

This is gruesome and I hadn't planned to mention it, but now that someone has noticed that the .DEF file exports the symbols, I guess it does no harm. So can we just fix PGDLLIMPORT now?
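Put as code, the trick described in the comment amounts to this (Windows/MSVC only; the wrapper macro is an illustration, not from the original mail):

/*
 * The .DEF export generates an import stub __imp_<name> that points at the
 * backend's variable, so declare the stub ourselves and dereference it.
 */
extern int *__imp_autovacuum_max_workers;
#define autovacuum_max_workers_value (*__imp_autovacuum_max_workers)

Marking the variables PGDLLIMPORT in core would let extensions write plain autovacuum_max_workers and have the compiler do this for them.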
Re: Mark all GUC variable as PGDLLIMPORT
On Thu, 26 Aug 2021 at 01:51, Alvaro Herrera wrote: > On 2021-Aug-25, Magnus Hagander wrote: > > > The thing we need the PGDLLIMPORT definition for is to *import* them > > on the other end? > > Oh ... so modules that are willing to cheat can include their own > declarations of the variables they need, and mark them __declspec > (dllimport)? > > Damn. I was hoping nobody would notice that. I do exactly that in some extensions to work around some of this mess, but it is quite awkward and has its limitations.
Re: Mark all GUC variable as PGDLLIMPORT
On Wed, 25 Aug 2021 at 22:36, Magnus Hagander wrote: > On Wed, Aug 25, 2021 at 4:06 PM Robert Haas wrote: > > > > On Tue, Aug 24, 2021 at 5:06 PM Chapman Flack > wrote: > > > The thing is, I think I have somewhere a list of all the threads on > this > > > topic that I've read through since the first time I had to come with > my own > > > hat in hand asking for a PGDLLIMPORT on something, years ago now, and > > > I don't think I have ever seen one where it was as uncontroversial > > > as you suggest. > > > > It does tend to be controversial, but I think that's basically only > > because Tom Lane has reservations about it. I think if Tom dropped his > > opposition to this, nobody else would really care. And I think that > > would be a good thing for the project. > > > I have only one consideration about it, and that's a technical one :) > > Does this in some way have an effect on the size of the binary and/or > the time it takes to load it? > On *nix, no. On Windows, very, very minimally. We *should* be looking into making private symbols we can't make non-static have hidden visibility at link time, i.e. be DSO-private. This can have a huge impact on link-time optimisation and inlining. But doing so is quite orthogonal to the matter of fixing a linkage issue on Windows. By making select symbols hidden we'd be *reducing* the exposed set of functions and data symbols in a disciplined and progressive way on all platforms. Useful but different.
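For what it's worth, the toolchain mechanics being referred to look like this (a generic GCC/Clang sketch, not PostgreSQL code):

/* Build with -fvisibility=hidden so symbols default to DSO-private,
 * then re-export only the intended API surface: */
__attribute__((visibility("default"))) int exported_api(void);

static int file_private;    /* already invisible outside this file */
int dso_private __attribute__((visibility("hidden")));  /* DSO-private */

Hidden symbols can then be resolved, and potentially inlined, at link time without dynamic-relocation overhead, which is where the link-time-optimisation benefit mentioned above comes from.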
Re: Mark all GUC variable as PGDLLIMPORT
On Wed, 25 Aug 2021 at 03:13, Robert Haas wrote: > On Tue, Aug 24, 2021 at 2:52 PM Chapman Flack > wrote: > > I don't think that's true of the second proposal in [0]. I don't foresee > > a noticeable runtime cost unless there is a plausible workload that > > involves very frequent updates to GUC settings that are also of interest > > to a bunch of extensions. Maybe I'll take a stab at a POC. > > I'm not sure I fully understand that proposal, but I find it hard to > believe that we would seriously consider replacing every direct GUC > reference in the backend with something that goes through an API. Even > if didn't hurt performance, I think it would uglify the code a whole > lot. > It'd probably have to be something that resolves the GUC storage addresses at compile-time or once at runtime, if it's going to be used by core code. While some code doesn't hit a lot of GUCs, some *really* hammers some common GUCs. There are various issues with cache lines and pointer chasing that are beyond my low-level fu at work here. Adding a level of pointer indirection can be very expensive in the wrong situations. So you're probably looking at some kind of mess with token pasting, macros and static inlines. Ew.
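Purely to illustrate the kind of indirection being talked about (a sketch; guc_int_storage() is an invented lookup function, not an existing API):

static inline int
get_autovacuum_max_workers(void)
{
	static int *storage = NULL;		/* resolved once at runtime */

	if (storage == NULL)
		storage = guc_int_storage("autovacuum_max_workers");	/* hypothetical */
	return *storage;
}

Even this one-time resolution adds a branch and a pointer dereference to every read, which is exactly the cache-line and pointer-chasing cost that matters for hot GUCs.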
Re: Partition Check not updated when insert into a partition
On Thu, Aug 5, 2021 at 11:32 AM Amit Langote wrote: > On Wed, Jul 14, 2021 at 11:16 AM houzj.f...@fujitsu.com > > On Tuesday, July 13, 2021 2:52 AM Alvaro Herrera > > wrote: > > > Did you have a misbehaving test for the ATTACH case? > > > > Thanks for the response. > > Thank you both. > > > Yes, I think the following example of ATTACH doesn't work as expected. > > Yeah, need the fix for the ATTACH case too. > > Couple more things: > > * We must invalidate not just the "direct" partitions of the table > being attached/detached, but also any indirect ones, because all of > their partition constraints would need to contain (or no longer > contain) the root parent's partition constraint. > > * I think we should lock the partitions before sending the > invalidation. The ATTACH code already locks the descendents for a > different purpose, but DETACH doesn't, so the latter needs to be fixed > to match. > > I've updated Alvaro's patch to address these points. Maybe, we should > also add these cases to the regression and isolation suites? Apparently, I had posted a version of the patch that didn't even compile. I have fixed that in the attached and also added regression tests. Adding this to the next CF. -- Amit Langote EDB: http://www.enterprisedb.com v3-0001-Invalidate-partitions-of-table-being-attached-det.patch Description: Binary data
Re: Multi-Column List Partitioning
> + * isnulls is an array of boolean-tuples with key->partnatts booleans values > + * each. Currently only used for list partitioning, it stores whether a > > I think 'booleans' should be 'boolean'. > The trailing word 'each' is unnecessary. > bq. Supported new syantx to allow mentioning multiple key information. > > syantx -> syntax > + isDuplicate = checkForDuplicates(result, values); > + if (isDuplicate) > + continue; > > It seems the variable isDuplicate is not needed. The if statement can > directly check the return value from checkForDuplicates(). I agree that isDuplicate is not required. Thanks for sharing the comments. I will take care of these comments in the next patch. > + //TODO: Handle for multi-column cases > + for (j = 0; j < 1; j++) > > Is this part going to be updated in the next patch? Yes. The code changes related to partition-wise join are in progress. I will handle these in the next patch. Thanks & Regards, Nitin Jadhav On Thu, Aug 26, 2021 at 2:40 AM Zhihong Yu wrote: > > > > On Wed, Aug 25, 2021 at 5:41 AM Nitin Jadhav > wrote: >> >> > The new list bound binary search and related comparison support >> > function look a bit too verbose to me. I was expecting >> > partition_list_bsearch() to look very much like >> > partition_range_datum_bsearch(), but that is not the case. The >> > special case code that you wrote in partition_list_bsearch() seems >> > unnecessary, at least in that function. I'm talking about the code >> > fragment starting with this comment: >> > >> > I will look at other parts of the patch next week hopefully. For >> > now, attached is a delta patch that applies on top of your v1, which >> > does: >> > >> > * Simplify partition_list_bsearch() and partition_lbound_datum_cmp() >> > * Make qsort_partition_list_value_cmp simply call >> > partition_lbound_datum_cmp() instead of having its own logic to >> > compare input bounds >> > * Move partition_lbound_datum_cmp() into partbounds.c as a static >> > function (export seems unnecessary) >> > * Add a comment for PartitionBoundInfo.isnulls and remove that for >> > null_index >> >> Yes. You are right. The extra code added in partition_list_bsearch() >> is not required and thanks for sharing the delta patch. It looks good >> to me and I have incorporated the changes in the attached patch. >> >> > I guess you're perhaps trying to address the case where the caller >> > does not specify the values for all of the partition key columns, >> > which can happen when the partition pruning code needs to handle a set >> > of clauses matching only some of the partition key columns. But >> > that's a concern of the partition pruning code and so the special case >> > should be handled there (if at all), not in the binary search function >> > that is shared with other callers. Regarding that, I'm wondering if >> > we should require clauses matching all of the partition key columns to >> > be found for the pruning code to call the binary search, so do >> > something like get_matching_hash_bounds() does: >> > >> > Do you think that trying to match list partitions even with fewer keys >> > is worth the complexity of the implementation? That is, is the use >> > case to search for only a subset of partition key columns common >> > enough with list partitioning? >> > >> > If we do decide to implement the special case, remember that to do >> > that efficiently, we'd need to require that the subset of matched key >> > columns constitutes a prefix, because of the way the datums are >> > sorted. 
That is, match all partitions when the query only contains a >> > clause for b when the partition key is (a, b, c), but engage the >> > special case of pruning if the query contains clauses for a, or for a >> > and b. >> >> Thanks for the suggestion. Below is the implementation details for the >> partition pruning for multi column list partitioning. >> >> In the existing code (For single column list partitioning) >> 1. In gen_partprune_steps_internal(), we try to match the where >> clauses provided by the user with the partition key data using >> match_clause_to_partition_key(). Based on the match, this function can >> return many values like PARTCLAUSE_MATCH_CLAUSE, >> PARTCLAUSE_MATCH_NULLNESS, PARTCLAUSE_NOMATCH, etc. >> 2. In case of PARTCLAUSE_MATCH_CLAUSE, we generate steps using >> gen_prune_steps_from_opexps() (strategy-2) which generate and return a >> list of PartitionPruneStepOp that are based on OpExpr and BooleanTest >> clauses that have been matched to the partition key and it also takes >> care handling prefix of the partition keys. >> 3. In case of PARTCLAUSE_MATCH_NULLNESS, we generate steps using >> gen_prune_step_op() (strategy-1) which generates single >> PartitionPruneStepOp since the earlier list partitioning supports >> single column and there can be only one NULL value. In >> get_matching_list_bounds(), if the nullkeys is not empty, we fetch the >> partition index which accepts n
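Stepping back from the review details, the feature under discussion would let list partitioning key on several columns, roughly like this (hypothetical syntax from the patch series under review; the bound values are invented):

CREATE TABLE sales (region text, channel text, amount numeric)
    PARTITION BY LIST (region, channel);

CREATE TABLE sales_eu PARTITION OF sales
    FOR VALUES IN (('eu', 'web'), ('eu', 'phone'));

CREATE TABLE sales_us PARTITION OF sales
    FOR VALUES IN (('us', 'web'), ('us', 'phone'));

The pruning question above is then whether a query supplying only region (a prefix of the key) should still prune, or whether all key columns must be constrained.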
Re: Error code for checksum failure in origin.c
> On 27 Aug 2021, at 06:32, Amit Kapila wrote: > I think we need to backpatch this till 9.6 as this is introduced by > commit 5aa2350426. Any objections to that? No, that seems appropriate. -- Daniel Gustafsson https://vmware.com/
Re: perlcritic: prohibit map and grep in void context
> On 27 Aug 2021, at 08:10, Michael Paquier wrote: > > On Wed, Jul 28, 2021 at 01:26:23PM +0200, Daniel Gustafsson wrote: >> I'm fine with increasing this policy, but I don't have strong feelings. If >> we >> feel the perlcritic policy change is too much, I would still prefer to go >> ahead >> with the map rewrite part of the patch though. > > I have no issue either about the rewrite part of the patch, so I'd > tend to just do this part and move on. Daniel, would you like to > apply that? Sure, I can take care of that. -- Daniel Gustafsson https://vmware.com/
Re: Supply restore_command to pg_rewind via CLI argument
> On 29 June 2021, at 19:34, Alexey Kondratov wrote: > > On Fri, Jun 18, 2021 at 10:06 PM Alexey Kondratov wrote: >> On Fri, Jun 18, 2021 at 5:42 PM Andrey Borodin wrote: >>> >>> If we run 'pg_rewind --restore-target-wal' there must be restore_command in >>> config of target installation. But if the config is not within >>> $PGDATA\postgresql.conf pg_rewind cannot use it. >>> If we run postmaster with `-c >>> config_file=/etc/postgresql/10/data/postgresql.conf`, we simply cannot use >>> the feature. We solved the problem by putting config into PGDATA only >>> during pg_rewind, but this does not seem like a very robust solution. >>> >> >> Yeah, Michael was against it, while we had no good arguments, so >> Alexander removed it, IIRC. This example sounds reasonable to me. I >> also recall some complaints from PostgresPro support folks, that it is >> sad to not have a cli option to pass restore_command. However, I just >> thought about another recent feature --- ensure clean shutdown, which >> is turned on by default. So you usually run Postgres with one config, >> but pg_rewind may start it with another, although in single-user mode. >> Is it fine for you? We rewind a failed-over node, so clean shutdown was not performed. But I do not see how it could help anyway. To pass the restore command we had to set up a new config in PGDATA configured as standby, because either way we cannot set restore_command there. >>> Maybe we could add "-C, --target-restore-command=COMMAND target WAL >>> restore_command\n" as was proposed within earlier versions of the patch[0]? >>> Or instruct pg_rewind to pass config to 'postgres -C restore_command' run? >> >> Hm, adding --target-restore-command is the simplest way, sure, but >> forwarding something like '-c config_file=...' to postgres sounds >> interesting too. Could it have any use case beside providing a >> restore_command? I cannot imagine anything right now, but if any >> exist, then it could be a more universal approach. I think --target-restore-command is the best solution right now. >>> From my POV adding --target-restore-command is simplest way, I can extract >>> corresponding portions from original patch. >>> >> >> I will have a look, maybe I even already have this patch separately. I >> remember that we were considering adding this option to PostgresPro, >> when we did a backport of this feature. >> > > Here it is. I have slightly adapted the previous patch to the recent > pg_rewind changes. In this version -C does not conflict with -c, it > just overrides it.

Great, thanks!

There is a small bug:

+	/*
+	 * Take restore_command from the postgresql.conf only if it is not already
+	 * provided as a command line option.
+	 */
+	if (!restore_wal && restore_command == NULL)
 		return;

I think we should use the condition (!restore_wal || restore_command != NULL).

Besides this, the patch looks good and is ready for committer IMV.

Thanks!

Best regards, Andrey Borodin.