Re: ExecRTCheckPerms() and many prunable partitions

2022-11-29 Thread Alvaro Herrera
On 2022-Nov-29, Amit Langote wrote:

> Maybe, we should have the following there so that the PlannedStmt's
> contents don't point into the Query?
> 
> newperminfo = copyObject(perminfo);

Hmm, I suppose if we want a separate RTEPermissionInfo node, we should
instead do GetRTEPermissionInfo(rte) followed by
AddRTEPermissionInfo(newrte) and avoid the somewhat cowboy-ish coding
there.

-- 
Álvaro Herrera         Breisgau, Deutschland  —  https://www.EnterpriseDB.com/




Re: Collation version tracking for macOS

2022-11-29 Thread Jeremy Schneider
On 11/28/22 6:54 PM, Jeff Davis wrote:

> 
> =# select * from pg_icu_collation_versions('en_US') order by
> icu_version;
>  icu_version | uca_version | collator_version 
> -------------+-------------+------------------
>  ...
>  67.1        | 13.0        | 153.14
>  68.2        | 13.0        | 153.14
>  69.1        | 13.0        | 153.14
>  70.1        | 14.0        | 153.112
> (21 rows)
> 
> This is good information, because it tells us that major library
> versions change more often than collation versions, empirically-
> speaking.


It seems to me that the collator_version field is not a good version
identifier to use.

A quick glance at the ICU home page right now shows that all of the last
5 major versions of ICU have included "additions and corrections" to the
locale data itself, including from 68 to 69, where the collator version
did not change.

Is it possible that this "collator_version" only reflects the code that
processes collation data to do comparisons/sorts, but it does not
reflect updates to the locale data itself?

https://icu.unicode.org/

ICU v72 -> CLDR v42
ICU v71 -> CLDR v41
ICU v70 -> CLDR v40
ICU v69 -> CLDR v39
ICU v68 -> CLDR v38
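
For anyone who wants to cross-check their own install, the catalog
records the version the provider reported at CREATE time, and
pg_collation_actual_version() shows what the currently linked library
reports now. A quick sketch (assuming a cluster built with ICU; the
values will vary with your ICU build):

  SELECT collname, collversion
    FROM pg_collation
   WHERE collprovider = 'i'
   LIMIT 3;

  SELECT collname, pg_collation_actual_version(oid)
    FROM pg_collation
   WHERE collname = 'en-US-x-icu';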

-Jeremy


-- 
http://about.me/jeremy_schneider





Re: Collation version tracking for macOS

2022-11-29 Thread Robert Haas
On Mon, Nov 28, 2022 at 11:49 PM Jeff Davis  wrote:
> Not necessarily, #2-4 (at least as implemented in v7) can only load one
> major version at a time, so can't specify minor versions:
> https://www.postgresql.org/message-id/9f8e9b5a3352478d4cf7d6c0a5dd7e82496be4b6.ca...@j-davis.com
>
> With #1, you can provide control over the search order to find the
> symbol you want. Granted, if you want to specify that different
> collations look in different libraries for the same version, then it
> won't work, because the search order is global -- is that what you're
> worried about? If so, I think we need to compare it against the
> downsides of #2-4, which in my opinion are more serious.

You know more about this than I do, for sure, so don't let my vote
back the project into a bad spot. But, yeah, the thing you mention
here is what I'm worried about. Without a way to force a certain
behavior for a certain particular collation, you don't have an escape
valve if the global library ordering isn't doing what you want. Your
argument seems to at least partly be that #1 will be more usable on
the whole, and that does seem like an important consideration. People
may have a lot of collations and adjusting them all individually could
be difficult and unpleasant. However, I think it's also worth asking
what options someone has if #1 can't be made to work due to a single
ordering controlling every collation.

It's entirely possible that the scenario I'm worried about is too
remote in practice to be concerned about. I don't know how this stuff
works well enough to be certain. It's just that, on the basis of
previous experience, (1) it's not that uncommon for people to actually
end up in situations that we thought shouldn't ever happen and (2)
code that deals with collations is more untrustworthy than average.

-- 
Robert Haas
EDB: http://www.enterprisedb.com




Re: An attempt to avoid locally-committed-but-not-replicated-to-standby-transactions in synchronous replication

2022-11-29 Thread SATYANARAYANA NARLAPURAM
On Tue, Nov 29, 2022 at 8:42 AM SATYANARAYANA NARLAPURAM <
satyanarlapu...@gmail.com> wrote:

>
>
> On Tue, Nov 29, 2022 at 8:29 AM Bruce Momjian  wrote:
>
>> On Tue, Nov 29, 2022 at 08:14:10AM -0800, SATYANARAYANA NARLAPURAM wrote:
>> > 2. Process proc die immediately when a backend is waiting for sync
>> > replication acknowledgement, as it does today, however, upon
>> restart,
>> > don't open up for business (don't accept read-only connections)
>> > unless the sync standbys have caught up.
>> >
>> >
>> > Are you planning to block connections or queries to the database? It
>> would be
>> > good to allow connections and let them query the monitoring views but
>> block the
>> > queries until sync standbys have caught up. Otherwise, this leaves a
>> monitoring
>> > hole. In cloud, I presume superusers are allowed to connect and monitor
>> (end
>> > customers are not the role members and can't query the data). The same
>> can't be
>> > true for all the installations. Could you please add more details on
>> your
>> > approach?
>>
>> I think ALTER SYSTEM should be allowed, particularly so you can modify
>> synchronous_standby_names, no?
>
>
> Yes, a change in synchronous_standby_names is expected in this situation.
> IMHO, blocking all the connections is not a recommended approach.
>

How about allowing superusers (they can still read locally committed data)
and users that are members of the pg_monitor role?
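
If we go that route, the monitoring side needs nothing new; for example
(role name purely illustrative):

  GRANT pg_monitor TO metrics_user;
  -- verify the membership that the restart logic would check
  SELECT pg_has_role('metrics_user', 'pg_monitor', 'member');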


>
>>
>> --
>>   Bruce Momjian  https://momjian.us
>>   EDB  https://enterprisedb.com
>>
>> Embrace your flaws.  They make you human, rather than perfect,
>> which you will never be.
>>
>


Re: Add 64-bit XIDs into PostgreSQL 15

2022-11-29 Thread Bruce Momjian
On Tue, Nov 29, 2022 at 11:41:07AM -0500, Robert Haas wrote:
> My argument is that removing xidStopLimit is totally fine, because it
> only serves to stop the database. What to do about xidWarnLimit is a
> slightly more complex question. Certainly it can't be left untouched,
> because warning that we're about to shut down the database for lack of
> allocatable XIDs is not sensible if there is no such lack and we
> aren't going to shut it down. But I'm also not sure if the model is
> right. Doing nothing for a long time and then warning in every
> transaction when some threshold is crossed is an extreme behavior
> change. Right now that's somewhat justified because we're about to hit
> a brick wall at full speed, but if we remove the brick wall and
> replace it with a gentle pelting with rotten eggs, it's unclear that a
> similarly strenuous reaction is the right model. But that's also not
> to say that we should do nothing at all.

Yeah, we would probably need to warn on every 1 million transactions or
something.
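
Until something like that exists, the operator-side check remains the
usual catalog query, e.g.:

  SELECT datname, age(datfrozenxid) AS xid_age
    FROM pg_database
   ORDER BY xid_age DESC;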

-- 
  Bruce Momjian  https://momjian.us
  EDB  https://enterprisedb.com

Embrace your flaws.  They make you human, rather than perfect,
which you will never be.




Re: An attempt to avoid locally-committed-but-not-replicated-to-standby-transactions in synchronous replication

2022-11-29 Thread SATYANARAYANA NARLAPURAM
On Tue, Nov 29, 2022 at 8:29 AM Bruce Momjian  wrote:

> On Tue, Nov 29, 2022 at 08:14:10AM -0800, SATYANARAYANA NARLAPURAM wrote:
> > 2. Process proc die immediately when a backend is waiting for sync
> > replication acknowledgement, as it does today, however, upon restart,
> > don't open up for business (don't accept read-only connections)
> > unless the sync standbys have caught up.
> >
> >
> > Are you planning to block connections or queries to the database? It
> would be
> > good to allow connections and let them query the monitoring views but
> block the
> queries until sync standbys have caught up. Otherwise, this leaves a
> monitoring
> > hole. In cloud, I presume superusers are allowed to connect and monitor
> (end
> > customers are not the role members and can't query the data). The same
> can't be
> > true for all the installations. Could you please add more details on your
> > approach?
>
> I think ALTER SYSTEM should be allowed, particularly so you can modify
> synchronous_standby_names, no?


Yes, a change in synchronous_standby_names is expected in this situation.
IMHO, blocking all the connections is not a recommended approach.


>
> --
>   Bruce Momjian  https://momjian.us
>   EDB  https://enterprisedb.com
>
> Embrace your flaws.  They make you human, rather than perfect,
> which you will never be.
>


Re: Add 64-bit XIDs into PostgreSQL 15

2022-11-29 Thread Robert Haas
On Tue, Nov 29, 2022 at 8:03 AM Chris Travers  wrote:
> To be clear, I never suggested shutting down the database.  What I have 
> suggested is that repurposing the current approaching-xid-wraparound warnings 
> to start complaining loudly when a threshold is exceeded would be helpful.   
> I think it makes sense to make that threshold configurable especially if we 
> eventually have people running bloat-free table structures.
>
> There are two fundamental problems here.  The first is that if, as you say, a 
> table is progressively bloating and we are getting further and further behind 
> on vacuuming and freezing, something is seriously wrong and we need to do 
> something about it.  In these cases, I my experience is that vacuuming and 
> related tools tend to suffer degraded performance, and determining how to 
> solve the problem takes quite a bit more time than a routine bloat issue 
> would.  So what I am arguing against is treating the problem just as a bloat 
> issue.  If you get there due to vacuum being slow, something else is wrong 
> and you are probably going to have to find and fix that as well in order to 
> catch up.  At least that's my experience.
>
> I don't object to the db continuing to run, allocate xids etc.  What I object 
> to is it doing so silently where things are almost certainly going very 
> wrong.

OK. My feeling is that the set of things we can do to warn the user is
somewhat limited. I'm open to trying our best, but we need to have
reasonable expectations. Sophisticated users will be monitoring for
problems even if we do nothing to warn, and dumb ones won't look at
their logs. Any feature that proposes to warn must aim at the users who
are smart enough to check the logs but dumb enough not to have any
more sophisticated monitoring. Such users certainly exist and are not
even uncommon, but they aren't the only kind by a long shot.

My argument is that removing xidStopLimit is totally fine, because it
only serves to stop the database. What to do about xidWarnLimit is a
slightly more complex question. Certainly it can't be left untouched,
because warning that we're about to shut down the database for lack of
allocatable XIDs is not sensible if there is no such lack and we
aren't going to shut it down. But I'm also not sure if the model is
right. Doing nothing for a long time and then warning in every
transaction when some threshold is crossed is an extreme behavior
change. Right now that's somewhat justified because we're about to hit
a brick wall at full speed, but if we remove the brick wall and
replace it with a gentle pelting with rotten eggs, it's unclear that a
similarly strenuous reaction is the right model. But that's also not
to say that we should do nothing at all.

-- 
Robert Haas
EDB: http://www.enterprisedb.com




Re: An attempt to avoid locally-committed-but-not-replicated-to-standby-transactions in synchronous replication

2022-11-29 Thread Bruce Momjian
On Tue, Nov 29, 2022 at 08:14:10AM -0800, SATYANARAYANA NARLAPURAM wrote:
> 2. Process proc die immediately when a backend is waiting for sync
> replication acknowledgement, as it does today, however, upon restart,
> don't open up for business (don't accept read-only connections)
> unless the sync standbys have caught up.
> 
> 
> Are you planning to block connections or queries to the database? It would be
> good to allow connections and let them query the monitoring views but block 
> the
> queries until sync standbys have caught up. Otherwise, this leaves a monitoring
> hole. In cloud, I presume superusers are allowed to connect and monitor (end
> customers are not the role members and can't query the data). The same can't 
> be
> true for all the installations. Could you please add more details on your
> approach?

I think ALTER SYSTEM should be allowed, particularly so you can modify
synchronous_standby_names, no?

-- 
  Bruce Momjian  https://momjian.us
  EDB  https://enterprisedb.com

Embrace your flaws.  They make you human, rather than perfect,
which you will never be.




Re: Patch: Global Unique Index

2022-11-29 Thread Laurenz Albe
On Tue, 2022-11-29 at 13:58 +0100, Vik Fearing wrote:
> I disagree.  A user does not need to know that a table is partitioned, 
> and if the user wants a unique constraint on the table then making them 
> type an extra word to get it is just annoying.

Hmm.  But if I created a primary key without thinking too hard about it,
only to discover later that dropping old partitions has become a problem,
I would not be too happy either.
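
For context, today at least the trade-off is forced into the open at DDL
time (table and column names are just an example):

  CREATE TABLE orders (
      id      bigint,
      created date NOT NULL,
      PRIMARY KEY (id)
  ) PARTITION BY RANGE (created);
  -- ERROR:  unique constraint on partitioned table must include all
  -- partitioning columns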

Yours,
Laurenz Albe




Re: Collation version tracking for macOS

2022-11-29 Thread Joe Conway

On 11/28/22 14:11, Robert Haas wrote:

> On Wed, Nov 23, 2022 at 12:09 AM Thomas Munro  wrote:
>> OK.  Time for a new list of the various models we've discussed so far:
>>
>> 1.  search-by-collversion:  We introduce no new "library version"
>> concept to COLLATION and DATABASE object and little or no new syntax.
>>
>> 2.  lib-version-in-providers: We introduce a separate provider value
>> for each ICU version, for example ICU63, plus an unversioned ICU like
>> today.
>>
>> 3.  lib-version-in-attributes: We introduce daticuversion (alongside
>> datcollversion) and collicuversion (alongside collversion).  Similar
>> to the above, but it's a separate property and the provider is always
>> ICU.  New syntax for CREATE/ALTER COLLATION/DATABASE to set and change
>> ICU_VERSION.
>>
>> 4.  lib-version-in-locale:  "63:en" from earlier versions.  That was
>> mostly a strawman proposal to avoid getting bogged down in
>> syntax/catalogue/model change discussions while trying to prove that
>> dlopen would even work.  It doesn't sound like anyone really likes
>> this.
>>
>> 5.  lib-version-in-collversion:  We didn't explicitly discuss this
>> before, but you hinted at it: we could just use u_getVersion() in
>> [dat]collversion.
>
> I'd like to vote against #3 at least in the form that's described
> here. If we had three more libraries providing collations, it's likely
> that they would need versioning, too. So if we add an explicit notion
> of provider version, then it ought not to be specific to libicu.


+many


> I think it's OK to decide that different library versions are
> different providers (your option #2), or that they are the same
> provider but give rise to different collations (your option #4), or
> that there can be multiple versions of each collation which are
> distinguished by some additional provider version field (your #3 made
> more generic).


I think provider and collation version are distinct concepts. The 
provider ('c' versus 'i' for example) determines a unique code path in 
the backend due to different APIs, whereas collation version is related 
to a specific ordering given a set of characters.




> I don't really understand #1 or #5 well enough to have an educated
> opinion, but I do think that #1 seems a bit magical. It hopes that the
> combination of a collation name and a datcollversion will be
> sufficient to find exactly one matching collation in a list of provided
> libraries. The advantage of that, as I understand it, is that if you
> do something to your system that causes the number of matches to go
> from one to zero, you can just throw another library on the pile and
> get the number back up to one. Woohoo! But there's a part of me that
> worries: what if the number goes up to two, and they're not all the
> same? Probably that's something that shouldn't happen, but if it does
> then I think there's kind of no way to fix it. With the other options,
> if there's some way to jigger the catalog state to match what you want
> to happen, you can always repair the situation somehow, because the
> library to be used for each collation is explicitly specified in some
> way, and you just have to get it to match what you want to have
> happen.


My vote is for something like #5. The collversion should indicate a 
specific immutable ordering behavior.



--
Joe Conway
PostgreSQL Contributors Team
RDS Open Source Databases
Amazon Web Services: https://aws.amazon.com





Re: Add 64-bit XIDs into PostgreSQL 15

2022-11-29 Thread Robert Haas
On Mon, Nov 28, 2022 at 4:53 PM Peter Geoghegan  wrote:
> Imagine if we actually had 64-bit XIDs -- let's assume for a moment
> that it's a done deal. This raises a somewhat awkward question: do you
> just let the system get further and further behind on freezing,
> forever? We can all agree that 2 billion XIDs is very likely the wrong
> time to start refusing new XIDs -- because it just isn't designed with
> debt in mind. But what's the right time, if any? How much debt is too
> much?

I simply don't see a reason to ever stop the server entirely. I don't
even agree with the idea of slowing down XID allocation, let alone
refusing it completely. When the range of allocated XIDs become too
large, several bad things happen. First, we become unable to allocate
new XIDs without corrupting the database. Second, pg_clog and other
SLRUs become uncomfortably large. There may be some other things too
that I'm not thinking about. But these things are not all equally bad.
If these were medical problems, being unable to allocate new XIDs
without data corruption would be a heart attack, and SLRUs getting
bigger on disk would be acne. You don't handle problems of such wildly
differing severity in the same way. When someone is having a heart
attack, an ambulance rushes them to the hospital, running red lights
as necessary. When someone has acne, you don't take them to the same
hospital in the same ambulance and drive it at a slower rate of speed.
You do something else entirely, and it's something that is in every
way much less dramatic. There's no such thing as an attack of acne
that's so bad that it requires an ambulance ride, but even a mild
heart attack should result in a fast trip to the ER. So here. The two
problems are so qualitatively different that the responses should also
be qualitatively different.

> Admittedly this argument works a lot better with the failsafe than it
> does with xidStopLimit. Both are removed by the patch.

I don't think the failsafe stuff should be removed, but it should
probably be modified in some way. Running out of XIDs is the only
valid reason for stopping the world, at least IMO, but it is
definitely NOT the only reason for vacuuming more aggressively.

-- 
Robert Haas
EDB: http://www.enterprisedb.com




Re: An attempt to avoid locally-committed-but-not-replicated-to-standby-transactions in synchronous replication

2022-11-29 Thread SATYANARAYANA NARLAPURAM
On Sun, Nov 27, 2022 at 10:33 PM Bharath Rupireddy <
bharath.rupireddyforpostg...@gmail.com> wrote:

> On Mon, Nov 28, 2022 at 12:57 AM Andrey Borodin 
> wrote:
> >
> > Some funny stuff. If a user tries to cancel a non-replicated transaction
> > Azure Postgres will answer: "user requested cancel while waiting for
> > synchronous replication ack. The COMMIT record has already flushed to
> > WAL locally and might not have been replicated to the standby. We
> > must wait here."
> > AWS RDS will answer: "ignoring request to cancel wait for synchronous
> > replication"
> > Yandex Managed Postgres will answer: "canceling wait for synchronous
> > replication due requested, but cancelation is not allowed. The
> > transaction has already committed locally and might not have been
> > replicated to the standby. We must wait here."
> >
> > So, for many services providing Postgres as a service it's only a
> > matter of wording.
>
> Thanks for verifying the behaviour. And many thanks for an off-list chat.
>
> FWIW, I'm planning to prepare a patch as per the below idea which is
> something similar to the initial proposal in this thread. Meanwhile,
> thoughts are welcome.
>
> 1. Disable query cancel/CTRL+C/SIGINT when a backend is waiting for
> sync replication acknowledgement.
>

+1


> 2. Process proc die immediately when a backend is waiting for sync
> replication acknowledgement, as it does today, however, upon restart,
> don't open up for business (don't accept read-only connections)
> unless the sync standbys have caught up.
>

Are you planning to block connections or queries to the database? It would
be good to allow connections and let them query the monitoring views but
block the queries until sync standbys have caught up. Otherwise, this leaves
a monitoring hole. In cloud, I presume superusers are allowed to connect
and monitor (end customers are not the role members and can't query the
data). The same can't be true for all the installations. Could you please
add more details on your approach?


>
> --
> Bharath Rupireddy
> PostgreSQL Contributors Team
> RDS Open Source Databases
> Amazon Web Services: https://aws.amazon.com
>


Re: fixing CREATEROLE

2022-11-29 Thread Robert Haas
On Tue, Nov 29, 2022 at 2:32 AM  wrote:
> I propose a slightly different syntax instead:
>
> ALTER DEFAULT PRIVILEGES GRANT CREATED ROLE TO role_specification WITH ...;
>
> This, together with the proposal above regarding the grantor, should be
> consistent.

I think that is more powerful than what I proposed but less fit for
purpose. If alice is a CREATEROLE user and issues CREATE ROLE bob, my
proposal allows alice to automatically obtain access to bob's
privileges. Your proposal would allow that, but it would also allow
alice to automatically confer bob's privileges on some third user, say
charlie. Maybe that's useful to somebody, I don't know.

But one significant disadvantage of this is that every CREATEROLE user
must have their own configuration. If we have CREATEROLE users alice,
dave, and ellen, then alice needs to execute ALTER DEFAULT PRIVILEGES
GRANT CREATED ROLE TO alice WITH ...; dave needs to do the same thing
with dave instead of alice; and ellen needs to do the same thing with
ellen instead of alice. There's no way to apply a system-wide
configuration that applies nicely to all CREATEROLE users.

A GUC would of course allow that, because it could be set in
postgresql.conf and then overridden for particular databases, users,
or sessions.
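
To make that concrete, with a GUC (the name here is invented purely for
illustration) the layering could look like:

  ALTER SYSTEM SET createrole_self_grant = 'set, inherit'; -- cluster default
  ALTER DATABASE sales SET createrole_self_grant = 'set';  -- per-database
  ALTER ROLE alice SET createrole_self_grant = 'inherit';  -- per-user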

David claims that "these aren't privileges, they are memberships." I
don't entirely agree with that, because I think that we're basically
using memberships as a pseudonym for privileges where roles are
concerned. However, it is true that there's no precedent for referring
to role grants using the keyword PRIVILEGES at the SQL level, and the
fact that the underlying works in somewhat similar ways doesn't
necessarily mean that it's OK to conflate the two concepts at the SQL
level.

So I'm still not very sold on this idea.

-- 
Robert Haas
EDB: http://www.enterprisedb.com




Re: Add 64-bit XIDs into PostgreSQL 15

2022-11-29 Thread Jan Wieck

On 11/29/22 09:46, Bruce Momjian wrote:

> As far as I know, all our freeze values are focused on avoiding XID
> wraparound.  If XID wraparound is no longer an issue, we might find that
> our freeze limits can be much higher than they are now.



I'd be careful in that direction, as those values together with
maintenance_work_mem also keep a lid on excessive index cleanup rounds.
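
For example, the effect is easy to see directly: with a small memory
budget, each fill of the dead-TID array forces another pass over every
index (table name is illustrative):

  SET maintenance_work_mem = '1MB';    -- smallest allowed value
  VACUUM (VERBOSE) some_bloated_table; -- "index scans: N" with N > 1
                                       -- means multiple cleanup rounds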



Regards, Jan




Re: fixing CREATEROLE

2022-11-29 Thread David G. Johnston
On Tue, Nov 29, 2022 at 12:32 AM  wrote:

>
> Is there any other argument to be made against ADP?
>

These aren't privileges, they are memberships.  The pg_default_acl catalog
is also per-database, while these settings should be present in a catalog
which, like pg_authid, is cluster-wide.  This latter point, for me,
disqualifies the command itself from being used for this purpose.  If we'd
like to create ALTER DEFAULT MEMBERSHIP (and a corresponding cluster-wide
catalog) then maybe the rest of the design would work within that.


>
> Note, that ADP allows much more than just creating a grant for the
> CREATEROLE user, which would be the case if the default GRANT was made
> TO the_create_role_user. But it could be made towards *other* users as
> well, so you could do something like this:
>
> CREATE ROLE alice CREATEROLE;
> CREATE ROLE bob;
>
> ALTER DEFAULT PRIVILEGES FOR alice GRANT CREATED ROLE TO bob WITH SET
> TRUE, INHERIT FALSE;
>

What does that accomplish?  bob cannot create roles to actually exercise
his privilege.


> This is much more flexible than role attributes or GUCs.
>
>
The main advantage of GUC over a role attribute is that you can institute
layers of defaults according to a given cluster's specific needs.  ALTER
ROLE SET (pg_db_role_setting - also cluster-wide) also comes into play;
maybe alice wants auto-inherit while in db-a but not db-b (this would/will
be more convincing if we end up having per-database roles).

If we accept that some external configuration knowledge is going to
influence the result of executing this command (Tom?) then it seems that
all the features a GUC provides are desirable in determining how the final
execution context is configured. Which makes sense as this kind of thing is
precisely what the GUC subsystem was designed to handle - session context
environments related to the user and database presently connected.

David J.


Re: Support load balancing in libpq

2022-11-29 Thread Jelte Fennema
Attached is a new version with the tests cleaned up a bit (more comments mostly).

@Michael, did you have a chance to look at the last version? Because I feel
that the patch is pretty much ready for a committer to look at at this point.

v5-0001-Support-load-balancing-in-libpq.patch
Description: v5-0001-Support-load-balancing-in-libpq.patch


Re: Fix database creation during installchecks for ICU cluster

2022-11-29 Thread Nazir Bilal Yavuz

Hi,

Thanks for the patch!


On 10/29/22 12:54, Marina Polyakova wrote:


> 1) The ECPG tests fail because they use the SQL_ASCII encoding [2],
> the database template0 uses the ICU locale provider and SQL_ASCII is
> not supported by ICU:
>
> $ make -C src/interfaces/ecpg/ installcheck
> ...
> == creating database "ecpg1_regression" ==
> ERROR:  encoding "SQL_ASCII" is not supported with ICU provider
> ERROR:  database "ecpg1_regression" does not exist
> command failed: "/home/marina/postgresql/master/my/inst/bin/psql" -X
> -c "CREATE DATABASE \"ecpg1_regression\" TEMPLATE=template0
> ENCODING='SQL_ASCII'" -c "ALTER DATABASE \"ecpg1_regression\" SET
> lc_messages TO 'C';ALTER DATABASE \"ecpg1_regression\" SET lc_monetary
> TO 'C';ALTER DATABASE \"ecpg1_regression\" SET lc_numeric TO 'C';ALTER
> DATABASE \"ecpg1_regression\" SET lc_time TO 'C';ALTER DATABASE
> \"ecpg1_regression\" SET bytea_output TO 'hex';ALTER DATABASE
> \"ecpg1_regression\" SET timezone_abbreviations TO 'Default';" "postgres"




I can confirm that the same error happens on my end and your patch fixes the 
issue. But, do ECPG tests really require SQL_ASCII encoding? I removed 
ECPG tests' encoding line [1], rebuilt it and 'make -C 
src/interfaces/ecpg/ installcheck' passed without applying your patch.





> 2) The option --no-locale in pg_regress is described as "use C locale"
> [3]. But in this case the created databases actually use the ICU
> locale provider with the ICU cluster locale from template0 (see
> diff_check_backend_used_provider.txt):
>
> $ make NO_LOCALE=1 installcheck



This works on my end without applying your patch. Commands I used are:

$ ./configure --with-icu --prefix=$PWD/build_dir
$ make && make install && export PATH=$PWD/build_dir/bin:$PATH
$ initdb --locale-provider icu --icu-locale en-US -D data && pg_ctl -D 
data -l logfile start

$ make NO_LOCALE=1 installcheck

[1] 
https://github.com/postgres/postgres/blob/ce20f8b9f4354b46b40fd6ebf7ce5c37d08747e0/src/interfaces/ecpg/test/Makefile#L18



Regards,
Nazir Bilal Yavuz





Re: Add 64-bit XIDs into PostgreSQL 15

2022-11-29 Thread Bruce Momjian
On Tue, Nov 29, 2022 at 02:35:20PM +0100, Chris Travers wrote:
> So I think the problem is that PostgreSQL is becoming more and more scalable,
> hardware is becoming more capable, and certain use cases are continuing to
> scale up.  Over time, we tend to find ourselves approaching the end of the
> runway at ever higher velocities.  That's a problem that will get 
> significantly
> worse over time.
> 
> Of course, as I think we agree, the priorities should be (in order):
> 1.  Avoid trouble
> 2.  Recover from trouble early
> 3.  Provide more and better options for recovery.

Warn about trouble is another area we should focus on here.

> I think 64bit xids are a very good idea, but they really fit in this bottom
> tier.   Not being up against mathematical limits to the software when things
> are going bad is certainly a good thing.  But I am really worried about the
> attitude that this patch really avoids trouble because in many cases, I don't
> think it does and therefore I believe we need to make sure we are not reducing
> visibility of underlying problems.

As far as I know, all our freeze values are focused on avoiding XID
wraparound.  If XID wraparound is no longer an issue, we might find that
our freeze limits can be much higher than they are now.
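
For reference, the current knobs are easy to enumerate:

  SELECT name, setting, boot_val
    FROM pg_settings
   WHERE name LIKE '%freeze%'
   ORDER BY name;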

-- 
  Bruce Momjian  https://momjian.us
  EDB  https://enterprisedb.com

Embrace your flaws.  They make you human, rather than perfect,
which you will never be.




Re: Support logical replication of DDLs

2022-11-29 Thread vignesh C
On Tue, 29 Nov 2022 at 17:51, li jie  wrote:
>
> I will continue to give feedback for this patch.

Thanks a lot, that will be very helpful for us.

> 1.  LIKE STORAGE
> ```
> CREATE TABLE ctlt (a text, c text);
> ALTER TABLE ctlt ALTER COLUMN c SET STORAGE EXTERNAL;
> CREATE TABLE ctlt_storage (LIKE ctlt INCLUDING STORAGE);
> ```
>
> postgres=# \d+ ctlt_storage
>                               Table "public.ctlt_storage"
>  Column | Type | Collation | Nullable | Default | Storage  | Compression | Stats target | Description
> --------+------+-----------+----------+---------+----------+-------------+--------------+-------------
>  a      | text |           |          |         | extended |             |              |
>  c      | text |           |          |         | extended |             |              |
>
>
> It can be seen that the storage attribute in column c of table
> ctlt_storage is not replicated.
>
> After the CREATE TABLE LIKE statement is converted,
> the LIKE STORAGE attribute is lost because it is difficult to display
> it in the CREATE TABLE syntax.
> Maybe we need to add a statement to it, like 'ALTER TABLE ctlt_storage
> ALTER COLUMN c SET STORAGE EXTERNAL;'.
>
> 2. Reference subcommand be dropped.
> ```
> create table another (f1 int, f2 text, f3 text);
>
> alter table another
>   alter f1 type text using f2 || ' and ' || f3 || ' more',
>   alter f2 type bigint using f1 * 10,
>   drop column f3;
> ```
>
> The following error occurs downstream:
> ERROR:  column "?dropped?column?" does not exist at character 206
> STATEMENT:  ALTER TABLE public.another DROP COLUMN f3 , ALTER COLUMN
> f1 SET DATA TYPE pg_catalog.text COLLATE pg_catalog."default" USING
> (((f2 OPERATOR(pg_catalog.||) ' and '::pg_catalog.text)
> OPERATOR(pg_catalog.||) "?dropped?column?") OPERATOR(pg_catalog.||) '
> more'::pg_catalog.text), ALTER COLUMN f2 SET DATA TYPE pg_catalog.int8
> USING (f1 OPERATOR(pg_catalog.*) 10)
>
> Obviously, column f3 has been deleted and its name no longer exists.
> Maybe we need to keep it and save it in advance like a drop object.
>  However, ALTER TABLE is complex, and this problem also occurs in
> other similar scenarios.

I will analyze these issues and post a patch to handle it.

Regards,
Vignesh




Re: Privileges on PUBLICATION

2022-11-29 Thread Antonin Houska
Antonin Houska  wrote:

> Peter Eisentraut  wrote:
> 
> > On 04.11.22 08:28, Antonin Houska wrote:
> > > I thought about the whole concept a bit more and I doubt if the 
> > > PUBLICATION
> > > privilege is the best approach. In particular, the user specified in 
> > > CREATE
> > > SUBSCRIPTION ... CONNECTION ... (say "subscription user") needs to have 
> > > SELECT
> > > privilege on the tables replicated. So if the DBA excludes some columns 
> > > from
> > > the publication's column list and sets the (publication) privileges in 
> > > such a
> > > way that the user cannot get the column values via other publications, the
> > > user still can connect to the database directly and get values of the 
> > > excluded
> > > columns.
> > 
> > Why are the SELECT privileges needed?  Maybe that's something to think about
> > and maybe change.
> 
> I haven't noticed an explanation in comments nor did I search in the mailing
> list archives, but the question makes sense: the REPLICATION attribute of a
> role is sufficient for streaming replication, so why should the logical
> replication require additional privileges?
> 
> Technically the SELECT privilege is needed because the sync worker does
> actually execute SELECT query on the published tables. However, I realize now
> that it's not checked by the output plugin. Thus if SELECT is revoked from the
> "subscription user" after the table has been synchronized, the replication
> continues to work. So the necessity for the SELECT privilege might be an
> omission rather than a design choice. (Even the documentation says that the
> SELECT privilege is needed only for the initial synchronization [1], however
> it does not tell why.)
> 
> > > As an alternative to the publication privileges, I think that the CREATE
> > > SUBSCRIPTION command could grant ACL_SELECT automatically to the 
> > > subscription
> > > user on the individual columns contained in the publication column list, 
> > > and
> > > DROP SUBSCRIPTION would revoke that privilege.
> > 
> > I think that approach is weird and unusual.  Privileges and object creation
> > should be separate operations.
> 
> ok. Another approach would be to skip the check for the SELECT privilege (as
> well as the check for the USAGE privilege on the corresponding schema) if
> given column is being accessed via a publication which has it on its column
> list and if the subscription user has the USAGE privilege on that publication.
> 
> So far I wasn't sure if we can do that because, if pg_upgrade grants the USAGE
> privilege on all publications to the "public" role, the DBAs who relied on the
> SELECT privileges might not notice that any role having the REPLICATION
> attribute can access all the published tables after the upgrade. (pg_upgrade
> can hardly do anything else because it has no information on the "subscription
> users", so it cannot convert the SELECT privilege on tables to the USAGE
> privileges on publications.)
> 
> But now that I see that the logical replication doesn't check the SELECT
> privilege properly anyway, I think we can get rid of it.

The attached version tries to do that - as you can see in 0001, the SELECT
privilege is not required for the walsender process.

I also added PUBLICATION_NAMES option to the COPY TO command so that the
publisher knows which publications are subject to the ACL check. Only data of
those publications are returned to the subscriber. (In the previous patch
version the ACL checks were performed on the subscriber side, but I think that's not
ideal in terms of security.)

I also added the regression tests for publications, enhanced psql (the \dRp+
command) so that it displays the publication ACL and added a few missing
pieces of documentation.
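
With the patch applied, the intended usage is roughly the following
(proposed syntax, not in core):

  GRANT USAGE ON PUBLICATION mypub TO subscription_user;
  \dRp+ mypub   -- now also displays the publication's ACL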

-- 
Antonin Houska
Web: https://www.cybertec-postgresql.com



publication_privileges_v03.tgz
Description: application/gzip


Re: Introduce a new view for checkpointer related stats

2022-11-29 Thread Drouvot, Bertrand

Hi,

On 11/28/22 6:58 PM, Robert Haas wrote:

> On Tue, Nov 22, 2022 at 3:53 PM Andres Freund  wrote:
>> I think we should consider deprecating the pg_stat_bgwriter columns but
>> leaving them in place for a few years. New stuff should only be added to
>> pg_stat_checkpointer, but we don't need to break old monitoring queries.
>
> I vote to just remove them. I think that most people won't update
> their queries until they are forced to do so.  I don't think it
> matters very much when we force them to do that.
>
> Our track record in following through on deprecations is pretty bad
> too, which is another consideration.



Same point of view.

Regards,

--
Bertrand Drouvot
PostgreSQL Contributors Team
RDS Open Source Databases
Amazon Web Services: https://aws.amazon.com




'Flexible "partition pruning" hook' redux?

2022-11-29 Thread Ted Toth
There's an old thread that interests me but which ended without any
resolution/solution:

https://www.postgresql.org/message-id/CA%2BHiwqGpoKmDg%2BTJdMfoeu3k4VQnxcfBon4fwBRp8ex_CTCsnA%40mail.gmail.com

Some of our basic requirements are:
1) All data must be labeled at a specific level (using SELinux multi-level
security (MLS) policy).
2) Data of different levels cannot be stored in the same file on disk.
3) The Bell-LaPadula model must be applied, meaning read (select) down is
allowed (return rows labeled at levels dominated by the querying process's
level), while updates (insert/update/delete) can only be done to data at the
same level as the executing process. BLM allows for write-up, but in reality,
since processes don't know about levels which dominate theirs, this doesn't
happen.

In the past I've used RLS, sepgsql and some additional custom functions to
create MLS databases, but this does not satisfy #2. Partitioning looks to be
a way to achieve #2 and possibly to improve query performance, since
partitions could be pruned based on the level of data stored in them.
However, I'm not aware of a means to implement table-level dominance pruning.
The patch in the thread noted above proposed a hook to allow customized
pruning of partitions, which is something I think would be useful. However, a
number of questions and concerns were raised (some beyond my ability to even
comprehend, since I don't have intimate knowledge of the code base) but never
addressed.

What's the best way forward in a situation like this?
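
(For concreteness, the one-file-per-level half of this is expressible
today with list partitioning -- the schema below is hypothetical -- it's
the dominance-based pruning that has no hook:

  CREATE TABLE events (
      level   text NOT NULL,   -- SELinux MLS level label
      payload text
  ) PARTITION BY LIST (level);

  CREATE TABLE events_secret    PARTITION OF events FOR VALUES IN ('secret');
  CREATE TABLE events_topsecret PARTITION OF events FOR VALUES IN ('topsecret');

Each partition has its own relfilenode, so levels never share a file, but
the planner can only prune on level = 'x', not "levels dominated by x".)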

Ted


Re: Add 64-bit XIDs into PostgreSQL 15

2022-11-29 Thread Chris Travers
On Mon, Nov 28, 2022 at 11:06 PM Peter Geoghegan  wrote:

> On Mon, Nov 28, 2022 at 1:52 PM Bruce Momjian  wrote:
> > I think the problem is that we still have bloat with 64-bit XIDs,
> > specifically pg_xact and pg_multixact files.  Yes, that bloat is less
> > serious, but it is still an issue worth reporting in the server logs,
> > though not serious enough to stop the server from write queries.
>
> That's definitely a big part of it.
>
> Again, I don't believe that the idea is fundamentally without merit.
> Just that it's not worth it, given that having more XID space is very
> much not something that I think fixes most of the problems. And given
> the real risk of serious bugs with something this invasive.


> I believe that it would be more useful to focus on just not getting
> into trouble in the first place, as well as on mitigating specific
> problems that lead to the system reaching xidStopLimit in practice. I
> don't think that there is any good reason to allow datfrozenxid to go
> past about a billion. When it does the interesting questions are
> questions about what went wrong, and how that specific failure can be
> mitigated in a fairly direct way.
>
> We've already used way to much "XID space runway", so why should using
> even more help? It might, I suppose, but it almost seems like a
> distraction to me, as somebody that wants to make things better for
> users in general. As long as the system continues to misbehave (in
> whatever way it happens to be misbehaving), why should any amount of
> XID space ever be enough?
>

So I think the problem is that PostgreSQL is becoming more and more
scalable, hardware is becoming more capable, and certain use cases are
continuing to scale up.  Over time, we tend to find ourselves approaching
the end of the runway at ever higher velocities.  That's a problem that
will get significantly worse over time.

Of course, as I think we agree, the priorities should be (in order):
1.  Avoid trouble
2.  Recover from trouble early
3.  Provide more and better options for recovery.

I think 64bit xids are a very good idea, but they really fit in this bottom
tier.   Not being up against mathematical limits to the software when
things are going bad is certainly a good thing.  But I am really worried
about the attitude that this patch really avoids trouble because in many
cases, I don't think it does and therefore I believe we need to make sure
we are not reducing visibility of underlying problems.

>
> I think that we'll be able to get rid of freezing in a few years time.
> But as long as we have freezing, we have these problems.
>
> --
> Peter Geoghegan
>


Re: Logical Replication Custom Column Expression

2022-11-29 Thread Ashutosh Bapat
On Fri, Nov 25, 2022 at 4:13 PM Stavros Koureas
 wrote:
>
> Yes, if the property is on the subscription side then it should be applied
> to all the tables that the connected publication is exposing.
> So if the property is enabled you should be sure that this origin column
> exists in all of the tables that the publication is exposing...
>

That would be too restrictive - not necessarily in your application, but
generally. There could be some tables where consolidating rows with the
same PK from different publishers into a single row on the subscriber
would be desirable. I think we need to enable the property for every
subscriber that intends to add the publisher column to the desired
subscribed tables. But there should also be a per-table option indicating
that the receiver should add the publisher when inserting rows into that
table.


> Sure this is the complete idea, that the subscriber should match the PK of 
> origin, 
> As the subscription table will contain same key values from different 
> origins, for example:
>

And yes, probably you need to change the way you reply to email on
this list. Top-posting is generally avoided. See
https://wiki.postgresql.org/wiki/Mailing_Lists.

-- 
Best Wishes,
Ashutosh Bapat




Re: SUBTRANS: Minimizing calls to SubTransSetParent()

2022-11-29 Thread Julien Tachoires
Hi Tomas,

On Tue, Nov 29, 2022 at 14:06, Tomas Vondra
 wrote:
>
> On 11/29/22 13:49, Simon Riggs wrote:
> > On Thu, 17 Nov 2022 at 17:29, Simon Riggs  
> > wrote:
> >
> >> (yes, the last line shows x10 performance patched, that is not a typo)
> >
> > New version of patch, now just a one-line patch!
> >
> > Results show it's still a good win for XidInMVCCSnapshot().
> >
>
> I'm a bit confused - for which workload/benchmark are there results?
> It's generally a good idea to share the scripts used to run the test and
> not just a chart.

The scripts have been attached to this thread with the initial
performance results.
Anyway, re-sending those (including a minor fix).

-- 
Julien Tachoires
EDB


subtrans-benchmark.tar.gz
Description: GNU Zip compressed data


Re: SUBTRANS: Minimizing calls to SubTransSetParent()

2022-11-29 Thread Tomas Vondra
On 11/29/22 13:49, Simon Riggs wrote:
> On Thu, 17 Nov 2022 at 17:29, Simon Riggs  
> wrote:
> 
>> (yes, the last line shows x10 performance patched, that is not a typo)
> 
> New version of patch, now just a one-line patch!
> 
> Results show it's still a good win for XidInMVCCSnapshot().
> 

I'm a bit confused - for which workload/benchmark are there results?
It's generally a good idea to share the scripts used to run the test and
not just a chart.

regards

-- 
Tomas Vondra
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: Add 64-bit XIDs into PostgreSQL 15

2022-11-29 Thread Chris Travers
Hi;

I suppose I must not have been clear in what I am suggesting we do and
why.   I will try to answer specific points below and then restate what I
think the problem is, and what I think should be done about it.

On Mon, Nov 28, 2022 at 5:53 PM Robert Haas  wrote:

> On Sat, Nov 26, 2022 at 4:08 AM Chris Travers 
> wrote:
> > I didn't see any changes to pg_upgrade to make this change possible on
> upgrade.  Is that also outside of the scope of your patch set?  If so how
> is that continuity supposed to be ensured?
>
> The scheme is documented in their 0006 patch, in a README.XID file.
> I'm not entirely confident that it's the best design and have argued
> against it in the past, but it's not crazy.
>

Right.  Per previous discussion I thought there was some discussion of
allowing people to run with the existing behavior.I must have been
mistaken.  If that is off the table then pg_upgrade and runtime replication
checks don't matter.

>
> More generally, while I think there's plenty of stuff to be concerned
> about in this patch set and while I'm somewhat skeptical about the
> likelihood of its getting or staying committed, I can't really
> understand your concerns in particular. The thrust of your concern
> seems to be that if we allow people to get further behind, recovery
> will be more difficult. I'm not sure I see the problem. Suppose that
> we adopt this proposal and that it is bug-free. Now, consider a user
> who gets 8 billion XIDs behind. They probably have to vacuum pretty
> much every page in the database to do that, or least every page in the
> tables that haven't been vacuumed recently. But that would likely also
> be true if they were 800 million XIDs behind, as is possible today.
> The effort to catch up doesn't increase linearly with how far behind
> you are, and is always bounded by the DB size.
>

Right.  I agree with all of that.

>
> It is true that if the table is progressively bloating, it is likely
> to be more bloated by the time you are 8 billion XIDs behind than it
> was when you were 800 million XIDs behind. I don't see that as a very
> good reason not to adopt this patch, because you can bloat the table
> by an arbitrarily large amount while consuming only a small number of
> XiDs, even just 1 XID. Protecting against bloat is good, but shutting
> down the database when the XID age reaches a certain value is not a
> particularly effective way of doing that, so saying that we'll be
> hurting people by not shutting down the database at the point where we
> do so today doesn't ring true to me. I think that most people who get
> to the point of wraparound shutdown have workloads where bloat isn't a
> huge issue, because those who do start having problems with the bloat
> way before they run out of XIDs.
>

To be clear, I never suggested shutting down the database.  What I have
suggested is that repurposing the current approaching-xid-wraparound
warnings to start complaining loudly when a threshold is exceeded would be
helpful.   I think it makes sense to make that threshold configurable
especially if we eventually have people running bloat-free table structures.

There are two fundamental problems here.  The first is that if, as you say,
a table is progressively bloating and we are getting further and further
behind on vacuuming and freezing, something is seriously wrong and we need
to do something about it.  In these cases, I my experience is that
vacuuming and related tools tend to suffer degraded performance, and
determining how to solve the problem takes quite a bit more time than a
routine bloat issue would.  So what I am arguing against is treating the
problem just as a bloat issue.  If you get there due to vacuum being slow,
something else is wrong and you are probably going to have to find and fix
that as well in order to catch up.  At least that's my experience.

I don't object to the db continuing to run, allocate xids etc.  What I
object to is it doing so silently where things are almost certainly
going very wrong.
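
Concretely, the visibility I'm asking for is cheap to provide and to
consume, e.g.:

  SELECT c.oid::regclass AS relation, age(c.relfrozenxid) AS xid_age
    FROM pg_class c
   WHERE c.relkind IN ('r', 'm', 't')
   ORDER BY xid_age DESC
   LIMIT 10;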

>
> It would be entirely possible to add a parameter to the system that
> says "hey, you know we can keep running even if we're a shazillion
> XiDs behind, but instead shut down when we are behind by this number
> of XIDs." Then, if somebody wants to force an automatic shutdown at
> that point, they could, and I think that then the scenario you're
> worried about just can't happen any more . But isn't that a little bit
> silly? You could also just monitor how far behind you are and page the
> DBA when you get behind by more than a certain number of XIDs. Then,
> you wouldn't be risking a shutdown, and you'd still be able to stay on
> top of the XID ages of your tables.
>
> Philosophically, I disagree with the idea of shutting down the
> database completely in any situation in which a reasonable alternative
> exists. Losing read and write availability is really bad, and I don't
> think it's what users want. I think that most users want the database
> to 

Re: Patch: Global Unique Index

2022-11-29 Thread Vik Fearing

On 11/24/22 19:15, Cary Huang wrote:

> On Thu, 24 Nov 2022 08:00:59 -0700, Thomas Kellerer wrote:
>> Pavel Stehule schrieb am 24.11.2022 um 07:03:
>>> There are many Oracle users that find global indexes useful despite
>>> their disadvantages.
>>>
>>> I have seen this mostly when the goal was to get the benefits of
>>> partition pruning at runtime which turned the full table scan (=Seq Scan)
>>> on huge tables into partition scans on much smaller partitions.
>>> Partition-wise joins were also helpful for query performance.
>>> The substantially slower drop partition performance was accepted in
>>> those cases.
>>>
>>> I think it would be nice to have the option in Postgres as well.
>>>
>>> I do agree however, that the global index should not be created
>>> automatically.
>>>
>>> Something like CREATE GLOBAL [UNIQUE] INDEX ... would be a lot better
>>>
>>> Is it necessary to use special marks like GLOBAL if this index will
>>> be partitioned, and uniqueness will be ensured by repeated
>>> evaluations?
>>>
>>> Or do you think there should really be one forced relation-based
>>> index?
>>>
>>> I can imagine a unique index on partitions without a special mark,
>>> that will be partitioned, and a second variant classic index created
>>> over a partitioned table, that will be marked as GLOBAL.
>>
>> My personal opinion is that a global index should never be created
>> automatically.
>>
>> The user should consciously decide on using a feature
>> that might have a serious impact on performance in some areas.

> Agreed, if a unique index is created on non-partition key columns without
> including the special mark (partition key columns), it may be a mistake
> from the user. (At least I make this mistake all the time.) Current PG
> will give you a warning to include the partition keys, which is good.
>
> If we were to automatically turn that into a global unique index, the
> user may be using the feature without knowing it and experience some
> performance impact (to account for the extra uniqueness check in all
> partitions).


I disagree.  A user does not need to know that a table is partitioned, 
and if the user wants a unique constraint on the table then making them 
type an extra word to get it is just annoying.

--
Vik Fearing





Re: Bug in wait time when waiting on nested subtransaction

2022-11-29 Thread Simon Riggs
On Mon, 28 Nov 2022 at 21:10, Robert Haas  wrote:
>
> On Mon, Nov 28, 2022 at 3:01 PM Tom Lane  wrote:
> > One thing we need to be pretty careful of here is to not break the
> > promise of atomic commit.  At topmost commit, all subxacts must
> > appear committed simultaneously.  It's not quite clear to me whether
> > we need a similar guarantee in the rollback case.  It seems like
> > we shouldn't, but maybe I'm missing something, in which case maybe
> > the current behavior is correct?
>
> In short, I think Simon is right that there's a problem and right
> about which commit caused it, but I'm not sure what I think we ought
> to do about it.

I'm comfortable with ignoring it, on the basis that it *is* a
performance optimization, but I suggest we keep the test (with
modified output) and document the behavior, if we do.

The really big issue is the loss of performance we get from having
subtrans point only to its immediate parent, which makes
XidInMVCCSnapshot() go really slow in the presence of lots of
subtransactions. So ignoring the issue on this thread will open the
door for the optimization posted for this patch:
https://commitfest.postgresql.org/40/3806/

--
Simon Riggs                http://www.EnterpriseDB.com/




Re: Introduce a new view for checkpointer related stats

2022-11-29 Thread Amit Kapila
On Mon, Nov 28, 2022 at 11:29 PM Robert Haas  wrote:
>
> On Tue, Nov 22, 2022 at 3:53 PM Andres Freund  wrote:
> > I think we should consider deprecating the pg_stat_bgwriter columns but
> > leaving them in place for a few years. New stuff should only be added to
> > pg_stat_checkpointer, but we don't need to break old monitoring queries.
>
> I vote to just remove them. I think that most people won't update
> their queries until they are forced to do so.  I don't think it
> matters very much when we force them to do that.
>

+1.

-- 
With Regards,
Amit Kapila.




Re: SUBTRANS: Minimizing calls to SubTransSetParent()

2022-11-29 Thread Simon Riggs
On Thu, 17 Nov 2022 at 17:29, Simon Riggs  wrote:

> (yes, the last line shows x10 performance patched, that is not a typo)

New version of patch, now just a one-line patch!

Results show it's still a good win for XidInMVCCSnapshot().
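
For anyone who doesn't want to unpack the tarball, the stressed path is
the many-subtransactions shape, roughly this kind of pgbench script
(against pgbench's standard schema; the attached scripts are the
authoritative version):

  \set aid random(1, 100000 * :scale)
  BEGIN;
  UPDATE pgbench_accounts SET abalance = abalance + 1 WHERE aid = :aid;
  SAVEPOINT s1;
  UPDATE pgbench_accounts SET abalance = abalance + 1 WHERE aid = :aid;
  SAVEPOINT s2;
  UPDATE pgbench_accounts SET abalance = abalance + 1 WHERE aid = :aid;
  COMMIT;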

-- 
Simon Riggs                http://www.EnterpriseDB.com/


subtrans_points_to_topxid.v13.patch
Description: Binary data


Re: GUC for temporarily disabling event triggers

2022-11-29 Thread Daniel Gustafsson
> On 3 Nov 2022, at 21:47, Daniel Gustafsson  wrote:

> The patch adds a new GUC, ignore_event_trigger with two option values, 'all'
> and 'none' (the login event patch had 'login' as well).

The attached v2 fixes a small bug which caused test failures in the CFBot.
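
Usage with the v2 naming (values as described above; the table name is
just an example):

  SET ignore_event_trigger = 'all'; -- event triggers won't fire in this session
  DROP TABLE broken_table;          -- DDL a buggy event trigger was blocking
  RESET ignore_event_trigger;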

--
Daniel Gustafsson   https://vmware.com/



v2-0001-Add-GUC-for-temporarily-disabling-event-triggers.patch
Description: Binary data


Re: Patch: Global Unique Index

2022-11-29 Thread Ilya Anfimov
On Fri, Nov 18, 2022 at 12:03:53PM +0300, Sergei Kornilov wrote:
> Hello
> Do we need new syntax actually? I think that a global unique index can be 
> created automatically instead of raising an error "unique constraint on 
> partitioned table must include all partitioning columns"

I may suggest even more new syntax.

If someone has to implement sequential index checking on unique
constraints, then it would be useful to be able to do that independent
of partitioning also.

E.g. for some kinds of manual partitions, or for strangely designed
datasets. Or for some of the table's partitions instead of all of them.

For that reason, perhaps some other type of unique index -- one that is
not an index per se, but a check against a set of indexes -- could be
added.  Or, perhaps, not an index, but an EXCLUDE constraint of that
kind.






Re: Perform streaming logical transactions by background workers and parallel apply

2022-11-29 Thread Amit Kapila
On Tue, Nov 29, 2022 at 10:18 AM houzj.f...@fujitsu.com
 wrote:
>
> Attach the new version patch which addressed all comments.
>

Review comments on v53-0001*
==
1.
 Subscription *MySubscription = NULL;
-static bool MySubscriptionValid = false;
+bool MySubscriptionValid = false;

It seems this variable is still used in worker.c, so why has its scope changed?

2.
/* fields valid only when processing streamed transaction */
-static bool in_streamed_transaction = false;
+bool in_streamed_transaction = false;

Is it really required to change the scope of this variable? Can we
think of exposing a macro or inline function to check it in
applyparallelworker.c?

3.
should_apply_changes_for_rel(LogicalRepRelMapEntry *rel)
 {
  if (am_tablesync_worker())
  return MyLogicalRepWorker->relid == rel->localreloid;
+ else if (am_parallel_apply_worker())
+ {
+ if (rel->state != SUBREL_STATE_READY)
+ ereport(ERROR,
+ (errcode(ERRCODE_OBJECT_NOT_IN_PREREQUISITE_STATE),
+ errmsg("logical replication parallel apply worker for subscription
\"%s\" will stop",

Is this check sufficient? What if the rel->state is
SUBREL_STATE_UNKNOWN? I think that will be possible when the refresh
publication has not been yet performed after adding a new relation to
the publication. If that is true then won't we need to simply ignore
that change and continue instead of erroring out? Can you please once
test and check this case?

4.
+
+ case TRANS_PARALLEL_APPLY:
+ list_free(subxactlist);
+ subxactlist = NIL;
+
+ apply_handle_commit_internal(&commit_data);

I don't think we need to retail pfree subxactlist as this is allocated
in TopTransactionContext and will be freed at commit/prepare. This way
freeing looks a bit adhoc to me and you need to expose this list
outside applyparallelworker.c which doesn't seem like a good idea to
me either.

5.
+ apply_handle_commit_internal(&commit_data);
+
+ pa_set_xact_state(MyParallelShared, PARALLEL_TRANS_FINISHED);
+ pa_unlock_transaction(xid, AccessShareLock);
+
+ elog(DEBUG1, "finished processing the transaction finish command");

I think in this and similar DEBUG logs, we can tell the exact command
instead of writing 'finish'.

6.
apply_handle_stream_commit()
{
...
+ /*
+ * After sending the data to the parallel apply worker, wait for
+ * that worker to finish. This is necessary to maintain commit
+ * order which avoids failures due to transaction dependencies and
+ * deadlocks.
+ */
+ pa_wait_for_xact_finish(winfo);
+
+ pgstat_report_stat(false);
+ store_flush_position(commit_data.end_lsn);
+ stop_skipping_changes();
+
+ (void) pa_free_worker(winfo, xid);
...
}

apply_handle_stream_prepare(StringInfo s)
{
+
+ /*
+ * After sending the data to the parallel apply worker, wait for
+ * that worker to finish. This is necessary to maintain commit
+ * order which avoids failures due to transaction dependencies and
+ * deadlocks.
+ */
+ pa_wait_for_xact_finish(winfo);
+ (void) pa_free_worker(winfo, prepare_data.xid);

- /* unlink the files with serialized changes and subxact info. */
- stream_cleanup_files(MyLogicalRepWorker->subid, prepare_data.xid);
+ in_remote_transaction = false;
+
+ store_flush_position(prepare_data.end_lsn);


In both of the above functions, we should be consistent in calling the
pa_free_worker() function, which I think should be immediately after
pa_wait_for_xact_finish(). Is there a reason for not being consistent
here?

7.
+ res = shm_mq_receive(winfo->error_mq_handle, &len, &data, true);
+
+ /*
+ * The leader will detach from the error queue and set it to NULL
+ * before preparing to stop all parallel apply workers, so we don't
+ * need to handle error messages anymore.
+ */
+ if (!winfo->error_mq_handle)
+ continue;

This check must be done before calling shm_mq_receive. So, changed it
in the attached patch.
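
That is, the intended ordering is roughly this sketch (len and data
stand for the existing local variables):

```
/*
 * The leader will detach from the error queue and set it to NULL before
 * preparing to stop all parallel apply workers, so skip such a queue
 * entirely, before trying to receive from it.
 */
if (!winfo->error_mq_handle)
	continue;

res = shm_mq_receive(winfo->error_mq_handle, &len, &data, true);
```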

8.
@@ -2675,6 +3156,10 @@ store_flush_position(XLogRecPtr remote_lsn)
 {
  FlushPosition *flushpos;

+ /* Skip for parallel apply workers. */
+ if (am_parallel_apply_worker())
+ return;

It is okay for the leader apply worker to always update the flush
position, but I think the leader won't have an updated value for
XactLastCommitEnd, as the local transaction is committed by the
parallel apply worker.

9.
@@ -3831,11 +4366,11 @@ ApplyWorkerMain(Datum main_arg)

  ereport(DEBUG1,
  (errmsg_internal("logical replication apply worker for subscription \"%s\" two_phase is %s",
- MySubscription->name,
- MySubscription->twophasestate == LOGICALREP_TWOPHASE_STATE_DISABLED ? "DISABLED" :
- MySubscription->twophasestate == LOGICALREP_TWOPHASE_STATE_PENDING ? "PENDING" :
- MySubscription->twophasestate == LOGICALREP_TWOPHASE_STATE_ENABLED ? "ENABLED" :
- "?")));
+ MySubscription->name,
+ MySubscription->twophasestate == LOGICALREP_TWOPHASE_STATE_DISABLED ? "DISABLED" :
+ MySubscription->twophasestate == LOGICALREP_TWOPHASE_STATE_PENDING ? "PENDING" :
+ MySubscription->twophasestate == LOGICALREP_TWOPHASE_STATE_ENABLED ? "ENABLED" :
+ "?")));

Is this change related to this patch?

10. What is the reason to expose 

Re: Reducing power consumption on idle servers

2022-11-29 Thread Simon Riggs
On Sun, 20 Nov 2022 at 20:00, Simon Riggs  wrote:
>
> On Thu, 24 Mar 2022 at 16:21, Robert Haas  wrote:
> >
> > On Thu, Mar 24, 2022 at 12:02 PM Simon Riggs
>
> > > What changes will be acceptable for bgwriter, walwriter and logical 
> > > worker?
> >
> > Hmm, I think it would be fine to introduce some kind of hibernation
> > mechanism for logical workers. bgwriter and wal writer already have a
> > hibernation mechanism, so I'm not sure what your concern is there
> > exactly. In your initial email you said you weren't proposing changes
> > there, but maybe that changed somewhere in the course of the
> > subsequent discussion. If you could catch me up on your present
> > thinking that would be helpful.
>
> Now that we seem to have solved the problem for Startup process, let's
> circle back to the others

Thanks for committing changes to the startup process.

> Bgwriter does hibernate currently, but only at 50x the bgwriter_delay.
> At default values that is 5s, but could be much less. So we need to
> move that up to 60s, same as others.
>
> WALwriter also hibernates currently, but only at 25x the
> wal_writer_delay. At default values that is 2.5s, but could be much
> less. So we need to move that up to 60s, same as others. At the same
> time, make sure that when we hibernate we use a new WaitEvent,
> similarly to the way bgwriter reports its hibernation state (which
> also helps test the patch).
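
For illustration, the hibernating wait then becomes roughly this sketch
(the wait-event name is hypothetical; the 60s timeout is the proposal
above, and WaitLatch itself is the existing API):

```
/*
 * Once idle past the hibernation threshold, sleep with a fixed 60s
 * timeout instead of wal_writer_delay, and advertise the state so it
 * is visible (and testable) via pg_stat_activity.
 */
(void) WaitLatch(MyLatch,
				 WL_LATCH_SET | WL_TIMEOUT | WL_EXIT_ON_PM_DEATH,
				 60000L /* ms */,
				 WAIT_EVENT_WAL_WRITER_HIBERNATE /* hypothetical name */);
```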

Re-attaching patch for bgwriter and walwriter, so it is clear this is
not yet committed.

I've added this to the next CF, since the entry was closed when the
startup patch was committed.

-- 
Simon Riggs        http://www.EnterpriseDB.com/


hibernate_bgwriter_walwriter.v5.patch
Description: Binary data


Re: Support logical replication of DDLs

2022-11-29 Thread li jie
I will continue to give feedback for this patch.

1.  LIKE STORAGE
```
CREATE TABLE ctlt (a text, c text);
ALTER TABLE ctlt ALTER COLUMN c SET STORAGE EXTERNAL;
CREATE TABLE ctlt_storage (LIKE ctlt INCLUDING STORAGE);
```

postgres=# \d+ ctlt_storage
                               Table "public.ctlt_storage"
 Column | Type | Collation | Nullable | Default | Storage  | Compression | Stats target | Description
--------+------+-----------+----------+---------+----------+-------------+--------------+-------------
 a      | text |           |          |         | extended |             |              |
 c      | text |           |          |         | extended |             |              |


It can be seen that the storage attribute of column c of table
ctlt_storage is not replicated.

When the CREATE TABLE LIKE statement is converted, the LIKE STORAGE
attribute is lost, because it cannot be expressed in the deparsed
CREATE TABLE syntax.
Maybe we need to append a statement for it, like 'ALTER TABLE
ctlt_storage ALTER COLUMN c SET STORAGE EXTERNAL;'.

2. Reference to a column dropped in the same command.
```
create table another (f1 int, f2 text, f3 text);

alter table another
  alter f1 type text using f2 || ' and ' || f3 || ' more',
  alter f2 type bigint using f1 * 10,
  drop column f3;
```

The following error occurs downstream:
ERROR:  column "?dropped?column?" does not exist at character 206
STATEMENT:  ALTER TABLE public.another DROP COLUMN f3 , ALTER COLUMN
f1 SET DATA TYPE pg_catalog.text COLLATE pg_catalog."default" USING
(((f2 OPERATOR(pg_catalog.||) ' and '::pg_catalog.text)
OPERATOR(pg_catalog.||) "?dropped?column?") OPERATOR(pg_catalog.||) '
more'::pg_catalog.text), ALTER COLUMN f2 SET DATA TYPE pg_catalog.int8
USING (f1 OPERATOR(pg_catalog.*) 10)

Obviously, column f3 has been deleted and its name no longer exists.
Maybe we need to capture and save it in advance, like a dropped object.
However, ALTER TABLE is complex, and this problem also occurs in
other similar scenarios.


Thoughts?   Adger.




Re: O(n) tasks cause lengthy startups and checkpoints

2022-11-29 Thread Simon Riggs
On Mon, 28 Nov 2022 at 23:40, Nathan Bossart  wrote:
>
> Okay, here is a new patch set.  0004 adds logic to prevent custodian tasks
> from delaying shutdown.

That all seems good, thanks.

The last important point for me is tests, in src/test/modules
probably. It might be possible to reuse the final state of other
modules' tests to test cleanup, or at least integrate a custodian test
into each module.

-- 
Simon Riggshttp://www.EnterpriseDB.com/




Re: [PATCH] Infinite loop while acquiring new TOAST Oid

2022-11-29 Thread Nikita Malakhov
Hi!

I'm working on this issue according to the plan Tom proposed above -

>I agree that we can't simply widen varatt_external to use 8 bytes for
>the toast ID in all cases.  Also, I now get the point about avoiding
>use of globally assigned OIDs here: if the counter starts from zero
>for each table, then a variable-width varatt_external could actually
>be smaller than currently for many cases.  However, that bit is somewhat
>orthogonal, and it's certainly not required for fixing the basic problem.

Have I understood correctly that you suppose using an individual counter
for each TOAST table? I'm working on this approach, where we store
counters in a cache, but I see an issue with the first insert - when
there is no counter in the cache (i.e. after a restart), we have to loop
through the table with increasing steps to find an available one. Or do
we still use a single global counter, but 64-bit with a wraparound?

>So it seems like the plan of attack ought to be:

>1. Invent a new form or forms of varatt_external to allow different
>widths of the toast ID.  Use the narrowest width possible for any
>given ID value.

I'm using the VARTAG field - there are plenty of available values, so
there is no problem distinguishing a regular toast pointer with a
'short' value id (4 bytes) from one with a long (8-byte) id.

varatt_external is currently 32-bit aligned, so there is no reason to
use a narrower type (down to 16 bits) for value ids. Or is there?
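
For the record, a sketch of what a wider on-disk pointer could look
like, keyed by its own vartag (struct and tag names here are
illustrative, not the actual patch):

```
/*
 * Hypothetical 8-byte-valueid variant of varatt_external; a new vartag
 * (say VARTAG_ONDISK64) tells the detoaster which layout it is reading.
 */
typedef struct varatt_external64
{
	int32		va_rawsize;		/* original data size (includes header) */
	uint32		va_extinfo;		/* external size and compression method */
	uint64		va_valueid;		/* 8-byte value id within the TOAST table */
	Oid			va_toastrelid;	/* OID of the owning TOAST table */
} varatt_external64;
```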

>2. Allow TOAST tables/indexes to store either 4-byte or 8-byte IDs.
>(Conversion could be done as a side effect of table-rewrite
>operations, perhaps.)

Still on toast/detoast part, would get to that later.

>3. Localize ID selection so that tables can have small toast IDs
>even when other tables have many IDs.  (Optional, could do later.)

Thank you!

-- 
Regards,
Nikita Malakhov
Postgres Professional
https://postgrespro.ru/


Re: Reducing power consumption on idle servers

2022-11-29 Thread Simon Riggs
On Mon, 28 Nov 2022 at 23:16, Thomas Munro  wrote:
>
> I found some more comments and some documentation to word-smith very
> lightly, and pushed.

Thanks

> The comments were stray references to the
> trigger file.  It's
> a little confusing because the remaining mechanism also uses a file,
> but it uses a signal first so seems better to refer to promotion
> requests rather than files.

..and again

> /me wonders how many idle servers there are in the world

My estimate is there are 100 million servers in use worldwide, with
only about 1% of them on a continuously busy duty cycle and most of
them not in the cloud.

If we guess that we save 10W when idle, then that saves some proportion
of a GW (99 million idle servers x 10W ~= 1 GW as the upper bound).

It's not a huge contribution to the effort, but if by doing this we
inspire others to do the same, we may yet make a difference.

-- 
Simon Riggshttp://www.EnterpriseDB.com/




Re: Non-decimal integer literals

2022-11-29 Thread Dean Rasheed
On Wed, 23 Nov 2022 at 08:56, David Rowley  wrote:
>
> On Wed, 23 Nov 2022 at 21:54, David Rowley  wrote:
> > I wonder if you'd be better off with something like:
> >
> > while (*ptr && isxdigit((unsigned char) *ptr))
> > {
> > if (unlikely(tmp & UINT64CONST(0xF000000000000000)))
> > goto out_of_range;
> >
> > tmp = (tmp << 4) | hexlookup[(unsigned char) *ptr++];
> > }
>
> Here's a delta diff with it changed to work that way.
>

This isn't correct, because those functions are meant to accumulate a
negative number in "tmp".

The overflow check can't just ignore the final digit either, so I'm
not sure how much this would end up saving once those issues are
fixed.
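
For reference, a sketch of the negative-accumulation loop with a
per-digit overflow check, using the helpers from
src/include/common/int.h (variable names are illustrative):

```
int64	tmp = 0;

while (*ptr && isxdigit((unsigned char) *ptr))
{
	int8	digit = hexlookup[(unsigned char) *ptr++];

	/*
	 * Check every digit, including the last one: tmp accumulates as a
	 * negative number so that PG_INT64_MIN parses without overflow; the
	 * caller negates the result for positive input.
	 */
	if (unlikely(pg_mul_s64_overflow(tmp, 16, &tmp)) ||
		unlikely(pg_sub_s64_overflow(tmp, digit, &tmp)))
		goto out_of_range;
}
```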

Regards,
Dean




Re: [PoC] Reducing planning time when tables have many partitions

2022-11-29 Thread Yuya Watari
Dear Andrey and Thom,

Thank you for reviewing and testing the patch. I really apologize for
my late response.

On Tue, Nov 8, 2022 at 8:31 PM Andrey Lepikhov
 wrote:
> Looking into find_em_for_rel() changes I see that you replaced
> if (bms_is_subset(em->em_relids, rel->relids)
> with assertion statement.
> According to get_ecmember_indexes(), the em_relids field of the returned
> equivalence members can contain relids not mentioned in the relation.
> I don't understand why it works now. For example, we can sort by t1.x,
> but have an expression t1.x=t1.y*t2.z. Or have I missed something? If it
> is not a mistake, maybe add a comment on why the assertion doesn't fail?

As you pointed out, changing the bms_is_subset() condition to an
assertion is logically incorrect here. Thank you for telling me about
it. I fixed it and attached the modified patch to this email.
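
For clarity, the corrected filter looks roughly like this (simplified;
the point is that the subset test stays a run-time check rather than
becoming an assertion):

```
foreach(lc, ec->ec_members)
{
	EquivalenceMember *em = (EquivalenceMember *) lfirst(lc);

	/*
	 * em_relids may mention relids outside rel (e.g. for
	 * t1.x = t1.y * t2.z), so this must remain a run-time check,
	 * not an Assert().
	 */
	if (!bms_is_subset(em->em_relids, rel->relids))
		continue;

	/* ... remaining suitability checks as in find_em_for_rel() ... */
	return em;
}
return NULL;
```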

On Thu, Nov 17, 2022 at 9:05 PM Thom Brown  wrote:
> No issues with applying. Created 1024 partitions, each of which is
> partitioned into 64 partitions.
>
> I'm getting a generic planning time of 1415ms. Is that considered
> reasonable in this situation? Bear in mind that the planning time
> prior to this patch was 282311ms, so pretty much a 200x speedup.

Thank you for testing the patch with an actual query. This speedup is
very impressive. When I used an original query with 1024 partitions,
its planning time was about 200ms. Given that each partition is also
partitioned in your workload, I think the result of 1415ms is
reasonable.

-- 
Best regards,
Yuya Watari


v9-0001-Apply-eclass_member_speedup_v3.patch.patch
Description: Binary data


v9-0002-Revert-one-of-the-optimizations.patch
Description: Binary data


v9-0003-Use-conventional-algorithm-for-smaller-size-cases.patch
Description: Binary data


v9-0004-Fix-incorrect-assertion.patch
Description: Binary data


RE: Avoid distributing new catalog snapshot again for the transaction being decoded.

2022-11-29 Thread wangw.f...@fujitsu.com
On Sat, Nov 26, 2022 at 19:50 PM Amit Kapila  wrote:
> On Fri, Nov 25, 2022 at 5:30 PM Ashutosh Bapat
>  wrote:
> >
> > Hi Hou,
> Thanks for the patch. With a simple condition, we can eliminate the
> need to queue a snapshot change in the current transaction and then
> apply it. Saves some memory and computation. This looks useful.
> >
> > When the queue snapshot change is processed in
> > ReorderBufferProcessTXN(), we call SetupHistoricSnapshot(). But I
> > didn't find a path through which SetupHistoricSnapshot() is called
> > from SnapBuildCommitTxn().
> >
> 
> It will be called after SnapBuildCommitTxn() via
> ReorderBufferCommit()->ReorderBufferReplay()->ReorderBufferProcessTXN().
> But, I think what I don't see is how the snapshot we build in
> SnapBuildCommitTxn() will be assigned as a historic snapshot. We assign
> base_snapshot as a historic snapshot and the new snapshot we build in
> SnapBuildCommitTxn() is assigned as base_snapshot only if the same is not set
> previously. I might be missing something but if that is true then I don't 
> think the
> patch is correct, OTOH I would expect some existing tests to fail if this 
> change is
> incorrect.

Hi,

I also think that the snapshot we build in SnapBuildCommitTxn() will not be
assigned as a historic snapshot. But I think that when the commandID message is
processed in the function ReorderBufferProcessTXN, the snapshot of the current
transaction will be updated. And I also did some tests and found no problems. 

Here is my detailed analysis:

I think that when a transaction internally modifies the catalog, a record of
XLOG_HEAP2_NEW_CID will be inserted into the WAL (see function
log_heap_new_cid). Then during logical decoding, this record will be decoded
into a change of type REORDER_BUFFER_CHANGE_INTERNAL_COMMAND_ID (see function
ReorderBufferAddNewCommandId). And I think the function ReorderBufferProcessTXN
would update the HistoricSnapshot (member subxip and member curcid of snapshot)
when processing this change.
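
In other words, the historic snapshot is refreshed when that change is
replayed, roughly like this simplified sketch of the relevant case in
ReorderBufferProcessTXN():

```
case REORDER_BUFFER_CHANGE_INTERNAL_COMMAND_ID:
	Assert(change->data.command_id != InvalidCommandId);

	if (command_id < change->data.command_id)
	{
		command_id = change->data.command_id;

		/*
		 * Copy the snapshot with the new curcid/subxip and re-install
		 * it as the historic snapshot for the subsequent changes.
		 */
		snapshot_now = ReorderBufferCopySnap(rb, snapshot_now,
											 txn, command_id);
		SetupHistoricSnapshot(snapshot_now, txn->tuplecid_hash);
	}
	break;
```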

And here is only one small comment:

+         * We've already done required modifications in snapshot for the
+         * transaction that just committed, so there's no need to add a new
+         * snapshot for the transaction again.
+         */
+        if (xid == txn->xid)
+            continue;

This comment seems a bit inaccurate. How about the comment below?
```
We don't need to add a snapshot to the transaction that just committed as it
will be able to see the new catalog contents after processing the new commandID
in ReorderBufferProcessTXN.
```

Regards,
Wang wei

