Re: [EXTERNAL] Support load balancing in libpq
Hi, "unlikely" macro is used in libpq_prng_init() in the patch. I wonder if the place is really 'hot' to use "unlikely" macro. Best reagards, -- Tatsuo Ishii SRA OSS LLC English: http://www.sraoss.co.jp/index_en/ Japanese:http://www.sraoss.co.jp
Re: [EXTERNAL] Support load balancing in libpq
> On 28 Mar 2023, at 09:16, Tatsuo Ishii wrote: > "unlikely" macro is used in libpq_prng_init() in the patch. I wonder > if the place is really 'hot' to use "unlikely" macro. I don't think it is, I was thinking to rewrite as the below sketch: { if (pg_prng_strong_seed(&conn->prng_state)) return; /* fallback seeding */ } -- Daniel Gustafsson
Re: Moving forward with TDE
On Tue, Mar 28, 2023 at 5:02 AM Stephen Frost wrote: > > > There's clearly user demand for it as there's a number of organizations >> > who have forks which are providing it in one shape or another. This >> > kind of splintering of the community is actually an actively bad thing >> > for the project and is part of what killed Unix, by at least some pretty >> > reputable accounts, in my view. >> >> Yes, the number of commercial implementations of this is a concern. Of >> course, it is also possible that those commercial implementations are >> meeting checkbox requirements rather than technical ones, and the >> community has been hostile to check box-only features. > > > I’ve grown weary of this argument as the other major piece of work it was > routinely applied to was RLS and yet that has certainly been seen broadly > as a beneficial feature with users clearly leveraging it and in more than > some “checkbox” way. > > Indeed, it’s similar also in that commercial implementations were done of > RLS while there were arguments made about it being a checkbox feature which > were used to discourage it from being implemented in core. Were it truly > checkbox, I don’t feel we would have the regular and ongoing discussion > about it on the lists that we do, nor see other tools built on top of PG > which specifically leverage it. Perhaps there are truly checkbox features > out there which we will never implement, but I’m (perhaps due to what my > dad would call selective listening on my part, perhaps not) having trouble > coming up with any presently. Features that exist in other systems that we > don’t want? Certainly. We don’t characterize those as simply “checkbox” > though. Perhaps that’s in part because we provide alternatives- but that’s > not the case here. We have no comparable way to have this capability as > part of the core system. > > We, as a community, are clearly losing value by lack of this capability, > if by no other measure than simply the numerous users of the commercial > implementations feeling that they simply can’t use PG without this feature, > for whatever their reasoning. > I also think this is something of a problem because very few requirements are actually purely technical requirements, and I think the issue is that in many cases there are ways around the lack of the feature. So I would phrase this differently. What is the value of doing this in core? This dramatically simplifies the question of setting up a PostgreSQL environment that is properly protected with encryption at rest. That in itself is valuable. Today you can accomplish something similar with encrypted filesystems and encryption options in things like pgbackrest. However these are many different pieces of a solution and missing up the setup of any one of them can compromise the data. Having a single point of encryption and decryption means fewer opportunities to mess it up and that means less risk. This in turn makes it easier to settle on using PostgreSQL. There are certainly going to be those who approach encryption at rest as a checkbox item and who don't really care if there are holes in it. But there are others who really should be concerned (and this is becoming a bigger issue where data privacy, PCI-DSS, and other requirements may come into play), and those need better tooling than we have. I also think that as data privacy becomes a larger issue, this will become a larger topic. Anyway, my contribution to that question. 
Best Wishes, Chris Travers > > Thanks, > > Stephen > -- Best Wishes, Chris Travers Efficito: Hosted Accounting and ERP. Robust and Flexible. No vendor lock-in. http://www.efficito.com/learn_more
RE: PGdoc: add missing ID attribute to create_subscription.sgml
Dear Amit, Thank you for reviewing! PSA new version. > Isn't it better to move the link-related part to the next line > wherever possible? Currently, it looks bit odd. Previously I preferred not to add a new line inside the tag, but it caused long lines. So I adjusted them so the lines are neither too short nor too long. > Why 0002 patch is part of this thread? I thought here we want to add > 'ids' to entries corresponding to Create Subscription as we have added > the one in commit ecb696. > 0002 was motivated by Peter's comment [1]. This exceeds the initial intention of the patch, so I have removed it for now. [1]: https://www.postgresql.org/message-id/CAHut%2BPu%2B-OocYYhW9E0gxxqgfUb1yJ8jVQ4AZ0v-ud00s7TxEA%40mail.gmail.com Best Regards, Hayato Kuroda FUJITSU LIMITED v6-0001-Add-XML-ID-attributes-to-create_subscription.sgml.patch Description: v6-0001-Add-XML-ID-attributes-to-create_subscription.sgml.patch
Re: [EXTERNAL] Support load balancing in libpq
I think it's fine to remove it. It originated from postmaster.c, where I copied the original implementation of libpq_prng_init from. On Tue, 28 Mar 2023 at 09:22, Daniel Gustafsson wrote: > > > On 28 Mar 2023, at 09:16, Tatsuo Ishii wrote: > > > "unlikely" macro is used in libpq_prng_init() in the patch. I wonder > > if the place is really 'hot' to use "unlikely" macro. > > I don't think it is, I was thinking to rewrite as the below sketch: > > { > if (pg_prng_strong_seed(&conn->prng_state)) > return; > > /* fallback seeding */ > } > > -- > Daniel Gustafsson >
Re: [EXTERNAL] Support load balancing in libpq
>> "unlikely" macro is used in libpq_prng_init() in the patch. I wonder >> if the place is really 'hot' to use "unlikely" macro. > > I don't think it is, I was thinking to rewrite as the below sketch: > > { > if (pg_prng_strong_seed(&conn->prng_state))) > return; > > /* fallback seeding */ > } +1. Best reagards, -- Tatsuo Ishii SRA OSS LLC English: http://www.sraoss.co.jp/index_en/ Japanese:http://www.sraoss.co.jp
Re: [EXTERNAL] Support load balancing in libpq
> I think it's fine to remove it. It originated from postmaster.c, where > I copied the original implementation of libpq_prng_init from. I agree to remove the unlikely macro here. Best regards, -- Tatsuo Ishii SRA OSS LLC English: http://www.sraoss.co.jp/index_en/ Japanese:http://www.sraoss.co.jp
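For reference, the rewrite being agreed on here could look roughly like the sketch below. This is only an illustration: pg_prng_strong_seed() and pg_prng_seed() are the existing pg_prng.h APIs, conn->prng_state is the field added by the load-balancing patch, and the fallback seed material shown (PID xor'd with the current time) is an assumption rather than what the patch necessarily uses.

static void
libpq_prng_init(PGconn *conn)
{
    uint64      rseed;

    /* Prefer strong seeding from the OS when available. */
    if (pg_prng_strong_seed(&conn->prng_state))
        return;

    /* Fallback seeding; the exact entropy mix here is illustrative only. */
    rseed = ((uint64) getpid()) ^ ((uint64) time(NULL));
    pg_prng_seed(&conn->prng_state, rseed);
}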
Re: Moving forward with TDE
On Tue, Mar 28, 2023 at 8:35 AM Bruce Momjian wrote: > On Tue, Mar 28, 2023 at 02:03:50AM +0200, Stephen Frost wrote: > > The remote storage is certainly an independent system. Multi-mount LUNs > are > > entirely possible in a SAN (and absolutely with NFS, or just the NFS > server > > itself is compromised..), so while the attacker may not have any access > to the > > database server itself, they may have access to these other systems, and > that’s > > not even considering in-transit attacks which are also absolutely > possible, > > especially with iSCSI or NFS. > > > > I don’t understand what is being claimed that the remote storage is “not > an > > independent system” based on my understanding of, eg, NFS. With NFS, a > > directory on the NFS server is exported and the client mounts that > directory as > > NFS locally, all over a network which may or may not be secured against > > manipulation. A user on the NFS server with root access is absolutely > able to > > access and modify files on the NFS server trivially, even if they have no > > access to the PG server. Would you explain what you mean? > > The point is that someone could change values in the storage, pg_xact, > encryption settings, binaries, that would allow the attacker to learn > the encryption key. This is not possible for two secure endpoints and > someone changing data in transit. Yeah, it took me a while to > understand these boundaries too. > > > So the idea is that the backup user can be compromised without the > data > > being vulnerable --- makes sense, though that use-case seems narrow. > > > > That’s perhaps a fair consideration- but it’s clearly of enough value > that many > > of our users are asking for it and not using PG because we don’t have it > today. > > Ultimately though, this clearly makes it more than a “checkbox” feature. > I hope > > we are able to agree on that now. > > It is more than a check box feature, yes, but I am guessing few people > are wanting the this for the actual features beyond check box. > > > Yes, there is value beyond the check-box, but in most cases those > > values are limited considering the complexity of the features, and > the > > check-box is what most people are asking for, I think. > > > > For the users who ask on the lists for this feature, regularly, how many > don’t > > ask because they google or find prior responses on the list to the > question of > > if we have this capability? How do we know that their cases are > “checkbox”? > > Because I have rarely heard people articulate the value beyond check > box. > I think there is value. I am going to try to articulate a case for this here. The first is that if people just want a "checkbox" then they can implement PostgreSQL in ways that have encryption at rest today. This includes using LUKS and the encryption options in pgbackrest. That's good enough for a checkbox. It isn't good enough for a real, secured instance however. There are a few problems with trying to do this for a secured instance. The first is that you have multiple links in the encryption chain, and the failure of any one of them ill lead to cleartext exposure of data files. This is not a problem for those who just want to tick a checkbox. Also the fact that backups and main systems are separately encrypted there (if the backups are encrypted at all) means that people have to choose between complicating a restore process and simply ditching encryption on the backup, which makes the checkbox somewhat pointless. 
Where I have usually seen this come up is in the question of "how do you prevent the problem of someone pulling storage devices from your servers and taking them away to compromise your data?" Physical security comes into it but often times people want more than that as an answer. I saw questions like that from external auditors when I was at Adjust. If you want to actually address that problem, then the current tooling is quite cumbersome. Yes you can do it, but it is very hard to make sure it has been fully secured and also very hard to monitor. TDE would make the setup and verification of this much easier. And in particular it solves a number of other issues that I can see arising from LUKS and similar approaches since it doesn't rely on the kernel to be able to translate plain text to and from cypher text. I have actually worked with folks who have PII and need to protect it and who currently use LUKS and pg_backrest to do so. I would be extremely happy to see TDE replace those for their needs. I can imagine that those who hold high value data would use it as well instead of these other more error prone and less secure setups. > > > Consider that there are standards groups which explicitly consider these > attack > > vectors and consider them important enough to require mitigations to > address > > those vectors. Do the end users of PG understand the attack vectors or > why they > > matter? Perhaps not, but just because
Re: Should vacuum process config file reload more often
At Mon, 27 Mar 2023 14:12:03 -0400, Melanie Plageman wrote in > So, I've attached an alternate version of the patchset which takes the > approach of having one commit which only enables cost-based delay GUC > refresh for VACUUM and another commit which enables it for autovacuum > and makes the changes to balancing variables. > > I still think the commit which has workers updating their own > wi_cost_delay in vacuum_delay_point() is a bit weird. It relies on no one > else emulating our bad behavior and reading from wi_cost_delay without a > lock and on no one else deciding to ever write to wi_cost_delay (even > though it is in shared memory [this is the same as master]). It is only > safe because our process is the only one (right now) writing to > wi_cost_delay, so when we read from it without a lock, we know it isn't > being written to. And everyone else takes a lock when reading from > wi_cost_delay right now. So, it seems...not great. > > This approach also introduces a function that is only around for > one commit until the next commit obsoletes it, which seems a bit silly. (I'm not sure what this refers to, though.) I don't think it's silly if a later patch removes something that the preceding patches introduced, as long as that contributes to readability. Ultimately, they will be merged together on committing. > Basically, I think it is probably better to just have one commit > enabling guc refresh for VACUUM and then another which correctly > implements what is needed for autovacuum to do the same. > Attached v9 does this. > > I've provided both complete versions of both approaches (v9 and v8). I took a look at v9 and have a few comments. 0001: I don't believe it is necessary, as mentioned in the commit message. It appears that we are resetting it at the appropriate times. 0002: I felt a bit uneasy about this. It seems somewhat complex (and makes the succeeding patches complex), has confusing names, and doesn't seem self-contained. I think it'd be simpler to add a global boolean (maybe VacuumCostActiveForceDisable or such) that forces VacuumCostActive to be false and set VacuumCostActive using a setter function that follows the boolean. 0003: +* Reload the configuration file if requested. This allows changes to +* vacuum_cost_limit and vacuum_cost_delay to take effect while a table is +* being vacuumed or analyzed. Analyze should not reload configuration +* file if it is in an outer transaction, as GUC values shouldn't be +* allowed to refer to some uncommitted state (e.g. database objects +* created in this transaction). I'm not sure GUC reload is or should be related to transactions. For instance, work_mem can be changed by a reload during a transaction unless it has been set in the current transaction. I don't think we need to deliberately suppress changes in variables caused by reloads during transactions only for analyze. If analyze doesn't like changes to certain GUC variables, their values should be snapshotted before starting the process. 0004: - double at_vacuum_cost_delay; - int at_vacuum_cost_limit; + double at_table_option_vac_cost_delay; + int at_table_option_vac_cost_limit; We call those options "relopt(ion)". I don't think there's any reason to use different names. dlist_head av_runningWorkers; WorkerInfo av_startingWorker; AutoVacuumWorkItem av_workItems[NUM_WORKITEMS]; + pg_atomic_uint32 av_nworkers_for_balance; The name of the new member doesn't seem to follow the surrounding convention. (However, I don't think the member is needed. See below.)
- /* -* Remember the prevailing values of the vacuum cost GUCs. We have to -* restore these at the bottom of the loop, else we'll compute wrong -* values in the next iteration of autovac_balance_cost(). -*/ - stdVacuumCostDelay = VacuumCostDelay; - stdVacuumCostLimit = VacuumCostLimit; + av_table_option_cost_limit = tab->at_table_option_vac_cost_limit; + av_table_option_cost_delay = tab->at_table_option_vac_cost_delay; I think this requires a comment. + /* There is at least 1 autovac worker (this worker). */ + int nworkers_for_balance = Max(pg_atomic_read_u32( + &AutoVacuumShmem->av_nworkers_for_balance), 1); I think it *must* be greater than 0. However, to begin with, I don't think we need that variable to be shared. I don't believe it matters if we count involved workers every time we calcualte the delay. +/* + * autovac_balance_cost + * Recalculate the number of workers to consider, given table options and + * the current number of active workers. + * + * Caller must h
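To make the balancing arithmetic under review concrete, a minimal sketch of the per-worker limit calculation is shown below. AutoVacuumShmem and av_nworkers_for_balance are names taken from the patch under discussion (and, as noted above, the shared counter may not survive review); the helper function itself is hypothetical.

static int
autovac_balanced_cost_limit(int base_cost_limit)
{
    /* There is at least one autovacuum worker: this one. */
    int         nworkers_for_balance =
        Max(pg_atomic_read_u32(&AutoVacuumShmem->av_nworkers_for_balance), 1);

    /* Split the configured limit evenly across workers, but never below 1. */
    return Max(base_cost_limit / nworkers_for_balance, 1);
}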
Re: Reconcile stats in find_tabstat_entry() and get rid of PgStat_BackendFunctionEntry
At Tue, 28 Mar 2023 15:16:36 +0900, Michael Paquier wrote in > On Tue, Mar 28, 2023 at 07:49:45AM +0200, Drouvot, Bertrand wrote: > > What about something like? > > > > for pg_stat_get_xact_blocks_fetched(): "block read requests for table or > > index, in the current > > transaction. This number minus pg_stat_get_xact_blocks_hit() gives the > > number of kernel > > read() calls." > > > > pg_stat_get_xact_blocks_hit(): "block read requests for table or index, in > > the current > > transaction, found in cache (not triggering kernel read() calls)". > > Something among these lines within the table would be also OK by me. > Horiguchi-san or Melanie-san, perhaps you have counter-proposals? No. Fine by me, except that "block read requests" seems to suggest kernel read() calls, maybe because it's not clear whether "block" refers to our buffer blocks or file blocks to me.. If it is generally clear, I'm fine with the proposal. regards. -- Kyotaro Horiguchi NTT Open Source Software Center
Re: running logical replication as the subscription owner
On Mon, 27 Mar 2023 at 22:47, Robert Haas wrote: > Can a table owner defend themselves > against a subscription owner who wants to usurp their privileges? I have a hard time following the discussion. And I think it's because there's lots of different possible privilege escalations to think about. Below is the list of escalations I've gathered and how I think they interact with the different patches: 1. Table owner escalating to a high-privileged subscription owner. i.e. the subscription owner is superuser, or has SET ROLE privileges for all owners of tables in the subscription. 2. Table owner escalating to a low-privileged subscription owner. e.g. the subscription owner can only insert into the tables in its subscription a. The subscription owner only has insert permissions for a tables owned by a single other user b. The subscription owner has insert permissions for tables owned by multiple other users (but still does not have SET ROLE, or possibly even select/update/delete) 3. Publisher applying into tables that the subscriber side doesn't want it to 4. Subscription-owner escalating to table owner a. The subscription owner is highly privileged (allows SET ROLE to table owner) b. The subscription owner is lowly privileged Which can currently only be addressed by having 1 subscription/publication pair for every table owner. This has the big issue that WAL needs to be decoded multiple times on the publisher. This patch would make escalation 1 impossible without having to do anything special when setting up the subscription. With Citus we only run into this escalation issue with logical replication at the moment. We want to replicate lots of different tables, possibly owned by multiple users from one node to another. We trust the publisher and we trust the subscription owner, but we don't trust the table owners at all. This is why I'm very much in favor of a version of this patch. 2.a seems a non-issue, because in this case the table owner can easily give itself the same permissions as the subscription owner (if it didn't have them yet). So by "escalating" to the subscription owner the table owner only gets fewer permissions. 2.b is actually interesting from a security perspective, because by escalating to the subscription owner, the table owner might be able to insert into tables that it normally can't. Patch v1 would "solve" both these issues by simply not supporting these scenarios. My patch v2 keeps the existing behaviour, where both "escalations" are possible and who-ever sets up the replication should create a dedicated subscriber for each table owner to make sure that only 2.a ever occurs and 2.b does not. 3 is something I have not run into. But I can easily imagine a case where a subscriber connects to a (somewhat) untrusted publisher for the purpose of replicating changes from a single table, e.g. some events table. But you don't want to allow replication into any other tables, even if the publisher tells you to. Patch v1 would force you to have SET ROLE privileges on the target events table its owner. So if that owner owns other tables too, then effectively you'd allow the publisher to write into those tables too. The current behaviour (without any patch) supports protecting against this escalation, by giving only INSERT permissions on a single table without the need for SET ROLE. My v2 patch preserves that ability. 4.a again seems like an obvious non-issue to me because the subscription owner can already "escalate" to table owner using SET ROLE. 
4.b seems like it's pretty much the same as 3, afaict all the same arguments apply. And I honestly can't think of a real situation where you would not trust the subscription owner (as opposed to the publisher). If any of you have an example of such a situation I'd love to understand this one better. All in all, I think patch v2 is the right direction here, because it covers all these escalations to some extent.
Re: running logical replication as the subscription owner
On Tue, 28 Mar 2023 at 11:15, Jelte Fennema wrote: > Which can currently only be addressed by having 1 > subscription/publication pair for every table owner. Oops, moving sentences around in my email made me not explicitly reference escalation 1 anymore. The above should have been: 1 can currently only be addressed by having 1 subscription/publication pair for every table owner.
Re: Save a few bytes in pg_attribute
On 23.03.23 13:45, Peter Eisentraut wrote: My suggestion is to use this patch and then consider the column encryption patch as it stands now. I have committed this.
Re: Remove 'htmlhelp' documentat format (was meson documentation build open issues)
On 24.03.23 17:58, Andres Freund wrote: On 2023-03-24 11:59:23 +0100, Peter Eisentraut wrote: Another option here is to remove support for htmlhelp. That might actually be the best path - it certainly doesn't look like anybody has been actively using it. Or otherwise somebody would have complained about there not being any instructions on how to actually compile a .chm file. And perhaps complained that it takes next to forever to build. I also have the impression that people don't use the .chm stuff much anymore, but that might just be me not using windows. I think in ancient times, pgadmin used it for its internal help. But I have heard less about htmlhelp over the years than about the info format.
Re: Initial Schema Sync for Logical Replication
On Mon, Mar 27, 2023 at 8:17 AM Masahiko Sawada wrote: > > On Fri, Mar 24, 2023 at 11:51 PM Kumar, Sachin wrote: > > > > > From: Amit Kapila > > > > I think we won't be able to use same snapshot because the transaction > > > > will > > > > be committed. > > > > In CreateSubscription() we can use the transaction snapshot from > > > > walrcv_create_slot() till walrcv_disconnect() is called.(I am not sure > > > > about this part maybe walrcv_disconnect() calls the commits internally > > > > ?). > > > > So somehow we need to keep this snapshot alive, even after transaction > > > > is committed(or delay committing the transaction , but we can have > > > > CREATE SUBSCRIPTION with ENABLED=FALSE, so we can have a restart > > > > before tableSync is able to use the same snapshot.) > > > > > > > > > > Can we think of getting the table data as well along with schema via > > > pg_dump? Won't then both schema and initial data will correspond to the > > > same snapshot? > > > > Right , that will work, Thanks! > > While it works, we cannot get the initial data in parallel, no? > Another possibility is that we dump/restore the schema of each table along with its data. One thing we can explore is whether the parallel option of dump can be useful here. Do you have any other ideas? One related idea is that currently, we fetch the table list corresponding to publications in subscription and create the entries for those in pg_subscription_rel during Create Subscription, can we think of postponing that work till after the initial schema sync? We seem to be already storing publications list in pg_subscription, so it appears possible if we somehow remember the value of copy_data. If this is feasible then I think that may give us the flexibility to perform the initial sync at a later point by the background worker. > > > > > > I think we can have same issues as you mentioned New table t1 is added > > > > to the publication , User does a refresh publication. > > > > pg_dump / pg_restore restores the table definition. But before > > > > tableSync can start, steps from 2 to 5 happen on the publisher. > > > > > 1. Create Table t1(c1, c2); --LSN: 90 2. Insert t1 (1, 1); --LSN 100 > > > > > 3. Insert t1 (2, 2); --LSN 110 4. Alter t1 Add Column c3; --LSN 120 > > > > > 5. Insert t1 (3, 3, 3); --LSN 130 > > > > And table sync errors out > > > > There can be one more issue , since we took the pg_dump without > > > snapshot (wrt to replication slot). > > > > > > > > > > To avoid both the problems mentioned for Refresh Publication, we can do > > > one of the following: (a) create a new slot along with a snapshot for this > > > operation and drop it afterward; or (b) using the existing slot, > > > establish a > > > new snapshot using a technique proposed in email [1]. > > > > > > > Thanks, I think option (b) will be perfect, since we don’t have to create a > > new slot. > > Regarding (b), does it mean that apply worker stops streaming, > requests to create a snapshot, and then resumes the streaming? > Shouldn't this be done by the backend performing a REFRESH publication? -- With Regards, Amit Kapila.
Re: Remove 'htmlhelp' documentat format (was meson documentation build open issues)
On Tue, 28 Mar 2023 at 10:46, Peter Eisentraut < peter.eisentr...@enterprisedb.com> wrote: > On 24.03.23 17:58, Andres Freund wrote: > > On 2023-03-24 11:59:23 +0100, Peter Eisentraut wrote: > >> Another option here is to remove support for htmlhelp. > > > > That might actually be the best path - it certainly doesn't look like > anybody > > has been actively using it. Or otherwise somebody would have complained > about > > there not being any instructions on how to actually compile a .chm file. > And > > perhaps complained that it takes next to forever to build. > > > > I also have the impression that people don't use the .chm stuff much > anymore, > > but that might just be me not using windows. > > I think in ancient times, pgadmin used it for its internal help. > Yes, very ancient :-). We use Sphinx now. > > But I have heard less about htmlhelp over the years than about the info > format. > > > -- Dave Page Blog: https://pgsnake.blogspot.com Twitter: @pgsnake EDB: https://www.enterprisedb.com
RE: Data is copied twice when specifying both child and parent table in publication
On Tues, Mar 28, 2023 at 7:02 AM Jacob Champion wrote: > On Mon, Mar 20, 2023 at 11:22 PM Amit Kapila > wrote: > > If the tests you have in mind are only related to this patch set then > > feel free to propose them here if you feel the current ones are not > > sufficient. > > I think the new tests added by Wang cover my concerns (thanks!). I share > Peter's comment that we don't seem to have a regression test covering > only the bug description itself -- just ones that combine that case with > row and column restrictions -- but if you're all happy with the existing > approach then I have nothing much to add there. The scenario of this bug is to subscribe to two publications at the same time, and these two publications publish parent table and child table respectively. And option via_root is specified in both publications or only in the publication of the parent table. At this time, the data on the publisher-side will be copied twice (the data will be copied to the two tables on the subscribe-side respectively). So, I think we have covered this bug itself in 013_partition.pl. We inserted the initial data into the parent table tab4 on the publisher-side, and checked whether the sync is completed as we expected (there is data in table tab4, but there is no data in table tab4_1). > I was staring at this subquery in fetch_table_list(): > > > +" ( SELECT array_agg(a.attname ORDER > > BY a.attnum)\n" > > +"FROM pg_attribute a\n" > > +"WHERE a.attrelid = gpt.relid > > AND\n" > > +" a.attnum = ANY(gpt.attrs)\n" > > +" ) AS attnames\n" > > On my machine this takes up roughly 90% of the runtime of the query, > which makes for a noticeable delay with a bigger test case (a couple of > FOR ALL TABLES subscriptions on the regression database). And it seems > like we immediately throw all that work away: if I understand correctly, > we only use the third column for its interaction with DISTINCT. Would it > be enough to just replace that whole thing with gpt.attrs? Make sense. Changed as suggested. Attach the new patch. Regards, Wang Wei v25-0001-Avoid-syncing-data-twice-for-the-publish_via_par.patch Description: v25-0001-Avoid-syncing-data-twice-for-the-publish_via_par.patch
Re: Add standard collation UNICODE
On 23.03.23 21:16, Jeff Davis wrote: Another thought: for ICU, do we want the default collation to be UNICODE (root collation)? What we have now gets the default from the environment, which is consistent with the libc provider. But now that we have the UNICODE collation, it makes me wonder if we should just default to that. The server's environment doesn't necessarily say much about the locale of the data stored in it or the locale of the applications accessing it. As long as we still have to initialize the libc locale fields to some language, I think it would be less confusing to keep the ICU locale on the same language. If we ever manage to get rid of that, then I would also support making the ICU locale the root collation by default.
RE: Data is copied twice when specifying both child and parent table in publication
On Tues, Mar 28, 2023 at 18:00 PM Wang, Wei/王 威 wrote: > Attach the new patch. Sorry, I attached the wrong patch. Here is the correct new version patch which addressed all comments so far. Regards, Wang Wei v25-0001-Avoid-syncing-data-twice-for-the-publish_via_par.patch Description: v25-0001-Avoid-syncing-data-twice-for-the-publish_via_par.patch
Re: doc: add missing "id" attributes to extension packaging page
On 28.03.2023 at 00:11, Peter Smith wrote: FYI, there is a lot of overlap between this last attachment and the patches of Kuroda-san's current thread here [1] which is also adding ids to create_subscription.sgml. (Anyway, I guess you might already be aware of that other thread because your new ids look like they have the same names as those chosen by Kuroda-san) -- [1] https://www.postgresql.org/message-id/flat/CAHut%2BPvzo6%3DKKLqMR6-mAQdM%2Bj_dse0eUreGmrFouL7gbLbv2w%40mail.gmail.com#7da8d0e3b73096375847c16c856b4aed Thanks, I actually was not aware of this. Also, kudos for capturing the missing id and advocating for consistency regarding ids even before this is actively enforced. This nurtures my optimism that consistency might actually be achievable without everybody getting angry at me because my patch will enforce it. Regarding the overlap, I currently try to make it as easy as possible for a potential committer and I'm happy to rebase my patch upon request or if Kuroda-san's patch gets applied first. Regards, Brar
Move definition of standard collations from initdb to pg_collation.dat
While working on [0], I was wondering why the collations ucs_basic and unicode are not in pg_collation.dat. I traced this back through history, and I think this was just lost in a game of telephone. The initial commit for pg_collation.h (414c5a2ea6) has only the default collation in pg_collation.h (pre .dat), with initdb handling everything else. Over time, additional collations "C" and "POSIX" were moved to pg_collation.h, and other logic was moved from initdb to pg_import_system_collations(). But ucs_basic was untouched. Commit 0b13b2a771 rearranged the relative order of operations in initdb and added the current comment "We don't want to pin these", but looking at the email[1], I think this was more a guess about the previous intent. I suggest we fix this now; see attached patch. [0]: https://www.postgresql.org/message-id/flat/1293e382-2093-a2bf-a397-c04e8f83d3c2%40enterprisedb.com [1]: https://www.postgresql.org/message-id/28195.1498172402%40sss.pgh.pa.usFrom 0d2c6b92a3340833f13bab395e0556ce1f045226 Mon Sep 17 00:00:00 2001 From: Peter Eisentraut Date: Tue, 28 Mar 2023 12:04:34 +0200 Subject: [PATCH] Move definition of standard collations from initdb to pg_collation.dat --- src/bin/initdb/initdb.c | 15 +-- src/include/catalog/pg_collation.dat | 7 +++ 2 files changed, 8 insertions(+), 14 deletions(-) diff --git a/src/bin/initdb/initdb.c b/src/bin/initdb/initdb.c index bae97539fc..9ccbf998ec 100644 --- a/src/bin/initdb/initdb.c +++ b/src/bin/initdb/initdb.c @@ -1695,20 +1695,7 @@ setup_description(FILE *cmdfd) static void setup_collation(FILE *cmdfd) { - /* -* Add SQL-standard names. We don't want to pin these, so they don't go -* in pg_collation.dat. But add them before reading system collations, so -* that they win if libc defines a locale with the same name. -*/ - PG_CMD_PRINTF("INSERT INTO pg_collation (oid, collname, collnamespace, collowner, collprovider, collisdeterministic, collencoding, colliculocale)" - "VALUES (pg_nextoid('pg_catalog.pg_collation', 'oid', 'pg_catalog.pg_collation_oid_index'), 'unicode', 'pg_catalog'::regnamespace, %u, '%c', true, -1, 'und');\n\n", - BOOTSTRAP_SUPERUSERID, COLLPROVIDER_ICU); - - PG_CMD_PRINTF("INSERT INTO pg_collation (oid, collname, collnamespace, collowner, collprovider, collisdeterministic, collencoding, collcollate, collctype)" - "VALUES (pg_nextoid('pg_catalog.pg_collation', 'oid', 'pg_catalog.pg_collation_oid_index'), 'ucs_basic', 'pg_catalog'::regnamespace, %u, '%c', true, %d, 'C', 'C');\n\n", - BOOTSTRAP_SUPERUSERID, COLLPROVIDER_LIBC, PG_UTF8); - - /* Now import all collations we can find in the operating system */ + /* Import all collations we can find in the operating system */ PG_CMD_PUTS("SELECT pg_import_system_collations('pg_catalog');\n\n"); } diff --git a/src/include/catalog/pg_collation.dat b/src/include/catalog/pg_collation.dat index f4bda1c769..14df398ad2 100644 --- a/src/include/catalog/pg_collation.dat +++ b/src/include/catalog/pg_collation.dat @@ -23,5 +23,12 @@ descr => 'standard POSIX collation', collname => 'POSIX', collprovider => 'c', collencoding => '-1', collcollate => 'POSIX', collctype => 'POSIX' }, +{ oid => '962', + descr => 'sorts using the Unicode Collation Algorithm with default settings', + collname => 'unicode', collprovider => 'i', collencoding => '-1', + colliculocale => 'und' }, +{ oid => '963', descr => 'sorts by Unicode code point', + collname => 'ucs_basic', collprovider => 'c', collencoding => '6', + collcollate => 'C', collctype => 'C' }, ] -- 2.40.0
"variable not found in subplan target list"
I have to run now so can't dissect it, but while running sqlsmith on the SQL/JSON patch after Justin's report, I got $SUBJECT in this query: MERGE INTO public.target_parted as target_0 USING (select subq_0.c5 as c0, subq_0.c0 as c1, ref_0.a as c2, subq_0.c1 as c3, subq_0.c9 as c4, (select c from public.prt2_m_p3 limit 1 offset 1) as c5, subq_0.c8 as c6, ref_0.a as c7, subq_0.c7 as c8, subq_0.c1 as c9, pg_catalog.system_user() as c10 from public.itest1 as ref_0 left join (select ref_1.matches as c0, ref_1.typ as c1, ref_1.colname as c2, (select slotname from public.iface limit 1 offset 44) as c3, ref_1.matches as c4, ref_1.op as c5, ref_1.matches as c6, ref_1.value as c7, ref_1.op as c8, ref_1.op as c9, ref_1.typ as c10 from public.brinopers_multi as ref_1 where cast(null as polygon) <@ (select polygon from public.tab_core_types limit 1 offset 22) ) as subq_0 on (cast(null as macaddr8) >= cast(null as macaddr8)) where subq_0.c10 > subq_0.c2 limit 49) as subq_1 ON target_0.b = subq_1.c2 WHEN MATCHED AND (cast(null as box) |>> cast(null as box)) or (cast(null as lseg) ?-| (select s from public.lseg_tbl limit 1 offset 6) ) THEN DELETE WHEN NOT MATCHED AND (EXISTS ( select 21 as c0, subq_2.c0 as c1 from public.itest14 as sample_0 tablesample system (3.6) inner join public.num_exp_sqrt as sample_1 tablesample bernoulli (0.3) on (cast(null as "char") <= cast(null as "char")), lateral (select sample_1.id as c0 from public.a as ref_2 where (cast(null as lseg) <@ cast(null as line)) or ((select b3 from public.bit_defaults limit 1 offset 80) <> (select b3 from public.bit_defaults limit 1 offset 4) ) limit 158) as subq_2 where (cast(null as name) !~ (select t from public.test_tsvector limit 1 offset 5) ) and ((select bool from public.tab_core_types limit 1 offset 61) < (select pg_catalog.bool_or(v) from public.rtest_view1) ))) or (18 is NULL) THEN INSERT VALUES ( pg_catalog.int4um( cast(public.func_with_bad_set() as int4)), 13) WHEN MATCHED AND ((24 is not NULL) or (true)) or (cast(null as "timestamp") <= cast(null as timestamptz)) THEN UPDATE set b = target_0.b Ugh. I got no more SQL/JSON related crashes so far. -- Álvaro Herrera PostgreSQL Developer — https://www.EnterpriseDB.com/ "No hay ausente sin culpa ni presente sin disculpa" (Prov. francés)
Re: Move definition of standard collations from initdb to pg_collation.dat
Peter Eisentraut writes: > While working on [0], I was wondering why the collations ucs_basic and > unicode are not in pg_collation.dat. I traced this back through > history, and I think this was just lost in a game of telephone. > The initial commit for pg_collation.h (414c5a2ea6) has only the default > collation in pg_collation.h (pre .dat), with initdb handling everything > else. Over time, additional collations "C" and "POSIX" were moved to > pg_collation.h, and other logic was moved from initdb to > pg_import_system_collations(). But ucs_basic was untouched. Commit > 0b13b2a771 rearranged the relative order of operations in initdb and > added the current comment "We don't want to pin these", but looking at > the email[1], I think this was more a guess about the previous intent. Yeah, I was just loath to change the previous behavior in that patch. I can't see any strong reason not to pin these entries. > I suggest we fix this now; see attached patch. While we're here, do we want to adopt some other spelling of "the root locale" than "und", in view of recent discoveries about the instability of that on old ICU versions? regards, tom lane
Re: "variable not found in subplan target list"
Alvaro Herrera writes: > I have to run now so can't dissect it, but while running sqlsmith on the > SQL/JSON patch after Justin's report, I got $SUBJECT in this query: Reproduces in HEAD and v15 too (once you replace pg_catalog.system_user with some function that exists in v15). So it's not the fault of the JSON patch, nor of my outer-join hacking which had been my first thought. regards, tom lane
Re: Non-replayable WAL records through overflows and >MaxAllocSize lengths
On Mon, Dec 19, 2022 at 12:37:19PM +0100, Alvaro Herrera wrote: > I have created one in the January commitfest, > https://commitfest.postgresql.org/41/ > and rebased the patch on current master. (I have not reviewed this.) I have spent some time on that, and here are some comments with an updated version of the patch attached. The checks in XLogRegisterData() seemed overcomplicated to me. In this context, I think that we should just care about making sure that mainrdata_len does not overflow depending on the length given by the caller, which is where pg_add_u32_overflow() becomes handy. XLogRegisterBufData() added a check on UINT16_MAX in an assert, though we already check for overflow a couple of lines down. This is not necessary, it seems. @@ -535,6 +567,9 @@ XLogRecordAssemble(RmgrId rmid, uint8 info, XLogRecord *rechdr; char *scratch = hdr_scratch; + /* ensure that any assembled record can be decoded */ + Assert(AllocSizeIsValid(DecodeXLogRecordRequiredSpace(MaxXLogRecordSize))); A hardcoded check like that has no need to be in a code path triggered each time a WAL record is assembled. One place where this could be is InitXLogInsert(). It still means that it is called one time for each backend, but seeing it where the initialization of xloginsert.c feels natural, at least. A postmaster location would be enough, as well. XLogRecordMaxSize just needs to be checked once IMO, around the end of XLogRecordAssemble() once we know the total size of the record that will be fed to a XLogReader. One thing that we should be more careful of is to make sure that total_len does not overflow its uint32 value while assembling the record, as well. I have removed XLogErrorDataLimitExceeded(), replacing it with more context about the errors happening. Perhaps this has no need to be that much verbose, but it can be really useful for developers. Some comments had no need to be updated, and there were some typos. I am on board with the idea of a XLogRecordMaxSize that's bounded at 1020MB, leaving 4MB as room for the extra data needed by a XLogReader. At the end, I think that this is quite interesting long-term. For example, if we lift up XLogRecordMaxSize, we can evaluate the APIs adding buffer data or main data separately. Thoughts about this version? -- Michael From 16b40324a5bc5bbe6e06cbfb86251c907f400151 Mon Sep 17 00:00:00 2001 From: Michael Paquier Date: Tue, 28 Mar 2023 20:34:31 +0900 Subject: [PATCH v10] Add protections in xlog record APIs against overflows Before this, it was possible for an extension to create malicious WAL records that were too large to replay; or that would overflow the xl_tot_len field, causing potential corruption in WAL record IO ops. Emitting invalid records was also possible through pg_logical_emit_message(), which allowed you to emit arbitrary amounts of data up to 2GB, much higher than the replay limit of approximately 1GB. This patch adds a limit to the size of an XLogRecord (1020MiB), checks against overflows, and ensures that the reader infrastructure can read any max-sized XLogRecords, such that the wal-generating backend will fail when it attempts to insert unreadable records, instead of that insertion succeeding but breaking any replication streams. 
Author: Matthias van de Meent --- src/include/access/xlogrecord.h | 11 src/backend/access/transam/xloginsert.c | 73 + 2 files changed, 74 insertions(+), 10 deletions(-) diff --git a/src/include/access/xlogrecord.h b/src/include/access/xlogrecord.h index 0d576f7883..0a7b0bb6fc 100644 --- a/src/include/access/xlogrecord.h +++ b/src/include/access/xlogrecord.h @@ -54,6 +54,17 @@ typedef struct XLogRecord #define SizeOfXLogRecord (offsetof(XLogRecord, xl_crc) + sizeof(pg_crc32c)) +/* + * XLogReader needs to allocate all the data of an xlog record in a single + * chunk. This means that a single XLogRecord cannot exceed MaxAllocSize + * in length if we ignore any allocation overhead of the XLogReader. + * + * To accommodate some overhead, this value allows for 4M of allocation + * overhead, that should be plenty enough for what + * DecodeXLogRecordRequiredSpace() expects as extra. + */ +#define XLogRecordMaxSize (1020 * 1024 * 1024) + /* * The high 4 bits in xl_info may be used freely by rmgr. The * XLR_SPECIAL_REL_UPDATE and XLR_CHECK_CONSISTENCY bits can be passed by diff --git a/src/backend/access/transam/xloginsert.c b/src/backend/access/transam/xloginsert.c index 008612e032..86fa6a5276 100644 --- a/src/backend/access/transam/xloginsert.c +++ b/src/backend/access/transam/xloginsert.c @@ -33,6 +33,7 @@ #include "access/xloginsert.h" #include "catalog/pg_control.h" #include "common/pg_lzcompress.h" +#include "common/int.h" #include "executor/instrument.h" #include "miscadmin.h" #include "pg_trace.h" @@ -351,11 +352,13 @@ void XLogRegisterData(char *data, uint32 len) { XLogRecData *rdata; + uint32 result = 0; Assert(begininsert_call
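To illustrate the central overflow check being discussed for XLogRegisterData(), here is a minimal sketch. mainrdata_len is xloginsert.c's running total of registered main data; the helper's shape, error code and message are illustrative only and do not necessarily match the patch.

static void
add_to_mainrdata_len(uint32 *mainrdata_len, uint32 len)
{
    uint32      newlen;

    /* pg_add_u32_overflow() (from common/int.h) returns true on wraparound. */
    if (pg_add_u32_overflow(*mainrdata_len, len, &newlen))
        ereport(ERROR,
                (errcode(ERRCODE_PROGRAM_LIMIT_EXCEEDED),
                 errmsg("too much WAL data")));

    *mainrdata_len = newlen;
}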
Re: POC PATCH: copy from ... exceptions to: (was Re: VLDB Features)
On 2023-03-27 23:28, Damir Belyalov wrote: Hi! I made the specified changes and my patch turned out the same as yours. The performance measurements were the same too. Thanks for your review and measurements. The only thing left to do is how not to add IGNORE_DATATYPE_ERRORS as a keyword. See how this is done for parameters such as FORCE_NOT_NULL, FORCE_NULL, FORCE_QUOTE. They are not in kwlist.h and are not as keywords in gram.y. I might misunderstand something, but I believe the v5 patch uses copy_generic_opt_list and it does not add IGNORE_DATATYPE_ERRORS as a keyword. It modifies neither kwlist.h nor gram.y. -- Regards, -- Atsushi Torikoshi NTT DATA CORPORATION
Re: POC PATCH: copy from ... exceptions to: (was Re: VLDB Features)
> > I might misunderstand something, but I believe the v5 patch uses > copy_generic_opt_list and it does not add IGNORE_DATATYPE_ERRORS as a > keyword. > It modifies neither kwlist.h nor gram.y. > Sorry, didn't notice that. I think everything is alright now. Regards, Damir Belyalov Postgres Professional
Re: PGdoc: add missing ID attribute to create_subscription.sgml
On Tue, Mar 28, 2023 at 1:00 PM Hayato Kuroda (Fujitsu) wrote: > > Dear Amit, > > Thank you for reviewing! PSA new version. > > > Isn't it better to move the link-related part to the next line > > wherever possible? Currently, it looks bit odd. > > Previously I preferred not to add a new line inside the tag, but it > caused > long-line. So I adjusted them not to be too short/long length. > There is no need to break the link line. See attached. -- With Regards, Amit Kapila. v7-0001-Add-XML-ID-attributes-to-create_subscription.sgml.patch Description: Binary data
Re: Add standard collation UNICODE
On 3/28/23 06:07, Peter Eisentraut wrote: On 23.03.23 21:16, Jeff Davis wrote: Another thought: for ICU, do we want the default collation to be UNICODE (root collation)? What we have now gets the default from the environment, which is consistent with the libc provider. But now that we have the UNICODE collation, it makes me wonder if we should just default to that. The server's environment doesn't necessarily say much about the locale of the data stored in it or the locale of the applications accessing it. As long as we still have to initialize the libc locale fields to some language, I think it would be less confusing to keep the ICU locale on the same language. I definitely agree with that. If we ever manage to get rid of that, then I would also support making the ICU locale the root collation by default. +1 -- Joe Conway PostgreSQL Contributors Team RDS Open Source Databases Amazon Web Services: https://aws.amazon.com
Re: "variable not found in subplan target list"
Alvaro Herrera writes: > I have to run now so can't dissect it, but while running sqlsmith on the > SQL/JSON patch after Justin's report, I got $SUBJECT in this query: I reduced this down to MERGE INTO public.target_parted as target_0 USING public.itest1 as ref_0 ON target_0.b = ref_0.a WHEN NOT MATCHED THEN INSERT VALUES (42, 13); The critical moving part seems to just be that the MERGE target is a partitioned table ... but surely somebody tested that before? regards, tom lane
Re: Hash table scans outside transactions
Bumping it to attract some attention. On Tue, Mar 21, 2023 at 12:51 PM Ashutosh Bapat wrote: > > Hi, > Hash table scans (seq_scan_table/level) are cleaned up at the end of a > transaction in AtEOXact_HashTables(). If a hash seq scan continues > beyond transaction end it will meet "ERROR: no hash_seq_search scan > for hash table" in deregister_seq_scan(). That seems like a limitation on > the hash table usage. > > Our use case is > 1. Add/update/remove entries in hash table > 2. Scan the existing entries and perform one transaction per entry > 3. Close scan > > Repeat the above steps in an infinite loop. Note that we do not > add/modify/delete entries in step 2. We can't use linked lists since > the entries need to be updated or deleted using hash keys. Because the > hash seq scan is cleaned up at the end of the transaction, we > encounter an error in the 3rd step. I don't see that the actual hash > table scan depends upon the seq_scan_table/level[] which is cleaned up > at the end of the transaction. > > I have the following questions > 1. Is there a way to avoid cleaning up seq_scan_table/level() when the > transaction ends? > 2. Is there a way that we can use hash table implementation in > PostgreSQL code for our purpose? > > > -- > Best Wishes, > Ashutosh Bapat -- Best Wishes, Ashutosh Bapat
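For context, the usage pattern described above boils down to something like the sketch below (my_hash_table and MyEntry are hypothetical; the dynahash calls are the real API). The scan itself keeps working across transactions, but CommitTransactionCommand() ends up in AtEOXact_HashTables(), which drops the scan registration, so closing the scan afterwards fails with the quoted error.

HASH_SEQ_STATUS status;
MyEntry    *entry;

hash_seq_init(&status, my_hash_table);
while ((entry = hash_seq_search(&status)) != NULL)
{
    StartTransactionCommand();
    /* ... process this entry; no inserts/deletes into the table here ... */
    CommitTransactionCommand();     /* scan registration is discarded here */
}
/*
 * By the time the scan reaches its end (or hash_seq_term() is called),
 * deregister_seq_scan() no longer finds the registration and raises
 * ERROR: no hash_seq_search scan for hash table.
 */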
Re: "variable not found in subplan target list"
I wrote: > I reduced this down to > MERGE INTO public.target_parted as target_0 > USING public.itest1 as ref_0 > ON target_0.b = ref_0.a > WHEN NOT MATCHED >THEN INSERT VALUES (42, 13); > The critical moving part seems to just be that the MERGE target > is a partitioned table ... but surely somebody tested that before? Oh, it's not just any partitioned table: regression=# \d+ target_parted Partitioned table "public.target_parted" Column | Type | Collation | Nullable | Default | Storage | Compression | Stats target | Description +-+---+--+-+-+-+--+- a | integer | | | | plain | | | b | integer | | | | plain | | | Partition key: LIST (a) Number of partitions: 0 The planner is reducing the scan of target_parted to a dummy scan, as is reasonable, but it forgets to provide ctid as an output from that scan; then the parent join node is unhappy because it does have a ctid output. So it looks like the problem is some shortcut we take while creating the dummy scan. I suppose that without the planner bug, this'd fail at runtime for lack of a partition to put (42,13) into. Because of that, the case isn't really interesting for production, which may explain the lack of reports. regards, tom lane
Re: Memory leak from ExecutorState context?
On Tue, 28 Mar 2023 00:43:34 +0200 Tomas Vondra wrote: > On 3/27/23 23:13, Jehan-Guillaume de Rorthais wrote: > > Please, find in attachment a patch to allocate bufFiles in a dedicated > > context. I picked up your patch, backpatch'd it, went through it and did > > some minor changes to it. I have some comments/questions though. > > > > 1. I'm not sure why we must allocate the "HashBatchFiles" new context > > under ExecutorState and not under hashtable->hashCxt? > > > > The only references I could find was in hashjoin.h:30: > > > >/* [...] > > * [...] (Exception: data associated with the temp files lives in the > > * per-query context too, since we always call buffile.c in that > > context.) > > > > And in nodeHashjoin.c:1243:ExecHashJoinSaveTuple() (I reworded this > > original comment in the patch): > > > >/* [...] > > * Note: it is important always to call this in the regular executor > > * context, not in a shorter-lived context; else the temp file buffers > > * will get messed up. > > > > > > But these are not an explanation of why BufFile related allocations must be > > under a per-query context. > > > > Doesn't that simply describe the current (unpatched) behavior where > BufFile is allocated in the per-query context? I wasn't sure. The first quote from hashjoin.h seems to describe a stronger rule about «**always** call buffile.c in per-query context». But maybe it ought to be «always call buffile.c from one of the sub-query context»? I assume the aim is to enforce the tmp files removal on query end/error? > I mean, the current code calls BufFileCreateTemp() without switching the > context, so it's in the ExecutorState. But with the patch it very clearly is > not. > > And I'm pretty sure the patch should do > > hashtable->fileCxt = AllocSetContextCreate(hashtable->hashCxt, > "HashBatchFiles", > ALLOCSET_DEFAULT_SIZES); > > and it'd still work. Or why do you think we *must* allocate it under > ExecutorState? That was actually my very first patch and it indeed worked. But I was confused about the previous quoted code comments. That's why I kept your original code and decided to raise the discussion here. Fixed in new patch in attachment. > FWIW The comment in hashjoin.h needs updating to reflect the change. Done in the last patch. Is my rewording accurate? > > 2. Wrapping each call of ExecHashJoinSaveTuple() with a memory context > > switch seems fragile as it could be forgotten in future code paths/changes. > > So I added an Assert() in the function to make sure the current memory > > context is "HashBatchFiles" as expected. > > Another way to tie this up might be to pass the memory context as > > argument to the function. > > ... Or maybe I'm being overly cautious. > > > > I'm not sure I'd call that fragile, we have plenty other code that > expects the memory context to be set correctly. Not sure about the > assert, but we don't have similar asserts anywhere else. I mostly stuck it there to stimulate the discussion around this as I needed to scratch that itch. > But I think it's just ugly and overly verbose +1 Your patch was just a demo/debug patch at the time. It needed some cleanup now :) > it'd be much nicer to e.g. pass the memory context as a parameter, and do > the switch inside. That was a proposal in my previous mail, so I did it in the new patch. Let's see what other reviewers think. > > 3. You wrote: > > > >>> A separate BufFile memory context helps, although people won't see it > >>> unless they attach a debugger, I think.
Better than nothing, but I was > >>> wondering if we could maybe print some warnings when the number of batch > >>> files gets too high ... > > > > So I added a WARNING when batches memory are exhausting the memory size > > allowed. > > > >+ if (hashtable->fileCxt->mem_allocated > hashtable->spaceAllowed) > >+ elog(WARNING, "Growing number of hash batch is exhausting > > memory"); > > > > This is repeated on each call of ExecHashIncreaseNumBatches when BufFile > > overflows the memory budget. I realize now I should probably add the > > memory limit, the number of current batch and their memory consumption. > > The message is probably too cryptic for a user. It could probably be > > reworded, but some doc or additionnal hint around this message might help. > > > > Hmmm, not sure is WARNING is a good approach, but I don't have a better > idea at the moment. I stepped it down to NOTICE and added some more infos. Here is the output of the last patch with a 1MB work_mem: =# explain analyze select * from small join large using (id); WARNING: increasing number of batches from 1 to 2 WARNING: increasing number of batches from 2 to 4 WARNING: increasing number of batches from 4 to 8 WARNING: increasing number of batches from 8 to 16 WARNING: increasing numb
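For readers following along, the arrangement discussed here amounts to something like the sketch below: the dedicated context is created under the hash table's own hashCxt, and that context is passed into the tuple-saving routine so the temp BufFile is allocated there instead of in the per-query ExecutorState. Field and parameter names follow the patch under review and may still change; error handling of the writes is elided.

/* In ExecHashTableCreate(): */
hashtable->fileCxt = AllocSetContextCreate(hashtable->hashCxt,
                                           "HashBatchFiles",
                                           ALLOCSET_DEFAULT_SIZES);

/* Sketch of ExecHashJoinSaveTuple() with the context passed as a parameter: */
void
ExecHashJoinSaveTuple(MinimalTuple tuple, uint32 hashvalue,
                      BufFile **fileptr, MemoryContext filecxt)
{
    BufFile    *file = *fileptr;

    if (file == NULL)
    {
        /* First write to this batch file, so create it in filecxt. */
        MemoryContext oldcxt = MemoryContextSwitchTo(filecxt);

        file = BufFileCreateTemp(false);
        *fileptr = file;
        MemoryContextSwitchTo(oldcxt);
    }

    BufFileWrite(file, &hashvalue, sizeof(uint32));
    BufFileWrite(file, tuple, tuple->t_len);
}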
Re: TAP output format in pg_regress
> On 15 Mar 2023, at 11:36, Peter Eisentraut > wrote: > > On 24.02.23 10:49, Daniel Gustafsson wrote: >> Another rebase on top of 337903a16f. Unless there are conflicting reviews, I >> consider this patch to be done and ready for going in during the next CF. > > I think this is just about as good as it's going to get, so I think we can > consider committing this soon. > > A few comments along the way: > > 1) We can remove the gettext markers _() inside calls like note(), bail(), > etc. If we really wanted to do translation, we would do that inside those > functions (similar to errmsg() etc.). Fixed. The attached also removes all explicit \n from output and leaves the decision on when to add a linebreak to the TAP emitting function. I think this better match how we typically handle printing of output like this. It also ensures that all bail messages follow the same syntax. > 2) There are a few fprintf(stderr, ...) calls left. Should those be changed > to something TAP-enabled? Initially the patch kept errors happening before testing started a non-TAP output, there were leftovers which are now converted. > 3) Maybe these lines > > +++ isolation check in src/test/isolation +++ > > should be changed to TAP format? Arguably, if I run "make -s check", then > everything printed should be valid TAP, right? Fixed. > 4) From the introduction lines > > == creating temporary instance== > == initializing database system == > == starting postmaster== > running on port 61696 with PID 85346 > == creating database "regression" == > == running regression test queries== > > you have kept > > # running on port 61696 with PID 85346 > > which, well, is that the most interesting of those? > > The first three lines (creating, initializing, starting) take some noticeable > amount of time, so they could be kept as a progress indicator. Or just > delete all of them? I suppose some explicit indication about temp-instance > versus existing instance would be useful. This was discussed in 20220221164736.rq3ornzjdkmwk...@alap3.anarazel.de which concluded that this was about the only thing of interest, and even at that it was more of a maybe than a definite yes. As this patch stands, it prints the above for a temp install and the host/port for an existing install, but I don't have ny problems removing that as well. > 5) About the output format. Obviously, this will require some retraining of > the eye. But my first impression is that there is a lot of whitespace > without any guiding lines, so to speak, horizontally or vertically. It makes > me a bit dizzy. ;-) > > I think instead > > # parallel group (2 tests): event_trigger oidjoins > ok 210 event_trigger 131 ms > ok 211 oidjoins 190 ms > ok 212 fast_default 158 ms > ok 213 tablespace319 ms > > Something like this would be less dizzy-making: > > # parallel group (2 tests): event_trigger oidjoins > ok 210 - + event_trigger 131 ms > ok 211 - + oidjoins190 ms > ok 212 - fast_default 158 ms > ok 213 - tablespace319 ms The current format is chosen to be close to the old format, while also adding sufficient padding that it won't yield ragged columns. The wide padding is needed to cope with long names in the isolation and ecgp test suites and not so much regress suite. In the attached I've dialled back the padding a little bit to make it a bit more compact, but I doubt it's enough. > Just an idea, we don't have to get this exactly perfect right now, but it's > something to think about. 
I'm quite convinced that this will be revisited once this lands and is in front of developers. I think the attached is a good candidate for going in, but I would like to see it for another spin in the CF bot first. -- Daniel Gustafsson v18-0001-pg_regress-Emit-TAP-compliant-output.patch Description: Binary data
Re: Add standard collation UNICODE
On Tue, 2023-03-28 at 08:46 -0400, Joe Conway wrote:
> > As long as we still have to initialize the libc locale fields to some
> > language, I think it would be less confusing to keep the ICU locale on
> > the same language.
>
> I definitely agree with that.

Sounds good -- no changes then.

Regards, Jeff Davis
Re: Infinite Interval
On Sun, Mar 19, 2023 at 12:18 AM Joseph Koshakow wrote: > > The problem is that `secs = rint(secs)` rounds the seconds too early > and loses any fractional seconds. Do we have an overflow detecting > multiplication function for floats? We have float8_mul() which checks for overflow. typedef double float8; > > >+if (INTERVAL_NOT_FINITE(result)) > >+ereport(ERROR, > >+(errcode(ERRCODE_DATETIME_VALUE_OUT_OF_RANGE), > >+ errmsg("interval out of range"))); > > > >Probably, I added these kind of checks. But I don't remember if those are > >defensive checks or whether it's really possible that the arithmetic > > above > >these lines can yield an non-finite interval. > > These checks appear in `make_interval`, `justify_X`, > `interval_um_internal`, `interval_pl`, `interval_mi`, `interval_mul`, > `interval_div`. For all of these it's possible that the interval > overflows/underflows the non-finite ranges, but does not > overflow/underflow the data type. For example > `SELECT INTERVAL '2147483646 months' + INTERVAL '1 month'` would error > on this check. Without this patch postgres@64807=#SELECT INTERVAL '2147483646 months' + INTERVAL '1 month'; ?column? 178956970 years 7 mons (1 row) That result looks correct postgres@64807=#select 178956970 * 12 + 7; ?column? 2147483647 (1 row) So some backward compatibility break. I don't think we can avoid the backward compatibility break without expanding interval structure and thus causing on-disk breakage. But we can reduce the chances of breaking, if we change INTERVAL_NOT_FINITE to check all the three fields, instead of just month. > > > >+else > >+{ > >+result->time = -interval->time; > >+result->day = -interval->day; > >+result->month = -interval->month; > >+ > >+if (INTERVAL_NOT_FINITE(result)) > >+ereport(ERROR, > >+(errcode(ERRCODE_DATETIME_VALUE_OUT_OF_RANGE), > >+ errmsg("interval out of range"))); > > > >If this error ever gets to the user, it could be confusing. Can we > > elaborate by > >adding context e.g. errcontext("while negating an interval") or some > > such? > > Done. Thanks. Can we add relevant contexts at similar other places? Also if we use all the three fields, we will need to add such checks in interval_justify_hours() > > I replaced these checks with the following: > > + else if (interval->time == PG_INT64_MIN || interval->day == PG_INT32_MIN || > interval->month == PG_INT32_MIN) > + ereport(ERROR, > + (errcode(ERRCODE_DATETIME_VALUE_OUT_OF_RANGE), > + errmsg("interval out of range"))); > > I think this covers the same overflow check but is maybe a bit more > obvious. Unless, there's something I'm missing? Thanks. Your current version is closer to int4um(). Some more review comments in the following email. -- Best Wishes, Ashutosh Bapat
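To make that suggestion concrete, a minimal sketch of a three-field check could look like the following, assuming the usual Interval layout (int64 time, int32 day, int32 month) and the PG_INT32_MIN/MAX and PG_INT64_MIN/MAX limits from c.h. This is only an illustration, not the definition in the posted patch:

/*
 * Illustration only: classify an interval as infinite when all three
 * fields hold the sentinel values, instead of looking at month alone.
 * This narrows the set of ordinary values that could be mistaken for
 * infinity, per the suggestion above.
 */
#define INTERVAL_IS_NOBEGIN(i) \
	((i)->month == PG_INT32_MIN && \
	 (i)->day == PG_INT32_MIN && \
	 (i)->time == PG_INT64_MIN)

#define INTERVAL_IS_NOEND(i) \
	((i)->month == PG_INT32_MAX && \
	 (i)->day == PG_INT32_MAX && \
	 (i)->time == PG_INT64_MAX)

#define INTERVAL_NOT_FINITE(i) \
	(INTERVAL_IS_NOBEGIN(i) || INTERVAL_IS_NOEND(i))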
Re: SQL/JSON revisited
On 3/27/23 at 20:54, Alvaro Herrera wrote:
> Docs amended as I threatened. Other than that, this has required more
> [v12-0001-SQL-JSON-constructors.patch]
> [v12-0001-delta-uniqueifyJsonbObject-bugfix.patch]

In doc/src/sgml/func.sgml, some minor stuff:

'which specify the data type returned' should be 'which specifies the data type returned'

In the json_arrayagg() description, it says: 'If ABSENT ON NULL is specified, any NULL values are omitted.' That's true, but as omitting NULL values is the default (i.e., also without that clause) maybe it's better to say: 'Any NULL values are omitted unless NULL ON NULL is specified'

I've found no bugs in functionality.

Thanks, Erik Rijkers
Re: Infinite Interval
On Sun, Mar 26, 2023 at 1:28 AM Tom Lane wrote:
>
> I think you can take it as read that simple C test programs on modern
> platforms will exhibit IEEE-compliant handling of float infinities.
>

For the record, I tried the attached. It gives a warning at compilation time.

$ gcc float_inf.c
float_inf.c: In function ‘main’:
float_inf.c:10:17: warning: division by zero [-Wdiv-by-zero]
   10 |     float inf = 1.0/0;
      |                 ^
float_inf.c:11:20: warning: division by zero [-Wdiv-by-zero]
   11 |     float n_inf = -1.0/0;
      |                    ^

$ ./a.out
inf = inf
-inf = -inf
inf + inf = inf
inf + -inf = -nan
-inf + inf = -nan
-inf + -inf = -inf
inf - inf = -nan
inf - -inf = inf
-inf - inf = -inf
-inf - -inf = -nan
float 0.0 equals 0.0
float 1.0 equals 1.0
 5.0 * inf = inf
 5.0 * - inf = -inf
 5.0 / inf = 0.00
 5.0 / - inf = -0.00
 inf / 5.0 = inf
 - inf / 5.0 = -inf

The changes in the patch are compliant with the observations above.

--
Best Wishes,
Ashutosh Bapat

#include <stdio.h>

void test_flag(int flag, char *flag_name);

int main(void)
{
	float	inf = 1.0/0;
	float	n_inf = -1.0/0;

	printf("inf = %f\n", inf);
	printf("-inf = %f\n", n_inf);

	/* Additions */
	printf("inf + inf = %f\n", inf + inf);
	printf("inf + -inf = %f\n", inf + n_inf);
	printf("-inf + inf = %f\n", n_inf + inf);
	printf("-inf + -inf = %f\n", n_inf + n_inf);

	/* Subtractions */
	printf("inf - inf = %f\n", inf - inf);
	printf("inf - -inf = %f\n", inf - n_inf);
	printf("-inf - inf = %f\n", n_inf - inf);
	printf("-inf - -inf = %f\n", n_inf - n_inf);

	if (0.0 == 0.0)
		printf("float 0.0 equals 0.0\n");
	if (0.5 + 0.5 == .64 + .36)
		printf("float 1.0 equals 1.0\n");

	/* Multiplication */
	printf(" 5.0 * inf = %f\n", 5.0 * inf);
	printf(" 5.0 * - inf = %f\n", 5.0 * n_inf);

	/* Division */
	printf(" 5.0 / inf = %f\n", 5.0 / inf);
	printf(" 5.0 / - inf = %f\n", 5.0 / n_inf);
	printf(" inf / 5.0 = %f\n", inf / 5.0);
	printf(" - inf / 5.0 = %f\n", n_inf / 5.0);
}
Re: Infinite Interval
On Mon, Mar 20, 2023 at 3:16 AM Joseph Koshakow wrote: > > > > On Sun, Mar 19, 2023 at 5:13 PM Tom Lane wrote: > > > >Did you actually write "if TIMESTAMP_IS_NOBEGIN(dt2)" and not > >"if (TIMESTAMP_IS_NOBEGIN(dt2))"? If the former, I'm not surprised > >that pgindent gets confused. The parentheses are required by the > >C standard. Your code might accidentally work because the macro > >has parentheses internally, but call sites have no business > >knowing that. For example, it would be completely legit to change > >TIMESTAMP_IS_NOBEGIN to be a plain function, and then this would be > >syntactically incorrect. > > Oh duh. I've been doing too much Rust development and did this without > thinking. I've attached a patch with a fix. > Thanks for fixing this. On this latest patch, I have one code comment @@ -3047,7 +3180,30 @@ timestamptz_pl_interval_internal(TimestampTz timestamp, TimestampTz result; int tz; - if (TIMESTAMP_NOT_FINITE(timestamp)) + /* +* Adding two infinites with the same sign results in an infinite +* timestamp with the same sign. Adding two infintes with different signs +* results in an error. +*/ + if (INTERVAL_IS_NOBEGIN(span)) + { + if TIMESTAMP_IS_NOEND(timestamp) + ereport(ERROR, + (errcode(ERRCODE_DATETIME_VALUE_OUT_OF_RANGE), +errmsg("interval out of range"))); + else + TIMESTAMP_NOBEGIN(result); + } + else if (INTERVAL_IS_NOEND(span)) + { + if TIMESTAMP_IS_NOBEGIN(timestamp) + ereport(ERROR, + (errcode(ERRCODE_DATETIME_VALUE_OUT_OF_RANGE), +errmsg("interval out of range"))); + else + TIMESTAMP_NOEND(result); + } + else if (TIMESTAMP_NOT_FINITE(timestamp)) This code is duplicated in timestamp_pl_interval(). We could create a function to encode the infinity handling rules and then call it in these two places. The argument types are different, Timestamp and TimestampTz viz. which map to in64, so shouldn't be a problem. But it will be slightly unreadable. Or use macros but then it will be difficult to debug. What do you think? Next I will review the test changes and also make sure that every operator that interval as one of its operands or the result has been covered in the code. This is the following list #select oprname, oprcode from pg_operator where oprleft = 'interval'::regtype or oprright = 'interval'::regtype or oprresult = 'interval'::regtype; oprname | oprcode -+- + | date_pl_interval - | date_mi_interval + | timestamptz_pl_interval - | timestamptz_mi - | timestamptz_mi_interval = | interval_eq <> | interval_ne < | interval_lt <= | interval_le > | interval_gt >= | interval_ge - | interval_um + | interval_pl - | interval_mi - | time_mi_time * | interval_mul * | mul_d_interval / | interval_div + | time_pl_interval - | time_mi_interval + | timetz_pl_interval - | timetz_mi_interval + | interval_pl_time + | timestamp_pl_interval - | timestamp_mi - | timestamp_mi_interval + | interval_pl_date + | interval_pl_timetz + | interval_pl_timestamp + | interval_pl_timestamptz (30 rows) -- Best Wishes, Ashutosh Bapat
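To illustrate the proposed refactoring, a rough sketch of such a shared helper might look like the one below. Both Timestamp and TimestampTz map to int64, and the macro and error names are taken from the hunk quoted above; the function name is made up for illustration and this is not part of the posted patch:

/*
 * Sketch only: encode the infinity-handling rules for "timestamp +
 * interval" once, so that timestamp_pl_interval() and
 * timestamptz_pl_interval() can share them.  Returns true if the
 * addition was fully handled here (result set to an infinity), false
 * if the caller should proceed with finite arithmetic.
 */
static bool
infinite_interval_pl_timestamp(int64 timestamp, const Interval *span,
							   int64 *result)
{
	if (INTERVAL_IS_NOBEGIN(span))
	{
		if (TIMESTAMP_IS_NOEND(timestamp))
			ereport(ERROR,
					(errcode(ERRCODE_DATETIME_VALUE_OUT_OF_RANGE),
					 errmsg("interval out of range")));
		TIMESTAMP_NOBEGIN(*result);
		return true;
	}
	else if (INTERVAL_IS_NOEND(span))
	{
		if (TIMESTAMP_IS_NOBEGIN(timestamp))
			ereport(ERROR,
					(errcode(ERRCODE_DATETIME_VALUE_OUT_OF_RANGE),
					 errmsg("interval out of range")));
		TIMESTAMP_NOEND(*result);
		return true;
	}
	return false;
}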
Re: TAP output format in pg_regress
> On 28 Mar 2023, at 15:26, Daniel Gustafsson wrote: > I think the attached is a good candidate for going in, but I would like to > see it > for another spin in the CF bot first. Another candidate due to a thinko which raised a compiler warning. -- Daniel Gustafsson v19-0001-pg_regress-Emit-TAP-compliant-output.patch Description: Binary data
Re: Move definition of standard collations from initdb to pg_collation.dat
On 28.03.23 13:33, Tom Lane wrote:
> While we're here, do we want to adopt some other spelling of "the root
> locale" than "und", in view of recent discoveries about the instability
> of that on old ICU versions?

That issue was fixed by 3b50275b12, so we can keep using the "und" spelling.
Re: Request for comment on setting binary format output per session
On Sun, 26 Mar 2023 at 21:30, Tom Lane wrote:
> Dave Cramer writes:
> > On Sun, 26 Mar 2023 at 18:12, Tom Lane wrote:
> >> I would not expect DISCARD ALL to reset a session-level property.
>
> > Well if we can't reset it with DISCARD ALL how would that work with
> > pgbouncer, or any pool for that matter since it doesn't know which client
> > asked for which (if any) OID's to be binary.
>
> Well, it'd need to know that, just like it already needs to know
> which clients asked for which database or which login role.

OK, IIUC what you are proposing here is that there would be a separate pool for each combination of database, user, and OID list. That doesn't seem very flexible. For instance, if I create a UDT and then want it returned as binary, I have to reconfigure the pool to accept a new list of OIDs. Am I misunderstanding how this would potentially work?

Dave
Re: Support logical replication of global object commands
> I think that there are some (possibly) tricky challenges that haven't > been discussed yet to support replicating global objects. > > First, as for publications having global objects (roles, databases, > and tablespaces), but storing them in database specific tables like > pg_publication doesn't make sense, because it should be at some shared > place where all databases can have access to it. Maybe we need to have > a shared catalog like pg_shpublication or pg_publication_role to store > publications related to global objects or the relationship between > such publications and global objects. Second, we might need to change > the logical decoding infrastructure so that it's aware of shared > catalog changes. Thanks for the feedback. This is insightful. > Currently we need to scan only db-specific catalogs. > Finally, since we process CREATE DATABASE in a different way than > other DDLs (by cloning another database such as template1), simply > replicating the CREATE DATABASE statement would not produce the same > results as the publisher. Also, since event triggers are not fired on > DDLs for global objects, always WAL-logging such DDL statements like > the proposed patch does is not a good idea. > Given that there seems to be some tricky problems and there is a > discussion for cutting the scope to make the initial patch small[1], I > think it's better to do this work after the first version. Agreed. Regards, Zane
Re: Kerberos delegation support in libpq and postgres_fdw
Greetings, * Stephen Frost (sfr...@snowman.net) wrote: > * Greg Stark (st...@mit.edu) wrote: > > The CFBot says there's a function be_gssapi_get_proxy() which is > > undefined. Presumably this is a missing #ifdef or a definition that > > should be outside an #ifdef. > > Yup, just a couple of missing #ifdef's. > > Updated and rebased patch attached. ... and a few more. Apparently hacking on a plane without enough sleep leads to changing ... and unchanging configure flags before testing. Thanks, Stephen From 9808fd4eb4920e40468709af7325a27b18066cec Mon Sep 17 00:00:00 2001 From: Stephen Frost Date: Mon, 28 Feb 2022 20:17:55 -0500 Subject: [PATCH] Add support for Kerberos credential delegation Support GSSAPI/Kerberos credentials being delegated to the server by a client. With this, a user authenticating to PostgreSQL using Kerberos (GSSAPI) credentials can choose to delegate their credentials to the PostgreSQL server (which can choose to accept them, or not), allowing the server to then use those delegated credentials to connect to another service, such as with postgres_fdw or dblink or theoretically any other service which is able to be authenticated using Kerberos. Both postgres_fdw and dblink are changed to allow non-superuser password-less connections but only when GSSAPI credentials have been proxied to the server by the client and GSSAPI is used to authenticate to the remote system. Authors: Stephen Frost, Peifeng Qiu Discussion: https://postgr.es/m/co1pr05mb8023cc2cb575e0faad7df4f8a8...@co1pr05mb8023.namprd05.prod.outlook.com --- contrib/dblink/dblink.c | 127 --- contrib/dblink/expected/dblink.out| 4 +- contrib/postgres_fdw/connection.c | 72 +++- .../postgres_fdw/expected/postgres_fdw.out| 19 +- contrib/postgres_fdw/option.c | 3 + contrib/postgres_fdw/sql/postgres_fdw.sql | 3 +- doc/src/sgml/config.sgml | 17 + doc/src/sgml/dblink.sgml | 5 +- doc/src/sgml/libpq.sgml | 41 +++ doc/src/sgml/monitoring.sgml | 9 + doc/src/sgml/postgres-fdw.sgml| 5 +- src/backend/catalog/system_views.sql | 3 +- src/backend/foreign/foreign.c | 1 + src/backend/libpq/auth.c | 13 +- src/backend/libpq/be-gssapi-common.c | 51 +++ src/backend/libpq/be-secure-gssapi.c | 26 +- src/backend/utils/activity/backend_status.c | 1 + src/backend/utils/adt/pgstatfuncs.c | 20 +- src/backend/utils/init/postinit.c | 8 +- src/backend/utils/misc/guc_tables.c | 10 + src/backend/utils/misc/postgresql.conf.sample | 1 + src/include/catalog/pg_proc.dat | 6 +- src/include/libpq/auth.h | 1 + src/include/libpq/be-gssapi-common.h | 3 + src/include/libpq/libpq-be.h | 2 + src/include/utils/backend_status.h| 1 + src/interfaces/libpq/exports.txt | 1 + src/interfaces/libpq/fe-auth.c| 15 +- src/interfaces/libpq/fe-connect.c | 17 + src/interfaces/libpq/fe-secure-gssapi.c | 23 +- src/interfaces/libpq/libpq-fe.h | 1 + src/interfaces/libpq/libpq-int.h | 2 + src/test/kerberos/Makefile| 3 + src/test/kerberos/t/001_auth.pl | 335 -- src/test/perl/PostgreSQL/Test/Utils.pm| 27 ++ src/test/regress/expected/rules.out | 11 +- 36 files changed, 752 insertions(+), 135 deletions(-) diff --git a/contrib/dblink/dblink.c b/contrib/dblink/dblink.c index 78a8bcee6e..d4dd338055 100644 --- a/contrib/dblink/dblink.c +++ b/contrib/dblink/dblink.c @@ -48,6 +48,7 @@ #include "funcapi.h" #include "lib/stringinfo.h" #include "libpq-fe.h" +#include "libpq/libpq-be.h" #include "libpq/libpq-be-fe-helpers.h" #include "mb/pg_wchar.h" #include "miscadmin.h" @@ -111,7 +112,8 @@ static HeapTuple get_tuple_of_interest(Relation rel, int *pkattnums, int pknumat static Relation 
get_rel_from_relname(text *relname_text, LOCKMODE lockmode, AclMode aclmode); static char *generate_relation_name(Relation rel); static void dblink_connstr_check(const char *connstr); -static void dblink_security_check(PGconn *conn, remoteConn *rconn); +static bool dblink_connstr_has_pw(const char *connstr); +static void dblink_security_check(PGconn *conn, remoteConn *rconn, const char *connstr); static void dblink_res_error(PGconn *conn, const char *conname, PGresult *res, bool fail, const char *fmt,...) pg_attribute_printf(5, 6); static char *get_connect_string(const char *servername); @@ -213,7 +215,7 @@ dblink_get_conn(char *conname_or_str, errmsg("could not establish connection"), errdetail_internal("%s", msg))); } - dblink_security_check(conn, rconn); + dblink_security_check(conn, rconn, connstr); if (PQcl
Re: Missing update of all_hasnulls in BRIN opclasses
Hi, It took me a while but I finally got back to reworking this to use the bt_info bit, as proposed by Alvaro. And it turned out to work great, because (a) it's a tuple-level flag, i.e. the right place, and (b) it does not overload existing flags. This greatly simplified the code in add_values_to_range and (especially) union_tuples, making it much easier to understand, I think. One disadvantage is we are unable to see which ranges are empty in current pageinspect, but 0002 addresses that by adding "empty" column to the brin_page_items() output. That's a matter for master only, though. It's a trivial patch and it makes it easier/possible to test this, so we should consider to squeeze it into PG16. I did quite a bit of testing - the attached 0003 adds extra tests, but I don't propose to get this committed as is - it's rather overkill. Maybe some reduced version of it ... The hardest thing to test is the union_tuples() part, as it requires concurrent operations with "correct" timing. Easy to simulate by breakpoints in GDB, not so much in plain regression/TAP tests. There's also a stress tests, doing a lot of randomized summarizations, etc. Without the fix this failed in maybe 30% of runs, now I did ~100 runs without a single failure. I haven't done any backporting, but I think it should be simpler than with the earlier approach. I wonder if we need to care about starting to use the previously unused bit - I don't think so, in the worst case we'll just ignore it, but maybe I'm missing something (e.g. when using physical replication). regards -- Tomas Vondra EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL CompanyFrom 10efaf9964806e5a30818994e3cfda879bb90171 Mon Sep 17 00:00:00 2001 From: Tomas Vondra Date: Sun, 8 Jan 2023 16:43:06 +0100 Subject: [PATCH 1/3] Fix handling of NULLs in BRIN indexes BRIN indexes did not properly distinguish between summaries for empty (no rows) and all-NULL ranges. All summaries were initialized with allnulls=true, and the opclasses simply reset allnulls to false when processing the first non-NULL value. This however fails if the range starts with a NULL value (or a sequence of NULL values), in which case we forget the range contains NULL values. This happens because the allnulls flag is used for two separate purposes - to mark empty ranges (not representing any rows yet) and ranges containing only NULL values. Opclasses don't know which of these cases it is, and so don't know whether to set hasnulls=true. Setting hasnulls=true in both cases would make it correct, but it would also make BRIN indexes useless for queries with IS NULL clauses - all ranges start empty (and thus allnulls=true), so all ranges would end up with either allnulls=true or hasnulls=true. The severity of the issue is somewhat reduced by the fact that it only happens when adding values to an existing summary with allnulls=true, not when the summarization is processing values in bulk (e.g. during CREATE INDEX or automatic summarization). In this case the flags were updated in a slightly different way, not forgetting the NULL values. This introduces a new a new flag marking index tuples representing ranges with no rows. Luckily we have an unused tuple in the BRIN tuple header that we can use for this. We still store index tuples for empty ranges, because otherwise we'd not be able to say whether a range is empty or not yet summarized, and we'd have to process them for any query. Backpatch to 11. 
The issue exists since BRIN indexes were introduced in 9.5, but older releases are already EOL. Backpatch-through: 11 Reviewed-by: Alvaro Herrera, Justin Pryzby, Matthias van de Meent Discussion: https://postgr.es/m/402430e4-7d9d-6cf1-09ef-464d80aff...@enterprisedb.com --- src/backend/access/brin/brin.c| 115 +- src/backend/access/brin/brin_tuple.c | 15 ++- src/include/access/brin_tuple.h | 6 +- ...summarization-and-inprogress-insertion.out | 8 +- ...ummarization-and-inprogress-insertion.spec | 1 + 5 files changed, 137 insertions(+), 8 deletions(-) diff --git a/src/backend/access/brin/brin.c b/src/backend/access/brin/brin.c index 53e4721a54e..162a0c052aa 100644 --- a/src/backend/access/brin/brin.c +++ b/src/backend/access/brin/brin.c @@ -592,6 +592,17 @@ bringetbitmap(IndexScanDesc scan, TIDBitmap *tbm) bval = &dtup->bt_columns[attno - 1]; + /* + * If the BRIN tuple indicates that this range is empty, + * we can skip it: there's nothing to match. We don't + * need to examine the next columns. + */ + if (dtup->bt_empty_range) + { + addrange = false; + break; + } + /* * First check if there are any IS [NOT] NULL scan keys, * and if we're violating them. In that case we can @@ -1590,6 +1601,8 @@ form_and_insert_tuple(BrinBuildState *state) /* * Given two deformed tuples, adjust the first one so that it's consistent * with the summary values
Re: "variable not found in subplan target list"
I wrote: > The planner is reducing the scan of target_parted to > a dummy scan, as is reasonable, but it forgets to > provide ctid as an output from that scan; then the > parent join node is unhappy because it does have > a ctid output. So it looks like the problem is some > shortcut we take while creating the dummy scan. Oh, actually the problem is in distribute_row_identity_vars, which is supposed to handle this case, but it thinks it doesn't have to back-fill the rel's reltarget. Wrong. Now that I see the problem, I wonder if we can't reproduce a similar symptom without MERGE, which would mean that v14 has the issue too. The attached seems to fix it, but I'm going to look for a non-MERGE test case before pushing. regards, tom lane diff --git a/src/backend/optimizer/util/appendinfo.c b/src/backend/optimizer/util/appendinfo.c index 9d377385f1..c1b1557570 100644 --- a/src/backend/optimizer/util/appendinfo.c +++ b/src/backend/optimizer/util/appendinfo.c @@ -21,6 +21,7 @@ #include "nodes/nodeFuncs.h" #include "optimizer/appendinfo.h" #include "optimizer/pathnode.h" +#include "optimizer/planmain.h" #include "parser/parsetree.h" #include "utils/lsyscache.h" #include "utils/rel.h" @@ -994,9 +995,10 @@ distribute_row_identity_vars(PlannerInfo *root) * certainly process no rows. Handle this edge case by re-opening the top * result relation and adding the row identity columns it would have used, * as preprocess_targetlist() would have done if it weren't marked "inh". - * (This is a bit ugly, but it seems better to confine the ugliness and - * extra cycles to this unusual corner case.) We needn't worry about - * fixing the rel's reltarget, as that won't affect the finished plan. + * Then re-run build_base_rel_tlists() to ensure that the added columns + * get propagated to the relation's reltarget. (This is a bit ugly, but + * it seems better to confine the ugliness and extra cycles to this + * unusual corner case.) */ if (root->row_identity_vars == NIL) { @@ -1006,6 +1008,8 @@ distribute_row_identity_vars(PlannerInfo *root) add_row_identity_columns(root, result_relation, target_rte, target_relation); table_close(target_relation, NoLock); + build_base_rel_tlists(root, root->processed_tlist); + /* There are no ROWID_VAR Vars in this case, so we're done. */ return; }
Re: Support logical replication of DDLs
On 3/27/23 2:37 AM, Amit Kapila wrote:
> On Mon, Mar 27, 2023 at 2:52 AM Tom Lane wrote:
> > And TBH, I don't think that I quite believe the premise in the first
> > place. The whole point of using logical rather than physical replication
> > is that the subscriber installation(s) aren't exactly like the publisher.
> > Given that, how can we expect that automated DDL replication is going to
> > do the right thing often enough to be a useful tool rather than a
> > disastrous foot-gun?
>
> One of the major use cases as mentioned in the initial email was for online
> version upgrades. And also, people would be happy to automatically sync the
> schema for cases where the logical replication is set up to get a subset of
> the data via features like row filters. Having said that, I agree with you
> that it is very important to define the scope of this feature if we want to
> see it becoming reality.

To echo Amit, this is actually one area where PostgreSQL replication lags behind (no pun intended) other mature RDBMSes. As Amit says, the principal use case is around major version upgrades, but also migration between systems or moving data/schemas between systems that speak the PostgreSQL protocol. All of these are becoming increasingly common as PostgreSQL is taking on more workloads that are sensitive to downtime or are distributed in nature.

There are definitely footguns with logical replication of DDL -- I've seen this from reading other manuals that support this feature and in my own experiments. However, like many features, users have strategies they use to avoid footgun scenarios. For example, in systems that use logical replication as part of their HA, users will either:

* Not replicate DDL, but use some sort of rolling orchestration process to place it on each instance
* Replicate DDL, but coordinate it with some kind of global lock
* Replicate only a subset of DDL, possibly with lock coordination

I'll comment on the patch scope further downthread. I agree it's very big -- I had given some of that feedback privately a few months back -- and it could benefit from the "step back, holistic review." For example, I was surprised that a fairly common pattern[1] did not work due to changes we made when addressing a CVE (some follow-up work was proposed but we haven't done it yet).

I do agree this patch would benefit from stepping back, and I do think we can work through many of the issues. From listening to users and prospective users, it's pretty clear we need to support DDL replication in some capacity.

Thanks,
Jonathan

[1] https://www.postgresql.org/message-id/263bea1c-a897-417d-3765-ba6e1e24711e%40postgresql.org

OpenPGP_signature Description: OpenPGP digital signature
Re: [EXTERNAL] Re: Add non-blocking version of PQcancel
Hi Jelte, I had a look into your patchset (v16), did a quick review and played a bit with the feature. Patch 2 is missing the documentation about PQcancelSocket() and contains a few typos; please find attached a (fixup) patch to correct these. --- a/src/interfaces/libpq/libpq-fe.h +++ b/src/interfaces/libpq/libpq-fe.h @@ -321,16 +328,28 @@ extern PostgresPollingStatusType PQresetPoll(PGconn *conn); /* Synchronous (blocking) */ extern void PQreset(PGconn *conn); +/* issue a cancel request */ +extern PGcancelConn * PQcancelSend(PGconn *conn); [...] Maybe I'm missing something, but this function above seems a bit strange. Namely, I wonder why it returns a PGcancelConn and what's the point of requiring the user to call PQcancelStatus() to see if something got wrong. Maybe it could be defined as: int PQcancelSend(PGcancelConn *cancelConn); where the return value would be status? And the user would only need to call PQcancelErrorMessage() in case of error. This would leave only one single way to create a PGcancelConn value (i.e. PQcancelConn()), which seems less confusing to me. Jelte Fennema wrote: > Especially since I ran into another use case that I would want to use > this patch for recently: Adding an async cancel function to Python > it's psycopg3 library. This library exposes both a Connection class > and an AsyncConnection class (using python its asyncio feature). But > one downside of the AsyncConnection type is that it doesn't have a > cancel method. As part of my testing, I've implemented non-blocking cancellation in Psycopg, based on v16 on this patchset. Overall this worked fine and seems useful; if you want to try it: https://github.com/dlax/psycopg3/tree/pg16/non-blocking-pqcancel (The only thing I found slightly inconvenient is the need to convey the connection encoding (from PGconn) when handling error message from the PGcancelConn.) Cheers, Denis >From a5f9cc680ffa520b05fe34b7cac5df2e60a6d4ad Mon Sep 17 00:00:00 2001 From: Denis Laxalde Date: Tue, 28 Mar 2023 16:06:42 +0200 Subject: [PATCH] fixup! Add non-blocking version of PQcancel --- doc/src/sgml/libpq.sgml | 16 +++- src/interfaces/libpq/fe-connect.c| 2 +- src/test/modules/libpq_pipeline/libpq_pipeline.c | 2 +- 3 files changed, 17 insertions(+), 3 deletions(-) diff --git a/doc/src/sgml/libpq.sgml b/doc/src/sgml/libpq.sgml index 29f08a4317..aa404c4d15 100644 --- a/doc/src/sgml/libpq.sgml +++ b/doc/src/sgml/libpq.sgml @@ -5795,7 +5795,7 @@ PGcancelConn *PQcancelSend(PGconn *conn); options as the the original PGconn. So when the original connection is encrypted (using TLS or GSS), the connection for the cancel request is encrypted in the same way. Any connection options - that only are only used during authentication or after authentication of + that are only used during authentication or after authentication of the client are ignored though, because cancellation requests do not require authentication and the connection is closed right after the cancellation request is submitted. @@ -5912,6 +5912,20 @@ ConnStatusType PQcancelStatus(const PGcancelConn *conn); + + PQcancelSocketPQcancelSocket + + + + A version of that can be used for + cancellation connections. 
+ +int PQcancelSocket(PGcancelConn *conn); + + + + + PQcancelPollPQcancelPoll diff --git a/src/interfaces/libpq/fe-connect.c b/src/interfaces/libpq/fe-connect.c index 74e337fddf..16af7303d4 100644 --- a/src/interfaces/libpq/fe-connect.c +++ b/src/interfaces/libpq/fe-connect.c @@ -808,7 +808,7 @@ PQcancelConn(PGconn *conn) /* * Cancel requests should not iterate over all possible hosts. The request * needs to be sent to the exact host and address that the original - * connection used. So we we manually create the host and address arrays + * connection used. So we manually create the host and address arrays * with a single element after freeing the host array that we generated * from the connection options. */ diff --git a/src/test/modules/libpq_pipeline/libpq_pipeline.c b/src/test/modules/libpq_pipeline/libpq_pipeline.c index e8e904892c..6764ab513b 100644 --- a/src/test/modules/libpq_pipeline/libpq_pipeline.c +++ b/src/test/modules/libpq_pipeline/libpq_pipeline.c @@ -89,7 +89,7 @@ pg_fatal_impl(int line, const char *fmt,...) /* * Check that the query on the given connection got cancelled. * - * This is a function wrapped in a macrco to make the reported line number + * This is a function wrapped in a macro to make the reported line number * in an error match the line number of the invocation. */ #define confirm_query_cancelled(conn) confirm_query_cancelled_impl(__LINE__, conn) -- 2.30.2
Re: Memory leak from ExecutorState context?
Hi, Sorry for the late answer, I was reviewing the first patch and it took me some time to study and dig around. On Thu, 23 Mar 2023 08:07:04 -0400 Melanie Plageman wrote: > On Fri, Mar 10, 2023 at 1:51 PM Jehan-Guillaume de Rorthais > wrote: > > > So I guess the best thing would be to go through these threads, see what > > > the status is, restart the discussion and propose what to do. If you do > > > that, I'm happy to rebase the patches, and maybe see if I could improve > > > them in some way. > > > > OK! It took me some time, but I did it. I'll try to sum up the situation as > > simply as possible. > > Wow, so many memories! > > I'm excited that someone looked at this old work (though it is sad that > a customer faced this issue). And, Jehan, I really appreciate your great > summarization of all these threads. This will be a useful reference. Thank you! > > 1. "move BufFile stuff into separate context" > > [...] > >I suppose this simple one has been forgotten in the fog of all other > >discussions. Also, this probably worth to be backpatched. > > I agree with Jehan-Guillaume and Tomas that this seems fine to commit > alone. This is a WIP. > > 2. "account for size of BatchFile structure in hashJoin" > > [...] > > I think I would have to see a modern version of a patch which does this > to assess if it makes sense. But, I probably still agree with 2019 > Melanie :) I volunteer to work on this after the memory context patch, unless someone grab it in the meantime. > [...] > On Mon, Mar 20, 2023 at 10:12 AM Jehan-Guillaume de Rorthais > wrote: > > BNJL and/or other considerations are for 17 or even after. In the meantime, > > Melanie, who authored BNLJ, +1 the balancing patch as it can coexists with > > other discussed solutions. No one down vote since then. Melanie, what is > > your opinion today on this patch? Did you change your mind as you worked > > for many months on BNLJ since then? > > So, in order to avoid deadlock, my design of adaptive hash join/block > nested loop hash join required a new parallelism concept not yet in > Postgres at the time -- the idea of a lone worker remaining around to do > work when others have left. > > See: BarrierArriveAndDetachExceptLast() > introduced in 7888b09994 > > Thomas Munro had suggested we needed to battle test this concept in a > more straightforward feature first, so I implemented parallel full outer > hash join and parallel right outer hash join with it. > > https://commitfest.postgresql.org/42/2903/ > > This has been stalled ready-for-committer for two years. It happened to > change timing such that it made an existing rarely hit parallel hash > join bug more likely to be hit. Thomas recently committed our fix for > this in 8d578b9b2e37a4d (last week). It is my great hope that parallel > full outer hash join goes in before the 16 feature freeze. This is really interesting to follow. I kinda feel/remember how this could be useful for your BNLJ patch. It's good to see things are moving, step by step. Thanks for the pointers. > If it does, I think it could make sense to try and find committable > smaller pieces of the adaptive hash join work. As it is today, parallel > hash join does not respect work_mem, and, in some sense, is a bit broken. > > I would be happy to work on this feature again, or, if you were > interested in picking it up, to provide review and any help I can if for > you to work on it. I don't think I would be able to pick up such a large and complex patch. But I'm interested to help, test and review, as far as I can! 
Regards,
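As a concrete reference for item 1 above (the dedicated BufFile memory context), the shape under discussion is roughly the sketch below — not the submitted patch itself, and using the hashCxt/fileCxt naming that appears elsewhere in this thread:

/*
 * Sketch of the "separate context for batch files" idea: give the batch
 * BufFiles their own child context so their buffers no longer accumulate
 * in the long-lived ExecutorState context, and switch to it whenever a
 * batch file is created.
 */
hashtable->fileCxt = AllocSetContextCreate(hashtable->hashCxt,
										   "HashBatchFiles",
										   ALLOCSET_DEFAULT_SIZES);

/* ... later, each time a new batch temp file is needed ... */
oldcxt = MemoryContextSwitchTo(hashtable->fileCxt);
file = BufFileCreateTemp(false);
MemoryContextSwitchTo(oldcxt);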
Re: Initial Schema Sync for Logical Replication
On Tue, Mar 28, 2023 at 6:47 PM Amit Kapila wrote: > > On Mon, Mar 27, 2023 at 8:17 AM Masahiko Sawada wrote: > > > > On Fri, Mar 24, 2023 at 11:51 PM Kumar, Sachin wrote: > > > > > > > From: Amit Kapila > > > > > I think we won't be able to use same snapshot because the transaction > > > > > will > > > > > be committed. > > > > > In CreateSubscription() we can use the transaction snapshot from > > > > > walrcv_create_slot() till walrcv_disconnect() is called.(I am not sure > > > > > about this part maybe walrcv_disconnect() calls the commits > > > > > internally ?). > > > > > So somehow we need to keep this snapshot alive, even after transaction > > > > > is committed(or delay committing the transaction , but we can have > > > > > CREATE SUBSCRIPTION with ENABLED=FALSE, so we can have a restart > > > > > before tableSync is able to use the same snapshot.) > > > > > > > > > > > > > Can we think of getting the table data as well along with schema via > > > > pg_dump? Won't then both schema and initial data will correspond to the > > > > same snapshot? > > > > > > Right , that will work, Thanks! > > > > While it works, we cannot get the initial data in parallel, no? > > > > Another possibility is that we dump/restore the schema of each table > along with its data. One thing we can explore is whether the parallel > option of dump can be useful here. Do you have any other ideas? A downside of the idea of dumping both table schema and table data would be that we need to temporarily store data twice the size of the table (the dump file and the table itself) during the load. One might think that we can redirect the pg_dump output into the backend so that it can load it via SPI, but it doesn't work since "COPY tbl FROM stdin;" doesn't work via SPI. The --inserts option of pg_dump could help it out but it makes restoration very slow. > > One related idea is that currently, we fetch the table list > corresponding to publications in subscription and create the entries > for those in pg_subscription_rel during Create Subscription, can we > think of postponing that work till after the initial schema sync? We > seem to be already storing publications list in pg_subscription, so it > appears possible if we somehow remember the value of copy_data. If > this is feasible then I think that may give us the flexibility to > perform the initial sync at a later point by the background worker. It sounds possible. With this idea, we will be able to have the apply worker restore the table schemas (and create pg_subscription_rel entries) as the first thing. Another point we might need to consider is that the initial schema sync (i.e. creating tables) and creating pg_subscription_rel entries need to be done in the same transaction. Otherwise, we could end up committing either one change. I think it depends on how we restore the schema data. > > > > > > > > > I think we can have same issues as you mentioned New table t1 is added > > > > > to the publication , User does a refresh publication. > > > > > pg_dump / pg_restore restores the table definition. But before > > > > > tableSync can start, steps from 2 to 5 happen on the publisher. > > > > > > 1. Create Table t1(c1, c2); --LSN: 90 2. Insert t1 (1, 1); --LSN 100 > > > > > > 3. Insert t1 (2, 2); --LSN 110 4. Alter t1 Add Column c3; --LSN 120 > > > > > > 5. Insert t1 (3, 3, 3); --LSN 130 > > > > > And table sync errors out > > > > > There can be one more issue , since we took the pg_dump without > > > > snapshot (wrt to replication slot). 
> > > > > > > > > > > > > To avoid both the problems mentioned for Refresh Publication, we can do > > > > one of the following: (a) create a new slot along with a snapshot for > > > > this > > > > operation and drop it afterward; or (b) using the existing slot, > > > > establish a > > > > new snapshot using a technique proposed in email [1]. > > > > > > > > > > Thanks, I think option (b) will be perfect, since we don’t have to create > > > a new slot. > > > > Regarding (b), does it mean that apply worker stops streaming, > > requests to create a snapshot, and then resumes the streaming? > > > > Shouldn't this be done by the backend performing a REFRESH publication? Hmm, I might be missing something but the idea (b) uses the existing slot to establish a new snapshot, right? What existing replication slot do we use for that? I thought it was the one used by the apply worker. Regards, -- Masahiko Sawada Amazon Web Services: https://aws.amazon.com
Re: Memory leak from ExecutorState context?
On 3/28/23 15:17, Jehan-Guillaume de Rorthais wrote: > On Tue, 28 Mar 2023 00:43:34 +0200 > Tomas Vondra wrote: > >> On 3/27/23 23:13, Jehan-Guillaume de Rorthais wrote: >>> Please, find in attachment a patch to allocate bufFiles in a dedicated >>> context. I picked up your patch, backpatch'd it, went through it and did >>> some minor changes to it. I have some comment/questions thought. >>> >>> 1. I'm not sure why we must allocate the "HashBatchFiles" new context >>> under ExecutorState and not under hashtable->hashCxt? >>> >>> The only references I could find was in hashjoin.h:30: >>> >>>/* [...] >>> * [...] (Exception: data associated with the temp files lives in the >>> * per-query context too, since we always call buffile.c in that >>> context.) >>> >>> And in nodeHashjoin.c:1243:ExecHashJoinSaveTuple() (I reworded this >>> original comment in the patch): >>> >>>/* [...] >>> * Note: it is important always to call this in the regular executor >>> * context, not in a shorter-lived context; else the temp file buffers >>> * will get messed up. >>> >>> >>> But these are not explanation of why BufFile related allocations must be >>> under a per-query context. >>> >> >> Doesn't that simply describe the current (unpatched) behavior where >> BufFile is allocated in the per-query context? > > I wasn't sure. The first quote from hashjoin.h seems to describe a stronger > rule about «**always** call buffile.c in per-query context». But maybe it > ought > to be «always call buffile.c from one of the sub-query context»? I assume the > aim is to enforce the tmp files removal on query end/error? > I don't think we need this info for tempfile cleanup - CleanupTempFiles relies on the VfdCache, which does malloc/realloc (so it's not using memory contexts at all). I'm not very familiar with this part of the code, so I might be missing something. But you can try that - just stick an elog(ERROR) somewhere into the hashjoin code, and see if that breaks the cleanup. Not an explicit proof, but if there was some hard-wired requirement in which memory context to allocate BufFile stuff, I'd expect it to be documented in buffile.c. But that actually says this: * Note that BufFile structs are allocated with palloc(), and therefore * will go away automatically at query/transaction end. Since the underlying * virtual Files are made with OpenTemporaryFile, all resources for * the file are certain to be cleaned up even if processing is aborted * by ereport(ERROR). The data structures required are made in the * palloc context that was current when the BufFile was created, and * any external resources such as temp files are owned by the ResourceOwner * that was current at that time. which I take as confirmation that it's legal to allocate BufFile in any memory context, and that cleanup is handled by the cache in fd.c. >> I mean, the current code calls BufFileCreateTemp() without switching the >> context, so it's in the ExecutorState. But with the patch it very clearly is >> not. >> >> And I'm pretty sure the patch should do >> >> hashtable->fileCxt = AllocSetContextCreate(hashtable->hashCxt, >> "HashBatchFiles", >> ALLOCSET_DEFAULT_SIZES); >> >> and it'd still work. Or why do you think we *must* allocate it under >> ExecutorState? > > That was actually my very first patch and it indeed worked. But I was confused > about the previous quoted code comments. That's why I kept your original code > and decided to rise the discussion here. 
> IIRC I was just lazy when writing the experimental patch, there was not much thought about stuff like this. > Fixed in new patch in attachment. > >> FWIW The comment in hashjoin.h needs updating to reflect the change. > > Done in the last patch. Is my rewording accurate? > >>> 2. Wrapping each call of ExecHashJoinSaveTuple() with a memory context >>> switch seems fragile as it could be forgotten in futur code path/changes. >>> So I added an Assert() in the function to make sure the current memory >>> context is "HashBatchFiles" as expected. >>> Another way to tie this up might be to pass the memory context as >>> argument to the function. >>> ... Or maybe I'm over precautionary. >>> >> >> I'm not sure I'd call that fragile, we have plenty other code that >> expects the memory context to be set correctly. Not sure about the >> assert, but we don't have similar asserts anywhere else. > > I mostly sticked it there to stimulate the discussion around this as I needed > to scratch that itch. > >> But I think it's just ugly and overly verbose > > +1 > > Your patch was just a demo/debug patch by the time. It needed some cleanup now > :) > >> it'd be much nicer to e.g. pass the memory context as a parameter, and do >> the switch inside. > > That was a proposition in my previous mail, so I did it in the new patch. > Let's > see what other re
Re: [EXTERNAL] Re: Add non-blocking version of PQcancel
On Tue, 28 Mar 2023 at 16:54, Denis Laxalde wrote: > I had a look into your patchset (v16), did a quick review and played a > bit with the feature. > > Patch 2 is missing the documentation about PQcancelSocket() and contains > a few typos; please find attached a (fixup) patch to correct these. Thanks applied that patch and attached a new patchset > Namely, I wonder why it returns a PGcancelConn and what's the > point of requiring the user to call PQcancelStatus() to see if something > got wrong. Maybe it could be defined as: > > int PQcancelSend(PGcancelConn *cancelConn); > > where the return value would be status? And the user would only need to > call PQcancelErrorMessage() in case of error. This would leave only one > single way to create a PGcancelConn value (i.e. PQcancelConn()), which > seems less confusing to me. To clarify what you mean, the API would then be like this: PGcancelConn cancelConn = PQcancelConn(conn); if (PQcancelSend(cancelConn) == CONNECTION_BAD) { printf("ERROR %s\n", PQcancelErrorMessage(cancelConn)) exit(1) } Instead of: PGcancelConn cancelConn = PQcancelSend(conn); if (PQcancelStatus(cancelConn) == CONNECTION_BAD) { printf("ERROR %s\n", PQcancelErrorMessage(cancelConn)) exit(1) } Those are so similar, that I have no preference either way. If more people prefer one over the other I'm happy to change it, but for now I'll keep it as is. > As part of my testing, I've implemented non-blocking cancellation in > Psycopg, based on v16 on this patchset. Overall this worked fine and > seems useful; if you want to try it: > > https://github.com/dlax/psycopg3/tree/pg16/non-blocking-pqcancel That's great to hear! I'll try to take a closer look at that change tomorrow. > (The only thing I found slightly inconvenient is the need to convey the > connection encoding (from PGconn) when handling error message from the > PGcancelConn.) Could you expand a bit more on this? And if you have any idea on how to improve the API with regards to this? v17-0002-Copy-and-store-addrinfo-in-libpq-owned-private-m.patch Description: Binary data v17-0005-Start-using-new-libpq-cancel-APIs.patch Description: Binary data v17-0003-Return-2-from-pqReadData-on-EOF.patch Description: Binary data v17-0004-Add-non-blocking-version-of-PQcancel.patch Description: Binary data v17-0001-libpq-Run-pgindent-after-a9e9a9f32b3.patch Description: Binary data
Re: Add LZ4 compression in pg_dump
--- Original Message --- On Friday, March 24th, 2023 at 10:30 AM, gkokola...@pm.me wrote: > > --- Original Message --- > On Thursday, March 23rd, 2023 at 6:10 PM, Tomas Vondra > tomas.von...@enterprisedb.com wrote: > > > This leaves the empty-data issue (which we have a fix for) and the > > switch to LZ4F. And then the zstd part. > > Please expect promptly a patch for the switch to frames. Please find the expected patch attached. Note that the bulk of the patch is code unification, variable renaming to something more appropriate, and comment addition. These are changes that are not strictly necessary to switch to LZ4F. I do believe that are essential for code hygiene after the switch and they do belong on the same commit. Cheers, //Georgios > > Cheers, > //GeorgiosFrom c289fb8d49b680ad180a44b20fff1dc9553b7494 Mon Sep 17 00:00:00 2001 From: Georgios Kokolatos Date: Tue, 28 Mar 2023 15:48:06 + Subject: [PATCH v1] Use LZ4 frames in pg_dump's compressor API. This change allows for greater compaction of data, especially so in very narrow relations, by avoiding at least a compaction header and footer per row. Since LZ4 frames are now used by both compression APIs, some code deduplication opportunities have become obvious and are also implemented. Reported by: Justin Pryzby --- src/bin/pg_dump/compress_lz4.c | 358 ++--- 1 file changed, 244 insertions(+), 114 deletions(-) diff --git a/src/bin/pg_dump/compress_lz4.c b/src/bin/pg_dump/compress_lz4.c index fc2f4e116d..078dc35cd6 100644 --- a/src/bin/pg_dump/compress_lz4.c +++ b/src/bin/pg_dump/compress_lz4.c @@ -17,7 +17,6 @@ #include "compress_lz4.h" #ifdef USE_LZ4 -#include #include /* @@ -29,102 +28,279 @@ #endif /*-- - * Compressor API - *-- + * Common to both APIs */ -typedef struct LZ4CompressorState +/* + * State used for LZ4 (de)compression by both APIs. + */ +typedef struct LZ4State { - char *outbuf; - size_t outsize; -} LZ4CompressorState; + /* + * Used by the File API to keep track of the file stream. + */ + FILE *fp; + + LZ4F_preferences_t prefs; + + LZ4F_compressionContext_t ctx; + LZ4F_decompressionContext_t dtx; + + /* + * Used by the File API's lazy initialization. + */ + bool inited; + + /* + * Used by the File API to distinguish between compression + * and decompression operations. + */ + bool compressing; + + /* + * Used by the Compressor API to mark if the compression + * headers have been written after initialization. + */ + bool needs_header_flush; + + size_t buflen; + char *buffer; + + /* + * Used by the File API to store already uncompressed + * data that the caller has not consumed. + */ + size_t overflowalloclen; + size_t overflowlen; + char *overflowbuf; + + /* + * Used by both APIs to keep track of the compressed + * data length stored in the buffer. + */ + size_t compressedlen; + + /* + * Used by both APIs to keep track of error codes. + */ + size_t errcode; +} LZ4State; + +/* + * Initialize the required LZ4State members for compression. Write the LZ4 frame + * header in a buffer keeping track of its length. Users of this function can + * choose when and how to write the header to a file stream. + * + * Returns true on success. In case of a failure returns false, and stores the + * error code in state->errcode. + */ +static bool +LZ4_compression_state_init(LZ4State *state) +{ + size_t status; + + state->buflen = LZ4F_compressBound(DEFAULT_IO_BUFFER_SIZE, &state->prefs); + + /* + * LZ4F_compressBegin requires a buffer that is greater or equal to + * LZ4F_HEADER_SIZE_MAX. Verify that the requirement is met. 
+ */ + if (state->buflen < LZ4F_HEADER_SIZE_MAX) + state->buflen = LZ4F_HEADER_SIZE_MAX; + + status = LZ4F_createCompressionContext(&state->ctx, LZ4F_VERSION); + if (LZ4F_isError(status)) + { + state->errcode = status; + return false; + } + + state->buffer = pg_malloc(state->buflen); + status = LZ4F_compressBegin(state->ctx, +state->buffer, state->buflen, + &state->prefs); + if (LZ4F_isError(status)) + { + state->errcode = status; + return false; + } + + state->compressedlen = status; + + return true; +} + +/*-- + * Compressor API + *-- + */ /* Private routines that support LZ4 compressed data I/O */ -static void ReadDataFromArchiveLZ4(ArchiveHandle *AH, CompressorState *cs); -static void WriteDataToArchiveLZ4(ArchiveHandle *AH, CompressorState *cs, - const void *data, size_t dLen); -static void EndCompressorLZ4(ArchiveHandle *AH, CompressorState *cs); static void ReadDataFromArchiveLZ4(ArchiveHandle *AH, CompressorState *cs) { - LZ4_streamDecode_t lz4StreamDecode; - char *buf; - char *decbuf; - size_t buflen; - size_t cnt; - - buflen = DEFAULT_IO_BUFFER_SIZE; - buf = pg_malloc(buflen); - decbuf = pg_malloc(buflen); + size_t r; + size_t readbuflen; + char *outbuf; + char *re
Re: Commitfest 2023-03 starting tomorrow!
Status summary: Needs review: 116. Waiting on Author: 30. Ready for Committer: 32. Committed: 94. Moved to next CF: 17. Returned with Feedback: 6. Rejected: 6. Withdrawn: 18. Total: 319. Ok, here are the patches that have been stuck in "Waiting on Author" for a while. I divided them into three groups. * The first group have been stuck for over a week and mostly look like they should be RwF. Some I guess just moved to next CF. But some of them I'm uncertain if I should leave or if they really should be RfC or NR. * The other two groups have had some updates in the last week (actually I used 10 days). But some of them still look like they're pretty much dead for this CF and should either be moved forward or Rwf or Rejected. So here's the triage list. I'm going to send emails and start clearing out the patches pretty much right away. Some of these are pretty clearcut. Nothing in over a week: -- * Better infrastructure for automated testing of concurrency issues - Consensus that this is desirable. But it's not clear what it's actually waiting on Author for. RwF? * Provide the facility to set binary format output for specific OID's per session - I think Dave was looking for feedback and got it from Tom and Peter. I don't actually see a specific patch here but there are two patches linked in the original message. There seems to be enough feedback to proceed but nobody's working on it. RwF? * pg_visibility's pg_check_visible() yields false positive when working in parallel with autovacuum - Bug, but tentatively a false positive... * CAST( ... ON DEFAULT) - it'll have to wait till there's something solid from the committee" -- Rejected? * Fix tab completion MERGE - Partly committed but v9-0002-psql-Add-PartialMatches-macro-for-better-tab-complet.patch remains. There was a review from Dean Rasheed. Move to next CF? * Fix recovery conflict SIGUSR1 handling - This looks like a suite of bug fixes and looks like it should be Needs Review or Ready for Commit * Prefetch the next tuple's memory during seqscans - David Rowley said it was dependeny on "heapgettup() refactoring" which has been refactored. So is it now Needs Review or Ready for Commit? Is it likely to happen this CF? * Pluggable toaster - This seems to have digressed from the original patch. There were patches early on and a lot of feedback. Is the result that the original patches are Rejected or are they still live? * psql - refactor echo code - "I think this patch requires an up-to-date summary and explanation" from Peter. But it seems like Tom was ok with it and just had some additional improvements he wanted that were added. It sounds like this might be "Ready for Commit" if someone familiar with the patch looked at it. * Push aggregation down to base relations and joins - Needs a significant rebase (since March 1). * Remove self join on a unique column - An offer of a Bounty! There was one failing test which was apparently fixed? But it looks like this should be in Needs Review or Ready for Commit. * Split index and table statistics into different types of stats - Was stuck on "Generate pg_stat_get_xact*() functions with Macros" which was committed. So "Ready for Commit" now? * Default to ICU during initdb - Partly committed, 0001 waiting until after CF * suppressing useless wakeups in logical/worker.c - Got feedback March 17. Doesn't look like it's going to be ready this CF. 
* explain analyze rows=%.0f - Patch updated January, but I think Tom still had some simple if tedious changes he asked for * Fix order of checking ICU options in initdb and create database - Feedback Last November but no further questions or progress * Introduce array_shuffle() and array_sample() functions - Feedback from Tom last September. No further questions or progress Status Updates in last week: * Some revises in adding sorting path - Got feedback Feb 21 and author responded but doesn't look like it's going to be ready this CF * Add TAP tests for psql \g piped into program - Peter Eisentraut asked for a one-line change, otherwise it looks like it's Ready for Commit? * Improvements to Meson docs - Some feedback March 15 but no response. I assume this is still in play Emails in last week: --- * RADIUS tests and improvements - last feedback March 20, last patch March 4. Should probably be moved to the next CF unless there's progress soon. * Direct SSL Connections - (This is mine) Code for SSL is pretty finished. The last patch for ALPN support needs a bit of polish. I'll be doing that momentarily. * Fix alter subscription concurrency errors - "patch as-submitted is pretty uninteresting" and "patch that I don't care much about" ... I guess this is Rejected or Withdrawn * Fix improper qual pushdown after applying outer join identity 3 - Tom Lane's patch. Active discussion as of M
Re: running logical replication as the subscription owner
On Mon, Mar 27, 2023 at 6:08 PM Jelte Fennema wrote: > Attached is an updated version of your patch with what I had in mind > (admittedly it needed one more line than "just" the return to make it > work). But as you can see all previous tests for a lowly privileged > subscription owner that **cannot** SET ROLE to the table owner > continue to work as they did before. While still downgrading to the > table owners role when the subscription owner **can** SET ROLE to the > table owner. > > Obviously this needs some comments explaining what's going on and > probably some code refactoring and/or variable renaming, but I hope > it's clear what I meant now: For high privileged subscription owners, > we downgrade to the permissions of the table owner, but for low > privileged ones we care about permissions of the subscription owner > itself. Hmm. This is an interesting idea. A variant on this theme could be: what if we made this an explicit configuration option? I'm worried that if we just try to switch users and silently fall back to not doing so when we don't have enough permissions, the resulting behavior is going to be difficult to understand and troubleshoot. I'm thinking that maybe if you let people pick the behavior they want that becomes more comprehensible. It's also a nice insurance policy: say for the sake of argument we make switch-to-table-owner the new default. If that new behavior causes something to happen to somebody that they don't like, they can always turn it off, even if they are a highly privileged user who doesn't "need" to turn it off. -- Robert Haas EDB: http://www.enterprisedb.com
Re: zstd compression for pg_dump
On 3/27/23 19:28, Justin Pryzby wrote: > On Fri, Mar 17, 2023 at 03:43:31AM +0100, Tomas Vondra wrote: >> On 3/16/23 05:50, Justin Pryzby wrote: >>> On Fri, Mar 10, 2023 at 12:48:13PM -0800, Jacob Champion wrote: On Wed, Mar 8, 2023 at 10:59 AM Jacob Champion wrote: > I did some smoke testing against zstd's GitHub release on Windows. To > build against it, I had to construct an import library, and put that > and the DLL into the `lib` folder expected by the MSVC scripts... > which makes me wonder if I've chosen a harder way than necessary? It looks like pg_dump's meson.build is missing dependencies on zstd (meson couldn't find the headers in the subproject without them). >>> >>> I saw that this was added for LZ4, but I hadn't added it for zstd since >>> I didn't run into an issue without it. Could you check that what I've >>> added works for your case ? >>> > Parallel zstd dumps seem to work as expected, in that the resulting > pg_restore output is identical to uncompressed dumps and nothing > explodes. I haven't inspected the threading implementation for safety > yet, as you mentioned. Hm. Best I can tell, the CloneArchive() machinery is supposed to be handling safety for this, by isolating each thread's state. I don't feel comfortable pronouncing this new addition safe or not, because I'm not sure I understand what the comments in the format-specific _Clone() callbacks are saying yet. >>> >>> My line of reasoning for unix is that pg_dump forks before any calls to >>> zstd. Nothing zstd does ought to affect the pg_dump layer. But that >>> doesn't apply to pg_dump under windows. This is an opened question. If >>> there's no solid answer, I could disable/ignore the option (maybe only >>> under windows). >> >> I may be missing something, but why would the patch affect this? Why >> would it even affect safety of the parallel dump? And I don't see any >> changes to the clone stuff ... > > zstd supports using threads during compression, with -Z zstd:workers=N. > When unix forks, the child processes can't do anything to mess up the > state of the parent processes. > > But windows pg_dump uses threads instead of forking, so it seems > possible that the pg_dump -j threads that then spawn zstd threads could > "leak threads" and break the main thread. I suspect there's no issue, > but we still ought to verify that before declaring it safe. > OK. I don't have access to a Windows machine so I can't test that. Is it possible to disable the zstd threading, until we figure this out? >>> The function is first checking if it was passed a filename which already >>> has a suffix. And if not, it searches through a list of suffixes, >>> testing for an existing file with each suffix. The search with stat() >>> doesn't happen if it has a suffix. I'm having trouble seeing how the >>> hasSuffix() branch isn't dead code. Another opened question. >> >> AFAICS it's done this way because of this comment in pg_backup_directory >> >> * ... >> * ".gz" suffix is added to the filenames. The TOC files are never >> * compressed by pg_dump, however they are accepted with the .gz suffix >> * too, in case the user has manually compressed them with 'gzip'. >> >> I haven't tried, but I believe that if you manually compress the >> directory, it may hit this branch. > > That would make sense, but when I tried, it didn't work like that. > The filenames were all uncompressed names. Maybe it worked differently > in an older release. 
Or maybe it changed during development of the > parallel-directory-dump patch and it's actually dead code. > Interesting. Would be good to find out. I wonder if a little bit of git-log digging could tell us more. Anyway, until we confirm it's dead code, we should probably do what .gz does and have the same check for .lz4 and .zst files. > This is rebased over the updated compression API. > > It seems like I misunderstood something you said before, so now I put > back "supports_compression()". > Thanks! I need to do a bit more testing and review, but it seems pretty much RFC to me, assuming we can figure out what to do about threading. regards -- Tomas Vondra EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
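For anyone wanting to run the kind of smoke test described above, a minimal parallel zstd dump/restore check might look like the sketch below. The -Z zstd:workers=N spelling is taken from Justin's description of the patch and could still change before commit; the database name and paths here are made up.

    # directory-format dump with 4 pg_dump workers, zstd compression,
    # and zstd's own worker threads enabled (per the patch under discussion)
    pg_dump -Fd -j 4 -Z zstd:workers=4 -f /tmp/dump_zstd testdb

    # restore into a scratch database and compare with an uncompressed dump
    createdb testdb_restore
    pg_restore -j 4 -d testdb_restore /tmp/dump_zstd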
Re: logical decoding and replication of sequences, take 2
On Mon, Mar 27, 2023 at 11:46 PM Tomas Vondra wrote: > > > > On 3/27/23 03:32, Masahiko Sawada wrote: > > Hi, > > > > On Fri, Mar 24, 2023 at 7:26 AM Tomas Vondra > > wrote: > >> > >> I merged the earlier "fixup" patches into the relevant parts, and left > >> two patches with new tweaks (deducing the corrent "WAL" state from the > >> current state read by copy_sequence), and the interlock discussed here. > >> > > > > Apart from that, how does the publication having sequences work with > > subscribers who are not able to handle sequence changes, e.g. in a > > case where PostgreSQL version of publication is newer than the > > subscriber? As far as I tested the latest patches, the subscriber > > (v15) errors out with the error 'invalid logical replication message > > type "Q"' when receiving a sequence change. I'm not sure it's sensible > > behavior. I think we should instead either (1) deny starting the > > replication if the subscriber isn't able to handle sequence changes > > and the publication includes that, or (2) not send sequence changes to > > such subscribers. > > > > I agree the "invalid message" error is not great, but it's not clear to > me how to do either (1). The trouble is we don't really know if the > publication contains (or will contain) sequences. I mean, what would > happen if the replication starts and then someone adds a sequence? > > For (2), I think that's not something we should do - silently discarding > some messages seems error-prone. If the publication includes sequences, > presumably the user wanted to replicate those. If they want to replicate > to an older subscriber, create a publication without sequences. > > Perhaps the right solution would be to check if the subscriber supports > replication of sequences in the output plugin, while attempting to write > the "Q" message. And error-out if the subscriber does not support it. It might be related to this topic; do we need to bump the protocol version? The commit 64824323e57d introduced new streaming callbacks and bumped the protocol version. I think the same seems to be true for this change as it adds sequence_cb callback. Regards, -- Masahiko Sawada Amazon Web Services: https://aws.amazon.com
Re: Add LZ4 compression in pg_dump
On 3/28/23 18:07, gkokola...@pm.me wrote: > > > > > > --- Original Message --- > On Friday, March 24th, 2023 at 10:30 AM, gkokola...@pm.me > wrote: > >> >> --- Original Message --- >> On Thursday, March 23rd, 2023 at 6:10 PM, Tomas Vondra >> tomas.von...@enterprisedb.com wrote: >> >>> This leaves the empty-data issue (which we have a fix for) and the >>> switch to LZ4F. And then the zstd part. >> >> Please expect promptly a patch for the switch to frames. > > Please find the expected patch attached. Note that the bulk of the > patch is code unification, variable renaming to something more > appropriate, and comment addition. These are changes that are not > strictly necessary to switch to LZ4F. I do believe that are > essential for code hygiene after the switch and they do belong > on the same commit. > Thanks! I agree the renames & cleanup are appropriate - it'd be silly to stick to misleading naming etc. Would it make sense to split the patch into two, to separate the renames and the switch to lz4f? That'd make it the changes necessary for lz4f switch clearer. regards -- Tomas Vondra EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
Re: Data is copied twice when specifying both child and parent table in publication
On Tue, Mar 28, 2023 at 2:59 AM wangw.f...@fujitsu.com wrote: > The scenario of this bug is to subscribe to two publications at the same time, > and these two publications publish parent table and child table respectively. > And option via_root is specified in both publications or only in the > publication > of the parent table. Ah, reading the initial mail again, that makes sense. I came to this thread with the alternative reproduction in mind (subscribing to one publication with viaroot=true, and another publication with viaroot=false) and misread the report accordingly... In the end, I'm comfortable with the current level of coverage. > > Would it > > be enough to just replace that whole thing with gpt.attrs? > > Make sense. > Changed as suggested. LGTM, by inspection. Thanks! --Jacob
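For concreteness, the reported scenario might be reproduced with a sketch along these lines (all object names are made up; publish_via_partition_root is the released spelling of the via_root option mentioned above):

    -- on the publisher
    CREATE TABLE parent (a int) PARTITION BY RANGE (a);
    CREATE TABLE child PARTITION OF parent FOR VALUES FROM (0) TO (100);
    CREATE PUBLICATION pub_parent FOR TABLE parent
        WITH (publish_via_partition_root = true);
    CREATE PUBLICATION pub_child FOR TABLE child;

    -- on the subscriber: subscribing to both publications at once is what
    -- led to the initial data being copied twice
    CREATE SUBSCRIPTION sub CONNECTION 'dbname=... host=...'
        PUBLICATION pub_parent, pub_child;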
Re: Non-superuser subscription owners
On Fri, 2023-03-24 at 00:17 -0700, Jeff Davis wrote: > The other patch you posted seems like it makes a lot of progress in > that direction, and I think that should go in first. That was one of > the items I suggested previously[2], so thank you for working on > that. The above is not a hard objection. I still hold the opinion that the non-superuser subscriptions work feels premature without the apply-as-table-owner work. It would be great if the other patch ends up ready quickly, which would moot the commit-ordering question. Regards, Jeff Davis
Re: running logical replication as the subscription owner
On Tue, Mar 28, 2023 at 1:38 AM Jeff Davis wrote: > > If they do have those things then in > > theory they might be able to protect themselves, but in practice they > > are unlikely to be careful enough. I judge that practically every > > installation where table owners use triggers would be easily > > compromised. Only the most security-conscious of table owners are > > going to get this right. > > This is the interesting part. > > Commit 11da97024ab set the subscription process's search path as empty. > It seems it was done to protect the subscription owner against users > writing into the public schema. But after your apply-as-table-owner > patch, it seems to also offer some protection for table owners against > a malicious subscription owner, too. > > But I must be missing something obvious here -- if the subscription > process has an empty search_path, what can an attacker do to trick it? Oh, interesting. I hadn't realized we were doing that, and I do think that narrows the vulnerability quite a bit. But I still feel pretty uncomfortable with the idea of requiring only INSERT/UPDATE/DELETE permissions on the target table. Let's suppose that you create a table and attach a trigger to it which is SECURITY INVOKER. Absent logical replication, you don't really need to worry about whether that function is written securely -- it will run with privileges of the person performing the DML, and not impact your account at all. They have reason to be scared of you, but not the reverse. However, if the people doing DML on the table can arrange to perform that DML as you, then any security vulnerabilities in your function potentially allow them to compromise your account. Now, there might not be any, but there also might be some, depending on exactly what your function does. And if logical replication into a table requires only I/U/D permission then it's basically a back-door way to run functions that someone could otherwise execute only as themselves as the table owner, and that's scary. Now, I don't know how much to worry about that. As you just pointed out to me in some out-of-band discussion, this is only going to affect a table owner who writes a trigger, makes it insecure in some way other than failure to sanitize the search_path, and sets it ENABLE ALWAYS TRIGGER or ENABLE REPLICA TRIGGER. And maybe we could say that if you do that last part, you kind of need to think about the consequences for logical replication. If so, we could document that problem away. It would also affect someone who uses a default expression or other means of associating executable code with the table, and something like a default expression doesn't require any explicit setting to make it apply in case of replication, so maybe the risk of someone messing up is a bit higher. But this definitely makes it more of a judgment call than I thought initially. Functions that are vulnerable to search_path exploits are a dime a dozen; other kinds of exploits are going to be less common, and more dependent on exactly what the function is doing. I'm still inclined to leave the patch checking for SET ROLE, because after all, we're thinking of switching user identities to the table owner, and checking whether we can SET ROLE is the most natural way of doing that, and definitely doesn't leave room to escalate to any privilege you don't already have. 
However, there might be an argument that we ought to do something else, like have a REPLICATE privilege on the table that the owner can grant to users that they trust to replicate into that table. Because time is short, I'd like to leave that idea for a future release. What I would like to change in the patch in this release is to add a new subscription property along the lines of what I proposed to Jelte in my earlier email: let's provide an escape hatch that turns off the user-switching behavior. If we do that, then anyone who feels themselves worse off after this patch can switch back to the old behavior. Most people will be better off, I believe, and the opportunity to make things still better in the future is not foreclosed. -- Robert Haas EDB: http://www.enterprisedb.com
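Robert's proposed escape hatch would presumably surface as a subscription option; a purely illustrative sketch follows (the option name run_as_owner is made up here, and the default is whatever gets decided on-list):

    -- hypothetical option name; shown only to illustrate the idea of an
    -- explicit toggle for the user-switching behavior
    CREATE SUBSCRIPTION mysub
        CONNECTION 'dbname=... host=...' PUBLICATION mypub
        WITH (run_as_owner = false);  -- switch to the table owner when applying

    -- opt back into the old behavior (apply as the subscription owner)
    ALTER SUBSCRIPTION mysub SET (run_as_owner = true);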
Re: zstd compression for pg_dump
On Tue, Mar 28, 2023 at 12:23 PM Tomas Vondra wrote: > On 3/27/23 19:28, Justin Pryzby wrote: > > On Fri, Mar 17, 2023 at 03:43:31AM +0100, Tomas Vondra wrote: > >> On 3/16/23 05:50, Justin Pryzby wrote: > >>> On Fri, Mar 10, 2023 at 12:48:13PM -0800, Jacob Champion wrote: > On Wed, Mar 8, 2023 at 10:59 AM Jacob Champion < > jchamp...@timescale.com> wrote: > > I did some smoke testing against zstd's GitHub release on Windows. To > ... > OK. I don't have access to a Windows machine so I can't test that. Is it > possible to disable the zstd threading, until we figure this out? > > Thomas since I appear to be one of the few windows users (I use both), can I help? I can test pg_dump... for you, easy to do. I do about 5-10 pg_dumps a day on windows while developing. Also, I have an AWS instance I created to build PG/Win with readline back in November. I could give you access to that... (you are not the only person who has made this statement here). I've made such instances available for other Open Source developers, to support them. Obvi I would share connection credentials privately. Regards, Kirk
Re: Add pg_walinspect function with block info columns
On Mon, Mar 27, 2023 at 7:47 PM Bharath Rupireddy wrote: > Hm, agreed. Changed in the attached v7-0002 patch. We can as well > write a case statement in the create function SQL to output forkname > instead forknumber, but I'd stop doing that to keep in sync with > pg_buffercache. I just don't see much value in any textual representation of fork name, however generated. In practice it's just not adding very much useful information. It is mostly useful as a way of filtering block references, which makes simple integers more natural. > Oh, thanks for finding it out. Fixed in the attached v7-0001 patch. I > also removed the "invalid fork number" error as users can figure that > out if at all the fork number is wrong. Pushed just now. > On the ordering of the columns, I kept start_lsn, end_lsn and prev_lsn > first and then the rel** columns (this rel** columns order follows > pg_buffercache) and then block data related columns. Michael and > Kyotaro are of the opinion that it's better to keep LSNs first to be > consistent and also given that this function is WAL related, it makes > sense to have LSNs first. Right, but I didn't change that part in the revision of the patch I posted. Those columns still came first, and were totally consistent with the pg_get_wal_record_info function. I think that there was a "mid air collision" here, where we both posted patches that we each called v7 within minutes of each other. Just to be clear, I ended up with a column order as described here in my revision: https://postgr.es/m/CAH2-WzmzO-AU4QSbnzzANBkrpg=4cuod3scvtv+7x65e+qk...@mail.gmail.com It now occurs to me that "fpi_data" should perhaps be called "block_fpi_data". What do you think? -- Peter Geoghegan
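To illustrate why plain integers are convenient for filtering block references, a query against the in-progress block-info function might look like the sketch below. The function and column names (pg_get_wal_block_info, relforknumber) are assumptions based on the naming discussed in this thread and could differ in the committed version; fork number 0 is the main fork.

    SELECT start_lsn, relfilenode, relblocknumber
    FROM pg_get_wal_block_info('0/01000000', '0/02000000')
    WHERE relforknumber = 0;   -- main-fork block references only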
Re: allow_in_place_tablespaces vs. pg_basebackup
On Tue, Mar 28, 2023 at 5:15 PM Thomas Munro wrote: > I guess we should add a test of -Fp too, to keep it working? Oops, that was already tested in existing tests, so I take that back.
Re: "variable not found in subplan target list"
So I'm back home and found a couple more weird errors in the log: MERGE INTO public.idxpart2 as target_0 USING (select 1 from public.xmltest2 as ref_0 inner join public.prt1_l_p1 as sample_0 inner join fkpart4.droppk as ref_1 on (sample_0.a = ref_1.a ) on (true) limit 50) as subq_0 left join information_schema.transforms as ref_2 left join public.transition_table_status as sample_1 on (ref_2.transform_type is not NULL) on (true) ON target_0.a = sample_1.level WHEN MATCHED THEN UPDATE set a = target_0.a; ERROR: mismatching PartitionPruneInfo found at part_prune_index 0 DETALLE: plan node relids (b 1), pruneinfo relids (b 36) This one is probably my fault, will look later. select pg_catalog.pg_stat_get_buf_fsync_backend() as c9 from public.tenk2 as ref_0 where (ref_0.stringu2 is NULL) and (EXISTS ( select 1 from fkpart5.fk1 as ref_1 where pg_catalog.current_date() < (select pg_catalog.max(filler3) from public.mcv_lists))) ; ERROR: subplan "InitPlan 1 (returns $1)" was not initialized CONTEXTO: parallel worker select 1 as c0 from (select subq_0.c9 as c5, subq_0.c8 as c9 from public.iso8859_5_inputs as ref_0, lateral (select ref_1.ident as c2, ref_0.description as c8, ref_1.used_bytes as c9 from pg_catalog.pg_backend_memory_contexts as ref_1 where true ) as subq_0 where subq_0.c2 is not NULL) as subq_1 inner join pg_catalog.pg_class as sample_0 on (subq_1.c5 = public.int8alias1in( cast(case when subq_1.c9 is not NULL then null end as cstring))) where true; ERROR: could not find commutator for operator 53286 There were quite a few of those "variable not found" ones, both mentioning singular "targetlist" and others that said "targetlists". I reran them with your patch and they no longer error out, so I guess it's all the same bug. -- Álvaro Herrera PostgreSQL Developer — https://www.EnterpriseDB.com/ "I must say, I am absolutely impressed with what pgsql's implementation of VALUES allows me to do. It's kind of ridiculous how much "work" goes away in my code. Too bad I can't do this at work (Oracle 8/9)." (Tom Allison) http://archives.postgresql.org/pgsql-general/2007-06/msg00016.php
Re: logical decoding and replication of sequences, take 2
On 3/28/23 18:34, Masahiko Sawada wrote: > On Mon, Mar 27, 2023 at 11:46 PM Tomas Vondra > wrote: >> >> >> >> On 3/27/23 03:32, Masahiko Sawada wrote: >>> Hi, >>> >>> On Fri, Mar 24, 2023 at 7:26 AM Tomas Vondra >>> wrote: I merged the earlier "fixup" patches into the relevant parts, and left two patches with new tweaks (deducing the corrent "WAL" state from the current state read by copy_sequence), and the interlock discussed here. >>> >>> Apart from that, how does the publication having sequences work with >>> subscribers who are not able to handle sequence changes, e.g. in a >>> case where PostgreSQL version of publication is newer than the >>> subscriber? As far as I tested the latest patches, the subscriber >>> (v15) errors out with the error 'invalid logical replication message >>> type "Q"' when receiving a sequence change. I'm not sure it's sensible >>> behavior. I think we should instead either (1) deny starting the >>> replication if the subscriber isn't able to handle sequence changes >>> and the publication includes that, or (2) not send sequence changes to >>> such subscribers. >>> >> >> I agree the "invalid message" error is not great, but it's not clear to >> me how to do either (1). The trouble is we don't really know if the >> publication contains (or will contain) sequences. I mean, what would >> happen if the replication starts and then someone adds a sequence? >> >> For (2), I think that's not something we should do - silently discarding >> some messages seems error-prone. If the publication includes sequences, >> presumably the user wanted to replicate those. If they want to replicate >> to an older subscriber, create a publication without sequences. >> >> Perhaps the right solution would be to check if the subscriber supports >> replication of sequences in the output plugin, while attempting to write >> the "Q" message. And error-out if the subscriber does not support it. > > It might be related to this topic; do we need to bump the protocol > version? The commit 64824323e57d introduced new streaming callbacks > and bumped the protocol version. I think the same seems to be true for > this change as it adds sequence_cb callback. > It's not clear to me what should be the exact behavior? I mean, imagine we're opening a connection for logical replication, and the subscriber does not handle sequences. What should the publisher do? (Note: The correct commit hash is 464824323e57d.) I don't think the streaming is a good match for sequences, because of a couple important differences ... Firstly, streaming determines *how* the changes are replicated, not what gets replicated. It doesn't (silently) filter out "bad" events that the subscriber doesn't know how to apply. If the subscriber does not know how to deal with streamed xacts, it'll still get the same changes exactly per the publication definition. Secondly, the default value is "streming=off", i.e. the subscriber has to explicitly request streaming when opening the connection. And we simply check it against the negotiated protocol version, i.e. the check in pgoutput_startup() protects against subscriber requesting a protocol v1 but also streaming=on. I don't think we can/should do more check at this point - we don't know what's included in the requested publications at that point, and I doubt it's worth adding because we certainly can't predict if the publication will be altered to include/decode sequences in the future. 
Speaking of precedents, TRUNCATE is probably a better one, because it's a new action and it determines *what* the subscriber can handle. But that does exactly the thing we do for sequences - if you open a connection from PG10 subscriber (truncate was added in PG11), and the publisher decodes a truncate, subscriber will do: 2023-03-28 20:29:46.921 CEST [2357609] ERROR: invalid logical replication message type "T" 2023-03-28 20:29:46.922 CEST [2356534] LOG: worker process: logical replication worker for subscription 16390 (PID 2357609) exited with exit code 1 I don't see why sequences should do anything else. If you need to replicate to such subscriber, create a publication that does not have 'sequence' in the publish option ... regards -- Tomas Vondra EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
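As a concrete version of Tomas's last suggestion: with the patch under discussion, a publication intended for older subscribers would simply omit the new publish action, something like the sketch below ('sequence' as a publish value exists only in the patch, not in any released server).

    -- replicate DML and TRUNCATE, but not sequence changes, so that older
    -- subscribers never see the "Q" message
    CREATE PUBLICATION pub_no_sequences FOR ALL TABLES
        WITH (publish = 'insert, update, delete, truncate');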
Re: "variable not found in subplan target list"
Alvaro Herrera writes: > So I'm back home and found a couple more weird errors in the log: > ERROR: mismatching PartitionPruneInfo found at part_prune_index 0 > DETALLE: plan node relids (b 1), pruneinfo relids (b 36) This one reproduces for me. > select > pg_catalog.pg_stat_get_buf_fsync_backend() as c9 > from > public.tenk2 as ref_0 > where (ref_0.stringu2 is NULL) > and (EXISTS ( > select 1 from fkpart5.fk1 as ref_1 > where pg_catalog.current_date() < (select pg_catalog.max(filler3) > from public.mcv_lists))) ; > ERROR: subplan "InitPlan 1 (returns $1)" was not initialized > CONTEXTO: parallel worker Hmph, I couldn't reproduce that, not even with other settings of debug_parallel_query. Are you running it with non-default planner parameters? > select 1 as c0 > ... > ERROR: could not find commutator for operator 53286 I got a slightly different error: ERROR: missing support function 1(195306,195306) in opfamily 1976 where regression=# select 195306::regtype; regtype int8alias1 (1 row) So that one is related to the intentionally-somewhat-broken int8 opclass configuration that equivclass.sql leaves behind. I've always had mixed emotions about whether leaving that set up that way was a good idea or not. In principle nothing really bad should happen, but it can lead to confusing errors like this one. Maybe it'd be better to roll that back? regards, tom lane
Re: POC: Better infrastructure for automated testing of concurrency issues
On Wed, 31 Aug 2022 at 20:52, Alexander Korotkov wrote: > > I'll continue work on this patch. The rebased patch is attached. It > implements stop events as configure option (not runtime GUC option). It looks like this patch isn't going to be ready this commitfest. And it hasn't received much discussion since August 2022. If I'm wrong say something but otherwise I'll mark it Returned With Feedback. It can be resurrected (and moved to the next commitfest) when you're free to work on it again. -- Gregory Stark As Commitfest Manager
Re: zstd compression for pg_dump
On 3/28/23 20:03, Kirk Wolak wrote: > On Tue, Mar 28, 2023 at 12:23 PM Tomas Vondra > mailto:tomas.von...@enterprisedb.com>> > wrote: > > On 3/27/23 19:28, Justin Pryzby wrote: > > On Fri, Mar 17, 2023 at 03:43:31AM +0100, Tomas Vondra wrote: > >> On 3/16/23 05:50, Justin Pryzby wrote: > >>> On Fri, Mar 10, 2023 at 12:48:13PM -0800, Jacob Champion wrote: > On Wed, Mar 8, 2023 at 10:59 AM Jacob Champion > mailto:jchamp...@timescale.com>> wrote: > > I did some smoke testing against zstd's GitHub release on > Windows. To > ... > OK. I don't have access to a Windows machine so I can't test that. Is it > possible to disable the zstd threading, until we figure this out? > > Thomas since I appear to be one of the few windows users (I use both), > can I help? > I can test pg_dump... for you, easy to do. I do about 5-10 pg_dumps a > day on windows while developing. > Perhaps. But I'll leave the details up to Justin - it's his patch, and I'm not sure how to verify the threading is OK. I'd try applying this patch, build with --with-zstd and then run the pg_dump TAP tests, and perhaps do some manual tests. And perhaps do the same for --with-lz4 - there's a thread [1] suggesting we don't detect lz4 stuff on Windows, so the TAP tests do nothing. https://www.postgresql.org/message-id/zajl96n9zw84u...@msg.df7cb.de > Also, I have an AWS instance I created to build PG/Win with readline > back in November. > I could give you access to that... (you are not the only person who has > made this statement here). > I've made such instances available for other Open Source developers, to > support them. > > Obvi I would share connection credentials privately. > I'd rather leave the Windows stuff up to someone with more experience with that platform. I have plenty of other stuff on my plate atm. regards -- Tomas Vondra EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
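Tomas's suggested verification, spelled out as a rough recipe for an autoconf build (paths and job counts are arbitrary; a meson build would use the equivalent -Dzstd=enabled option):

    # build with zstd (and lz4) support and TAP tests enabled
    ./configure --with-zstd --with-lz4 --enable-tap-tests
    make -j4

    # run the pg_dump TAP tests
    make -C src/bin/pg_dump check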
Re: Add LZ4 compression in pg_dump
On Tue, Mar 28, 2023 at 06:40:03PM +0200, Tomas Vondra wrote: > On 3/28/23 18:07, gkokola...@pm.me wrote: > > --- Original Message --- > > On Friday, March 24th, 2023 at 10:30 AM, gkokola...@pm.me > > wrote: > > > >> --- Original Message --- > >> On Thursday, March 23rd, 2023 at 6:10 PM, Tomas Vondra > >> tomas.von...@enterprisedb.com wrote: > >> > >>> This leaves the empty-data issue (which we have a fix for) and the > >>> switch to LZ4F. And then the zstd part. > >> > >> Please expect promptly a patch for the switch to frames. > > > > Please find the expected patch attached. Note that the bulk of the > > patch is code unification, variable renaming to something more > > appropriate, and comment addition. These are changes that are not > > strictly necessary to switch to LZ4F. I do believe that are > > essential for code hygiene after the switch and they do belong > > on the same commit. > > Thanks! > > I agree the renames & cleanup are appropriate - it'd be silly to stick > to misleading naming etc. Would it make sense to split the patch into > two, to separate the renames and the switch to lz4f? > That'd make it the changes necessary for lz4f switch clearer. I don't think so. Did you mean separate commits only for review ? The patch is pretty readable - the File API has just some renames, and the compressor API is what's being replaced, which isn't going to be any more clear. @Georgeos: did you consider using a C union in LZ4State, to separate the parts used by the different APIs ? -- Justin
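A rough illustration of the union Justin is asking about: keep the members used only by the compressor API and those used only by the file API in separate arms, since a given LZ4State serves one role at a time. All member names below are invented; the real layout is whatever the patch defines (this sketch assumes <stdio.h>, <stdbool.h> and lz4frame.h for the LZ4F_* types).

    typedef struct LZ4State
    {
        /* fields common to both code paths */
        LZ4F_preferences_t prefs;
        bool        compressing;

        union
        {
            struct                  /* compressor (buffer-to-buffer) API */
            {
                LZ4F_compressionContext_t cctx;
                char       *outbuf;
                size_t      outbuflen;
            }           comp;
            struct                  /* file (stream) API */
            {
                FILE       *fp;
                LZ4F_decompressionContext_t dctx;
            }           file;
        }           u;
    } LZ4State;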
Re: Request for comment on setting binary format output per session
FYI I attached this thread to https://commitfest.postgresql.org/42/3777 which I believe is the same issue. I mistakenly had this listed as a CF entry with no discussion for a long time due to that missing link. -- Gregory Stark As Commitfest Manager
Re: CAST(... ON DEFAULT) - WIP build on top of Error-Safe User Functions
On Tue, 3 Jan 2023 at 14:16, Tom Lane wrote: > > Vik Fearing writes: > > > I don't think we should add that syntax until I do get it through the > > committee, just in case they change something. > > Agreed. So this is something we won't be able to put into v16; > it'll have to wait till there's something solid from the committee. I guess I'll mark this Rejected in the CF then. Who knows when the SQL committee will look at this... -- Gregory Stark As Commitfest Manager
Re: running logical replication as the subscription owner
On Tue, 28 Mar 2023 at 18:13, Robert Haas wrote: > > On Mon, Mar 27, 2023 at 6:08 PM Jelte Fennema wrote: > > For high privileged subscription owners, > > we downgrade to the permissions of the table owner, but for low > > privileged ones we care about permissions of the subscription owner > > itself. > > Hmm. This is an interesting idea. A variant on this theme could be: > what if we made this an explicit configuration option? Sounds perfect to me. Let's do that. As long as the old no-superuser tests continue to pass when disabling the new switch-to-table-owner behaviour, that sounds totally fine to me. I think it's probably easiest if you use the tests from my v2 patch when you add that option, since that was by far the thing I spent most time on to get right in the v2 patch.
Re: [PATCH]Feature improvement for MERGE tab completion
It looks like this remaining work isn't going to happen this CF and therefore this release. There hasn't been an update since January when Dean Rasheed posted a review. Unless there's any updates soon I'll move this on to the next commitfest or mark it returned with feedback. -- Gregory Stark As Commitfest Manager
Re: [PATCH]Feature improvement for MERGE tab completion
Ah, another thread with a bouncing email address... Please respond to the thread from this point to avoid bounces. -- Gregory Stark As Commitfest Manager
Re: Why mark empty pages all visible?
Hi, On 2023-03-27 21:51:09 -0700, Peter Geoghegan wrote: > On Mon, Mar 27, 2023 at 9:32 PM Andres Freund wrote: > > > You haven't said anything about the leading cause of marking empty > > > pages all-frozen, by far: lazy_vacuum_heap_page(). Should it also stop > > > marking empty pages all-frozen? > > > > It's not obvious that it should - but it's not as clear a case as the > > ConditionalLockBufferForCleanup() -> lazy_scan_new_or_empty() one. In the > > latter, we know > > a) that we don't have to do any work to be able to advance the horizon > > b) we know that somebody else has the page pinned > > > > What's the point in marking it all-visible at that point? In quite likely be > > from RelationGetBufferForTuple() having extended the relation and then > > briefly > > needed to release the lock (to acquire the lock on otherBuffer or in > > GetVisibilityMapPins()). > > I think that there is significant value in avoiding special cases, on > general principle. If we stopped doing this in > lazy_scan_new_or_empty() we'd be inventing a new thing that cleanup > locks are supposed to protect against. Maybe something like that would > make sense, but if so then make that argument, and make it explicitly > represented in the code. I will probably make that argument - so far I was just trying to understand the intent of the current code. There aren't really comments explaining why we want to mark currently-pinned empty/new pages all-visible and enter them into the FSM. Historically we did *not* enter currently pinned empty/new pages into the FSM / VM. Afaics that's new as of 44fa84881fff. The reason I'm looking at this is that there's a lot of complexity at the bottom of RelationGetBufferForTuple(), related to needing to release the lock on the newly extended page and then needing to recheck whether there still is free space on the page. And that it's not complicated enough (c.f. INSERT_FROZEN doing visibilitymap_pin() with the page locked). As far as I can tell, if we went back to not entering new/empty pages into either VM or FSM, we could rely on the page not getting filled while just pinning, not locking it. > > I don't follow. In the ConditionalLockBufferForCleanup() -> > > lazy_scan_new_or_empty() case we are dealing with an new or empty > > page. Whereas lazy_vacuum_heap_page() deals with a page that definitely > > has dead tuples on it. How are those two cases comparable? > > It doesn't have dead tuples anymore, though. > > ISTM that there is an issue here with the definition of an empty page. > You're concerned about PageIsEmpty() pages. And not just any PageIsEmpty() page, ones that are currently pinned. I also do wonder whether the different behaviour of PageIsEmpty() and PageIsNew() pages makes sense. The justification for not marking PageIsNew() pages as all-visible holds just as true for empty pages. There's just as much free space there. Greetings, Andres Freund
Re: Schema variables - new implementation for Postgres 15
ne 26. 3. 2023 v 13:32 odesílatel Julien Rouhaud napsal: > Hi, > > I just have a few minor wording improvements for the various comments / > documentation you quoted. > > On Sun, Mar 26, 2023 at 08:53:49AM +0200, Pavel Stehule wrote: > > út 21. 3. 2023 v 17:18 odesílatel Peter Eisentraut < > > peter.eisentr...@enterprisedb.com> napsal: > > > > > - What is the purpose of struct Variable? It seems very similar to > > >FormData_pg_variable. At least a comment would be useful. > > > > > > > I wrote comment there: > > > > > > /* > > * The Variable struct is based on FormData_pg_variable struct. Against > > * FormData_pg_variable it can hold node of deserialized expression used > > * for calculation of default value. > > */ > > Did you mean "Unlike" rather than "Against"? > fixed > > > > 0002 > > > > > > expr_kind_allows_session_variables() should have some explanation > > > about criteria for determining which expression kinds should allow > > > variables. > > > > > > > I wrote comment there: > > > > /* > > * Returns true, when expression of kind allows using of > > * session variables. > > + * The session's variables can be used everywhere where > > + * can be used external parameters. Session variables > > + * are not allowed in DDL. Session's variables cannot be > > + * used in constraints. > > + * > > + * The identifier can be parsed as an session variable > > + * only in expression's kinds where session's variables > > + * are allowed. This is the primary usage of this function. > > + * > > + * Second usage of this function is for decision if > > + * an error message "column does not exist" or "column > > + * or variable does not exist" should be printed. When > > + * we are in expression, where session variables cannot > > + * be used, we raise the first form or error message. > > */ > > Maybe > > /* > * Returns true if the given expression kind is valid for session variables > * Session variables can be used everywhere where external parameters can > be > * used. Session variables are not allowed in DDL commands or in > constraints. > * > * An identifier can be parsed as a session variable only for expression > kinds > * where session variables are allowed. This is the primary usage of this > * function. > * > * Second usage of this function is to decide whether "column does not > exist" or > * "column or variable does not exist" error message should be printed. > * When we are in an expression where session variables cannot be used, we > raise > * the first form or error message. > */ > changed > > > > session_variables_ambiguity_warning: There needs to be more > > > information about this. The current explanation is basically just, > > > "warn if your query is confusing". Why do I want that? Why would I > > > not want that? What is the alternative? What are some examples? > > > Shouldn't there be a standard behavior without a need to configure > > > anything? > > > > > > > I enhanced this entry: > > > > + > > +The session variables can be shadowed by column references in a > > query. This > > +is an expected feature. The existing queries should not be > broken > > by creating > > +any session variable, because session variables are shadowed > > always if the > > +identifier is ambiguous. The variables should be named without > > possibility > > +to collision with identifiers of other database objects (column > > names or > > +record field names). The warnings enabled by setting > > session_variables_ambiguity_warning > > +should help with finding identifier's collisions. 
> > Maybe > > Session variables can be shadowed by column references in a query, this is > an > expected behavior. Previously working queries shouldn't error out by > creating > any session variable, so session variables are always shadowed if an > identifier > is ambiguous. Variables should be referenced using an unambiguous > identifier > without any possibility for a collision with identifier of other database > objects (column names or record fields names). The warning messages > emitted > when enabling session_variables_ambiguity_warning can > help > finding such identifier collision. > > > + > > + > > +This feature can significantly increase size of logs, and then > it > > is > > +disabled by default, but for testing or development > environments it > > +should be enabled. > > Maybe > > This feature can significantly increase log size, so it's disabled by > default. > For testing or development environments it's recommended to enable it if > you > use session variables. > replaced Thank you very much for these language correctures Regards Pavel p.s. I'll send updated patch after today or tomorrow - I have to fix broken dependency check after rebase
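To make the shadowing rule in the documentation text above concrete, here is a small sketch using the patch's syntax (CREATE VARIABLE, LET and session_variables_ambiguity_warning exist only in the patch under review; the names are made up):

    CREATE TABLE t (x int);
    INSERT INTO t VALUES (10);
    CREATE VARIABLE x AS int;
    LET x = 1;

    -- inside a query over t the identifier x is ambiguous, so the column wins;
    -- with session_variables_ambiguity_warning = on a warning is emitted
    SELECT x FROM t;    -- returns 10 (the column)
    SELECT x;           -- no column in scope, so the variable is read: 1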
Re: CAST(... ON DEFAULT) - WIP build on top of Error-Safe User Functions
On Mon, 19 Dec 2022 at 17:57, Corey Huinker wrote: > > Attached is my work in progress to implement the changes to the CAST() > function as proposed by Vik Fearing. > > CAST(expr AS typename NULL ON ERROR) > will use error-safe functions to do the cast of expr, and will return > NULL if the cast fails. > > CAST(expr AS typename DEFAULT expr2 ON ERROR) > will use error-safe functions to do the cast of expr, and will return > expr2 if the cast fails. > Is there any difference between NULL and DEFAULT NULL?
Re: Documentation for building with meson
Hi, On Wed, Mar 15, 2023 at 4:28 AM Peter Eisentraut < peter.eisentr...@enterprisedb.com> wrote: > > [PATCH v8 1/5] Make minor additions and corrections to meson docs > > The last hunk revealed that there is some mixing up between meson setup > and meson configure. This goes a bit further. For example, earlier it > says that to get a list of meson setup options, call meson configure > --help and look at https://mesonbuild.com/Commands.html#configure, which > are both wrong. Also later throughout the text it uses one or the > other. I think this has the potential to be very confusing, and we > should clean this up carefully. > > The text about additional meson test options maybe should go into the > regress.sgml chapter? > I tried to make the meson setup and meson configure usage consistent. I've removed the text for the test options. > > > > [PATCH v8 2/5] Add data layout options sub-section in installation > docs > > This makes sense. Please double check your patch for correct title > casing, vertical spacing of XML, to keep everything looking consistent. > Thanks for noticing. Made it consistent on both sides. > > This text isn't yours, but since your patch emphasizes it, I wonder if > it could use some clarification: > > + These options affect how PostgreSQL lays out data on disk. > + Note that changing these breaks on-disk database compatibility, > + meaning you cannot use pg_upgrade to upgrade to > + a build with different values of these options. > > This isn't really correct. What breaking on-disk compatibility means is > that you can't use a server compiled one way with a data directory > initialized by binaries compiled another way. pg_upgrade may well have > the ability to upgrade between one or the other; that's up to pg_upgrade > to figure out but not an intrinsic property. (I wonder why pg_upgrade > cares about the WAL block size.) > Fixed. > > > > [PATCH v8 3/5] Remove Anti-Features section from Installation from > source docs > > Makes sense. But is "--disable-thread-safety" really a developer > feature? I think not. > > Moved to PostgreSQL features. Do you think there's a better place for it? > > > [PATCH v8 4/5] Re-organize Miscellaneous section > > This moves the Miscellaneous section after Developer Features. I think > Developer Features should be last. > > Maybe should remove this section and add the options to the regular > PostgreSQL Features section. > Yes, that makes sense. Made this change. > > Also consider the grouping in meson_options.txt, which is slightly > different yet. Removed Misc options section from meson_options.txt too. > > > > [PATCH v8 5/5] Change Short Version for meson installation guide > > +# create working directory > +mkdir postgres > +cd postgres > + > +# fetch source code > +git clone https://git.postgresql.org/git/postgresql.git src > > This comes after the "Getting the Source" section, so at this point they > already have the source and don't need to do "git clone" etc. again. > > +# setup and enter build directory (done only first time) > +## Unix based platforms > +meson setup build src --prefix=$PWD/install > + > +## Windows > +meson setup build src --prefix=%cd%/install > > Maybe some people work this way, but to me the directory structures you > create here are completely weird. > I'd like to discuss what you think is a good directory structure to work with. I've mentioned some of the drawbacks I see with the current structure for the short version. I know this structure can feel different but it feeling weird is not ideal. 
Do you have a directory structure in mind which is different but doesn't feel odd to you? > > +# Initialize a new database > +../install/bin/initdb -D ../data > + > +# Start database > +../install/bin/pg_ctl -D ../data/ -l logfile start > + > +# Connect to the database > +../install/bin/psql -d postgres > > The terminology here needs to be tightened up. You are using "database" > here to mean three different things. > I'll address this together once we are aligned on the overall directory structure etc. There are a few reasons why I had done this. Some reasons Andres has > described in his previous email and I'll add a few specific examples on why > having the same section for both might not be a good idea. > > * Having readline and zlib as required for building PostgreSQL is now > wrong because they are not required for meson builds. Also, the name of the > configs are different for make and meson and the current section only lists > the make ones. > * There are many references to configure in that section which don't > apply to meson. > * Last I checked Flex and Bison were always required to build via meson > but not for make and the current section doesn't explain those differences. > > I spent a good amount of time thinking if we could have a single section, > clarify these differences to make it correct and not confuse the users. I > couldn't find a way to do all thre
Re: Schema variables - new implementation for Postgres 15
Hi ne 26. 3. 2023 v 19:53 odesílatel Dmitry Dolgov <9erthali...@gmail.com> napsal: > > On Sun, Mar 26, 2023 at 07:32:05PM +0800, Julien Rouhaud wrote: > > Hi, > > > > I just have a few minor wording improvements for the various comments / > > documentation you quoted. > > Talking about documentation I've noticed that the implementation > contains few limitations, that are not mentioned in the docs. Examples > are WITH queries: > > WITH x AS (LET public.svar = 100) SELECT * FROM x; > ERROR: LET not supported in WITH query > The LET statement doesn't support the RETURNING clause, so using inside CTE does not make any sense. Do you have some tips, where this behaviour should be mentioned? > and using with set-returning functions (haven't found any related tests). > There it is: +CREATE VARIABLE public.svar AS int; +-- should be ok +LET public.svar = generate_series(1, 1); +-- should fail +LET public.svar = generate_series(1, 2); +ERROR: expression returned more than one row +LET public.svar = generate_series(1, 0); +ERROR: expression returned no rows +DROP VARIABLE public.svar; > > Another small note is about this change in the rowsecurity: > > /* > -* For SELECT, UPDATE and DELETE, add security quals to enforce > the USING > -* policies. These security quals control access to existing > table rows. > -* Restrictive policies are combined together using AND, and > permissive > -* policies are combined together using OR. > +* For SELECT, LET, UPDATE and DELETE, add security quals to > enforce the > +* USING policies. These security quals control access to > existing table > +* rows. Restrictive policies are combined together using AND, and > +* permissive policies are combined together using OR. > */ > > From this commentary one may think that LET command supports row level > security, but I don't see it being implemented. A wrong commentary? > I don't think so. The row level security should be supported. I tested it on example from doc: CREATE TABLE public.accounts ( manager text, company text, contact_email text ); CREATE VARIABLE public.v AS text; COPY public.accounts (manager, company, contact_email) FROM stdin; t1role xxx t1r...@xxx.org t2role yyy t2r...@yyy.org \. CREATE POLICY account_managers ON public.accounts USING ((manager = CURRENT_USER)); ALTER TABLE public.accounts ENABLE ROW LEVEL SECURITY; GRANT SELECT,INSERT ON TABLE public.accounts TO t1role; GRANT SELECT,INSERT ON TABLE public.accounts TO t2role; GRANT ALL ON VARIABLE public.v TO t1role; GRANT ALL ON VARIABLE public.v TO t2role; [pavel@localhost postgresql.master]$ psql Assertions: on psql (16devel) Type "help" for help. (2023-03-28 21:32:33) postgres=# set role to t1role; SET (2023-03-28 21:32:40) postgres=# select * from accounts ; ┌─┬─┬┐ │ manager │ company │ contact_email │ ╞═╪═╪╡ │ t1role │ xxx │ t1r...@xxx.org │ └─┴─┴┘ (1 row) (2023-03-28 21:32:45) postgres=# let v = (select company from accounts); LET (2023-03-28 21:32:58) postgres=# select v; ┌─┐ │ v │ ╞═╡ │ xxx │ └─┘ (1 row) (2023-03-28 21:33:03) postgres=# set role to default; SET (2023-03-28 21:33:12) postgres=# set role to t2role; SET (2023-03-28 21:33:19) postgres=# select * from accounts ; ┌─┬─┬┐ │ manager │ company │ contact_email │ ╞═╪═╪╡ │ t2role │ yyy │ t2r...@yyy.org │ └─┴─┴┘ (1 row) (2023-03-28 21:33:22) postgres=# let v = (select company from accounts); LET (2023-03-28 21:33:26) postgres=# select v; ┌─┐ │ v │ ╞═╡ │ yyy │ └─┘ (1 row) Regards Pavel
Re: Add SPLIT PARTITION/MERGE PARTITIONS commands
Hi,

Patch v15-0001-ALTER-TABLE-MERGE-PARTITIONS-command.patch applies cleanly. One warning on meson compile (configure -Dssl=openssl -Dldap=enabled -Dauto_features=enabled -DPG_TEST_EXTRA='ssl,ldap,kerberos' -Dbsd_auth=disabled -Dbonjour=disabled -Dpam=disabled -Dpltcl=disabled -Dsystemd=disabled -Dzstd=disabled -Db_coverage=true):

../../src/pgmergesplit/src/test/modules/test_ddl_deparse/test_ddl_deparse.c: In function ‘get_altertable_subcmdinfo’: ../../src/pgmergesplit/src/test/modules/test_ddl_deparse/test_ddl_deparse.c:112:17: warning: enumeration value ‘AT_MergePartitions’ not handled in switch [-Wswitch] 112 | switch (subcmd->subtype) | ^~

Presumably the same applies to 0002. meson test passes, and patch coverage is very good.

Patch v15-0002-ALTER-TABLE-SPLIT-PARTITION-command.patch doesn't apply on 326a33a289c7ba2dbf45f17e610b7be98dc11f67.

Patch v15-0003-Documentation-for-ALTER-TABLE-SPLIT-PARTITION-ME.patch applies with one warning about added whitespace errors (translated from French: "warning: 1 ligne a ajouté des erreurs d'espace"): v15-0003-Documentation-for-ALTER-TABLE-SPLIT-PARTITION-ME.patch:54: trailing whitespace. One of the new partitions partition_name1,

The comments are OK for me, as a non-native English speaker. Perhaps you could add some remarks in ddl.html and alter-ddl.html.

Stéphane

The new status of this patch is: Waiting on Author
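The -Wswitch warning above could presumably be silenced by adding the new subcommand(s) to the switch in test_ddl_deparse.c's get_altertable_subcmdinfo(), along the lines of the sketch below. The exact enum and string spellings should follow the patch; AT_SplitPartition is a guess for the 0002 counterpart, and strtype is the variable the existing cases assign.

    case AT_MergePartitions:
        strtype = "MERGE PARTITIONS";
        break;
    case AT_SplitPartition:
        strtype = "SPLIT PARTITION";
        break;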
Re: POC: Better infrastructure for automated testing of concurrency issues
On Tue, Mar 28, 2023 at 9:44 PM Gregory Stark (as CFM) wrote: > On Wed, 31 Aug 2022 at 20:52, Alexander Korotkov wrote: > > > > I'll continue work on this patch. The rebased patch is attached. It > > implements stop events as configure option (not runtime GUC option). > > It looks like this patch isn't going to be ready this commitfest. And > it hasn't received much discussion since August 2022. If I'm wrong say > something but otherwise I'll mark it Returned With Feedback. It can be > resurrected (and moved to the next commitfest) when you're free to > work on it again. I'm good with that. -- Regards, Alexander Korotkov
Re: Some revises in adding sorting path
On Tue, 21 Feb 2023 at 22:02, Richard Guo wrote: > Looking at the codes now I have some concern that what we do in > create_ordered_paths for partial paths may have already been done in > generate_useful_gather_paths, especially when query_pathkeys is equal to > sort_pathkeys. Not sure if this is a problem. And the comment there > mentions generate_gather_paths(), but ISTM we should mention what > generate_useful_gather_paths has done. I think you need to write some tests for this. I've managed to come up with something to get that code to perform a Sort, but I've not managed to get it to perform an incremental sort. create or replace function parallel_safe_volatile(a int) returns int as $$ begin return a; end; $$ parallel safe volatile language plpgsql; create table abc(a int, b int, c int); insert into abc select x,y,z from generate_Series(1,100)x, generate_Series(1,100)y, generate_Series(1,100)z; set parallel_tuple_cost=0; Without making those parallel paths, we get: postgres=# explain select * from abc where a=1 order by a,b,parallel_safe_volatile(c); QUERY PLAN Sort (cost=13391.49..13417.49 rows=10400 width=16) Sort Key: b, (parallel_safe_volatile(c)) -> Gather (cost=1000.00..12697.58 rows=10400 width=16) Workers Planned: 2 -> Parallel Seq Scan on abc (cost=0.00..11697.58 rows=4333 width=16) Filter: (a = 1) (6 rows) but with, the plan is: postgres=# explain select * from abc where a=1 order by a,b,parallel_safe_volatile(c); QUERY PLAN Gather Merge (cost=12959.35..13060.52 rows=8666 width=16) Workers Planned: 2 -> Sort (cost=11959.32..11970.15 rows=4333 width=16) Sort Key: b, (parallel_safe_volatile(c)) -> Parallel Seq Scan on abc (cost=0.00..11697.58 rows=4333 width=16) Filter: (a = 1) (6 rows) I added the parallel safe and volatile function so that get_useful_pathkeys_for_relation() wouldn't include all of the query_pathkeys. If you write some tests for this code, it will be useful to prove that it actually does something, and also that it does not break again in the future. I don't really want to just blindly copy the pattern used in 3c6fc5820 for creating incremental sort paths if it's not useful here. It would be good to see tests that make an Incremental Sort path using the code you're changing. Same for the 0003 patch. I'll mark this as waiting on author while you work on that. David
Re: Why mark empty pages all visible?
On Tue, Mar 28, 2023 at 12:01 PM Andres Freund wrote: > I will probably make that argument - so far I was just trying to understand > the intent of the current code. There aren't really comments explaining why we > want to mark currently-pinned empty/new pages all-visible and enter them into > the FSM. I don't think that not being able to immediately get a cleanup lock on a page signifies much of anything. I never have. > Historically we did *not* enter currently pinned empty/new pages into the FSM > / VM. Afaics that's new as of 44fa84881fff. Of course that's true, but I don't know why that history is particularly important. Either way, holding a pin was never supposed to work as an interlock against a page being concurrently set all-visible, or having its space recorded in the FSM. It's true that my work on VACUUM has shaken out a couple of bugs where we accidentally relied on that being true. But those were all due to the change in lazy_vacuum_heap_page() made in Postgres 14 -- not the addition of lazy_scan_new_or_empty() in Postgres 15. I actually think that I might agree with the substance of much of what you're saying, but at the same time I don't think that you're defining the problem in a way that's particularly helpful. I gather that you *don't* want to do anything about the lazy_vacuum_heap_page behavior with setting empty pages all-visible (just the lazy_scan_new_or_empty behavior). So clearly this isn't really about marking empty pages all-visible, with or without a cleanup lock. It's actually about something rather more specific: the interactions with RelationGetBufferForTuple. I actually agree that VACUUM is way too unconcerned about interfering with concurrent activity in terms of how it manages free space in the FSM. But this seems like just about the least important example of that (outside the context of your RelationGetBufferForTuple work). The really important case (that VACUUM gets wrong) all involve recently dead tuples. But I don't think that you want to talk about that right now. You want to talk about RelationGetBufferForTuple. > The reason I'm looking at this is that there's a lot of complexity at the > bottom of RelationGetBufferForTuple(), related to needing to release the lock > on the newly extended page and then needing to recheck whether there still is > free space on the page. And that it's not complicated enough > (c.f. INSERT_FROZEN doing visibilitymap_pin() with the page locked). > > As far as I can tell, if we went back to not entering new/empty pages into > either VM or FSM, we could rely on the page not getting filled while just > pinning, not locking it. What you're essentially arguing for is inventing a new rule that makes the early lifetime of a page (what we currently call a PageIsEmpty() page, and new pages) special, to avoid interference from VACUUM. I have made similar arguments myself on quite a few occasions, so I'm actually sympathetic. I just think that you should own it. And no, I'm not just reflexively defending my work in 44fa84881fff; I actually think that framing the problem as a case of restoring a previous behavior is confusing and ahistorical. If there was a useful behavior that was lost, then it was quite an accidental behavior all along. The difference matters because now you have to reconcile what you're saying with the lazy_vacuum_heap_page no-cleanup-lock behavior added in 14. 
I think that you must be arguing for making the early lifetime of a heap page special to VACUUM, since AFAICT you want to change VACUUM's behavior with strictly PageIsEmpty/PageIsNew pages only -- and *not* with pages that have one remaining LP_UNUSED item, but are otherwise empty (which could be set all-visible/all-frozen in either the first or second heap pass, even if we disabled the lazy_scan_new_or_empty() behavior you're complaining about). You seem to want to distinguish between very new pages (that also happen to be empty), and old pages that happen to be empty. Right? > I also do wonder whether the different behaviour of PageIsEmpty() and > PageIsNew() pages makes sense. The justification for not marking PageIsNew() > pages as all-visible holds just as true for empty pages. There's just as much > free space there. What you say here makes a lot of sense to me. I'm just not sure what I'd prefer to do about it. -- Peter Geoghegan
Re: CAST(... ON DEFAULT) - WIP build on top of Error-Safe User Functions
On Tue, Mar 28, 2023 at 2:53 PM Gregory Stark (as CFM) wrote: > On Tue, 3 Jan 2023 at 14:16, Tom Lane wrote: > > > > Vik Fearing writes: > > > > > I don't think we should add that syntax until I do get it through the > > > committee, just in case they change something. > > > > Agreed. So this is something we won't be able to put into v16; > > it'll have to wait till there's something solid from the committee. > > I guess I'll mark this Rejected in the CF then. Who knows when the SQL > committee will look at this... > > -- > Gregory Stark > As Commitfest Manager > Yes, for now. I'm in touch with the pg-people on the committee and will resume work when there's something to act upon.
Re: CAST(... ON DEFAULT) - WIP build on top of Error-Safe User Functions
On Tue, Mar 28, 2023 at 3:25 PM Isaac Morland wrote:
> On Mon, 19 Dec 2022 at 17:57, Corey Huinker wrote:
>> Attached is my work in progress to implement the changes to the CAST()
>> function as proposed by Vik Fearing.
>>
>> CAST(expr AS typename NULL ON ERROR)
>> will use error-safe functions to do the cast of expr, and will return
>> NULL if the cast fails.
>>
>> CAST(expr AS typename DEFAULT expr2 ON ERROR)
>> will use error-safe functions to do the cast of expr, and will return
>> expr2 if the cast fails.
>
> Is there any difference between NULL and DEFAULT NULL?

What I think you're asking is: is there a difference between these two statements?

SELECT CAST(my_string AS integer NULL ON ERROR) FROM my_table;
SELECT CAST(my_string AS integer DEFAULT NULL ON ERROR) FROM my_table;

As I understand it, the answer is no, there is no practical difference. The first case is just a convenient shorthand, whereas the second tees you up for a potentially complex expression. Before you ask, I believe the ON ERROR syntax could be made optional.

As I implemented it, both cases create a default expression which is then typecast to integer, and in both cases that expression would be a const-null, so the optimizer would very quickly collapse those steps into a plain old constant.
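To make the non-NULL case concrete, a sketch using the proposed (not yet committed) syntax, reusing the hypothetical my_table/my_string names from above:

-- failed casts become -1 instead of raising an error
SELECT CAST(my_string AS integer DEFAULT -1 ON ERROR) FROM my_table;

-- the default can be an expression, evaluated only when the cast fails
SELECT CAST(my_string AS integer DEFAULT length(my_string) ON ERROR) FROM my_table;

Per the description above, the DEFAULT expression only takes effect for rows where the error-safe cast of my_string fails.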
Re: pgsql: amcheck: Fix verify_heapam for tuples where xmin or xmax is 0.
On Mon, Mar 27, 2023 at 4:52 PM Peter Geoghegan wrote: > This is fine, as far as it goes. Obviously it fixes the immediate problem. OK, I've committed and back-patched this fix to v14, just like the erroneous commit that created the issue. -- Robert Haas EDB: http://www.enterprisedb.com
Re: Support logical replication of DDLs
On Sun, Mar 26, 2023 at 5:22 PM Tom Lane wrote: > I spent some time looking through this thread to try to get a sense > of the state of things, and I came away quite depressed. The patchset > has ballooned to over 2MB, which is a couple orders of magnitude > larger than anyone could hope to meaningfully review from scratch. > Despite that, it seems that there are fundamental semantics issues > remaining, not to mention clear-and-present security dangers, not > to mention TODO comments all over the code. Thanks for looking into this! > I'm also less than sold on the technical details, specifically > the notion of "let's translate utility parse trees into JSON and > send that down the wire". You can probably make that work for now, > but I wonder if it will be any more robust against cross-version > changes than just shipping the outfuncs.c representation. (Perhaps > it can be made more robust than the raw parse trees, but I see no > evidence that anyone's thought much about how.) I explored the idea of using the outfuncs.c representation in [1] and found this existing format is not designed to be portable between different major versions. So it can't be directly used for replication without serious modification. I think the DDL deparser is a necessary tool if we want to be able to handle cross-version DDL syntax differences by providing the capability to machine-edit the JSON representation. > And TBH, I don't think that I quite believe the premise in the > first place. The whole point of using logical rather than physical > replication is that the subscriber installation(s) aren't exactly like > the publisher. Given that, how can we expect that automated DDL > replication is going to do the right thing often enough to be a useful > tool rather than a disastrous foot-gun? The more you expand the scope > of what gets replicated, the worse that problem becomes --- for > example, I don't buy for one second that "let's replicate roles" > is a credible solution for the problems that come from the roles > not being the same on publisher and subscriber. I agree that a full fledged DDL deparser and DDL replication is too big of a task for one patch. I think we may consider approaching this feature in the following ways: 1. Phased development and testing as discussed in other emails. Probably support table commands first (as they are the most common DDLs), then the other commands in multiple phases. 2. Provide a subscription option to receive the DDL change, raise a notice and to skip applying the change. The users can listen to the DDL notice and implement application logic to apply the change if needed. The idea is we can start gathering user feedback by providing a somewhat useful feature (compared to doing nothing about DDLs), but also avoid heading straight into the potential footgun situation caused by automatically applying any mal-formatted DDLs. 3. As cross-version DDL syntax differences are expected to be uncommon (in real workload), maybe we can think about other options to handle such edge cases instead of fully automating it? For example, what about letting the user specify how a DDL should be replicated on the subscriber by explicitly providing two versions of DDL commands in some way? > I'm not sure how we get from here to a committable and useful feature, > but I don't think we're close to that yet, and I'm not sure that minor > iterations on a 2MB patchset will accomplish much. 
About 1 MB of the patch is testing output files for the DDL deparser (postgres/src/test/modules/test_ddl_deparse_regress/expected/).

Regards,
Zane

[1] https://www.postgresql.org/message-id/CAAD30U%2Boi6e6Vh_zAzhuXzkqUhagmLGD%2B_iyn2N9w_sNRKsoag%40mail.gmail.com
Re: Add LZ4 compression in pg_dump
On 3/28/23 18:07, gkokola...@pm.me wrote: > > --- Original Message --- > On Friday, March 24th, 2023 at 10:30 AM, gkokola...@pm.me > wrote: > >> >> --- Original Message --- >> On Thursday, March 23rd, 2023 at 6:10 PM, Tomas Vondra >> tomas.von...@enterprisedb.com wrote: >> >>> This leaves the empty-data issue (which we have a fix for) and the >>> switch to LZ4F. And then the zstd part. >> >> Please expect promptly a patch for the switch to frames. > > Please find the expected patch attached. Note that the bulk of the > patch is code unification, variable renaming to something more > appropriate, and comment addition. These are changes that are not > strictly necessary to switch to LZ4F. I do believe that are > essential for code hygiene after the switch and they do belong > on the same commit. > I think the patch is fine, but I'm wondering if the renames shouldn't go a bit further. It removes references to LZ4File struct, but there's a bunch of functions with LZ4File_ prefix. Why not to simply use LZ4_ prefix? We don't have GzipFile either. Sure, it might be a bit confusing because lz4.h uses LZ4_ prefix, but then we probably should not define LZ4_compressor_init ... Also, maybe the comments shouldn't use "File API" when compress_io.c calls that "Compressed stream API". regards -- Tomas Vondra EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
Re: Why mark empty pages all visible?
Hi, On 2023-03-28 13:05:19 -0700, Peter Geoghegan wrote: > On Tue, Mar 28, 2023 at 12:01 PM Andres Freund wrote: > > I will probably make that argument - so far I was just trying to understand > > the intent of the current code. There aren't really comments explaining why > > we > > want to mark currently-pinned empty/new pages all-visible and enter them > > into > > the FSM. > > I don't think that not being able to immediately get a cleanup lock on > a page signifies much of anything. I never have. Why is that? It's as clear a signal of concurrent activity on the buffer you're going to get. > > Historically we did *not* enter currently pinned empty/new pages into the > > FSM > > / VM. Afaics that's new as of 44fa84881fff. > > Of course that's true, but I don't know why that history is > particularly important. It's interesting to understand *why* we are doing what we are. I think it'd make sense to propose changing how things work around this, but I just don't feel like I have a good enough grasp for why we do some of the things we do. And given there's not a lot of comments around it and some of the comments that do exist are inconsistent with themselves, looking at the history seems like the next best thing? > I actually think that I might agree with the substance of much of what > you're saying, but at the same time I don't think that you're defining > the problem in a way that's particularly helpful. Likely because the goals of the existing code aren't clear to me. So I don't feel like I have a firm grasp... > I gather that you *don't* want to do anything about the > lazy_vacuum_heap_page behavior with setting empty pages all-visible (just > the lazy_scan_new_or_empty behavior). Not in the short-medium term, at least. In the long term I do suspect we might want to do something about it. We have a *crapton* of contention in the FSM and caused by the FSM in bulk workloads. With my relation extension patch disabling the FSM nearly doubles concurrent load speed. At the same time, the fact that we might loose knowledge about all the existing free space in case of promotion or crash and never rediscover that space (because the pages are frozen), seems decidedly not great. I don't know what the path forward is, but it seems somewhat clear that we ought to do something. I suspect having a not crash-safe FSM isn't really acceptable anymore - it probably is fine to not persist a *reduction* in free space, but we can't permanently loose track of free space, no matter how many crashes. I know that you strongly dislike the way the FSM works, although I forgot some of the details. One thing I am fairly certain about is that using the FSM to tell other backends about newly bulk extended pages is not a good solution, even though we're stuck with it for the moment. > I actually agree that VACUUM is way too unconcerned about interfering > with concurrent activity in terms of how it manages free space in the > FSM. But this seems like just about the least important example of > that (outside the context of your RelationGetBufferForTuple work). The > really important case (that VACUUM gets wrong) all involve recently > dead tuples. But I don't think that you want to talk about that right > now. You want to talk about RelationGetBufferForTuple. That's indeed the background. By now I'd also like to add a few comments explaining why we do what we currently do, because I don't find all of it obvious. 
> > The reason I'm looking at this is that there's a lot of complexity at the > > bottom of RelationGetBufferForTuple(), related to needing to release the > > lock > > on the newly extended page and then needing to recheck whether there still > > is > > free space on the page. And that it's not complicated enough > > (c.f. INSERT_FROZEN doing visibilitymap_pin() with the page locked). > > > > As far as I can tell, if we went back to not entering new/empty pages into > > either VM or FSM, we could rely on the page not getting filled while just > > pinning, not locking it. > > What you're essentially arguing for is inventing a new rule that makes > the early lifetime of a page (what we currently call a PageIsEmpty() > page, and new pages) special, to avoid interference from VACUUM. I > have made similar arguments myself on quite a few occasions, so I'm > actually sympathetic. I just think that you should own it. And no, I'm > not just reflexively defending my work in 44fa84881fff; I actually > think that framing the problem as a case of restoring a previous > behavior is confusing and ahistorical. If there was a useful behavior > that was lost, then it was quite an accidental behavior all along. The > difference matters because now you have to reconcile what you're > saying with the lazy_vacuum_heap_page no-cleanup-lock behavior added > in 14. I really don't have a position to own yet, not on firm enough ground. > I think that you must be arguing for making the early lifetime of a > heap page special
Re: zstd compression for pg_dump
On Wed, Mar 15, 2023 at 9:50 PM Justin Pryzby wrote: > On Fri, Mar 10, 2023 at 12:48:13PM -0800, Jacob Champion wrote: > > It looks like pg_dump's meson.build is missing dependencies on zstd > > (meson couldn't find the headers in the subproject without them). > > I saw that this was added for LZ4, but I hadn't added it for zstd since > I didn't run into an issue without it. Could you check that what I've > added works for your case ? I thought I replied to this, sorry -- your newest patchset builds correctly with subprojects, so the new dependency looks good to me. Thanks! > > Hm. Best I can tell, the CloneArchive() machinery is supposed to be > > handling safety for this, by isolating each thread's state. I don't feel > > comfortable pronouncing this new addition safe or not, because I'm not > > sure I understand what the comments in the format-specific _Clone() > > callbacks are saying yet. > > My line of reasoning for unix is that pg_dump forks before any calls to > zstd. Nothing zstd does ought to affect the pg_dump layer. But that > doesn't apply to pg_dump under windows. This is an opened question. If > there's no solid answer, I could disable/ignore the option (maybe only > under windows). To (maybe?) move this forward a bit, note that pg_backup_custom's _Clone() function makes sure that there is no active compressor state at the beginning of the new thread. pg_backup_directory's implementation has no such provision. And I don't think it can, because the parent thread might have concurrently set one up -- see the directory-specific implementation of _CloseArchive(). Perhaps we should just NULL out those fields after the copy, instead? To illustrate why I think this is tough to characterize: if I've read the code correctly, the _Clone() and CloneArchive() implementations are running concurrently with code that is actively modifying the ArchiveHandle and the lclContext. So safety is only ensured to the extent that we keep track of which fields threads are allowed to touch, and I don't have that mental model. --Jacob
Re: Add pg_walinspect function with block info columns
On Tue, Mar 28, 2023 at 11:15:17AM -0700, Peter Geoghegan wrote:
> On Mon, Mar 27, 2023 at 7:47 PM Bharath Rupireddy wrote:
>> Hm, agreed. Changed in the attached v7-0002 patch. We can as well
>> write a case statement in the create function SQL to output forkname
>> instead of forknumber, but I'd stop doing that to keep in sync with
>> pg_buffercache.
>
> I just don't see much value in any textual representation of fork
> name, however generated. In practice it's just not adding very much
> useful information. It is mostly useful as a way of filtering block
> references, which makes simple integers more natural.

I disagree with this argument. Personally, I have a *much* better experience with a textual representation because there is no need to cross-check the internals of the code when you don't remember what a given number means in an enum or in a set of bits, especially if you're in a hurry while looking at a production or customer deployment. In short, it makes for fewer mistakes because you don't have to think about an extra mapping between some integers and what they actually mean as text.

The clauses you'd apply for a GROUP BY on the forks, or for a filter with IN clauses, don't change; they're just made easier to understand for the common user, and that includes experienced people. We should think about this the same way as the text[] arrays we use for flag values, like the FPI flags, or why we've introduced text[] for the HEAP_* flags in the heap functions of pageinspect. There's even more consistency with pageinspect in using a fork name, where we can pass down a fork name to get a raw page.
--
Michael
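To illustrate the two filtering styles being weighed here (a sketch only: pg_get_wal_block_info is the function this thread is adding, the LSN range is arbitrary, and the column names -- a numeric relforknumber versus a textual forkname -- follow the patch discussion rather than any released version):

-- numeric fork column: the reader has to remember the ForkNumber enum
-- (0 = main, 1 = fsm, 2 = vm, 3 = init)
SELECT count(*) FROM pg_get_wal_block_info('0/1000000', '0/2000000')
  WHERE relforknumber = 1;

-- textual fork column, as argued for above
SELECT count(*) FROM pg_get_wal_block_info('0/1000000', '0/2000000')
  WHERE forkname = 'fsm';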
Re: running logical replication as the subscription owner
On Tue, 2023-03-28 at 13:55 -0400, Robert Haas wrote:
> Oh, interesting. I hadn't realized we were doing that, and I do think
> that narrows the vulnerability quite a bit.

It's good to know I wasn't missing something obvious.

> Absent logical replication, you don't really need to worry
> about whether that function is written securely -- it will run with
> privileges of the person performing the DML, and not impact your
> account at all.

That's not strictly true. See the example at the bottom of this email. The second trigger is SECURITY INVOKER, and it captures the leaked secret number that was stored in the tuple by an earlier SECURITY DEFINER trigger.

Perhaps I'm being pedantic, but my point is that SECURITY INVOKER is not a magical shield that protects users from themselves without bound. It protects users against some kinds of attacks, sure, but there's a limit to what it has to offer.

> They have reason to be scared of you, but not the
> reverse. However, if the people doing DML on the table can arrange to
> perform that DML as you, then any security vulnerabilities in your
> function potentially allow them to compromise your account.

OK, but I'd like to be clear that you've moved from your prior statement, "in theory they might be able to protect themselves, but in practice they are unlikely to be careful enough", to something very different, where it seems that we can't think of a concrete problem even for not-so-careful users.

The dangerous cases seem to be something along the lines of a security-invoker trigger function that builds and executes arbitrary SQL based on the row contents. And then the table owner would still need to set ENABLE ALWAYS TRIGGER. Do we really want to take that case on as our security responsibility?

> It would also affect someone who uses a default
> expression or other means of associating executable code with the
> table, and something like a default expression doesn't require any
> explicit setting to make it apply in case of replication, so maybe the
> risk of someone messing up is a bit higher.

Perhaps, but I still don't understand that case. Unless I'm missing something, the table owner would have to write a pretty weird default expression or check constraint or whatever to end up with something dangerous.

> But this definitely makes it more of a judgment call than I thought
> initially.

And I'm fine if the judgement is that we just require SET ROLE to be conservative and make sure we don't over-promise in 16. That's a totally reasonable thing: it's easier to loosen restrictions later than to tighten them. Furthermore, just because you and I can't think of exploitable problems doesn't mean they don't exist.

I have found this discussion enlightening, so thank you for going through these details with me.

> I'm still inclined to leave the patch checking for SET ROLE

+0.5. I want the patch to go in either way, and can carry on the discussion later and improve things more for 17. But I want to say generally that I believe we spend too much effort trying to protect unreasonable users from themselves, which interferes with our ability to protect reasonable users and solve real use cases.

> However, there might be an argument
> that we ought to do something else, like have a REPLICATE privilege

Sounds cool. Not sure I have an opinion yet, but probably 17 material.
> What I would like to change in the > patch in this release is to add a new subscription property along the > lines of what I proposed to Jelte in my earlier email: let's provide > an escape hatch that turns off the user-switching behavior. I believe switching to the table owner (as done in your patch) is the right best practice, and should be the default. I'm fine with an escape hatch here to ease migrations or whatever. But please do it in an unobtrusive way such that we (as hackers) can mostly forget that the non-switching behavior exists. At least for me, our system is already pretty complex without two kinds of subscription security models. Regards, Jeff Davis - example follows -- -- user foo CREATE TABLE secret(I INT); INSERT INTO secret VALUES(42); CREATE TABLE r(i INT, secret INT); CREATE OR REPLACE FUNCTION a_func() RETURNS TRIGGER SECURITY DEFINER LANGUAGE plpgsql AS $$ BEGIN SELECT i INTO NEW.secret FROM secret; RETURN NEW; END; $$; CREATE OR REPLACE FUNCTION z_func() RETURNS TRIGGER SECURITY INVOKER LANGUAGE plpgsql AS $$ BEGIN IF NEW.secret <> abs(NEW.secret) THEN RAISE EXCEPTION 'no negative secret numbers allowed'; END IF; RETURN NEW; END; $$; CREATE TRIGGER a_trigger BEFORE INSERT ON r FOR EACH ROW EXECUTE PROCEDURE a_func(); CREATE TRIGGER z_trigger BEFORE INSERT ON r FOR EACH ROW EXECUTE PROCEDURE z_func(); GRANT INSERT ON r TO bar; GRANT USAGE ON SCHEMA foo TO bar; -- user bar CREATE OR REPLACE FUNCTION bar.abs(i INT4) RETURNS INT4 LA
Re: Request for comment on setting binary format output per session
On Tue, 2023-03-28 at 10:22 -0400, Dave Cramer wrote: > OK, IIUC what you are proposing here is that there would be a > separate pool for > database, user, and OIDs. This doesn't seem too flexible. For > instance if I create a UDT and then want it to be returned > as binary then I have to reconfigure the pool to be able to accept a > new list of OID's. There are two ways that I could imagine the connection pool working: 1. Accept whatever clients connect, and pass along the binary_formats setting to the outbound (server) connection. The downside here is that if you have many different clients (or different versions) that have different binary_formats settings, then it creates too many pools and doesn't share well enough. 2. Some kind of configuration setting (or maybe it can be done automatically) that organizes based on a common subset of binary formats that many clients can understand. These can evolve once the protocol extension is in place. Regards, Jeff Davis
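A small sketch of what option 2 could look like from the client or pool side (the setting name binary_formats and its value format are only placeholders from this discussion, not a settled protocol detail; my_udt is a made-up type name):

-- a client or pool can resolve the OIDs it wants returned in binary,
-- including a newly created UDT, from the catalog:
SELECT oid, typname FROM pg_type WHERE typname IN ('int4', 'text', 'my_udt');

-- hypothetical session-level setting the pool would pass through on the
-- outbound connection and key its pooling decisions on:
-- SET binary_formats = '23,25,90210';   -- 90210 standing in for the UDT's OID

The point is that the OID list is just data, so adding a new UDT means extending a configured list rather than changing client code or creating an entirely new pool.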
Re: Why mark empty pages all visible?
On Tue, Mar 28, 2023 at 3:29 PM Andres Freund wrote: > Why is that? It's as clear a signal of concurrent activity on the buffer > you're going to get. Not immediately getting a cleanup lock in VACUUM is informative to the extent that you only care about what is happening that very nanosecond. If you look at which pages it happens to in detail, what you seem to end up with is a whole bunch of noise, which (on its own) tells you exactly nothing about what VACUUM really ought to be doing with those pages. In almost all cases we could get a cleanup lock by waiting for one millisecond and retrying. I suspect that the cleanup lock thing might be a noisy, unreliable proxy for the condition that you actually care about, in the context of your work on relation extension. I bet there is a better signal to go on, if you look for one. > It's interesting to understand *why* we are doing what we are. I think it'd > make sense to propose changing how things work around this, but I just don't > feel like I have a good enough grasp for why we do some of the things we > do. And given there's not a lot of comments around it and some of the comments > that do exist are inconsistent with themselves, looking at the history seems > like the next best thing? I think that I know why Heikki had no comments about PageIsEmpty() pages when the VM first went in, back in 2009: because it was just so obvious that you'd treat them the same as any other initialized page, it didn't seem to warrant a comment at all. The difference between a page with 0 tuples and 1 tuple is the same difference between a page with 1 tuple and a page with 2 tuples. A tiny difference (one extra tuple), of no particular consequence. I think that you don't see it that way now because you're focussed on the hio.c view of things. That code looked very different back in 2009, and in any case is very far removed from vacuumlazy.c. I can tell you what I was thinking of with lazy_scan_new_or_empty: I hate special cases. I will admit to being a zealot about it. > > I gather that you *don't* want to do anything about the > > lazy_vacuum_heap_page behavior with setting empty pages all-visible (just > > the lazy_scan_new_or_empty behavior). > > Not in the short-medium term, at least. In the long term I do suspect we might > want to do something about it. We have a *crapton* of contention in the FSM > and caused by the FSM in bulk workloads. With my relation extension patch > disabling the FSM nearly doubles concurrent load speed. I've seen the same effect myself. There is no question that that's a big problem. I think that the problem is that the system doesn't have any firm understanding of pages as things that are owned by particular transactions and/or backends, at least to a limited, scoped extent. It's all really low level, when it actually needs to be high level and take lots of context that comes from the application into account. > At the same time, the fact that we might loose knowledge about all the > existing free space in case of promotion or crash and never rediscover that > space (because the pages are frozen), seems decidedly not great. Unquestionably. > I don't know what the path forward is, but it seems somewhat clear that we > ought to do something. I suspect having a not crash-safe FSM isn't really > acceptable anymore - it probably is fine to not persist a *reduction* in free > space, but we can't permanently loose track of free space, no matter how many > crashes. Strongly agreed. It's a terrible false economy. 
If we bit the bullet and made relation extension and the FSM crash safe, we'd gain so much more than we'd lose. > One thing I am fairly certain about is that using the FSM to tell other > backends about newly bulk extended pages is not a good solution, even though > we're stuck with it for the moment. Strongly agreed. > > I think that you must be arguing for making the early lifetime of a > > heap page special to VACUUM, since AFAICT you want to change VACUUM's > > behavior with strictly PageIsEmpty/PageIsNew pages only -- and *not* > > with pages that have one remaining LP_UNUSED item, but are otherwise > > empty (which could be set all-visible/all-frozen in either the first > > or second heap pass, even if we disabled the lazy_scan_new_or_empty() > > behavior you're complaining about). You seem to want to distinguish > > between very new pages (that also happen to be empty), and old pages > > that happen to be empty. Right? > > I think that might be worthwhile, yes. The retry code in > RelationGetBufferForTuple() is quite hairy and almost impossible to test. If > we can avoid the complexity, at a fairly bound cost (vacuum needing to > re-visit new/empty pages if they're currently pinned), it'd imo be more that > worth the price. Short term, you could explicitly say that PageIsEmpty() means that the page is qualitatively different to other empty pages that were left behind by VACUUM's second phase. > But perhaps the better p
Re: Reconcile stats in find_tabstat_entry() and get rid of PgStat_BackendFunctionEntry
On Tue, Mar 28, 2023 at 05:43:26PM +0900, Kyotaro Horiguchi wrote:
> No. Fine by me, except that "block read requests" seems to suggest
> kernel read() calls, maybe because it's not clear to me whether "block"
> refers to our buffer blocks or to file blocks. If it is generally
> clear, I'm fine with the proposal.

Okay. Would somebody like to draft a patch?
--
Michael
Re: ALTER TABLE SET ACCESS METHOD on partitioned tables
On Tue, Mar 28, 2023 at 02:56:28PM +0900, Michael Paquier wrote:
> Hmm. This is a good point. It is true that the patch feels
> incomplete on this side. I don't see why we could not be flexible,
> and allow a value of 0 in a partitioned table's relam to mean that we
> would pick up the database default in this case when a partition is
> created on it. This would have the advantage of being consistent with
> older versions, where we fall back on the default. We cannot be
> completely consistent with the reltablespace of the leaf partitions,
> unfortunately, as relam should always be set if a relation has
> storage. And allowing a value of 0 means that there are likely other
> tricky cases with dumps?
>
> Another thing: would it make sense to allow an empty string in
> default_table_access_method so that we'd always fall back to a database
> default?

FYI, I am not sure that I will be able to look more at this patch by the end of the commit fest, and there are quite a few more points to consider. Perhaps at this stage we'd better mark it as returned with feedback? I understand that I've arrived late to this party :/
--
Michael
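For context, a sketch of the behavior this patch is aiming at (setting an access method on a partitioned table is the proposed feature, not something available in a released version; all object names are made up):

CREATE TABLE parted (a int) PARTITION BY LIST (a);

-- proposed: record an access method on the partitioned table itself ...
ALTER TABLE parted SET ACCESS METHOD heap;

-- ... so that new partitions pick it up instead of default_table_access_method
CREATE TABLE parted_1 PARTITION OF parted FOR VALUES IN (1);

The open question above is what a relam of 0 on the partitioned table should mean, i.e. whether creating parted_1 should then fall back to the database/GUC default, as it effectively does on older versions.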
Re: Should vacuum process config file reload more often
On Tue, Mar 28, 2023 at 4:21 AM Kyotaro Horiguchi wrote:
> At Mon, 27 Mar 2023 14:12:03 -0400, Melanie Plageman wrote in
> > So, I've attached an alternate version of the patchset which takes the
> > approach of having one commit which only enables cost-based delay GUC
> > refresh for VACUUM and another commit which enables it for autovacuum
> > and makes the changes to balancing variables.
> >
> > I still think the commit which has workers updating their own
> > wi_cost_delay in vacuum_delay_point() is a bit weird. It relies on no one
> > else emulating our bad behavior and reading from wi_cost_delay without a
> > lock and on no one else deciding to ever write to wi_cost_delay (even
> > though it is in shared memory [this is the same as master]). It is only
> > safe because our process is the only one (right now) writing to
> > wi_cost_delay, so when we read from it without a lock, we know it isn't
> > being written to. And everyone else takes a lock when reading from
> > wi_cost_delay right now. So, it seems...not great.
> >
> > This approach also introduces a function that is only around for
> > one commit until the next commit obsoletes it, which seems a bit silly.
>
> (I'm not sure what this refers to, though..) I don't think it's silly
> if a later patch removes something that the preceding patches
> introduced, as long as that contributes to readability. Ultimately,
> they will be merged together on committing.

I was under the impression that reviewers thought config reload and worker balance changes should be committed in separate commits. Either way, the ephemeral function is not my primary concern. I felt more uncomfortable with increasing how often we update a double in shared memory which is read without acquiring a lock.

> > Basically, I think it is probably better to just have one commit
> > enabling guc refresh for VACUUM and then another which correctly
> > implements what is needed for autovacuum to do the same.
> > Attached v9 does this.
> >
> > I've provided both complete versions of both approaches (v9 and v8).
>
> I took a look at v9 and have a few comments.
>
> 0001:
>
> I don't believe it is necessary, as mentioned in the commit
> message. It appears that we are resetting it at the appropriate times.

VacuumCostBalance must be zeroed out when we disable vacuum cost. Previously, we did not reenable VacuumCostActive once it was disabled, but now that we do, I think it is good practice to always zero out VacuumCostBalance when we disable vacuum cost. I will squash this commit into the one introducing VacuumCostInactive, though.

> 0002:
>
> I felt a bit uneasy on this. It seems somewhat complex (and makes the
> succeeding patches complex),

Even if we introduced a second global variable to indicate that failsafe mode has been engaged, we would still require the additional checks of VacuumCostInactive.

> has confusing names,

I would be happy to rename the values of the enum to make them less confusing. Are you thinking "force" instead of "locked"? Maybe VACUUM_COST_FORCE_INACTIVE and VACUUM_COST_INACTIVE?

> and doesn't seem self-contained.

By changing the variable from VacuumCostActive to VacuumCostInactive, I have kept all non-vacuum code from having to distinguish between it being inactive due to failsafe mode or due to user settings.

> I think it'd be simpler to add a global boolean (maybe
> VacuumCostActiveForceDisable or such) that forces VacuumCostActive to
> be false and set VacuumCostActive using a setter function that follows
> the boolean.
I think maintaining an additional global variable is more brittle than including the information in a single variable.

> 0003:
>
> + * Reload the configuration file if requested. This allows changes to
> + * vacuum_cost_limit and vacuum_cost_delay to take effect while a table is
> + * being vacuumed or analyzed. Analyze should not reload configuration
> + * file if it is in an outer transaction, as GUC values shouldn't be
> + * allowed to refer to some uncommitted state (e.g. database objects
> + * created in this transaction).
>
> I'm not sure GUC reload is or should be related to transactions. For
> instance, work_mem can be changed by a reload during a transaction
> unless it has been set in the current transaction. I don't think we
> need to deliberately suppress changes in variables caused by reloads
> during transactions only for analyze. If analyze doesn't like changes
> to certain GUC variables, their values should be snapshotted before
> starting the process.

Currently, we only reload the config file in top-level statements. We don't reload the configuration file from within a nested transaction command. BEGIN starts a transaction but not a transaction command, so BEGIN; ANALYZE; probably wouldn't violate this rule. But it is simpler to just forbid reloading when it is not a top-level transaction command. I have updated the comment to reflect this.

> 0004:
> -
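As a usage-level illustration of what the 0003 behavior buys (a sketch only: big_table is a made-up name, and exactly when the running VACUUM notices the change depends on the patch's reload points, roughly the next vacuum_delay_point()):

-- session 1: a long-running, cost-limited vacuum
VACUUM (VERBOSE) big_table;

-- session 2: relax the throttling and signal a reload; with this patchset the
-- running VACUUM would pick up the new values mid-run instead of only when it
-- starts on the next table
ALTER SYSTEM SET vacuum_cost_delay = 0;
SELECT pg_reload_conf();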
Re: Add LZ4 compression in pg_dump
On 3/28/23 00:34, gkokola...@pm.me wrote: > > ... > >> Got it. In that case I agree it's fine to do that in a single commit. > > For what is worth, I think that this patch should get a +1 and get in. It > solves the empty writes problem and includes a test to a previous untested > case. > Pushed, after updating / rewording the commit message a little bit. Thanks! -- Tomas Vondra EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
Re: [RFC] Add jit deform_counter
On Sun, 12 Jun 2022 at 21:14, Dmitry Dolgov <9erthali...@gmail.com> wrote:
> I've noticed that the JIT performance counter generation_counter seems to include
> actions relevant to both the jit_expressions and jit_tuple_deforming options. It
> means one can't directly see what the influence of jit_tuple_deforming alone is,
> which would be helpful when adjusting JIT options. To make it better a
> new counter can be introduced; does that make sense?

I'm not so sure about this idea. As of now, if I look at EXPLAIN ANALYZE's JIT summary, the individual times add up to the total time. If we add this deform time, then that's no longer going to be true as the "Generation" time includes the newly added deform time.

master:

 JIT:
   Functions: 600
   Options: Inlining false, Optimization false, Expressions true, Deforming true
   Timing: Generation 37.758 ms, Inlining 0.000 ms, Optimization 6.736 ms, Emission 172.244 ms, Total 216.738 ms

37.758 + 6.736 + 172.244 = 216.738

I think if I were a DBA wondering why JIT was taking so long, I'd probably either be very astonished or I'd report a bug if I noticed that all the individual component JIT times didn't add up to the total time. I don't think the solution is to subtract the deform time from the generation time either.

Can't users just get this by looking at EXPLAIN ANALYZE with and without jit_tuple_deforming?

David
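To spell out the manual comparison suggested here (a sketch; the GUCs are real, but the query and table name are placeholders):

SET jit = on;
SET jit_above_cost = 0;               -- force JIT for the test query
SET jit_tuple_deforming = on;
EXPLAIN (ANALYZE) SELECT sum(a) FROM big_table;   -- note the JIT "Timing" line

SET jit_tuple_deforming = off;
EXPLAIN (ANALYZE) SELECT sum(a) FROM big_table;   -- compare Generation/Total

The difference between the two Generation (and Total) times gives a rough idea of the tuple-deforming share without adding a new counter.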