Re: Track IO times in pg_stat_io
Hi,

On 3/9/23 1:34 AM, Andres Freund wrote:
> Hi,
>
> On 2023-03-08 12:55:34 +0100, Drouvot, Bertrand wrote:
>> On 3/7/23 7:47 PM, Andres Freund wrote:
>>> On 2023-03-07 13:43:28 -0500, Melanie Plageman wrote:
>>> No, I don't think we can do that. It can be enabled on a per-session
>>> basis.
>>
>> Oh right. So it's even less clear to me how one would make use of those
>> new *_time fields, given that:
>>
>> - pg_stat_io is "global" across all sessions. So, even if one session is
>>   doing some "testing" and needs to turn track_io_timing on, it is not
>>   even sure it's only reflecting its own testing (as other sessions may
>>   have turned it on too).
>
> I think for 17 we should provide access to per-existing-connection
> pg_stat_io stats, and also provide a database aggregated version. Neither
> should be particularly hard.

+1, that would be great.

> I don't think it's particularly useful to use the time to calculate "per
> IO" costs - they can vary *drastically* due to kernel level buffering.

Exactly, and I think that's the reason why it could be useful. I think that
could help (with frequent enough sampling) to try to identify when the IOs
are served by the page cache or not (if one knows their infra well enough).
One could say (for example, depending on their environment) that if the
read time is > 4ms then the IO is served by spinning disks (if any), and if
it is well below 1ms then by the page cache. What I mean is that one could
try to characterize their IOs based on thresholds that they define.

Adding/reporting histograms would be even better: something we could look
at for 17?

Regards,

-- 
Bertrand Drouvot
PostgreSQL Contributors Team
RDS Open Source Databases
Amazon Web Services: https://aws.amazon.com
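To illustrate the kind of sampling described above, here is a sketch of a monitoring query (column names as in the pg_stat_io view; the *_time columns are only populated while track_io_timing is enabled, and per-IO averages can be heavily skewed by kernel-level buffering, so any threshold is environment-specific):

```sql
-- Average time per read IO, by backend type and IO context. An average
-- well above a few ms may suggest reads served by slow storage; an average
-- far below 1 ms suggests the kernel page cache. Thresholds must be
-- calibrated for the specific environment.
SELECT backend_type, object, context, reads,
       round((read_time / NULLIF(reads, 0))::numeric, 3) AS avg_read_ms
FROM pg_stat_io
WHERE reads > 0
ORDER BY avg_read_ms DESC NULLS LAST;
```

Sampling this periodically and diffing the cumulative counters would give per-interval averages, which is closer to the histogram idea mentioned above.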
Re: pg_upgrade and logical replication
Hi,

On Thu, Mar 09, 2023 at 12:05:36PM +0530, Amit Kapila wrote:
> On Wed, Mar 8, 2023 at 12:26 PM Julien Rouhaud wrote:
> >
> > Is there something that can be done for pg16? I was thinking that
> > having a fix for the normal and easy case could be acceptable: only
> > allowing pg_upgrade to optionally, and not by default, preserve the
> > subscription relations IFF all subscriptions only have tables in ready
> > state. Different states should be transient, and it's easy to check as
> > a user beforehand and also easy to check during pg_upgrade, so it
> > seems like an acceptable limitation (which I personally see as a good
> > sanity check, but YMMV). It could be lifted in later releases if
> > wanted anyway.
> >
> > It's unclear to me whether this limited scope would also require
> > preserving the replication origins, but having looked at the code I
> > don't think it would be much of a problem as the local LSN doesn't
> > have to be preserved.
>
> I think we need to preserve replication origins as they help us to
> determine the WAL location from where to start the streaming after the
> upgrade. If we don't preserve those then from which location will the
> subscriber start streaming?

It would start from the slot's information on the publisher side, but I
guess there's no guarantee that this will be accurate in all cases.

> We don't want to replicate the WAL which has already been sent.

Yeah, I agree. I added support to also preserve the subscription's
replication origin information, a new --preserve-subscription-state
(better naming welcome) documented option for pg_upgrade to optionally ask
for this new mode, and a similar (but undocumented) option for pg_dump
that only works with --binary-upgrade, and added a check in pg_upgrade
that all relations are in 'r' (ready) state.

Patch v2 attached.
>From 0a77ac305243e0f58dbfce6bb7c8cf062b45d4f4 Mon Sep 17 00:00:00 2001
From: Julien Rouhaud
Date: Wed, 22 Feb 2023 09:19:32 +0800
Subject: [PATCH v2] Optionally preserve the full subscription's state during
 pg_upgrade

Previously, only the subscription metadata information was preserved.
Without the list of relations and their state, it's impossible to
re-enable the subscriptions without missing some records, as the list of
relations can only be refreshed after enabling the subscription (and
therefore starting the apply worker).  Even if we added a way to refresh
the subscription while enabling a publication, we still wouldn't know
which relations are new on the publication side, and therefore should be
fully synced, and which shouldn't.  Similarly, the subscription's
replication origins are needed to ensure that we don't replicate anything
twice.

To fix this problem, this patch teaches pg_dump in binary upgrade mode to
emit additional commands to restore the content of pg_subscription_rel,
and an additional LSN parameter in the subscription creation to restore
the underlying replication origin remote LSN.  The LSN parameter is only
accepted in CREATE SUBSCRIPTION in binary upgrade mode.

The new ALTER SUBSCRIPTION subcommand, usable only during binary upgrade,
has the following syntax:

    ALTER SUBSCRIPTION name ADD TABLE (relid = XYZ, state = 'x' [, lsn = 'X/Y'])

The relation is identified by its oid, as it's preserved during
pg_upgrade.  The lsn is optional, and defaults to NULL / InvalidXLogRecPtr.

This mode is optional and not enabled by default.  A new
--preserve-subscription-state option is added to pg_upgrade to use it.
For now, pg_upgrade will check that all the subscription relations are in
'r' (ready) state, and will error out if any subscription relation in any
database has a different state, logging the list of problematic databases
with the number of problematic relations in each.
Author: Julien Rouhaud
Reviewed-by: FIXME
Discussion: https://postgr.es/m/20230217075433.u5mjly4d5cr4hcfe@jrouhaud
---
 doc/src/sgml/ref/pgupgrade.sgml         |  13 +++
 src/backend/commands/subscriptioncmds.c |  67 +-
 src/backend/parser/gram.y               |  11 +++
 src/bin/pg_dump/pg_backup.h             |   2 +
 src/bin/pg_dump/pg_dump.c               | 114 +++-
 src/bin/pg_dump/pg_dump.h               |  13 +++
 src/bin/pg_upgrade/check.c              |  54 +++
 src/bin/pg_upgrade/dump.c               |   3 +-
 src/bin/pg_upgrade/option.c             |   7 ++
 src/bin/pg_upgrade/pg_upgrade.h         |   1 +
 src/include/nodes/parsenodes.h          |   3 +-
 11 files changed, 283 insertions(+), 5 deletions(-)

diff --git a/doc/src/sgml/ref/pgupgrade.sgml b/doc/src/sgml/ref/pgupgrade.sgml
index 7816b4c685..aef3b8a8b8 100644
--- a/doc/src/sgml/ref/pgupgrade.sgml
+++ b/doc/src/sgml/ref/pgupgrade.sgml
@@ -240,6 +240,19 @@ PostgreSQL documentation
+
+  --preserve-subscription-state
+
+   Fully preserve the logical subscription state if any. That includes
+   the underlying replication origins with their remote LSN
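To make the above concrete, here is a sketch of the readiness check and the new subcommand (all object names, OIDs, and LSNs are invented; the ALTER SUBSCRIPTION syntax follows the commit message above):

```sql
-- Hypothetical pre-upgrade sanity check, run per database: any row
-- returned here would make pg_upgrade --preserve-subscription-state
-- error out, since only the 'r' (ready) state is accepted.
SELECT s.subname, sr.srrelid::regclass, sr.srsubstate
FROM pg_subscription_rel sr
JOIN pg_subscription s ON s.oid = sr.srsubid
WHERE sr.srsubstate <> 'r';

-- The binary-upgrade-only subcommand emitted by pg_dump, per the syntax
-- above (relid, state, and lsn are invented values):
ALTER SUBSCRIPTION mysub ADD TABLE (relid = 16384, state = 'r', lsn = '0/12345A0');
```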
Re: ICU locale validation / canonicalization
On 28.02.23 06:57, Jeff Davis wrote:
> On Mon, 2023-02-20 at 15:23 -0800, Jeff Davis wrote:
>> New patch attached. The new patch also includes a GUC that (when
>> enabled) validates that the collator is actually found.
>
> New patch attached. Now it always preserves the exact locale string
> during pg_upgrade, and does not attempt to canonicalize it. Before it
> was trying to be clever by determining if the language tag was finding
> the same collator as the original string -- I didn't find a problem with
> that, but it just seemed a bit too clever. So, only newly-created
> locales and databases have the ICU locale string canonicalized to a
> language tag.
>
> Also, I added a SQL function pg_icu_language_tag() that can convert
> locale strings to language tags, and check whether they exist or not.

This patch appears to do about three things at once, and it's not clear
exactly where the boundaries are between them and which ones we might
actually want.  And I think the terminology also gets mixed up a bit,
which makes following this harder.

1. Canonicalizing the locale string.  This is presumably what
uloc_canonicalize() does, which the patch doesn't actually use.  What are
examples of what this does?  Does the patch actually do this?

2. Converting the locale string to BCP 47 format.  This converts
'de@collation=phonebook' to 'de-u-co-phonebk'.  This is what
uloc_getLanguageTag() does.

3. Validating the locale string, to reject faulty input.

What are the relationships between these?  I don't understand how the
validation actually happens in your patch.  Does uloc_getLanguageTag() do
the validation also?  Can you do canonicalization without converting to
language tag?  Can you do validation of un-canonicalized locale names?

What is the guidance for the use of the icu_locale_validation GUC?  The
description throws in yet another term: "validates that ICU locale strings
are well-formed".  What is "well-formed"?  How does that relate to the
other concepts?
Personally, I'm not on board with this behavior:

=> CREATE COLLATION test (provider = icu, locale = 'de@collation=phonebook');
NOTICE:  0: using language tag "de-u-co-phonebk" for locale "de@collation=phonebook"

I mean, maybe that is a thing we want to do somehow sometime, to migrate
people to the "new" spellings, but the old ones aren't wrong.  So this
should be a separate consideration, with an option, and it would require
various updates in the documentation.

It also doesn't appear to address how to handle ICU before version 54.
But, see earlier questions, are these three things all connected somehow?
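For reference, a sketch of how the proposed pg_icu_language_tag() function from the patch would be used (the function exists only in the patch under discussion, not in any released PostgreSQL; the expected mapping is the one shown in the NOTICE above):

```sql
-- Hypothetical usage of the patch's conversion function: turn an
-- old-style ICU locale string into a BCP 47 language tag.
SELECT pg_icu_language_tag('de@collation=phonebook');
-- per the discussion above, this would yield: de-u-co-phonebk
```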
Re: Allow tests to pass in OpenSSL FIPS mode
On 08.03.23 10:37, Daniel Gustafsson wrote: The comment in question was missed in 55fe26a4b58, but I agree that it's a false claim given the OpenSSL implementation so removing or at least mimicking the comments in cryptohash_openssl.c would be better. I have fixed these comments to match cryptohash_openssl.c.
Re: Doc: Rework contrib appendix -- informative titles, tweaked sentences
On 2023-Jan-22, Karl O. Pinc wrote:

> Actually, this CSS, added to doc/src/sgml/stylesheet.css, makes the
> column spacing look pretty good:
>
> /* Adequate spacing between columns in a simplelist non-inline table */
> .simplelist td { padding-left: 2em; padding-right: 2em; }

Okay, this looks good to me too.  However, for it to actually work, we
need to patch the corresponding CSS file in the pgweb repository too.
I'll follow up in the other mailing list.

-- 
Álvaro Herrera        Breisgau, Deutschland  —  https://www.EnterpriseDB.com/
Re: Time delayed LR (WAS Re: logical replication restrictions)
At Thu, 9 Mar 2023 11:00:46 +0530, Amit Kapila wrote in
> On Wed, Mar 8, 2023 at 9:20 AM Masahiko Sawada wrote:
> >
> > On Wed, Mar 8, 2023 at 3:30 AM Nathan Bossart wrote:
> > >
> > > IMO the current set of trade-offs (e.g., unavoidable bloat and WAL
> > > buildup) would make this feature virtually unusable for a lot of
> > > workloads, so it's probably worth exploring an alternative approach.
> >
> > It might require more engineering effort for alternative approaches
> > such as one I proposed but the feature could become better from the
> > user perspective. I also think it would be worth exploring it if we've
> > not yet.
>
> Fair enough. I think as of now most people think that we should
> consider alternative approaches for this feature. The two ideas at a

If we can notify the subscriber of the transaction start time, will that
solve the current problem?  If not, or if it is not possible, +1 to look
for other solutions.

> high level are that the apply worker itself first flushes the decoded
> WAL (maybe only when time-delay is configured) or have a separate
> walreceiver process as we have for standby. I think we need to analyze
> the pros and cons of each of those approaches and see if they would be
> useful even for other things on the apply side.

My understanding of the requirements here is that the publisher should not
hold changes, the subscriber should not hold data reads, and all
transactions, including two-phase ones, should be applied at once upon
committing.  Both sides need to respond to requests from the other side.
We expect an apply delay of several hours or more.

My thoughts considering the requirements are as follows: if we expect
delays of several hours or more, I don't think it's feasible to stack
received changes in the process memory.  So, if apply-delay is in effect,
I think it would be better to process transactions through files
regardless of the process configuration.
I'm not sure whether we should have a separate process for protocol
processing.  On one hand, it would simplify the protocol-processing part,
but on the other hand, changes would always have to be applied through
files.  If we plan to integrate the paths with and without apply-delay by
the file-passing method, this might work.  If we want to maintain the
current approach when apply-delay is not in effect, I think we would have
to implement it in a single process, but I feel the protocol-processing
part could become complicated.

regards.

-- 
Kyotaro Horiguchi
NTT Open Source Software Center
Re: Move defaults toward ICU in 16?
On 08.03.23 06:55, Jeff Davis wrote: On Fri, 2023-03-03 at 21:45 -0800, Jeff Davis wrote: 0002: update template0 in new cluster (as described above) I think 0002 is about ready and I plan to commit it soon unless someone has more comments. 0002 seems fine to me. I'm holding off on 0001 for now, because you objected. But I still think 0001 is a good idea so I'd like to hear more before I withdraw it. Let's come back to that after dealing with the other two. 0003: default initdb to use ICU This is also about ready, and I plan to commit this soon after 0002. This seems mostly ok to me. I have a few small comments. +default, ICU obtains the ICU locale from the ICU default collator. This seems to be a fancy way of saying, the default ICU locale will be set to something that approximates what you have set your operating system to. Which is what we want, I think. Can we say this in a more user-friendly way? +static void +check_icu_locale() should be check_icu_locale(void) + if (U_ICU_VERSION_MAJOR_NUM >= 54) + { If we're going to add more of these mysterious version checks, could we add a comment about what they are for? However, I suspect what this chunk is doing is some sort of canonicalization/language-tag conversion, which per the other thread, I have some questions about. How about for this patch, we skip this part and just do the else branch + icu_locale = pg_strdup(default_locale); and then put the canonicalization business into the canonicalization patch set?
Re: Time delayed LR (WAS Re: logical replication restrictions)
On Thu, Mar 9, 2023 at 2:56 PM Kyotaro Horiguchi wrote:
>
> At Thu, 9 Mar 2023 11:00:46 +0530, Amit Kapila wrote in
> > On Wed, Mar 8, 2023 at 9:20 AM Masahiko Sawada wrote:
> > >
> > > On Wed, Mar 8, 2023 at 3:30 AM Nathan Bossart wrote:
> > > >
> > > > IMO the current set of trade-offs (e.g., unavoidable bloat and WAL
> > > > buildup) would make this feature virtually unusable for a lot of
> > > > workloads, so it's probably worth exploring an alternative approach.
> > >
> > > It might require more engineering effort for alternative approaches
> > > such as one I proposed but the feature could become better from the
> > > user perspective. I also think it would be worth exploring it if
> > > we've not yet.
> >
> > Fair enough. I think as of now most people think that we should
> > consider alternative approaches for this feature. The two ideas at a
>
> If we can notify subscriber of the transaction start time, will that
> solve the current problem?

I don't think that will solve the current problem because the problem is
related to confirming back the flush LSN (commit LSN) to the publisher,
which we do only after we commit the delayed transaction.  Due to this, we
are not able to advance the WAL (restart_lsn)/XMIN on the publisher, which
causes an accumulation of WAL and does not allow the vacuum to remove
deleted rows.  Do you have something else in mind which makes you think
that it can solve the problem?

-- 
With Regards,
Amit Kapila.
Re: improving user.c error messages
On 20.02.23 23:58, Nathan Bossart wrote:
>> Similarly -- this is an existing issue but we might as well look at it
>> -- in something like
>>
>>     must be superuser or a role with privileges of the pg_write_server_files role
>>
>> the phrase "a role with the privileges of that other role" seems
>> ambiguous.  Doesn't it really mean you must be a member of that role?
>
> Membership alone is not sufficient.  You must also inherit the
> privileges of the role via the INHERIT option.  I thought about making
> this something like
>
>     must have the INHERIT option on role %s
>
> but I'm not sure that's accurate either.  That wording makes it sound
> like you need to be granted membership to the role directly WITH INHERIT
> OPTION, but what you really need is membership, direct or indirect, with
> an INHERIT chain up to the role in question.  However, it looks like
> "must have the ADMIN option on role %s" is used to mean something
> similar, so perhaps I am overthinking it.  For now, I've reworded these
> as "must inherit privileges of".

I don't have a good mental model of all this role inheritance, personally,
but I fear that this change makes the messages more jargony and less
clear.  Maybe the original wording was good enough.

A couple of other thoughts: "admin option" is sort of a natural language
term, I think, so we don't need to parametrize it as "%s option".  Also,
there are no other "options" in this context, I think.

A general thought: It seems we currently don't have any error messages
that address the user like "You must do this".  Do we want to go there?
Should we try for a more impersonal wording like

    "You must have the %s attribute to create roles."
    "Current user must have the %s attribute to create roles."
    "%s attribute is required to create roles."

By the way, I'm not sure what the separation between 0001 and 0002 is
supposed to be.
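To illustrate the "membership, direct or indirect, with an INHERIT chain" situation discussed above, here is a sketch (role names invented; uses the per-grant WITH INHERIT option from the current development cycle):

```sql
-- alice ends up with pg_write_server_files privileges via an intermediate
-- role, even though she was never granted that role directly. Both grants
-- in the chain must carry INHERIT for the privileges to flow through.
CREATE ROLE intermediate;
GRANT pg_write_server_files TO intermediate WITH INHERIT TRUE;

CREATE ROLE alice LOGIN;
GRANT intermediate TO alice WITH INHERIT TRUE;
```

A message like "must have the INHERIT option on role pg_write_server_files" would be misleading here, since alice holds no direct grant on that role at all.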
Re: [PATCH] Use indexes on the subscriber when REPLICA IDENTITY is full on the publisher
Hi Peter,

> 1.
> In my previous review [1] (comment #1) I wrote that only some of the
> "should" were misleading and gave examples where to change. But I didn't
> say that *every* usage of that word was wrong, so your global replace of
> "should" to "must" has modified a couple of places in unexpected ways.
>
> Details are in subsequent review comments below -- see #2b, #3, #5.

Ah, that was my mistake. Thanks for the thorough review on this.

> ==
> doc/src/sgml/logical-replication.sgml
>
> 2.
> A published table must have a “replica identity” configured in order to
> be able to replicate UPDATE and DELETE operations, so that appropriate
> rows to update or delete can be identified on the subscriber side. By
> default, this is the primary key, if there is one. Another unique index
> (with certain additional requirements) can also be set to be the replica
> identity. If the table does not have any suitable key, then it can be
> set to replica identity “full”, which means the entire row becomes the
> key. When replica identity “full” is specified, indexes can be used on
> the subscriber side for searching the rows. Candidate indexes must be
> btree, non-partial, and have at least one column reference (i.e. cannot
> consist of only expressions). These restrictions on the non-unique index
> properties adheres to some of the restrictions that are enforced for
> primary keys. Internally, we follow a similar approach for supporting
> index scans within logical replication scope. If there are no such
> suitable indexes, the search on the subscriber side can be very
> inefficient, therefore replica identity “full” must only be used as a
> fallback if no other solution is possible. If a replica identity other
> than “full” is set on the publisher side, a replica identity comprising
> the same or fewer columns must also be set on the subscriber side. See
> REPLICA IDENTITY for details on how to set the replica identity. If a
> table without a replica identity is added to a publication that
> replicates UPDATE or DELETE operations then subsequent UPDATE or DELETE
> operations will cause an error on the publisher. INSERT operations can
> proceed regardless of any replica identity.
>
> ~~
>
> 2a.
> My previous review [1] (comment #2) was not fixed quite as suggested.
>
> Please change:
> "adheres to" --> "adhere to"

Oops, it seems I only got the "to" part of your suggestion and missed the
"s". Done now.

> ~~
>
> 2b. should/must
>
> This should/must change was OK as it was before, because here it is only
> advice.
>
> Please change this back to how it was:
> "must only be used as a fallback" --> "should only be used as a fallback"

Thanks, changed.

> ==
> src/backend/executor/execReplication.c
>
> 3. build_replindex_scan_key
>
>  /*
>   * Setup a ScanKey for a search in the relation 'rel' for a tuple 'key'
>   * that is setup to match 'rel' (*NOT* idxrel!).
>   *
> - * Returns whether any column contains NULLs.
> + * Returns how many columns must be used for the index scan.
> + *
>
> ~
>
> This should/must change does not seem quite right.
>
> SUGGESTION (reworded)
> Returns how many columns to use for the index scan.

Fixed. (I wish we had a simpler process to incorporate such comments.)

> ~~~
>
> 4. build_replindex_scan_key
>
> > Based on the discussions below, I kept as-is. I really don't want to
> > do unrelated changes in this patch, as I also got several feedback for
> > not doing it,
>
> Hmm, although this code pre-existed I don't consider this one as
> "unrelated changes" because the patch introduced the new "if
> (!AttributeNumberIsValid(table_attno))" which changed things. As I wrote
> to Amit yesterday [2] IMO it would be better to do the 'opttype'
> assignment *after* the potential 'continue' otherwise there is every
> chance that the assignment is just redundant. And if you move the
> assignment where it belongs, then you might as well declare the variable
> in the more appropriate place at the same time – i.e. with the
> 'opfamily' declaration. Anyway, I've given my reason a couple of times
> now, so if you don't want to change it I won't debate about it anymore.

Alright, given both you and Amit [1] agree on this, I'll follow that.

> ==
> src/backend/replication/logical/relation.c
>
> 5. FindUsableIndexForReplicaIdentityFull
>
> + * XXX: There are no fundamental problems for supporting non-btree indexes.
> + * We mostly need to relax the limitations in RelationFindReplTupleByIndex().
> + * For partial indexes, the required changes are likely to be larger. If
> + * none of the tuples satisfy the expression for the index scan, we must
> + * fall-back to sequential execution, which might not be a good idea in some
> + * cases.
>
> The should/must change (the one in the XXX comment) does not seem quite
> right.
>
> SUGGESTION
> "we must fall-back to sequential execution" --> "we fallback to
> sequential execution"

Fixed, thanks.

> ==
> .../subs
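A sketch of the index restrictions quoted above (table and index names invented): on the subscriber side, for a table replicated with replica identity "full", only the first index below would be a candidate for the row search:

```sql
CREATE TABLE tbl (a int, b text);

CREATE INDEX usable_idx  ON tbl (a);                -- usable: btree, non-partial, has a column reference
CREATE INDEX partial_idx ON tbl (a) WHERE a > 100;  -- rejected: partial index
CREATE INDEX expr_idx    ON tbl ((lower(b)));       -- rejected: consists of only expressions
CREATE INDEX hash_idx    ON tbl USING hash (a);     -- rejected: not btree
```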
Re: [PATCH] Use indexes on the subscriber when REPLICA IDENTITY is full on the publisher
Hi Vignesh C,

> > Hmm, can you please elaborate more on this? The declaration and
> > assignment are already on different lines.
> >
> > ps: pgindent changed this line a bit. Does that look better?
>
> I thought of changing it to something like below:
>     bool        isUsableIndex;
>     Oid         idxoid = lfirst_oid(lc);
>     Relation    indexRelation = index_open(idxoid, AccessShareLock);
>     IndexInfo  *indexInfo = BuildIndexInfo(indexRelation);
>
>     isUsableIndex = IsIndexUsableForReplicaIdentityFull(indexInfo);

Alright, this looks slightly better. I did a small change to your
suggestion, basically kept lfirst_oid as the first statement in the loop.
I'll attach the changes on v38 in the next e-mail.

Thanks,
Onder KALACI
Re: [PATCH] Use indexes on the subscriber when REPLICA IDENTITY is full on the publisher
Hi Amit, all

> This new test takes ~9s on my machine whereas most other tests in
> subscription/t take roughly 2-5s. I feel we should try to reduce the
> test timing without sacrificing much of the functionality or code
> coverage.

Alright, that is reasonable.

> I think if possible we should try to reduce setup/teardown cost for each
> separate test by combining them where possible. I have a few comments on
> tests which also might help to optimize these tests.
>
> 1.
> +# Testcase start: SUBSCRIPTION USES INDEX
> +#
> +# Basic test where the subscriber uses index
> +# and only updates 1 row and deletes
> +# 1 other row
> ...
> +# Testcase start: SUBSCRIPTION USES INDEX UPDATEs MULTIPLE ROWS
> +#
> +# Basic test where the subscriber uses index
> +# and updates 50 rows
> ...
> +# Testcase start: SUBSCRIPTION USES INDEX WITH MULTIPLE COLUMNS
> +#
> +# Basic test where the subscriber uses index
> +# and deletes 200 rows
>
> I think to a good extent these tests overlap each other. I think we can
> have just one test for the index with multiple columns that updates
> multiple rows and have both updates and deletes.

Alright, dropped SUBSCRIPTION USES INDEX, and expanded SUBSCRIPTION USES
INDEX WITH MULTIPLE COLUMNS with an UPDATE that touches multiple rows.

> 2.
> +# Testcase start: SUBSCRIPTION DOES NOT USE PARTIAL INDEX
> ...
> +# Testcase start: SUBSCRIPTION DOES NOT USE INDEXES WITH ONLY EXPRESSIONS
>
> Instead of having separate tests where we do all setups again, I think
> it would be better if we create both the indexes in one test and show
> that none of them is used.

Makes sense.

> 3.
> +# now, the update could either use the test_replica_id_full_idy or
> test_replica_id_full_idy index
> +# it is not possible for user to control which index to use
>
> The name of the second index is wrong in the above comment.

Thanks, fixed.

> 4.
> +# Testcase start: SUBSCRIPTION BEHAVIOR WITH ENABLE_INDEXSCAN
>
> As we have removed enable_indexscan check, you should remove this test.

Hmm, I think my rebase problems are causing confusion here, which v38
fixes. In the first commit, we have ENABLE_INDEXSCAN checks. In the second
commit, I changed the same test to use
enable_replica_identity_full_index_scan.

If we are going to only consider the first patch to get into the master
branch, I can probably drop the test. In that case, I'm not sure what our
perspective is on the ENABLE_INDEXSCAN GUC. Do we want to keep that guard
in the code (and hence the test)?

> 5. In general, the line length seems to vary a lot for different
> multi-line comments. Though we are not strict in that, it will look
> better if there is consistency in that (let's have ~80 cols line length
> for each comment in a single line).

Went over the tests, and made them ~80 cols. There is one exception: in
the first commit, the test for enable_indexscan is still shorter, but I
didn't manage to format that properly. I'll try to fix that as well, but
I didn't want to block the progress due to that.

Also, you have not noted it, but I think SUBSCRIPTION USES INDEX WITH
MULTIPLE COLUMNS already covers SUBSCRIPTION USES INDEX UPDATEs MULTIPLE
ROWS. So, I changed the first one to SUBSCRIPTION USES INDEX WITH MULTIPLE
ROWS AND COLUMNS and dropped the second one. Let me know if it does not
make sense to you.

If I try, there are a few more opportunities to squeeze some more tests
together, but those would start to complicate readability.

Attached v38.

Thanks,
Onder KALACI

v38_0002_use_index_on_subs_when_pub_rep_ident_full.patch
Description: Binary data

v38_0001_use_index_on_subs_when_pub_rep_ident_full.patch
Description: Binary data
RE: Support logical replication of DDLs
On Thu, Mar 9, 2023 10:27 AM Wang, Wei/王 威 wrote:
> On Mon, Mar 6, 2023 18:17 PM Wang, Wei/王 威 wrote:
> > On Mon, Mar 6, 2023 14:34 AM Ajin Cherian wrote:
> > > Changes are in patch 1 and patch 2
> >
> > Thanks for updating the patch set.
> >
> > Here are some comments:
>
> Here are some more comments for the v-75-0002* patch:

Thanks for your comments.

> 1. In the function deparse_AlterRelation
> +	if ((sub->address.objectId != relId &&
> +		 sub->address.objectId != InvalidOid) &&
> +		!(subcmd->subtype == AT_AddConstraint &&
> +		  subcmd->recurse) &&
> +		istable)
> +		continue;
> I think when we execute the command "ALTER TABLE ... CLUSTER ON"
> (subtype is AT_ClusterOn), this command will be skipped for parsing. I
> think we need to parse this command here.
>
> I think we are skipping some needed parsing due to this check, such as
> [1].#1 and the AT_ClusterOn command mentioned above. After reading the
> thread, I think the purpose of this check is to fix the bug in [2] (see
> the last point in [2]).
> I think maybe we could modify this check to `continue` when
> sub->address.objectId and relId are inconsistent and
> sub->address.objectId is a child (inherited or partition) table. What
> do you think?

Fixed as suggested.

> ~~~
>
> 2. In the function deparse_CreateStmt
> I think when we execute the following command:
> `CREATE TABLE tbl (a int GENERATED ALWAYS AS (1) STORED);`
> the deparsed result is:
> `CREATE TABLE public.tbl (a pg_catalog.int4 STORAGE plain GENERATED
> ALWAYS AS 1 STORED);`
> I think the parentheses around the generation_expr (I mean `1`) are
> missing, which would cause a syntax error.

Fixed.

> ~~~
>
> 3. In the function deparse_IndexStmt
> I think we missed parsing of the option [NULLS NOT DISTINCT] in the
> following command:
> ```
> CREATE UNIQUE INDEX ... ON table_name ... NULLS NOT DISTINCT;
> ```
> I think we could check this option via node->nulls_not_distinct.

Fixed.

Best Regards,
Hou zj
Re: Raising the SCRAM iteration count
> On 9 Mar 2023, at 08:09, Michael Paquier wrote:
>
> On Wed, Mar 08, 2023 at 05:21:20PM +0900, Michael Paquier wrote:
>> On Wed, Mar 08, 2023 at 09:07:36AM +0100, Daniel Gustafsson wrote:
>>> AFAIK a TAP test with psql_interactive is the only way to do this so
>>> that's what I've implemented.
>
> I cannot think of a better idea than what you have here, so I am
> marking this patch as ready for committer.

Thanks for the review!

> I am wondering how stable a logic based on a timer of 5s would be..

Actually that was a bug; it should be using the default timeout and
restarting it for each operation to ensure that even overloaded hosts
won't time out unless something is actually broken/incorrect. I've fixed
that in the attached rev and also renamed the password in the regress test
from "raisediterationcount" as it's now lowering the count in the test.

Unless there are objections to this version, I plan to commit it during
this CF.

-- 
Daniel Gustafsson

v8-0001-Make-SCRAM-iteration-count-configurable.patch
Description: Binary data
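For context, a usage sketch of the feature being committed (assuming the GUC ends up spelled scram_iterations and is settable per session, as in the committed form of this feature; role name invented):

```sql
-- Lower the SCRAM iteration count for passwords set in this session.
-- Trade-off: fewer iterations mean faster authentication but a cheaper
-- brute-force target.
SET scram_iterations = 4096;
ALTER ROLE myuser PASSWORD 'new-password';  -- verifier built with 4096 iterations

-- The iteration count an existing verifier was built with is the first
-- field of the stored SCRAM-SHA-256 string:
SELECT rolpassword FROM pg_authid WHERE rolname = 'myuser';
```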
Re: Add standard collation UNICODE
On 08.03.23 19:25, Jeff Davis wrote:
> Why is "unicode" only provided for the UTF-8 encoding? For "ucs_basic"
> that makes some sense, because the implementation only works in UTF-8.
> But here we are using ICU, and the "und" locale should work for any
> ICU-supported encoding. I suggest that we use collencoding=-1 for
> "unicode", and the docs can just add a note next to "ucs_basic" that it
> only works for UTF-8, because that's the weird case.

Makes sense.

> For the docs, I suggest that you clarify that "ucs_basic" has the same
> behavior as the C locale does *in the UTF-8 encoding*. Not all users
> might pick up on the subtlety that the C locale has different behaviors
> in different encodings.

Ok, word-smithed a bit more. How about this patch version?

From a8e33d010f60cceb9442123bd0531451875df313 Mon Sep 17 00:00:00 2001
From: Peter Eisentraut
Date: Thu, 9 Mar 2023 11:14:28 +0100
Subject: [PATCH v2] Add standard collation UNICODE

Discussion: https://www.postgresql.org/message-id/flat/1293e382-2093-a2bf-a397-c04e8f83d...@enterprisedb.com
---
 doc/src/sgml/charset.sgml | 31 ---
 src/bin/initdb/initdb.c   | 10 +++---
 2 files changed, 35 insertions(+), 6 deletions(-)

diff --git a/doc/src/sgml/charset.sgml b/doc/src/sgml/charset.sgml
index 3032392b80..12fabb7372 100644
--- a/doc/src/sgml/charset.sgml
+++ b/doc/src/sgml/charset.sgml
@@ -659,9 +659,34 @@ Standard Collations
-Additionally, the SQL standard collation name ucs_basic
-is available for encoding UTF8. It is equivalent
-to C and sorts by Unicode code point.
+Additionally, two SQL standard collation names are available:
+
+  unicode
+
+This collation sorts using the Unicode Collation Algorithm with the
+Default Unicode Collation Element Table. It is available in all
+encodings. ICU support is required to use this collation. (This
+collation has the same behavior as the ICU root locale; see .)
+
+  ucs_basic
+
+This collation sorts by Unicode code point. It is only available for
+encoding UTF8. (This collation has the same
+behavior as the libc locale specification C in
+UTF8 encoding.)
+

diff --git a/src/bin/initdb/initdb.c b/src/bin/initdb/initdb.c
index 5e3c6a27c4..d303cc5609 100644
--- a/src/bin/initdb/initdb.c
+++ b/src/bin/initdb/initdb.c
@@ -1486,10 +1486,14 @@ static void
 setup_collation(FILE *cmdfd)
 {
 	/*
-	 * Add an SQL-standard name.  We don't want to pin this, so it doesn't go
-	 * in pg_collation.h.  But add it before reading system collations, so
-	 * that it wins if libc defines a locale named ucs_basic.
+	 * Add SQL-standard names.  We don't want to pin these, so they don't go
+	 * in pg_collation.dat.  But add them before reading system collations, so
+	 * that they win if libc defines a locale with the same name.
 	 */
+	PG_CMD_PRINTF("INSERT INTO pg_collation (oid, collname, collnamespace, collowner, collprovider, collisdeterministic, collencoding, colliculocale)"
+				  "VALUES (pg_nextoid('pg_catalog.pg_collation', 'oid', 'pg_catalog.pg_collation_oid_index'), 'unicode', 'pg_catalog'::regnamespace, %u, '%c', true, -1, 'und');\n\n",
+				  BOOTSTRAP_SUPERUSERID, COLLPROVIDER_ICU);
+
 	PG_CMD_PRINTF("INSERT INTO pg_collation (oid, collname, collnamespace, collowner, collprovider, collisdeterministic, collencoding, collcollate, collctype)"
 				  "VALUES (pg_nextoid('pg_catalog.pg_collation', 'oid', 'pg_catalog.pg_collation_oid_index'), 'ucs_basic', 'pg_catalog'::regnamespace, %u, '%c', true, %d, 'C', 'C');\n\n",
 				  BOOTSTRAP_SUPERUSERID, COLLPROVIDER_LIBC, PG_UTF8);

base-commit: 36ea345f8fa616fd9b40576310e54145aa70c1a1
-- 
2.39.2
Re: Allow tests to pass in OpenSSL FIPS mode
On Thu, Mar 09, 2023 at 10:01:14AM +0100, Peter Eisentraut wrote: > I have fixed these comments to match cryptohash_openssl.c. Missed that, thanks for the fix. -- Michael
Re: A bug with ExecCheckPermissions
I didn't like very much this business of setting the perminfoindex directly to '2' and '1'. It looks ugly with no explanation. What do you think of creating them as we go along and setting each index correspondingly, as in the attached? -- Álvaro Herrera 48°01'N 7°57'E — https://www.EnterpriseDB.com/ "Ellos andaban todos desnudos como su madre los parió, y también las mujeres, aunque no vi más que una, harto moza, y todos los que yo vi eran todos mancebos, que ninguno vi de edad de más de XXX años" (Cristóbal Colón) >From f0041bf1ac2f0c727d6b8495a63a548240c3b9c5 Mon Sep 17 00:00:00 2001 From: Alvaro Herrera Date: Thu, 9 Mar 2023 11:33:18 +0100 Subject: [PATCH] fix ExecCheckPermissions call in RI_Initial_Check --- src/backend/executor/execMain.c | 22 src/backend/utils/adt/ri_triggers.c | 40 - 2 files changed, 44 insertions(+), 18 deletions(-) diff --git a/src/backend/executor/execMain.c b/src/backend/executor/execMain.c index b32f419176..6230f5e3d4 100644 --- a/src/backend/executor/execMain.c +++ b/src/backend/executor/execMain.c @@ -577,6 +577,28 @@ ExecCheckPermissions(List *rangeTable, List *rteperminfos, ListCell *l; bool result = true; +#ifdef USE_ASSERT_CHECKING + Bitmapset *indexset = NULL; + + /* Check that rteperminfos is consistent with rangeTable */ + foreach(l, rangeTable) + { + RangeTblEntry *rte = lfirst_node(RangeTblEntry, l); + + if (rte->perminfoindex != 0) + { + /* Sanity checks */ + (void) getRTEPermissionInfo(rteperminfos, rte); + /* Many-to-one mapping not allowed */ + Assert(!bms_is_member(rte->perminfoindex, indexset)); + indexset = bms_add_member(indexset, rte->perminfoindex); + } + } + + /* All rteperminfos are referenced */ + Assert(bms_num_members(indexset) == list_length(rteperminfos)); +#endif + foreach(l, rteperminfos) { RTEPermissionInfo *perminfo = lfirst_node(RTEPermissionInfo, l); diff --git a/src/backend/utils/adt/ri_triggers.c b/src/backend/utils/adt/ri_triggers.c index 375b17b9f3..b4e789899e 100644 ---
a/src/backend/utils/adt/ri_triggers.c +++ b/src/backend/utils/adt/ri_triggers.c @@ -1373,10 +1373,11 @@ RI_Initial_Check(Trigger *trigger, Relation fk_rel, Relation pk_rel) char fkrelname[MAX_QUOTED_REL_NAME_LEN]; char pkattname[MAX_QUOTED_NAME_LEN + 3]; char fkattname[MAX_QUOTED_NAME_LEN + 3]; - RangeTblEntry *pkrte; - RangeTblEntry *fkrte; + RangeTblEntry *rte; RTEPermissionInfo *pk_perminfo; RTEPermissionInfo *fk_perminfo; + List *rtes = NIL; + List *perminfos = NIL; const char *sep; const char *fk_only; const char *pk_only; @@ -1394,25 +1395,29 @@ RI_Initial_Check(Trigger *trigger, Relation fk_rel, Relation pk_rel) * * XXX are there any other show-stopper conditions to check? */ - pkrte = makeNode(RangeTblEntry); - pkrte->rtekind = RTE_RELATION; - pkrte->relid = RelationGetRelid(pk_rel); - pkrte->relkind = pk_rel->rd_rel->relkind; - pkrte->rellockmode = AccessShareLock; - pk_perminfo = makeNode(RTEPermissionInfo); pk_perminfo->relid = RelationGetRelid(pk_rel); pk_perminfo->requiredPerms = ACL_SELECT; - - fkrte = makeNode(RangeTblEntry); - fkrte->rtekind = RTE_RELATION; - fkrte->relid = RelationGetRelid(fk_rel); - fkrte->relkind = fk_rel->rd_rel->relkind; - fkrte->rellockmode = AccessShareLock; + perminfos = lappend(perminfos, pk_perminfo); + rte = makeNode(RangeTblEntry); + rte->rtekind = RTE_RELATION; + rte->relid = RelationGetRelid(pk_rel); + rte->relkind = pk_rel->rd_rel->relkind; + rte->rellockmode = AccessShareLock; + rte->perminfoindex = list_length(perminfos); + rtes = lappend(rtes, rte); fk_perminfo = makeNode(RTEPermissionInfo); fk_perminfo->relid = RelationGetRelid(fk_rel); fk_perminfo->requiredPerms = ACL_SELECT; + perminfos = lappend(perminfos, fk_perminfo); + rte = makeNode(RangeTblEntry); + rte->rtekind = RTE_RELATION; + rte->relid = RelationGetRelid(fk_rel); + rte->relkind = fk_rel->rd_rel->relkind; + rte->rellockmode = AccessShareLock; + rte->perminfoindex = list_length(perminfos); + rtes = lappend(rtes, rte); for (int i = 0; i < riinfo->nkeys; 
i++) { @@ -1425,8 +1430,7 @@ RI_Initial_Check(Trigger *trigger, Relation fk_rel, Relation pk_rel) fk_perminfo->selectedCols = bms_add_member(fk_perminfo->selectedCols, attno); } - if (!ExecCheckPermissions(list_make2(fkrte, pkrte), - list_make2(fk_perminfo, pk_perminfo), false)) + if (!ExecCheckPermissions(rtes, perminfos, false)) return false; /* @@ -1436,9 +1440,9 @@ RI_Initial_Check(Trigger *trigger, Relation fk_rel, Relation pk_rel) */ if (!has_bypassrls_privilege(GetUserId()) && ((pk_rel->rd_rel->relrowsecurity && - !object_ownercheck(RelationRelationId, pkrte->relid, GetUserId())) || + !object_ownercheck(RelationRelationId, RelationGetRelid(pk_rel), GetUserId())) || (fk_rel->rd_rel->relrowsecurity && - !object_ownercheck(RelationRelationId, fkrte->relid, GetUserId() + !object_ownercheck(RelationRelationId,
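The invariant that the USE_ASSERT_CHECKING block above enforces can be sketched outside the executor as follows. This is an illustrative Python model, not the actual Node structures: the range table is reduced to a list of perminfoindex values (1-based, with 0 meaning "no permission info"), and the rules are the same ones the Asserts encode — every nonzero index must point at a distinct rteperminfos entry, and every entry must be referenced.

```python
def check_perminfo_mapping(perminfo_indexes, perminfos):
    """Model of the sanity check: perminfo_indexes holds each RTE's
    perminfoindex; perminfos stands in for the rteperminfos list."""
    seen = set()
    for idx in perminfo_indexes:
        if idx == 0:
            continue                              # RTE carries no permission info
        # getRTEPermissionInfo() would error out on a dangling index
        assert 1 <= idx <= len(perminfos), "dangling perminfoindex"
        # Many-to-one mapping not allowed
        assert idx not in seen, "perminfo referenced twice"
        seen.add(idx)
    # All rteperminfos entries must be referenced
    assert len(seen) == len(perminfos), "unreferenced perminfo entry"
    return True
```

Under this model, the RI_Initial_Check change passes because each lappend() to perminfos is immediately mirrored by an rte->perminfoindex set to the list's current length.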
Re: Small omission in type_sanity.sql
On 2023-Jan-27, Andres Freund wrote: > On 2023-01-27 20:39:04 -0500, Tom Lane wrote: > > Andres Freund writes: > > > Tom, is there a reason we run the various sanity tests early-ish in the > > > schedule? It does seem to reduce their effectiveness a bit... > > > > Originally, those tests were mainly needed to sanity-check the > > hand-maintained initial catalog data, so it made sense to run them > > early. > > It's also kinda useful to have some basic validity testing early on, because > if there's something wrong with the catalog values, it'll cause lots of issues > later. We can just list the tests twice in the schedule ... -- Álvaro Herrera, Breisgau, Deutschland — https://www.EnterpriseDB.com/
Re: A bug with ExecCheckPermissions
Hi Alvaro, On Thu, Mar 9, 2023 at 7:39 PM Alvaro Herrera wrote: > I didn't like very much this business of setting the perminfoindex > directly to '2' and '1'. It looks ugly with no explanation. What do > you think of creating the as we go along and set each index > correspondingly, as in the attached? Agree it looks cleaner and self-explanatory that way. Thanks. -- Thanks, Amit Langote EDB: http://www.enterprisedb.com
Re: [PATCH] Use indexes on the subscriber when REPLICA IDENTITY is full on the publisher
On Thu, Mar 9, 2023 at 3:26 PM Önder Kalacı wrote: > >> >> 4. >> +# Testcase start: SUBSCRIPTION BEHAVIOR WITH ENABLE_INDEXSCAN >> >> As we have removed enable_indexscan check, you should remove this test. > > > Hmm, I think my rebase problems are causing confusion here, which v38 fixes. > I think it is still not fixed in v38 as the test is still present in 0001. > In the first commit, we have ENABLE_INDEXSCAN checks. In the second commit, > I changed the same test to use enable_replica_identity_full_index_scan. > > If we are going to only consider the first patch to get into the master > branch, > I can probably drop the test. In that case, I'm not sure what is our > perspective > on ENABLE_INDEXSCAN GUC. Do we want to keep that guard in the code > (hence the test)? > I am not sure what we are going to do on this because I feel we need some storage option as you have in 0002 patch but you and Andres thinks that is not required. So, we can discuss that a bit more after 0001 is committed but if there is no agreement then we need to probably drop it. Even if drop it, I don't think using enable_index makes sense. I think for now you don't need to send 0002 patch, let's first try to get 0001 patch and then we can discuss about 0002. >> >> >> 5. In general, the line length seems to vary a lot for different >> multi-line comments. Though we are not strict in that it will look >> better if there is consistency in that (let's have ~80 cols line >> length for each comment in a single line). >> > > Went over the tests, and made ~80 cols. There is one exception, in the first > commit, the test for enable_indexcan is still shorter, but I failed to make > that > properly. I'll try to fix that as well, but I didn't want to block the > progress due to > that. > > Also, you have not noted, but I think SUBSCRIPTION USES INDEX WITH MULTIPLE > COLUMNS > already covers SUBSCRIPTION USES INDEX UPDATEs MULTIPLE ROWS. 
> > So, I changed the first one to SUBSCRIPTION USES INDEX WITH MULTIPLE ROWS AND > COLUMNS > and dropped the second one. Let me know if it does not make sense to you. If > I try, there are few more > opportunities to squeeze in some more tests together, but those would start > to complicate readability. > I still want to reduce the test time and will think about it. Which of the other tests do you think can be combined? BTW, did you consider updating the patch based on my yesterday's email [1]? One minor comment: +# now, create index and see that the index is used +$node_subscriber->safe_psql('postgres', + "CREATE INDEX test_replica_id_full_idx ON test_replica_id_full(x)"); + +# wait until the index is created +$node_subscriber->poll_query_until( + 'postgres', q{select count(*)=1 from pg_stat_all_indexes where indexrelname = 'test_replica_id_full_idx';} +) or die "Timed out while waiting for creating index test_replica_id_full_idx"; + +$node_publisher->safe_psql('postgres', + "UPDATE test_replica_id_full SET x = x + 1 WHERE x = 15;"); +$node_publisher->wait_for_catchup($appname); + + +# wait until the index is used on the subscriber +$node_subscriber->poll_query_until( + 'postgres', q{select (idx_scan = 1) from pg_stat_all_indexes where indexrelname = 'test_replica_id_full_idx';} +) or die "Timed out while waiting for check subscriber tap_sub_rep_full updates one row via index"; + + +# now, create index on column y as well +$node_subscriber->safe_psql('postgres', + "CREATE INDEX test_replica_id_full_idy ON test_replica_id_full(y)"); + +# wait until the index is created +$node_subscriber->poll_query_until( + 'postgres', q{select count(*)=1 from pg_stat_all_indexes where indexrelname = 'test_replica_id_full_idy';} +) or die "Timed out while waiting for creating index test_replica_id_full_idy"; It appears you are using inconsistent spacing. It may be better to use a single empty line wherever required. 
[1] - https://www.postgresql.org/message-id/CAA4eK1%2BoM_v-b_WDHZmqCyVHU2oD4j3vF9YcH9xVHj%3DzAfy4og%40mail.gmail.com -- With Regards, Amit Kapila.
Re: [PATCH] Use indexes on the subscriber when REPLICA IDENTITY is full on the publisher
On Thu, Mar 9, 2023 at 4:50 PM Amit Kapila wrote: > > On Thu, Mar 9, 2023 at 3:26 PM Önder Kalacı wrote: > > > > > > So, I changed the first one to SUBSCRIPTION USES INDEX WITH MULTIPLE ROWS > > AND COLUMNS > > and dropped the second one. Let me know if it does not make sense to you. > > If I try, there are few more > > opportunities to squeeze in some more tests together, but those would start > > to complicate readability. > > > > I still want to reduce the test time and will think about it. Which of > the other tests do you think can be combined? > Some of the ideas I can think of are as follows: 1. Combine "SUBSCRIPTION USES INDEX WITH MULTIPLE ROWS AND COLUMNS" and "SUBSCRIPTION USES INDEX WITH DROPPED COLUMNS" such that after verifying updates and deletes of the first test, we can drop some of the columns on both publisher and subscriber, then use alter subscription ... refresh publication command and then do the steps of the second test. Note that we don't add tables after initial setup, only changing schema. 2. We can also combine "Some NULL values" and "PUBLICATION LACKS THE COLUMN ON THE SUBS INDEX" as both use the same schema. After the first test, we need to drop the existing index and create a new index on the subscriber node. 3. General comment +# updates 200 rows +$node_publisher->safe_psql('postgres', + "UPDATE test_replica_id_full SET x = x + 1 WHERE x IN (5, 6);"); I think here you are updating 20 rows not 200. So, the comment seems wrong to me. Please think more and see if we can combine some other tests like "Unique index that is not primary key or replica identity" and the test we will have after comment#2 above. -- With Regards, Amit Kapila.
Re: [PATCH] Use indexes on the subscriber when REPLICA IDENTITY is full on the publisher
Hi, Amit Kapila wrote on Wed, 8 Mar 2023, 14:42: > On Wed, Mar 8, 2023 at 4:51 PM Önder Kalacı wrote: > > > > > >> > >> I just share this case and then we > >> can discuss should we pick the index which only contain the extra > columns on the > >> subscriber. > >> > > > > I think its performance implications come down to the discussion on [1]. > Overall, I prefer > > avoiding adding any additional complexity in the code for some edge > cases. The code > > can handle this sub-optimal user pattern, with a sub-optimal performance. > > > > It is fine to leave this and Hou-San's case if they make the patch > complex. However, it may be better to give it a try and see if this or > other regression/optimization can be avoided without adding much > complexity to the patch. You can prepare a top-up patch and then we > can discuss it. > > > Alright, I did some basic prototypes for the problems mentioned, just to show that these problems can be solved without too much hassle. But the patches are not complete: some tests fail, there are no comments or tests yet, some values should be cached, etc. I'm mostly sharing this as a heads-up on the progress, given that I have not responded to this specific mail. I'll update these when I have some extra time, after replying to the 0001 patch. Thanks, Onder wip_optimize_for_non_pkey_non_ri_unique_index.patch Description: Binary data wip_for_optimize_index_column_match.diff Description: Binary data
Re: [PATCH] Use indexes on the subscriber when REPLICA IDENTITY is full on the publisher
Hi Amit, > > > >> > >> 4. > >> +# Testcase start: SUBSCRIPTION BEHAVIOR WITH ENABLE_INDEXSCAN > >> > >> As we have removed enable_indexscan check, you should remove this test. > > > > > > Hmm, I think my rebase problems are causing confusion here, which v38 > fixes. > > > > I think it is still not fixed in v38 as the test is still present in 0001. > Ah, yes, sorry again for the noise. v39 will drop that. > > > In the first commit, we have ENABLE_INDEXSCAN checks. In the second > commit, > > I changed the same test to use enable_replica_identity_full_index_scan. > > > > If we are going to only consider the first patch to get into the master > branch, > > I can probably drop the test. In that case, I'm not sure what is our > perspective > > on ENABLE_INDEXSCAN GUC. Do we want to keep that guard in the code > > (hence the test)? > > > > I am not sure what we are going to do on this because I feel we need > some storage option as you have in 0002 patch but you and Andres > thinks that is not required. So, we can discuss that a bit more after > 0001 is committed but if there is no agreement then we need to > probably drop it. Even if drop it, I don't think using enable_index > makes sense. I think for now you don't need to send 0002 patch, let's > first try to get 0001 patch and then we can discuss about 0002. > sounds good, when needed I'll rebase 0002. > > > > Also, you have not noted, but I think SUBSCRIPTION USES INDEX WITH > MULTIPLE COLUMNS > > already covers SUBSCRIPTION USES INDEX UPDATEs MULTIPLE ROWS. > > > > So, I changed the first one to SUBSCRIPTION USES INDEX WITH MULTIPLE > ROWS AND COLUMNS > > and dropped the second one. Let me know if it does not make sense to > you. If I try, there are few more > > opportunities to squeeze in some more tests together, but those would > start to complicate readability. > > > > I still want to reduce the test time and will think about it. Which of > the other tests do you think can be combined? 
> > I'll follow your suggestion in the next e-mail [2], and focus on further improvements. BTW, did you consider updating the patch based on my yesterday's email [1]? > > Yes, replied to that one just now with some wip commits [1] > It appears you are using inconsistent spacing. It may be better to use > a single empty line wherever required. > > Sure, let me fix those. attached v39. [1] https://www.postgresql.org/message-id/CACawEhXGnk6v7UOHrxuJjjybHvvq33Zv666ouy4UzjPfJM6tBw%40mail.gmail.com [2] https://www.postgresql.org/message-id/CAA4eK1LSYWrthA3xjbrZvZVmwuha10HtM3-QRrVMD7YBt4t3pg%40mail.gmail.com v39_0001_use_index_on_subs_when_pub_rep_ident_full.patch Description: Binary data
Re: Avoid multiple SetLatch() calls in procsignal_sigusr1_handler()
Hi, On 2/28/23 4:30 PM, Bharath Rupireddy wrote: Hi, Most of the multiplexed SIGUSR1 handlers are setting latch explicitly when the procsignal_sigusr1_handler() can do that for them at the end. Right. These multiplexed handlers are currently being used as SIGUSR1 handlers, not as independent handlers, so no problem if SetLatch() is removed from them. Agree, they are only used in procsignal_sigusr1_handler(). A few others do it right by saying /* latch will be set by procsignal_sigusr1_handler */. Yeap, so do HandleProcSignalBarrierInterrupt() and HandleLogMemoryContextInterrupt(). Although calling SetLatch() in quick succession does no harm (it just returns if the latch was previously set), it seems unnecessary. Agree. I'm attaching a patch that avoids multiple SetLatch() calls. Thoughts? I agree with the idea behind the patch. The thing that worries me a bit is that the changed functions are defined as external, and so the change may have an impact outside of core PG; I'm not sure that's worth it. Otherwise the patch LGTM. Regards, -- Bertrand Drouvot PostgreSQL Contributors Team RDS Open Source Databases Amazon Web Services: https://aws.amazon.com
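The idempotence that makes this refactoring safe can be illustrated with a toy latch. This is only a model of the set-once semantics (a Python Event standing in for a process latch), not of the real kernel-level wakeup path; it shows why moving the single set() call to the end of the dispatcher preserves behavior while dropping the redundant calls from the individual handlers.

```python
import threading

class Latch:
    """Toy analogue of a process latch: set() is idempotent, so calling it
    from every multiplexed handler and again from the dispatcher is
    harmless but redundant."""
    def __init__(self):
        self._event = threading.Event()
        self.transitions = 0          # count actual unset -> set transitions

    def set(self):
        if not self._event.is_set():
            self.transitions += 1
        self._event.set()             # no-op if already set, like SetLatch()

    def is_set(self):
        return self._event.is_set()

def sigusr1_dispatcher(latch, handlers):
    """Model of procsignal_sigusr1_handler(): the multiplexed handlers no
    longer set the latch themselves; one call at the end suffices."""
    for handler in handlers:
        handler()
    latch.set()
```

Whether the handlers also call latch.set() or not, the observable outcome is one latch transition per dispatch, which is the basis of the "does no harm, but unnecessary" observation above.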
Re: pg_stat_bgwriter.buffers_backend is pretty meaningless (and more?)
On Tue, Mar 07, 2023 at 10:18:44AM -0800, Andres Freund wrote: > Hi, > > On 2023-03-06 15:21:14 -0500, Melanie Plageman wrote: > > Good point. Attached is what you suggested. I committed the transaction > > before the drop table so that the statistics would be visible when we > > queried pg_stat_io. > > Pushed, thanks for the report, analysis and fix, Tom, Horiguchi-san, Melanie. There's a 2nd portion of the test that's still flapping, at least on cirrusci. The issue that Tom mentioned is at: SELECT :io_sum_shared_after_writes > :io_sum_shared_before_writes; But what I've seen on cirrusci is at: SELECT :io_sum_shared_after_writes > :io_sum_shared_before_writes; https://api.cirrus-ci.com/v1/artifact/task/6701069548388352/log/src/test/recovery/tmp_check/regression.diffs https://api.cirrus-ci.com/v1/artifact/task/5355168397524992/log/src/test/recovery/tmp_check/regression.diffs https://api.cirrus-ci.com/v1/artifact/task/6142435751886848/testrun/build/testrun/recovery/027_stream_regress/log/regress_log_027_stream_regress It'd be neat if cfbot could show a histogram of test failures, although I'm not entirely sure what granularity would be most useful: the test that failed (027_regress) or the way it failed (:after_write > :before_writes). Maybe it's enough to show the test, with links to its recent failures. -- Justin
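The two granularities floated above — bucketing by the test that failed versus by the way it failed — can be sketched with a small aggregation. Everything here is hypothetical: cfbot does not expose failures as structured records, so the (test_name, symptom) pairs and the function are an assumed format for illustration only.

```python
from collections import Counter

def failure_histogram(failures, granularity="test"):
    """Bucket recent failure records either by failing test or by symptom.
    `failures` is an assumed list of (test_name, symptom) pairs."""
    key = 0 if granularity == "test" else 1
    return Counter(record[key] for record in failures)

# Hypothetical recent-failure data, modeled on the reports in this thread.
failures = [
    ("027_stream_regress", ":after_writes > :before_writes"),
    ("027_stream_regress", ":after_writes > :before_writes"),
    ("027_stream_regress", "timeout waiting for catchup"),
]
by_test = failure_histogram(failures, "test")
by_symptom = failure_histogram(failures, "symptom")
```

With per-test buckets linking to the underlying logs, the coarse view would still let someone drill down to the per-symptom view when a test flaps in more than one way.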
Re: Refactor calculations to use instr_time
Hi, There was a warning while applying the patch, v5 is attached. Regards, Nazir Bilal Yavuz Microsoft From dcd49e48a0784a95b8731df1c6ee7c3a612a8529 Mon Sep 17 00:00:00 2001 From: Nazir Bilal Yavuz Date: Thu, 9 Mar 2023 15:35:38 +0300 Subject: [PATCH v5] Refactor instr_time calculations In instr_time.h it is stated that: * When summing multiple measurements, it's recommended to leave the * running sum in instr_time form (ie, use INSTR_TIME_ADD or * INSTR_TIME_ACCUM_DIFF) and convert to a result format only at the end. So, this patch aims to refactor 'PendingWalStats' to use 'instr_time' instead of 'PgStat_Counter' while accumulating 'wal_write_time' and 'wal_sync_time'. Also, some calculations are refactored to use 'INSTR_TIME_ACCUM_DIFF' instead of using 'INSTR_TIME_SUBTRACT' and 'INSTR_TIME_ADD'. --- src/backend/access/transam/xlog.c | 6 ++--- src/backend/storage/file/buffile.c | 6 ++--- src/backend/utils/activity/pgstat_wal.c | 31 + src/include/pgstat.h| 17 +- src/tools/pgindent/typedefs.list| 1 + 5 files changed, 37 insertions(+), 24 deletions(-) diff --git a/src/backend/access/transam/xlog.c b/src/backend/access/transam/xlog.c index 543d4d897ae..46821ad6056 100644 --- a/src/backend/access/transam/xlog.c +++ b/src/backend/access/transam/xlog.c @@ -2206,8 +2206,7 @@ XLogWrite(XLogwrtRqst WriteRqst, TimeLineID tli, bool flexible) instr_time duration; INSTR_TIME_SET_CURRENT(duration); - INSTR_TIME_SUBTRACT(duration, start); - PendingWalStats.wal_write_time += INSTR_TIME_GET_MICROSEC(duration); + INSTR_TIME_ACCUM_DIFF(PendingWalStats.wal_write_time, duration, start); } PendingWalStats.wal_write++; @@ -8204,8 +8203,7 @@ issue_xlog_fsync(int fd, XLogSegNo segno, TimeLineID tli) instr_time duration; INSTR_TIME_SET_CURRENT(duration); - INSTR_TIME_SUBTRACT(duration, start); - PendingWalStats.wal_sync_time += INSTR_TIME_GET_MICROSEC(duration); + INSTR_TIME_ACCUM_DIFF(PendingWalStats.wal_sync_time, duration, start); } PendingWalStats.wal_sync++; diff --git 
a/src/backend/storage/file/buffile.c b/src/backend/storage/file/buffile.c index 0a51624df3b..e55f86b675e 100644 --- a/src/backend/storage/file/buffile.c +++ b/src/backend/storage/file/buffile.c @@ -469,8 +469,7 @@ BufFileLoadBuffer(BufFile *file) if (track_io_timing) { INSTR_TIME_SET_CURRENT(io_time); - INSTR_TIME_SUBTRACT(io_time, io_start); - INSTR_TIME_ADD(pgBufferUsage.temp_blk_read_time, io_time); + INSTR_TIME_ACCUM_DIFF(pgBufferUsage.temp_blk_read_time, io_time, io_start); } /* we choose not to advance curOffset here */ @@ -544,8 +543,7 @@ BufFileDumpBuffer(BufFile *file) if (track_io_timing) { INSTR_TIME_SET_CURRENT(io_time); - INSTR_TIME_SUBTRACT(io_time, io_start); - INSTR_TIME_ADD(pgBufferUsage.temp_blk_write_time, io_time); + INSTR_TIME_ACCUM_DIFF(pgBufferUsage.temp_blk_write_time, io_time, io_start); } file->curOffset += bytestowrite; diff --git a/src/backend/utils/activity/pgstat_wal.c b/src/backend/utils/activity/pgstat_wal.c index e8598b2f4e0..58daae3fbd6 100644 --- a/src/backend/utils/activity/pgstat_wal.c +++ b/src/backend/utils/activity/pgstat_wal.c @@ -21,7 +21,7 @@ #include "executor/instrument.h" -PgStat_WalStats PendingWalStats = {0}; +PgStat_PendingWalStats PendingWalStats = {0}; /* * WAL usage counters saved from pgWALUsage at the previous call to @@ -70,7 +70,7 @@ bool pgstat_flush_wal(bool nowait) { PgStatShared_Wal *stats_shmem = &pgStatLocal.shmem->wal; - WalUsage diff = {0}; + WalUsage wal_usage_diff = {0}; Assert(IsUnderPostmaster || !IsPostmasterEnvironment); Assert(pgStatLocal.shmem != NULL && @@ -88,25 +88,26 @@ pgstat_flush_wal(bool nowait) * Calculate how much WAL usage counters were increased by subtracting the * previous counters from the current ones. 
*/ - WalUsageAccumDiff(&diff, &pgWalUsage, &prevWalUsage); - PendingWalStats.wal_records = diff.wal_records; - PendingWalStats.wal_fpi = diff.wal_fpi; - PendingWalStats.wal_bytes = diff.wal_bytes; + WalUsageAccumDiff(&wal_usage_diff, &pgWalUsage, &prevWalUsage); if (!nowait) LWLockAcquire(&stats_shmem->lock, LW_EXCLUSIVE); else if (!LWLockConditionalAcquire(&stats_shmem->lock, LW_EXCLUSIVE)) return true; -#define WALSTAT_ACC(fld) stats_shmem->stats.fld += PendingWalStats.fld - WALSTAT_ACC(wal_records); - WALSTAT_ACC(wal_fpi); - WALSTAT_ACC(wal_bytes); - WALSTAT_ACC(wal_buffers_full); - WALSTAT_ACC(wal_write); - WALSTAT_ACC(wal_sync); - WALSTAT_ACC(wal_write_time); - WALSTAT_ACC(wal_sync_time); +#define WALSTAT_ACC(fld, var_to_add) \ + (stats_shmem->stats.fld += var_to_add.fld) +#define WALLSTAT_ACC_INSTR_TIME_TYPE(fld) \ + (stats_shmem->stats.fld += INSTR_TIME_GET_MICROSEC(PendingWalStats.fld)) + WALSTAT_ACC(wal_records, wal_usage_diff); + WALSTAT_ACC(wal_fpi, wal_usage_diff); + WALSTAT_ACC(wal_bytes, wal_usage_diff); + WALSTAT_ACC(wal_buffers_full, P
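The accumulation pattern the patch adopts — quoted from instr_time.h in the commit message — is: keep the running sum in the clock's native representation and convert to the reporting unit only once at the end. A minimal sketch of that discipline, with integer nanoseconds standing in for instr_time and the final division standing in for a single INSTR_TIME_GET_MICROSEC at flush time:

```python
import time

def timed_total_microsec(operations):
    """Time a series of callables, accumulating in native units
    (integer nanoseconds) and converting to microseconds exactly once,
    instead of converting (and rounding) on every iteration."""
    total_ns = 0
    for op in operations:
        start = time.perf_counter_ns()
        op()
        # analogous to INSTR_TIME_ACCUM_DIFF(total, now, start)
        total_ns += time.perf_counter_ns() - start
    # analogous to one INSTR_TIME_GET_MICROSEC at the end
    return total_ns // 1000
```

Converting per iteration would discard the sub-microsecond remainder of every measurement; accumulating natively loses at most one conversion's worth of rounding for the whole series, which is the motivation for moving PendingWalStats to instr_time.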
Re: SQL/JSON revisited
On 08.03.23 22:40, Andrew Dunstan wrote: There are probably some fairly easy opportunities to reduce the number of non-terminals introduced here (e.g. I think json_aggregate_func could possibly be expanded in place without introducing json_object_aggregate_constructor and json_array_aggregate_constructor). I'm going to make an attempt at that, at least to pick some low hanging fruit. But in the end I think we are going to be left with a significant expansion of the grammar rules, more or less forced on us by the way the SQL Standards Committee rather profligately invents new ways of contorting the grammar. I browsed these patches, and I agree that the grammar is the thing that sticks out as something that could be tightened up a bit. Try to reduce the number of different symbols, and check that the keywords are all in alphabetical order. There are also various bits of code that are commented out, in some cases because they can't be implemented, in some cases without explanation. I think these should all be removed. Otherwise, whoever needs to touch this code next would be under some sort of obligation to keep the commented-out code up to date with surrounding changes, which would be awkward. We can find better ways to explain missing functionality and room for improvement. Also, perhaps we can find better names for the new test files. Like, what does "sqljson.sql" mean, as opposed to, say, "json.sql"? Maybe something like "json_functions", "json_expressions", etc. would be clearer. (Starting it with "json" would also group the files better.) These both seem like things not worth holding up progress for, and I think it would be good to get these patches committed as soon as possible. My intention is to commit them (after some grammar adjustments) plus their documentation in the next few days. If possible, the documentation for each incremental part should be part of that patch, not a separate all-in-one patch.
Re: Should vacuum process config file reload more often
On Thu, Mar 9, 2023 at 4:47 PM John Naylor wrote: > > > On Thu, Mar 9, 2023 at 12:42 AM Jim Nasby wrote: > > > > Doesn't the dead tuple space grow as needed? Last I looked we don't > > allocate up to 1GB right off the bat. > > Incorrect. > > > Of course, if the patch that eliminates the 1GB vacuum limit gets committed > > the situation will be even worse. > > If you're referring to the proposed tid store, I'd be interested in seeing a > reproducible test case with a m_w_m over 1GB where it makes things worse than > the current state of affairs. And I think that the tidstore makes it easy to react to maintenance_work_mem changes. We don't need to enlarge it and just update its memory limit at an appropriate time. Regards, -- Masahiko Sawada Amazon Web Services: https://aws.amazon.com
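The point that the tidstore need not be enlarged — only its memory limit updated — can be sketched with a toy bounded store. This is an assumed model, not the proposed tidstore's API: the structure grows on demand, so reacting to a reloaded maintenance_work_mem is just a matter of changing the bound that admission is checked against.

```python
class DeadTupleStore:
    """Toy grow-on-demand store with a soft memory budget."""
    def __init__(self, limit_bytes):
        self.limit_bytes = limit_bytes
        self.used_bytes = 0

    def update_limit(self, new_limit_bytes):
        # Reacting to a config reload needs no reallocation or copying:
        # only the comparison bound changes.
        self.limit_bytes = new_limit_bytes

    def add(self, nbytes):
        if self.used_bytes + nbytes > self.limit_bytes:
            return False      # caller would trigger an index-vacuum pass
        self.used_bytes += nbytes
        return True
```

A lowered limit takes effect at the next add(), and a raised limit lets an in-progress vacuum keep accumulating without restarting — the "update its memory limit at an appropriate time" behavior described above.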
Re: Add shared buffer hits to pg_stat_io
On Wed, Mar 8, 2023 at 2:23 PM Andres Freund wrote: > > On 2023-03-08 13:44:32 -0500, Melanie Plageman wrote: > > However, I am concerned that, while unlikely, this could be flakey. > > Something could happen to force all of those blocks out of shared > > buffers (even though they were just read in) before we hit them. > > You could make the test query a simple nested loop self-join, that'll prevent > the page being evicted, because it'll still be pinned on the outer side, while > generating hits on the inner side. Good idea. v3 attached. From e15bd63d662780898ef6fd584608acb13a12dbe1 Mon Sep 17 00:00:00 2001 From: Melanie Plageman Date: Sat, 25 Feb 2023 14:37:25 -0500 Subject: [PATCH v3 1/2] Reorder pgstatfuncs-local enum --- src/backend/utils/adt/pgstatfuncs.c | 8 1 file changed, 4 insertions(+), 4 deletions(-) diff --git a/src/backend/utils/adt/pgstatfuncs.c b/src/backend/utils/adt/pgstatfuncs.c index b61a12382b..1112bc2904 100644 --- a/src/backend/utils/adt/pgstatfuncs.c +++ b/src/backend/utils/adt/pgstatfuncs.c @@ -1276,16 +1276,16 @@ pgstat_get_io_op_index(IOOp io_op) { case IOOP_EVICT: return IO_COL_EVICTIONS; + case IOOP_EXTEND: + return IO_COL_EXTENDS; + case IOOP_FSYNC: + return IO_COL_FSYNCS; case IOOP_READ: return IO_COL_READS; case IOOP_REUSE: return IO_COL_REUSES; case IOOP_WRITE: return IO_COL_WRITES; - case IOOP_EXTEND: - return IO_COL_EXTENDS; - case IOOP_FSYNC: - return IO_COL_FSYNCS; } elog(ERROR, "unrecognized IOOp value: %d", io_op); -- 2.37.2 From b648727abe315375fbf13d314fd3b9398a75b26f Mon Sep 17 00:00:00 2001 From: Melanie Plageman Date: Sat, 25 Feb 2023 14:36:06 -0500 Subject: [PATCH v3 2/2] Track shared buffer hits in pg_stat_io Count new IOOP_HITs and add "hits" column to pg_stat_io. 
Reviewed-by: Bertrand Drouvot Discussion: https://www.postgresql.org/message-id/flat/CAAKRu_beMa9Hzih40%3DXPYqhDVz6tsgUGTrhZXRo%3Dunp%2Bszb%3DUA%40mail.gmail.com --- doc/src/sgml/monitoring.sgml | 11 src/backend/catalog/system_views.sql | 1 + src/backend/storage/buffer/bufmgr.c| 38 ++ src/backend/storage/buffer/localbuf.c | 11 ++-- src/backend/utils/activity/pgstat_io.c | 2 +- src/backend/utils/adt/pgstatfuncs.c| 3 ++ src/include/catalog/pg_proc.dat| 6 ++-- src/include/pgstat.h | 1 + src/include/storage/buf_internals.h| 2 +- src/test/regress/expected/rules.out| 3 +- src/test/regress/expected/stats.out| 34 +-- src/test/regress/sql/stats.sql | 21 -- 12 files changed, 90 insertions(+), 43 deletions(-) diff --git a/doc/src/sgml/monitoring.sgml b/doc/src/sgml/monitoring.sgml index 6249bb50d0..8b34ca60bc 100644 --- a/doc/src/sgml/monitoring.sgml +++ b/doc/src/sgml/monitoring.sgml @@ -3855,6 +3855,17 @@ SELECT pid, wait_event_type, wait_event FROM pg_stat_activity WHERE wait_event i + + + +hits bigint + + +The number of times a desired block was found in a shared buffer. 
+ + + + diff --git a/src/backend/catalog/system_views.sql b/src/backend/catalog/system_views.sql index 34ca0e739f..87bbbdfcb3 100644 --- a/src/backend/catalog/system_views.sql +++ b/src/backend/catalog/system_views.sql @@ -1126,6 +1126,7 @@ SELECT b.writes, b.extends, b.op_bytes, + b.hits, b.evictions, b.reuses, b.fsyncs, diff --git a/src/backend/storage/buffer/bufmgr.c b/src/backend/storage/buffer/bufmgr.c index 0a05577b68..05fd3c9a2a 100644 --- a/src/backend/storage/buffer/bufmgr.c +++ b/src/backend/storage/buffer/bufmgr.c @@ -472,7 +472,7 @@ static BufferDesc *BufferAlloc(SMgrRelation smgr, ForkNumber forkNum, BlockNumber blockNum, BufferAccessStrategy strategy, - bool *foundPtr, IOContext *io_context); + bool *foundPtr, IOContext io_context); static void FlushBuffer(BufferDesc *buf, SMgrRelation reln, IOObject io_object, IOContext io_context); static void FindAndDropRelationBuffers(RelFileLocator rlocator, @@ -850,13 +850,14 @@ ReadBuffer_common(SMgrRelation smgr, char relpersistence, ForkNumber forkNum, if (isLocalBuf) { /* - * LocalBufferAlloc() will set the io_context to IOCONTEXT_NORMAL. We - * do not use a BufferAccessStrategy for I/O of temporary tables. + * We do not use a BufferAccessStrategy for I/O of temporary tables. * However, in some cases, the "strategy" may not be NULL, so we can't * rely on IOContextForStrategy() to set the right IOContext for us. * This may happen in cases like CREATE TEMPORARY TABLE AS... */ - bufHdr = LocalBufferAlloc(smgr, forkNum, blockNum, &found, &io_context); + io_context = IOCONTEXT_NORMAL; + io_object = IOOBJECT_TEMP_RELATION; + bufHdr = LocalBufferAlloc(smgr, forkNum, blockNum, &found); if (found) pgBufferUsage.lo
Re: buildfarm + meson
On 2023-03-08 We 17:23, Andrew Dunstan wrote: On 2023-03-08 We 14:22, Andres Freund wrote: Hi, On 2023-03-02 17:35:26 -0500, Andrew Dunstan wrote: On 2023-03-02 Th 17:06, Andres Freund wrote: Hi On 2023-03-02 17:00:47 -0500, Andrew Dunstan wrote: On 2023-03-01 We 16:32, Andres Freund wrote: This is now working on my MSVC test rig (WS2019, VS2019, Strawberry Perl), including TAP tests. I do get a whole lot of annoying messages like this: Unknown TAP version. The first line MUST be `TAP version `. Assuming version 12. The newest minor version has fixed that, it was a misunderstanding about / imprecision in the tap 14 specification. Unfortunately, meson v 1.0.1 appears to be broken on Windows, I had to downgrade back to 1.0.0. Is it possible that you're using a PG checkout from a few days ago? A hack I used was invalidated by 1.0.1, but I fixed that already. CI is running with 1.0.1: https://cirrus-ci.com/task/5806561726038016?logs=configure#L8 No, running against PG master tip. I'll get some details - it's not too hard to switch back and forth. Any more details? I was held up by difficulties even with meson 1.0.0 (the test modules stuff). Now I again have a clean build with meson 1.0.0 on Windows as a baseline I will get back to trying meson 1.0.1. OK, I have now got a clean run using meson 1.0.1 / MSVC. Not sure what made the difference. One change I did make was to stop using "--backend vs" and thus use the ninja backend even for MSVC. That proved necessary to run the new install-test-files target which failed miserably with "--backend vs". Not sure if we have it documented, but if not it should be that you need to use the ninja backend on all platforms. At this stage I think I'm prepared to turn this loose on a couple of my buildfarm animals, and if nothing goes awry for the remainder of the month merge the dev/meson branch and push a new release. There is still probably a little polishing to do, especially w.r.t. log file artefacts. 
cheers andrew -- Andrew Dunstan EDB:https://www.enterprisedb.com
Re: meson: Non-feature feature options
On 03.03.23 11:01, Nazir Bilal Yavuz wrote: On Fri, 3 Mar 2023 at 12:16, Peter Eisentraut wrote: On 02.03.23 11:41, Nazir Bilal Yavuz wrote: I am kind of confused. I added these checks for considering other SSL implementations in the future, for this reason I have two nested if checks. The top one is for checking if we need to search an SSL library and the nested one is for checking if we need to search this specific SSL library. What do you think? I suppose that depends on how you envision integrating other SSL libraries into this logic. It's not that important right now; if the structure makes sense to you, that's fine. Please send an updated patch with the small changes that have been mentioned. The updated patch is attached. This seems to work well. One flaw, the "External libraries" summary shows something like ssl: YES 3.0.7 It would be nice if it showed "openssl". How about we just hardcode "openssl" here instead? We could build that array dynamically, of course, but maybe we leave that until we actually have a need?
Re: meson: Non-feature feature options
> On 9 Mar 2023, at 14:45, Peter Eisentraut > wrote: > How about we just hardcode "openssl" here instead? We could build that array > dynamically, of course, but maybe we leave that until we actually have a need? At least for 16 keeping it hardcoded is an entirely safe bet so +1 for leaving additional complexity for when needed. -- Daniel Gustafsson
Re: meson: Non-feature feature options
Hi, On Thu, 9 Mar 2023 at 16:54, Daniel Gustafsson wrote: > > > On 9 Mar 2023, at 14:45, Peter Eisentraut > > wrote: > > > How about we just hardcode "openssl" here instead? We could build that > > array dynamically, of course, but maybe we leave that until we actually > > have a need? > > At least for 16 keeping it hardcoded is an entirely safe bet so +1 for leaving > additional complexity for when needed. We already have the 'ssl_library' variable. Can't we use that instead of hardcoding 'openssl'? e.g: summary( { 'ssl': ssl.found() ? [ssl, '(@0@)'.format(ssl_library)] : ssl, }, section: 'External libraries', list_sep: ', ', ) And it will output: ssl: YES 3.0.8, (openssl) I don't think that using 'ssl_library' will increase the complexity. Regards, Nazir Bilal Yavuz Microsoft
Re: postgres_fdw, dblink, and CREATE SUBSCRIPTION security
On Wed, Mar 8, 2023 at 5:44 PM Jacob Champion wrote: > > On Wed, Mar 8, 2023 at 11:40 AM Robert Haas wrote: > > On Wed, Mar 8, 2023 at 2:30 PM Jacob Champion > > wrote: > > > I don't think I necessarily like that option better than SASL-style, > > > but hopefully that clarifies it somewhat? > > > > Hmm, yeah, I guess that's OK. > > Okay, cool. > > > I still don't love it, though. It feels > > more solid to me if the proxy can actually block the connections > > before they even happen, without having to rely on a server > > interaction to figure out what is permissible. > > Sure. I don't see a way for the proxy to figure that out by itself, > though, going back to my asymmetry argument from before. Only the > server truly knows, at time of HBA processing, whether the proxy > itself has authority. If the proxy knew, it wouldn't be confused. > > > I don't know what you mean by SASL-style, exactly. > > That's the one where the server explicitly names all forms of > authentication, including the ambient ones (ANONYMOUS, EXTERNAL, > etc.), and requires the client to choose one before running any > actions on their behalf. That lets the require_auth machinery work for > this case, too. > > --Jacob -- Robert Haas EDB: http://www.enterprisedb.com
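Jacob's "SASL-style" idea above can be made concrete with a small sketch: the server names every authentication form it would accept, ambient authority included, and the proxy only proceeds when a non-ambient method that its operator explicitly allows (via something like libpq's require_auth) is available. The method names and helper below are illustrative only, not a real libpq or server API.

```python
# Sketch of the "SASL-style" negotiation described above: the server advertises
# all auth forms, including ambient ones, and the proxy refuses to proceed
# unless a non-ambient method permitted by its operator is offered.
AMBIENT = {"ANONYMOUS", "EXTERNAL", "trust", "peer"}  # illustrative set

def proxy_should_proceed(server_offered, require_auth):
    """require_auth: the methods the proxy's operator permits (in the spirit
    of libpq's require_auth option). Ambient authority is always rejected."""
    acceptable = sorted(m for m in server_offered
                        if m in require_auth and m not in AMBIENT)
    return acceptable[0] if acceptable else None

print(proxy_should_proceed({"ANONYMOUS", "SCRAM-SHA-256"}, {"SCRAM-SHA-256"}))  # SCRAM-SHA-256
print(proxy_should_proceed({"trust"}, {"SCRAM-SHA-256"}))  # None -> refuse
```

The point of the design is visible in the second call: when the only way in is ambient authority, the proxy declines rather than lending its own credentials.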
Re: meson: Non-feature feature options
> On 9 Mar 2023, at 15:12, Nazir Bilal Yavuz wrote: > > Hi, > > On Thu, 9 Mar 2023 at 16:54, Daniel Gustafsson wrote: >> >>> On 9 Mar 2023, at 14:45, Peter Eisentraut >>> wrote: >> >>> How about we just hardcode "openssl" here instead? We could build that >>> array dynamically, of course, but maybe we leave that until we actually >>> have a need? >> >> At least for 16 keeping it hardcoded is an entirely safe bet so +1 for >> leaving >> additional complexity for when needed. > > We already have the 'ssl_library' variable. Can't we use that instead > of hardcoding 'openssl'? e.g: > > summary( > { >'ssl': ssl.found() ? [ssl, '(@0@)'.format(ssl_library)] : ssl, > }, > section: 'External libraries', > list_sep: ', ', > ) > > And it will output: > ssl: YES 3.0.8, (openssl) > > I don't think that using 'ssl_library' will increase the complexity. That seems like a good idea. -- Daniel Gustafsson
Re: postgres_fdw, dblink, and CREATE SUBSCRIPTION security
On Wed, Mar 8, 2023 at 5:44 PM Jacob Champion wrote: > Sure. I don't see a way for the proxy to figure that out by itself, > though, going back to my asymmetry argument from before. Only the > server truly knows, at time of HBA processing, whether the proxy > itself has authority. If the proxy knew, it wouldn't be confused. That seems like a circular argument. If you call the problem the confused deputy problem then the issue must indeed be that the deputy is confused, and needs to talk to someone else to get un-confused. But why is the deputy necessarily confused in the first place? Our deputy is confused because our code to decide whether to proxy a connection or not is super-dumb, but if there's an intrinsic reason it can't be smarter, I don't understand what it is. -- Robert Haas EDB: http://www.enterprisedb.com
Re: meson: Non-feature feature options
On 09.03.23 15:12, Nazir Bilal Yavuz wrote: Hi, On Thu, 9 Mar 2023 at 16:54, Daniel Gustafsson wrote: On 9 Mar 2023, at 14:45, Peter Eisentraut wrote: How about we just hardcode "openssl" here instead? We could build that array dynamically, of course, but maybe we leave that until we actually have a need? At least for 16 keeping it hardcoded is an entirely safe bet so +1 for leaving additional complexity for when needed. We already have the 'ssl_library' variable. Can't we use that instead of hardcoding 'openssl'? e.g: summary( { 'ssl': ssl.found() ? [ssl, '(@0@)'.format(ssl_library)] : ssl, }, section: 'External libraries', list_sep: ', ', ) And it will output: ssl: YES 3.0.8, (openssl) I don't think that using 'ssl_library' will increase the complexity. Then we might as well use ssl_library as the key, like: { ... 'selinux': selinux, ssl_library: ssl, 'systemd': systemd, ... }
Re: Track IO times in pg_stat_io
> >>> Now I've a second thought: what do you think about resetting the related > >>> number > >>> of operations and *_time fields when enabling/disabling track_io_timing? > >>> (And mention it in the doc). > >>> > >>> That way it'd prevent bad interpretation (at least as far as the time per > >>> operation metrics are concerned). > >>> > >>> Thinking that way as we'd lose some (most?) benefits of the new *_time > >>> columns > >>> if one can't "trust" their related operations and/or one is not sampling > >>> pg_stat_io frequently enough (to discard the samples > >>> where the track_io_timing changes occur). > >>> > >>> But well, resetting the operations could also lead to bad interpretation > >>> about the operations... > >>> > >>> Not sure about which approach I like the most yet, what do you think? > >> > >> Oh, this is an interesting idea. I think you are right about the > >> synchronization issues making the statistics untrustworthy and, thus, > >> unusable. > > > > No, I don't think we can do that. It can be enabled on a per-session basis. > Oh right. So it's even less clear to me how one would make use of > those new *_time fields, given that: > - pg_stat_io is "global" across all sessions. So, even if one session is > doing some "testing" and needs to turn track_io_timing on, then it > is not even sure it's only reflecting its own testing (as other sessions may > have turned it on too). > - There is the risk mentioned above of bad interpretations for the "time per > operation" metrics. > - Even if there is frequent enough sampling of pg_stat_io, one does not > know which samples contain track_io_timing changes (at the cluster or session > level). As long as track_io_timing can be toggled, blk_write_time could lead to wrong conclusions. I think it may be helpful to separately track the blks_read that occur while track_io_timing is enabled. 
blks_read will be as is and give the overall blks_read, while a new column blks_read_with_timing will only report on blks_read with track_io_timing enabled. blks_read_with_timing should never be larger than blks_read. This will then make the blks_read_time valuable if it's looked at with the blks_read_with_timing column. Regards, -- Sami Imseih Amazon Web Services (AWS)
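The invariant Sami describes, blks_read_with_timing never exceeding blks_read, is easy to model. The sketch below is a toy illustration of the proposed counter pair; the column names come from the proposal above, not from any committed PostgreSQL code.

```python
class ReadStats:
    """Toy model of the proposed counters (names are hypothetical,
    taken from the proposal above, not from real PostgreSQL code)."""

    def __init__(self):
        self.blks_read = 0               # every read, timed or not
        self.blks_read_with_timing = 0   # only reads made while timing was on
        self.blk_read_time_ms = 0.0      # accumulated only while timing was on

    def record_read(self, track_io_timing, elapsed_ms):
        self.blks_read += 1
        if track_io_timing:
            self.blks_read_with_timing += 1
            self.blk_read_time_ms += elapsed_ms

    def avg_ms_per_timed_read(self):
        # Dividing by blks_read would understate the average whenever
        # track_io_timing was off for part of the workload; dividing by
        # the timed counter keeps the ratio meaningful.
        if self.blks_read_with_timing == 0:
            return None
        return self.blk_read_time_ms / self.blks_read_with_timing

stats = ReadStats()
for i in range(100):
    # timing enabled for only half the reads, as if toggled mid-workload
    stats.record_read(track_io_timing=(i % 2 == 0), elapsed_ms=0.5)

assert stats.blks_read_with_timing <= stats.blks_read
print(stats.blks_read, stats.blks_read_with_timing, stats.avg_ms_per_timed_read())
# 100 50 0.5
```

Even though half the reads went untimed, the time-per-read ratio stays at the true 0.5 ms because it is computed against the timed counter only.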
Re: Improve logging when using Huge Pages
Greetings, * Nathan Bossart (nathandboss...@gmail.com) wrote: > On Wed, Feb 15, 2023 at 10:13:17AM -0800, Nathan Bossart wrote: > > On Tue, Feb 14, 2023 at 07:32:56PM -0600, Justin Pryzby wrote: > >> On Mon, Feb 13, 2023 at 08:18:52PM -0800, Nathan Bossart wrote: > >>> I'm curious why you chose to make this a string instead of an enum. There > >>> might be little practical difference, but since there are only three > >>> possible values, I wonder if it'd be better form to make it an enum. > >> > >> It takes more code to write as an enum - see 002.txt. I'm not convinced > >> this is better. > >> > >> But your comment made me fix its , and reconsider the strings, > >> which I changed to active={unknown/true/false} rather than {unk/on/off}. > >> It could also be active={unknown/yes/no}... > > > > I think unknown/true/false is fine. I'm okay with using a string if no one > > else thinks it should be an enum (or a bool). > > There's been no response for this, so I guess we can proceed with a string > GUC. Just happened to see this and I'm not really a fan of this being a string when it's pretty clear that's not what it actually is. > +Reports whether huge pages are in use by the current instance. > +See for more information. > > I still think we should say "server" in place of "current instance" here. We certainly use 'server' a lot more in config.sgml than we do 'instance'. "currently running server" might be closer to how we describe a running PG system in other parts (we talk about "currently running server processes", "while the server is running", "When running a standby server", "when the server is running"; "instance" is used much less and seems to more typically refer to 'state of files on disk' in my reading vs. 'actively running process' though there's some of each). 
> + {"huge_pages_active", PGC_INTERNAL, PRESET_OPTIONS, > + gettext_noop("Indicates whether huge pages are in > use."), > + NULL, > + GUC_NOT_IN_SAMPLE | GUC_DISALLOW_IN_FILE | > GUC_RUNTIME_COMPUTED > + }, > > I don't think we need to use GUC_RUNTIME_COMPUTED. 'postgres -C' seems to > always report "unknown" for this GUC, so all this would do is cause that > command to error unnecessarily when the server is running. ... or we could consider adjusting things to actually try the mmap() and find out if we'd end up with huge pages or not. Naturally we'd only want to do that if the server isn't running though and erroring if it is would be perfectly correct. Either that or just refusing to provide it by an error or other approach (see below) seems entirely reasonable. > It might be worth documenting exactly what "unknown" means. IIUC you'll > only ever see "on" or "off" via SHOW or pg_settings, which doesn't seem > tremendously obvious. If we could get rid of that case and just make this a boolean, that seems like it'd really be the best answer. To that end- perhaps this isn't appropriate as a GUC at all? Why not have this just be a system information function? Functionally it really provides the same info- if the server is running then you get back either true or false, if it's not then you can't call it but that's hardly different from an unknown or error result. Thanks, Stephen
Re: meson: Non-feature feature options
Hi, On Thu, 9 Mar 2023 at 17:18, Peter Eisentraut wrote: > > On 09.03.23 15:12, Nazir Bilal Yavuz wrote: > > Hi, > > > > On Thu, 9 Mar 2023 at 16:54, Daniel Gustafsson wrote: > >> > >>> On 9 Mar 2023, at 14:45, Peter Eisentraut > >>> wrote: > >> > >>> How about we just hardcode "openssl" here instead? We could build that > >>> array dynamically, of course, but maybe we leave that until we actually > >>> have a need? > >> > >> At least for 16 keeping it hardcoded is an entirely safe bet so +1 for > >> leaving > >> additional complexity for when needed. > > > > We already have the 'ssl_library' variable. Can't we use that instead > > of hardcoding 'openssl'? e.g: > > > > summary( > >{ > > 'ssl': ssl.found() ? [ssl, '(@0@)'.format(ssl_library)] : ssl, > >}, > >section: 'External libraries', > >list_sep: ', ', > > ) > > > > And it will output: > > ssl: YES 3.0.8, (openssl) > > > > I don't think that using 'ssl_library' will increase the complexity. > > Then we might as well use ssl_library as the key, like: > > { > ... > 'selinux': selinux, > ssl_library: ssl, > 'systemd': systemd, > ... > } > There will be a problem if ssl is not found. It will output 'none: NO' because 'ssl_library' is initialized as 'none' for now. We can initialize 'ssl_library' as 'ssl' but I am not sure that is a good idea. Regards, Nazir Bilal Yavuz Microsoft
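The problem Nazir raises, and the fallback he suggests, can be shown in miniature. This Python sketch only mirrors the meson summary logic being discussed; it is not meson code, and the key/value shapes are illustrative.

```python
# The summary-key problem in miniature: if the dict key comes straight from
# ssl_library while it is still 'none', the summary line would read "none: NO"
# instead of "ssl: NO". Falling back to a stable key avoids that.
def summary_entry(ssl_found, ssl_library):
    key = ssl_library if ssl_found else "ssl"   # stable key when not found
    value = f"YES ({ssl_library})" if ssl_found else "NO"
    return {key: value}

print(summary_entry(True, "openssl"))   # {'openssl': 'YES (openssl)'}
print(summary_entry(False, "none"))     # {'ssl': 'NO'}
```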
Re: [PATCH] Align GSS and TLS error handling in PQconnectPoll()
Greetings, * Michael Paquier (mich...@paquier.xyz) wrote: > On Fri, Feb 17, 2023 at 09:01:43AM -0800, Jacob Champion wrote: > > On Thu, Feb 16, 2023 at 10:59 PM Michael Paquier > > wrote: > >> I am adding Stephen Frost > >> in CC to see if he has any comments about all this part of the logic > >> with gssencmode. > > > > Sounds good. > > Hearing nothing on this part, perhaps we should just move on and > adjust the behavior on HEAD? That seems like one step in the right > direction. If this brews right, we could always discuss a backpatch > at some point, if necessary. > > Thoughts from others? I agree with matching how SSL is handled here, and in a review of the proposed patch I didn't see any issues with it. It seems like something that should also be back-patched, and it doesn't look terribly risky to do so; is there a specific risk that you see? Thanks, Stephen
Re: XID formatting and SLRU refactorings (was: Add 64-bit XIDs into PostgreSQL 15)
Hi, > 1. Use internal 64 bit page numbers in SLRUs without changing segments naming. > 2. Use the larger segment file names in async.c, to lift the current 8 > GB limit on the max number of pending notifications. > [...] > > Where our main focus for PG16 is going to be 1 and 2, and we may try > to deliver 6 and 7 too if time will permit. > > Maxim and I agreed (offlist) that I'm going to submit 1 and 2. The > patch 1 is ready, please see the attachment. I'm currently working on > 2 and going to submit it in a bit. It seems to be relatively > straightforward but I don't want to rush things and want to make sure > I didn't miss something. PFA the patch v57, which now includes both 1 and 2. -- Best regards, Aleksander Alekseev

Attachments:
v57-0001-Index-SLRUs-by-64-bit-integers-rather-than-by-32.patch
v57-0002-Use-larger-segment-file-names-for-pg_notify.patch
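For readers unfamiliar with SLRU naming, the mapping from page number to segment file name looks roughly like the sketch below. SLRU_PAGES_PER_SEGMENT matches slru.h at the time of writing, but the 15-hex-digit "long" form is only illustrative of the widened names the patch series introduces for 64-bit page numbers, not necessarily the exact committed format.

```python
SLRU_PAGES_PER_SEGMENT = 32  # matches PostgreSQL's slru.h

def slru_file_name(pageno, long_names=False):
    """Map an SLRU page number to its segment file name.

    The 4-hex-digit short form mirrors the historical SlruFileName();
    the 15-digit long form is an illustrative stand-in for the widened
    naming needed once page numbers are 64-bit.
    """
    segno = pageno // SLRU_PAGES_PER_SEGMENT
    if long_names:
        return f"{segno:015X}"
    # With the short form, segno only fits in 4 hex digits while
    # segno < 0x10000; 64-bit page numbers overflow it quickly.
    return f"{segno:04X}"

print(slru_file_name(0))                       # 0000
print(slru_file_name(31))                      # 0000 (same segment)
print(slru_file_name(32))                      # 0001
print(slru_file_name(2**40, long_names=True))  # 000000800000000
```

The last line shows why item 2 above needs the larger names: a 64-bit page number produces a segment number far beyond what four hex digits can express.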
Re: Disable rdns for Kerberos tests
Greetings, * Heikki Linnakangas (hlinn...@iki.fi) wrote: > On 25 February 2023 00:50:30 EET, Stephen Frost wrote: > >Thanks for reviewing! Comments added and updated the commit message. > > > >Unless there's anything else, I'll push this early next week. > > s/capture portal/captive portal/. Other than that, looks good to me. Pushed, thanks again! Stephen
Re: HOT chain validation in verify_heapam()
On Wed, Mar 8, 2023 at 7:30 PM Himanshu Upadhyaya <upadhyaya.himan...@gmail.com> wrote:

Please find the v11 patch with all these changes.

--
Regards,
Himanshu Upadhyaya
EnterpriseDB: http://www.enterprisedb.com

From f2b262e95fe07dddfec994f20a6d2e76bc12b410 Mon Sep 17 00:00:00 2001
From: Himanshu Upadhyaya
Date: Thu, 9 Mar 2023 21:18:58 +0530
Subject: [PATCH v11] Implement HOT chain validation in verify_heapam()

Himanshu Upadhyaya, reviewed by Robert Haas, Aleksander Alekseev,
Andres Freund. Some revisions by Robert Haas.
---
 contrib/amcheck/verify_heapam.c           | 291 +-
 src/bin/pg_amcheck/t/004_verify_heapam.pl | 288 -
 2 files changed, 562 insertions(+), 17 deletions(-)

diff --git a/contrib/amcheck/verify_heapam.c b/contrib/amcheck/verify_heapam.c
index 4fcfd6df72..9cd4a795a0 100644
--- a/contrib/amcheck/verify_heapam.c
+++ b/contrib/amcheck/verify_heapam.c
@@ -150,7 +150,9 @@ typedef struct HeapCheckContext
 } HeapCheckContext;

 /* Internal implementation */
-static void check_tuple(HeapCheckContext *ctx);
+static void check_tuple(HeapCheckContext *ctx,
+                        bool *xmin_commit_status_ok,
+                        XidCommitStatus *xmin_commit_status);
 static void check_toast_tuple(HeapTuple toasttup, HeapCheckContext *ctx,
                               ToastedAttribute *ta, int32 *expected_chunk_seq,
                               uint32 extsize);
@@ -160,7 +162,9 @@ static void check_toasted_attribute(HeapCheckContext *ctx,
                                     ToastedAttribute *ta);

 static bool check_tuple_header(HeapCheckContext *ctx);
-static bool check_tuple_visibility(HeapCheckContext *ctx);
+static bool check_tuple_visibility(HeapCheckContext *ctx,
+                                   bool *xmin_commit_status_ok,
+                                   XidCommitStatus *xmin_commit_status);

 static void report_corruption(HeapCheckContext *ctx, char *msg);
 static void report_toast_corruption(HeapCheckContext *ctx,
@@ -399,9 +403,16 @@ verify_heapam(PG_FUNCTION_ARGS)
     for (ctx.blkno = first_block; ctx.blkno <= last_block; ctx.blkno++)
     {
         OffsetNumber maxoff;
+        OffsetNumber predecessor[MaxOffsetNumber];
+        OffsetNumber successor[MaxOffsetNumber];
+        bool        lp_valid[MaxOffsetNumber];
+        bool        xmin_commit_status_ok[MaxOffsetNumber];
+        XidCommitStatus xmin_commit_status[MaxOffsetNumber];

         CHECK_FOR_INTERRUPTS();

+        memset(predecessor, 0, sizeof(OffsetNumber) * MaxOffsetNumber);
+
         /* Optionally skip over all-frozen or all-visible blocks */
         if (skip_option != SKIP_PAGES_NONE)
         {
@@ -433,6 +444,12 @@ verify_heapam(PG_FUNCTION_ARGS)
         for (ctx.offnum = FirstOffsetNumber; ctx.offnum <= maxoff;
              ctx.offnum = OffsetNumberNext(ctx.offnum))
         {
+            BlockNumber nextblkno;
+            OffsetNumber nextoffnum;
+
+            successor[ctx.offnum] = InvalidOffsetNumber;
+            lp_valid[ctx.offnum] = false;
+            xmin_commit_status_ok[ctx.offnum] = false;
             ctx.itemid = PageGetItemId(ctx.page, ctx.offnum);

             /* Skip over unused/dead line pointers */
@@ -469,6 +486,14 @@ verify_heapam(PG_FUNCTION_ARGS)
                 report_corruption(&ctx,
                                   psprintf("line pointer redirection to unused item at offset %u",
                                            (unsigned) rdoffnum));
+
+                /*
+                 * Record the fact that this line pointer has passed basic
+                 * sanity checking, and also the offset number to which it
+                 * points.
+                 */
+                lp_valid[ctx.offnum] = true;
+                successor[ctx.offnum] = rdoffnum;
                 continue;
             }
@@ -502,11 +527,237 @@ verify_heapam(PG_FUNCTION_ARGS)
             }

             /* It should be safe to examine the tuple's header, at least */
+            lp_valid[ctx.offnum] = true;
             ctx.tuphdr = (HeapTupleHeader) PageGetItem(ctx.page, ctx.itemid);
             ctx.natts = HeapTupleHeaderGetNatts(ctx.tuphdr);

             /* Ok, ready to check this next tuple */
-            check_tuple(&ctx);
+            check_tuple(&ctx,
+                        &xmin_commit_status_ok[ctx.offnum],
+                        &xmin_commit_status[ctx.offnum]);
+
+            /*
+             * If the CTID field of this tuple seems to point to another tuple
+             * on the same page, record that tuple as the successor of this
+             * one.
+             */
+            nextblkno = ItemPointerGetBlockNumber(&(ctx.tuphdr)->t_ctid);
+            nextoffnum = ItemPointerGetOffsetNumber(&(ctx.tuphdr)->t_ctid);
+            if (nextblkno == ctx.blkno && nextoffnum != ctx.offnum)
+                successor[ctx.offnum] = nextoffnum;
+        }
+
+        /*
+         * Update chain validation. Check each line pointer that's got a valid
+         * successor against that successor.
+         */
+        ctx.attnum = -1;
+        for (ctx.offnum = FirstOffsetNumber; ctx.offnum <= maxoff;
+             ctx.offnum = OffsetNumberNext(ctx.offnum))
+        {
+            ItemId      curr_lp;
+            ItemId      next_lp;
+            HeapTupleHeader curr_htup;
+            HeapTupleHeader next_htup;
+            TransactionId curr_xmin;
+            TransactionId curr_xmax;
+            TransactionId next_xmin;
+            OffsetNumber nextoffnum = successor[ctx.offnum];
+
+            /*
+             * The current line pointer may not have a successor, either
+             * because it's not valid or because it didn't point to anything.
+             * In either case, we have to give up.
+             *
+             * If the current line pointer does point
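In miniature, the two-pass structure of the patch (first record each line pointer's successor, then validate what it points to) looks like the sketch below. The item states and fields are simplified stand-ins, not amcheck's real structures, and only the redirect check is modeled.

```python
# Miniature model of the patch's two passes: record successors, then validate.
# Only the "redirection to unused item" check is modeled here.
UNUSED, NORMAL, REDIRECT, DEAD = "unused", "normal", "redirect", "dead"

def check_page(items):
    """items: dict offset -> (state, target_offset_or_None).
    Returns a list of corruption messages."""
    # Pass 1: record each line pointer's successor.
    successor = {}
    for off, (state, target) in items.items():
        if state == REDIRECT:
            successor[off] = target
        elif state == NORMAL and target is not None and target != off:
            # a tuple whose ctid points elsewhere on the same page
            successor[off] = target

    # Pass 2: validate each recorded successor.
    problems = []
    for off, nxt in successor.items():
        nxt_state = items.get(nxt, (UNUSED, None))[0]
        if items[off][0] == REDIRECT and nxt_state != NORMAL:
            problems.append(
                f"line pointer redirection to unused item at offset {nxt}")
    return problems

# offset 1 redirects to a normal tuple: fine; offset 3 redirects to a hole.
page = {1: (REDIRECT, 2), 2: (NORMAL, None), 3: (REDIRECT, 4)}
print(check_page(page))  # ['line pointer redirection to unused item at offset 4']
```

The real patch goes much further (xmin/xmax matching along the chain, commit-status caching, and so on), but the successor-array shape is the same.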
Re: Add shared buffer hits to pg_stat_io
Hi, On 3/9/23 2:23 PM, Melanie Plageman wrote: On Wed, Mar 8, 2023 at 2:23 PM Andres Freund wrote: On 2023-03-08 13:44:32 -0500, Melanie Plageman wrote: However, I am concerned that, while unlikely, this could be flakey. Something could happen to force all of those blocks out of shared buffers (even though they were just read in) before we hit them. You could make the test query a simple nested loop self-join, that'll prevent the page being evicted, because it'll still be pinned on the outer side, while generating hits on the inner side. Good idea. v3 attached. Thanks! The added test looks good to me. Regards, -- Bertrand Drouvot PostgreSQL Contributors Team RDS Open Source Databases Amazon Web Services: https://aws.amazon.com
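The nested-loop trick quoted above is worth spelling out: the inner side re-reads pages the outer scan has already brought in, and the outer side's pin keeps them from being evicted, so every inner access is a guaranteed hit. A toy buffer pool makes the arithmetic visible; this models the idea only, not PostgreSQL's buffer manager.

```python
# Toy buffer pool illustrating the test trick: with a nested-loop self-join,
# the inner side re-reads pages the outer side just brought in, so the hit
# count is stable and predictable.
class BufferPool:
    def __init__(self):
        self.cached = set()
        self.reads = 0
        self.hits = 0

    def access(self, page):
        if page in self.cached:
            self.hits += 1
        else:
            self.reads += 1
            self.cached.add(page)

pool = BufferPool()
pages = [1, 2, 3]
for outer in pages:          # outer scan: first touch of each page -> read
    pool.access(outer)
    for inner in pages:      # inner rescan: already cached -> hit
        pool.access(inner)

print(pool.reads, pool.hits)  # 3 reads, 9 hits
```

Three pages cost exactly three reads, and all nine inner-side accesses are hits, which is why the regression test's expected counts don't flake.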
Re: Disable rdns for Kerberos tests
Stephen Frost writes: > Push, thanks again! Why'd you only change HEAD? Isn't the test equally fragile in the back branches? regards, tom lane
Re: Add LZ4 compression in pg_dump
On Wed, Mar 01, 2023 at 05:39:54PM +0100, Tomas Vondra wrote: > On 2/27/23 05:49, Justin Pryzby wrote: > > On Sat, Feb 25, 2023 at 08:05:53AM -0600, Justin Pryzby wrote: > >> On Fri, Feb 24, 2023 at 11:02:14PM -0600, Justin Pryzby wrote: > >>> I have some fixes (attached) and questions while polishing the patch for > >>> zstd compression. The fixes are small and could be integrated with the > >>> patch for zstd, but could be applied independently. > >> > >> One more - WriteDataToArchiveGzip() says: > > > > One more again. > > > > The LZ4 path is using non-streaming mode, which compresses each block > > without persistent state, giving poor compression for -Fc compared with > > -Fp. If the data is highly compressible, the difference can be orders > > of magnitude. > > > > $ ./src/bin/pg_dump/pg_dump -h /tmp postgres -Z lz4 -Fp |wc -c > > 12351763 > > $ ./src/bin/pg_dump/pg_dump -h /tmp postgres -Z lz4 -Fc |wc -c > > 21890708 > > > > That's not true for gzip: > > > > $ ./src/bin/pg_dump/pg_dump -h /tmp postgres -t t1 -Z gzip -Fc |wc -c > > 2118869 > > $ ./src/bin/pg_dump/pg_dump -h /tmp postgres -t t1 -Z gzip -Fp |wc -c > > 2115832 > > > > The function ought to at least use streaming mode, so each block/row > > isn't compressed in isolation. 003 is a simple patch to use > > streaming mode, which improves the -Fc case: > > > > $ ./src/bin/pg_dump/pg_dump -h /tmp postgres -Z lz4 -Fc |wc -c > > 15178283 > > > > However, that still flushes the compression buffer, writing a block > > header, for every row. With a single-column table, pg_dump -Fc -Z lz4 > > still outputs ~10% *more* data than with no compression at all. And > > that's for compressible data. > > > > $ ./src/bin/pg_dump/pg_dump -h /tmp postgres -t t1 -Fc -Z lz4 |wc -c > > 12890296 > > $ ./src/bin/pg_dump/pg_dump -h /tmp postgres -t t1 -Fc -Z none |wc -c > > 11890296 > > > > I think this should use the LZ4F API with frames, which are buffered to > > avoid outputting a header for every single row. 
The LZ4F format isn't > > compatible with the LZ4 format, so (unlike changing to the streaming > > API) that's not something we can change in a bugfix release. I consider > > this an Open Item. > > > > With the LZ4F API in 004, -Fp and -Fc are essentially the same size > > (like gzip). (Oh, and the output is three times smaller, too.) > > > > $ ./src/bin/pg_dump/pg_dump -h /tmp postgres -t t1 -Z lz4 -Fp |wc -c > > 4155448 > > $ ./src/bin/pg_dump/pg_dump -h /tmp postgres -t t1 -Z lz4 -Fc |wc -c > > 4156548 > > Thanks. Those are definitely interesting improvements/optimizations! > > I suggest we track them as a separate patch series - please add them to > the CF app (I guess you'll have to add them to 2023-07 at this point, > but we can get them in, I think). Thanks for looking. I'm not sure if I'm the best person to write/submit the patch to implement that for LZ4. Georgios, would you want to take on this change? I think that needs to be changed for v16, since 1) LZ4F works so much better like this, and 2) we can't change it later without breaking compatibility of the dumpfiles, short of changing the header to a name other than "lz4". Also, I imagine we'd want to continue supporting the ability to *restore* a dumpfile using the old (current) format, which would be untestable code unless we also preserved the ability to write it somehow (like -Z lz4-old). One issue is that LZ4F_createCompressionContext() and LZ4F_compressBegin() ought to be called in InitCompressorLZ4(). It seems like it might *need* to be called there to avoid exactly the kind of issue that I reported with empty LOs with gzip. But InitCompressorLZ4() isn't currently passed the ArchiveHandle, so it can't write the header. And LZ4CompressorState has a simple char *buf, and not a more elaborate data structure like zlib's. You could work around that by storing the "len" of the existing buffer and flushing it in EndCompressorLZ4(), but that adds needless complexity to the Write and End functions. 
Maybe the Init function should be passed the AH. -- Justin
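The per-block versus streaming effect Justin measured can be reproduced with any stream compressor. The sketch below uses zlib from the Python standard library rather than LZ4, purely to illustrate the principle: compressing each row in isolation resets the history window and pays per-block overhead, while a single streaming context amortizes both.

```python
# Per-row vs. streaming compression of highly repetitive data. zlib stands in
# for LZ4 here; the shape of the result is what matters, not the library.
import zlib

rows = [b"some,highly,repetitive,row,data\n"] * 1000

# pg_dump's pre-fix LZ4 path in spirit: one independent compression per row
per_row = sum(len(zlib.compress(r)) for r in rows)

# streaming style: a single context across all rows, flushed once at the end
c = zlib.compressobj()
streamed = sum(len(c.compress(r)) for r in rows) + len(c.flush())

raw = sum(len(r) for r in rows)
print(raw, per_row, streamed)
assert streamed < per_row  # streaming wins by a wide margin on repetitive data
```

With per-row compression the output is actually larger than the uncompressed input, mirroring the "~10% more data than with no compression" result above, while the streaming context shrinks it dramatically.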
Re: Improve logging when using Huge Pages
On Thu, Mar 09, 2023 at 09:34:10AM -0500, Stephen Frost wrote: > Greetings, > > * Nathan Bossart (nathandboss...@gmail.com) wrote: > > On Wed, Feb 15, 2023 at 10:13:17AM -0800, Nathan Bossart wrote: > > > On Tue, Feb 14, 2023 at 07:32:56PM -0600, Justin Pryzby wrote: > > >> On Mon, Feb 13, 2023 at 08:18:52PM -0800, Nathan Bossart wrote: > > >>> I'm curious why you chose to make this a string instead of an enum. > > >>> There > > >>> might be little practical difference, but since there are only three > > >>> possible values, I wonder if it'd be better form to make it an enum. > > >> > > >> It takes more code to write as an enum - see 002.txt. I'm not convinced > > >> this is better. > > >> > > >> But your comment made me fix its , and reconsider the strings, > > >> which I changed to active={unknown/true/false} rather than {unk/on/off}. > > >> It could also be active={unknown/yes/no}... > > > > > > I think unknown/true/false is fine. I'm okay with using a string if no > > > one > > > else thinks it should be an enum (or a bool). > > > > There's been no response for this, so I guess we can proceed with a string > > GUC. > > Just happened to see this and I'm not really a fan of this being a > string when it's pretty clear that's not what it actually is. I don't understand what you mean by that. Why do you say it isn't a string? > > +Reports whether huge pages are in use by the current instance. > > +See for more information. > > > > I still think we should say "server" in place of "current instance" here. > > We certainly use 'server' a lot more in config.sgml than we do > 'instance'. "currently running server" might be closer to how we > describe a running PG system in other parts (we talk about "currently > running server processes", "while the server is running", "When running > a standby server", "when the server is running"; "instance" is used much > less and seems to more typically refer to 'state of files on disk' in my > reading vs. 
'actively running process' though there's some of each). I called it "instance" since the GUC has no meaning when the server is not running. I'm fine with renaming it to "running server". > > + {"huge_pages_active", PGC_INTERNAL, PRESET_OPTIONS, > > + gettext_noop("Indicates whether huge pages are in > > use."), > > + NULL, > > + GUC_NOT_IN_SAMPLE | GUC_DISALLOW_IN_FILE | > > GUC_RUNTIME_COMPUTED > > + }, > > > > I don't think we need to use GUC_RUNTIME_COMPUTED. 'postgres -C' seems to > > always report "unknown" for this GUC, so all this would do is cause that > > command to error unnecessarily when the server is running. > > ... or we could consider adjusting things to actually try the mmap() and > find out if we'd end up with huge pages or not. That seems like a bad idea, since it might work one moment and fail one moment later. If we could tell in advance whether it was going to work, we wouldn't be here, and probably also wouldn't have invented huge_pages=true. > > It might be worth documenting exactly what "unknown" means. IIUC you'll > > only ever see "on" or "off" via SHOW or pg_settings, which doesn't seem > > tremendously obvious. > > If we could get rid of that case and just make this a boolean, that > seems like it'd really be the best answer. > > To that end- perhaps this isn't appropriate as a GUC at all? Why not > have this just be a system information function? Functionally it really > provides the same info- if the server is running then you get back > either true or false, if it's not then you can't call it but that's > hardly different from an unknown or error result. We talked about making it a function ~6 weeks ago. Is there an agreement to use a function instead? -- Justin
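As the exchange above notes, only the server knows the real outcome of its own mmap(); an outside observer can at best approximate. On Linux one rough approximation is to parse /proc/meminfo, as sketched below. This is emphatically not what the patch does (the patch reports the server's own result); the sketch only illustrates why outside guesses stay guesses.

```python
# Rough, Linux-only approximation of huge-page state, parsed from /proc/meminfo
# text. HugePages_Total/HugePages_Free are real meminfo fields; the returned
# labels are made up for this sketch.
def huge_page_state(meminfo_text):
    fields = {}
    for line in meminfo_text.splitlines():
        key, _, rest = line.partition(":")
        fields[key.strip()] = rest.split()[0] if rest.split() else "0"
    total = int(fields.get("HugePages_Total", "0"))
    free = int(fields.get("HugePages_Free", "0"))
    if total == 0:
        return "off"           # no huge pages reserved at all
    if free < total:
        return "maybe-active"  # someone is using them, not necessarily us!
    return "available"

sample = ("MemTotal: 16303384 kB\nHugePages_Total: 512\n"
          "HugePages_Free: 200\nHugepagesize: 2048 kB\n")
print(huge_page_state(sample))  # maybe-active
```

The "maybe" in the middle case is the whole point: even when huge pages are visibly in use system-wide, nothing here says whether this particular postmaster got them, which is exactly what the proposed GUC or function answers from inside.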
Re: Track IO times in pg_stat_io
Hi, v4 attached addresses these review comments. On Tue, Mar 7, 2023 at 1:39 PM Andres Freund wrote: > On 2023-03-06 11:30:13 -0500, Melanie Plageman wrote: > > > As pgstat_bktype_io_stats_valid() is called only in Assert(), I think > > > that would be a good idea > > > to also check that if counts are not Zero then times are not Zero. > > > > Yes, I think adding some validation around the relationship between > > counts and timing should help prevent developers from forgetting to call > > pg_stat_count_io_op() when calling pgstat_count_io_time() (as relevant). > > > > However, I think that we cannot check that if IO counts are non-zero > > that IO times are non-zero, because the user may not have > > track_io_timing enabled. We can check that if IO times are not zero, IO > > counts are not zero. I've done this in the attached v3. > > And even if track_io_timing is enabled, the timer granularity might be so low > that we *still* get zeroes. > > I wonder if we should get rid of pgStatBlockReadTime, pgStatBlockWriteTime, And then have pg_stat_reset_shared('io') reset pg_stat_database IO stats? > > @@ -1000,11 +1000,27 @@ ReadBuffer_common(SMgrRelation smgr, char > > relpersistence, ForkNumber forkNum, > > > > if (isExtend) > > { > > + instr_time io_start, > > + io_time; > > + > > /* new buffers are zero-filled */ > > MemSet((char *) bufBlock, 0, BLCKSZ); > > + > > + if (track_io_timing) > > + INSTR_TIME_SET_CURRENT(io_start); > > + else > > + INSTR_TIME_SET_ZERO(io_start); > > + > > I wonder if there's an argument for tracking this in the existing IO stats as > well. But I guess we've lived with this for a long time... Not sure I want to include that in this patchset. > > @@ -2981,16 +2998,16 @@ FlushBuffer(BufferDesc *buf, SMgrRelation reln, > > IOObject io_object, > >* When a strategy is not in use, the write can only be a "regular" > > write > >* of a dirty shared buffer (IOCONTEXT_NORMAL IOOP_WRITE). 
> >*/ > > - pgstat_count_io_op(IOOBJECT_RELATION, io_context, IOOP_WRITE); > > - > > if (track_io_timing) > > { > > INSTR_TIME_SET_CURRENT(io_time); > > INSTR_TIME_SUBTRACT(io_time, io_start); > > > > pgstat_count_buffer_write_time(INSTR_TIME_GET_MICROSEC(io_time)); > > INSTR_TIME_ADD(pgBufferUsage.blk_write_time, io_time); > > + pgstat_count_io_time(IOOBJECT_RELATION, io_context, > > IOOP_WRITE, io_time); > > } > > I think this needs a bit of cleanup - pgstat_count_buffer_write_time(), > pgBufferUsage.blk_write_time++, pgstat_count_io_time() is a bit excessive. We > might not be able to reduce the whole duplication at this point, but at least > it should be a bit more centralized. So, in the attached v4, I've introduced pgstat_io_start() and pgstat_io_end(...). The end IO function takes the IOObject, IOOp, and IOContext, in addition to the start_time, so that we know which pgBufferUsage field to increment and which pgstat_count_buffer_*_time() to call. I will note that calling this function now causes pgBufferUsage and pgStatBlock*Time to be incremented in a couple of places that they were not before. I think those might have been accidental omissions, so I think it is okay. The exception is pgstat_count_buffer_write_time() being only called for relations in shared buffers and not temporary relations while pgstat_count_buffer_read_time() is called for temporary relations and relations in shared buffers. I left that behavior as is, though it seems like it is wrong. I added pgstat_io_start() to pgstat.c -- not sure if it is best there. I could separate it into a commit that does this refactoring of the existing counting (without adding pgstat_count_io_time()) and then another that adds pgstat_count_io_time(). I hesitated to do that until I knew that the new functions were viable. 
> > + pgstat_count_io_op(IOOBJECT_RELATION, io_context, IOOP_WRITE); > > pgBufferUsage.shared_blks_written++; > > > > /* > > @@ -3594,6 +3611,9 @@ FlushRelationBuffers(Relation rel) > > > > if (RelationUsesLocalBuffers(rel)) > > { > > + instr_time io_start, > > + io_time; > > + > > for (i = 0; i < NLocBuffer; i++) > > { > > uint32 buf_state; > > @@ -3616,6 +3636,11 @@ FlushRelationBuffers(Relation rel) > > > > PageSetChecksumInplace(localpage, > > bufHdr->tag.blockNum); > > > > + if (track_io_timing) > > + INSTR_TIME_SET_CURRENT(io_start); > > + else > > + INSTR_TIME_SET_ZERO(io_start); > > + > > smgrwrite(RelationGetSmgr(rel), > >
Re: [PATCH] Add pretty-printed XML output option
While reviewing this patch, I started to wonder why we don't eliminate the maintenance hassle of xml_1.out by putting in a short-circuit at the top of the test, similar to those in some other scripts: /* skip test if XML support not compiled in */ SELECT 'one'::xml; \if :ERROR \quit \endif (and I guess xmlmap.sql could get the same treatment). The only argument I can think of against it is that the current approach ensures we produce a clean error (and not, say, a crash) for all xml.c entry points not just xml_in. I'm not sure how much that's worth though. The compiler/linker would tell us if we miss compiling out every reference to libxml2. Thoughts? regards, tom lane
Re: Infinite Interval
Hi Joseph, Thanks for working on the patch. Sorry for taking so long to review this patch. But here's it finally, review of code changes static pg_tz *FetchDynamicTimeZone(TimeZoneAbbrevTable *tbl, const datetkn *tp, - DateTimeErrorExtra *extra); + DateTimeErrorExtra * extra); There are a lot of these diffs. PG code doesn't leave an extra space between variable name and *. /* Handle the integer part */ -if (!int64_multiply_add(val, scale, &itm_in->tm_usec)) +if (pg_mul_add_s64_overflow(val, scale, &itm_in->tm_usec)) I think this is a good change, since we are moving the function to int.h where it belongs. We could separate these kind of changes into another patch for easy review. + +result->day = days; +if (pg_mul_add_s32_overflow(weeks, 7, &result->day)) +ereport(ERROR, +(errcode(ERRCODE_DATETIME_VALUE_OUT_OF_RANGE), + errmsg("interval out of range"))); I think such changes are also good, but probably a separate patch for ease of review. secs = rint(secs * USECS_PER_SEC); -result->time = hours * ((int64) SECS_PER_HOUR * USECS_PER_SEC) + -mins * ((int64) SECS_PER_MINUTE * USECS_PER_SEC) + -(int64) secs; + +result->time = secs; +if (pg_mul_add_s64_overflow(mins, ((int64) SECS_PER_MINUTE * USECS_PER_SEC), &result->time)) +ereport(ERROR, +(errcode(ERRCODE_DATETIME_VALUE_OUT_OF_RANGE), + errmsg("interval out of range"))); +if (pg_mul_add_s64_overflow(hours, ((int64) SECS_PER_HOUR * USECS_PER_SEC), &result->time)) +ereport(ERROR, +(errcode(ERRCODE_DATETIME_VALUE_OUT_OF_RANGE), + errmsg("interval out of range"))); shouldn't this be secs = rint(secs); result->time = 0; pg_mul_add_s64_overflow(secs, USECS_PER_SEC, &result->time) to catch overflow error early? +if TIMESTAMP_IS_NOBEGIN +(dt2) Better be written as if (TIMESTAMP_IS_NOBEGIN(dt2))? There are more corrections like this. +if (INTERVAL_NOT_FINITE(result)) +ereport(ERROR, +(errcode(ERRCODE_DATETIME_VALUE_OUT_OF_RANGE), + errmsg("interval out of range"))); Probably, I added these kind of checks. 
But I don't remember if those are defensive checks or whether it's really possible that the arithmetic above these lines can yield a non-finite interval. +else +{ +result->time = -interval->time; +result->day = -interval->day; +result->month = -interval->month; + +if (INTERVAL_NOT_FINITE(result)) +ereport(ERROR, +(errcode(ERRCODE_DATETIME_VALUE_OUT_OF_RANGE), + errmsg("interval out of range"))); If this error ever gets to the user, it could be confusing. Can we elaborate by adding context e.g. errcontext("while negating an interval") or some such? - -result->time = -interval->time; -/* overflow check copied from int4um */ -if (interval->time != 0 && SAMESIGN(result->time, interval->time)) -ereport(ERROR, -(errcode(ERRCODE_DATETIME_VALUE_OUT_OF_RANGE), - errmsg("interval out of range"))); -result->day = -interval->day; -if (interval->day != 0 && SAMESIGN(result->day, interval->day)) -ereport(ERROR, -(errcode(ERRCODE_DATETIME_VALUE_OUT_OF_RANGE), - errmsg("interval out of range"))); -result->month = -interval->month; -if (interval->month != 0 && SAMESIGN(result->month, interval->month)) -ereport(ERROR, -(errcode(ERRCODE_DATETIME_VALUE_OUT_OF_RANGE), - errmsg("interval out of range"))); +interval_um_internal(interval, result); Shouldn't we incorporate these checks in interval_um_internal()? I don't think INTERVAL_NOT_FINITE() covers all of those. +/* + * Subtracting two infinite intervals with different signs results in an + * infinite interval with the same sign as the left operand. Subtracting + * two infinte intervals with the same sign results in an error. + */ I think we need someone to validate these assumptions and similar assumptions in interval_pl(). Googling gives confusing results in some cases. I have not looked for IEEE standard around this specifically. 
+if (INTERVAL_NOT_FINITE(interval)) +{ +double r = NonFiniteIntervalPart(type, val, lowunits, + INTERVAL_IS_NOBEGIN(interval), + false); + +if (r) I see that this code is very similar to the corresponding code in timestamp and timestamptz, so it's bound to be correct. But I always thought float equality is unreliable. if (r) is equivalent to if (r != 0.0) so it will not work as intended. But maybe (float) 0.0 is a special value for which equality holds true. +static inline boo
Re: Improve logging when using Huge Pages
On 2023-Mar-09, Justin Pryzby wrote: > On Thu, Mar 09, 2023 at 09:34:10AM -0500, Stephen Frost wrote: > > > +Reports whether huge pages are in use by the current instance. > > > +See for more information. > > > > > > I still think we should say "server" in place of "current instance" here. > > > > We certainly use 'server' a lot more in config.sgml than we do > > 'instance'. "currently running server" might be closer to how we > > describe a running PG system in other parts (we talk about "currently > > running server processes", "while the server is running", "When running > > a standby server", "when the server is running"; "instance" is used much > > less and seems to more typically refer to 'state of files on disk' in my > > reading vs. 'actively running process' though there's some of each). > > I called it "instance" since the GUC has no meaning when it's not > running. I'm fine to rename it to "running server". I'd rather make all these other places use "instance" instead. We used to consider these terms interchangeable, but since we introduced the glossary to unify the terminology, they are no longer supposed to be. A server (== a machine) can contain many instances, and each individual instance in the server could be using huge pages or not. -- Álvaro Herrera PostgreSQL Developer — https://www.EnterpriseDB.com/ "Now I have my system running, not a byte was off the shelf; It rarely breaks and when it does I fix the code myself. It's stable, clean and elegant, and lightning fast as well, And it doesn't cost a nickel, so Bill Gates can go to hell."
Re: Add LZ4 compression in pg_dump
On 3/9/23 17:15, Justin Pryzby wrote: > On Wed, Mar 01, 2023 at 05:39:54PM +0100, Tomas Vondra wrote: >> On 2/27/23 05:49, Justin Pryzby wrote: >>> On Sat, Feb 25, 2023 at 08:05:53AM -0600, Justin Pryzby wrote: On Fri, Feb 24, 2023 at 11:02:14PM -0600, Justin Pryzby wrote: > I have some fixes (attached) and questions while polishing the patch for > zstd compression. The fixes are small and could be integrated with the > patch for zstd, but could be applied independently. One more - WriteDataToArchiveGzip() says: >>> >>> One more again. >>> >>> The LZ4 path is using non-streaming mode, which compresses each block >>> without persistent state, giving poor compression for -Fc compared with >>> -Fp. If the data is highly compressible, the difference can be orders >>> of magnitude. >>> >>> $ ./src/bin/pg_dump/pg_dump -h /tmp postgres -Z lz4 -Fp |wc -c >>> 12351763 >>> $ ./src/bin/pg_dump/pg_dump -h /tmp postgres -Z lz4 -Fc |wc -c >>> 21890708 >>> >>> That's not true for gzip: >>> >>> $ ./src/bin/pg_dump/pg_dump -h /tmp postgres -t t1 -Z gzip -Fc |wc -c >>> 2118869 >>> $ ./src/bin/pg_dump/pg_dump -h /tmp postgres -t t1 -Z gzip -Fp |wc -c >>> 2115832 >>> >>> The function ought to at least use streaming mode, so each block/row >>> isn't compressed in isolation. 003 is a simple patch to use >>> streaming mode, which improves the -Fc case: >>> >>> $ ./src/bin/pg_dump/pg_dump -h /tmp postgres -Z lz4 -Fc |wc -c >>> 15178283 >>> >>> However, that still flushes the compression buffer, writing a block >>> header, for every row. With a single-column table, pg_dump -Fc -Z lz4 >>> still outputs ~10% *more* data than with no compression at all. And >>> that's for compressible data. 
>>> >>> $ ./src/bin/pg_dump/pg_dump -h /tmp postgres -t t1 -Fc -Z lz4 |wc -c >>> 12890296 >>> $ ./src/bin/pg_dump/pg_dump -h /tmp postgres -t t1 -Fc -Z none |wc -c >>> 11890296 >>> >>> I think this should use the LZ4F API with frames, which are buffered to >>> avoid outputting a header for every single row. The LZ4F format isn't >>> compatible with the LZ4 format, so (unlike changing to the streaming >>> API) that's not something we can change in a bugfix release. I consider >>> this an Opened Item. >>> >>> With the LZ4F API in 004, -Fp and -Fc are essentially the same size >>> (like gzip). (Oh, and the output is three times smaller, too.) >>> >>> $ ./src/bin/pg_dump/pg_dump -h /tmp postgres -t t1 -Z lz4 -Fp |wc -c >>> 4155448 >>> $ ./src/bin/pg_dump/pg_dump -h /tmp postgres -t t1 -Z lz4 -Fc |wc -c >>> 4156548 >> >> Thanks. Those are definitely interesting improvements/optimizations! >> >> I suggest we track them as a separate patch series - please add them to >> the CF app (I guess you'll have to add them to 2023-07 at this point, >> but we can get them in, I think). > > Thanks for looking. I'm not sure if I'm the best person to write/submit > the patch to implement that for LZ4. Georgios, would you want to take > on this change ? > > I think that needs to be changed for v16, since 1) LZ4F works so much > better like this, and 2) we can't change it later without breaking > compatibility of the dumpfiles by changing the header with another name > other than "lz4". Also, I imagine we'd want to continue supporting the > ability to *restore* a dumpfile using the old(current) format, which > would be untestable code unless we also preserved the ability to write > it somehow (like -Z lz4-old). > I'm a bit confused about the lz4 vs. lz4f stuff, TBH. If we switch to lz4f, doesn't that mean it (e.g. restore) won't work on systems that only have older lz4 version? 
What would/should happen if we take a backup compressed with lz4f, and then try restoring it on an older system where lz4 does not support lz4f? Maybe if lz4f format is incompatible with regular lz4, we should treat it as a separate compression method 'lz4f'? I'm mostly afk until the end of the week, but I tried searching for lz4f info - the results are not particularly enlightening, unfortunately. AFAICS this only applies to lz4f stuff. Or would the streaming mode be a breaking change too? > One issue is that LZ4F_createCompressionContext() and > LZ4F_compressBegin() ought to be called in InitCompressorLZ4(). It > seems like it might *need* to be called there to avoid exactly the kind > of issue that I reported with empty LOs with gzip. But > InitCompressorLZ4() isn't currently passed the ArchiveHandle, so can't > write the header. And LZ4CompressorState has a simple char *buf, and > not an more elaborate data structure like zlib. You could work around > that by keeping storing the "len" of the existing buffer, and flushing > it in EndCompressorLZ4(), but that adds needless complexity to the Write > and End functions. Maybe the Init function should be passed the AH. > Not sure, but looking at GzipCompressorState I see the only extra thing it has (compared to LZ4CompressorState) is "z_streamp". I can't experiment with this until the end of t
Re: improving user.c error messages
On Thu, Mar 09, 2023 at 10:55:54AM +0100, Peter Eisentraut wrote: > On 20.02.23 23:58, Nathan Bossart wrote: >> For now, I've reworded these as "must inherit privileges of". > > I don't have a good mental model of all this role inheritance, personally, > but I fear that this change makes the messages more jargony and less clear. > Maybe the original wording was good enough. I'm fine with that. > "admin option" is sort of a natural language term, I think, so we don't need > to parametrize it as "%s option". Also, there are no other "options" in > this context, I think. v16 introduces the INHERIT and SET options. I don't have a strong opinion about parameterizing it, though. My intent was to consistently capitalize all the attributes and options. > A general thought: It seems we currently don't have any error messages that > address the user like "You must do this". Do we want to go there? Should we > try for a more impersonal wording like > > "You must have the %s attribute to create roles." > > "Current user must have the %s attribute to create roles." > > "%s attribute is required to create roles." I think I like the last option the most. In general, I agree with trying to avoid the second-person phrasing. > By the way, I'm not sure what the separation between 0001 and 0002 is > supposed to be. I'll combine them. I first started with user.c only, but we kept finding new messages to improve. -- Nathan Bossart Amazon Web Services: https://aws.amazon.com
Re: lz4 --rm on Ubuntu 18.04 (Add LZ4 compression to pg_dump)
On 3/9/23 01:30, Michael Paquier wrote: > On Thu, Mar 09, 2023 at 12:39:08AM +0100, Tomas Vondra wrote: >> IMO we should fix that. We have a bunch of buildfarm members running on >> Ubuntu 18.04 (or older) - it's true none of them seems to be running TAP >> tests. But considering how trivial the fix is ... >> >> Barring objections, I'll push a fix early next week. > > +1, better to change that, thanks. Actually, would --rm be OK even on > Windows? As far as I can see, the CI detects a LZ4 command for the > VS2019 environment but not the liblz4 libraries that would be needed > to trigger the set of tests. Thanks for noticing that. I'll investigate next week. regards -- Tomas Vondra EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
Re: [PATCH] Add pretty-printed XML output option
On 09.03.23 18:38, Tom Lane wrote: While reviewing this patch, I started to wonder why we don't eliminate the maintenance hassle of xml_1.out by putting in a short-circuit at the top of the test, similar to those in some other scripts: /* skip test if XML support not compiled in */ SELECT 'one'::xml; \if :ERROR \quit \endif (and I guess xmlmap.sql could get the same treatment). The only argument I can think of against it is that the current approach ensures we produce a clean error (and not, say, a crash) for all xml.c entry points not just xml_in. I'm not sure how much that's worth though. The compiler/linker would tell us if we miss compiling out every reference to libxml2. Thoughts? regards, tom lane Hi Tom, I agree it would make things easier and it could indeed save some time (and some CI runs ;)). However, checking in the absence of libxml2 if an error message is raised, and checking if this error message is the one we expect, is IMHO also a very nice test. But I guess I could also live with skipping the whole thing. Best, Jim
WaitEventSet resource leakage
In [1] I wrote: > PG Bug reporting form writes: >> The following script: >> [ leaks a file descriptor per error ] > > Yeah, at least on platforms where WaitEventSets own kernel file > descriptors. I don't think it's postgres_fdw's fault though, > but that of ExecAppendAsyncEventWait, which is ignoring the > possibility of failing partway through. It looks like it'd be > sufficient to add a PG_CATCH or PG_FINALLY block there to make > sure the WaitEventSet is disposed of properly --- fortunately, > it doesn't need to have any longer lifespan than that one > function. After further thought that seems like a pretty ad-hoc solution. We probably can do no better in the back branches, but shouldn't we start treating WaitEventSets as ResourceOwner-managed resources? Otherwise, transient WaitEventSets are going to be a permanent source of headaches. regards, tom lane [1] https://www.postgresql.org/message-id/423731.1678381075%40sss.pgh.pa.us
Re: Move defaults toward ICU in 16?
On Thu, 2023-03-09 at 10:36 +0100, Peter Eisentraut wrote: > 0002 seems fine to me. Committed 0002 with some test improvements. > > Let's come back to that after dealing with the other two. Leaving 0001 open for now. 0003 committed after addressing your comments. -- Jeff Davis PostgreSQL Contributor Team - AWS
Re: Add standard collation UNICODE
On Thu, 2023-03-09 at 11:21 +0100, Peter Eisentraut wrote: > How about this patch version? Looks good to me. Regards, Jeff Davis
Re: psql \watch 2nd argument: iteration count
+ pg_log_error("Watch period must be non-negative number, but argument is '%s'", opt); After looking around at the other error messages in this file, I think we should make this more concise. Maybe something like pg_log_error("\\watch: invalid delay interval: %s", opt); + free(opt); + resetPQExpBuffer(query_buf); + return PSQL_CMD_ERROR; Is this missing psql_scan_reset(scan_state)? I haven't had a chance to look closely at 0002 yet. -- Nathan Bossart Amazon Web Services: https://aws.amazon.com
Re: pg_stat_bgwriter.buffers_backend is pretty meaningless (and more?)
Hi, On 2023-03-09 06:51:31 -0600, Justin Pryzby wrote: > On Tue, Mar 07, 2023 at 10:18:44AM -0800, Andres Freund wrote: > > Hi, > > > > On 2023-03-06 15:21:14 -0500, Melanie Plageman wrote: > > > Good point. Attached is what you suggested. I committed the transaction > > > before the drop table so that the statistics would be visible when we > > > queried pg_stat_io. > > > > Pushed, thanks for the report, analysis and fix, Tom, Horiguchi-san, > > Melanie. > > There's a 2nd portion of the test that's still flapping, at least on > cirrusci. > > The issue that Tom mentioned is at: > SELECT :io_sum_shared_after_writes > :io_sum_shared_before_writes; > > But what I've seen on cirrusci is at: > SELECT :io_sum_shared_after_writes > :io_sum_shared_before_writes; Seems you meant to copy a different line for Tom's (s/writes/reads/)? > https://api.cirrus-ci.com/v1/artifact/task/6701069548388352/log/src/test/recovery/tmp_check/regression.diffs Hm. I guess the explanation here is that the buffers were already all written out by another backend. Which is made more likely by your patch. I found a few more occurrences and chatted with Melanie. Melanie will come up with a fix I think. Greetings, Andres Freund
Re: Improve logging when using Huge Pages
On Thu, Mar 09, 2023 at 10:38:56AM -0600, Justin Pryzby wrote: > On Thu, Mar 09, 2023 at 09:34:10AM -0500, Stephen Frost wrote: >> To that end- perhaps this isn't appropriate as a GUC at all? Why not >> have this just be a system information function? Functionally it really >> provides the same info- if the server is running then you get back >> either true or false, if it's not then you can't call it but that's >> hardly different from an unknown or error result. > > We talked about making it a function ~6 weeks ago. > > Is there an agreement to use a function, instead ? If we're tallying votes, please count me as +1 for a system information function. I think that nicely sidesteps having to return "unknown" in some cases, which I worry will just cause confusion. IMHO it is much more obvious that the value refers to the current server if you have to log in and execute a function to see it. -- Nathan Bossart Amazon Web Services: https://aws.amazon.com
Re: buildfarm + meson
On 2023-03-09 Th 08:28, Andrew Dunstan wrote: At this stage I think I'm prepared to turn this loose on a couple of my buildfarm animals, and if nothing goes awry for the remainder of the month merge the dev/meson branch and push a new release. There is still probably a little polishing to do, especially w.r.t. log file artefacts. A few things I've found: . We don't appear to have an equivalent of the headerscheck and cpluspluscheck GNUmakefile targets . I don't know how to build other docs targets (e.g. postgres-US.pdf) . There appears to be some mismatch in database names (e.g. regression_dblink vs contrib_regression_dblink). That's going to cause some issues with the module that adjusts things for cross version upgrade. cheers andrew -- Andrew Dunstan EDB:https://www.enterprisedb.com
Re: Disable rdns for Kerberos tests
Greetings, * Tom Lane (t...@sss.pgh.pa.us) wrote: > Stephen Frost writes: > > Pushed, thanks again! > > Why'd you only change HEAD? Isn't the test equally fragile in the > back branches? We hadn't had any complaints about it and so I wasn't sure if it was useful to back-patch it. I'm happy to do so though. Thanks, Stephen
Re: Improve logging when using Huge Pages
On Thu, Mar 09, 2023 at 06:51:21PM +0100, Alvaro Herrera wrote: > I'd rather make all these other places use "instance" instead. We used > to consider these terms interchangeable, but since we introduced the > glossary to unify the terminology, they are no longer supposed to be. > A server (== a machine) can contain many instances, and each individual > instance in the server could be using huge pages or not. Ah, good to know. I've always considered "server" in this context to mean the server process(es) for a single instance, but I can see the value in having different terminology to clearly distinguish the process(es) from the machine. -- Nathan Bossart Amazon Web Services: https://aws.amazon.com
Re: buildfarm + meson
Hi, On 2023-03-09 14:47:36 -0500, Andrew Dunstan wrote: > On 2023-03-09 Th 08:28, Andrew Dunstan wrote: > > At this stage I think I'm prepared to turn this loose on a couple of my > > buildfarm animals, and if nothing goes awry for the remainder of the > > month merge the dev/meson branch and push a new release. Cool! > > There is still probably a little polishing to do, especially w.r.t. log > > file artefacts. > A few things I've found: > > . We don't appear to have an equivalent of the headerscheck and > cpluspluscheck GNUmakefile targets Yes. I have a pending patch for it, but haven't yet cleaned it up sufficiently. The way headerscheck/cpluspluscheck query information from Makefile.global is somewhat nasty. > . I don't know how to build other docs targets (e.g. postgres-US.pdf) There's an 'alldocs' target, or you can do ninja doc/src/sgml/postgres-US.pdf > . There appears to be some mismatch in database names (e.g. > regression_dblink vs contrib_regression_dblink). That's going to cause some > issues with the module that adjusts things for cross version upgrade. I guess we can try to do something about that, but the make situation is overly complicated. I don't really want to emulate having randomly differing database names just because a test is in contrib/ rather than src/. Greetings, Andres Freund
Re: Improve logging when using Huge Pages
Greetings, * Justin Pryzby (pry...@telsasoft.com) wrote: > On Thu, Mar 09, 2023 at 09:34:10AM -0500, Stephen Frost wrote: > > * Nathan Bossart (nathandboss...@gmail.com) wrote: > > > On Wed, Feb 15, 2023 at 10:13:17AM -0800, Nathan Bossart wrote: > > > > On Tue, Feb 14, 2023 at 07:32:56PM -0600, Justin Pryzby wrote: > > > >> On Mon, Feb 13, 2023 at 08:18:52PM -0800, Nathan Bossart wrote: > > > >>> I'm curious why you chose to make this a string instead of an enum. > > > >>> There > > > >>> might be little practical difference, but since there are only three > > > >>> possible values, I wonder if it'd be better form to make it an enum. > > > >> > > > >> It takes more code to write as an enum - see 002.txt. I'm not > > > >> convinced > > > >> this is better. > > > >> > > > >> But your comment made me fix its , and reconsider the strings, > > > >> which I changed to active={unknown/true/false} rather than > > > >> {unk/on/off}. > > > >> It could also be active={unknown/yes/no}... > > > > > > > > I think unknown/true/false is fine. I'm okay with using a string if no > > > > one > > > > else thinks it should be an enum (or a bool). > > > > > > There's been no response for this, so I guess we can proceed with a string > > > GUC. > > > > Just happened to see this and I'm not really a fan of this being a > > string when it's pretty clear that's not what it actually is. > > I don't understand what you mean by that. > Why do you say it isn't a string ? Because it's clearly a yes/no, either the server is currently running with huge pages, or it isn't. That's the definition of a boolean. Sure, anything can be cast to text but when there's a data type that fits better, that's almost uniformly better to use. > > > +Reports whether huge pages are in use by the current instance. > > > +See for more information. > > > > > > I still think we should say "server" in place of "current instance" here. > > > > We certainly use 'server' a lot more in config.sgml than we do > > 'instance'. 
"currently running server" might be closer to how we > > describe a running PG system in other parts (we talk about "currently > > running server processes", "while the server is running", "When running > > a standby server", "when the server is running"; "instance" is used much > > less and seems to more typically refer to 'state of files on disk' in my > > reading vs. 'actively running process' though there's some of each). > > I called it "instance" since the GUC has no meaning when it's not > running. I'm fine to rename it to "running server". Great, I do think that would match better with the rest of the documentation. > > > + {"huge_pages_active", PGC_INTERNAL, PRESET_OPTIONS, > > > + gettext_noop("Indicates whether huge pages are in > > > use."), > > > + NULL, > > > + GUC_NOT_IN_SAMPLE | GUC_DISALLOW_IN_FILE | > > > GUC_RUNTIME_COMPUTED > > > + }, > > > > > > I don't think we need to use GUC_RUNTIME_COMPUTED. 'postgres -C' seems to > > > always report "unknown" for this GUC, so all this would do is cause that > > > command to error unnecessarily when the server is running. > > > > ... or we could consider adjusting things to actually try the mmap() and > > find out if we'd end up with huge pages or not. > > That seems like a bad idea, since it might work one moment and fail one > moment later. If we could tell in advance whether it was going to work, > we wouldn't be here, and probably also wouldn't have invented > huge_pages=true. Sure it might ... but I tend to disagree that it's actually all that likely for it to end up being as inconsistent as that and it'd be nice to be able to see if the server will end up successfully starting (for this part, at least), or not, when configured with huge pages set to on, or if starting with 'try' is likely to result in it actually using huge pages, or not. > > > It might be worth documenting exactly what "unknown" means. 
IIUC you'll > > > only ever see "on" or "off" via SHOW or pg_settings, which doesn't seem > > > tremendously obvious. > > > > If we could get rid of that case and just make this a boolean, that > > seems like it'd really be the best answer. > > > > To that end- perhaps this isn't appropriate as a GUC at all? Why not > > have this just be a system information function? Functionally it really > > provides the same info- if the server is running then you get back > > either true or false, if it's not then you can't call it but that's > > hardly different from an unknown or error result. > > We talked about making it a function ~6 weeks ago. Oh, good, glad I'm not the only one to have thought of that. > Is there an agreement to use a function, instead ? Looking back at the arguments for having it be a GUC ... I just don't really see any of them as very strong. Not that I feel super strongly about it being a function either, but it's certainly not a configuration va
Re: BUG: Postgres 14 + vacuum_defer_cleanup_age + FOR UPDATE + UPDATE
> On Mar 8, 2023, at 4:15 PM, Andres Freund wrote: > > I worked some more on the fixes for amcheck, and fixes for amcheck. > > The second amcheck fix ends up correcting some inaccurate output in the tests > - previously xids from before xid 0 were reported to be in the future. > > Previously there was no test case exercising exceeding nextxid, without > wrapping around into the past. I added that at the end of > 004_verify_heapam.pl, because renumbering seemed too annoying. > > What do you think? The changes look reasonable to me. > Somewhat random note: > > Is it intentional that we VACUUM FREEZE test ROWCOUNT times? That's > effectively O(ROWCOUNT^2), albeit with small enough constants to not really > matter. I don't think we need to insert the rows one-by-one either. Changing > that to a single INSERT and FREEZE shaves 10-12% off the tests. I didn't > change that, but we also fire off a psql for each tuple for heap_page_items(), > with offset $N no less. That seems to be another 500ms. I don't recall the reasoning. Feel free to optimize the tests. — Mark Dilger EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
Re: Improve logging when using Huge Pages
Greetings, * Alvaro Herrera (alvhe...@alvh.no-ip.org) wrote: > On 2023-Mar-09, Justin Pryzby wrote: > > On Thu, Mar 09, 2023 at 09:34:10AM -0500, Stephen Frost wrote: > > > > +Reports whether huge pages are in use by the current instance. > > > > +See for more information. > > > > > > > > I still think we should say "server" in place of "current instance" > > > > here. > > > > > > We certainly use 'server' a lot more in config.sgml than we do > > > 'instance'. "currently running server" might be closer to how we > > > describe a running PG system in other parts (we talk about "currently > > > running server processes", "while the server is running", "When running > > > a standby server", "when the server is running"; "instance" is used much > > > less and seems to more typically refer to 'state of files on disk' in my > > > reading vs. 'actively running process' though there's some of each). > > > > I called it "instance" since the GUC has no meaning when it's not > > running. I'm fine to rename it to "running server". > > I'd rather make all these other places use "instance" instead. We used > to consider these terms interchangeable, but since we introduced the > glossary to unify the terminology, they are no longer supposed to be. > A server (== a machine) can contain many instances, and each individual > instance in the server could be using huge pages or not. Perhaps, but then the references to instance should probably also be changed since there's certainly some that are referring to a set of data files and not to backend processes, eg: ## The shutdown setting is useful to have the instance ready at the exact replay point desired. The instance will still be able to replay more WAL records (and in fact will have to replay WAL records since the last checkpoint next time it is started). ## Not sure about just changing one thing at a time though or using the 'right' term when introducing things while having the 'wrong' term be used next to it. 
Also not saying that this patch should be responsible for fixing everything either. 'Server' in the glossary does explicitly say it could be used when referring to an 'instance' too though, so 'running server' doesn't seem to be entirely wrong. Thanks, Stephen
Re: ICU locale validation / canonicalization
On Thu, 2023-03-09 at 09:46 +0100, Peter Eisentraut wrote: > This patch appears to do about three things at once, and it's not > clear > exactly where the boundaries are between them and which ones we might > actually want. And I think the terminology also gets mixed up a bit, > which makes following this harder. > > 1. Canonicalizing the locale string. This is presumably what > uloc_canonicalize() does, which the patch doesn't actually use. What > are examples of what this does? Does the patch actually do this? Both uloc_canonicalize() and uloc_getLanguageTag() do Level 2 Canonicalization, which is described here: https://unicode-org.github.io/icu/userguide/locale/#canonicalization > 2. Converting the locale string to BCP 47 format. This converts > 'de@collation=phonebook' to 'de-u-co-phonebk'. This is what > uloc_getLanguageTag() does. Yes, though uloc_getLanguageTag() also canonicalizes. I consider converting to the language tag a part of "canonicalization", because it's the canonical form we agreed on in this thread. > 3. Validating the locale string, to reject faulty input. Canonicalization doesn't make sure the locale actually exists in ICU, so it's easy to make a typo like "jp_JP" instead of "ja_JP". After canonicalizing to a language tag, the former is "jp-JP" (resolving to the collator with valid locale "root") and the latter is "ja-JP" (resolving to the collator with valid locale "ja"). The former is clearly a mistake, and I call catching that mistake "validation". If the user specifies something other than the root locale (i.e. not "root", "und", or ""), and the locale resolves to a collator with a valid locale of "root", then this patch considers that to be a mistake and issues a WARNING (upgraded to ERROR if the GUC icu_locale_validation is true). > What are the relationships between these? 1 & 2 are closely related. If we canonicalize, we need to pick one canonical form: either BCP 47 or ICU format locale IDs. 
3 is related, but can be seen as an independent change. > I don't understand how the validation actually happens in your patch. > Does uloc_getLanguageTag() do the validation also? Using the above definition of "validation" it happens inside icu_collator_exists(). > Can you do canonicalization without converting to language tag? If we used uloc_canonicalize(), it would give us ICU format locale IDs, and that would be a valid thing to do; and we could switch the canonical form from ICU format locale IDs to BCP 47 in a separate patch. I don't have a strong opinion, but if we're going to canonicalize, I think it makes sense to go straight to language tags. > Can you do validation of un-canonicalized locale names? Yes, though I feel like an un-canonicalized name is less stable in meaning, and so validation on that name may also be less stable. For instance, if we don't canonicalize "fr_CA.UTF-8", it resolves to plain "fr"; but if we do canonicalize it first, it resolves to "fr-CA". Will the uncanonicalized name always resolve to "fr"? I'm not sure, because the documentation says that ucol_open() expects either an ICU format locale ID or, preferably, a language tag. So they are technically independently useful changes, but I would recommend that canonicalization goes in first. > What is the guidance for the use of the icu_locale_validation GUC? If an error when creating a new collation or database due to a bad locale name would be highly disruptive, leave it false. If such an error would be helpful to make sure you get the locale you expect, then turn it on. In practice, existing important production systems would leave it off; new systems could turn it on to help avoid misconfigurations/mistakes. > The description throws in yet another term: "validates that ICU > locale > strings are well-formed". What is "well-formed"? How does that > relate > to the other concepts? Good point, I don't think I need to redefine "validation". 
Maybe I should just describe it as elevating canonicalization or validation problems from WARNING to ERROR. > Personally, I'm not on board with this behavior: > > => CREATE COLLATION test (provider = icu, locale = > 'de@collation=phonebook'); > NOTICE: 0: using language tag "de-u-co-phonebk" for locale > "de@collation=phonebook" > > I mean, maybe that is a thing we want to do somehow sometime, to > migrate > people to the "new" spellings, but the old ones aren't wrong. I see what you mean; I'm not sure the best thing to do here. We are adjusting the string passed by the user, and it feels like some users might want to know that. It's a NOTICE, not a WARNING, so it's not meant to imply that it's wrong. But at the same time I can see it being annoying or confusing. If it's confusing, perhaps a wording change and documentation would improve it? If it's annoying, we might need to have an option and/or a different log level? > It also doesn't appear to address > how to handle ICU before version 54. Do you have a specific c
Re: [PATCH] Add pretty-printed XML output option
Peter Smith writes: > The patch v19 LGTM. I've looked through this now, and have some minor complaints and a major one. The major one is that it doesn't work for XML that doesn't satisfy IS DOCUMENT. For example, regression=# select '42'::xml is document; ?column? -- f (1 row) regression=# select xmlserialize (content '42' as text); xmlserialize --- 42 (1 row) regression=# select xmlserialize (content '42' as text indent); ERROR: invalid XML document DETAIL: line 1: Extra content at the end of the document 42 ^ This is not what the documentation promises, and I don't think it's good enough --- the SQL spec has no restriction saying you can't use INDENT with CONTENT. I tried adjusting things so that we call xml_parse() with the appropriate DOCUMENT or CONTENT xmloption flag, but all that got me was empty output (except for a document header). It seems like xmlDocDumpFormatMemory is not the thing to use, at least not in the CONTENT case. But libxml2 has a few other "dump" functions, so maybe we can use a different one? I see we are using xmlNodeDump elsewhere, and that has a format option, so maybe there's a way forward there. A lesser issue is that INDENT tacks on a document header (XML declaration) whether there was one or not. I'm not sure whether that's an appropriate thing to do in the DOCUMENT case, but it sure seems weird in the CONTENT case. We have code that can strip off the header again, but we need to figure out exactly when to apply it. I also suspect that it's outright broken to attach a header claiming the data is now in UTF8 encoding. If the database encoding isn't UTF8, then either that's a lie or we now have an encoding violation. Another thing that's mildly irking me is that the current factorization of this code will result in xml_parse'ing the data twice, if you have both DOCUMENT and INDENT specified. 
We could consider avoiding that if we merged the indentation functionality into xmltotext_with_xmloption, but it's probably premature to do so when we haven't figured out how to get the output right --- we might end up needing two xml_parse calls anyway with different parameters, perhaps. I also had a bunch of cosmetic complaints (mostly around this having a bad case of add-at-the-end-itis), which I've cleaned up in the attached v20. This doesn't address any of the above, however. regards, tom lane diff --git a/doc/src/sgml/datatype.sgml b/doc/src/sgml/datatype.sgml index 467b49b199..53d59662b9 100644 --- a/doc/src/sgml/datatype.sgml +++ b/doc/src/sgml/datatype.sgml @@ -4460,14 +4460,18 @@ xml 'bar' xml, uses the function xmlserialize:xmlserialize -XMLSERIALIZE ( { DOCUMENT | CONTENT } value AS type ) +XMLSERIALIZE ( { DOCUMENT | CONTENT } value AS type [ [NO] INDENT ] ) type can be character, character varying, or text (or an alias for one of those). Again, according to the SQL standard, this is the only way to convert between type xml and character types, but PostgreSQL also allows -you to simply cast the value. +you to simply cast the value. The option INDENT allows to +indent the serialized xml output - the default is NO INDENT. +It is designed to indent XML strings of type DOCUMENT, but it can also + be used with CONTENT as long as value + contains a well-formed XML. 
diff --git a/src/backend/catalog/sql_features.txt b/src/backend/catalog/sql_features.txt index 0fb9ab7533..bb4c135a7f 100644 --- a/src/backend/catalog/sql_features.txt +++ b/src/backend/catalog/sql_features.txt @@ -621,7 +621,7 @@ X061 XMLParse: character string input and DOCUMENT option YES X065 XMLParse: binary string input and CONTENT option NO X066 XMLParse: binary string input and DOCUMENT option NO X068 XMLSerialize: BOM NO -X069 XMLSerialize: INDENT NO +X069 XMLSerialize: INDENT YES X070 XMLSerialize: character string serialization and CONTENT option YES X071 XMLSerialize: character string serialization and DOCUMENT option YES X072 XMLSerialize: character string serialization YES diff --git a/src/backend/executor/execExprInterp.c b/src/backend/executor/execExprInterp.c index 19351fe34b..3dcd15d5f0 100644 --- a/src/backend/executor/execExprInterp.c +++ b/src/backend/executor/execExprInterp.c @@ -3829,6 +3829,7 @@ ExecEvalXmlExpr(ExprState *state, ExprEvalStep *op) { Datum *argvalue = op->d.xmlexpr.argvalue; bool *argnull = op->d.xmlexpr.argnull; +text *result; /* argument type is known to be xml */ Assert(list_length(xexpr->args) == 1); @@ -3837,8 +3838,12 @@ ExecEvalXmlExpr(ExprState *state, ExprEvalStep *op) return; value = argvalue[0]; -*op->resvalue = PointerGetDatum(xmltotext_with_xmloption(DatumGetXmlP(value), - xexpr->xmloption)); +result = xmltotext_with_xmloption(DatumG
Re: buildfarm + meson
Andres Freund writes: > On 2023-03-09 14:47:36 -0500, Andrew Dunstan wrote: >> . There appears to be some mismatch in database names (e.g. >> regression_dblink vs contrib_regression_dblink). That's going to cause some >> issues with the module that adjusts things for cross version upgrade. > I guess we can try to do something about that, but the make situation is > overly complicated. I don't really want to emulate having randomly differing > database names just because a test is in contrib/ rather than src/. We could talk about adjusting the behavior on the make side instead, perhaps, but something needs to be done there eventually. Having said that, I'm not sure that the first meson-capable buildfarm version needs to support cross-version-upgrade testing. regards, tom lane
Re: allow_in_place_tablespaces vs. pg_basebackup
On Wed, Mar 8, 2023 at 9:52 PM Thomas Munro wrote: > Yeah. We knew that this didn't work (was discussed in a couple of > other threads), but you might be right that an error would be better > for now. It's absolutely not a user facing mode of operation, it was > intended just for the replication tests. Clearly it might be useful > for testing purposes in the backup area too, which is probably how > Robert got here. I will think about what changes would be needed as I > am looking at backup code currently, but I'm not sure when I'll be > able to post a patch so don't let that stop anyone else who sees how > to get it working... If there had been an error message like "ERROR: pg_basebackup cannot back up a database that contains in-place tablespaces," it would have saved me a lot of time yesterday. If there had at least been a documentation mention, I would have found it eventually (but not nearly as quickly). As it is, the only reference to this topic in the repository seems to be c6f2f01611d4f2c412e92eb7893f76fa590818e8, "Fix pg_basebackup with in-place tablespaces," which makes it look like this is supposed to be working. I also think that if we're going to have in-place tablespaces, it's a good idea for them to work with pg_basebackup. I wasn't really looking to test pg_basebackup with this feature (although it's a good idea); I was just trying to make sure I didn't break in-place tablespaces while working on something else. I think it's sometimes OK to add stuff for testing that doesn't work with absolutely everything we have in the system, but the tablespace code is awfully closely related to the pg_basebackup code for them to just not work together at all. Now that I'm done grumbling, here's a patch. -- Robert Haas EDB: http://www.enterprisedb.com v1-0001-Fix-pg_basebackup-with-in-place-tablespaces-some-.patch Description: Binary data
Sub-millisecond [autovacuum_]vacuum_cost_delay broken
Hi, I think that 4753ef37e0ed undid the work caf626b2c did to support sub-millisecond delays for vacuum and autovacuum. After 4753ef37e0ed, vacuum_delay_point()'s local variable msec is a double which, after being passed to WaitLatch() as timeout, which is a long, ends up being 0, so we don't end up waiting AFAICT. When I set [autovacuum_]vacuum_cost_delay to 0.5, SHOW will report that it is 500us, but WaitLatch() is still getting 0 as timeout. - Melanie
Re: Sub-millisecond [autovacuum_]vacuum_cost_delay broken
On Fri, Mar 10, 2023 at 10:26 AM Melanie Plageman wrote: > I think that 4753ef37e0ed undid the work caf626b2c did to support > sub-millisecond delays for vacuum and autovacuum. > > After 4753ef37e0ed, vacuum_delay_point()'s local variable msec is a > double which, after being passed to WaitLatch() as timeout, which is a > long, ends up being 0, so we don't end up waiting AFAICT. > > When I set [autovacuum_]vacuum_cost_delay to 0.5, SHOW will report that > it is 500us, but WaitLatch() is still getting 0 as timeout. Given that some of the clunkier underlying kernel primitives have milliseconds in their interface, I don't think it would be possible to make a usec-based variant of WaitEventSetWait() that works everywhere. Could it possibly make sense to do something that accumulates the error, so if you're using 0.5 then every second vacuum_delay_point() waits for 1ms?
Re: Date-time extraneous fields with reserved keywords
Joseph Koshakow writes: > Please see the attached patch with these changes. Pushed with a couple of cosmetic adjustments. regards, tom lane
Re: Sub-millisecond [autovacuum_]vacuum_cost_delay broken
Thomas Munro writes: > On Fri, Mar 10, 2023 at 10:26 AM Melanie Plageman > wrote: >> I think that 4753ef37e0ed undid the work caf626b2c did to support >> sub-millisecond delays for vacuum and autovacuum. > Given that some of the clunkier underlying kernel primitives have > milliseconds in their interface, I don't think it would be possible to > make a usec-based variant of WaitEventSetWait() that works everywhere. > Could it possibly make sense to do something that accumulates the > error, so if you're using 0.5 then every second vacuum_delay_point() > waits for 1ms? Yeah ... using float math there was cute, but it'd only get us so far. The caf626b2c code would only work well on platforms that have microsecond-based sleep primitives, so it was already not too portable. Can we fix this by making VacuumCostBalance carry the extra fractional delay, or would a separate variable be better? regards, tom lane
Re: Sub-millisecond [autovacuum_]vacuum_cost_delay broken
Greetings, * Thomas Munro (thomas.mu...@gmail.com) wrote: > On Fri, Mar 10, 2023 at 10:26 AM Melanie Plageman > wrote: > > I think that 4753ef37e0ed undid the work caf626b2c did to support > > sub-millisecond delays for vacuum and autovacuum. > > > > After 4753ef37e0ed, vacuum_delay_point()'s local variable msec is a > > double which, after being passed to WaitLatch() as timeout, which is a > > long, ends up being 0, so we don't end up waiting AFAICT. > > > > When I set [autovacuum_]vacuum_cost_delay to 0.5, SHOW will report that > > it is 500us, but WaitLatch() is still getting 0 as timeout. > > Given that some of the clunkier underlying kernel primitives have > milliseconds in their interface, I don't think it would be possible to > make a usec-based variant of WaitEventSetWait() that works everywhere. > Could it possibly make sense to do something that accumulates the > error, so if you're using 0.5 then every second vacuum_delay_point() > waits for 1ms? Hmm. That generally makes sense to me, though it isn't exactly the same. Still, I wouldn't want to go back to purely pg_usleep() as that has the other downsides mentioned. Perhaps if the delay is sub-millisecond, explicitly do the WaitLatch() with zero but also do the pg_usleep()? That's doing a fair bit of work beyond just sleeping, but it also means we shouldn't miss out on the postmaster going away or similar. Thanks, Stephen
Re: Sub-millisecond [autovacuum_]vacuum_cost_delay broken
On Fri, Mar 10, 2023 at 11:02 AM Tom Lane wrote: > Thomas Munro writes: > > On Fri, Mar 10, 2023 at 10:26 AM Melanie Plageman > > wrote: > >> I think that 4753ef37e0ed undid the work caf626b2c did to support > >> sub-millisecond delays for vacuum and autovacuum. > > > Given that some of the clunkier underlying kernel primitives have > > milliseconds in their interface, I don't think it would be possible to > > make a usec-based variant of WaitEventSetWait() that works everywhere. > > Could it possibly make sense to do something that accumulates the > > error, so if you're using 0.5 then every second vacuum_delay_point() > > waits for 1ms? > > Yeah ... using float math there was cute, but it'd only get us so far. > The caf626b2c code would only work well on platforms that have > microsecond-based sleep primitives, so it was already not too portable. Also, the previous coding was already b0rked, because pg_usleep() rounds up to milliseconds on Windows (with a surprising formula for rounding), and also the whole concept seems to assume things about schedulers that aren't really universally true. If we actually cared about high res times maybe we should be using nanosleep and tracking the drift? And spreading it out a bit. But I don't know. > Can we fix this by making VacuumCostBalance carry the extra fractional > delay, or would a separate variable be better? I was wondering the same thing, but not being too familiar with that code, no opinion on that yet.
Re: Sub-millisecond [autovacuum_]vacuum_cost_delay broken
On Thu, Mar 9, 2023 at 5:10 PM Thomas Munro wrote: > > On Fri, Mar 10, 2023 at 11:02 AM Tom Lane wrote: > > Thomas Munro writes: > > > On Fri, Mar 10, 2023 at 10:26 AM Melanie Plageman > > > wrote: > > >> I think that 4753ef37e0ed undid the work caf626b2c did to support > > >> sub-millisecond delays for vacuum and autovacuum. > > > > > Given that some of the clunkier underlying kernel primitives have > > > milliseconds in their interface, I don't think it would be possible to > > > make a usec-based variant of WaitEventSetWait() that works everywhere. > > > Could it possibly make sense to do something that accumulates the > > > error, so if you're using 0.5 then every second vacuum_delay_point() > > > waits for 1ms? > > > > Yeah ... using float math there was cute, but it'd only get us so far. > > The caf626b2c code would only work well on platforms that have > > microsecond-based sleep primitives, so it was already not too portable. > > Also, the previous coding was already b0rked, because pg_usleep() > rounds up to milliseconds on Windows (with a surprising formula for > rounding), and also the whole concept seems to assume things about > schedulers that aren't really universally true. If we actually cared > about high res times maybe we should be using nanosleep and tracking > the drift? And spreading it out a bit. But I don't know. > > > Can we fix this by making VacuumCostBalance carry the extra fractional > > delay, or would a separate variable be better? > > I was wondering the same thing, but not being too familiar with that > code, no opinion on that yet. Well, that is reset to zero in vacuum() at the top -- which is called for each table for autovacuum, so it would get reset to zero between autovacuuming tables. I dunno how you feel about that... - Melanie
Re: Sub-millisecond [autovacuum_]vacuum_cost_delay broken
Thomas Munro writes: > On Fri, Mar 10, 2023 at 11:02 AM Tom Lane wrote: >> The caf626b2c code would only work well on platforms that have >> microsecond-based sleep primitives, so it was already not too portable. > Also, the previous coding was already b0rked, because pg_usleep() > rounds up to milliseconds on Windows (with a surprising formula for > rounding), and also the whole concept seems to assume things about > schedulers that aren't really universally true. If we actually cared > about high res times maybe we should be using nanosleep and tracking > the drift? And spreading it out a bit. But I don't know. Yeah, I was wondering about trying to make it a closed-loop control, but I think that'd be huge overkill considering what the mechanism is trying to accomplish. A minimalistic fix could be as attached. I'm not sure if it's worth making the state variable global so that it can be reset to zero in the places where we zero out VacuumCostBalance etc. Also note that this is ignoring the VacuumSharedCostBalance stuff, so you'd possibly have the extra delay accumulating in unexpected places when there are multiple workers. But I really doubt it's worth worrying about that. Is it reasonable to assume that all modern platforms can time millisecond delays accurately? Ten years ago I'd have suggested truncating the delay to a multiple of 10msec and using this logic to track the remainder, but maybe now that's unnecessary. regards, tom lane

diff --git a/src/backend/commands/vacuum.c b/src/backend/commands/vacuum.c
index 2e12baf8eb..64d3c59709 100644
--- a/src/backend/commands/vacuum.c
+++ b/src/backend/commands/vacuum.c
@@ -2229,14 +2229,30 @@ vacuum_delay_point(void)
 	/* Nap if appropriate */
 	if (msec > 0)
 	{
+		/*
+		 * Since WaitLatch can only wait in units of milliseconds, carry any
+		 * residual fractional msec in a static variable, and accumulate it to
+		 * add into the wait interval next time.
+		 */
+		static double residual_msec = 0;
+		long		lmsec;
+
+		msec += residual_msec;
+
 		if (msec > VacuumCostDelay * 4)
 			msec = VacuumCostDelay * 4;
 
-		(void) WaitLatch(MyLatch,
-						 WL_LATCH_SET | WL_TIMEOUT | WL_EXIT_ON_PM_DEATH,
-						 msec,
-						 WAIT_EVENT_VACUUM_DELAY);
-		ResetLatch(MyLatch);
+		lmsec = floor(msec);
+		residual_msec = msec - lmsec;
+
+		if (lmsec > 0)
+		{
+			(void) WaitLatch(MyLatch,
+							 WL_LATCH_SET | WL_TIMEOUT | WL_EXIT_ON_PM_DEATH,
+							 lmsec,
+							 WAIT_EVENT_VACUUM_DELAY);
+			ResetLatch(MyLatch);
+		}
 
 		VacuumCostBalance = 0;
Re: Sub-millisecond [autovacuum_]vacuum_cost_delay broken
On Thu, Mar 09, 2023 at 05:27:08PM -0500, Tom Lane wrote: > Is it reasonable to assume that all modern platforms can time > millisecond delays accurately? Ten years ago I'd have suggested > truncating the delay to a multiple of 10msec and using this logic > to track the remainder, but maybe now that's unnecessary. If so, it might also be worth updating or removing this comment in pgsleep.c: * NOTE: although the delay is specified in microseconds, the effective * resolution is only 1/HZ, or 10 milliseconds, on most Unixen. Expect * the requested delay to be rounded up to the next resolution boundary. I've had doubts for some time about whether this is still accurate... -- Nathan Bossart Amazon Web Services: https://aws.amazon.com
Re: buildfarm + meson
On 2023-03-09 Th 15:25, Tom Lane wrote: Andres Freund writes: On 2023-03-09 14:47:36 -0500, Andrew Dunstan wrote: . There appears to be some mismatch in database names (e.g. regression_dblink vs contrib_regression_dblink). That's going to cause some issues with the module that adjusts things for cross version upgrade. I guess we can try to do something about that, but the make situation is overly complicated. I don't really want to emulate having randomly differing database names just because a test is in contrib/ rather than src/. We could talk about adjusting the behavior on the make side instead, perhaps, but something needs to be done there eventually. Having said that, I'm not sure that the first meson-capable buildfarm version needs to support cross-version-upgrade testing. Well, I want to store up as little future work as possible. This particular issue won't be much of a problem for several months until we branch the code, as we don't do database adjustments for a same version upgrade. At that stage I think a small modification to AdjustUpgrade.pm will do the trick. We just need to remember to do it. cheers andrew -- Andrew Dunstan EDB:https://www.enterprisedb.com
Re: Sub-millisecond [autovacuum_]vacuum_cost_delay broken
On Thu, Mar 9, 2023 at 5:27 PM Tom Lane wrote: > > Thomas Munro writes: > > On Fri, Mar 10, 2023 at 11:02 AM Tom Lane wrote: > >> The caf626b2c code would only work well on platforms that have > >> microsecond-based sleep primitives, so it was already not too portable. > > > Also, the previous coding was already b0rked, because pg_usleep() > > rounds up to milliseconds on Windows (with a surprising formula for > > rounding), and also the whole concept seems to assume things about > > schedulers that aren't really universally true. If we actually cared > > about high res times maybe we should be using nanosleep and tracking > > the drift? And spreading it out a bit. But I don't know. > > Yeah, I was wondering about trying to make it a closed-loop control, > but I think that'd be huge overkill considering what the mechanism is > trying to accomplish. Not relevant to fixing this, but I wonder if you could eliminate the need to specify the cost delay in most cases for autovacuum if you used feedback from how much vacuuming work was done during the last cycle of vacuuming to control the delay value internally - a kind of feedback-adjusted controller. - Melanie
Re: Sub-millisecond [autovacuum_]vacuum_cost_delay broken
On Thu, Mar 9, 2023 at 5:27 PM Tom Lane wrote: > > Thomas Munro writes: > > On Fri, Mar 10, 2023 at 11:02 AM Tom Lane wrote: > >> The caf626b2c code would only work well on platforms that have > >> microsecond-based sleep primitives, so it was already not too portable. > > > Also, the previous coding was already b0rked, because pg_usleep() > > rounds up to milliseconds on Windows (with a surprising formula for > > rounding), and also the whole concept seems to assume things about > > schedulers that aren't really universally true. If we actually cared > > about high res times maybe we should be using nanosleep and tracking > > the drift? And spreading it out a bit. But I don't know. > > Yeah, I was wondering about trying to make it a closed-loop control, > but I think that'd be huge overkill considering what the mechanism is > trying to accomplish. > > A minimalistic fix could be as attached. I'm not sure if it's worth > making the state variable global so that it can be reset to zero in > the places where we zero out VacuumCostBalance etc. Also note that > this is ignoring the VacuumSharedCostBalance stuff, so you'd possibly > have the extra delay accumulating in unexpected places when there are > multiple workers. But I really doubt it's worth worrying about that. What if someone resets the delay guc and there is still a large residual?
Add macros for ReorderBufferTXN toptxn
Hi, During a recent code review, I was confused multiple times by the toptxn member of ReorderBufferTXN, which is defined only for sub-transactions. e.g. txn->toptxn member == NULL means the txn is a top level txn. e.g. txn->toptxn member != NULL means the txn is not a top level txn. It makes sense if you squint and read it slowly enough, but IMO it's too easy to accidentally misinterpret the meaning when reading code that uses this member. ~ Such code can be made easier to read just by introducing some simple macros:

#define isa_toptxn(rbtxn) (rbtxn->toptxn == NULL)
#define isa_subtxn(rbtxn) (rbtxn->toptxn != NULL)
#define get_toptxn(rbtxn) (isa_subtxn(rbtxn) ? rbtxn->toptxn : rbtxn)

~ PSA a small patch that does this. (Tests OK using make check-world) Thoughts? -- Kind Regards, Peter Smith. Fujitsu Australia v1-0001-Add-macros-for-ReorderBufferTXN-toptxn.patch Description: Binary data
Re: pg_dump/pg_restore: Fix stdin/stdout handling of custom format on Win32
On Tue, Mar 7, 2023 at 1:36 PM Juan José Santamaría Flecha < juanjo.santama...@gmail.com> wrote: > > On Thu, Mar 2, 2023 at 8:01 AM Michael Paquier > wrote: > >> >> The internal implementation of _pgstat64() is used in quite a few >> > places, so we'd better update this part first, IMO, and then focus on >> the pg_dump part. Anyway, it looks like you are right here: there is >> nothing for FILE_TYPE_PIPE or FILE_TYPE_CHAR in this WIN32 >> implementation of fstat(). >> >> I am amazed to hear that both ftello64() and fseek64() actually >> succeed if you use a pipe: >> https://pubs.opengroup.org/onlinepubs/9699919799/functions/fseek.html >> Could it be something we should try to make more portable by ourselves >> with a wrapper for these on WIN32? That would not be the first one to >> accomodate our code with POSIX, and who knows what code could be broken >> because of that, like external extensions that use fseek64() without >> knowing it. >> > > The error is reproducible in versions previous to win32stat.c, so that > might work as bug fix. > I've broken the patch in two: 1. fixes the detection of unseekable files in checkSeek(), using logic that hopefully is backpatchable, 2. the improvements on file type detection for stat() proposed by the OP. Regards, Juan José Santamaría Flecha v2-0002-improve-detection-file-type-for-WIN32-stat.patch Description: Binary data v2-0001-fix-chechSeek-for-WIN32.patch Description: Binary data
Re: Sub-millisecond [autovacuum_]vacuum_cost_delay broken
Melanie Plageman writes: > On Thu, Mar 9, 2023 at 5:27 PM Tom Lane wrote: >> A minimalistic fix could be as attached. I'm not sure if it's worth >> making the state variable global so that it can be reset to zero in >> the places where we zero out VacuumCostBalance etc. Also note that >> this is ignoring the VacuumSharedCostBalance stuff, so you'd possibly >> have the extra delay accumulating in unexpected places when there are >> multiple workers. But I really doubt it's worth worrying about that. > What if someone resets the delay guc and there is still a large residual? By definition, the residual is less than 1msec. regards, tom lane
Re: buildfarm + meson
On 2023-03-09 Th 14:47, Andrew Dunstan wrote: On 2023-03-09 Th 08:28, Andrew Dunstan wrote: At this stage I think I'm prepared to turn this loose on a couple of my buildfarm animals, and if nothing goes awry for the remainder of the month merge the dev/meson branch and push a new release. There is still probably a little polishing to do, especially w.r.t. log file artefacts. A few things I've found: . We don't appear to have an equivalent of the headerscheck and cpluspluscheck GNUmakefile targets . I don't know how to build other docs targets (e.g. postgres-US.pdf) . There appears to be some mismatch in database names (e.g. regression_dblink vs contrib_regression_dblink). That's going to cause some issues with the module that adjusts things for cross version upgrade. Another thing: the test for uuid.h is too strict. On Fedora 36 the OSSP header is in /usr/include, not /usr/include/ossp (I got around that for now by symlinking it, but obviously that's a nasty hack we can't ask people to do) cheers andrew -- Andrew Dunstan EDB:https://www.enterprisedb.com
Re: improving user.c error messages
On Thu, Mar 09, 2023 at 09:58:46AM -0800, Nathan Bossart wrote: > On Thu, Mar 09, 2023 at 10:55:54AM +0100, Peter Eisentraut wrote: >> On 20.02.23 23:58, Nathan Bossart wrote: >>> For now, I've reworded these as "must inherit privileges of". >> >> I don't have a good mental model of all this role inheritance, personally, >> but I fear that this change makes the messages more jargony and less clear. >> Maybe the original wording was good enough. > > I'm fine with that. I used the original wording in v7. >> "admin option" is sort of a natural language term, I think, so we don't need >> to parametrize it as "%s option". Also, there are no other "options" in >> this context, I think. > > v16 introduces the INHERIT and SET options. I don't have a strong opinion > about parameterizing it, though. My intent was to consistently capitalize > all the attributes and options. I didn't change this in v7, but I can do so if you still think it shouldn't be parameterized. >> A general thought: It seems we currently don't have any error messages that >> address the user like "You must do this". Do we want to go there? Should we >> try for a more impersonal wording like >> >> "You must have the %s attribute to create roles." >> >> "Current user must have the %s attribute to create roles." >> >> "%s attribute is required to create roles." > > I think I like the last option the most. In general, I agree with trying > to avoid the second-person phrasing. I ended up using the "current user must have" wording in a few places, and for most others, I used "only roles with X may do Y." That seemed to flow relatively well, and IMO it made the required privileges abundantly clear. I initially was going to use the "X attribute is required to Y" wording, but I was worried that didn't make it sufficiently clear that the _role_ must have the attribute. In any case, I'm not wedded to the approach I used in the patch and am willing to try out other wordings. 
BTW I did find one example of a "you must" message while I was updating the patch: write_stderr("%s does not know where to find the server configuration file.\n" "You must specify the --config-file or -D invocation " "option or set the PGDATA environment variable.\n", progname); I don't think it's a common style, though. >> By the way, I'm not sure what the separation between 0001 and 0002 is >> supposed to be. > > I'll combine them. I first started with user.c only, but we kept finding > new messages to improve. I combined the patches in v7. -- Nathan Bossart Amazon Web Services: https://aws.amazon.com >From 66cc7a3999cae48c1b97ef067bc16872b31b887f Mon Sep 17 00:00:00 2001 From: Nathan Bossart Date: Thu, 26 Jan 2023 11:05:13 -0800 Subject: [PATCH v7 1/1] Improve several permission-related error messages. --- contrib/file_fdw/expected/file_fdw.out| 3 +- contrib/file_fdw/file_fdw.c | 8 +- .../test_decoding/expected/permissions.out| 12 +- src/backend/backup/basebackup_server.c| 4 +- src/backend/catalog/objectaddress.c | 16 +- src/backend/commands/copy.c | 12 +- src/backend/commands/user.c | 169 +- src/backend/replication/slot.c| 6 +- src/backend/storage/ipc/procarray.c | 4 +- src/backend/storage/ipc/signalfuncs.c | 16 +- src/backend/tcop/utility.c| 5 +- src/backend/utils/init/miscinit.c | 4 + src/backend/utils/init/postinit.c | 12 +- src/backend/utils/misc/guc.c | 15 +- .../expected/dummy_seclabel.out | 3 +- .../unsafe_tests/expected/rolenames.out | 3 +- src/test/regress/expected/create_role.out | 80 ++--- src/test/regress/expected/dependency.out | 4 + src/test/regress/expected/privileges.out | 23 ++- 19 files changed, 287 insertions(+), 112 deletions(-) diff --git a/contrib/file_fdw/expected/file_fdw.out b/contrib/file_fdw/expected/file_fdw.out index 36d76ba26c..a5acf6d4f7 100644 --- a/contrib/file_fdw/expected/file_fdw.out +++ b/contrib/file_fdw/expected/file_fdw.out @@ -474,7 +474,8 @@ ALTER FOREIGN TABLE agg_text OWNER TO regress_file_fdw_user; ALTER FOREIGN 
TABLE agg_text OPTIONS (SET format 'text'); SET ROLE regress_file_fdw_user; ALTER FOREIGN TABLE agg_text OPTIONS (SET format 'text'); -ERROR: only superuser or a role with privileges of the pg_read_server_files role may specify the filename option of a file_fdw foreign table +ERROR: permission denied to set the "filename" option of a file_fdw foreign table +DETAIL: Only roles with privileges of the "pg_read_server_files" role may set this option. SET ROLE regress_file_fdw_superuser; -- cleanup RESET ROLE; diff --git a/contrib/file_fdw/file_fdw.c b/contrib/file_fdw/file_fdw.c index 2d2b0b6a6b..8a312b5d0e 100644 --- a/contrib/file_fdw/file_fdw.c +++ b/contrib/fi
Re: Sub-millisecond [autovacuum_]vacuum_cost_delay broken
On Fri, Mar 10, 2023 at 11:37 AM Nathan Bossart wrote: > > On Thu, Mar 09, 2023 at 05:27:08PM -0500, Tom Lane wrote: > > Is it reasonable to assume that all modern platforms can time > > millisecond delays accurately? Ten years ago I'd have suggested > > truncating the delay to a multiple of 10msec and using this logic > > to track the remainder, but maybe now that's unnecessary. > > If so, it might also be worth updating or removing this comment in > pgsleep.c: > > * NOTE: although the delay is specified in microseconds, the effective > * resolution is only 1/HZ, or 10 milliseconds, on most Unixen. Expect > * the requested delay to be rounded up to the next resolution boundary. > > I've had doubts for some time about whether this is still accurate... What I see with the old select(), or a more modern clock_nanosleep() call, is that Linux, FreeBSD, macOS are happy sleeping for .1ms, .5ms, 1ms, 2ms, 3ms, and through inaccuracies and scheduling overheads etc. it works out to about 5-25% extra sleep time (I expect that can be affected by choice of time source/available hardware, and perhaps various system calls use different tricks). I definitely recall the behaviour described, back in the old days where more stuff was scheduler-tick based. I have no clue for Windows; quick googling tells me that it might still be pretty chunky, unless you do certain other stuff that I didn't follow up; we could probably get more accurate sleep times by rummaging through nt.dll. It would be good to find out how well WaitEventSet does on Windows; perhaps we should have a little timing accuracy test in the tree to collect build farm data?
FWIW epoll has a newer _pwait2() call that has a higher-resolution timeout argument, and Windows WaitEventSet could also do high-resolution timers if you add timer events rather than using the timeout argument, and I guess conceptually even the old poll() thing could do the equivalent with a signal alarm timer, but it sounds a lot like a bad idea to do very short sleeps to me, burning so much CPU on scheduling. I kinda wonder if the 10ms + residual thing might even turn out to be a better idea... but I dunno. The 1ms residual thing looks pretty good to me as a fix to the immediate problem report, but we might also want to adjust the wording in config.sgml?
Re: Sub-millisecond [autovacuum_]vacuum_cost_delay broken
Erm, but maybe I'm just looking at this too myopically. Is there really any point in letting people set it to 0.5, if it behaves as if you'd set it to 1 and doubled the cost limit? Isn't it just more confusing? I haven't read the discussion from when fractional delays came in, where I imagine that must have come up...
Re: Date-Time dangling unit fix
Joseph Koshakow writes: > Also I removed some dead code from the previous patch. This needs a rebase over bcc704b52, so I did that and made a couple of other adjustments. I'm inclined to think that you removed too much from DecodeTimeOnly. That does accept date specs (at least for timetz), and I see no very good reason for it not to accept a Julian date spec. I also wonder why you removed the UNITS: case there. Seems like we want these functions to accept the same syntax as much as possible. I think the code is still a bit schizophrenic about placement of ptype specs. In the UNITS: case, we don't insist that a unit apply to exactly the very next field; instead it applies to the next one where it disambiguates. So for instance this is accepted: regression=# select 'J PM 1234567 1:23'::timestamp; timestamp 1333-01-11 13:23:00 BC That's a little weird, or maybe even a lot weird, but it's not inherently nonsensical so I'm hesitant to stop accepting it. However, if UNITS acts that way, then why is ISOTIME different? So I'm inclined to remove ISOTIME's lookahead check if (i >= nf - 1 || (ftype[i + 1] != DTK_NUMBER && ftype[i + 1] != DTK_TIME && ftype[i + 1] != DTK_DATE)) return DTERR_BAD_FORMAT; and rely on the ptype-still-set error at the bottom of the loop to complain about nonsensical cases. Also, if we do keep the lookahead checks, the one in DecodeTimeOnly could be simplified --- it's accepting some cases that actually aren't supported there.
regards, tom lane diff --git a/src/backend/utils/adt/datetime.c b/src/backend/utils/adt/datetime.c index a7558d39a0..13856b06d1 100644 --- a/src/backend/utils/adt/datetime.c +++ b/src/backend/utils/adt/datetime.c @@ -983,7 +983,7 @@ DecodeDateTime(char **field, int *ftype, int nf, int fmask = 0, tmask, type; - int ptype = 0; /* "prefix type" for ISO y2001m02d04 format */ + int ptype = 0; /* "prefix type" for ISO and Julian formats */ int i; int val; int dterr; @@ -1071,6 +1071,9 @@ DecodeDateTime(char **field, int *ftype, int nf, { char *cp; + /* + * Allow a preceding "t" field, but no other units. + */ if (ptype != 0) { /* Sanity check; should not fail this test */ @@ -1175,8 +1178,7 @@ DecodeDateTime(char **field, int *ftype, int nf, case DTK_NUMBER: /* - * Was this an "ISO date" with embedded field labels? An - * example is "y2001m02d04" - thomas 2001-02-04 + * Deal with cases where previous field labeled this one */ if (ptype != 0) { @@ -1187,85 +1189,11 @@ DecodeDateTime(char **field, int *ftype, int nf, value = strtoint(field[i], &cp, 10); if (errno == ERANGE) return DTERR_FIELD_OVERFLOW; - - /* - * only a few kinds are allowed to have an embedded - * decimal - */ - if (*cp == '.') - switch (ptype) - { - case DTK_JULIAN: - case DTK_TIME: - case DTK_SECOND: -break; - default: -return DTERR_BAD_FORMAT; -break; - } - else if (*cp != '\0') + if (*cp != '.' && *cp != '\0') return DTERR_BAD_FORMAT; switch (ptype) { - case DTK_YEAR: - tm->tm_year = value; - tmask = DTK_M(YEAR); - break; - - case DTK_MONTH: - - /* - * already have a month and hour? 
then assume - * minutes - */ - if ((fmask & DTK_M(MONTH)) != 0 && -(fmask & DTK_M(HOUR)) != 0) - { -tm->tm_min = value; -tmask = DTK_M(MINUTE); - } - else - { -tm->tm_mon = value; -tmask = DTK_M(MONTH); - } - break; - - case DTK_DAY: - tm->tm_mday = value; - tmask = DTK_M(DAY); - break; - - case DTK_HOUR: - tm->tm_hour = value; - tmask = DTK_M(HOUR); - break; - - case DTK_MINUTE: - tm->tm_min = value; - tmask = DTK_M(MINUTE); - break; - - case DTK_SECOND: - tm->tm_sec = value; - tmask = DTK_M(SECOND); - if (*cp == '.') - { -dterr = ParseFractionalSecond(cp, fsec); -if (dterr) - return dterr; -tmask = DTK_ALL_SECS_M; - } - break; - - case DTK_TZ: - tmask = DTK_M(TZ); - dterr = DecodeTimezone(field[i], tzp); - if (dterr) -return dterr; - break; - case DTK_JULIAN: /* previous field was a label for "julian date" */ if (value < 0) @@ -1519,6 +1447,9 @@ DecodeDateTime(char **field, int *ftype, int nf, case UNITS: tmask = 0; + /* reject consecutive unhandled units */ + if (ptype != 0) + return DTERR_BAD_FORMAT; ptype = val; break; @@ -1576,6 +1507,10 @@ DecodeDateTime(char **field, int
Re: Sub-millisecond [autovacuum_]vacuum_cost_delay broken
Thomas Munro writes: > Erm, but maybe I'm just looking at this too myopically. Is there > really any point in letting people set it to 0.5, if it behaves as if > you'd set it to 1 and doubled the cost limit? Isn't it just more > confusing? I haven't read the discussion from when fractional delays > came in, where I imagine that must have come up... At [1] I argued >> The reason is this: what we want to do is throttle VACUUM's I/O demand, >> and by "throttle" I mean "gradually reduce". There is nothing gradual >> about issuing a few million I/Os and then sleeping for many milliseconds; >> that'll just produce spikes and valleys in the I/O demand. Ideally, >> what we'd have it do is sleep for a very short interval after each I/O. >> But that's not too practical, both for code-structure reasons and because >> most platforms don't give us a way to so finely control the length of a >> sleep. Hence the design of sleeping for awhile after every so many I/Os. >> >> However, the current settings are predicated on the assumption that >> you can't get the kernel to give you a sleep of less than circa 10ms. >> That assumption is way outdated, I believe; poking around on systems >> I have here, the minimum delay time using pg_usleep(1) seems to be >> generally less than 100us, and frequently less than 10us, on anything >> released in the last decade. >> >> I propose therefore that instead of increasing vacuum_cost_limit, >> what we ought to be doing is reducing vacuum_cost_delay by a similar >> factor. And, to provide some daylight for people to reduce it even >> more, we ought to arrange for it to be specifiable in microseconds >> not milliseconds. There's no GUC_UNIT_US right now, but it's time. That last point was later overruled in favor of keeping it measured in msec to avoid breaking existing configuration files. 
Nonetheless, vacuum_cost_delay *is* an actual time to wait (conceptually at least), not just part of a unitless ratio; and there seem to be good arguments in favor of letting people make it small. I take your point that really short sleeps are inefficient so far as the scheduling overhead goes. But on modern machines you probably have to get down to a not-very-large number of microseconds before that's a big deal. regards, tom lane [1] https://www.postgresql.org/message-id/28720.1552101086%40sss.pgh.pa.us
Re: Add pg_walinspect function with block info columns
On Thu, Mar 09, 2023 at 03:37:21PM +0900, Michael Paquier wrote: > Okay, thanks. Let's use that, then. I have done one pass over that today, and applied it. Thanks! I'd really like to do something about the errors we raise in the module when specifying LSNs in the future for this release, now. I got annoyed by it again this morning while doing \watch queries that kept failing randomly while stressing this patch. -- Michael
Re: Sub-millisecond [autovacuum_]vacuum_cost_delay broken
On Fri, Mar 10, 2023 at 1:46 PM Tom Lane wrote: > >> I propose therefore that instead of increasing vacuum_cost_limit, > >> what we ought to be doing is reducing vacuum_cost_delay by a similar > >> factor. And, to provide some daylight for people to reduce it even > >> more, we ought to arrange for it to be specifiable in microseconds > >> not milliseconds. There's no GUC_UNIT_US right now, but it's time. > > That last point was later overruled in favor of keeping it measured in > msec to avoid breaking existing configuration files. Nonetheless, > vacuum_cost_delay *is* an actual time to wait (conceptually at least), > not just part of a unitless ratio; and there seem to be good arguments > in favor of letting people make it small. > > I take your point that really short sleeps are inefficient so far as the > scheduling overhead goes. But on modern machines you probably have to get > down to a not-very-large number of microseconds before that's a big deal. OK. One idea is to provide a WaitLatchUsec(), which is just some cross platform donkeywork that I think I know how to type in, and it would have to round up on poll() and Windows builds. Then we could either also provide WaitEventSetResolution() that returns 1000 or 1 depending on availability of 1us waits so that we could round appropriately and then track residual, but beyond that let the user worry about inaccuracies and overheads (as mentioned in the documentation), or we could start consulting the clock and tracking our actual sleep time and true residual over time (maybe that's what "closed-loop control" means?).
Re: Add pg_walinspect function with block info columns
On Fri, Mar 10, 2023 at 6:43 AM Michael Paquier wrote: > > I'd really like to do something about the errors we raise in the > module when specifying LSNs in the future for this release, now. I > got annoyed by it again this morning while doing \watch queries that > kept failing randomly while stressing this patch. Perhaps what is proposed here https://www.postgresql.org/message-id/calj2acwqj+m0hoqj9qkav2uqfq97yk5jn2modfkcxusxsyp...@mail.gmail.com might help and avoid many errors around input LSN validations. Let's discuss that in that thread. -- Bharath Rupireddy PostgreSQL Contributors Team RDS Open Source Databases Amazon Web Services: https://aws.amazon.com
Re: Sub-millisecond [autovacuum_]vacuum_cost_delay broken
Thomas Munro writes: > OK. One idea is to provide a WaitLatchUsec(), which is just some > cross platform donkeywork that I think I know how to type in, and it > would have to round up on poll() and Windows builds. Then we could > either also provide WaitEventSetResolution() that returns 1000 or 1 > depending on availability of 1us waits so that we could round > appropriately and then track residual, but beyond that let the user > worry about inaccuracies and overheads (as mentioned in the > documentation), ... so we'd still need to have the residual-sleep-time logic? > or we could start consulting the clock and tracking > our actual sleep time and true residual over time (maybe that's what > "closed-loop control" means?). Yeah, I was hand-waving about trying to measure our actual sleep times. On reflection I doubt it's a great idea. It'll add overhead and there's still a question of whether measurement noise would accumulate. regards, tom lane
Re: Add pg_walinspect function with block info columns
On Fri, Mar 10, 2023 at 06:50:05AM +0530, Bharath Rupireddy wrote: > Perhaps what is proposed here > https://www.postgresql.org/message-id/calj2acwqj+m0hoqj9qkav2uqfq97yk5jn2modfkcxusxsyp...@mail.gmail.com > might help and avoid many errors around input LSN validations. Let's > discuss that in that thread. Yep, I am going to look at your proposal. -- Michael
Re: pg_dump/pg_restore: Fix stdin/stdout handling of custom format on Win32
On Fri, Mar 10, 2023 at 12:12:37AM +0100, Juan José Santamaría Flecha wrote: > I've broken the patch in two: > 1. fixes the detection of unseekable files in checkSeek(), using logic that > hopefully is backpatchable, > 2. the improvements on file type detection for stat() proposed by the OP. I am OK with 0002, so I'll try to get this part backpatched down to where the implementation of stat() has been added. I am not completely sure that 0001 is the right way forward, though, particularly with the long-term picture... In the backend, we have one caller of fseeko() as of read_binary_file(), so we would never pass down a pipe to that. However, there could be a risk of some silent breakages on Windows if some new code relies on that? There is a total of 11 callers of fseeko() in pg_dump, so rather than relying on checkSeek() to see if it actually works, I'd like to think that we should have a central policy to make this code more bullet-proof in the future. -- Michael