Re: REVOKE FROM warning on grantor

2024-03-18 Thread Étienne BERSAC
Hi Tom,

Thanks for your answer.


> It does not say that that set must be nonempty.  Admittedly it's not
> very clear from this one point.  However, if you look around in the
> standard it seems clear that they expect no-op revokes to be no-ops
> not errors.

Postgres actually identifies memberships to revoke.  The list is not
empty.  Even if the revoker has the USAGE privilege on the parent role, the
membership is protected by a new check on the grantor of the membership.
This semantic is new to me.  I guess it may confuse other people too.

I would compare the denied revoke of a role membership to revoking a
privilege on a table one lacks permission for:

> REVOKE SELECT ON TABLE toto FROM PUBLIC ;
ERROR:  permission denied for table toto

> Even taking the position that this is an unspecified point that we
> could implement how we like, I don't think there's a sufficient
> argument for changing behavior that's stood for a couple of decades.

In Postgres 15, revoking a membership granted by another role is
accepted.  I suspect this is related to the new CREATEROLE behaviour
implemented by Robert Haas (which is a great job anyway).  Attached is a
script to reproduce.

Here is the output on Postgres 15:

   SET
   DROP ROLE
   DROP ROLE
   DROP ROLE
   CREATE ROLE
   CREATE ROLE
   CREATE ROLE
   GRANT ROLE
   SET
   REVOKE ROLE
   DO
   
Here is the output of the same script on Postgres 16:
   
   
   SET
   DROP ROLE
   DROP ROLE
   DROP ROLE
   CREATE ROLE
   CREATE ROLE
   CREATE ROLE
   GRANT ROLE
   SET
   psql:ldap2pg/my-revoke.sql:12: WARNING:  role "r" has not been granted 
membership in role "g" by role "m"
   REVOKE ROLE
   psql:ldap2pg/my-revoke.sql:18: ERROR:  REVOKE failed
   CONTEXTE : PL/pgSQL function inline_code_block line 4 at RAISE
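Since the attachment is not inlined here, below is a minimal sketch of a
script that produces output of this shape.  The role names r, g, m are taken
from the warning message; everything else is inferred from the output and may
differ from the actual my-revoke.sql:

```sql
-- Hypothetical reconstruction, not the actual attachment.
SET client_min_messages = warning;
DROP ROLE IF EXISTS r;
DROP ROLE IF EXISTS g;
DROP ROLE IF EXISTS m;
CREATE ROLE m CREATEROLE;
CREATE ROLE g;
CREATE ROLE r;
GRANT g TO r;          -- membership granted by the superuser, not by m
SET ROLE m;
REVOKE g FROM r;       -- v16 warns: not granted by role "m", and is a no-op
DO $$
BEGIN
  IF EXISTS (SELECT FROM pg_auth_members
             WHERE roleid = 'g'::regrole AND member = 'r'::regrole) THEN
    RAISE 'REVOKE failed';
  END IF;
END $$;                -- fails on v16 because the membership is still there
```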

Can you confirm this?


Regards,
Étienne


my-revoke.sql
Description: application/sql


Re: What about Perl autodie?

2024-03-18 Thread Daniel Gustafsson
> On 18 Mar 2024, at 14:18, Dagfinn Ilmari Mannsåker  wrote:
> Daniel Gustafsson  writes:

>> It would have been nice to standardize on
>> using one of "|| die" and "or die" consistently but that's clearly not for 
>> this
>> body of work.
> 
> "or die" is generally the preferred form, since || has higher precedence
> than comma, so it's easy to make mistakes if you don't parenthesise the
> function args, like:
> 
>   open my $fh, '>', $filename || die "can't open $filename: $!";
> 
> which will only fail if $filename is falsy (i.e. undef, "", or "0").

Thanks for the clarification!  Looking over the || die() codepaths we have,
and those we'll add as part of this patchset, none are vulnerable to the above
issue AFAICT.

--
Daniel Gustafsson





Re: sslinfo extension - add notbefore and notafter timestamps

2024-03-18 Thread Jacob Champion
On Fri, Mar 8, 2024 at 4:16 PM Cary Huang  wrote:
> Yes, I noticed this in the SSL test too. I am also in GMT+8, so for me the
> tests would fail too due to the time zone differences from GMT. It shall be
> okay to specifically set the outputs of pg_stat_ssl,
> ssl_client_get_notbefore, and ssl_client_get_notafter to be in GMT time zone.
> The not before and not after time stamps in a client certificate are 
> generally expressed in GMT.

Hi Cary, did you have any thoughts on the timestamptz notes from my last mail?

> It might also be nice to rename
> ASN1_TIME_to_timestamp().
>
> Squinting further at the server backend implementation, should that
> also be using TimestampTz throughout, instead of Timestamp? It all
> goes through float8_timestamptz at the end, so I guess it shouldn't
> have a material impact, but it's a bit confusing.

Thanks,
--Jacob




Re: Is there still password max length restrictions in PG?

2024-03-18 Thread Daniel Gustafsson
> On 18 Mar 2024, at 14:29, Sean  wrote:

> Need some document to make a clarification or suggestion to the user?

The suggestion is to not use password authentication but instead use SCRAM.

--
Daniel Gustafsson





Re: Alternative SAOP support for GiST

2024-03-18 Thread Michał Kłeczek
Hi All,

> On 11 Mar 2024, at 18:58, Michał Kłeczek  wrote:
> 
> Hi All,
> 
> During my journey of designing a Pg-based solution at my work I was severely
> hit by several shortcomings in GiST.
> The most severe one is the lack of support for SAOP filters as it makes it 
> difficult to have partition pruning and index (only) scans working together.
> 
> To overcome the difficulties I implemented a simple extension:
> https://github.com/mkleczek/btree_gist_extra
> 
> Since it provides a separate operator class from btree_gist it requires 
> re-indexing the data.
> So I thought maybe it would be a good idea to incorporate it into btree_gist.
> 

While working on (sort of) SAOP support for GiST I got stuck on the interplay
between the GiST consistent function and partition pruning.
Not sure how it applies to SAOP handling in general though.

I’ve implemented an operator class/family that supports Gist index scan for the 
following query:

SELECT * FROM tbl WHERE col ||= ARRAY['a', 'b', 'c'];

It all works well except cases where tbl is partitioned by “col” column.
In this case index scan unnecessarily scans pages for values that are not in 
the partition.

I am not sure if it works as expected (i.e. no unnecessary scans) in the case
of = ANY (ARRAY[...]) and nbtree.
In the case of GiST, the only place where pre-processing of queries can be
performed is the consistent function.
But right now there is no way to access any scan-related information, as it is
not passed to the consistent function.
The only thing available is the GISTENTRY, and it does not contain any metadata.

As a workaround I’ve added options to the op family that allow (redundantly)
specifying MODULUS/REMAINDER for the index:

CREATE INDEX idx ON tbl_partition_01 USING gist ( col gist_text_extra_ops 
(modulus=4, remainder=2) )

and use these values to filter the query array passed to the consistent function.

This is of course not ideal:
- the information is redundant
- changing partitioning scheme requires re-indexing
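For reference, the consistent support function receives only these arguments
(this is the documented GiST extensibility interface; the function name below
is made up), which is why no scan-level metadata is reachable from it:

```c
#include "postgres.h"
#include "access/gist.h"
#include "fmgr.h"

/* Sketch of a GiST consistent function: only the GISTENTRY, the query
 * datum, the strategy number, the subtype and the recheck flag are
 * available -- no ScanKey, no IndexScanDesc. */
Datum
gist_extra_consistent(PG_FUNCTION_ARGS)
{
    GISTENTRY      *entry = (GISTENTRY *) PG_GETARG_POINTER(0);
    Datum           query = PG_GETARG_DATUM(1);
    StrategyNumber  strategy = (StrategyNumber) PG_GETARG_UINT16(2);
    /* Oid          subtype = PG_GETARG_OID(3); -- unused here */
    bool           *recheck = (bool *) PG_GETARG_POINTER(4);

    *recheck = false;
    /* ... decide based solely on entry->key, query and strategy ... */
    (void) entry; (void) query; (void) strategy;
    PG_RETURN_BOOL(true);
}
```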

I am wondering if it is possible to extend the consistent function API so that
either the ScanKey or even the whole IndexScanDesc is passed as an additional
parameter.

—
Michal





Is there still password max length restrictions in PG?

2024-03-18 Thread Sean
Hi All,


Just noticed that the definition:
postgres=# \d pg_shadow
...
 usebypassrls | boolean |           |          |
 passwd       | text    | C         |          |
...

Looks like there is no length restriction for the password of a user.
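A quick sanity check (a sketch; the role name is invented):

```sql
-- On v14 and later this succeeds even with a multi-kilobyte password
-- (what is stored in pg_shadow.passwd is the hashed verifier, not the
-- password itself):
CREATE ROLE longpw LOGIN PASSWORD 'aaaaaaaa...';  -- imagine several KB of 'a'
DROP ROLE longpw;
```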


And in the code change history, commit 67a472d71c ("Remove arbitrary
restrictions on password length.", 2020-09-03)
seems to have removed the length restriction.  (Earlier in the history,
there were limits of 100 or even a max length of 1024.)

So, here, just a minor question: can we consider that there is no
max length restriction for the password of a user?
Need some document to make a clarification or suggestion to the user?

BR,
Sean He (iih...@qq.com)

Re: Optimizing nbtree ScalarArrayOp execution, allowing multi-column ordered scans, skip scan

2024-03-18 Thread Matthias van de Meent
On Sat, 16 Mar 2024 at 01:12, Peter Geoghegan  wrote:
>
> On Fri, Mar 8, 2024 at 9:00 AM Matthias van de Meent
>  wrote:
> > I've attached v14, where 0001 is v13, 0002 is a patch with small
> > changes + some large comment suggestions, and 0003 which contains
> > sorted merge join code for _bt_merge_arrays.
>
> I have accepted your changes from 0003. Agree that it's better that
> way. It's at least a little faster, but not meaningfully more
> complicated.

Thanks.

> > I'll try to work a bit on v13/14's _bt_preprocess_keys, and see what I
> > can make of it.
>
> That's been the big focus of this new v15, which now goes all out with
> teaching _bt_preprocess_keys how to deal with arrays. We now do
> comprehensive normalization of even very complicated combinations of
> arrays and non-array scan keys in this version.

I was thinking about a more unified processing model, where
_bt_preprocess_keys would iterate over all keys, including processing
of array keys (by merging and reduction to normal keys) if and when
found. This would also reduce the effort expended when there are
contradictory scan keys, as array preprocessing is relatively more
expensive than other scankeys and contradictions are then found before
processing of later keys.
As I wasn't very far into the work yet, it seems I can reuse a lot of
your work here.

> For example, consider this query:
[...]
> This has a total of 6 input scankeys -- 3 of which are arrays. But by
> the time v15's _bt_preprocess_keys is done with it, it'll have only 1
> scan key -- which doesn't even have an array (not anymore). And so we
> won't even need to advance the array keys one single time -- there'll
> simply be no array left to advance. In other words, it'll be just as
> if the query was written this way from the start:
>
> select *
> from multi_test
> where
>   a = 3::int;
>
> (Though of course the original query will spend more cycles on
> preprocessing, compared to this manually simplified variant.)

That's a good improvement, much closer to an optimal access pattern.

> It turned out to not be terribly difficult to teach
> _bt_preprocess_keys everything it could possibly need to know about
> arrays, so that it can operate on them directly, as a variant of the
> standard equality strategy (we do still need _bt_preprocess_array_keys
> for basic preprocessing of arrays, mostly just merging them). This is
> better overall (in that it gets every little subtlety right), but it
> also simplified a number of related issues. For example, there is no
> longer any need to maintain a mapping between so->keyData[]-wise scan
> keys (output scan keys), and scan->keyData[]-wise scan keys (input
> scan keys). We can just add a step to fix-up the references to the end
> of _bt_preprocess_keys, to make life easier within
> _bt_advance_array_keys.
>
> This preprocessing work should all be happening during planning, not
> during query execution -- that's the design that makes the most sense.
> This is something we've discussed in the past in the context of skip
> scan (see my original email to this thread for the reference).

Yes, but IIRC we also agreed that it's impossible to do this fully in
planning, amongst others due to joins on array fields.

> It
> would be especially useful for the very fancy kinds of preprocessing
> that are described by the MDAM paper, like using an index scan for a
> NOT IN() list/array (this can actually make sense with a low
> cardinality index).

Yes, indexes such as those on enums. Though, in those cases the NOT IN
() could be transformed into IN() lists by the planner, but not by the
index.

> The structure for preprocessing that I'm working towards (especially
> in v15) sets the groundwork for making those shifts in the planner,
> because we'll no longer treat each array constant as its own primitive
> index scan during preprocessing.

I hope that's going to be a fully separate patch. I don't think I can
handle much more complexity in this one.

> Right now, on HEAD, preprocessing
> with arrays kinda treats each array constant like the parameter of an
> imaginary inner index scan, from an imaginary nested loop join. But
> the planner really needs to operate on the whole qual, all at once,
> including any arrays. An actual nestloop join's inner index scan
> naturally cannot do that, and so might still require runtime/dynamic
> preprocessing in a world where that mostly happens in the planner --
> but that's clearly not appropriate for arrays ("WHERE a = 5" and "WHERE
> a in(4, 5)" are almost the same thing, and so should be handled in
> almost the same way by preprocessing).

Yeah, if the planner could handle some of this that'd be great. At the
same time, I think that this might need to be gated behind a guc for
more expensive planner-time deductions.

> > > + * We generally permit primitive index scans to continue onto the 
> > > next
> > > + * sibling page when the page's finaltup satisfies all required scan 
> > > keys
> > > + * 

Re: What about Perl autodie?

2024-03-18 Thread Dagfinn Ilmari Mannsåker
Daniel Gustafsson  writes:

>> On 18 Mar 2024, at 07:27, Peter Eisentraut  wrote:
>
>> After some pondering, I figured the exclude list is better.  
>
> Agreed.
>
>> So here is a squashed patch, also with a complete commit message.
>
> Looks good from a read-through.  It would have been nice to standardize on
> using one of "|| die" and "or die" consistently but that's clearly not for 
> this
> body of work.

"or die" is generally the preferred form, since || has higher precedence
than comma, so it's easy to make mistakes if you don't parenthesise the
function args, like:

   open my $fh, '>', $filename || die "can't open $filename: $!";

which will only fail if $filename is falsy (i.e. undef, "", or "0").
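For comparison, a minimal sketch of both forms side by side (not taken from
the tree):

```perl
use strict;
use warnings;

my $filename = 'out.txt';

# '||' binds tighter than the comma, so this parses as
#   open(my $fh, '>', ($filename || die ...));
# and die only runs when $filename itself is falsy:
open my $fh, '>', $filename || die "can't open $filename: $!";

# 'or' binds looser than the whole expression, so this really
# checks open()'s return value:
open my $fh2, '>', $filename or die "can't open $filename: $!";
```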

- ilmari




Re: Support json_errdetail in FRONTEND builds

2024-03-18 Thread Jacob Champion
On Sun, Mar 17, 2024 at 4:49 PM Daniel Gustafsson  wrote:
> I took another look at this tonight and committed it with some mostly cosmetic
> changes.

Great! Thanks everyone.

--Jacob




Re: Possibility to disable `ALTER SYSTEM`

2024-03-18 Thread Daniel Gustafsson
> On 18 Mar 2024, at 13:57, Robert Haas  wrote:

> my proposal is something like this, taking a
> bunch of text from Jelte's patch and some inspiration from Magnus's
> earlier remarks:

I still think any wording should clearly mention that settings in the file are
still applied.  The proposed wording says to implicitly but to avoid confusion
I think it should be explicit.

> Perhaps figuring out how to
> document this is best left to a separate thread, and there's also the
> question of whether a new section that talks about this also ought to
> talk about anything else. But I feel like we're way overdue to do
> something about this.

Seconded, both that it needs to be addressed and that it should be done on a
separate thread from this one.

--
Daniel Gustafsson





Re: Improve readability by using designated initializers when possible

2024-03-18 Thread jian he
On Mon, Mar 18, 2024 at 6:01 PM jian he  wrote:
>
> On Mon, Mar 18, 2024 at 3:09 PM Peter Eisentraut  wrote:
> >
> > On 14.03.24 01:26, Michael Paquier wrote:
> > > -EventTriggerSupportsObjectClass(ObjectClass objclass)
> > > +EventTriggerSupportsObject(const ObjectAddress *object)
> > >
> > > The shortcut introduced here is interesting, but it is inconsistent.
> > > HEAD treats OCLASS_SUBSCRIPTION as something supported by event
> > > triggers, but as pg_subscription is a shared catalog it would be
> > > discarded with your change.  Subscriptions are marked as supported in
> > > the event trigger table:
> > > https://www.postgresql.org/docs/devel/event-trigger-matrix.html
> >
> > Ah, good catch.  Subscriptions are a little special there.  Here is a
> > new patch that keeps the switch/case arrangement in that function.  That
> > also makes it easier to keep the two EventTriggerSupports... functions
> > aligned.  Also added a note about subscriptions and a reference to the
> > documentation.
>
> select relname from pg_class where relisshared and relkind = 'r';
> relname
> ---
>  pg_authid
>  pg_subscription
>  pg_database
>  pg_db_role_setting
>  pg_tablespace
>  pg_auth_members
>  pg_shdepend
>  pg_shdescription
>  pg_replication_origin
>  pg_shseclabel
>  pg_parameter_acl
> (11 rows)
>

also in function doDeletion
we have:

/*
* These global object types are not supported here.
*/
case AuthIdRelationId:
case DatabaseRelationId:
case TableSpaceRelationId:
case SubscriptionRelationId:
case ParameterAclRelationId:
elog(ERROR, "global objects cannot be deleted by doDeletion");
break;

do we need to add other global objects?

in the end, it does not matter since we have:
default:
elog(ERROR, "unsupported object class: %u", object->classId);




Re: Possibility to disable `ALTER SYSTEM`

2024-03-18 Thread Robert Haas
On Fri, Mar 15, 2024 at 7:09 AM Jelte Fennema-Nio  wrote:
> On Fri, 15 Mar 2024 at 11:08, Daniel Gustafsson  wrote:
> > Another quirk for the documentation of this: if I disable ALTER SYSTEM I
> > would assume that postgresql.auto.conf is no longer consumed, but it still
> > is (and still needs to be), so maybe "enable/disable" is the wrong choice
> > of words?
>
> Updated the docs to reflect this quirk. But I kept the same name for
> the GUC for now, because I couldn't come up with a better name myself.
> If someone suggests a better name, I'm happy to change it though.

Hmm. So in this patch, we have a whole new kind of GUC - guard rails -
of which enable_alter_system is the first member. Is that what we
want? I would have been somewhat inclined to find an existing section
of postgresql.conf for this parameter, perhaps "platform and
version compatibility". But if we're going to add a bunch of similar
GUCs, maybe grouping them all together is the way to go.

Even if that is what we're going to do, do we want to call them "guard
rails"? I'm not sure I'd find that name terribly clear, as a user. We
know what we mean right now because we're having a very active
discussion about this topic, but it might not seem as clear to someone
coming at it fresh.

On balance, I'm disinclined to add a new category for this. If we get
to a point where we have several of these and we want to break them
out into a new category, we can do it then. Maybe by that time the
naming will seem more clear, too.

I also don't think it's good enough to just say that this isn't a
security feature. Talk is cheap. I think we need to say why it's not a
security feature. So my proposal is something like this, taking a
bunch of text from Jelte's patch and some inspiration from Magnus's
earlier remarks:

==
When enable_alter_system is set to
off, an error is returned if the ALTER
SYSTEM command is used. This parameter can only be set in
the postgresql.conf file or on the server command
line. The default value is on.

Note that this setting cannot be regarded as a security feature. It
only disables the ALTER SYSTEM command. It does not
prevent a superuser from changing the configuration remotely using
other means. A superuser has many ways of executing shell commands at
the operating system level, and can therefore modify
postgresql.auto.conf regardless of the value of
this setting. The purpose of the setting is to prevent
accidental modifications via ALTER
SYSTEM in environments where PostgreSQL's
configuration is managed by some outside mechanism. In such
environments, using ALTER SYSTEM to make
configuration changes might appear to work, but then may be discarded
at some point in the future when that outside mechanism updates the
configuration. Setting this parameter to false can
help to avoid such mistakes.
==
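For illustration, a managed environment would then carry something like this
(a sketch; the GUC name was still under discussion on the thread):

```
# postgresql.conf, on an instance whose configuration is managed
# by an outside mechanism:
enable_alter_system = off
```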

I agree with Daniel's comment that Tom's concerns about people filing
CVEs are not without merit; indeed, I said the same thing in my first
post to this thread. However, I also believe that's not a sufficient
reason for rejecting a feature that many people seem to want. I think
the root of this problem is that our documentation is totally unclear
about the fact that we don't intend for there to be privilege
separation between the operating system user and the PostgreSQL
superuser; people want there to be a distinction, and think there is.
Hence CVE-2019-9193, for example. Several people, including me, wrote
blog posts about how that's not a security vulnerability, but while I
was researching mine, I went looking for where in the documentation we
actually SAY that there's no privilege separation between the OS user
and the superuser. The only mention I found at the time was the
PL/perlu documentation, which said this:

"The writer of a PL/PerlU function must take care that the function
cannot be used to do anything unwanted, since it will be able to do
anything that could be done by a user logged in as the database
administrator."

That statement, from the official documentation, in my mind at least,
DOES confirm that we don't intend privilege separation, but it's
really oblique. You have to think through the fact that the superuser
has to be the one to install plperlu, and that plperlu functions can
usurp the OS user; since both of those things are documented to be the
case, it follows that we know and expect that the superuser can usurp
the OS user. But someone who is wondering how PostgreSQL's security
model works is not going to read the plperlu documentation and make
the inferences I just described. It's crazy to me that a principle
frequently cited as gospel on this mailing list and others is nearly
undocumented. Obviously, even if we did document it clearly, people
could still get confused (or just disagree with our position) and file
CVEs anyway, but we're not helping our case by having nothing to cite.

A difficulty is where to PUT such a mention in the documentation.
There's not a single section title in the top-level 

Re: Building with meson on NixOS/nixpkgs

2024-03-18 Thread Nazir Bilal Yavuz
Hi,

Thank you for the patches!

On Sat, 16 Mar 2024 at 14:48, Wolfgang Walther  wrote:
>
> To build on NixOS/nixpkgs I came up with a few small patches to
> meson.build. All of this works fine with Autoconf/Make already.

I do not have NixOS, but I confirm that the patches apply cleanly to master
and pass CI. I have a bit of feedback:

0001 & 0002: Adding code comments to explain why they have fallback
could be nice.
0003: Looks good to me.

-- 
Regards,
Nazir Bilal Yavuz
Microsoft




Re: Reducing connection overhead in pg_upgrade compat check phase

2024-03-18 Thread Daniel Gustafsson
Attached is a fresh rebase with only minor cosmetic touch-ups which I would
like to go ahead with during this CF.

Peter: does this address the comments you had on translation and code
duplication?

--
Daniel Gustafsson



v15-0001-pg_upgrade-run-all-data-type-checks-per-connecti.patch
Description: Binary data


Re: BitmapHeapScan streaming read user and prelim refactoring

2024-03-18 Thread Heikki Linnakangas

On 14/02/2024 21:42, Andres Freund wrote:

On 2024-02-13 18:11:25 -0500, Melanie Plageman wrote:

patch 0004 is, I think, a bug fix. see [2].


I'd not quite call it a bugfix, it's not like it leads to wrong
behaviour. Seems more like an optimization. But whatever :)


It sure looks like a bug to me, albeit a very minor one. Certainly not an 
optimization, it doesn't affect performance in any way, only what 
EXPLAIN reports. So committed and backported that to all supported branches.


--
Heikki Linnakangas
Neon (https://neon.tech)





Re: Memory consumed by child SpecialJoinInfo in partitionwise join planning

2024-03-18 Thread Amit Langote
On Mon, Mar 18, 2024 at 8:57 PM Ashutosh Bapat
 wrote:
> On Mon, Mar 18, 2024 at 5:05 PM Amit Langote  wrote:
>> On Mon, Mar 18, 2024 at 20:11 Ashutosh Bapat  
>> wrote:
>>> On Fri, Mar 15, 2024 at 11:45 AM Amit Langote  
>>> wrote:
 Could you please post the numbers with the palloc() / pfree() version?
>>>
>>> Here are they
>>>  tables | master  | patched
>>> +-+-
>>>   2 | 29 MB   | 28 MB
>>>   3 | 102 MB  | 93 MB
>>>   4 | 307 MB  | 263 MB
>>>   5 | 1076 MB | 843 MB
>>>
>>> The numbers look slightly different from my earlier numbers. But they were 
>>> quite old. The patch used to measure memory that time is different from the 
>>> one that we committed. So there's a slight difference in the way we measure 
>>> memory as well.
>>
>>
>> Sorry, I should’ve mentioned that I was interested in seeing cpu times to 
>> compare the two approaches. Specifically, to see if the palloc / frees add 
>> noticeable overhead.
>
> No problem. Here you go
>
>  tables |  master  | patched  | perc_change
> +--+--+-
>   2 |   477.87 |   492.32 |   -3.02
>   3 |  1970.83 |  1989.52 |   -0.95
>   4 |  6305.01 |  6268.81 |0.57
>   5 | 19261.56 | 18684.86 |2.99
>
> +ve change indicates reduced planning time. It seems that the planning time 
> improves as the number of tables increases. But all the numbers are well 
> within noise levels and thus may not show any improvement or deterioration. 
> If you want, I can repeat the experiment.

Thanks.  I also wanted to see cpu time difference between the two
approaches of SpecialJoinInfo allocation -- on stack and on heap.  So
comparing times with the previous version of the patch using stack
allocation and the new one with palloc.  I'm hoping that the new
approach is only worse in the noise range.

-- 
Thanks, Amit Langote




Re: Memory consumed by child SpecialJoinInfo in partitionwise join planning

2024-03-18 Thread Ashutosh Bapat
On Mon, Mar 18, 2024 at 5:05 PM Amit Langote 
wrote:

> On Mon, Mar 18, 2024 at 20:11 Ashutosh Bapat 
> wrote:
>
>> Hi Amit,
>>
>>
>> On Fri, Mar 15, 2024 at 11:45 AM Amit Langote 
>> wrote:
>>
>>> > >
>>> > > That being said I'm a big fan of using a local variable on stack and
>>> > > filling it. I'd probably go with the usual palloc/pfree, because that
>>> > > makes it much easier to use - the callers would not be responsible
>>> for
>>> > > allocating the SpecialJoinInfo struct. Sure, it's a little bit of
>>> > > overhead, but with the AllocSet caching I doubt it's measurable.
>>> >
>>> > You are suggesting that instead of declaring a local variable of type
>>> > SpecialJoinInfo, build_child_join_sjinfo() should palloc() and return
>>> > SpecialJoinInfo which will be freed by free_child_sjinfo()
>>> > (free_child_sjinfo_members in the patch). I am fine with that.
>>>
>>> Agree with Tomas about using makeNode() / pfree().  Having the pfree()
>>> kind of makes it extra-evident that those SpecialJoinInfos are
>>> transitory.
>>>
>>
>> Attached patch-set
>>
>> 0001 - original patch as is
>> 0002 - addresses your first set of comments
>> 0003 - uses palloc and pfree to allocate and deallocate SpecialJoinInfo
>> structure.
>>
>> I will squash both 0002 and 0003 into 0001 once you review those changes
>> and are fine with those.
>>
>
> Thanks for the new patches.
>
> > > I did put this through check-world on amd64/arm64, with valgrind,
>>> > > without any issue. I also tried the scripts shared by Ashutosh in his
>>> > > initial message (with some minor fixes, adding MEMORY to explain
>>> etc).
>>> > >
>>> > > The results with the 20240130 patches are like this:
>>> > >
>>> > >tablesmasterpatched
>>> > >   -
>>> > > 2  40.8   39.9
>>> > > 3 151.7  142.6
>>> > > 4 464.0  418.5
>>> > > 51663.9 1419.5
>>>
>>> Could you please post the numbers with the palloc() / pfree() version?
>>>
>>>
>> Here are they
>>  tables | master  | patched
>> +-+-
>>   2 | 29 MB   | 28 MB
>>   3 | 102 MB  | 93 MB
>>   4 | 307 MB  | 263 MB
>>   5 | 1076 MB | 843 MB
>>
>> The numbers look slightly different from my earlier numbers. But they
>> were quite old. The patch used to measure memory that time is different
>> from the one that we committed. So there's a slight difference in the way
>> we measure memory as well.
>>
>
> Sorry, I should’ve mentioned that I was interested in seeing cpu times to
> compare the two approaches. Specifically, to see if the palloc / frees add
> noticeable overhead.
>

No problem. Here you go

 tables |  master  | patched  | perc_change
+--+--+-
  2 |   477.87 |   492.32 |   -3.02
  3 |  1970.83 |  1989.52 |   -0.95
  4 |  6305.01 |  6268.81 |0.57
  5 | 19261.56 | 18684.86 |2.99

+ve change indicates reduced planning time. It seems that the planning time
improves as the number of tables increases. But all the numbers are well
within noise levels and thus may not show any improvement or deterioration.
If you want, I can repeat the experiment.

-- 
Best Wishes,
Ashutosh Bapat


Re: Memory consumed by child SpecialJoinInfo in partitionwise join planning

2024-03-18 Thread Amit Langote
On Mon, Mar 18, 2024 at 20:11 Ashutosh Bapat 
wrote:

> Hi Amit,
>
>
> On Fri, Mar 15, 2024 at 11:45 AM Amit Langote 
> wrote:
>
>> > >
>> > > That being said I'm a big fan of using a local variable on stack and
>> > > filling it. I'd probably go with the usual palloc/pfree, because that
>> > > makes it much easier to use - the callers would not be responsible for
>> > > allocating the SpecialJoinInfo struct. Sure, it's a little bit of
>> > > overhead, but with the AllocSet caching I doubt it's measurable.
>> >
>> > You are suggesting that instead of declaring a local variable of type
>> > SpecialJoinInfo, build_child_join_sjinfo() should palloc() and return
>> > SpecialJoinInfo which will be freed by free_child_sjinfo()
>> > (free_child_sjinfo_members in the patch). I am fine with that.
>>
>> Agree with Tomas about using makeNode() / pfree().  Having the pfree()
>> kind of makes it extra-evident that those SpecialJoinInfos are
>> transitory.
>>
>
> Attached patch-set
>
> 0001 - original patch as is
> 0002 - addresses your first set of comments
> 0003 - uses palloc and pfree to allocate and deallocate SpecialJoinInfo
> structure.
>
> I will squash both 0002 and 0003 into 0001 once you review those changes
> and are fine with those.
>

Thanks for the new patches.

> > I did put this through check-world on amd64/arm64, with valgrind,
>> > > without any issue. I also tried the scripts shared by Ashutosh in his
>> > > initial message (with some minor fixes, adding MEMORY to explain etc).
>> > >
>> > > The results with the 20240130 patches are like this:
>> > >
>> > >tablesmasterpatched
>> > >   -
>> > > 2  40.8   39.9
>> > > 3 151.7  142.6
>> > > 4 464.0  418.5
>> > > 51663.9 1419.5
>>
>> Could you please post the numbers with the palloc() / pfree() version?
>>
>>
> Here are they
>  tables | master  | patched
> +-+-
>   2 | 29 MB   | 28 MB
>   3 | 102 MB  | 93 MB
>   4 | 307 MB  | 263 MB
>   5 | 1076 MB | 843 MB
>
> The numbers look slightly different from my earlier numbers. But they were
> quite old. The patch used to measure memory that time is different from the
> one that we committed. So there's a slight difference in the way we measure
> memory as well.
>

Sorry, I should’ve mentioned that I was interested in seeing cpu times to
compare the two approaches. Specifically, to see if the palloc / frees add
noticeable overhead.


Re: BitmapHeapScan streaming read user and prelim refactoring

2024-03-18 Thread Tomas Vondra
On 3/17/24 20:36, Tomas Vondra wrote:
> 
> ...
> 
>> Besides a lot of other things, I finally added debugging fprintfs printing 
>> the
>> pid, (prefetch, read), block number. Even looking at tiny excerpts of the
>> large amount of output that generates shows that two iterators were out of
>> sync.
>>
> 
> Thanks. I did experiment with fprintf, but it's quite cumbersome, so I
> was hoping you came up with some smart way to trace this kind of stuff.
> For example I was wondering if ebpf would be a more convenient way.
> 

FWIW I just realized why I failed to identify this "late prefetch" issue
during my investigation. I was experimenting with instrumenting this by
adding an LD_PRELOAD library, logging all pread/fadvise calls. But the
FilePrefetch call is skipped if the page is already in shared buffers,
so this case "disappeared" during my processing, which matched the two
kinds of calls by doing an "inner join".

That being said, I think tracing this using LD_PRELOAD or perf may be
more convenient way to see what's happening. For example I ended up
doing this:

  perf record -a -e syscalls:sys_enter_fadvise64 \
 -e syscalls:sys_exit_fadvise64 \
 -e syscalls:sys_enter_pread64 \
 -e syscalls:sys_exit_pread64

  perf script -ns

Alternatively, perf-trace can be used and prints the filename too (but
time has ms resolution only). Processing this seems comparable to the
fprintf approach.

It still has the issue that some of the fadvise calls may be absent if
the prefetch iterator gets too far behind, but I think that can be
detected / measured by simply counting the fadvise calls, and comparing
them to pread calls. We expect these to be about the same, so

   (#pread - #fadvise) / #fadvise

is a measure of how many were "late" and skipped.
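That post-processing can be sketched like this (the sample lines stand in for
real `perf script -ns` output; with a live trace you would replace the
here-doc with `perf script -ns > /tmp/trace.txt`):

```shell
# Sketch: estimate the share of "late" (skipped) prefetches by comparing
# syscall counts in a perf trace.
cat > /tmp/trace.txt <<'EOF'
postgres 54769 [001] 33768.771524828: syscalls:sys_enter_pread64: ...
postgres 54820 [000] 33768.771527473: syscalls:sys_enter_fadvise64: ...
postgres 54769 [001] 33768.771530000: syscalls:sys_enter_pread64: ...
EOF
fadvise=$(grep -c 'sys_enter_fadvise64' /tmp/trace.txt)
pread=$(grep -c 'sys_enter_pread64' /tmp/trace.txt)
# (#pread - #fadvise) / #fadvise, as suggested above
awk -v p="$pread" -v f="$fadvise" \
    'BEGIN { printf "late fraction: %.2f\n", (p - f) / f }'
```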

It also seems better than fprintf because it traces the actual syscalls,
not just calls to glibc wrappers. For example I saw this

postgres   54769 [001] 33768.771524828:
   syscalls:sys_enter_pread64: ..., pos: 0x30d04000

postgres   54769 [001] 33768.771526867:
   syscalls:sys_exit_pread64: 0x2000

postgres   54820 [000] 33768.771527473:
   syscalls:sys_enter_fadvise64: ..., offset: 0x30d04000, ...

postgres   54820 [000] 33768.771528320:
   syscalls:sys_exit_fadvise64: 0x0

which is clearly a case where we issue fadvise after pread of the same
block already completed.


regards

-- 
Tomas Vondra
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: Add pg_basetype() function to obtain a DOMAIN base type

2024-03-18 Thread Alexander Korotkov
On Mon, Mar 18, 2024 at 2:01 AM jian he  wrote:
>
> looking at it again.
> I found out we can just simply do
> `
> Datum
> pg_basetype(PG_FUNCTION_ARGS)
> {
>     Oid oid;
>
>     oid = get_fn_expr_argtype(fcinfo->flinfo, 0);
>     PG_RETURN_OID(getBaseType(oid));
> }
> `
>
> if the type is not a domain, it works the same as pg_typeof().
> if the type is a domain, pg_typeof() returns it as is, while pg_basetype()
> returns the base type.
> so it only diverges when the argument is of a domain type.
>
> the doc:
> pg_basetype ( "any" )
> regtype
>
>
>Returns the OID of the base type of a domain. If the argument
> is not a domain type, return the OID of the data type of the argument,
> just like <link linkend="function-pg-typeof">pg_typeof()</link>.
>If there's a chain of domain dependencies, it will recurse
> until finding the base type.
>
>
>
> also, I think this way, we only do one syscache lookup.

Looks good to me.  But should it be named pg_basetypeof()?
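
For illustration, the intended divergence (a sketch assuming the patch as
posted, with hypothetical domain names) would be:

```sql
-- The domains are illustrative; pg_basetype() is the function proposed above.
CREATE DOMAIN us_zip AS text CHECK (VALUE ~ '^[0-9]{5}$');
CREATE DOMAIN strict_zip AS us_zip;  -- chained domain

SELECT pg_typeof('12345'::strict_zip);    -- strict_zip
SELECT pg_basetype('12345'::strict_zip);  -- text, recursing through us_zip
```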

--
Regards,
Alexander Korotkov




Re: speed up a logical replica setup

2024-03-18 Thread Peter Eisentraut

On 18.03.24 06:43, Hayato Kuroda (Fujitsu) wrote:

02.

"The main difference between the logical replication setup and 
pg_createsubscriber
is the initial data copy."

Grammarly suggests:
"The initial data copy is the main difference between the logical replication
setup and pg_createsubscriber."


I think that change is worse.


09.

"The source server must accept connections from the target server. The source server 
must not be in recovery."

Grammarly suggests:
"The source server must accept connections from the target server and not be in 
recovery."


I think the previous version is better.


17.

"specifies promote"

We could double-quote the word promote.


The v30 patch has promote, which I think is adequate.


19.

"is accepting read-write transactions"

Grammarly suggests:
"accepts read-write transactions"


I like the first one better.


20.

New options must be documented as well. This helps not only users but also
reviewers.
(Sometimes we cannot tell whether the implementation is intentional or not.)


Which ones are missing?


21.

Also, I am not sure the specification is good. I would prefer to specify them
by a format string, because it can reduce the number of arguments, and I cannot
find use cases where users want to control the names of the objects.

However, your approach has the benefit that users can easily identify the
objects generated by pg_createsubscriber. What do others think?


I think listing them explicitly is better for the first version.  It's 
simpler to implement and more flexible.



22.

```
#define BASE_OUTPUT_DIR "pg_createsubscriber_output.d"
```

Nothing refers to this define.


This is gone in v30.


23.
  
```

}   CreateSubscriberOptions;
...
}   LogicalRepInfo;
```

The names after the closing "}" are not needed, because we do not do a typedef.


Yeah, this is actually wrong, because as it is written now, it defines 
global variables.



22.

While looking at the function definitions, I found that some pointers are
declared as const, but others are not. E.g., "char *lsn" in setup_recovery()
won't be changed but is not declared const. Is it just missing, or is there
another rule?


Yes, more could be done here.  I have attached a patch for this.  (This 
also requires the just-committed 48018f1d8c.)



24.

```
/* standby / subscriber data directory */
static char *subscriber_dir = NULL;
```

It is a bit strange that only subscriber_dir is a global variable. Callers take
the CreateSubscriberOptions as an argument, except cleanup_objects_atexit() and
main(). So, how about making `CreateSubscriberOptions opt` a global as well?


Fewer global variables seem preferable.  Only make global as needed.


30.

```
if (num_replslots == 0)
dbinfo[i].replslotname = pg_strdup(genname);
```

I think the straightforward way is to use the name of the subscription if no
name is specified. This follows the rule for CREATE SUBSCRIPTION.


right


31.

```
/* Create replication slot on publisher */
if (lsn)
pg_free(lsn);
```

I think allocating/freeing memory is not so efficient.
Can we add a flag to create_logical_replication_slot() for controlling the
return value (NULL or a duplicated string)? We can use the condition
(i == num_dbs - 1) as the flag.


Nothing is even using the return value of 
create_logical_replication_slot().  I think this can be removed altogether.



34.

```
{"config-file", required_argument, NULL, 1},
{"publication", required_argument, NULL, 2},
{"replication-slot", required_argument, NULL, 3},
{"subscription", required_argument, NULL, 4},
```

The ordering looks strange to me. Per pg_upgrade and pg_basebackup,
options which do not have a short notation are listed last.

35.

```
opt.sub_port = palloc(16);
```

Per other lines, pg_malloc() should be used.


Even better psprintf().


37.

```
/* Register a function to clean up objects in case of failure */
atexit(cleanup_objects_atexit);
```

Sorry if we have already discussed this. I think the registration can be moved
to just before the standby is started. Before that, the callback is a no-op.


But it can also stay where it is.  What is the advantage of moving it later?


40.

```
 * XXX this code was extracted from BootStrapXLOG().
```

So, can we extract the common part somewhere? Since the system identifier is
related to the controldata file, I think it could be located in
controldata_utils.c.


Let's leave it as is for this PG release.
From d951a9f186b2162aa241f7554908b236c718154f Mon Sep 17 00:00:00 2001
From: Peter Eisentraut 
Date: Mon, 18 Mar 2024 11:54:03 +0100
Subject: [PATCH] fixup! pg_createsubscriber: creates a new logical replica
 from a standby server

Add const decorations.
---
 src/bin/pg_basebackup/pg_createsubscriber.c | 54 ++---
 1 file changed, 27 insertions(+), 27 

Re: Memory consumed by child SpecialJoinInfo in partitionwise join planning

2024-03-18 Thread Ashutosh Bapat
Hi Amit,


On Fri, Mar 15, 2024 at 11:45 AM Amit Langote 
wrote:

> > >
> > > That being said I'm a big fan of using a local variable on stack and
> > > filling it. I'd probably go with the usual palloc/pfree, because that
> > > makes it much easier to use - the callers would not be responsible for
> > > allocating the SpecialJoinInfo struct. Sure, it's a little bit of
> > > overhead, but with the AllocSet caching I doubt it's measurable.
> >
> > You are suggesting that instead of declaring a local variable of type
> > SpecialJoinInfo, build_child_join_sjinfo() should palloc() and return
> > SpecialJoinInfo which will be freed by free_child_sjinfo()
> > (free_child_sjinfo_members in the patch). I am fine with that.
>
> Agree with Tomas about using makeNode() / pfree().  Having the pfree()
> kind of makes it extra-evident that those SpecialJoinInfos are
> transitory.
>

Attached patch-set

0001 - original patch as is
0002 - addresses your first set of comments
0003 - uses palloc and pfree to allocate and deallocate SpecialJoinInfo
structure.

I will squash both 0002 and 0003 into 0001 once you review those changes
and are fine with those.


>
> > > I did put this through check-world on amd64/arm64, with valgrind,
> > > without any issue. I also tried the scripts shared by Ashutosh in his
> > > initial message (with some minor fixes, adding MEMORY to explain etc).
> > >
> > > The results with the 20240130 patches are like this:
> > >
> > >tablesmasterpatched
> > >   -
> > > 2  40.8   39.9
> > > 3 151.7  142.6
> > > 4 464.0  418.5
> > > 51663.9 1419.5
>
> Could you please post the numbers with the palloc() / pfree() version?
>
>
Here they are:
 tables | master  | patched
+-+-
  2 | 29 MB   | 28 MB
  3 | 102 MB  | 93 MB
  4 | 307 MB  | 263 MB
  5 | 1076 MB | 843 MB

The numbers look slightly different from my earlier numbers, but those were
quite old. The patch used to measure memory back then is different from the
one that we committed, so there's a slight difference in the way we measure
memory as well.

-- 
Best Wishes,
Ashutosh Bapat
From fc14dd22b1a150e82f99cad4440485daaf65eb00 Mon Sep 17 00:00:00 2001
From: Ashutosh Bapat 
Date: Wed, 27 Sep 2023 16:29:11 +0530
Subject: [PATCH 2/3] Address Amit's comments

---
 src/backend/optimizer/path/joinrels.c | 13 +
 src/include/nodes/pathnodes.h |  4 
 2 files changed, 9 insertions(+), 8 deletions(-)

diff --git a/src/backend/optimizer/path/joinrels.c b/src/backend/optimizer/path/joinrels.c
index d972aef988..05b3a6c017 100644
--- a/src/backend/optimizer/path/joinrels.c
+++ b/src/backend/optimizer/path/joinrels.c
@@ -1735,6 +1735,10 @@ build_child_join_sjinfo(PlannerInfo *root, SpecialJoinInfo *parent_sjinfo,
 /*
  * free_child_sjinfo_members
  *		Free memory consumed by members of a child SpecialJoinInfo.
+ *
+ * Only members that are translated copies of their counterpart in the parent
+ * SpecialJoinInfo are freed here.  However, members that could be referenced
+ * elsewhere are not freed.
  */
 static void
 free_child_sjinfo_members(SpecialJoinInfo *child_sjinfo)
@@ -1746,19 +1750,12 @@ free_child_sjinfo_members(SpecialJoinInfo *child_sjinfo)
 	if (child_sjinfo->jointype == JOIN_INNER)
 		return;
 
-	/*
-	 * The relids are used only for comparison and their references are not
-	 * stored anywhere. Free those.
-	 */
 	bms_free(child_sjinfo->min_lefthand);
 	bms_free(child_sjinfo->min_righthand);
 	bms_free(child_sjinfo->syn_lefthand);
 	bms_free(child_sjinfo->syn_righthand);
 
-	/*
-	 * But the list of operator OIDs and the list of expressions may be
-	 * referenced somewhere else. Do not free those.
-	 */
+	/* semi_rhs_exprs may be referenced, so don't free. */
 }
 
 /*
diff --git a/src/include/nodes/pathnodes.h b/src/include/nodes/pathnodes.h
index 534692bee1..7192ae5171 100644
--- a/src/include/nodes/pathnodes.h
+++ b/src/include/nodes/pathnodes.h
@@ -2854,6 +2854,10 @@ typedef struct PlaceHolderVar
  * cost estimation purposes it is sometimes useful to know the join size under
  * plain innerjoin semantics.  Note that lhs_strict and the semi_xxx fields
  * are not set meaningfully within such structs.
+ *
+ * We also create transient SpecialJoinInfos for child joins by translating
+ * corresponding parent SpecialJoinInfos. These are also not part of
+ * PlannerInfo's join_info_list.
  */
 #ifndef HAVE_SPECIALJOININFO_TYPEDEF
 typedef struct SpecialJoinInfo SpecialJoinInfo;
-- 
2.25.1

From 5a12690afe180defb8cc4f923fb0c4b89c2153a1 Mon Sep 17 00:00:00 2001
From: Ashutosh Bapat 
Date: Mon, 24 Jul 2023 20:20:53 +0530
Subject: [PATCH 1/3] Reduce memory used by child SpecialJoinInfo

The SpecialJoinInfo applicable to a child join is computed by
translating the same applicable to the parent join in
try_partitionwise_join(). The child SpecialJoinInfo is not needed once
the child join RelOptInfo is 

Re: speed up a logical replica setup

2024-03-18 Thread Peter Eisentraut

On 18.03.24 08:18, vignesh C wrote:

1) The maximum size of an object name is 64; we could add a check so that
one cannot specify more than the maximum allowed length:
+ case 3:
+ if (!simple_string_list_member(_names, optarg))
+ {
+ simple_string_list_append(_names, optarg);
+ num_replslots++;
+ }
+ else
+ {
+ pg_log_error("duplicate replication slot \"%s\"", optarg);
+ exit(1);
+ }
+ break;

If we allow something like this:
  ./pg_createsubscriber -U postgres -D data_N2/ -P "port=5431
user=postgres"  -p 5432 -s /home/vignesh/postgres/inst/bin/ -d db1 -d
db2 -d db3 
--replication-slot="testtesttesttesttesttesttesttesttesttesttesttesttesttesttesttes1"
--replication-slot="testtesttesttesttesttesttesttesttesttesttesttesttesttesttesttes2"
--replication-slot="testtesttesttesttesttesttesttesttesttesttesttesttesttesttesttes3"
In this case creation of replication slot will fail:
pg_createsubscriber: error: could not create replication slot
"testtesttesttesttesttesttesttesttesttesttesttesttesttesttesttes" on
database "db2": ERROR:  replication slot
"testtesttesttesttesttesttesttesttesttesttesttesttesttesttesttes"
already exists


I think this is fine.  The server can check whether the names it is 
given are of the right size.  We don't need to check it again in the client.



2) Similarly here too:
+ case 4:
+ if (!simple_string_list_member(_names, optarg))
+ {
+ simple_string_list_append(_names, optarg);
+ num_subs++;
+ }
+ else
+ {
+ pg_log_error("duplicate subscription \"%s\"", optarg);
+ exit(1);
+ }
+ break;

If we allow something like this:
./pg_createsubscriber -U postgres -D data_N2/ -P "port=5431
user=postgres"  -p 5432 -s /home/vignesh/postgres/inst/bin/ -d db1 -d
db2 -d db3 
--subscription=testtesttesttesttesttesttesttesttesttesttesttesttesttesttesttes1
--subscription=testtesttesttesttesttesttesttesttesttesttesttesttesttesttesttes2
--subscription=testtesttesttesttesttesttesttesttesttesttesttesttesttesttesttes3

Subscriptions will be created with the same name and later there will
be a problem when setting replication progress as there will be
multiple subscriptions with the same name.


Could you clarify this problem?





Re: Introduce XID age and inactive timeout based replication slot invalidation

2024-03-18 Thread Bertrand Drouvot
Hi,

On Sat, Mar 16, 2024 at 09:29:01AM +0530, Bharath Rupireddy wrote:
> Please see the attached v11 patch set with all the above review
> comments addressed.

Thanks!

Looking at 0001:

1 ===

+   True if this logical slot conflicted with recovery (and so is now
+   invalidated). When this column is true, check

Worth adding back the physical slot mention "Always NULL for physical slots."?

2 ===

@@ -1023,9 +1023,10 @@ CREATE VIEW pg_replication_slots AS
 L.wal_status,
 L.safe_wal_size,
 L.two_phase,
-L.conflict_reason,
+L.conflicting,
 L.failover,
-L.synced
+L.synced,
+L.invalidation_reason

What about making invalidation_reason close to conflict_reason?

3 ===

- * Maps a conflict reason for a replication slot to
+ * Maps a invalidation reason for a replication slot to

s/a invalidation/an invalidation/?

4 ===

While at it, shouldn't we also rename "conflict" to say "invalidation_cause" in
InvalidatePossiblyObsoleteSlot()?

5 ===

+ * rows_removed and wal_level_insufficient are only two reasons

s/are only two/are the only two/?

Regards,

-- 
Bertrand Drouvot
PostgreSQL Contributors Team
RDS Open Source Databases
Amazon Web Services: https://aws.amazon.com




Re: remaining sql/json patches

2024-03-18 Thread Amit Langote
Himanshu,

On Mon, Mar 18, 2024 at 4:57 PM Himanshu Upadhyaya
 wrote:
> I have tested a nested case, but why is a negative number allowed in the
> subscript (NESTED '$.phones[-1]' COLUMNS)? It should error out if the
> number is negative.
>
> ‘postgres[170683]=#’SELECT * FROM JSON_TABLE(jsonb '{
> ‘...>’ "id" : "0.234567897890",
> ‘...>’ "name" : { 
> "first":"John", "last":"Doe" 
> },
> ‘...>’ "phones" : [{"type":"home", "number":"555-3762"},
> ‘...>’ {"type":"work", "number":"555-7252", 
> "test":123}]}',
> ‘...>’'$'
> ‘...>’COLUMNS(
> ‘...>’ id numeric(2,2) PATH 'lax $.id',
> ‘...>’ last_name varCHAR(10) PATH 'lax $.name.last', 
> first_name VARCHAR(10) PATH 'lax $.name.first',
> ‘...>’  NESTED '$.phones[-1]'COLUMNS (
> ‘...>’"type" VARCHAR(10),
> ‘...>’"number" VARCHAR(10)
> ‘...>’ )
> ‘...>’  )
> ‘...>’   ) as t;
>   id  | last_name | first_name | type | number
> --+---++--+
>  0.23 | Doe   | Johnnn |  |
> (1 row)

You're not getting an error because the default mode of handling
structural errors in SQL/JSON path expressions is "lax".  If you say
"strict" in the path string, you will get an error:

SELECT * FROM JSON_TABLE(jsonb '{
 "id" : "0.234567897890",
 "name" : {
"first":"John",
"last":"Doe" },
 "phones" : [{"type":"home", "number":"555-3762"},
 {"type":"work", "number":"555-7252", "test":123}]}',
'$'
COLUMNS(
 id numeric(2,2) PATH 'lax $.id',
 last_name varCHAR(10) PATH 'lax $.name.last',
first_name VARCHAR(10) PATH 'lax $.name.first',
  NESTED 'strict $.phones[-1]'COLUMNS (
"type" VARCHAR(10),
"number" VARCHAR(10)
 )
  ) error on error
   ) as t;
ERROR:  jsonpath array subscript is out of bounds

-- 
Thanks, Amit Langote




Re: Catalog domain not-null constraints

2024-03-18 Thread Aleksander Alekseev
Hi,

> Anyway, in order to move this forward, here is an updated patch where
> the ADD CONSTRAINT ... NOT NULL behavior for domains matches the
> idempotent behavior of tables.  This uses the patch that Jian He posted.

I tested the patch on Raspberry Pi 5 and Intel MacBook and also
experimented with it. Everything seems to work properly.

Personally I believe new functions such as
validateDomainNotNullConstraint() and findDomainNotNullConstraint()
could use a few lines of comments (accepts..., returns..., etc). Also
I think that the commit message should explicitly say that supporting
NOT VALID constraints is out of scope of this patch.

Except for the nitpicks named above, v4 LGTM.

-- 
Best regards,
Aleksander Alekseev




Re: Improve readability by using designated initializers when possible

2024-03-18 Thread jian he
On Mon, Mar 18, 2024 at 3:09 PM Peter Eisentraut  wrote:
>
> On 14.03.24 01:26, Michael Paquier wrote:
> > -EventTriggerSupportsObjectClass(ObjectClass objclass)
> > +EventTriggerSupportsObject(const ObjectAddress *object)
> >
> > The shortcut introduced here is interesting, but it is inconsistent.
> > HEAD treats OCLASS_SUBSCRIPTION as something supported by event
> > triggers, but as pg_subscription is a shared catalog it would be
> > discarded with your change.  Subscriptions are marked as supported in
> > the event trigger table:
> > https://www.postgresql.org/docs/devel/event-trigger-matrix.html
>
> Ah, good catch.  Subscriptions are a little special there.  Here is a
> new patch that keeps the switch/case arrangement in that function.  That
> also makes it easier to keep the two EventTriggerSupports... functions
> aligned.  Also added a note about subscriptions and a reference to the
> documentation.

select relname from pg_class where relisshared and relkind = 'r';
relname
---
 pg_authid
 pg_subscription
 pg_database
 pg_db_role_setting
 pg_tablespace
 pg_auth_members
 pg_shdepend
 pg_shdescription
 pg_replication_origin
 pg_shseclabel
 pg_parameter_acl
(11 rows)

EventTriggerSupportsObject should return false for the following:
SharedSecLabelRelationId
SharedDescriptionRelationId
DbRoleSettingRelationId
SharedDependRelationId

but I am not sure about ReplicationOriginRelationId.




Re: [HACKERS] make async slave to wait for lsn to be replayed

2024-03-18 Thread Alexander Korotkov
On Mon, Mar 18, 2024 at 5:17 AM Amit Kapila  wrote:
> On Sun, Mar 17, 2024 at 7:40 PM Alexander Korotkov 
wrote:
> > On Sat, Mar 16, 2024 at 5:05 PM Bharath Rupireddy
> >  wrote:
> > > On Sat, Mar 16, 2024 at 4:26 PM Amit Kapila 
wrote:
> > > > In general, it seems this patch has been stuck for a long time on
the
> > > > decision to choose an appropriate UI (syntax), and we thought of
> > > > moving it further so that the other parts of the patch can be
> > > > reviewed/discussed. So, I feel before pushing this we should see
> > > > comments from a few (at least two) other senior members who earlier
> > > > shared their opinion on the syntax. I know we don't have much time
> > > > left but OTOH pushing such a change (where we didn't have a
consensus
> > > > on syntax) without much discussion at this point of time could lead
to
> > > > discussions after commit.
> > >
> > > +1 to gain consensus first on the syntax changes. With this, we might
> > > be violating the SQL standard for explicit txn commands (I stand for
> > > correction about the SQL standard though).
> >
> > Thank you for your feedback.  Generally, I agree it's correct to get
> > consensus on syntax first.  And yes, this patch has been here since
> > 2016.  We didn't get consensus for syntax for 8 years.  Frankly
> > speaking, I don't see a reason why this shouldn't take another 8
> > years.  At the same time the ability to wait on standby given LSN is
> > replayed seems like pretty basic and simple functionality.  Thus, it's
> > quite frustrating it already took that long and still unclear when/how
> > this could be finished.
> >
> > My current attempt was to commit minimal implementation as less
> > invasive as possible.  A new clause for BEGIN doesn't require
> > additional keywords and doesn't introduce additional statements.  But
> > yes, this is still a new qual.  And, yes, Amit you're right that even
> > if I had committed that, there was still a high risk of further
> > debates and revert.
> >
> > Given my skepticism about agreement over syntax, I'd like to check
> > one more time whether we could go without new syntax at all.  There was an
> > attempt to implement waiting for lsn as a function.  But function
> > holds a snapshot, which could prevent WAL records from being replayed.
> > Releasing a snapshot could break the parent query.  But now we have
> > procedures, which need a dedicated statement for the call and can even
> > control transactions.  Could we implement a waitlsn in a procedure
> > that:
> >
> > 1. First, check that it was called with non-atomic context (that is,
> > it's not called within a transaction). Trigger error if called with
> > atomic context.
> > 2. Release a snapshot to be able to wait without risk of WAL replay
> > stuck.  Procedure is still called within the snapshot.  It's a bit of
> > a hack to release a snapshot, but Vacuum statements already do so.
> >
>
> Can you please provide a bit more details with some example what is
> the existing problem with functions and how using procedures will
> resolve it? How will this address the implicit transaction case
> or do we have any other workaround for those cases?

Please check [1] and [2] for the explanation of the problem with functions.

Also, please find a draft patch implementing the procedure.  The issue with
the snapshot is addressed with the following lines.

We first ensure we're in a non-atomic context, then pop an active snapshot
(tricky, but ExecuteVacuum() does the same).  Then we should have no active
snapshot and it's safe to wait for lsn replay.

if (context->atomic)
ereport(ERROR,
(errcode(ERRCODE_OBJECT_NOT_IN_PREREQUISITE_STATE),
 errmsg("pg_wait_lsn() must be only called in non-atomic
context")));

if (ActiveSnapshotSet())
PopActiveSnapshot();
Assert(!ActiveSnapshotSet());

The function call could be added either before the BEGIN statement or
before the implicit transaction.

CALL pg_wait_lsn('my_lsn', my_timeout); BEGIN;
CALL pg_wait_lsn('my_lsn', my_timeout); SELECT ...;

Links
1.
https://www.postgresql.org/message-id/CAPpHfduBSN8j5j5Ynn5x%3DThD%3D8ypNd53D608VXGweBsPzxPvqA%40mail.gmail.com
2.
https://www.postgresql.org/message-id/CAPpHfdtiGgn0iS1KbW2HTam-1%2BoK%2BvhXZDAcnX9hKaA7Oe%3DF-A%40mail.gmail.com

--
Regards,
Alexander Korotkov


v11-0001-Implement-AFTER-clause-for-BEGIN-command.patch
Description: Binary data


Re: Refactoring backend fork+exec code

2024-03-18 Thread Heikki Linnakangas

On 13/03/2024 09:30, Heikki Linnakangas wrote:

Attached is a new version of the remaining patches.


Committed, with some final cosmetic cleanups. Thanks everyone!

--
Heikki Linnakangas
Neon (https://neon.tech)





Re: Introduce XID age and inactive timeout based replication slot invalidation

2024-03-18 Thread Bertrand Drouvot
Hi,

On Mon, Mar 18, 2024 at 08:50:56AM +0530, Amit Kapila wrote:
> On Sun, Mar 17, 2024 at 2:03 PM Bharath Rupireddy
>  wrote:
> >
> > On Sat, Mar 16, 2024 at 3:55 PM Amit Kapila  wrote:
> > >
> > > procArray->replication_slot_catalog_xmin) but then don't adjust it for
> > > 'max_slot_xid_age'. I could be missing something in this but it is
> > > better to keep discussing this and try to move with another parameter
> > > 'inactive_replication_slot_timeout' which according to me can be kept
> > > at slot level instead of a GUC but OTOH we need to see the arguments
> > > on both side and then decide which makes more sense.
> >
> > Hm. Are you suggesting inactive_timeout to be a slot level parameter
> > similar to 'failover' property added recently by
> > c393308b69d229b664391ac583b9e07418d411b6 and
> > 73292404370c9900a96e2bebdc7144f7010339cf? With this approach, one can
> > set inactive_timeout while creating the slot either via
> > pg_create_physical_replication_slot() or
> > pg_create_logical_replication_slot() or CREATE_REPLICATION_SLOT or
> > ALTER_REPLICATION_SLOT command, and postgres tracks the
> > last_inactive_at for every slot based on which the slot gets
> > invalidated. If this understanding is right, I can go ahead and work
> > towards it.
> >
> 
> Yeah, I have something like that in mind. You can prepare the patch
> but it would be good if others involved in this thread can also share
> their opinion.

I think it makes sense to put the inactive_timeout granularity at the slot
level (as the activity could vary a lot, say, between one slot linked to a
subscription and one linked to some plugins). As far as max_slot_xid_age goes,
I have the feeling that a new GUC is good enough.

> > Alternatively, we can go the route of making GUC a list of key-value
> > pairs of {slot_name, inactive_timeout}, but this kind of GUC for
> > setting slot level parameters is going to be the first of its kind, so
> > I'd prefer the above approach.
> >
> 
> I would prefer a slot-level parameter in this case rather than a GUC.

Yeah, same here.

Regards,

-- 
Bertrand Drouvot
PostgreSQL Contributors Team
RDS Open Source Databases
Amazon Web Services: https://aws.amazon.com




Re: Autogenerate some wait events code and documentation

2024-03-18 Thread Alvaro Herrera
On 2023-Aug-28, Michael Paquier wrote:

> I have looked again at that, and switching wait_event_names.txt to use
> two columns made of the typedef definitions and the docs like is not a
> problem:
> FOO_BAR   "Waiting on Foo Bar."
> 
> WAIT_EVENT_ is appended to the typedef definitions in the script.  The
> wait event names like "FooBar" are generated from the enums by
> splitting using their underscores and doing some lc().  Lock and
> LWLock don't need to change.  This way, it is easy to grep the wait
> events from the source code and match them with wait_event_names.txt.

FTR I had a rather unpleasant time last week upon finding a wait event
named PgSleep.  If you grep for that, there are no matches at all; and I
spent ten minutes (for real) trying to figure out where that was coming
from, until I remembered this thread.

Now you have to guess that not only random lowercasing is happening, but
also underscore removal.  This is not a good developer experience and I
think we should rethink this choice.  It would be infinitely more
usable, and not one bit more difficult, to make these lines be

WAIT_EVENT_FOO_BAR  FooBar  "Waiting on Foo Bar."

then there is no guessing involved.
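
For the record, the current mangling can be sketched like this (a rough shell
approximation of what generate-wait_event_types.pl does, not its actual code):

```shell
# WAIT_EVENT_PG_SLEEP -> "PgSleep": strip the prefix, then capitalize each
# underscore-separated word and join them -- which is why grepping the
# tree for "PgSleep" turns up nothing.
name=WAIT_EVENT_PG_SLEEP
echo "${name#WAIT_EVENT_}" | awk -F_ '{
  for (i = 1; i <= NF; i++)
    printf "%s%s", toupper(substr($i, 1, 1)), tolower(substr($i, 2))
  print ""
}'
```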

-- 
Álvaro Herrera   48°01'N 7°57'E  —  https://www.EnterpriseDB.com/
"The important things in the world are problems with society that we don't
understand at all. The machines will become more complicated but they won't
be more complicated than the societies that run them."(Freeman Dyson)




Re: Introduce XID age and inactive timeout based replication slot invalidation

2024-03-18 Thread Bertrand Drouvot
Hi,

On Sat, Mar 16, 2024 at 09:29:01AM +0530, Bharath Rupireddy wrote:
> I've also responded to Bertrand's comments here.

Thanks!

> 
> On Wed, Mar 6, 2024 at 3:56 PM Bertrand Drouvot
>  wrote:
> >
> > 5 ===
> >
> > -# Check conflict_reason is NULL for physical slot
> > +# Check invalidation_reason is NULL for physical slot
> >  $res = $node_primary->safe_psql(
> > 'postgres', qq[
> > -SELECT conflict_reason is null FROM pg_replication_slots 
> > where slot_name = '$primary_slotname';]
> > +SELECT invalidation_reason is null FROM 
> > pg_replication_slots where slot_name = '$primary_slotname';]
> >  );
> >
> >
> > I don't think this test is needed anymore: it does not make that much sense 
> > since
> > it's done after the primary database initialization and startup.
> 
> It is now turned into a test verifying 'conflicting boolean' is null
> for the physical slot. Isn't that okay?

Yeah makes more sense now, thanks!

Regards,

-- 
Bertrand Drouvot
PostgreSQL Contributors Team
RDS Open Source Databases
Amazon Web Services: https://aws.amazon.com




Re: Autogenerate some wait events code and documentation

2024-03-18 Thread Bertrand Drouvot
Hi,

On Mon, Mar 18, 2024 at 08:02:24AM +0900, Michael Paquier wrote:
> On Sun, Mar 17, 2024 at 11:31:14AM -0700, Noah Misch wrote:
> > Adding an event
> > will renumber others, which can make an extension report the wrong event 
> > until
> > recompiled.  Extensions citus, pg_bulkload, and vector refer to static 
> > events.
> > If a back-patch added WAIT_EVENT_MESSAGE_QUEUE_SOMETHING_NEW, an old-build
> > pg_bulkload report of WAIT_EVENT_PARALLEL_CREATE_INDEX_SCAN would show up in
> > pg_stat_activity as WAIT_EVENT_PARALLEL_BITMAP_SCAN.  (WAIT_EVENT_EXTENSION 
> > is
> > not part of a generated enum, fortunately.)

Nice catch, thanks!

> I see an option (4), similar to your (2) without the per-line
> annotation: add a new magic keyword like the existing "Section" that
> is used in the first lines of generate-wait_event_types.pl where we
> generate tab-separated lines with the section name as prefix of each
> line.  So I can think of something like:
> Section: ClassName - WaitEventFoo
> FOO_1 "Waiting in foo1"
> FOO_2 "Waiting in foo2"
> Backpatch:
> BAR_1 "Waiting in bar1"
> BAR_2 "Waiting in bar2"
> 
> Then force the ordering for the docs and keep the elements in the
> backpatch section at the end of the enums in the order in the txt.

Yeah I think that's a good idea.

> One thing that could make sense is to enforce that "Backpatch" is at
> the end of a section, meaning that we would need a second keyword like
> a "Section: EndBackpatch" or something like that.  That's not strictly
> necessary IMO as the format of the txt is easy to follow.

I gave it a try in the attached POC patch. I did not use an "EndBackpatch"
section, to keep the perl script as simple as possible (but documented the
expectation instead).

Regards,

-- 
Bertrand Drouvot
PostgreSQL Contributors Team
RDS Open Source Databases
Amazon Web Services: https://aws.amazon.com
>From f93412201df9fa9156cee0b2492a25ac89ff0921 Mon Sep 17 00:00:00 2001
From: Bertrand Drouvot 
Date: Mon, 18 Mar 2024 08:34:19 +
Subject: [PATCH v1] Add a "Backpatch" section in wait_event_names.txt

When backpatching, adding an event will renumber others, which can make an
extension report the wrong event until recompiled. This is due to the fact that
generate-wait_event_types.pl sorts events to position them. The "Backpatch"
section added here ensures no ordering is done for the wait events found after
this delimiter.
---
 .../activity/generate-wait_event_types.pl | 26 ++-
 .../utils/activity/wait_event_names.txt   | 10 +--
 2 files changed, 33 insertions(+), 3 deletions(-)
 100.0% src/backend/utils/activity/

diff --git a/src/backend/utils/activity/generate-wait_event_types.pl b/src/backend/utils/activity/generate-wait_event_types.pl
index f1adf0e8e7..d129d94889 100644
--- a/src/backend/utils/activity/generate-wait_event_types.pl
+++ b/src/backend/utils/activity/generate-wait_event_types.pl
@@ -38,7 +38,9 @@ die "Not possible to specify --docs and --code simultaneously"
 
 open my $wait_event_names, '<', $ARGV[0] or die;
 
+my @backpatch_lines;
 my @lines;
+my $backpatch = 0;
 my $section_name;
 my $note;
 my $note_name;
@@ -59,10 +61,26 @@ while (<$wait_event_names>)
 	{
 		$section_name = $_;
 		$section_name =~ s/^.*- //;
+		$backpatch = 0;
 		next;
 	}
 
-	push(@lines, $section_name . "\t" . $_);
+	# Are we dealing with backpatch wait events?
+	if (/^Backpatch:$/)
+	{
+		$backpatch = 1;
+		next;
+	}
+
+	# Backpatch wait events won't be sorted during code generation
+	if ($gen_code && $backpatch)
+	{
+		push(@backpatch_lines, $section_name . "\t" . $_);
+	}
+	else
+	{
+		push(@lines, $section_name . "\t" . $_);
+	}
 }
 
 # Sort the lines based on the second column.
@@ -70,6 +88,12 @@ while (<$wait_event_names>)
 my @lines_sorted =
   sort { uc((split(/\t/, $a))[1]) cmp uc((split(/\t/, $b))[1]) } @lines;
 
+# If we are generating code then concat @lines_sorted and @backpatch_lines.
+if ($gen_code)
+{
+	push(@lines_sorted, @backpatch_lines);
+}
+
 # Read the sorted lines and populate the hash table
 foreach my $line (@lines_sorted)
 {
diff --git a/src/backend/utils/activity/wait_event_names.txt b/src/backend/utils/activity/wait_event_names.txt
index c08e00d1d6..62c835df2e 100644
--- a/src/backend/utils/activity/wait_event_names.txt
+++ b/src/backend/utils/activity/wait_event_names.txt
@@ -24,7 +24,12 @@
 #  SGML tables of wait events for inclusion in the documentation.
 #
 # When adding a new wait event, make sure it is placed in the appropriate
-# ClassName section.
+# ClassName section. If the wait event is backpatched to a version < 17 then
+# put it under a "Backpatch" delimiter at the end of the related ClassName
+# section.
+# Ensure that the wait events under the "Backpatch" delimiter are placed in the
+# same order as in the pre 17 wait_event_types.h (see VERSION_FILE_SYNC as an
+# example).
 #
 # WaitEventLWLock and WaitEventLock have their own C code for their wait event
 # enums and function names.  Hence, for 

Re: Skip collecting decoded changes of already-aborted transactions

2024-03-18 Thread Masahiko Sawada
On Fri, Mar 15, 2024 at 1:21 PM Ajin Cherian  wrote:
>
>
>
> On Fri, Mar 15, 2024 at 3:17 PM Masahiko Sawada  wrote:
>>
>>
>> I resumed working on this item. I've attached the new version patch.
>>
>> I rebased the patch to the current HEAD and updated comments and
>> commit messages. The patch is straightforward and I'm somewhat
>> satisfied with it, but I'm thinking of adding some tests for it.
>>
>> Regards,
>>
>> --
>> Masahiko Sawada
>> Amazon Web Services: https://aws.amazon.com
>
>
> I just had a look at the patch, the patch no longer applies because of a 
> removal of a header in a recent commit. Overall the patch looks fine, and I 
> didn't find any issues. Some cosmetic comments:

Thank you for your review comments.

> in ReorderBufferCheckTXNAbort()
> + /* Quick return if we've already knew the transaction status */
> + if (txn->aborted)
> + return true;
>
> knew/know

Maybe it should be "known"?

>
> /*
> + * If logical_replication_mode is "immediate", we don't check the
> + * transaction status so the caller always process this transaction.
> + */
> + if (debug_logical_replication_streaming == 
> DEBUG_LOGICAL_REP_STREAMING_IMMEDIATE)
> + return false;
>
> /process/processes
>

Fixed.

In addition to these changes, I've made some changes to the latest
patch. Here is the summary:

- Use txn_flags field to record the transaction status instead of two
'committed' and 'aborted' flags.
- Add regression tests.
- Update commit message.

Regards,

-- 
Masahiko Sawada
Amazon Web Services: https://aws.amazon.com


v4-0001-Skip-logical-decoding-of-already-aborted-transact.patch
Description: Binary data


Re: Check lateral references within PHVs for memoize cache keys

2024-03-18 Thread Richard Guo
On Fri, Feb 2, 2024 at 5:18 PM Richard Guo  wrote:

> The v4 patch does not apply any more.  I've rebased it on master.
> Nothing else has changed.
>

Here is another rebase over master so it applies again.  I also added a
commit message to help review.  Nothing else has changed.

Thanks
Richard


v6-0001-Check-lateral-refs-within-PHVs-for-memoize-cache-keys.patch
Description: Binary data


Re: Support json_errdetail in FRONTEND builds

2024-03-18 Thread Daniel Gustafsson
> On 18 Mar 2024, at 01:13, Tom Lane  wrote:
> 
> Daniel Gustafsson  writes:
>> I took another look at this tonight and committed it with some mostly 
>> cosmetic
>> changes.  Since then, tamandua failed the SSL test on this commit but I am
>> unable to reproduce any error both on older OpenSSL and matching the 3.1
>> version on tamandua, so not sure if its related.
> 
> That failure looks like just a random buildfarm burp:

Indeed, and it corrected itself a few hours later.

--
Daniel Gustafsson





Re: What about Perl autodie?

2024-03-18 Thread Daniel Gustafsson
> On 18 Mar 2024, at 07:27, Peter Eisentraut  wrote:

> After some pondering, I figured the exclude list is better.  

Agreed.

> So here is a squashed patch, also with a complete commit message.

Looks good from a read-through.  It would have been nice to standardize on
using one of "|| die" and "or die" consistently but that's clearly not for this
body of work.

--
Daniel Gustafsson





Re: Remove a FIXME and unused variables in Meson

2024-03-18 Thread Peter Eisentraut

On 14.03.24 06:13, Tristan Partin wrote:
One of them adds version gates to two LLVM flags (-fwrapv, 
-fno-strict-aliasing). I believe we moved the minimum LLVM version 
recently, so these might not be necessary, but maybe it helps for 
historical reasons. If not, I'll just remove the comment in a different 
patch.


We usually remove version gates once the overall minimum required 
version is new enough.  So this doesn't seem like a step in the right 
direction.


Second patch removes some unused variables. Were they analogous to 
things in autotools and the Meson portions haven't been added yet?


Hmm, yeah, no idea.  These were not used even in the first commit for 
Meson support.  Might have had a purpose in earlier draft patches.






Re: MERGE ... RETURNING

2024-03-18 Thread Dean Rasheed
On Fri, 15 Mar 2024 at 17:14, Jeff Davis  wrote:
>
> On Fri, 2024-03-15 at 11:20 +, Dean Rasheed wrote:
>
> > So barring any further objections, I'd like to go ahead and get this
> > patch committed.
>
> I like this feature from a user perspective. So +1 from me.
>

I have committed this. Thanks for all the feedback everyone.

Regards,
Dean




Re: remaining sql/json patches

2024-03-18 Thread Himanshu Upadhyaya
I have tested a nested case, but why is a negative number allowed in the
subscript (NESTED '$.phones[-1]' COLUMNS)? It should error out if the number
is negative.

SELECT * FROM JSON_TABLE(jsonb '{
  "id" : "0.234567897890",
  "name" : { "first":"John",
             "last":"Doe" },
  "phones" : [{"type":"home", "number":"555-3762"},
              {"type":"work", "number":"555-7252", "test":123}]}',
  '$'
  COLUMNS(
    id numeric(2,2) PATH 'lax $.id',
    last_name varCHAR(10) PATH 'lax $.name.last',
    first_name VARCHAR(10) PATH 'lax $.name.first',
    NESTED '$.phones[-1]' COLUMNS (
      "type" VARCHAR(10),
      "number" VARCHAR(10)
    )
  )
) as t;
  id  | last_name | first_name | type | number
--+---++--+
 0.23 | Doe   | Johnnn |  |
(1 row)

Thanks,
Himanshu


Re: Catalog domain not-null constraints

2024-03-18 Thread Peter Eisentraut
Anyway, in order to move this forward, here is an updated patch where 
the ADD CONSTRAINT ... NOT NULL behavior for domains matches the 
idempotent behavior of tables.  This uses the patch that Jian He posted.
From a0075e4bcd5f2db292f92fc2b70576ccebd07210 Mon Sep 17 00:00:00 2001
From: Peter Eisentraut 
Date: Thu, 23 Nov 2023 07:35:32 +0100
Subject: [PATCH v4 1/2] Add tests for domain-related information schema views

Discussion: 
https://www.postgresql.org/message-id/flat/9ec24d7b-633d-463a-84c6-7acff769c9e8%40eisentraut.org
---
 src/test/regress/expected/domain.out | 47 
 src/test/regress/sql/domain.sql  | 24 ++
 2 files changed, 71 insertions(+)

diff --git a/src/test/regress/expected/domain.out 
b/src/test/regress/expected/domain.out
index 6d94e84414a..e70aebd70c0 100644
--- a/src/test/regress/expected/domain.out
+++ b/src/test/regress/expected/domain.out
@@ -1207,3 +1207,50 @@ create domain testdomain1 as int constraint unsigned 
check (value > 0);
 alter domain testdomain1 rename constraint unsigned to unsigned_foo;
 alter domain testdomain1 drop constraint unsigned_foo;
 drop domain testdomain1;
+--
+-- Information schema
+--
+SELECT * FROM information_schema.column_domain_usage
+  WHERE domain_name IN ('con', 'dom', 'pos_int', 'things')
+  ORDER BY domain_name;
+ domain_catalog | domain_schema | domain_name | table_catalog | table_schema | table_name | column_name 
+----------------+---------------+-------------+---------------+--------------+------------+-------------
+ regression     | public        | con         | regression    | public       | domcontest | col1
+ regression     | public        | dom         | regression    | public       | domview    | col1
+ regression     | public        | things      | regression    | public       | thethings  | stuff
+(3 rows)
+
+SELECT * FROM information_schema.domain_constraints
+  WHERE domain_name IN ('con', 'dom', 'pos_int', 'things')
+  ORDER BY constraint_name;
+ constraint_catalog | constraint_schema | constraint_name | domain_catalog | domain_schema | domain_name | is_deferrable | initially_deferred 
+--------------------+-------------------+-----------------+----------------+---------------+-------------+---------------+--------------------
+ regression         | public            | con_check       | regression     | public        | con         | NO            | NO
+ regression         | public            | meow            | regression     | public        | things      | NO            | NO
+ regression         | public            | pos_int_check   | regression     | public        | pos_int     | NO            | NO
+(3 rows)
+
+SELECT * FROM information_schema.domains
+  WHERE domain_name IN ('con', 'dom', 'pos_int', 'things')
+  ORDER BY domain_name;
+ domain_catalog | domain_schema | domain_name | data_type | 
character_maximum_length | character_octet_length | character_set_catalog | 
character_set_schema | character_set_name | collation_catalog | 
collation_schema | collation_name | numeric_precision | numeric_precision_radix 
| numeric_scale | datetime_precision | interval_type | interval_precision | 
domain_default | udt_catalog | udt_schema | udt_name | scope_catalog | 
scope_schema | scope_name | maximum_cardinality | dtd_identifier 
++---+-+---+--++---+--++---+--++---+-+---++---+++-++--+---+--++-+
+ regression | public| con | integer   |
  ||   |  | 
   |   |  ||
32 |   2 | 0 || 
  ||| regression  | pg_catalog 
| int4 |   |  || | 1
+ regression | public| dom | integer   |
  ||   |  | 
   |   |  ||
32 |   2 | 0 || 
  ||| regression  | pg_catalog 
| int4 |   |  || | 1
+ regression | public| pos_int | integer   |
  ||   |  | 
   |   |  |   

Re: pg_column_toast_chunk_id: a function to get a chunk ID of a TOASTed value

2024-03-18 Thread Yugo NAGATA
On Thu, 14 Mar 2024 11:10:42 -0500
Nathan Bossart  wrote:

> On Wed, Mar 13, 2024 at 01:09:18PM +0700, Yugo NAGATA wrote:
> > On Tue, 12 Mar 2024 22:07:17 -0500
> > Nathan Bossart  wrote:
> >> I did some light editing to prepare this for commit.
> > 
> > Thank you. I confirmed the test you improved and I am fine with that.
> 
> Committed.

Thank you!

> 
> -- 
> Nathan Bossart
> Amazon Web Services: https://aws.amazon.com


-- 
Yugo NAGATA 




Re: speed up a logical replica setup

2024-03-18 Thread vignesh C
On Sat, 16 Mar 2024 at 21:13, Euler Taveira  wrote:
>
> On Fri, Mar 15, 2024, at 3:34 AM, Amit Kapila wrote:
>
> Did you consider adding options for publication/subscription/slot
> names as mentioned in my previous email? As discussed in a few emails
> above, it would be quite confusing for users to identify the logical
> replication objects once the standby is converted to subscriber.
>
>
> Yes. I was planning to implement it after v1 is pushed. I started to write
> code for it but I wasn't sure about the UI. The best approach I came up with was
> multiple options in the same order. (I don't provide short options to avoid
> possible portability issues with the order.) It means if I have 3 databases 
> and
> the following command-line:
>
> pg_createsubscriber ... --database pg1 --database pg2 --database pg3 
> --publication
> pubx --publication puby --publication pubz
>
> pubx, puby and pubz are created in the database pg1, pg2, and pg3 
> respectively.
>
> It seems we care only for publications created on the primary. Isn't
> it possible that some of the publications have been replicated to
> standby by that time, for example, in case failure happens after
> creating a few publications? IIUC, we don't care for standby cleanup
> after failure because it can't be used for streaming replication
> anymore. So, the only choice the user has is to recreate the standby
> and start the pg_createsubscriber again. This sounds questionable to
> me as to whether users would like this behavior. Does anyone else have
> an opinion on this point?
>
>
> If it happens after creating a publication and before promotion, the cleanup
> routine will drop the publications on primary and it will eventually be 
> applied
> to the standby via replication later.
>
> I see the below note in the patch:
> +If pg_createsubscriber fails while processing,
> +then the data directory is likely not in a state that can be recovered. 
> It
> +is true if the target server was promoted. In such a case, creating a new
> +standby server is recommended.
>
> By reading this it is not completely clear whether the standby is not
> recoverable in case of any error or only an error after the target
> server is promoted. If others agree with this behavior then we should
> write the detailed reason for this somewhere in the comments as well
> unless it is already explained.
>
>
> I rewrote the sentence to make it clear that the target server will be in a
> state that cannot be reused only if it was promoted. It also prints a message
> saying so.
>
> pg_createsubscriber: target server reached the consistent state
> pg_createsubscriber: hint: If pg_createsubscriber fails after this point, you
> must recreate the physical replica before continuing.
>
> I'm attaching a new version (v30) that adds:
>
> * 3 new options (--publication, --subscription, --replication-slot) to assign
>   names to the objects. The --database option used to ignore duplicate names,
>   however, since these new options rely on the number of database options to
>   match the number of object name options, it is forbidden from now on. The
>   duplication is also forbidden for the object names to avoid errors earlier.
> * rewrite the paragraph related to the unusable target server after
>   pg_createsubscriber fails.

1) The maximum size of an object name is 64; we can add a check so that
names longer than the maximum allowed length are rejected:
+ case 3:
+ if (!simple_string_list_member(_names, optarg))
+ {
+ simple_string_list_append(_names, optarg);
+ num_replslots++;
+ }
+ else
+ {
+ pg_log_error("duplicate replication slot \"%s\"", optarg);
+ exit(1);
+ }
+ break;

If we allow something like this:
 ./pg_createsubscriber -U postgres -D data_N2/ -P "port=5431
user=postgres"  -p 5432 -s /home/vignesh/postgres/inst/bin/ -d db1 -d
db2 -d db3 
--replication-slot="testtesttesttesttesttesttesttesttesttesttesttesttesttesttesttes1"
--replication-slot="testtesttesttesttesttesttesttesttesttesttesttesttesttesttesttes2"
--replication-slot="testtesttesttesttesttesttesttesttesttesttesttesttesttesttesttes3"
In this case creation of replication slot will fail:
pg_createsubscriber: error: could not create replication slot
"testtesttesttesttesttesttesttesttesttesttesttesttesttesttesttes" on
database "db2": ERROR:  replication slot
"testtesttesttesttesttesttesttesttesttesttesttesttesttesttesttes"
already exists

2) Similarly here too:
+ case 4:
+ if (!simple_string_list_member(_names, optarg))
+ {
+ simple_string_list_append(_names, optarg);
+ num_subs++;
+ }
+ else
+ {
+ pg_log_error("duplicate subscription \"%s\"", optarg);
+ exit(1);
+ }
+ break;

If we allow something like this:
./pg_createsubscriber -U postgres -D data_N2/ -P "port=5431
user=postgres"  -p 5432 -s /home/vignesh/postgres/inst/bin/ -d db1 -d
db2 -d db3 
--subscription=testtesttesttesttesttesttesttesttesttesttesttesttesttesttesttes1
--subscription=testtesttesttesttesttesttesttesttesttesttesttesttesttesttesttes2

Re: Improve readability by using designated initializers when possible

2024-03-18 Thread Peter Eisentraut

On 14.03.24 01:26, Michael Paquier wrote:

-EventTriggerSupportsObjectClass(ObjectClass objclass)
+EventTriggerSupportsObject(const ObjectAddress *object)

The shortcut introduced here is interesting, but it is inconsistent.
HEAD treats OCLASS_SUBSCRIPTION as something supported by event
triggers, but as pg_subscription is a shared catalog it would be
discarded with your change.  Subscriptions are marked as supported in
the event trigger table:
https://www.postgresql.org/docs/devel/event-trigger-matrix.html


Ah, good catch.  Subscriptions are a little special there.  Here is a 
new patch that keeps the switch/case arrangement in that function.  That 
also makes it easier to keep the two EventTriggerSupports... functions 
aligned.  Also added a note about subscriptions and a reference to the 
documentation.
From 3a17f075f9ccac2eb7a969be5e455bb29a9e40e2 Mon Sep 17 00:00:00 2001
From: Peter Eisentraut 
Date: Mon, 18 Mar 2024 08:07:37 +0100
Subject: [PATCH v4] Remove ObjectClass

ObjectClass is an enum whose values correspond to catalog OIDs.  But
the extra layer of redirection, which is used only in small parts of
the code, and the similarity to ObjectType, are confusing and
cumbersome.

One advantage has been that some switches processing the OCLASS enum
don't have "default:" cases.  This is so that the compiler tells us
when we fail to add support for some new object class.  But you can
also handle that with some assertions and proper test coverage.  It's
not even clear how strong this benefit is.  For example, in
AlterObjectNamespace_oid(), you could still put a new OCLASS into the
"ignore object types that don't have schema-qualified names" case, and
it might or might not be wrong.  Also, there are already various
OCLASS switches that do have a default case, so it's not even clear
what the preferred coding style should be.

Discussion: 
https://www.postgresql.org/message-id/flat/CAGECzQT3caUbcCcszNewCCmMbCuyP7XNAm60J3ybd6PN5kH2Dw%40mail.gmail.com
---
 src/backend/catalog/dependency.c | 239 
 src/backend/catalog/objectaddress.c  | 316 ---
 src/backend/commands/alter.c |  73 ++-
 src/backend/commands/event_trigger.c | 130 ++-
 src/backend/commands/tablecmds.c |  62 ++
 src/include/catalog/dependency.h |  51 -
 src/include/commands/event_trigger.h |   2 +-
 src/tools/pgindent/typedefs.list |   1 -
 8 files changed, 233 insertions(+), 641 deletions(-)

diff --git a/src/backend/catalog/dependency.c b/src/backend/catalog/dependency.c
index eadcf6af0df..6e8f6a57051 100644
--- a/src/backend/catalog/dependency.c
+++ b/src/backend/catalog/dependency.c
@@ -206,7 +206,7 @@ deleteObjectsInList(ObjectAddresses *targetObjects, 
Relation *depRel,
if (extra->flags & DEPFLAG_REVERSE)
normal = true;
 
-   if 
(EventTriggerSupportsObjectClass(getObjectClass(thisobj)))
+   if (EventTriggerSupportsObject(thisobj))
{
EventTriggerSQLDropAddObject(thisobj, original, 
normal);
}
@@ -1349,9 +1349,9 @@ deleteOneObject(const ObjectAddress *object, Relation 
*depRel, int flags)
 static void
 doDeletion(const ObjectAddress *object, int flags)
 {
-   switch (getObjectClass(object))
+   switch (object->classId)
{
-   case OCLASS_CLASS:
+   case RelationRelationId:
{
charrelKind = 
get_rel_relkind(object->objectId);
 
@@ -1382,104 +1382,102 @@ doDeletion(const ObjectAddress *object, int flags)
break;
}
 
-   case OCLASS_PROC:
+   case ProcedureRelationId:
RemoveFunctionById(object->objectId);
break;
 
-   case OCLASS_TYPE:
+   case TypeRelationId:
RemoveTypeById(object->objectId);
break;
 
-   case OCLASS_CONSTRAINT:
+   case ConstraintRelationId:
RemoveConstraintById(object->objectId);
break;
 
-   case OCLASS_DEFAULT:
+   case AttrDefaultRelationId:
RemoveAttrDefaultById(object->objectId);
break;
 
-   case OCLASS_LARGEOBJECT:
+   case LargeObjectRelationId:
LargeObjectDrop(object->objectId);
break;
 
-   case OCLASS_OPERATOR:
+   case OperatorRelationId:
RemoveOperatorById(object->objectId);
break;
 
-   case OCLASS_REWRITE:
+   case RewriteRelationId:
RemoveRewriteRuleById(object->objectId);
break;
 
-   

RE: PostgreSQL 17 Release Management Team & Feature Freeze

2024-03-18 Thread Hayato Kuroda (Fujitsu)
Dear Michael,

> If you think that this is OK, and as far as I can see this looks OK on
> the thread, then this open item should be moved under "resolved before
> 17beta1", mentioning the commit involved in the fix.  Please see [1]
> for examples.

OK, I understood that I should wait for your check. Thanks.

Best Regards,
Hayato Kuroda
FUJITSU LIMITED
https://www.fujitsu.com/ 





Re: What about Perl autodie?

2024-03-18 Thread Andrew Dunstan
On Mon, Mar 18, 2024 at 2:28 AM Peter Eisentraut 
wrote:

> On 21.02.24 08:26, Peter Eisentraut wrote:
> > On 14.02.24 17:52, Peter Eisentraut wrote:
> >> A gentler way might be to start using some perlcritic policies like
> >> InputOutput::RequireCheckedOpen or the more general
> >> InputOutput::RequireCheckedSyscalls and add explicit error checking at
> >> the sites it points out.
> >
> > Here is a start for that.  I added the required stanza to perlcriticrc
> > and started with an explicit list of functions to check:
> >
> > functions = chmod flock open read rename seek symlink system
> >
> > and fixed all the issues it pointed out.
> >
> > I picked those functions because most existing code already checked
> > those, so the omissions are probably unintended, or in some cases also
> > because I thought it would be important for test correctness (e.g., some
> > tests using chmod).
> >
> > I didn't design any beautiful error messages, mostly just used "or die
> > $!", which mostly matches existing code, and also this is
> > developer-level code, so having the system error plus source code
> > reference should be ok.
> >
> > In the second patch, I changed the perlcriticrc stanza to use an
> > exclusion list instead of an explicit inclusion list.  That way, you can
> > see what we are currently *not* checking.  I'm undecided which way
> > around is better, and exactly what functions we should be checking.  (Of
> > course, in principle, all of them, but since this is test and build
> > support code, not production code, there are probably some reasonable
> > compromises to be made.)
>
> After some pondering, I figured the exclude list is better.  So here is
> a squashed patch, also with a complete commit message.
>
> Btw., do we check perlcritic in an automated way, like on the buildfarm?


Yes. crake and koel do.

cheers

andrew


Re: What about Perl autodie?

2024-03-18 Thread Peter Eisentraut

On 21.02.24 08:26, Peter Eisentraut wrote:

On 14.02.24 17:52, Peter Eisentraut wrote:
A gentler way might be to start using some perlcritic policies like 
InputOutput::RequireCheckedOpen or the more general 
InputOutput::RequireCheckedSyscalls and add explicit error checking at 
the sites it points out.


Here is a start for that.  I added the required stanza to perlcriticrc 
and started with an explicit list of functions to check:


functions = chmod flock open read rename seek symlink system

and fixed all the issues it pointed out.

I picked those functions because most existing code already checked 
those, so the omissions are probably unintended, or in some cases also 
because I thought it would be important for test correctness (e.g., some 
tests using chmod).


I didn't design any beautiful error messages, mostly just used "or die 
$!", which mostly matches existing code, and also this is 
developer-level code, so having the system error plus source code 
reference should be ok.


In the second patch, I changed the perlcriticrc stanza to use an 
exclusion list instead of an explicit inclusion list.  That way, you can 
see what we are currently *not* checking.  I'm undecided which way 
around is better, and exactly what functions we should be checking.  (Of 
course, in principle, all of them, but since this is test and build 
support code, not production code, there are probably some reasonable 
compromises to be made.)


After some pondering, I figured the exclude list is better.  So here is 
a squashed patch, also with a complete commit message.


Btw., do we check perlcritic in an automated way, like on the buildfarm?

From c70c6af0e369496bec04d20dbe42d09a233f046f Mon Sep 17 00:00:00 2001
From: Peter Eisentraut 
Date: Sun, 17 Mar 2024 16:10:50 +0100
Subject: [PATCH v2] Activate perlcritic InputOutput::RequireCheckedSyscalls
 and fix resulting warnings

This checks that certain I/O-related Perl functions properly check
their return value.  Some parts of the PostgreSQL code had been a bit
sloppy about that.  The new perlcritic warnings are fixed here.  I
didn't design any beautiful error messages, mostly just used "or die
$!", which mostly matches existing code, and also this is
developer-level code, so having the system error plus source code
reference should be ok.

Initially, we only activate this check for a subset of what the
perlcritic check would warn about.  The effective list is

chmod flock open read rename seek symlink system

The initial set of functions is picked because most existing code
already checked the return value of those, so any omissions are
probably unintended, or because it seems important for test
correctness.

The actual perlcritic configuration is written as an exclude list.
That seems better so that we are clear on what we are currently not
checking.  Maybe future patches want to investigate checking some of
the other functions.  (In principle, we might eventually want to check
all of them, but since this is test and build support code, not
production code, there are probably some reasonable compromises to be
made.)

Discussion: 
https://www.postgresql.org/message-id/flat/88b7d4f2-46d9-4cc7-b1f7-613c90f9a76a%40eisentraut.org
---
 .../t/010_pg_archivecleanup.pl  |  2 +-
 src/bin/pg_basebackup/t/010_pg_basebackup.pl|  8 
 src/bin/pg_ctl/t/001_start_stop.pl  |  2 +-
 src/bin/pg_resetwal/t/002_corrupted.pl  |  2 +-
 src/bin/pg_rewind/t/009_growing_files.pl|  2 +-
 src/bin/pg_rewind/t/RewindTest.pm   |  4 ++--
 src/pl/plperl/text2macro.pl |  4 ++--
 src/test/kerberos/t/001_auth.pl |  2 +-
 .../ssl_passphrase_callback/t/001_testfunc.pl   |  2 +-
 src/test/perl/PostgreSQL/Test/Cluster.pm| 12 ++--
 src/test/perl/PostgreSQL/Test/Utils.pm  | 16 
 src/test/ssl/t/SSL/Server.pm| 10 +-
 src/tools/msvc_gendef.pl|  4 ++--
 src/tools/perlcheck/perlcriticrc|  8 
 src/tools/pgindent/pgindent | 17 +
 15 files changed, 52 insertions(+), 43 deletions(-)

diff --git a/src/bin/pg_archivecleanup/t/010_pg_archivecleanup.pl 
b/src/bin/pg_archivecleanup/t/010_pg_archivecleanup.pl
index 792f5677c87..91a98c71e99 100644
--- a/src/bin/pg_archivecleanup/t/010_pg_archivecleanup.pl
+++ b/src/bin/pg_archivecleanup/t/010_pg_archivecleanup.pl
@@ -36,7 +36,7 @@ sub create_files
 {
foreach my $fn (map { $_->{name} } @_)
{
-   open my $file, '>', "$tempdir/$fn";
+   open my $file, '>', "$tempdir/$fn" or die $!;
 
print $file 'CONTENT';
close $file;
diff --git a/src/bin/pg_basebackup/t/010_pg_basebackup.pl 
b/src/bin/pg_basebackup/t/010_pg_basebackup.pl
index d3c83f26e4b..490a9822f09 100644
--- a/src/bin/pg_basebackup/t/010_pg_basebackup.pl
+++ 

Re: Weird test mixup

2024-03-18 Thread Michael Paquier
On Mon, Mar 18, 2024 at 10:50:25AM +0500, Andrey M. Borodin wrote:
> Maybe consider function injection_points_attach_local(‘point name’)
> instead of static switch?
> Or even injection_points_attach_global(‘point name’), while function
> injection_points_attach(‘point name’) will be global? This would
> favour writing concurrent tests by default… 

The point is to limit accidents like the one in this thread.  So, for
cases already in the tree, not giving the point name in the SQL
function would be simple enough.

What you are suggesting can be simply done, as well, though I'd rather
wait for a reason to justify doing so.
--
Michael


signature.asc
Description: PGP signature


Re: Switching XLog source from archive to streaming when primary available

2024-03-18 Thread Michael Paquier
On Sun, Mar 17, 2024 at 11:37:58AM +0530, Bharath Rupireddy wrote:
> Rebase needed after 071e3ad59d6fd2d6d1277b2bd9579397d10ded28 due to a
> conflict in meson.build. Please see the attached v23 patch.

I've been reading this patch, and this is a very tricky one.  Please
be *very* cautious.

+#streaming_replication_retry_interval = 0# time after which standby
+# attempts to switch WAL source from archive to
+# streaming replication in seconds; 0 disables

This stuff allows a minimal retry interval of 1s.  Could it be useful
to have more responsiveness here and allow lower values than that?
Why not switch the units to milliseconds?

+if (streaming_replication_retry_interval <= 0 ||
+!StandbyMode ||
+currentSource != XLOG_FROM_ARCHIVE)
+return SWITCH_TO_STREAMING_NONE;

Hmm.  Perhaps this should mention why we don't care about the
consistent point.

+/* See if we can switch WAL source to streaming */
+if (wal_source_switch_state == SWITCH_TO_STREAMING_NONE)
+wal_source_switch_state = SwitchWALSourceToPrimary();

Rather than a routine that returns as result the value to use for the
GUC, I'd suggest to let this routine set the GUC as there is only one
caller of SwitchWALSourceToPrimary().  This can also include a check
on SWITCH_TO_STREAMING_NONE, based on what I'm reading that.

-   if (lastSourceFailed)
+   if (lastSourceFailed ||
+   wal_source_switch_state == SWITCH_TO_STREAMING) 

Hmm.  This one may be tricky.  I'd recommend a separation between the
failure in reading from a source and the switch to a new "forced"
source.

+if (wal_source_switch_state == SWITCH_TO_STREAMING_PENDING)
+readFrom = XLOG_FROM_PG_WAL;
+else
+readFrom = currentSource == XLOG_FROM_ARCHIVE ?
+XLOG_FROM_ANY : currentSource;

WALSourceSwitchState looks confusing here, and are you sure that this
is actually correct?  Shouldn't we still try a READ_FROM_ANY or a read
from the archives even with a switch to streaming pending?

By the way, I am not convinced that what you have is the best
interface ever.  This assumes that we'd always want to switch to
streaming more aggressively.  Could there be a point in also
controlling if we should switch to pg_wal/ or just to archiving more
aggressively as well, aka be able to do the opposite switch of WAL
source?  This design looks somewhat limited to me.  The origin of the
issue is that we don't have a way to control the order of the sources
consumed by WAL replay.  Perhaps something like a replay_source_order
that uses a list would be better, with elements settable to archive,
pg_wal and streaming?
--
Michael


signature.asc
Description: PGP signature

