Re: [PATCH] LockAcquireExtended improvement

2024-05-18 Thread Jingxian Li
On 2024/5/18 14:38, Will Mortensen wrote:
> On Tue, Mar 26, 2024 at 7:14 PM Will Mortensen  wrote:
>> This comment on ProcSleep() seems to have the values of dontWait
>> backward (double negatives are tricky):
>>
>> * Result: PROC_WAIT_STATUS_OK if we acquired the lock,
>> PROC_WAIT_STATUS_ERROR
>> * if not (if dontWait = true, this is a deadlock; if dontWait = false, we
>> * would have had to wait).
>>
>> Also there's a minor typo in a comment in LockAcquireExtended():
>>
>> * Check the proclock entry status. If dontWait = true, this is an
>> * expected case; otherwise, it will open happen if something in the
>> * ipc communication doesn't work correctly.
>>
>> "open" should be "only".
>
> Here's a patch fixing those typos.

Nice catch! The patch looks good to me.


--
Jingxian Li


Re: allow sorted builds for btree_gist

2024-05-18 Thread Tomas Vondra



On 5/18/24 08:51, Andrey M. Borodin wrote:
> 
> 
>> On 18 May 2024, at 00:41, Tomas Vondra
>>  wrote:
>> 
>> if the opclass supports sorted builds, because then we could
>> parallelize the sort.
> 
> Hi Tomas!
> 
> Yup, I'd also be glad to see this feature. PostGIS folks are using
> their geometry (sortsupport was developed for this) with object id
> (this disables sort build).
> 
> It was committed once [0], but then reverted: the vardata opclasses
> were implemented incorrectly. Now it's on the CF [1], Bernd is actively
> responding in the thread, but the patch currently lacks tests.
> 
> Thanks for raising this!
> 

Oh, damn! I didn't notice the CF already has a patch doing this, and
that it was committed/reverted in 2021. I was completely oblivious to
that. Apologies.

Let's continue working on that patch/thread, I'll take a look in the
next CF.


regards

-- 
Tomas Vondra
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: pg_combinebackup does not detect missing files

2024-05-18 Thread Tomas Vondra



On 5/17/24 14:20, Robert Haas wrote:
> On Fri, May 17, 2024 at 1:18 AM David Steele  wrote:
>> However, I think allowing the user to optionally validate the input
>> would be a good feature. Running pg_verifybackup as a separate step is
>> going to be more expensive than verifying/copying at the same time.
>> Even with storage tricks to copy ranges of data, pg_combinebackup is
>> going to be aware of files that do not need to be verified for the current
>> operation, e.g. old copies of free space maps.
> 
> In cases where pg_combinebackup reuses a checksum from the input
> manifest rather than recomputing it, this could accomplish something.
> However, for any file that's actually reconstructed, pg_combinebackup
> computes the checksum as it's writing the output file. I don't see how
> it's sensible to then turn around and verify that the checksum that we
> just computed is the same one that we now get. It makes sense to run
> pg_verifybackup on the output of pg_combinebackup at a later time,
> because that can catch bits that have been flipped on disk in the
> meanwhile. But running the equivalent of pg_verifybackup during
> pg_combinebackup would amount to doing the exact same checksum
> calculation twice and checking that it gets the same answer both
> times.
> 
>> One more thing occurs to me -- if data checksums are enabled then a
>> rough and ready output verification would be to test the checksums
>> during combine. Data checksums aren't very good but something should be
>> triggered if a bunch of pages go wrong, especially since the block
>> offset is part of the checksum. This would be helpful for catching
>> combine bugs.
> 
> I don't know, I'm not very enthused about this. I bet pg_combinebackup
> has some bugs, and it's possible that one of them could involve
> putting blocks in the wrong places, but it doesn't seem especially
> likely. Even if it happens, it's more likely to be that
> pg_combinebackup thinks it's putting them in the right places but is
> actually writing them to the wrong offset in the file, in which case a
> block-checksum calculation inside pg_combinebackup is going to think
> everything's fine, but a standalone tool that isn't confused will be
> able to spot the damage.
> 

Perhaps more importantly, can you even verify data checksums before the
recovery is completed? I don't think you can (pg_checksums certainly
does not allow doing that). Because who knows in what shape you copied
the block?


regards

-- 
Tomas Vondra
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: [PATCH] LockAcquireExtended improvement

2024-05-18 Thread Michael Paquier
On Fri, May 17, 2024 at 11:38:35PM -0700, Will Mortensen wrote:
> On Tue, Mar 26, 2024 at 7:14 PM Will Mortensen  wrote:
>> This comment on ProcSleep() seems to have the values of dontWait
>> backward (double negatives are tricky):
>>
>> * Result: PROC_WAIT_STATUS_OK if we acquired the lock,
>> PROC_WAIT_STATUS_ERROR
>> * if not (if dontWait = true, this is a deadlock; if dontWait = false, we
>> * would have had to wait).
>>
>> Also there's a minor typo in a comment in LockAcquireExtended():
>>
>> * Check the proclock entry status. If dontWait = true, this is an
>> * expected case; otherwise, it will open happen if something in the
>> * ipc communication doesn't work correctly.
>>
>> "open" should be "only".
> 
> Here's a patch fixing those typos.

Perhaps this, err..  should not have been named "dontWait" but
"doWait" ;)

Anyway, this goes way back in time and it is deep in the stack
(LockAcquireExtended, etc.) so it is too late to change: the patch
should be OK as it is.
--
Michael




Re: allow sorted builds for btree_gist

2024-05-18 Thread Andrey M. Borodin



> On 18 May 2024, at 15:22, Tomas Vondra  wrote:
> 
> Let's continue working on that patch/thread, I'll take a look in the
> next CF.

Cool! I'd be happy to review the patch before the CF once Bernd or Christoph 
address the current issues.


Best regards, Andrey Borodin.



Re: allow sorted builds for btree_gist

2024-05-18 Thread Tomas Vondra
On 5/18/24 02:00, Paul A Jungwirth wrote:
> On Fri, May 17, 2024 at 12:41 PM Tomas Vondra
>  wrote:
>> I've been looking at GiST to see if there could be a good way to do
>> parallel builds - and there might be, if the opclass supports sorted
>> builds, because then we could parallelize the sort.
>>
>> But then I noticed we support this mode only for point_ops, because
>> that's the only opclass with sortsupport procedure. Which mostly makes
>> sense, because types like geometries, polygons, ... do not have a well
>> defined order.
> 
> Oh, I'm excited about this for temporal tables. It seems to me that
> ranges and multiranges should have a well-defined order (assuming
> their base types do), since you can do dictionary-style ordering
> (compare the first value, then the next, then the next, etc.). Is
> there any reason not to support those? No reason not to commit these
> btree_gist functions first though. If you aren't interested in adding
> GiST sortsupport for ranges & multiranges I'm willing to do it myself;
> just let me know.
> 

I believe that's pretty much what the existing patch [1] linked by
Andrey (and apparently running for a number of CFs) does.

[1] https://commitfest.postgresql.org/48/3686/

> Do note that the 1.7 -> 1.8 changes have been reverted though (as part
> of my temporal work), and it looks like your patch is a bit messed up
> from that. You'll want to take 1.8 for yourself, and also your 1.9
> upgrade script is trying to add the reverted stratnum functions.
> 

Yeah, I happened to branch from a slightly older master, not noticing
this is affected by the revert.


regards

-- 
Tomas Vondra
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: BUG #18348: Inconsistency with EXTRACT([field] from INTERVAL);

2024-05-18 Thread Martijn Wallet
The following review has been posted through the commitfest application:
make installcheck-world:  tested, failed
Implements feature:   tested, failed
Spec compliant:   tested, failed
Documentation:tested, failed

Hi, everything works out well. This is my first review, so if I should add 
more content here let me know.
Cheers, Martijn.

The new status of this patch is: Ready for Committer


Re: Ignore Visual Studio's Temp Files While Working with PG on Windows

2024-05-18 Thread Andrew Dunstan



On 2024-05-17 Fr 02:34, Peter Eisentraut wrote:

On 17.05.24 08:09, Yasir wrote:
I have been playing with PG on the Windows platform recently. An 
annoying thing I faced is that a lot of Visual Studio's temp files 
kept appearing in git changed files. Therefore, I am submitting this 
very trivial patch to ignore these temp files.


Our general recommendation is that you put such things into your 
personal global git ignore file.


For example, I have in ~/.gitconfig

[core]
    excludesFile = ~/.gitexcludes

and then in ~/.gitexcludes I have various ignores that are specific to 
my local tooling.


That way we don't have to maintain ignore lists for all the tools in 
the world in the PostgreSQL source tree.






or if you want something repo-specific, you can add entries to 
.git/info/exclude
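For example, a repo-local .git/info/exclude (which is never committed or 
shared) might contain something like the following — the patterns are only 
illustrative, with .vs/ being the Visual Studio cache directory discussed in 
this thread:

```
# local-only ignores, not part of the repository
.vs/
*.user
```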



cheers


andrew

--
Andrew Dunstan
EDB: https://www.enterprisedb.com





Re: Ignore Visual Studio's Temp Files While Working with PG on Windows

2024-05-18 Thread Josef Šimánek
On Fri, 17 May 2024 at 8:09, Yasir  wrote:
>
> Hi Hackers,
>
> I have been playing with PG on the Windows platform recently. An annoying 
> thing I faced is that a lot of Visual Studio's temp files kept appearing in 
> git changed files. Therefore, I am submitting this very trivial patch to 
> ignore these temp files.

see 
https://docs.github.com/en/get-started/getting-started-with-git/ignoring-files#configuring-ignored-files-for-all-repositories-on-your-computer
for various strategies

Anyway, if those are not files specific to your setup (like editor
ones), but files which every PG hacker on Windows will generate as
well (which is the case here, IMHO), it would make sense to add them to
the project's .gitignore.

> Looking forward to the PG guru's guidance!
>
> Regards...
>
> Yasir Hussain
> Principal Software Engineer
> Bitnine Global Inc.
>




Re: First draft of PG 17 release notes

2024-05-18 Thread Bruce Momjian
On Thu, May 16, 2024 at 03:35:17PM +1200, David Rowley wrote:
> On Thu, 16 May 2024 at 14:48, Bruce Momjian  wrote:
> >
> > On Wed, May 15, 2024 at 09:13:14AM -0400, Melanie Plageman wrote:
> > > Also +1 on the Sawada/Naylor change being on the highlight section of
> > > the release (as David suggested upthread).
> >
> > Agreed, I went with the attached applied patch.
> 
> +Allow vacuum to more efficiently store tuple references and remove
> its memory limit (Masahiko Sawada, John Naylor)
> +
> 
> I don't want it to seem like I'm splitting hairs, but I'd drop the "
> and remove its memory limit"
> 
> +
> +Specifically, maintenance_work_mem and autovacuum_work_mem can now be
> configured to use more than one gigabyte of memory.  WAL traffic
> caused by vacuum is also more compact.
> 
> I'd say the first sentence above should be written as:
> 
> "Additionally, vacuum no longer silently imposes a 1GB tuple reference
> limit even when maintenance_work_mem or autovacuum_work_mem are set to
> higher values"
> 
> It's not "Specifically" as the "more efficiently store tuple
> references" isn't the same thing as removing the 1GB cap. Also, there
> was never a restriction in configuring maintenance_work_mem or
> autovacuum_work_mem  to values higher than 1GB. The restriction was
> that vacuum was unable to utilize anything more than that.

Slightly adjusted wording patch attached and applied.

My deep apologies for the delay in addressing this.  I should have done
it sooner.

-- 
  Bruce Momjian  https://momjian.us
  EDB  https://enterprisedb.com

  Only you can decide what is important to you.
diff --git a/doc/src/sgml/release-17.sgml b/doc/src/sgml/release-17.sgml
index ed1af17bb0a..d7fd87f3d5c 100644
--- a/doc/src/sgml/release-17.sgml
+++ b/doc/src/sgml/release-17.sgml
@@ -530,11 +530,12 @@ Author: Heikki Linnakangas 
 
 
 
-Allow vacuum to more efficiently store tuple references and remove its memory limit (Masahiko Sawada, John Naylor)
+Allow vacuum to more efficiently store tuple references (Masahiko Sawada, John Naylor)
 
 
 
-Specifically, maintenance_work_mem and autovacuum_work_mem can now be configured to use more than one gigabyte of memory.  WAL traffic caused by vacuum is also more compact.
+Additionally, vacuum is no longer silently limited to one gigabyte of memory when maintenance_work_mem and autovacuum_work_mem are higher. WAL traffic caused by
+vacuum is also more compact.
 
 
 


Re: BUG #18348: Inconsistency with EXTRACT([field] from INTERVAL);

2024-05-18 Thread Martijn Wallet
For some reason the review indicated "failed".
It should of course read:
make installcheck-world:  tested, passed
Implements feature:   tested, passed
Spec compliant:   tested, passed
Documentation:tested, passed

Re: First draft of PG 17 release notes

2024-05-18 Thread Bruce Momjian
On Wed, May 15, 2024 at 08:48:02PM -0700, Andres Freund wrote:
> Hi,
> 
> On 2024-05-15 10:38:20 +0200, Alvaro Herrera wrote:
> > I disagree with this.  IMO the impact of the Sawada/Naylor change is
> > likely to be enormous for people with large tables and large numbers of
> > tuples to clean up (I know we've had a number of customers in this
> > situation, I can't imagine any Postgres service provider that doesn't).
> > The fact that maintenance_work_mem is no longer capped at 1GB is very
> > important and I think we should mention that explicitly in the release
> > notes, as setting it higher could make a big difference in vacuum run
> > times.
> 
> +many.
> 
> We're having this debate every release. I think the ongoing reticence to note
> performance improvements in the release notes is hurting Postgres.
> 
> For one, performance improvements are one of the prime reason users
> upgrade. Without them being noted anywhere more dense than the commit log,
> it's very hard to figure out what improved for users. A halfway widely
> applicable performance improvement is far more impactful than many of the
> feature changes we do list in the release notes.

I agree the impact of performance improvements is often greater than
the average release note item.  However, if people expect Postgres to be
faster, is it important for them to know _why_ it is faster?

If we add a new flag to a command, people will want to know about it so
they can make use of it, or if there is a performance improvement that
allows new workloads, they will want to know about that too so they can
consider those workloads.

On the flip side, a performance improvement that makes everything 10%
faster has little behavioral change for users, and in fact I think we
get ~6% faster in every major release.

> For another, it's also very frustrating for developers that focus on
> performance. The reticence to note their work, while noting other, far
> smaller, things in the release notes, pretty much tells us that our work isn't
> valued.

Yes, but are we willing to add text that every user will have to read
just for this purpose?

One thing we _could_ do is to create a generic performance release note
item saying performance has been improved in the following areas, with
no more details, but we can list the authors on the item.

-- 
  Bruce Momjian  https://momjian.us
  EDB  https://enterprisedb.com

  Only you can decide what is important to you.




Re: First draft of PG 17 release notes

2024-05-18 Thread Bruce Momjian
On Thu, May 16, 2024 at 09:09:11AM -0400, Melanie Plageman wrote:
> On Wed, May 15, 2024 at 11:48 PM Andres Freund  wrote:
> >
> > On 2024-05-15 10:38:20 +0200, Alvaro Herrera wrote:
> > > I disagree with this.  IMO the impact of the Sawada/Naylor change is
> > > likely to be enormous for people with large tables and large numbers of
> > > tuples to clean up (I know we've had a number of customers in this
> > > situation, I can't imagine any Postgres service provider that doesn't).
> > > The fact that maintenance_work_mem is no longer capped at 1GB is very
> > > important and I think we should mention that explicitly in the release
> > > notes, as setting it higher could make a big difference in vacuum run
> > > times.
> >
> > +many.
> >
> > We're having this debate every release. I think the ongoing reticence to 
> > note
> > performance improvements in the release notes is hurting Postgres.
> >
> > For one, performance improvements are one of the prime reason users
> > upgrade. Without them being noted anywhere more dense than the commit log,
> > it's very hard to figure out what improved for users. A halfway widely
> > applicable performance improvement is far more impactful than many of the
> > feature changes we do list in the release notes.
> 
> The practical reason this matters to users is that they want to change
> their configuration or expectations in response to performance
> improvements.
> 
> And also, as Jelte mentions upthread, describing performance
> improvements made each release in Postgres makes it clear that we are
> consistently improving it.
> 
> > For another, it's also very frustrating for developers that focus on
> > performance. The reticence to note their work, while noting other, far
> > smaller, things in the release notes, pretty much tells us that our work 
> > isn't
> > valued.
> 
> +many

Please see the email I just posted.  There are three goals we have to
adjust for:

1.  short release notes so they are readable
2.  giving people credit for performance improvements
3.  showing people Postgres cares about performance

I would like to achieve 2 & 3 without harming #1.  My experience is if I
am reading a long document, and I get to a section where I start to
wonder, "Why should I care about this?", I start to skim the rest of
the document.  I am particularly critical if I start to wonder, "Why
does the author _think_ I should care about this?" because it feels like
the author is writing for him/herself and not the audience.

-- 
  Bruce Momjian  https://momjian.us
  EDB  https://enterprisedb.com

  Only you can decide what is important to you.




Re: First draft of PG 17 release notes

2024-05-18 Thread Bruce Momjian
On Thu, May 16, 2024 at 04:29:38PM +0800, jian he wrote:
> On Thu, May 9, 2024 at 12:04 PM Bruce Momjian  wrote:
> >
> > I have committed the first draft of the PG 17 release notes;  you can
> > see the results here:
> >
> > https://momjian.us/pgsql_docs/release-17.html
> >
> 
> >> Add jsonpath methods to convert JSON values to different data types 
> >> (Jeevan Chalke)
> >> The jsonpath methods are .bigint(), .boolean(), .date(), 
> >> .decimal([precision [, scale]]), .integer(), .number(), .string(), 
> >> .time(), .time_tz(), .timestamp(), and .timestamp_tz().
> 
> I think it's slightly incorrect.
> 
> for example:
> select jsonb_path_query('"2023-08-15"', '$.date()');
> I think it does is trying to validate json value "2023-08-15" can be a
> date value, if so, print json string out, else error out.
> 
> 
> "convert JSON values to different data types"
> meaning that we are converting json values to another data type, date?

I see your point.  I have reworded it to be:

Add jsonpath methods to convert JSON values to other JSON data
types (Jeevan Chalke)

Does that help?  I think your example is causing confusion because once
JSON values enter the SQL data type space, they are strings.

-- 
  Bruce Momjian  https://momjian.us
  EDB  https://enterprisedb.com

  Only you can decide what is important to you.




Re: Speed up clean meson builds by ~25%

2024-05-18 Thread Alvaro Herrera
On 2024-May-17, Robert Haas wrote:

> Anyone else want to vote?

I had pretty much the same thought as you.  It seems a waste to leave
the code in existing branches slow only because we have a much better
approach for a branch that doesn't even exist yet.

-- 
Álvaro Herrera PostgreSQL Developer  —  https://www.EnterpriseDB.com/
"We're here to devour each other alive"(Hobbes)




Re: Schema variables - new implementation for Postgres 15

2024-05-18 Thread Alvaro Herrera
On 2024-Jan-30, Dmitry Dolgov wrote:

> Yep, in this constellation the implementation holds much better (in
> terms of memory) in my create/let/drop testing.
> 
> I've marked the CF item as ready for committer, but a note for anyone
> who would like to pick up it from here -- we're talking about first 5
> patches here, up to the memory cleaning after DROP VARIABLE. It doesn't
> mean the rest is somehow not worth it, but I believe it's a good first
> step.

Hmm, I think patch 16 is essential, because variable shadowing is a
critical aspect of how the whole thing works.  So I would say that
a first step would be those first five patches plus 16.

I want to note that when we discussed this patch series at the dev
meeting in FOSDEM, a sort-of conclusion was reached that we didn't want
schema variables at all because of the fact that creating a variable
would potentially change the meaning of queries by shadowing table
columns.  But this turns out to be incorrect: it's _variables_ that are
shadowed by table columns, not the other way around.

-- 
Álvaro Herrera PostgreSQL Developer  —  https://www.EnterpriseDB.com/
"No hay ausente sin culpa ni presente sin disculpa" (Prov. francés)




Re: First draft of PG 17 release notes

2024-05-18 Thread Bruce Momjian
On Thu, May 16, 2024 at 11:50:20AM -0400, Andrew Dunstan wrote:
> > Maybe "Introduce an incremental JSON parser" would have been a better
> > > headline.
> > Well, this gets into a level of detail that is beyond the average
> > reader.  I think at that level people will need to read the git logs or
> > review the code.  Do we use it for anything yet?
> 
> 
> Yes, certainly, it's used in handling backup manifests. Without it we can't
> handle huge manifests. See commits ea7b4e9a2a and 222e11a10a.
> 
> Other uses are in the works.

Okay, added in the attached applied patch.

-- 
  Bruce Momjian  https://momjian.us
  EDB  https://enterprisedb.com

  Only you can decide what is important to you.
diff --git a/doc/src/sgml/release-17.sgml b/doc/src/sgml/release-17.sgml
index 428cb5c5a2e..9c511848943 100644
--- a/doc/src/sgml/release-17.sgml
+++ b/doc/src/sgml/release-17.sgml
@@ -2451,6 +2451,17 @@ User-defined data type receive functions will no longer receive their data null-
 
 
 
+
+
+
+
+Add incremental JSON parser for use with huge JSON documents (Andrew Dunstan)
+
+
+
 

Re: Speed up clean meson builds by ~25%

2024-05-18 Thread Tom Lane
Alvaro Herrera  writes:
> On 2024-May-17, Robert Haas wrote:
>> Anyone else want to vote?

> I had pretty much the same thought as you.  It seems a waste to leave
> the code in existing branches be slow only because we have a much better
> approach for a branch that doesn't even exist yet.

I won't complain too loudly as long as we remember to revert the
patch from HEAD once the ecpg fix goes in.

regards, tom lane




Re: allow sorted builds for btree_gist

2024-05-18 Thread Michał Kłeczek



> On 17 May 2024, at 21:41, Tomas Vondra  wrote:
> 
> Hi,
> 
> I've been looking at GiST to see if there could be a good way to do
> parallel builds - and there might be, if the opclass supports sorted
> builds, because then we could parallelize the sort.
> 
> But then I noticed we support this mode only for point_ops, because
> that's the only opclass with sortsupport procedure. Which mostly makes
> sense, because types like geometries, polygons, ... do not have a well
> defined order.
> 
> Still, we have btree_gist, and I don't think there's much reason to not
> support sorted builds for those opclasses, where the gist opclass is
> defined on top of btree ordering semantics.
> 


I wonder if it would be possible to add sort support to pg_trgm. Speeding up 
index builds for multicolumn indexes supporting text search would be great.

—
Michal





Sort functions with specialized comparators

2024-05-18 Thread Andrey M. Borodin
Hi!

In a thread about sorting comparators [0] Andres noted that we have 
infrastructure to help the compiler optimize sorting. PFA a PoC 
implementation. I've checked that it indeed works on the benchmark from that 
thread.

postgres=# CREATE TABLE arrays_to_sort AS
   SELECT array_shuffle(a) arr
   FROM
   (SELECT ARRAY(SELECT generate_series(1, 100)) a),
   generate_series(1, 10);

postgres=# SELECT (sort(arr))[1] FROM arrays_to_sort; -- original
Time: 990.199 ms
postgres=# SELECT (sort(arr))[1] FROM arrays_to_sort; -- patched
Time: 696.156 ms

The benefit is a speedup on the order of 30%.

There's plenty of sorting by TransactionId, BlockNumber, OffsetNumber, Oid etc. 
But these sorting routines never show up in perf top or anything like that.

Seems like in most cases we do not spend much time in sorting. But 
specialization does not cost us much either, only some CPU cycles of the 
compiler. I think we can further improve the speedup by converting the inline 
comparator to a value extractor: more compilers will see what is actually 
going on. But I have no proof for this reasoning.

What do you think?


Best regards, Andrey Borodin.

[0] 
https://www.postgresql.org/message-id/flat/20240209184014.sobshkcsfjix6u4r%40awork3.anarazel.de#fc23df2cf314bef35095b632380b4a59


v1-0001-Use-specialized-sort-facilities.patch
Description: Binary data


Re: Streaming read-ready sequential scan code

2024-05-18 Thread Alexander Lakhin

Hello Thomas,

18.05.2024 07:47, Thomas Munro wrote:

After more debugging, we learned a lot more things...

1.  That query produces spectacularly bad estimates, so we finish up
having to increase the number of buckets in a parallel hash join many
times.  That is quite interesting, but unrelated to new code.
2.  Parallel hash join is quite slow at negotiating an increase in the
number of hash buckets if all of the input tuples are being filtered
out by quals, because of the choice of where workers check for
PHJ_GROWTH_NEED_MORE_BUCKETS.  That could be improved quite easily, I
think.  I have put that on my todo list 'cause that's also my code,
but it's not a new issue, just one that is now highlighted...
3.  This bit of read_stream.c is exacerbating unfairness in the
underlying scan, so that 1 and 2 come together and produce a nasty
slowdown, which goes away if you change it like so:

-   BlockNumber blocknums[16];
+   BlockNumber blocknums[1];

I will follow up after some more study.


Thank you for the information!

Unfortunately, I can't see significant differences in my environment with
parallel_leader_participation=off.

With blocknums[1], the timing changes, but the effect is not consistent.
10 query15 executions in a row, b7b0f3f27:
277.932 ms
281.805 ms
278.335 ms
281.565 ms
284.167 ms
283.171 ms
281.165 ms
281.615 ms
285.394 ms
277.301 ms

b7b0f3f27~1:
159.789 ms
165.407 ms
160.893 ms
159.343 ms
160.936 ms
161.577 ms
161.637 ms
163.421 ms
163.143 ms
167.109 ms

b7b0f3f27 + blocknums[1]:
164.133 ms
280.920 ms
160.748 ms
163.182 ms
161.709 ms
161.998 ms
161.239 ms
276.256 ms
161.601 ms
160.384 ms

I placed PGDATA on tmpfs to rule out any blockdev specifics (increasing
blockdev ra from 256 to 4096 didn't help me with PGDATA on NVME either.)

Best regards,
Alexander




Re: Ignore Visual Studio's Temp Files While Working with PG on Windows

2024-05-18 Thread Yasir
On Sat, May 18, 2024 at 7:27 PM Josef Šimánek 
wrote:

> On Fri, 17 May 2024 at 8:09, Yasir 
> wrote:
> >
> > Hi Hackers,
> >
> > I have been playing with PG on the Windows platform recently. An
> annoying thing I faced is that a lot of Visual Studio's temp files kept
> appearing in git changed files. Therefore, I am submitting this very
> trivial patch to ignore these temp files.
>
> see
> https://docs.github.com/en/get-started/getting-started-with-git/ignoring-files#configuring-ignored-files-for-all-repositories-on-your-computer
> for various strategies
>
> Anyway if those are not files specific to your setup (like editor
> ones), but files which every PG hacker on Windows will generate as
> well (which is this case IMHO), it will make sense to add it into
> project's gitignore.
>

The .vs directory and the temp files within it are created once you open any
of the .sln, .vcproj or .vcxproj files (created with the build command when
PWD is postgres/src/tools/msvc) in Visual Studio. It's common practice for
developers to use Visual Studio on the codebase, as it's mostly the default
C/C++ editor. So it would be common for most developers on the Windows
platform to add it to the project's .gitignore.


> > Looking forward to the PG guru's guidance!
> >
> > Regards...
> >
> > Yasir Hussain
> > Principal Software Engineer
> > Bitnine Global Inc.
> >
>


Re: Ignore Visual Studio's Temp Files While Working with PG on Windows

2024-05-18 Thread Yasir
On Sat, May 18, 2024 at 7:27 PM Josef Šimánek 
wrote:

> On Fri, 17 May 2024 at 8:09, Yasir 
> wrote:
> >
> > Hi Hackers,
> >
> > I have been playing with PG on the Windows platform recently. An
> annoying thing I faced is that a lot of Visual Studio's temp files kept
> appearing in git changed files. Therefore, I am submitting this very
> trivial patch to ignore these temp files.
>
> see
> https://docs.github.com/en/get-started/getting-started-with-git/ignoring-files#configuring-ignored-files-for-all-repositories-on-your-computer
> for various strategies
>
>
We could add it to "~/.config/git/ignore", but that would ignore it globally
on Windows, which we don't want. Also, we don't have ".git/info/exclude" in
the PG project, so the best place left is the project's .gitignore. That's
what was patched.


> Anyway if those are not files specific to your setup (like editor
> ones), but files which every PG hacker on Windows will generate as
> well (which is this case IMHO), it will make sense to add it into
> project's gitignore.
>
> > Looking forward to the PG guru's guidance!
> >
> > Regards...
> >
> > Yasir Hussain
> > Principal Software Engineer
> > Bitnine Global Inc.
> >
>


Re: Ignore Visual Studio's Temp Files While Working with PG on Windows

2024-05-18 Thread Tom Lane
Yasir  writes:
> We can add it to "~/.config/git/ignore" as it will ignore globally on
> windows which we don't want. Also we don't have ".git/info/exclude" in PG
> project's so the best place left is projects's .gitignore. That's what was
> patched.

As Peter said, we're not going to do that.  The intention with
the project's .gitignore files is to ignore files that are
intentionally built by our "make" targets (and, hopefully, will be
removed by "make maintainer-clean").  Anything else that you want
git to ignore should be in a personal ignore list; especially
files that are platform-specific.  The fact that it's reasonable
to ignore ".vs" files when working with your toolset doesn't mean
that it's reasonable to ignore them when working on some other
platform.

If we used some other policy, we'd have tons of debates about
which files were reasonable to exclude.  For myself, for example,
I exclude "*~" (Emacs backup files) and "*.orig" (patch(1)
backup files) but those choices are very much dependent on the
set of tools I choose to use.  Other developers have other
personal exclusion lists.  If we tried to make the project's
files be the union of all those lists, we'd be at serious risk
of ignoring stuff we absolutely shouldn't ignore in some contexts.

regards, tom lane




Re: First draft of PG 17 release notes

2024-05-18 Thread Andrew Dunstan



On 2024-05-18 Sa 12:50, Bruce Momjian wrote:

On Thu, May 16, 2024 at 11:50:20AM -0400, Andrew Dunstan wrote:

Maybe "Introduce an incremental JSON parser" would have been a better

headline.

Well, this gets into a level of detail that is beyond the average
reader.  I think at that level people will need to read the git logs or
review the code.  Do we use it for anything yet?


Yes, certainly, it's used in handling backup manifests. Without it we can't
handle huge manifests. See commits ea7b4e9a2a and 222e11a10a.

Other uses are in the works.

Okay, added in the attached applied patch.



Thanks


cheers


andrew

--
Andrew Dunstan
EDB: https://www.enterprisedb.com





Re: Ignore Visual Studio's Temp Files While Working with PG on Windows

2024-05-18 Thread Josef Šimánek
On Sat, May 18, 2024 at 22:36, Tom Lane  wrote:
>
> Yasir  writes:
> > We can add it to "~/.config/git/ignore" as it will ignore globally on
> > windows which we don't want. Also we don't have ".git/info/exclude" in PG
> > project's so the best place left is projects's .gitignore. That's what was
> > patched.
>
> As Peter said, we're not going to do that.  The intention with
> the project's .gitignore files is to ignore files that are
> intentionally built by our "make" targets (and, hopefully, will be
> removed by "make maintainer-clean").  Anything else that you want
> git to ignore should be in a personal ignore list; especially
> files that are platform-specific.  The fact that it's reasonable
> to ignore ".vs" files when working with your toolset doesn't mean
> that it's reasonable to ignore them when working on some other
> platform.
>
> If we used some other policy, we'd have tons of debates about
> which files were reasonable to exclude.  For myself, for example,
> I exclude "*~" (Emacs backup files) and "*.orig" (patch(1)
> backup files) but those choices are very much dependent on the
> set of tools I choose to use.  Other developers have other
> personal exclusion lists.  If we tried to make the project's
> files be the union of all those lists, we'd be at serious risk
> of ignoring stuff we absolutely shouldn't ignore in some contexts.

But this is different. If I understand it well, just by following
https://www.postgresql.org/docs/16/install-windows-full.html you'll
get those files no matter what is your specific environment (or
specific set of tools).

> regards, tom lane




Re: Ignore Visual Studio's Temp Files While Working with PG on Windows

2024-05-18 Thread Andrew Dunstan


On 2024-05-18 Sa 15:43, Yasir wrote:



On Sat, May 18, 2024 at 7:27 PM Josef Šimánek 
 wrote:


On Fri, May 17, 2024 at 8:09, Yasir  wrote:
>
> Hi Hackers,
>
> I have been playing with PG on the Windows platform recently. An
annoying thing I faced is that a lot of Visual Studio's temp files
kept appearing in git changed files. Therefore, I am submitting
this very trivial patch to ignore these temp files.

see

https://docs.github.com/en/get-started/getting-started-with-git/ignoring-files#configuring-ignored-files-for-all-repositories-on-your-computer
for various strategies


We can add it to "~/.config/git/ignore" as it will ignore globally on 
windows which we don't want. Also we don't have ".git/info/exclude" in 
PG project's so the best place left is projects's .gitignore. That's 
what was patched.




eh? git creates .git/info/exclude in every git repository AFAIK. And 
it's referred to here: 



cheers


andrew

--
Andrew Dunstan
EDB: https://www.enterprisedb.com


Re: Ignore Visual Studio's Temp Files While Working with PG on Windows

2024-05-18 Thread Yasir
On Sun, May 19, 2024 at 1:45 AM Andrew Dunstan  wrote:

>
> On 2024-05-18 Sa 15:43, Yasir wrote:
>
>
>
> On Sat, May 18, 2024 at 7:27 PM Josef Šimánek 
> wrote:
>
>> On Fri, May 17, 2024 at 8:09, Yasir  wrote:
>> >
>> > Hi Hackers,
>> >
>> > I have been playing with PG on the Windows platform recently. An
>> annoying thing I faced is that a lot of Visual Studio's temp files kept
>> appearing in git changed files. Therefore, I am submitting this very
>> trivial patch to ignore these temp files.
>>
>> see
>> https://docs.github.com/en/get-started/getting-started-with-git/ignoring-files#configuring-ignored-files-for-all-repositories-on-your-computer
>> for various strategies
>>
>>
> We can add it to "~/.config/git/ignore" as it will ignore globally on
> windows which we don't want. Also we don't have ".git/info/exclude" in PG
> project's so the best place left is projects's .gitignore. That's what was
> patched.
>
>
>
>
> eh? git creates .git/info/exclude in every git repository AFAIK. And it's
> referred to here: 
> 
>
>
Yes, git creates .git/info/exclude, but the point is that it is not part of
the PG-maintained codebase repo, so there is no point adding to it.

BTW, Tom and Peter said it's not going to be added anyway!


> cheers
>
>
> andrew
>
> --
> Andrew Dunstan
> EDB: https://www.enterprisedb.com
>
>


Re: Multiple startup messages over the same connection

2024-05-18 Thread Vladimir Churyukin
On Mon, Jan 22, 2024 at 11:43 PM Heikki Linnakangas  wrote:

> On 22/01/2024 21:58, Vladimir Churyukin wrote:
> > A question about protocol design - would it be possible to extend the
> > protocol, so it can handle multiple startup / authentication messages
> > over a single connection? Are there any serious obstacles? (possible
> > issues with re-initialization of backends, I guess?)
> > If that is possible, it could improve one important edge case - where
> > you have to talk to multiple databases on a single host currently, you
> > need to open a separate connection to each of them. In some cases
> > (multitenancy for example), you may have thousands of databases on a
> > host, which leads to inefficient connection utilization on clients (on
> > the db side too). A lot of other RDBMSes  don't have this limitation.
>
> The protocol and the startup message are the least of your problems.
> Yeah, it would be nice if you could switch between databases, but the
> assumption that one backend operates on one database is pretty deeply
> ingrained in the code.
>
> --
> Heikki Linnakangas
> Neon (https://neon.tech)
>
>
Sorry to revive this old thread, just want to check on one thing:
Let's say we keep one database per backend rule, I understand at this point
it would be really hard to change.
What if, on a new startup message, we just signal the postmaster about it,
so that it takes over the socket and spawns a new backend, and after that we
terminate the old one. How does that sound in terms of implementation
complexity?
I guess the process of passing control of the socket from the child process
back to the postmaster could be a bit tricky, but doable?
Is there anything I'm missing that could be a no-go for this?
The end goal is to minimize the large overhead for clients that have to deal
with a large number of connections on multi-tenant systems (say, one client
dealing with thousands of databases on the same database server).
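For what it's worth, the tricky hand-off step is essentially Unix file-descriptor
passing (SCM_RIGHTS) between processes. Below is a minimal Python sketch of just
that primitive; socketpairs stand in for the real client connection and the
backend/postmaster control channel (illustrative only, not PostgreSQL code):

```python
import socket

# Two socketpairs: one plays the backend<->postmaster control channel,
# the other plays the client connection that should be handed over.
backend, postmaster = socket.socketpair()
client_end, server_end = socket.socketpair()

# "Backend" side: send a control message plus the client-connection fd
# over the unix socket (SCM_RIGHTS ancillary data, Python 3.9+, Unix only).
socket.send_fds(backend, [b"startup: switch database"], [server_end.fileno()])

# "Postmaster" side: receive the message and the descriptor; a freshly
# spawned backend could now be handed the received fd.
msg, fds, _flags, _addr = socket.recv_fds(postmaster, 1024, 1)
takeover = socket.socket(fileno=fds[0])

client_end.sendall(b"SELECT 1")   # the client keeps talking...
data = takeover.recv(8)           # ...and the new holder of the fd sees it
print(msg, data)
```

The received descriptor refers to the same open connection, so whichever process
ends up holding it can continue talking to the client; the complexity mentioned
above is in doing this mid-protocol without losing buffered bytes or backend state.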

-Vladimir Churyukin


Re: Ignore Visual Studio's Temp Files While Working with PG on Windows

2024-05-18 Thread Tom Lane
Josef Šimánek  writes:
> But this is different. If I understand it well, just by following
> https://www.postgresql.org/docs/16/install-windows-full.html you'll
> get those files no matter what is your specific environment (or
> specific set of tools).

Hm?  Visual Studio seems like quite a specific tool from here.

I did some googling around the question of project .gitignore
files ignoring .vs/, and was amused to come across this:

https://github.com/github/gitignore/blob/main/VisualStudio.gitignore

which seems like a mighty fine example of where we *don't*
want to go.

regards, tom lane




Re: date_trunc function in interval version

2024-05-18 Thread Przemysław Sztoch

Robert Haas wrote on 5/15/2024 9:29 PM:

On Mon, Mar 4, 2024 at 5:03 AM Przemysław Sztoch  wrote:

Apparently the functionality is identical to date_bin.
When I saw date_bin in the documentation, I thought it solved all my problems.
Unfortunately, DST problems have many corner cases.
I tried to change date_bin several times, but unfortunately in some cases it 
would start working differently than before.

So, first of all, thanks for taking an interest and sending a patch.

In order for the patch to have a chance of being accepted, we would
need to have a clear understanding of exactly how this patch is
different from the existing date_bin(). If we knew that, we could
decide either that (a) date_bin does the right thing and your patch
does the wrong thing and therefore we should reject your patch, or we
could decide that (b) date_bin does the wrong thing and therefore we
should fix it, or we could decide that (c) both date_bin and what this
patch does are correct, in the sense of being sensible things to do,
and there is a reason to have both. But if we don't really understand
how they are different, which seems to be the case right now, then we
can't make any decisions. And what that means in practice is that
nobody is going to be willing to commit anything, and we're just going
to go around in circles.

Typically, this kind of research is the responsibility of the patch
author: you're the one who wants something changed, so that means you
need to provide convincing evidence that it should be. If someone else
volunteers to do it, that's also cool, but it absolutely has to be
done in order for there to be a chance of progress here. No committer
is going to say "well, we already have date_bin, but Przemysław says
his date_trunc is different somehow, so let's have both without
understanding how exactly they're different." That's just not a
realistic scenario. Just to name one problem, how would we document
each of them? Users would expect the documentation to explain how two
closely-related functions differ, but we will be unable to explain
that if we don't know the answer ourselves.

If you can't figure out exactly what the differences are by code
inspection, then maybe one thing you could do to help unblock things
here is provide some very clear examples of when they deliver the same
results and when they deliver different results. Although there are no
guarantees, that might lead somebody else to jump in and suggest an
explanation, or further avenues of analysis, or some other helpful
comment.

Personally, what I suspect is that there's already a way to do what
you want using date_bin(), maybe in conjunction with some casting or
some calls to other functions that we already have. But it's hard to
be sure because we just don't have the details. "DST problems have
many corner cases" and "in some cases [date_bin] would start working
differently than before" may be true statements as far as they go, but
they're not very specific complaints. If you can describe *exactly*
how date_bin fails to meet your expectations, there is a much better
chance that something useful will happen here.


I would also like to thank Robert for presenting the matter in detail.

My function date_trunc(interval, timestamp, ...) is similar to the
original function date_trunc(text, timestamp, ...).

My extension only gives more granularity. We don't have to jump straight
from hour to day: we can use 6h and 12h buckets. It's the same with
minutes; we can round to 30 minutes, 20 minutes, 15 minutes, etc.

Using date_bin has a similar effect, but it requires specifying an origin,
from which subsequent buckets are then calculated. The need to provide
this origin is sometimes a very big problem, especially since you cannot
use a single origin across the change from summer to winter time.


If we use one origin, for example the beginning of the year, 2024-01-01 00:00:00, then:
# SET timezone='Europe/Warsaw';
# SELECT date_bin('1 day', '2024-03-05 11:22:33', '2024-01-01 00:00:00'),
date_trunc('day', '2024-03-05 11:22:33'::timestamptz);
2024-03-05 00:00:00+01 2024-03-05 00:00:00+01    date_bin works ok, because we are before DST
# SELECT date_bin('1 day', '2024-05-05 11:22:33', '2024-01-01 00:00:00'),
date_trunc('day', '2024-05-05 11:22:33'::timestamptz);
2024-05-05 01:00:00+02 2024-05-05 00:00:00+02    date_bin has a problem, because we are in May, after DST


If anyone has an idea how to make date_bin work like date_trunc, please 
provide an example.
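To model the difference outside SQL, here is a small Python sketch (an
illustration under the assumption that date_bin buckets by fixed absolute
durations from the origin, while date_trunc truncates in local wall-clock
time; it is not the backend implementation). It reproduces the Europe/Warsaw
results above:

```python
from datetime import datetime, timedelta, timezone
from zoneinfo import ZoneInfo  # requires system tzdata (or the tzdata package)

tz = ZoneInfo("Europe/Warsaw")
DAY = timedelta(days=1)

def date_bin(stride, ts, origin):
    # date_bin-style: buckets of fixed absolute length measured from a
    # fixed origin (aware-datetime subtraction is absolute elapsed time).
    binned = origin.astimezone(timezone.utc) + ((ts - origin) // stride) * stride
    return binned.astimezone(ts.tzinfo)

def trunc_day(ts):
    # date_trunc('day', ...)-style: midnight in local wall-clock time.
    return ts.replace(hour=0, minute=0, second=0, microsecond=0)

origin = datetime(2024, 1, 1, tzinfo=tz)
before = datetime(2024, 3, 5, 11, 22, 33, tzinfo=tz)  # before the DST switch
after = datetime(2024, 5, 5, 11, 22, 33, tzinfo=tz)   # after the DST switch

print(date_bin(DAY, before, origin), trunc_day(before))  # both 00:00+01
print(date_bin(DAY, after, origin), trunc_day(after))    # 01:00+02 vs 00:00+02
```

The one-hour drift after the spring-forward switch is exactly the 01:00 vs
00:00 divergence in the SQL output above: a '1 day' stride always spans 24
absolute hours, but the local day containing the switch is only 23 hours long,
so every later bucket boundary lands at 01:00 local time.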


--
Przemysław Sztoch | Mobile +48 509 99 00 66


Re: Ignore Visual Studio's Temp Files While Working with PG on Windows

2024-05-18 Thread Josef Šimánek
On Sat, May 18, 2024 at 23:16, Tom Lane  wrote:
>
> Josef Šimánek  writes:
> > But this is different. If I understand it well, just by following
> > https://www.postgresql.org/docs/16/install-windows-full.html you'll
> > get those files no matter what is your specific environment (or
> > specific set of tools).
>
> Hm?  Visual Studio seems like quite a specific tool from here.

I initially thought the .vs folder is created just by compiling
PostgreSQL using build.bat (like without opening Visual Studio at
all). But I'm not 100% sure, I'll take a look and report back.

> I did some googling around the question of project .gitignore
> files ignoring .vs/, and was amused to come across this:
>
> https://github.com/github/gitignore/blob/main/VisualStudio.gitignore
>
> which seems like a mighty fine example of where we *don't*
> want to go.

That's clearly a nightmare to maintain. But in this case it should be
all hidden within one .vs folder.

> regards, tom lane




Re: Ignore Visual Studio's Temp Files While Working with PG on Windows

2024-05-18 Thread Yasir
On Sun, May 19, 2024 at 2:16 AM Tom Lane  wrote:

> Josef Šimánek  writes:
> > But this is different. If I understand it well, just by following
> > https://www.postgresql.org/docs/16/install-windows-full.html you'll
> > get those files no matter what is your specific environment (or
> > specific set of tools).
>
> Hm?  Visual Studio seems like quite a specific tool from here.
>
> I did some googling around the question of project .gitignore
> files ignoring .vs/, and was amused to come across this:
>
> https://github.com/github/gitignore/blob/main/VisualStudio.gitignore
>
>
This is funny, Tom. Adding an entry for each type of temp file in .gitignore
is a childish thing, obviously.

> which seems like a mighty fine example of where we *don't*
> want to go.
>

I agree we don't want to go in this direction.


>
> regards, tom lane
>


Re: Ignore Visual Studio's Temp Files While Working with PG on Windows

2024-05-18 Thread Yasir
On Sun, May 19, 2024 at 2:23 AM Josef Šimánek 
wrote:

> On Sat, May 18, 2024 at 23:16, Tom Lane  wrote:
> >
> > Josef Šimánek  writes:
> > > But this is different. If I understand it well, just by following
> > > https://www.postgresql.org/docs/16/install-windows-full.html you'll
> > > get those files no matter what is your specific environment (or
> > > specific set of tools).
> >
> > Hm?  Visual Studio seems like quite a specific tool from here.
>
> I initially thought the .vs folder is created just by compiling
> PostgreSQL using build.bat (like without opening Visual Studio at
> all). But I'm not 100% sure, I'll take a look and report back.
>

.vs folder is not created just by compiling PG. It is created if you open
any of .sln, .vcproj or .vcxproj files.
I have verified it.


>
> > I did some googling around the question of project .gitignore
> > files ignoring .vs/, and was amused to come across this:
> >
> > https://github.com/github/gitignore/blob/main/VisualStudio.gitignore
> >
> > which seems like a mighty fine example of where we *don't*
> > want to go.
>
> That's clearly a nightmare to maintain. But in this case it should be
> all hidden within one .vs folder.
>
> > regards, tom lane
>


Re: Ignore Visual Studio's Temp Files While Working with PG on Windows

2024-05-18 Thread Josef Šimánek
On Sat, May 18, 2024 at 23:29, Yasir  wrote:
>
>
>
> On Sun, May 19, 2024 at 2:23 AM Josef Šimánek  wrote:
>>
>> On Sat, May 18, 2024 at 23:16, Tom Lane  wrote:
>> >
>> > Josef Šimánek  writes:
>> > > But this is different. If I understand it well, just by following
>> > > https://www.postgresql.org/docs/16/install-windows-full.html you'll
>> > > get those files no matter what is your specific environment (or
>> > > specific set of tools).
>> >
>> > Hm?  Visual Studio seems like quite a specific tool from here.
>>
>> I initially thought the .vs folder is created just by compiling
>> PostgreSQL using build.bat (like without opening Visual Studio at
>> all). But I'm not 100% sure, I'll take a look and report back.
>
>
> .vs folder is not created just by compiling PG. It is created if you open any 
> of .sln, .vcproj or .vcxproj files.
> I have verified it.

Yes, I can confirm. Just running build.bat doesn't create .vs. I'm sorry
for the confusion, and I do agree that ignoring the ".vs" directory is a
local-environment thing and doesn't belong in the Postgres .gitignore.

>>
>>
>> > I did some googling around the question of project .gitignore
>> > files ignoring .vs/, and was amused to come across this:
>> >
>> > https://github.com/github/gitignore/blob/main/VisualStudio.gitignore
>> >
>> > which seems like a mighty fine example of where we *don't*
>> > want to go.
>>
>> That's clearly a nightmare to maintain. But in this case it should be
>> all hidden within one .vs folder.
>>
>> > regards, tom lane




Re: Why is citext/regress failing on hamerkop?

2024-05-18 Thread Andrew Dunstan



On 2024-05-16 Th 17:34, Tom Lane wrote:

Andrew Dunstan  writes:

On 2024-05-16 Th 17:15, Tom Lane wrote:

I'd rather have some visible status on the BF dashboard.  Invariably,
with a problem like this, the animal's owner is unaware there's a
problem.  If it's just silently not reporting, then no one else will
notice either, and we effectively lose an animal (despite it still
burning electricity to perform those rejected runs).

Fair enough. That will mean some database changes and other stuff, so it
will take a bit longer.

Sure, I don't think it's urgent.



I've pushed a small change that should just mark with an asterisk any
gitref that is more than 2 days older than the tip of the branch at the
time of reporting.



cheers


andrew

--
Andrew Dunstan
EDB: https://www.enterprisedb.com





Re: Ignore Visual Studio's Temp Files While Working with PG on Windows

2024-05-18 Thread Andrew Dunstan


On 2024-05-18 Sa 16:54, Yasir wrote:



On Sun, May 19, 2024 at 1:45 AM Andrew Dunstan  
wrote:



On 2024-05-18 Sa 15:43, Yasir wrote:



On Sat, May 18, 2024 at 7:27 PM Josef Šimánek
 wrote:

On Fri, May 17, 2024 at 8:09, Yasir  wrote:
>
> Hi Hackers,
>
> I have been playing with PG on the Windows platform
recently. An annoying thing I faced is that a lot of Visual
Studio's temp files kept appearing in git changed files.
Therefore, I am submitting this very trivial patch to ignore
these temp files.

see

https://docs.github.com/en/get-started/getting-started-with-git/ignoring-files#configuring-ignored-files-for-all-repositories-on-your-computer
for various strategies


We can add it to "~/.config/git/ignore" as it will ignore
globally on windows which we don't want. Also we don't have
".git/info/exclude" in PG project's so the best place left is
projects's .gitignore. That's what was patched.




eh? git creates .git/info/exclude in every git repository AFAIK.
And it's referred to here: 



Yes, git creates .git/info/exclude but point is, it is not in PG 
maintained codebase repo. So, no point adding to it.


BTW, Tom and Peter said it's not going to be added anyway!




You've completely missed my point, which is that *you* should be adding 
it to that file, as an alternative to using a (locally) global gitignore 
file.


I agree with Tom and Peter.


cheers


andrew

--
Andrew Dunstan
EDB: https://www.enterprisedb.com


Re: First draft of PG 17 release notes

2024-05-18 Thread Bruce Momjian
On Fri, May 17, 2024 at 09:22:59PM +0800, jian he wrote:
> On Thu, May 9, 2024 at 12:04 PM Bruce Momjian  wrote:
> >
> > I have committed the first draft of the PG 17 release notes;  you can
> > see the results here:
> >
> > https://momjian.us/pgsql_docs/release-17.html
> >
> > It will be improved until the final release.  The item count is 188,
> > which is similar to recent releases:
> >
> 
> This thread mentioned performance.
> actually this[1] refactored some interval aggregation related functions,
> which will make these two aggregate function: avg(interval), sum(interval)
> run faster, especially avg(interval).
> see [2].
> well, I guess, this is a kind of niche performance improvement to be
> mentioned separately.
> 
> 
> these 3 items need to be removed, because of
> https://git.postgresql.org/cgit/postgresql.git/commit/?id=8aee330af55d8a759b2b73f5a771d9d34a7b887f
> 
> >> Add stratnum GiST support function (Paul A. Jungwirth)
> 
> >> Allow foreign keys to reference WITHOUT OVERLAPS primary keys (Paul A. 
> >> Jungwirth)
> >> The keyword PERIOD is used for this purpose.
> 
> >> Allow PRIMARY KEY and UNIQUE constraints to use WITHOUT OVERLAPS for 
> >> non-overlapping exclusion constraints (Paul A. Jungwirth)
> 
> 
> [1] 
> https://git.postgresql.org/cgit/postgresql.git/commit/?id=519fc1bd9e9d7b408903e44f55f83f6db30742b7
> [2] 
> https://www.postgresql.org/message-id/CAEZATCUJ0xjyQUL7SHKxJ5a%2BDm5pjoq-WO3NtkDLi6c76rh58w%40mail.gmail.com

Agreed, I have applied the attached patch to make the release notes
current.

-- 
  Bruce Momjian  https://momjian.us
  EDB  https://enterprisedb.com

  Only you can decide what is important to you.
diff --git a/doc/src/sgml/release-17.sgml b/doc/src/sgml/release-17.sgml
index 9c511848943..0bc1c9a14ad 100644
--- a/doc/src/sgml/release-17.sgml
+++ b/doc/src/sgml/release-17.sgml
@@ -6,7 +6,7 @@
 
   
Release date:
-   2024-??-??, AS OF 2024-05-14
+   2024-??-??, AS OF 2024-05-18
   
 
   
@@ -480,17 +480,6 @@ Author: Tomas Vondra 
 
 Allow BRIN indexes to be created using parallel workers (Tomas Vondra, Matthias van de Meent)
 
-
-
-
-
-
-
-Add stratnum GiST support function (Paul A. Jungwirth)
-
 
 
  
@@ -1467,34 +1456,6 @@ Add DEFAULT setting for ALTER TABLE .. SET ACCESS METHOD (Michael Paquier)
 
 
 
-
-
-
-
-Allow foreign keys to reference WITHOUT OVERLAPS primary keys (Paul A. Jungwirth)
-
-
-
-The keyword PERIOD is used for this purpose.
-
-
-
-
-
-
-
-Allow PRIMARY KEY and UNIQUE constraints to use WITHOUT OVERLAPS for non-overlapping exclusion constraints (Paul A. Jungwirth)
-
-
-
 
 
 


Re: Ignore Visual Studio's Temp Files While Working with PG on Windows

2024-05-18 Thread Yasir
On Sun, May 19, 2024 at 2:35 AM Andrew Dunstan  wrote:

>
> On 2024-05-18 Sa 16:54, Yasir wrote:
>
>
>
> On Sun, May 19, 2024 at 1:45 AM Andrew Dunstan 
> wrote:
>
>>
>> On 2024-05-18 Sa 15:43, Yasir wrote:
>>
>>
>>
>> On Sat, May 18, 2024 at 7:27 PM Josef Šimánek 
>> wrote:
>>
>>> On Fri, May 17, 2024 at 8:09, Yasir  wrote:
>>> >
>>> > Hi Hackers,
>>> >
>>> > I have been playing with PG on the Windows platform recently. An
>>> annoying thing I faced is that a lot of Visual Studio's temp files kept
>>> appearing in git changed files. Therefore, I am submitting this very
>>> trivial patch to ignore these temp files.
>>>
>>> see
>>> https://docs.github.com/en/get-started/getting-started-with-git/ignoring-files#configuring-ignored-files-for-all-repositories-on-your-computer
>>> for various strategies
>>>
>>>
>> We can add it to "~/.config/git/ignore" as it will ignore globally on
>> windows which we don't want. Also we don't have ".git/info/exclude" in PG
>> project's so the best place left is projects's .gitignore. That's what was
>> patched.
>>
>>
>>
>>
>> eh? git creates .git/info/exclude in every git repository AFAIK. And it's
>> referred to here: 
>> 
>>
>>
>> Yes, git creates .git/info/exclude but point is, it is not in PG
> maintained codebase repo. So, no point adding to it.
>
> BTW, Tom and Peter said it's not going to be added anyway!
>
>
>>
>>
> You've completely missed my point, which is that *you* should be adding it
> to that file, as an alternative to using a (locally) global gitignore file.
>
My bad Andrew.

> I agree with Tom and Peter.
>
>
> cheers
>
>
> andrew
>
> --
> Andrew Dunstan
> EDB: https://www.enterprisedb.com
>
>


Re: pg_trgm comparison bug on cross-architecture replication due to different char implementation

2024-05-18 Thread Noah Misch
On Wed, May 15, 2024 at 02:56:54PM +0900, Masahiko Sawada wrote:
> On Sat, May 4, 2024 at 7:36 AM Joe Conway  wrote:
> > On 5/3/24 11:44, Peter Eisentraut wrote:
> > > On 03.05.24 16:13, Tom Lane wrote:
> > >> Peter Eisentraut  writes:
> > >>> On 30.04.24 19:29, Tom Lane wrote:
> >  Also, the bigger picture here is the seeming assumption that "if
> >  we change pg_trgm then it will be safe to replicate from x86 to
> >  arm".  I don't believe that that's a good idea and I'm unwilling
> >  to promise that it will work, regardless of what we do about
> >  char signedness.  That being the case, I don't want to invest a
> >  lot of effort in the signedness issue.  Option (1) is clearly
> >  a small change with little if any risk of future breakage.
> > >>
> > >>> But note that option 1 would prevent some replication that is currently
> > >>> working.
> > >>
> > >> The point of this thread though is that it's working only for small
> > >> values of "work".  People are rightfully unhappy if it seems to work
> > >> and then later they get bitten by compatibility problems.
> > >>
> > >> Treating char signedness as a machine property in pg_control would
> > >> signal that we don't intend to make it work, and would ensure that
> > >> even the most minimal testing would find out that it doesn't work.
> > >>
> > >> If we do not do that, it seems to me we have to buy into making
> > >> it work.  That would mean dealing with the consequences of an
> > >> incompatible change in pg_trgm indexes, and then going through
> > >> the same dance again the next time(s) similar problems are found.
> > >
> > > Yes, that is understood.  But anecdotally, replicating between x86-64 
> > > arm64 is
> > > occasionally used for upgrades or migrations.  In practice, this appears 
> > > to have
> > > mostly worked.  If we now discover that it won't work with certain index
> > > extension modules, it's usable for most users. Even if we say, you have to
> > > reindex everything afterwards, it's probably still useful for these 
> > > scenarios.

Like you, I would not introduce a ControlFileData block for sign-of-char,
given the signs of breakage in extension indexing only.  That would lose much
useful migration capability.  I'm sympathetic to Tom's point, which I'll
attempt to summarize as: a ControlFileData block is a promise we know how to
keep, whereas we should expect further bug discoveries without it.  At the
same time, I put more weight on the apparently-wide span of functionality that
works fine.  (I wonder whether any static analyzer does or practically could
detect char sign dependence with a decent false positive rate.)

> +1
> 
> How about extending amcheck to support GIN and GIst indexes so that it
> can detect potential data incompatibility due to changing 'char' to
> 'unsigned char'? I think these new tests would be useful also for
> users to check if they really need to reindex indexes due to such
> changes. Also we fix pg_trgm so that it uses 'unsigned char' in PG18.
> Users who upgraded to PG18 can run the new amcheck tests on the
> primary as well as the standby.

If I were standardizing pg_trgm on one or the other notion of "char", I would
choose signed char, since I think it's still the majority.  More broadly, I
see these options to fix pg_trgm:

1. Change to signed char.  Every arm64 system needs to scan pg_trgm indexes.
2. Change to unsigned char.  Every x86 system needs to scan pg_trgm indexes.
3. Offer both, as an upgrade path.  For example, pg_trgm could have separate
   operator classes gin_trgm_ops and gin_trgm_ops_unsigned.  Running
   pg_upgrade on an unsigned-char system would automatically map v17
   gin_trgm_ops to v18 gin_trgm_ops_unsigned.  This avoids penalizing any
   architecture with upgrade-time scans.

Independently, having amcheck for GIN and/or GiST would be great.
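The hazard being weighed here can be shown in miniature: any byte >= 0x80
compares as negative under signed char and as >= 128 under unsigned char, so
byte-wise key comparisons, and therefore index ordering, diverge between the
two builds. A hypothetical Python sketch of the two orderings (not pg_trgm's
actual comparator):

```python
def as_signed(b: int) -> int:
    # Reinterpret a byte value the way a signed 'char' (the x86 default) sees it.
    return b - 256 if b >= 0x80 else b

# Trigram-like keys: one pure ASCII, one containing a high byte such as a
# UTF-8 continuation byte from multibyte text.
keys = [b"abc", b"a\xc3\xa4"]

unsigned_order = sorted(keys)  # bytes compare unsigned: arm64-style 'char'
signed_order = sorted(keys, key=lambda k: [as_signed(b) for b in k])

print(unsigned_order)  # [b'abc', b'a\xc3\xa4']  since 0x62 < 0xc3 unsigned
print(signed_order)    # [b'a\xc3\xa4', b'abc']  since -61 < 0x62 signed
```

An index built under one ordering looks corrupt to a comparator using the
other, which is why options 1 and 2 above each force an upgrade-time scan on
the architecture whose native ordering changes.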




Re: First draft of PG 17 release notes

2024-05-18 Thread Bruce Momjian
On Fri, May 17, 2024 at 01:30:03PM -0700, Jeff Davis wrote:
> On Thu, 2024-05-09 at 00:03 -0400, Bruce Momjian wrote:
> > I have committed the first draft of the PG 17 release notes;  you can
> > see the results here:
> > 
> > https://momjian.us/pgsql_docs/release-17.html
> 
> For this item:
> 
> Create a "builtin" collation provider similar to libc's C
> locale (Jeff Davis)
> 
> It uses a "C" locale which is identical but independent of
> libc, but it allows the use of non-"C" collations like "en_US"
> and "C.UTF-8" with the "C" locale, which libc does not. MORE? 
> 
> I suggest something more like:
> 
> New, platform-independent "builtin" collation
> provider. (Jeff Davis)
> 
> Currently, it offers the "C" and "C.UTF-8" locales. The
> "C.UTF-8" locale combines stable and fast code point order
> collation with Unicode character semantics.

Okay, I went with the attached applied patch.  Adjustments?

-- 
  Bruce Momjian  https://momjian.us
  EDB  https://enterprisedb.com

  Only you can decide what is important to you.
diff --git a/doc/src/sgml/release-17.sgml b/doc/src/sgml/release-17.sgml
index 0bc1c9a14ad..5d2fcd7d7de 100644
--- a/doc/src/sgml/release-17.sgml
+++ b/doc/src/sgml/release-17.sgml
@@ -886,11 +886,11 @@ Author: Jeff Davis 
 
 
 
-Create a "builtin" collation provider similar to libc's C locale (Jeff Davis)
+Create "builtin" collation provider similar to libc's C locale (Jeff Davis)
 
 
 
-It uses a "C" locale which is identical but independent of libc, but it allows the use of non-"C" collations like "en_US" and "C.UTF-8" with the "C" locale, which libc does not.  MORE?
+While its C locale is similar but independent of libc, its C.UTF-8 locale sorts by Unicode code points and has Unicode-based case folding.
 
 
 


Re: date_trunc function in interval version

2024-05-18 Thread Yasir
On Sun, May 19, 2024 at 2:20 AM Przemysław Sztoch 
wrote:

> Robert Haas wrote on 5/15/2024 9:29 PM:
>
> On Mon, Mar 4, 2024 at 5:03 AM Przemysław Sztoch  
>  wrote:
>
> Apparently the functionality is identical to date_bin.
> When I saw date_bin in the documentation, I thought it solved all my problems.
> Unfortunately, DST problems have many corner cases.
> I tried to change date_bin several times, but unfortunately in some cases it 
> would start working differently than before.
>
> So, first of all, thanks for taking an interest and sending a patch.
>
> In order for the patch to have a chance of being accepted, we would
> need to have a clear understanding of exactly how this patch is
> different from the existing date_bin(). If we knew that, we could
> decide either that (a) date_bin does the right thing and your patch
> does the wrong thing and therefore we should reject your patch, or we
> could decide that (b) date_bin does the wrong thing and therefore we
> should fix it, or we could decide that (c) both date_bin and what this
> patch does are correct, in the sense of being sensible things to do,
> and there is a reason to have both. But if we don't really understand
> how they are different, which seems to be the case right now, then we
> can't make any decisions. And what that means in practice is that
> nobody is going to be willing to commit anything, and we're just going
> to go around in circles.
>
> Typically, this kind of research is the responsibility of the patch
> author: you're the one who wants something changed, so that means you
> need to provide convincing evidence that it should be. If someone else
> volunteers to do it, that's also cool, but it absolutely has to be
> done in order for there to be a chance of progress here. No committer
> is going to say "well, we already have date_bin, but Przemysław says
> his date_trunc is different somehow, so let's have both without
> understanding how exactly they're different." That's just not a
> realistic scenario. Just to name one problem, how would we document
> each of them? Users would expect the documentation to explain how two
> closely-related functions differ, but we will be unable to explain
> that if we don't know the answer ourselves.
>
> If you can't figure out exactly what the differences are by code
> inspection, then maybe one thing you could do to help unblock things
> here is provide some very clear examples of when they deliver the same
> results and when they deliver different results. Although there are no
> guarantees, that might lead somebody else to jump in and suggest an
> explanation, or further avenues of analysis, or some other helpful
> comment.
>
> Personally, what I suspect is that there's already a way to do what
> you want using date_bin(), maybe in conjunction with some casting or
> some calls to other functions that we already have. But it's hard to
> be sure because we just don't have the details. "DST problems have
> many corner cases" and "in some cases [date_bin] would start working
> differently than before" may be true statements as far as they go, but
> they're not very specific complaints. If you can describe *exactly*
> how date_bin fails to meet your expectations, there is a much better
> chance that something useful will happen here.
>
>
> I would also like to thank Robert for presenting the matter in detail.
>
> My function date_trunc(interval, timestamp, ...) is similar to the
> original function date_trunc(text, timestamp, ...).
>
> My extension only gives more granularity: instead of jumping straight
> from hour to day, we can use 6h and 12h. It's the same with minutes:
> we can round to 30 minutes, 20 minutes, 15 minutes, etc.
>
> Using date_bin has a similar effect, but requires specifying the origin
> from which subsequent buckets are calculated. The need to provide this
> origin is sometimes a very big problem, especially since you cannot use
> a single origin across the change from summer to winter time.
>
> If we use one origin for example begin of year: 2024-01-01 00:00:00 then:
> # SET timezone='Europe/Warsaw';
> # SELECT date_bin('1 day', '2024-03-05 11:22:33', '2024-01-01 00:00:00'),
> date_trunc('day', '2024-03-05 11:22:33'::timestamptz);
> 2024-03-05 00:00:00+01    2024-03-05 00:00:00+01    -- date_bin works ok,
> because we are before DST
> # SELECT date_bin('1 day', '2024-05-05 11:22:33', '2024-01-01 00:00:00'),
> date_trunc('day', '2024-05-05 11:22:33'::timestamptz);
> 2024-05-05 01:00:00+02    2024-05-05 00:00:00+02    -- date_bin has a
> problem, because we are in May, after DST
>
> If anyone has an idea how to make date_bin work like date_trunc, please
> provide an example.
>
>
Here is an example which makes date_bin() behave like date_trunc():
# SELECT date_bin('1 day', '2024-05-05 11:22:33', '0001-01-01'::timestamp),
date_trunc('day', '2024-05-05 11:22:33'::timestamptz);
  date_bin   |   date_trunc
-+
 20
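The difference discussed above can also be sketched outside SQL. Below is a hedged Python model of the two behaviors (the functions `date_bin` and `date_trunc_day` are simplified stand-ins written for illustration, not the server code): date_bin() counts fixed-length strides on the absolute timeline, while date_trunc() truncates in local calendar time, so only the latter lands on local midnight after the DST switch.

```python
from datetime import datetime, timedelta, timezone
from zoneinfo import ZoneInfo

tz = ZoneInfo("Europe/Warsaw")

def date_bin(stride, ts, origin):
    # Counts whole strides from the origin on the absolute (UTC)
    # timeline, ignoring local-calendar day boundaries.
    bins = (ts - origin) // stride
    return (origin.astimezone(timezone.utc) + bins * stride).astimezone(tz)

def date_trunc_day(ts):
    # Truncates in local time, so the result is always local midnight,
    # even after a DST transition.
    return ts.astimezone(tz).replace(hour=0, minute=0, second=0, microsecond=0)

origin = datetime(2024, 1, 1, tzinfo=tz)          # local midnight, before DST
ts = datetime(2024, 5, 5, 11, 22, 33, tzinfo=tz)  # after the DST switch

print(date_bin(timedelta(days=1), ts, origin))  # 2024-05-05 01:00:00+02:00
print(date_trunc_day(ts))                       # 2024-05-05 00:00:00+02:00
```

This reproduces the one-hour drift from the earlier SQL example: the 1-day buckets counted from a pre-DST origin land on 01:00 local time once the offset changes from +01 to +02.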

Re: Why is citext/regress failing on hamerkop?

2024-05-18 Thread Tom Lane
Andrew Dunstan  writes:
> I've pushed a small change that should just mark with an asterisk any 
> gitref that is more than 2 days older than the tip of the branch at the 
> time of reporting.

Thanks!

regards, tom lane




Re: pg_combinebackup does not detect missing files

2024-05-18 Thread David Steele

On 5/18/24 21:06, Tomas Vondra wrote:


On 5/17/24 14:20, Robert Haas wrote:

On Fri, May 17, 2024 at 1:18 AM David Steele  wrote:

However, I think allowing the user to optionally validate the input
would be a good feature. Running pg_verifybackup as a separate step is
going to be more expensive than verifying/copying at the same time.
Even with storage tricks to copy ranges of data, pg_combinebackup is
going to be aware of files that do not need to be verified for the
current operation, e.g. old copies of free space maps.


In cases where pg_combinebackup reuses a checksum from the input
manifest rather than recomputing it, this could accomplish something.
However, for any file that's actually reconstructed, pg_combinebackup
computes the checksum as it's writing the output file. I don't see how
it's sensible to then turn around and verify that the checksum that we
just computed is the same one that we now get. It makes sense to run
pg_verifybackup on the output of pg_combinebackup at a later time,
because that can catch bits that have been flipped on disk in the
meanwhile. But running the equivalent of pg_verifybackup during
pg_combinebackup would amount to doing the exact same checksum
calculation twice and checking that it gets the same answer both
times.


One more thing occurs to me -- if data checksums are enabled then a
rough and ready output verification would be to test the checksums
during combine. Data checksums aren't very good but something should be
triggered if a bunch of pages go wrong, especially since the block
offset is part of the checksum. This would be helpful for catching
combine bugs.
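As a toy illustration of that last point (this is not the real pg_checksum_page() algorithm, just a minimal sketch of why mixing the block offset into the checksum catches misplaced blocks):

```python
import zlib

def page_checksum(page: bytes, blkno: int) -> int:
    # Mixing the block number into the checksum means a page written to
    # the wrong offset fails verification even if its bytes are intact --
    # the property that would catch block-placement bugs during combine.
    return zlib.crc32(page) ^ blkno

page = b"example page contents"
stored = page_checksum(page, 7)           # computed at write time
assert page_checksum(page, 7) == stored   # page at the right offset: verifies
assert page_checksum(page, 8) != stored   # same bytes, wrong offset: caught
```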


I don't know, I'm not very enthused about this. I bet pg_combinebackup
has some bugs, and it's possible that one of them could involve
putting blocks in the wrong places, but it doesn't seem especially
likely. Even if it happens, it's more likely to be that
pg_combinebackup thinks it's putting them in the right places but is
actually writing them to the wrong offset in the file, in which case a
block-checksum calculation inside pg_combinebackup is going to think
everything's fine, but a standalone tool that isn't confused will be
able to spot the damage.


Perhaps more importantly, can you even verify data checksums before the
recovery is completed? I don't think you can (pg_checksums certainly
does not allow doing that). Because who knows in what shape you copied
the block?


Yeah, you'd definitely need a list of blocks you knew to be valid at 
backup time, which sounds like a lot more work than just some overall 
checksumming scheme.


Regards,
-David




Re: Requiring LLVM 14+ in PostgreSQL 18

2024-05-18 Thread Ole Peder Brandtzæg
On Wed, May 15, 2024 at 07:20:09AM +0200, Peter Eisentraut wrote:
> Yes, let's get that v3-0001 patch into PG17.

Upon seeing this get committed in 4dd29b6833, I noticed that the docs
still advertise the llvm-config-$version search dance. That's still
correct for Meson-based builds since we use their config-tool machinery,
but no longer holds for configure-based builds. The attached patch
updates the docs accordingly.

-- 
Ole Peder Brandtzæg
In any case, these nights just ain't getting any easier
And who could judge us
For seeking comfort in the hazy counterfeit land of memory
>From 61dfbf5a252b53697cce17cd4885ecddb7665814 Mon Sep 17 00:00:00 2001
From: =?UTF-8?q?Ole=20Peder=20Brandtz=C3=A6g?= 
Date: Sun, 19 May 2024 00:29:09 +0200
Subject: [PATCH] doc: remove llvm-config search from configure documentation

As of 4dd29b6833, we no longer attempt to locate any other llvm-config
variant than plain llvm-config in configure-based builds; update the
documentation accordingly. (For Meson-based builds, we still use Meson's
LLVMDependencyConfigTool [0], which runs through a set of possible
suffixes [1], so no need to update the documentation there.)

[0]: https://github.com/mesonbuild/meson/blob/7d28ff29396f9d7043204de8ddc52226b9903811/mesonbuild/dependencies/dev.py#L184
[1]: https://github.com/mesonbuild/meson/blob/7d28ff29396f9d7043204de8ddc52226b9903811/mesonbuild/environment.py#L183
---
 doc/src/sgml/installation.sgml | 9 +++--
 1 file changed, 3 insertions(+), 6 deletions(-)

diff --git a/doc/src/sgml/installation.sgml b/doc/src/sgml/installation.sgml
index 1b32d5ca62..19abec2c34 100644
--- a/doc/src/sgml/installation.sgml
+++ b/doc/src/sgml/installation.sgml
@@ -941,12 +941,9 @@ build-postgresql:
 
  llvm-configllvm-config
  will be used to find the required compilation options.
- llvm-config, and then
- llvm-config-$major-$minor for all supported
- versions, will be searched for in your PATH.  If
- that would not yield the desired program,
- use LLVM_CONFIG to specify a path to the
- correct llvm-config. For example
+ llvm-config will be searched for in your PATH.
+ If that would not yield the desired program, use LLVM_CONFIG to
+ specify a path to the correct llvm-config. For example
 
 ./configure ... --with-llvm LLVM_CONFIG='/path/to/llvm/bin/llvm-config'
 
-- 
2.43.0



Re: commitfest.postgresql.org is no longer fit for purpose

2024-05-18 Thread Bruce Momjian
On Fri, May 17, 2024 at 11:44:40AM -0400, Melanie Plageman wrote:
> So, anyway, I'd argue that we need a parking lot for patches which we
> all agree are important and have a path forward but need someone to do
> the last 20-80% of the work. To avoid this being a dumping ground,
> patches should _only_ be allowed in the parking lot if they have a
> clear path forward. Patches which haven't gotten any interest don't go
> there. Patches in which the author has clearly not addressed feedback
> that is reasonable for them to address don't go there. These are
> effectively community TODOs which we agree need to be done -- if only
> someone had the time.

When I am looking to commit something, I have to consider:

*  do we want the change
*  are there concerns
*  are the docs good
*  does it need tests
*  is it only a proof-of-concept

When people review commit fest entries, they have to figure out what is
holding the patch back from being complete, so they have to read the
thread from the beginning.  Should there be a clearer way in the commit
fest app to specify what is missing?

-- 
  Bruce Momjian  https://momjian.us
  EDB  https://enterprisedb.com

  Only you can decide what is important to you.




Re: speed up a logical replica setup

2024-05-18 Thread Thomas Munro
040_pg_createsubscriber.pl seems to be failing occasionally on
culicidae near "--dry-run on node S".  I couldn't immediately see why.
That animal is using EXEC_BACKEND and I also saw a one-off failure a
bit like that on my own local Linux + EXEC_BACKEND test run
(sorry I didn't keep the details around).  Coincidence?




Re: Requiring LLVM 14+ in PostgreSQL 18

2024-05-18 Thread Thomas Munro
On Sun, May 19, 2024 at 10:46 AM Ole Peder Brandtzæg
 wrote:
> On Wed, May 15, 2024 at 07:20:09AM +0200, Peter Eisentraut wrote:
> > Yes, let's get that v3-0001 patch into PG17.
>
> Upon seeing this get committed in 4dd29b6833, I noticed that the docs
> still advertise the llvm-config-$version search dance. That's still
> correct for Meson-based builds since we use their config-tool machinery,
> but no longer holds for configure-based builds. The attached patch
> updates the docs accordingly.

Oops, right I didn't know we had that documented.  Thanks.  Will hold
off doing anything until the thaw.

Hmm, I also didn't know that Meson had its own list like our just-removed one:

https://github.com/mesonbuild/meson/blob/master/mesonbuild/environment.py#L183

Unsurprisingly, it suffers from maintenance lag, priority issues etc
(new major versions pop out every 6 months):

https://github.com/mesonbuild/meson/issues/10483




Re: Requiring LLVM 14+ in PostgreSQL 18

2024-05-18 Thread Ole Peder Brandtzæg
On Sun, May 19, 2024 at 11:05:49AM +1200, Thomas Munro wrote:
> Oops, right I didn't know we had that documented.  Thanks.  Will hold
> off doing anything until the thaw.

No worries, thanks!

> Hmm, I also didn't know that Meson had its own list like our just-removed one:
> 
> https://github.com/mesonbuild/meson/blob/master/mesonbuild/environment.py#L183

I didn't either before writing the doc patch, which led me to
investigate why it *just works* when doing meson setup, and then I saw
the 40-odd "Trying a default llvm-config fallback…" lines in
meson-log.txt =) 

-- 
Ole Peder Brandtzæg
It's raining triple sec in Tchula
and the radio plays "Crazy Train"




Re: Requiring LLVM 14+ in PostgreSQL 18

2024-05-18 Thread Tom Lane
Thomas Munro  writes:
> Oops, right I didn't know we had that documented.  Thanks.  Will hold
> off doing anything until the thaw.

FWIW, I don't think the release freeze precludes docs-only fixes.
But if you prefer to sit on this, that's fine too.

regards, tom lane




Re: Streaming read-ready sequential scan code

2024-05-18 Thread Thomas Munro
On Sun, May 19, 2024 at 7:00 AM Alexander Lakhin  wrote:
> With blocknums[1], timing is changed, but the effect is not persistent.
> 10 query15 executions in a row, b7b0f3f27:
> 277.932 ms
> 281.805 ms
> 278.335 ms
> 281.565 ms
> 284.167 ms
> 283.171 ms
> 281.165 ms
> 281.615 ms
> 285.394 ms
> 277.301 ms

The bad time 10/10.

> b7b0f3f27~1:
> 159.789 ms
> 165.407 ms
> 160.893 ms
> 159.343 ms
> 160.936 ms
> 161.577 ms
> 161.637 ms
> 163.421 ms
> 163.143 ms
> 167.109 ms

The good time 10/10.

> b7b0f3f27 + blocknums[1]:
> 164.133 ms
> 280.920 ms
> 160.748 ms
> 163.182 ms
> 161.709 ms
> 161.998 ms
> 161.239 ms
> 276.256 ms
> 161.601 ms
> 160.384 ms

The good time 8/10, the bad time 2/10.

Thanks for checking!  I bet all branches can show that flip/flop
instability in these adverse conditions, depending on random
scheduling details.  I will start a new thread with a patch for the
root cause of that, ie problem #2 (this will need back-patching), and
post a fix for #3 (v17 blocknums[N] tweak affecting
fairness/likelihood, which was probably basically a bit of ill-advised
premature optimisation) here in a few days.




PATCH: Add query for operators unusable with RLS to documentation

2024-05-18 Thread Josh Snyder
When deploying RLS, I was surprised to find that certain queries which used 
only builtin indexes and operators had dramatically different query plans when 
a policy was applied. In my case, the query `tsvector @@ tsquery` over a GIN
index was no longer able to use that index. I was able to find one other
instance [1] of someone being surprised by this behavior on the mailing lists.

The docs already discuss the LEAKPROOF semantics in the abstract, but I think 
they don't place enough focus on the idea that builtin operators can be (and
frequently are) not leakproof. Based on the query given in the attached patch,
I found that 387 operators are not leakproof versus 588 that are.

The attached patch updates the documentation to provide an easy query over
system catalogs as a way of determining which operators will no longer perform
well under RLS or a security-barrier view.

Thanks,
Josh

[1] 
https://www.postgresql.org/message-id/CAGrP7a2t%2BJbeuxpQY%2BRSvNe4fr3%2B%3D%3DUmyimwV0GCD%2BwcrSSb%3Dw%40mail.gmail.com

From db382697f14bd12dd9e29606c314d7a35bd290ea Mon Sep 17 00:00:00 2001
From: Josh Snyder 
Date: Sat, 18 May 2024 15:50:47 -0700
Subject: [PATCH] Add query for operators unusable with RLS

---
 doc/src/sgml/rules.sgml | 26 ++
 1 file changed, 26 insertions(+)

diff --git a/doc/src/sgml/rules.sgml b/doc/src/sgml/rules.sgml
index 7a928bd7b9..3b1283c002 100644
--- a/doc/src/sgml/rules.sgml
+++ b/doc/src/sgml/rules.sgml
@@ -2167,6 +2167,32 @@ CREATE VIEW phone_number WITH (security_barrier) AS
 view's row filters.
 
 
+
+When working with security_barrier views or row-level
+security policies, a query author can check in advance which index-access
+operators cannot cross the security barrier by querying system catalogs:
+
+
+
+SELECT
+  amname,
+  format('%s(%s,%s) [%s]', oprcode, lefttype.typname, righttype.typname, oprname)
+FROM pg_operator
+  JOIN pg_amop ON pg_operator.oid = pg_amop.amopopr
+  JOIN pg_am ON pg_amop.amopmethod = pg_am.oid
+  JOIN pg_proc ON pg_operator.oprcode = pg_proc.oid
+  JOIN pg_type lefttype ON pg_amop.amoplefttype = lefttype.oid
+  JOIN pg_type righttype ON pg_amop.amoprighttype = righttype.oid
+WHERE proleakproof = false;
+
+
+
+Any operator returned by the above query will not be able to perform an
+indexed lookup through the security barrier. In that case, the view's own
+conditions will be applied first, followed by conditions added by the query
+author.
+
+
 
 It is important to understand that even a view created with the
 security_barrier option is intended to be secure only
-- 
2.40.1



Re: Sort functions with specialized comparators

2024-05-18 Thread Ranier Vilela
On Sat, May 18, 2024 at 15:52 Andrey M. Borodin <x4...@yandex-team.ru> wrote:

> Hi!
>
> In a thread about sorting comparators[0] Andres noted that we have
> infrastructure to help compiler optimize sorting. PFA attached PoC
> implementation. I've checked that it indeed works on the benchmark from
> that thread.
>
> postgres=# CREATE TABLE arrays_to_sort AS
>SELECT array_shuffle(a) arr
>FROM
>(SELECT ARRAY(SELECT generate_series(1, 100)) a),
>generate_series(1, 10);
>
> postgres=# SELECT (sort(arr))[1] FROM arrays_to_sort; -- original
> Time: 990.199 ms
> postgres=# SELECT (sort(arr))[1] FROM arrays_to_sort; -- patched
> Time: 696.156 ms
>
> The benefit seems substantial, with a speedup on the order of 30%.
>
> There's plenty of sorting by TransactionId, BlockNumber, OffsetNumber, Oid
> etc. But this sorting routines never show up in perf top or something like
> that.
>
> Seems like in most cases we do not spend much time in sorting. But
> specialization does not cost us much either, only some CPU cycles of a
> compiler. I think we can further improve the speedup by converting the inline
> comparator to value extractor: more compilers will see what is actually
> going on. But I have no proofs for this reasoning.
>
> What do you think?
>
Makes sense.

Regarding the patch.
You could change the style to:

+sort_int32_asc_cmp(const int32 *a, const int32 *b)
+sort_int32_desc_cmp(const int32 *a, const int32 *b)

We must use const in all parameters that can be const.

best regards,
Ranier Vilela


Re: generic plans and "initial" pruning

2024-05-18 Thread David Rowley
On Fri, 20 Jan 2023 at 08:39, Tom Lane  wrote:
> I spent some time re-reading this whole thread, and the more I read
> the less happy I got.  We are adding a lot of complexity and introducing
> coding hazards that will surely bite somebody someday.  And after awhile
> I had what felt like an epiphany: the whole problem arises because the
> system is wrongly factored.  We should get rid of AcquireExecutorLocks
> altogether, allowing the plancache to hand back a generic plan that
> it's not certain of the validity of, and instead integrate the
> responsibility for acquiring locks into executor startup.  It'd have
> to be optional there, since we don't need new locks in the case of
> executing a just-planned plan; but we can easily add another eflags
> bit (EXEC_FLAG_GET_LOCKS or so).  Then there has to be a convention
> whereby the ExecInitNode traversal can return an indicator that
> "we failed because the plan is stale, please make a new plan".

I also reread the entire thread up to this point yesterday. I've also
been thinking about this recently as Amit has mentioned it to me a few
times over the past few months.

With the caveat of not yet having looked at the latest patch, my
thoughts are that having the executor startup responsible for taking
locks is a bad idea and I don't think we should go down this path. My
reasons are:

1. No ability to control the order that the locks are obtained. The
order in which the locks are taken will be at the mercy of the plan
the planner chooses.
2. It introduces lots of complexity regarding how to cleanly clean up
after a failed executor startup which is likely to make exec startup
slower and the code more complex
3. It puts us even further down the path of actually needing an
executor startup phase.

For #1, the locks taken for SELECT queries are less likely to conflict
with other locks obtained by PostgreSQL, but at least at the moment if
someone is getting deadlocks with a DDL type operation, they can
change their query or DDL script so that locks are taken in the same
order.  If we allowed executor startup to do this then if someone
comes complaining that PG18 deadlocks when PG17 didn't we'd just have
to tell them to live with it.  There's a comment at the bottom of
find_inheritance_children_extended() just above the qsort() which
explains about the deadlocking issue.
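The ordering convention referred to there can be sketched in a few lines (hypothetical helper names; the real code sorts child table OIDs with qsort() inside find_inheritance_children_extended()):

```python
def acquire_in_order(oids, acquire):
    # Acquiring locks in a globally consistent order (here, sorted by
    # OID) guarantees two sessions can never each hold one lock while
    # waiting on the other's -- the classic deadlock-avoidance rule.
    for oid in sorted(oids):
        acquire(oid)

taken = []
acquire_in_order([16391, 16384, 16388], taken.append)
print(taken)  # [16384, 16388, 16391]
```

If lock order were instead dictated by whatever plan shape the planner picked, no such session-independent ordering could be guaranteed.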

I don't have much extra to say about #2.  As mentioned, I've not
looked at the patch. On paper, it sounds possible, but it also sounds
bug-prone and ugly.

For #3, I've been thinking about what improvements we can do to make
the executor more efficient. In [1], Andres talks about some very
interesting things. In particular, in his email items 3) and 5) are
relevant here. If we did move lots of executor startup code into the
planner, I think it would be possible to one day get rid of executor
startup and have the plan record how much memory is needed for the
non-readonly part of the executor state and tag each plan node with
the offset in bytes they should use for their portion of the executor
working state. This would be a single memory allocation for the entire
plan.  The exact details are not important here, but I feel like if we
load up executor startup with more responsibilities, it'll just make
doing something like this harder.  The init run-time pruning code that
I worked on likely already has done that, but I don't think it's
closed the door on it as it might just mean allocating more executor
state memory than we need to. Provided the plan node records the
offset into that memory, I think it could be made to work, just with
the inefficiency of having a (possibly) large unused hole in that
state memory.
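A toy model of that single-allocation idea (hypothetical structure, not PostgreSQL code): the planner records a per-node offset and size, and executor startup collapses to one allocation plus slicing.

```python
# Each plan node carries an offset/size assigned at plan time.
plan = [
    {"node": "SeqScan",  "state_offset": 0,  "state_size": 64},
    {"node": "HashJoin", "state_offset": 64, "state_size": 128},
]
total = max(n["state_offset"] + n["state_size"] for n in plan)
arena = bytearray(total)  # one allocation for the whole plan's working state

def node_state(node):
    # A node's working state is just a slice of the shared arena.
    return memoryview(arena)[node["state_offset"]:
                             node["state_offset"] + node["state_size"]]

node_state(plan[0])[:] = b"\xaa" * plan[0]["state_size"]
node_state(plan[1])[:] = b"\xbb" * plan[1]["state_size"]
print(total)  # 192
```

In the run-time-pruning case, pruned nodes would simply leave their slice of the arena unused, at the cost of allocating more state memory than strictly needed.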

As far as I understand it, your objection to the original proposal is
just on the grounds of concerns about introducing hazards that could
turn into bugs.  I think we could come up with some way to make the
prior method of doing pruning before executor startup work. I think
what Amit had before your objection was starting to turn into
something workable and we should switch back to working on that.

David

[1] 
https://www.postgresql.org/message-id/20180525033538.6ypfwcqcxce6zkjj%40alap3.anarazel.de




Re: generic plans and "initial" pruning

2024-05-18 Thread Tom Lane
David Rowley  writes:
> With the caveat of not yet having looked at the latest patch, my
> thoughts are that having the executor startup responsible for taking
> locks is a bad idea and I don't think we should go down this path.

OK, it's certainly still up for argument, but ...

> 1. No ability to control the order that the locks are obtained. The
> order in which the locks are taken will be at the mercy of the plan
> the planner chooses.

I do not think I buy this argument, because plancache.c doesn't
provide any "ability to control the order" today, and never has.
The order in which AcquireExecutorLocks re-gets relation locks is only
weakly related to the order in which the parser/planner got them
originally.  The order in which AcquirePlannerLocks re-gets the locks
is even less related to the original.  This doesn't cause any big
problems that I'm aware of, because these locks are fairly weak.

I think we do have a guarantee that for partitioned tables, parents
will be locked before children, and that's probably valuable.
But an executor-driven lock order could preserve that property too.

> 2. It introduces lots of complexity regarding how to cleanly clean up
> after a failed executor startup which is likely to make exec startup
> slower and the code more complex

Perhaps true, I'm not sure.  But the patch we'd been discussing
before this proposal was darn complex as well.

> 3. It puts us even further down the path of actually needing an
> executor startup phase.

Huh?  We have such a thing already.

> For #1, the locks taken for SELECT queries are less likely to conflict
> with other locks obtained by PostgreSQL, but at least at the moment if
> someone is getting deadlocks with a DDL type operation, they can
> change their query or DDL script so that locks are taken in the same
> order.  If we allowed executor startup to do this then if someone
> comes complaining that PG18 deadlocks when PG17 didn't we'd just have
> to tell them to live with it.  There's a comment at the bottom of
> find_inheritance_children_extended() just above the qsort() which
> explains about the deadlocking issue.

The reason it's important there is that function is (sometimes)
used for lock modes that *are* exclusive.

> For #3, I've been thinking about what improvements we can do to make
> the executor more efficient. In [1], Andres talks about some very
> interesting things. In particular, in his email items 3) and 5) are
> relevant here. If we did move lots of executor startup code into the
> planner, I think it would be possible to one day get rid of executor
> startup and have the plan record how much memory is needed for the
> non-readonly part of the executor state and tag each plan node with
> the offset in bytes they should use for their portion of the executor
> working state.

I'm fairly skeptical about that idea.  The entire reason we have an
issue here is that we want to do runtime partition pruning, which
by definition can't be done at plan time.  So I doubt it's going
to play nice with what we are trying to accomplish in this thread.

Moreover, while "replace a bunch of small pallocs with one big one"
would save some palloc effort, what are you going to do to ensure
that that memory has the right initial contents?  I think this idea is
likely to make the executor a great deal more notationally complex
without actually buying all that much.  Maybe Andres can make it work,
but I don't want to contort other parts of the system design on the
purely hypothetical basis that this might happen.

> I think what Amit had before your objection was starting to turn into
> something workable and we should switch back to working on that.

The reason I posted this idea was that I didn't think the previously
existing patch looked promising at all.

regards, tom lane




Re: generic plans and "initial" pruning

2024-05-18 Thread David Rowley
On Sun, 19 May 2024 at 13:27, Tom Lane  wrote:
>
> David Rowley  writes:
> > 1. No ability to control the order that the locks are obtained. The
> > order in which the locks are taken will be at the mercy of the plan
> > the planner chooses.
>
> I do not think I buy this argument, because plancache.c doesn't
> provide any "ability to control the order" today, and never has.
> The order in which AcquireExecutorLocks re-gets relation locks is only
> weakly related to the order in which the parser/planner got them
> originally.  The order in which AcquirePlannerLocks re-gets the locks
> is even less related to the original.  This doesn't cause any big
> problems that I'm aware of, because these locks are fairly weak.

It may not bite many people, it's just that if it does, I don't see
what we could do to help those people. At the moment we could tell
them to adjust their DDL script to obtain the locks in the same order
as their query.  With your idea that cannot be done as the order could
change when the planner switches the join order.

> I think we do have a guarantee that for partitioned tables, parents
> will be locked before children, and that's probably valuable.
> But an executor-driven lock order could preserve that property too.

I think you'd have to lock the parent before the child. That would
remain true and consistent anyway when taking locks during a
breadth-first plan traversal.

> > For #3, I've been thinking about what improvements we can do to make
> > the executor more efficient. In [1], Andres talks about some very
> > interesting things. In particular, in his email items 3) and 5) are
> > relevant here. If we did move lots of executor startup code into the
> > planner, I think it would be possible to one day get rid of executor
> > startup and have the plan record how much memory is needed for the
> > non-readonly part of the executor state and tag each plan node with
> > the offset in bytes they should use for their portion of the executor
> > working state.
>
> I'm fairly skeptical about that idea.  The entire reason we have an
> issue here is that we want to do runtime partition pruning, which
> by definition can't be done at plan time.  So I doubt it's going
> to play nice with what we are trying to accomplish in this thread.

I think we could have both, providing there was a way to still
traverse the executor state tree in EXPLAIN. We'd need a way to skip
portions of the plan that are not relevant or could be invalid for the
current execution. e.g can't show Index Scan because index has been
dropped.

> > I think what Amit had before your objection was starting to turn into
> > something workable and we should switch back to working on that.
>
> The reason I posted this idea was that I didn't think the previously
> existing patch looked promising at all.

Ok.  It would be good if you could expand on that so we could
determine if there's some fundamental reason it can't work or if
that's because you were blinded by your epiphany and didn't give that
any thought after thinking of the alternative idea.

I've gone to effort to point out things that I think are concerning
with your idea. It would be good if you could do the same for the
previous patch other than "it didn't look promising". It's pretty hard
for me to argue with that level of detail.

David




Minor fixes for couple some comments around MERGE RETURNING

2024-05-18 Thread David Rowley
I noticed that PlannedStmt.hasReturning and hasModifyingCTE have
outdated comments now that MERGE supports RETURNING (per commit
c649fa24a).

i.e. these two:

> bool hasReturning; /* is it insert|update|delete RETURNING? */

> bool hasModifyingCTE; /* has insert|update|delete in WITH? */

transformWithClause() has:

/* must be a data-modifying statement */
Assert(IsA(cte->ctequery, InsertStmt) ||
   IsA(cte->ctequery, UpdateStmt) ||
   IsA(cte->ctequery, DeleteStmt) ||
   IsA(cte->ctequery, MergeStmt));

pstate->p_hasModifyingCTE = true;

which eventually makes it into PlannedStmt.hasModifyingCTE.

The attached trivial patch fixes these.

David


merge_returning_comments.patch
Description: Binary data


Re: First draft of PG 17 release notes

2024-05-18 Thread David Rowley
On Sun, 19 May 2024 at 02:40, Bruce Momjian  wrote:
>
> On Thu, May 16, 2024 at 03:35:17PM +1200, David Rowley wrote:
> > "Additionally, vacuum no longer silently imposes a 1GB tuple reference
> > limit even when maintenance_work_mem or autovacuum_work_mem are set to
> > higher values"

> Slightly adjusted wording patch attached and applied.

Thanks for adjusting.

It's a minor detail, but I'll mention it because you went to the
effort to adjust it away from what I'd written...

I didn't make a random choice to use "or" between the two GUCs.
Changing it to "and", IMO, isn't an improvement.  Using "and" implies
that the silent limit was only imposed when both of these GUCs were
set >= 1GB. That's not true. For the case we're talking about here, if
autovacuum_work_mem is set to anything apart from -1 then the value of
maintenance_work_mem does not matter.

David




Re: Underscore in positional parameters?

2024-05-18 Thread Alexander Lakhin

Hello Erik,

18.05.2024 04:31, Erik Wienhold wrote:

On 2024-05-17 02:06 +0200, Michael Paquier wrote:

On Thu, May 16, 2024 at 08:41:11AM +0200, Peter Eisentraut wrote:

On this specific patch, maybe reword "parameter too large" to "parameter
number too large".

WFM here.

Done in v3.


Thank you for working on this!

I encountered anomalies that you address with this patch too.
And I can confirm that it fixes most cases, but there is another one:
SELECT $3 \bind 'foo' \g
ERROR:  invalid memory alloc request size 12

Maybe you would find this worth fixing as well.

Best regards,
Alexander




Re: Multiple startup messages over the same connection

2024-05-18 Thread Jelte Fennema-Nio
On Sat, 18 May 2024 at 23:10, Vladimir Churyukin  wrote:
> I guess the process of passing control from child processes to the parent 
> could be a bit tricky for that one, but doable?
> Is there anything I'm missing that can be a no-go for this?

One seriously difficult/possibly impossible thing is passing SSL
session state between processes using shared memory.