Re: Views no longer in rangeTabls?

2023-06-09 Thread Amit Langote
On Sat, Jun 10, 2023 at 15:51 David Steele  wrote:

> On 6/9/23 19:14, Tom Lane wrote:
> > David Steele  writes:
> >> Thank you, this was very helpful. I am able to get the expected result
> >> now with:
> >
> >> /* We only care about tables/views and can ignore subqueries, etc. */
> >> if (!(rte->rtekind == RTE_RELATION ||
> >>       (rte->rtekind == RTE_SUBQUERY && OidIsValid(rte->relid))))
> >>     continue;
> >
> > Right, that matches places like add_rtes_to_flat_rtable().
>
> Good to have confirmation of that, thanks!
>
> >> One thing, though, rte->relkind is not set for views, so I still need to
> >> call get_rel_relkind(rte->relid). Not a big deal, but do you think it
> >> would make sense to set rte->relkind for views?
> >
> > If you see "rte->rtekind == RTE_SUBQUERY && OidIsValid(rte->relid)",
> > it's dead certain that relid refers to a view, so you could just wire
> > in that knowledge.
>
> Yeah, that's a good trick. Even so, I wonder why relkind is not set when
> relid is set?


I too have been thinking that setting relkind might be a good idea, even if
only as a crosscheck that only view relations can look like that in the
range table.
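
In the meantime callers can just hardwire the rule Tom describes; a minimal
sketch (the local variable is mine, not from David's code):

    char        relkind;

    if (rte->rtekind == RTE_RELATION)
        relkind = rte->relkind;
    else if (rte->rtekind == RTE_SUBQUERY && OidIsValid(rte->relid))
        relkind = RELKIND_VIEW;     /* a flattened view */
    else
        continue;                   /* subqueries etc. */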

Thanks, Amit Langote
EDB: http://www.enterprisedb.com


Re: Views no longer in rangeTabls?

2023-06-09 Thread David Steele

On 6/9/23 19:14, Tom Lane wrote:
> David Steele  writes:
>> Thank you, this was very helpful. I am able to get the expected result
>> now with:
>>
>> /* We only care about tables/views and can ignore subqueries, etc. */
>> if (!(rte->rtekind == RTE_RELATION ||
>>       (rte->rtekind == RTE_SUBQUERY && OidIsValid(rte->relid))))
>>     continue;
>
> Right, that matches places like add_rtes_to_flat_rtable().

Good to have confirmation of that, thanks!

>> One thing, though, rte->relkind is not set for views, so I still need to
>> call get_rel_relkind(rte->relid). Not a big deal, but do you think it
>> would make sense to set rte->relkind for views?
>
> If you see "rte->rtekind == RTE_SUBQUERY && OidIsValid(rte->relid)",
> it's dead certain that relid refers to a view, so you could just wire
> in that knowledge.

Yeah, that's a good trick. Even so, I wonder why relkind is not set when
relid is set?


Regards,
-David




Re: Cleaning up threading code

2023-06-09 Thread Andres Freund
Hi,

On 2023-06-10 14:23:52 +1200, Thomas Munro wrote:
> 1.  We should completely remove --disable-thread-safety and the
> various !defined(ENABLE_THREAD_SAFETY) code paths.

Yes, please!


> 2.  I don't like the way we have to deal with POSIX vs Windows at
> every site where we use threads, and each place has a different style
> of wrappers.  I considered a few different approaches to cleaning this
> up:
> 
> * provide centralised and thorough pthread emulation for Windows; I
> don't like this, I don't even like all of pthreads and there are many
> details to get lost in
> * adopt C11 <threads.h>; unfortunately it is too early, so you'd need
> to write/borrow replacements for at least 3 of our 11 target systems
> * invent our own mini-abstraction for a carefully controlled subset of stuff
> 
> Here is an attempt at that last thing.  Since I don't really like
> making up new names for things, I just used all the C11 names but with
> pg_ in front, and declared it all in "port/pg_threads.h" (actually I
> tried to write C11 replacements and then noped out and added the pg_
> prefixes).  I suppose the idea is that it and the prefixes might
> eventually go away.  Note: here I am talking only about very basic
> operations like create, exit, join, explicit thread locals -- the
> stuff that we actually use today in frontend code.

Unsurprisingly, I like this.


> I'm not talking
> about other stuff like C11 atomics, memory models, and the
> thread_local storage class, which are all very good and interesting
> topics for another day.

Hm. I agree on C11 atomics and memory models, but I don't see a good reason to
not add support for thread_local?

In fact, I'd rather add support for thread_local and not add support for
"thread keys" than the other way round. Afaict most, if not all, the places
using keys would look simpler with thread_local than with keys.
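
As a sketch of what I mean (a made-up example, not from the patch; assumes
<pthread.h> and <stdlib.h>), the same per-thread buffer with a key vs. with
thread_local:

    /* pthread key style: explicit key setup, lookup and cleanup */
    static pthread_key_t buf_key;
    static pthread_once_t buf_once = PTHREAD_ONCE_INIT;

    static void
    buf_key_init(void)
    {
        pthread_key_create(&buf_key, free);
    }

    static char *
    get_buf(void)
    {
        char       *buf;

        pthread_once(&buf_once, buf_key_init);
        buf = pthread_getspecific(buf_key);
        if (buf == NULL)
        {
            buf = malloc(256);
            pthread_setspecific(buf_key, buf);
        }
        return buf;
    }

    /* thread_local style: the compiler and runtime do all of the above */
    static _Thread_local char buf[256];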


> One mystery still eludes me on Windows: while trying to fix the
> ancient bug that ECPG leaks memory on Windows, because it doesn't call
> thread-local destructors, I discovered that if you use FlsAlloc
> instead of TlsAlloc you can pass in a destructor (that's
> "fibre-local", but we're not using fibres so it's works out the same
> as thread local AFAICT).  It seems to crash in strange ways if you
> uncomment the line FlsAlloc(destructor).

Do you have a backtrace available?


> Subject: [PATCH 1/5] Remove obsolete comments and code from fe-auth.c.
> Subject: [PATCH 2/5] Rename port/thread.c to port/user.c.

LGTM


> -###
> -# Threading
> -###
> -
> -# XXX: About to rely on thread safety in the autoconf build, so not worth
> -# implementing a fallback.
> -cdata.set('ENABLE_THREAD_SAFETY', 1)

I wonder if we should just unconditionally set that in c.h or such? It'd not
be crazy for external projects to rely on that being set.
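
I.e. something like this in c.h (just a sketch):

    /* Kept for extension code that tests this symbol. */
    #define ENABLE_THREAD_SAFETY 1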



> From ca74df4ff11ce0fd1e51786eccaeca810921fc6d Mon Sep 17 00:00:00 2001
> From: Thomas Munro 
> Date: Sat, 10 Jun 2023 09:14:07 +1200
> Subject: [PATCH 4/5] Add port/pg_threads.h for a common threading API.
> 
> Loosely based on C11's <threads.h>, but with pg_ prefixes, this will
> allow us to clean up many places that have to cope with POSIX and
> Windows threads.
> ---
>  src/include/port/pg_threads.h| 252 +++
>  src/port/Makefile|   1 +
>  src/port/meson.build |   1 +
>  src/port/pg_threads.c| 117 ++
>  src/tools/pgindent/typedefs.list |   7 +
>  5 files changed, 378 insertions(+)
>  create mode 100644 src/include/port/pg_threads.h
>  create mode 100644 src/port/pg_threads.c
> 
> diff --git a/src/include/port/pg_threads.h b/src/include/port/pg_threads.h
> new file mode 100644
> index 00..1706709994
> --- /dev/null
> +++ b/src/include/port/pg_threads.h
> @@ -0,0 +1,252 @@
> +/*
> + * A multi-threading API abstraction loosely based on the C11 standard's
> + * <threads.h> header.  The identifiers have a pg_ prefix.  Perhaps one day
> + * we'll use standard C threads directly, and we'll drop the prefixes.
> + *
> + * Exceptions:
> + *  - pg_thrd_barrier_t is not based on C11
> + */
> +
> +#ifndef PG_THREADS_H
> +#define PG_THREADS_H
> +
> +#ifdef WIN32

I guess we could try to use C11 threads.h style threading on msvc 2022:
https://developercommunity.visualstudio.com/t/finalize-c11-support-for-threading/1629487


> +#define WIN32_LEAN_AND_MEAN

Why do we need this again here - shouldn't the define in
src/include/port/win32_port.h already take care of this?


> +#include 
> +#include 
> +#include 
> +#include 
> +#else
> +#include <pthread.h>
> +#include "port/pg_pthread.h"
> +#endif

Seems somewhat odd to have port pg_threads.h and pg_pthread.h - why not merge
these?


> +#include 

I think we widely rely on stdint.h and errno.h being included via c.h.


> +#ifdef WIN32
> +typedef HANDLE pg_thrd_t;
> +typedef CRITICAL_SECTION pg_mtx_t;
> +typedef CONDITION_VARIABLE pg_cnd_t;
> 

Re: Cleaning up nbtree after logical decoding on standby work

2023-06-09 Thread Peter Geoghegan
On Fri, Jun 9, 2023 at 9:40 PM Andres Freund  wrote:
> I don't think minimizing heaprel being passed around is a worthwhile goal, the
> contrary actually: It just makes it painful to use heaprel anywhere, because
> it causes precisely these cascading changes of adding/removing the parameter
> to a bunch of functions. If anything we should do the opposite.
>
>
> > It's not like nbtree ever really "used P_NEW". It doesn't actually
> > depend on any of the P_NEW handling inside bufmgr.c. It looks a little
> > like it might, but that's just an accident.
>
> That part I am entirely on board with, as mentioned earlier. It doesn't seem
> like 16 material though.

Obviously you shouldn't need a heaprel to lock a page. As it happened
GiST already worked without this sort of P_NEW idiom, which is why
commit 61b313e4 hardly made any changes at all to GiST, even though
the relevant parts of GiST are heavily based on nbtree. Did you just
forget to plaster similar heaprel arguments all over GiST and SP-GiST?

I'm really disappointed that you're still pushing back here, even
after I got a +1 on backpatching from Heikki. This should have been
straightforward.

-- 
Peter Geoghegan




Re: Cleaning up nbtree after logical decoding on standby work

2023-06-09 Thread Andres Freund
Hi,

On 2023-06-09 12:23:36 -0700, Peter Geoghegan wrote:
> On Fri, Jun 9, 2023 at 12:03 PM Andres Freund  wrote:
> > > My new plan is to commit this tomorrow, since the clear consensus is
> > > that we should go ahead with this for 16.
> >
> > I'm not sure there is that consensus (for me half the changes shouldn't be
> > done, the rest should be in 17), but in the end it doesn't matter that much.
> 
> Really? What parts are you opposed to in principle? I don't see why
> you wouldn't support everything or nothing for 17 (questions of style
> aside). I don't see what's ambiguous about what we should do here,
> barring the 16-or-17 question.

I don't think minimizing heaprel being passed around is a worthwhile goal, the
contrary actually: It just makes it painful to use heaprel anywhere, because
it causes precisely these cascading changes of adding/removing the parameter
to a bunch of functions. If anything we should do the opposite.


> It's not like nbtree ever really "used P_NEW". It doesn't actually
> depend on any of the P_NEW handling inside bufmgr.c. It looks a little
> like it might, but that's just an accident.

That part I am entirely on board with, as mentioned earlier. It doesn't seem
like 16 material though.

Greetings,

Andres Freund




Re: could not extend file "base/5/3501" with FileFallocate(): Interrupted system call

2023-06-09 Thread Andres Freund
Hi,

On 2023-06-06 21:53:00 +0200, Alvaro Herrera wrote:
> On 2023-Apr-24, Andres Freund wrote:
> 
> > A prototype of that approach is attached. I pushed the retry handling into
> > the pg_* routines where applicable.  I guess we could add pg_* routines for
> > FileFallocate(), FilePrewarm() etc as well, but I didn't do that here.
> > 
> > Christoph, could you verify this fixes your issue?
> 
> So, is anyone making progress on this?  I don't see anything in the
> thread.

Thanks for bringing it up again, I had lost track. I now added an open items
entry.

My gut feeling is that we should go with something quite minimal at this
stage.


> On adding the missing pg_* wrappers: I think if we don't (and we leave
> the retry loops at the File* layer), then the risk is that some external
> code would add calls to the underlying File* routines trusting them to
> do the retrying, which would then become broken when we move the retry
> loops to the pg_* wrappers when we add them.  That doesn't seem terribly
> serious to me.

I'm not too worried about that either.


Unless somebody strongly advocates a different path, I plan to push something
along the lines of the prototype I had posted. After reading over it a bunch
more times, some of the code is a bit finicky.
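
For the record, the retry loops follow the usual EINTR idiom; a generic
sketch (not the actual patch):

    ssize_t     rc;

    do
    {
        rc = write(fd, buf, len);
    } while (rc < 0 && errno == EINTR);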


I wish we had some hack that made syscalls return EINTR at random intervals, just
to make it realistic to test these paths...

Greetings,

Andres Freund




Re: pg_stat_io for the startup process

2023-06-09 Thread Andres Freund
Hi,

On 2023-05-22 13:36:42 +0900, Kyotaro Horiguchi wrote:
> At Sun, 21 May 2023 15:14:23 -0700, Andres Freund  wrote in
> > Hi,
> > 
> > I don't really feel confident we're going to get the timeout approach solid
> > enough for something going into the tree around beta 1.
> > 
> > How about this, imo a lot simpler, approach: We flush stats whenever 
> > replaying
> > a XLOG_RUNNING_XACTS record. Unless the primary is idle, it will log those 
> > at
> > a regular interval. If the primary is idle, we don't need to flush stats in
> > the startup process, because we'll not have done any io.
> > 
> > We only log XLOG_RUNNING_XACTS when wal_level >= replica, so stats wouldn't
> > get regularly flushed if wal_level = minimal - but in that case the stats 
> > are
> > also not accessible, so that's not a problem.
> 
> I concur with the three aspects: interval regularity, reliability, and
> the handling of the minimal wal_level case. While I can't confidently
> claim it is the best, its simplicity gives us a solid reason to
> proceed with it.  If necessary, we can switch to alternative timing
> sources in the future without causing major disruptions for users, I
> believe.
> 
> > It's not the prettiest solution, but I think the simplicity is worth a lot.
> > 
> > Greetings,
> 
> +1.

Attached a patch implementing this approach.

It's imo always a separate bug that we were using
GetCurrentTransactionStopTimestamp() when force was passed in - that timestamp
can be quite out of date in some cases. But I don't immediately see a
noticeable consequence, so ...

Greetings,

Andres Freund
From 249c811da9d257c5a71e42016584fc3db0c9a99b Mon Sep 17 00:00:00 2001
From: Andres Freund 
Date: Fri, 9 Jun 2023 21:19:36 -0700
Subject: [PATCH v1] Report stats when replaying XLOG_RUNNING_XACTS

Previously stats in the startup process would only get reported during
shutdown of the startup process. It has been that way for a long time, but
became a lot more noticeable with the new pg_stat_io view, which separates out
IO done by different backend types...

While reporting stats after every XLOG_RUNNING_XACTS isn't the prettiest approach,
it has the advantage of being quite easy. Given that we're well past feature
freeze...

It's not a problem that we don't report stats more frequently with
wal_level=minimal, in that case stats can't be read before the startup process
has shut down.

Reported-by: Fujii Masao 
Discussion: https://postgr.es/m/5315aedc-fbca-1556-c5de-dc2e00b23...@oss.nttdata.com
---
 src/backend/storage/ipc/standby.c   |  9 +
 src/backend/utils/activity/pgstat.c | 17 ++---
 2 files changed, 23 insertions(+), 3 deletions(-)

diff --git a/src/backend/storage/ipc/standby.c b/src/backend/storage/ipc/standby.c
index ffe5e1563f5..70b0da50fc1 100644
--- a/src/backend/storage/ipc/standby.c
+++ b/src/backend/storage/ipc/standby.c
@@ -1193,6 +1193,15 @@ standby_redo(XLogReaderState *record)
 		running.xids = xlrec->xids;
 
 		ProcArrayApplyRecoveryInfo(&running);
+
+		/*
+		 * The startup process currently has no convenient way to schedule
+		 * stats to be reported. XLOG_RUNNING_XACTS records are issued at a
+		 * regular cadence, making this a convenient location to report
+		 * stats. While these records aren't generated with wal_level=minimal,
+		 * stats also cannot be accessed during WAL replay.
+		 */
+		pgstat_report_stat(true);
 	}
 	else if (info == XLOG_INVALIDATIONS)
 	{
diff --git a/src/backend/utils/activity/pgstat.c b/src/backend/utils/activity/pgstat.c
index 0cdb552631e..d743fc0b289 100644
--- a/src/backend/utils/activity/pgstat.c
+++ b/src/backend/utils/activity/pgstat.c
@@ -615,10 +615,21 @@ pgstat_report_stat(bool force)
 	 */
 	Assert(!pgStatLocal.shmem->is_shutdown);
 
-	now = GetCurrentTransactionStopTimestamp();
-
-	if (!force)
+	if (force)
 	{
+		/*
+		 * Stats reports are forced either when it's been too long since stats
+		 * have been reported or in processes that force stats reporting to
+		 * happen at specific points (including shutdown). In the former case
+		 * the transaction stop time might be quite old, in the latter it
+		 * would never get cleared.
+		 */
+		now = GetCurrentTimestamp();
+	}
+	else
+	{
+		now = GetCurrentTransactionStopTimestamp();
+
 		if (pending_since > 0 &&
 			TimestampDifferenceExceeds(pending_since, now, PGSTAT_MAX_INTERVAL))
 		{
-- 
2.38.0



Re: Documentation for building with meson

2023-06-09 Thread Andres Freund
Hi,

On 2023-03-28 12:27:26 -0700, samay sharma wrote:
> +   
> +If you want to build the docs, you can type:
> +
> +ninja docs
> +
> +   

To me 'you can type' sounds odd; even just "To build the docs:" would sound
better. But the make docs do it "your" way, so I'll just go along with it
for now.


> From c5e637a54c2b83e5bd8c4155784d97e82937eb51 Mon Sep 17 00:00:00 2001
> From: Samay Sharma 
> Date: Mon, 6 Feb 2023 16:09:42 -0800
> Subject: [PATCH v9 2/5] Add data layout options sub-section in installation
>  docs
>
> This commit separates out blocksize, segsize and wal_blocksize
> options into a separate Data layout options sub-section in both
> the make and meson docs. They were earlier in a miscellaneous
> section which included several unrelated options. This change
> also helps reduce the repetition of the warnings that changing
> these parameters breaks on-disk compatibility.

I still like this change, but ISTM that the "Data Layout" section should
follow the "PostgreSQL Features" section, rather than follow "Anti Features",
"Build Process Details" and "Miscellaneous". I realize some of these are
reorganized later on, but even then "Build Process Details" would still come first.

Would anybody mind if I swapped these around?


> +  

I don't quite understand the "-with" added to the ids?


> From 11d82aa49efb3d1cbc08f14562a757f115053c8b Mon Sep 17 00:00:00 2001
> From: Samay Sharma 
> Date: Mon, 13 Feb 2023 16:23:52 -0800
> Subject: [PATCH v9 3/5] Remove Anti-Features section from Installation from
>  source docs
>
> Currently, several meson setup options are listed in anti-features.
> However, they are similar to most other options in the postgres
> features list as they are 'auto' features themselves. Also, other
> options are likely better suited to the developer options section.
> This commit, therefore, moves the options listed in the anti-features
> section into other sections and removes that section.
> For consistency, this reorganization has been done on the make section
> of the docs as well.
> ---
>  doc/src/sgml/installation.sgml | 140 ++---
>  1 file changed, 57 insertions(+), 83 deletions(-)
>
> diff --git a/doc/src/sgml/installation.sgml b/doc/src/sgml/installation.sgml
> index 7e65cdd72e..d7ab0c205e 100644
> --- a/doc/src/sgml/installation.sgml
> +++ b/doc/src/sgml/installation.sgml
> @@ -1214,23 +1214,6 @@ build-postgresql:
> 
>
>
> - 
> -
> -   
> -
> -   
> -Anti-Features
> -
> -
> - The options described in this section allow disabling
> - certain PostgreSQL features that are built
> - by default, but which might need to be turned off if the required
> - software or system features are not available.  Using these options is
> - not recommended unless really necessary.
> -
> -
> - 
> -
>
> --without-readline
> 

I don't think this is quite right. The section above the list says

"The options described in this section enable building of various PostgreSQL
features that are not built by default. Most of these are non-default only
because they require additional software, as described in Section 17.1."

So just merging --without-icu, --without-readline, --without-zlib,
--disable-thread-safety in with the rest doesn't quite seem right.

I suspect that the easiest way for that is to just move --disable-atomics,
--disable-spinlocks to the developer section and then to leave the
anti-features section around for autoconf.

Any better idea?

Greetings,

Andres Freund




Re: Use COPY for populating all pgbench tables

2023-06-09 Thread Gurjeet Singh
On Fri, Jun 9, 2023 at 6:20 PM Gurjeet Singh  wrote:
> On Fri, Jun 9, 2023 at 5:42 PM Gregory Smith  wrote:
> > On Fri, Jun 9, 2023 at 1:25 PM Gurjeet Singh  wrote:
> >>
> >> > $ pgbench -i -I dtGvp -s 500
> >>
> >> The steps are severely under-documented in pgbench --help output.
> >
> > I agree it's not easy to find information.  I just went through double 
> > checking I had the order recently enough to remember what I did.  The man 
> > pages have this:
> >
> > > Each step is invoked in the specified order. The default is dtgvp.
> >
> > Which was what I wanted to read.  Meanwhile the --help output says:
> >
> > >  -I, --init-steps=[dtgGvpf]+ (default "dtgvp")
> >
> > Which has the information without a lot of context for what it's used 
> > for.  I'd welcome some text expansion that added a minimal but functional 
> > improvement to that the existing help output; I don't have such text.
>
> Please see attached 2 variants of the patch. Variant 1 simply tells
> the reader to consult pgbench documentation. The second variant
> provides a description for each of the letters, as the documentation
does. The committer can pick the one they find suitable.

(I was short on time, so had to keep it short in the above email. To
justify this additional email, I have added 2 more options to choose
from. :)

The text ", in the specified order" is an important detail, that
should be included irrespective of the rest of the patch.

My preference would be to use the first variant, since the second one
feels too wordy for --help output. I believe we'll have to choose
between these two; the alternatives will not make anyone happy.

These two variants show the two extremes; bare minimum vs. everything
but the kitchen sink. So one may feel the need to find a middle ground
and provide a "sufficient, but not too much" variant. I have attempted
that in variants 3 and 4; see attached.

The third variant does away with the list of steps, and uses a
paragraph to describe the letters. And the fourth variant makes that
paragraph terse.

In the order of preference I'd choose variant 1, then 2. Variants 3
and 4 feel like a significant degradation from variant 2.

Attached samples.txt shows the snippets of --help output of each of
the variants/patches, for comparison.

Best regards,
Gurjeet
http://Gurje.et
 variant-1-brief-pointer-to-docs.patch
  -i, --initialize invokes initialization mode
  -I, --init-steps=[dtgGvpf]+ (default "dtgvp")
   run selected initialization steps, in the specified order
   see pgbench documentation for a description of these steps
  -F, --fillfactor=NUM set fill factor

 variant-2-detailed-description.patch
  -i, --initialize invokes initialization mode
  -I, --init-steps=[dtgGvpf]+ (default "dtgvp")
   run selected initialization steps, in the specified order
   d: Drop any existing pgbench tables
   t: Create the tables used by the standard pgbench scenario
   g: Generate data, client-side
   G: Generate data, server-side
   v: Invoke VACUUM on the standard tables
   p: Create primary key indexes on the standard tables
   f: Create foreign key constraints between the standard tables
  -F, --fillfactor=NUM set fill factor

 variant-3-details-not-list.patch
  -i, --initialize invokes initialization mode
  -I, --init-steps=[dtgGvpf]+ (default "dtgvp")
   run selected initialization steps, in the specified order.
   These steps operate on standard tables used by pgbench:
   'd' drops the tables, 't' creates the tables, 'g' generates
   the data on client-side, 'G' generates data on server-side,
   'v' vacuums the tables, 'p' creates primary key constraints,
   and 'f' creates foreign key constraints
  -F, --fillfactor=NUM set fill factor

 variant-4-terse-details-not-list.patch
  -i, --initialize invokes initialization mode
  -I, --init-steps=[dtgGvpf]+ (default "dtgvp")
   run selected initialization steps, in the specified order.
   These steps operate on standard tables used by pgbench:
   'd' drop table, 't' create tables, 'g' generate data
   client-side, 'G' generate data server-side, 'v' vacuum
   tables, 'p' create primary keys, 'f' create foreign keys
  -F, --fillfactor=NUM set fill factor
diff --git a/src/bin/pgbench/pgbench.c b/src/bin/pgbench/pgbench.c
index 508ed218e8..c74a596b86 100644
--- a/src/bin/pgbench/pgbench.c
+++ b/src/bin/pgbench/pgbench.c
@@ -876,7 +876,8 @@ usage(void)
 		   "\nInitialization options:\n"

Re: [PATCH] Missing dep on Catalog.pm in meson rules

2023-06-09 Thread Andres Freund
Hi,

On 2023-06-09 13:58:46 -0500, Tristan Partin wrote:
> Patch looks good to me!

Thanks for the report Ilmari and the review Tristan! Pushed the fix.

Regards,

Andres




Re: WL_SOCKET_ACCEPT fairness on Windows

2023-06-09 Thread Thomas Munro
On Thu, May 18, 2023 at 8:53 PM Wei Wang (Fujitsu)
 wrote:
> On Sat, April 1, 2023 at 11:00 AM Thomas Munro  wrote:
> I tried to test this patch on Windows. And I did cover the new code path 
> below:
> ```
> +   /* We have another event to decode. */
> +   cur_event = &set->events[next_pos + (rc - WAIT_OBJECT_0)];
> ```
>
> But I have one thing want to confirm:
> In my tests, I think the scenario with two different events (e.g., one ending
> socket connection and one incoming socket connection) has been optimized.
> However, it seems that when there are multiple incoming socket connections, 
> the
> function WaitEventSetWaitBlock is called multiple times instead of being 
> called
> once. Is this our expected result?

Thanks for testing!  Maybe I am misunderstanding something: what I
expect to happen is that we call *WaitForMultipleObjects()* one extra
time (to see if there is another event available immediately, and
usually there is not), but not WaitEventSetWaitBlock().
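
Roughly, the idea is (a sketch of the concept only, not the exact patch):

    /* first wait, as before */
    rc = WaitForMultipleObjects(nhandles, handles, FALSE, cur_timeout);
    n = rc - WAIT_OBJECT_0;
    /* ... decode and report event n ... */

    /* if handles remain after the signaled one, peek without blocking */
    if (n + 1 < nhandles)
    {
        rc = WaitForMultipleObjects(nhandles - (n + 1), handles + n + 1,
                                    FALSE, 0);
        if (rc != WAIT_TIMEOUT && rc != WAIT_FAILED)
        {
            /* another event is ready, at n + 1 + (rc - WAIT_OBJECT_0) */
        }
    }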

> Here are my test details:
> I use the debugger to ensure that multiple events occur when the function
> WaitEventSetWaitBlock is called. First, I added a breakpoint in the below 
> code:
> ```
> /*
> * Sleep.
> *
> * Need to wait for ->nevents + 1, because signal handle is in [0].
> */
> b   rc = WaitForMultipleObjects(set->nevents + 1, set->handles, FALSE,
>                                 cur_timeout);
> ```
> And then make sure that the postmaster process enters the function
> WaitForMultipleObjects. (I think the postmaster process will only return from
> the function WaitForMultipleObjects when any object is signaled or timeout
> occurs). Before the timeout occurs, I set up multiple socket connections using
> psql (the first connection makes the process returns from the function
> WaitForMultipleObjects). Then, as I continued debugging, multiple socket
> connections were handled by different calls of the function
> WaitEventSetWaitBlock.

This is a good test.  Also, to test the exact scenario I was worrying
about, you could initiate 4 psql sessions while the server is blocked
in a debugger, 2 on a Unix domain socket, and 2 on a TCP socket, and
then when you "continue" the server with a break on (say) accept() so
that you can "continue" each time, you should see that it alternates
between the two sockets accepting from both "fairly", instead of
draining one socket entirely first.

> Please let me know if I am missing something.

I think your report already shows that it basically works correctly.
Do you think it's worth committing for 17 to make it work more like
Unix, or was I just being paranoid?




Add last_commit_lsn to pg_stat_database

2023-06-09 Thread James Coleman
I've previously noted in "Add last commit LSN to
pg_last_committed_xact()" [1] that it's not possible to monitor how
many bytes of WAL behind a logical replication slot is (computing such
is obviously trivial for physical slots) because the slot doesn't need
to replicate beyond the last commit. In some cases it's possible for
the current WAL location to be far beyond the last commit. A few
examples:

- An idle server with checkpoint_timeout at a lower value (so lots of
empty WAL advance).
- Very large transactions: particularly insidious because committing a
1 GB transaction after a small transaction may show almost zero time
lag even though quite a bit of data needs to be processed and sent
over the wire (so time to replay is significantly different from
current lag).
- A cluster with multiple databases complicates matters further,
because while physical replication is cluster-wide, the LSNs that
matter for logical replication are database specific.

Since we don't expose the most recent commit's LSN there's no way to
say "the WAL is currently 1250, the last commit happened at 1000, the
slot has flushed up to 800, therefore there are at most 200 bytes
replication needs to read through to catch up".

In the aforementioned thread [1] I'd proposed a patch that added a SQL
function pg_last_commit_lsn() to expose the most recent commit's LSN.
Robert Haas didn't think the initial version's modifications to
commit_ts made sense, and a subsequent approach adding the value to
PGPROC didn't have strong objections, from what I can see, but it also
didn't generate any enthusiasm.

As I was thinking about how to improve things, I realized that this
information (since it's for monitoring anyway) fits more naturally
into the stats system. I'd originally thought of exposing it in
pg_stat_wal, but that's per-cluster rather than per-database (indeed,
this is a flaw I hadn't considered in the original patch), so I think
pg_stat_database is the correct location.

I've attached a patch to track the latest commit's LSN in pg_stat_database.

Regards,
James Coleman

1: 
https://www.postgresql.org/message-id/flat/caaaqye9qbiau+j8rbun_jkbre-3hekluhfvvsyfsxqg0vql...@mail.gmail.com


v1-0001-Add-last-commit-s-LSN-to-pg_stat_database.patch
Description: Binary data


Re: Fix search_path for all maintenance commands

2023-06-09 Thread Jeff Davis
On Fri, 2023-06-09 at 16:29 -0700, Gurjeet Singh wrote:
> I'm not sure if mine is a valid concern, and it has been a long time
> since I looked at the search_path's and switching Role's implications
> (CVE-2009-4136) so pardon my ignorance.
> 
> It feels a bit late in release cycle to introduce this breaking
> change.

That's a valid concern. It just needs to be weighed against the
potential problems of running maintenance code with different search
paths, and the interaction with the new MAINTAIN privilege.

> I feel more thought needs to be given to the impact of this change,
> and we should to give others more time for feedback.

For context, I initially posted to the -security list in case it needed
to be addressed there, and got some feedback on the patch before
posting to -hackers two weeks ago. So it has been seen by at least 4
people.

But I'm happy to hear more input and I'll backtrack if necessary.

Here are my thoughts:

Lazy VACUUM is by far the most important for the overall system. It's
unaffected by this change; see comment in vacuum_rel():

  /*
   * Switch to the table owner's userid, so that any index functions are
   * run as that user.  Also lock down security-restricted operations and
   * arrange to make GUC variable changes local to this command. (This is
   * unnecessary, but harmless, for lazy VACUUM.)
   */
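
For reference, the code around that comment looks roughly like this (lightly
abridged from vacuum_rel()):

    GetUserIdAndSecContext(&save_userid, &save_sec_context);
    SetUserIdAndSecContext(onerel->rd_rel->relowner,
                           save_sec_context | SECURITY_RESTRICTED_OPERATION);
    save_nestlevel = NewGUCNestLevel();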

REINDEX, CLUSTER, and VACUUM FULL are potentially affected because of
index functions, but only if the index functions are quite broken (or
at least very fragile) already.

REFRESH MATERIALIZED VIEW is the most likely to be affected because it
is more likely to call "interesting" functions and the author may not
anticipate a different search path.

A normal dump/reload cycle for upgrade testing will catch these
problems because it will create indexes after loading the data
(DefineIndex sets the search path), and it will also call REFRESH
MATERIALIZED VIEW. If using pg_upgrade instead, a post-upgrade ANALYZE
will catch index function problems, but I suppose not MV problems.

So there is some risk to this change. It feels fairly narrow to me, but
non-zero. Perhaps we can do more?

> Short of that, it'd be prudent to allow the user to somehow fall back
> to old behaviour; a command-line option, or GUC, etc. That way we can
> mark the old behaviour "deprecated", with a workaround for those who
> may desperately need it, and in another release or so, finally pull
> the plug on old behaviour.

That sounds wise, though others may not like the idea of a GUC just for
this change.

Regards,
Jeff Davis





Re: Use COPY for populating all pgbench tables

2023-06-09 Thread Gurjeet Singh
On Fri, Jun 9, 2023 at 5:42 PM Gregory Smith  wrote:
>
> On Fri, Jun 9, 2023 at 1:25 PM Gurjeet Singh  wrote:
>>
>> > $ pgbench -i -I dtGvp -s 500
>>
>> The steps are severely under-documented in pgbench --help output.
>
>
> I agree it's not easy to find information.  I just went through double 
> checking I had the order recently enough to remember what I did.  The man 
> pages have this:
>
> > Each step is invoked in the specified order. The default is dtgvp.
>
> Which was what I wanted to read.  Meanwhile the --help output says:
>
> >  -I, --init-steps=[dtgGvpf]+ (default "dtgvp")
>
> Which has the information without a lot of context for what it's used 
> for.  I'd welcome some text expansion that added a minimal but functional 
> improvement to that the existing help output; I don't have such text.

Please see attached 2 variants of the patch. Variant 1 simply tells
the reader to consult pgbench documentation. The second variant
provides a description for each of the letters, as the documentation
does. The committer can pick the one they find suitable.

Best regards,
Gurjeet
http://Gurje.et


variant-2-detailed-description.patch
Description: Binary data


variant-1-brief-pointer-to-docs.patch
Description: Binary data


Re: PG 16 draft release notes ready

2023-06-09 Thread Bruce Momjian
On Thu, Jun  8, 2023 at 02:23:33PM +0900, Masahiko Sawada wrote:
> Hi,
> 
> On Fri, May 19, 2023 at 5:49 AM Bruce Momjian  wrote:
> >
> > I have completed the first draft of the PG 16 release notes.  You can
> > see the output here:
> >
> > https://momjian.us/pgsql_docs/release-16.html
> >
> > I will adjust it to the feedback I receive;  that URL will quickly show
> > all updates.
> >
> 
> Remove pg_walinspect functions
> pg_get_wal_records_info_till_end_of_wal() and
> pg_get_wal_stats_till_end_of_wal().
> 
> I found that this item misses the author, Bharath Rupireddy. Please
> add his name.

Thanks, fixed.

-- 
  Bruce Momjian  https://momjian.us
  EDB  https://enterprisedb.com

  Only you can decide what is important to you.




Re: Use COPY for populating all pgbench tables

2023-06-09 Thread Gregory Smith
On Fri, Jun 9, 2023 at 1:25 PM Gurjeet Singh  wrote:

> > $ pgbench -i -I dtGvp -s 500
>
> The steps are severely under-documented in pgbench --help output.
>

I agree it's not easy to find information.  I just went through double
checking I had the order recently enough to remember what I did.  The man
pages have this:

> Each step is invoked in the specified order. The default is dtgvp.

Which was what I wanted to read.  Meanwhile the --help output says:

>  -I, --init-steps=[dtgGvpf]+ (default "dtgvp")

Which has the information without a lot of context for what it's used
for.  I'd welcome some text expansion that added a minimal but functional
improvement to that the existing help output; I don't have such text.


Re: Let's make PostgreSQL multi-threaded

2023-06-09 Thread Bruce Momjian
On Thu, Jun  8, 2023 at 11:37:00AM +1200, Thomas Munro wrote:
> It's old, but this describes the 4 main models and which well known
> RDBMSes use them in section 2.3:
> 
> https://dsf.berkeley.edu/papers/fntdb07-architecture.pdf
> 
> TL;DR DB2 is the winner, it can do process-per-connection,
> thread-per-connection, process-pool or thread-pool.
> 
> I understand this thread to be about thread-per-connection (= backend,
> session, socket) for now.

I am quite confused that few people seem to care about which model,
processes or threads, is better for Oracle, and how having both methods
available can be a reasonable solution to maintain.  Someone suggested
they abstracted the differences so the maintenance burden was minor, but
that seems very hard to me.

Did these vendors start with processes, add threads, and then find that
threads had downsides so they had to keep both?

-- 
  Bruce Momjian  https://momjian.us
  EDB  https://enterprisedb.com

  Only you can decide what is important to you.




Re: Let's make PostgreSQL multi-threaded

2023-06-09 Thread Bruce Momjian
On Wed, Jun  7, 2023 at 06:38:38PM +0530, Ashutosh Bapat wrote:
> With multiple processes, we can use all the available cores (at least
> theoretically if all those processes are independent). But is that
> guaranteed with single process multi-thread model? Google didn't throw
> any definitive answer to that. Usually it depends upon the OS and
> architecture.
> 
> Maybe a good start is to start using threads instead of parallel
> workers e.g. for parallel vacuum, parallel query and so on while
> leaving the processes for connections and leaders. that itself might
> take significant time. Based on that experience move to a completely
> threaded model. Based on my experience with other similar products, I
> think we will settle on a multi-process multi-thread model.

I think we have a few known problem that we might be able to solve
without threads, but can help us eventually move to threads if we find
it useful:

1)  Use threads for background workers rather than processes
2)  Allow sessions to be stopped and started by saving their state

Ideally we would solve the problem of making shared structures
resizable, but I am not sure how that can be easily done without
threads.
 
-- 
  Bruce Momjian  https://momjian.us
  EDB  https://enterprisedb.com

  Only you can decide what is important to you.




Re: Fix search_path for all maintenance commands

2023-06-09 Thread Gurjeet Singh
On Fri, Jun 9, 2023 at 2:00 PM Jeff Davis  wrote:
>
> On Thu, 2023-06-08 at 21:55 -0700, Nathan Bossart wrote:
> > On Thu, Jun 08, 2023 at 06:08:08PM -0400, Greg Stark wrote:
> > > I guess that's pretty narrow and a reasonable thing to desupport.
> > > Users could just mark those functions with search_path or schema
> > > qualify the object references in them. Perhaps we should also be
> > > picking up cases like that sooner so users realize they've created
> > > a
> > > footgun for themselves?
>
> Many cases will be picked up, for instance CREATE INDEX will error if
> the safe search path is not good enough.
>
> > I'm inclined to agree that this is reasonable to desupport.
>
> Committed.

I'm not sure if mine is a valid concern, and it has been a long time
since I looked at the search_path's and switching Role's implications
(CVE-2009-4136) so pardon my ignorance.

It feels a bit late in release cycle to introduce this breaking
change. If they depended on search_path, people and utilities that use
these maintenance commands may see failures. Although I can't think of
a scenario where such a failure may cause an outage, sometimes these
maintenance operations are performed while the users are staring down
the barrel of a gun (imminent danger of running out of space, bad
statistics causing absurd query plans, etc.). So, if not directly, a
failure of these operations may indirectly cause an outage.

I feel more thought needs to be given to the impact of this change,
and we should to give others more time for feedback.

Short of that, it'd be prudent to allow the user to somehow fall back
to old behaviour; a command-line option, or GUC, etc. That way we can
mark the old behaviour "deprecated", with a workaround for those who
may desperately need it, and in another release or so, finally pull
the plug on old behaviour.

My 2bits.

Best regards,
Gurjeet
http://Gurje.et




add non-option reordering to in-tree getopt_long

2023-06-09 Thread Nathan Bossart
While working on 2dcd157, I noticed cfbot failures for Windows due to tests
with commands that had non-options specified before options.  For example,
"createuser myrole -a myadmin" specified a non-option (myrole) before the
option/argument pair (-a myadmin).  To get the tests passing for Windows,
non-options must be placed at the end (e.g., "createuser -a myadmin
myrole").  This same problem was encountered while working on 08951a7 [0],
but it was again fortunately caught with cfbot.  Others have not been so
lucky [1] [2] [3].

The reason for this discrepancy is because Windows uses the in-tree
implementation of getopt_long(), which differs from the other
implementations I've found in that it doesn't reorder non-options to the
end of argv by default.  Instead, it returns -1 as soon as the first
non-option is found, even if there are other options listed afterwards.  By
moving non-options to the end of argv, getopt_long() can parse all
specified options and return -1 when only non-options remain.  The
implementations I reviewed all reorder argv unless the POSIXLY_CORRECT
environment variable is set or the "optstring" argument begins with '+'.
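
The reordering itself amounts to rotating argv; a simplified sketch of the
idea (not the patch, which also has to handle option arguments and runs of
non-options):

    /*
     * Move the non-option at argv[nonopt_start] behind the arguments that
     * getopt_long() still needs to examine, so parsing can continue.
     */
    static void
    permute_nonopt(char **argv, int nonopt_start, int nonopt_end)
    {
        char       *saved = argv[nonopt_start];

        for (int i = nonopt_start; i < nonopt_end - 1; i++)
            argv[i] = argv[i + 1];
        argv[nonopt_end - 1] = saved;
    }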

The best reasons I can think of to keep the current behavior are 1)
reordering involves writing to the original argv array, which could be
risky (as noted by Tom [4]) and 2) any systems with a getopt_long()
implementation that doesn't reorder non-options could begin failing tests
that take advantage of this behavior.  However, this quirk in the in-tree
getopt_long() is periodically missed by hackers, the only systems I'm aware
of that use it are Windows and AIX, and every other implementation of
getopt_long() I saw reorders non-options by default.  Furthermore, C99
omits const decorations in main()'s signature, so modifying argv might not
be too scary.

Thus, I propose introducing this non-option reordering behavior but
allowing it to be disabled via the POSIXLY_CORRECT environment variable.
The attached patch is my first attempt at implementing this proposal.  I
don't think we need to bother with handling '+' at the beginning of
"optstring" since it seems unlikely to be used in PostgreSQL, but it would
probably be easy enough to add if folks want it.

I briefly looked at getopt() and concluded that we should probably retain
its POSIX-compliant behavior for non-options, as reordering support seems
much less universal than with getopt_long().  AFAICT all client utilities
use getopt_long(), anyway.

Thoughts?

[0] 
https://postgr.es/m/20220525.110752.305692011781436338.horikyota.ntt%40gmail.com
[1] https://git.postgresql.org/gitweb/?p=postgresql.git;a=commit;h=869aa40
[2] https://git.postgresql.org/gitweb/?p=postgresql.git;a=commit;h=ffd3980
[3] https://git.postgresql.org/gitweb/?p=postgresql.git;a=commit;h=d9ddc50
[4] https://postgr.es/m/20539.1237314382%40sss.pgh.pa.us

-- 
Nathan Bossart
Amazon Web Services: https://aws.amazon.com
From 68c866da205592a370279d6b2a180e0888b0587c Mon Sep 17 00:00:00 2001
From: Nathan Bossart 
Date: Fri, 9 Jun 2023 15:51:58 -0700
Subject: [PATCH v1 1/1] Teach in-tree getopt_long() to move non-options to the
 end of argv.

---
 src/bin/scripts/t/040_createuser.pl | 10 ---
 src/port/getopt_long.c  | 43 ++---
 2 files changed, 45 insertions(+), 8 deletions(-)

diff --git a/src/bin/scripts/t/040_createuser.pl b/src/bin/scripts/t/040_createuser.pl
index 9ca282181d..ba40ab11a3 100644
--- a/src/bin/scripts/t/040_createuser.pl
+++ b/src/bin/scripts/t/040_createuser.pl
@@ -42,9 +42,9 @@ $node->issues_sql_like(
 	'add a role as a member with admin option of the newly created role');
 $node->issues_sql_like(
 	[
-		'createuser', '-m',
-		'regress_user3', '-m',
-		'regress user #4', 'REGRESS_USER5'
+		'createuser', 'REGRESS_USER5',
+		'-m', 'regress_user3',
+		'-m', 'regress user #4'
 	],
 	qr/statement: CREATE ROLE "REGRESS_USER5" NOSUPERUSER NOCREATEDB NOCREATEROLE INHERIT LOGIN NOREPLICATION NOBYPASSRLS ROLE regress_user3,"regress user #4";/,
 	'add a role as a member of the newly created role');
@@ -73,11 +73,13 @@ $node->issues_sql_like(
 	qr/statement: CREATE ROLE regress_user11 NOSUPERUSER NOCREATEDB NOCREATEROLE INHERIT LOGIN NOREPLICATION NOBYPASSRLS IN ROLE regress_user1;/,
 	'--role');
 $node->issues_sql_like(
-	[ 'createuser', '--member-of', 'regress_user1', 'regress_user12' ],
+	[ 'createuser', 'regress_user12', '--member-of', 'regress_user1' ],
 	qr/statement: CREATE ROLE regress_user12 NOSUPERUSER NOCREATEDB NOCREATEROLE INHERIT LOGIN NOREPLICATION NOBYPASSRLS IN ROLE regress_user1;/,
 	'--member-of');
 
 $node->command_fails([ 'createuser', 'regress_user1' ],
 	'fails if role already exists');
+$node->command_fails([ 'createuser', 'regress_user1', '-m', 'regress_user2', 'regress_user3' ],
+	'fails for too many non-options');
 
 done_testing();
diff --git a/src/port/getopt_long.c b/src/port/getopt_long.c
index c989276988..94e76769a3 100644
--- a/src/port/getopt_long.c
+++ b/src/port/getopt_long.c

Re: Let's make PostgreSQL multi-threaded

2023-06-09 Thread Stephen Frost
Greetings,

* Dave Cramer (davecramer@postgres.rocks) wrote:
> One thing I can think of is upgrading. AFAIK dump and restore is the only
> way to change the on disk format.
> Presuming that eventually we will be forced to change the on disk format it
> would be nice to be able to do so in a manner which does not force long
> down times

There is an ongoing effort moving in this direction.  The $subject isn't
great, but this patch set (which we are currently working on
updating...): https://commitfest.postgresql.org/43/3986/ attempts
changing a lot of currently compile-time block-size pieces to be
run-time which would open up the possibility to have a different page
format for, eg, different tablespaces.  Possibly even different block
sizes.  We'd certainly welcome discussion from others who are
interested.

Thanks,

Stephen


signature.asc
Description: PGP signature


Re: index prefetching

2023-06-09 Thread Gregory Smith
On Thu, Jun 8, 2023 at 11:40 AM Tomas Vondra 
wrote:

> We already do prefetching for bitmap index scans, where the bitmap heap
> scan prefetches future pages based on effective_io_concurrency. I'm not
> sure why exactly prefetching was implemented only for bitmap scans


At the point Greg Stark was hacking on this, the underlying OS async I/O
features were tricky to fit into PG's I/O model, and both of us did much
review work just to find working common ground that PG could plug into.
Linux POSIX advisories were completely different from Solaris's async
model, the other OS used for validation that the feature worked, with the
hope being that designing against two APIs would be better than just
focusing on Linux.  Since that foundation was all so brittle and limited,
scope was limited to just the heap scan, since it seemed to have the best
return on time invested given the parts of async I/O that did and didn't
scale as expected.

As I remember it, the idea was to get the basic feature out the door and
gather feedback about things like whether the effective_io_concurrency knob
worked as expected before moving onto other prefetching.  Then that got
lost in filesystem upheaval land, with so much drama around Solaris/ZFS and
Oracle's btrfs work.  I think it's just that no one ever got back to it.

I have all the workloads that I use for testing automated into
pgbench-tools now, and this change would be easy to fit into testing on
them as I'm very heavy on block I/O tests.  To get PG to reach full read
speed on newer storage I've had to do some strange tests, like doing index
range scans that touch 25+ pages.  Here's that one as a pgbench script:

\set range 67 * (:multiplier + 1)
\set limit 10 * :scale
\set limit :limit - :range
\set aid random(1, :limit)
SELECT aid,abalance FROM pgbench_accounts WHERE aid >= :aid ORDER BY aid
LIMIT :range;

And then you use '-Dmultiplier=10' or such to crank it up.  Database 4X
RAM, multiplier=25 with 16 clients is my starting point on it when I want
to saturate storage.  Anything that lets me bring those numbers down would
be valuable.

--
Greg Smith  greg.sm...@crunchydata.com
Director of Open Source Strategy


Re: Fix search_path for all maintenance commands

2023-06-09 Thread Jeff Davis
On Thu, 2023-06-08 at 21:55 -0700, Nathan Bossart wrote:
> On Thu, Jun 08, 2023 at 06:08:08PM -0400, Greg Stark wrote:
> > I guess that's pretty narrow and a reasonable thing to desupport.
> > Users could just mark those functions with search_path or schema
> > qualify the object references in them. Perhaps we should also be
> > picking up cases like that sooner so users realize they've created
> > a
> > footgun for themselves?

Many cases will be picked up, for instance CREATE INDEX will error if
the safe search path is not good enough.

> I'm inclined to agree that this is reasonable to desupport.

Committed.

> I bet we could skip forcing the search_path for maintenance commands
> run as
> the table owner, but such a discrepancy seems likely to cause far
> more
> confusion than anything else.

Agreed.

Regards,
Jeff Davis





Re: win32ver data in meson-built postgres.exe

2023-06-09 Thread Noah Misch
On Thu, Jun 08, 2023 at 01:10:00PM +0200, Magnus Hagander wrote:
> On Thu, Jun 8, 2023 at 3:45 AM Noah Misch  wrote:
> > On Wed, Jun 07, 2023 at 04:47:26PM -0700, Andres Freund wrote:
> > > On 2023-06-07 16:14:07 -0700, Noah Misch wrote:
> > > > postgres.exe is icon-free.
> > >
> > > We could also just change that.
> >
> > I would be +1 for that (only if done for all build systems).  Showing the
> > elephant in task manager feels better than showing the generic-exe icon.
> 
> I think this decision goes back all the way to the ancient times, and
> the argument was then "user should not use the postgres.exe file when
> clicking around" sort of. Back then, task manager didn't show the icon
> at all, regardless. It does now, so I'm +1 to add the icon (in all the
> build systems).

That sounds good, and it's the attached one-byte change.  That also simplifies
the Meson fix; new version attached.
Author: Noah Misch 
Commit: Noah Misch 

Give postgres.exe the icon of other executables.

We had left it icon-free since users won't achieve much by opening it
from Windows Explorer.  Subsequent to that decision, Task Manager
started to show the icon.  That shifts the balance in favor of attaching
the icon, so do so.  No back-patch, but make this late addition to v16.

Reviewed by FIXME.

Discussion: https://postgr.es/m/FIXME

diff --git a/src/backend/Makefile b/src/backend/Makefile
index e4bf0fe..3c42003 100644
--- a/src/backend/Makefile
+++ b/src/backend/Makefile
@@ -10,8 +10,7 @@
 #-
 
 PGFILEDESC = "PostgreSQL Server"
-# This is a daemon process, which is why it is not labeled as an executable
-#PGAPPICON=win32
+PGAPPICON=win32
 
 subdir = src/backend
 top_builddir = ../..
Author: Noah Misch 
Commit: Noah Misch 

Add win32ver data to meson-built postgres.exe.

As in the older build systems, the resources object is not an input to
postgres.def.

Reviewed by FIXME.

Discussion: https://postgr.es/m/FIXME

diff --git a/src/backend/meson.build b/src/backend/meson.build
index ccfc382..88a35e9 100644
--- a/src/backend/meson.build
+++ b/src/backend/meson.build
@@ -5,6 +5,7 @@ backend_sources = []
 backend_link_with = [pgport_srv, common_srv]
 
 generated_backend_sources = []
+post_export_backend_sources = []
 
 subdir('access')
 subdir('archive')
@@ -133,8 +134,15 @@ if dtrace.found() and host_system != 'darwin'
   )
 endif
 
+if host_system == 'windows'
+  post_export_backend_sources += rc_bin_gen.process(win32ver_rc, extra_args: [
+'--NAME', 'postgres',
+'--FILEDESC', 'PostgreSQL Server',])
+endif
+
 postgres = executable('postgres',
   backend_input,
+  sources: post_export_backend_sources,
   objects: backend_objs,
   link_args: backend_link_args,
   link_with: backend_link_with,


Re: Cleaning up nbtree after logical decoding on standby work

2023-06-09 Thread Peter Geoghegan
On Fri, Jun 9, 2023 at 12:03 PM Andres Freund  wrote:
> From what I can tell it's largely consistent with other parameters of the
> respective function. E.g. btinsert(), _bt_doinsert() use camelCase for most
> parameters, so heapRel fits in.  There are a few cases where it's not obvious
> what the pattern is intended to be :/.

It's not 100% clear what the underlying principle is, but we mix
camelCase and underscore styles all the time, so that's always kinda
true.

> > My new plan is to commit this tomorrow, since the clear consensus is
> > that we should go ahead with this for 16.
>
> I'm not sure there is that consensus (for me half the changes shouldn't be
> done, the rest should be in 17), but in the end it doesn't matter that much.

Really? What parts are you opposed to in principle? I don't see why
you wouldn't support everything or nothing for 17 (questions of style
aside). I don't see what's ambiguous about what we should do here,
barring the 16-or-17 question.

It's not like nbtree ever really "used P_NEW". It doesn't actually
depend on any of the P_NEW handling inside bufmgr.c. It looks a little
like it might, but that's just an accident.

> > --- a/src/include/utils/tuplesort.h
> > +++ b/src/include/utils/tuplesort.h
> > @@ -399,9 +399,7 @@ extern Tuplesortstate *tuplesort_begin_heap(TupleDesc tupDesc,
> >                                             int workMem, SortCoordinate coordinate,
> >                                             int sortopt);
> >  extern Tuplesortstate *tuplesort_begin_cluster(TupleDesc tupDesc,
> > -                                              Relation indexRel,
> > -                                              Relation heaprel,
> > -                                              int workMem,
> > +                                              Relation indexRel, int workMem,
> >                                                SortCoordinate coordinate,
> >                                                int sortopt);
>
> I think we should continue to provide the table here, even if we don't need it
> today.

I don't see why, but okay. I'll do it that way.

-- 
Peter Geoghegan




Adding a pg_get_owned_sequence function?

2023-06-09 Thread Dagfinn Ilmari Mannsåker
Hi hackers,

I've always been annoyed by the fact that pg_get_serial_sequence takes
the table and returns the sequence as strings rather than regclass. And
since identity columns were added, the name is misleading as well (which
is even acknowledged in the docs, together with a suggestion for a
better name).

So, instead of making excuses in the documentation, I thought why not
add a new function which addresses all of these issues, and document the
old one as a backward-compatibility wrapper?

Please see the attached patch for my stab at this.

- ilmari

From 7fd37c15920b7d2e87edef4351932559e2c9ef4f Mon Sep 17 00:00:00 2001
From: =?UTF-8?q?Dagfinn=20Ilmari=20Manns=C3=A5ker?= 
Date: Fri, 9 Jun 2023 19:55:58 +0100
Subject: [PATCH] Add pg_get_owned_sequence function

This is the name the docs say `pg_get_serial_sequence` should have
had, and gives us the opportunity to change the return and table
argument types to `regclass` and the column argument to `name`,
instead of using `text` everywhere.  This matches what's in catalogs,
and requires less explaining than the rules for
`pg_get_serial_sequence`.
---
 doc/src/sgml/func.sgml | 46 -
 src/backend/utils/adt/ruleutils.c  | 69 +++---
 src/include/catalog/pg_proc.dat|  3 ++
 src/test/regress/expected/identity.out |  6 +++
 src/test/regress/expected/sequence.out |  6 +++
 src/test/regress/sql/identity.sql  |  1 +
 src/test/regress/sql/sequence.sql  |  1 +
 7 files changed, 102 insertions(+), 30 deletions(-)

diff --git a/doc/src/sgml/func.sgml b/doc/src/sgml/func.sgml
index 5a47ce4343..687a8480e6 100644
--- a/doc/src/sgml/func.sgml
+++ b/doc/src/sgml/func.sgml
@@ -24327,6 +24327,35 @@

   
 
+  
+   
+
+ pg_get_owned_sequence
+
+pg_get_owned_sequence ( table regclass, column name )
+regclass
+   
+   
+Returns the sequence associated with a column, or NULL if no sequence
+is associated with the column.  If the column is an identity column,
+the associated sequence is the sequence internally created for that
+column.  For columns created using one of the serial types
+(serial, smallserial, bigserial),
+it is the sequence created for that serial column definition.  In the
+latter case, the association can be modified or removed
+with ALTER SEQUENCE OWNED BY.  The result is
+suitable for passing to the sequence functions (see
+).
+   
+   
+A typical use is in reading the current value of the sequence for an
+identity or serial column, for example:
+
+SELECT currval(pg_get_owned_sequence('sometable', 'id'));
+
+   
+  
+
   

 
@@ -24336,19 +24365,10 @@
 text


-Returns the name of the sequence associated with a column,
-or NULL if no sequence is associated with the column.
-If the column is an identity column, the associated sequence is the
-sequence internally created for that column.
-For columns created using one of the serial types
-(serial, smallserial, bigserial),
-it is the sequence created for that serial column definition.
-In the latter case, the association can be modified or removed
-with ALTER SEQUENCE OWNED BY.
-(This function probably should have been
-called pg_get_owned_sequence; its current name
-reflects the fact that it has historically been used with serial-type
-columns.)  The first parameter is a table name with optional
+A backwards-compatibility wrapper
+for pg_get_owned_sequence, which
+uses text for the table and sequence names instead of
+regclass.  The first parameter is a table name with optional
 schema, and the second parameter is a column name.  Because the first
 parameter potentially contains both schema and table names, it is
 parsed per usual SQL rules, meaning it is lower-cased by default.
diff --git a/src/backend/utils/adt/ruleutils.c b/src/backend/utils/adt/ruleutils.c
index d3a973d86b..b20a1e7583 100644
--- a/src/backend/utils/adt/ruleutils.c
+++ b/src/backend/utils/adt/ruleutils.c
@@ -518,6 +518,7 @@ static char *generate_qualified_type_name(Oid typid);
 static text *string_to_text(char *str);
 static char *flatten_reloptions(Oid relid);
 static void get_reloptions(StringInfo buf, Datum reloptions);
+static Oid pg_get_owned_sequence_internal(Oid tableOid, char *column);
 
 #define only_marker(rte)  ((rte)->inh ? "" : "ONLY ")
 
@@ -2763,6 +2764,28 @@ pg_get_userbyid(PG_FUNCTION_ARGS)
 }
 
 
+/*
+ * pg_get_owned_sequence
+ *		Get the sequence used by an identity or serial column.
+ */
+Datum
+pg_get_owned_sequence(PG_FUNCTION_ARGS)
+{
+	Oid			tableOid = PG_GETARG_OID(0);
+	char	   *column = NameStr(*PG_GETARG_NAME(1));
+	Oid			sequenceId;
+
+	sequenceId = pg_get_owned_sequence

Re: Cleaning up nbtree after logical decoding on standby work

2023-06-09 Thread Andres Freund
Hi,

On 2023-06-08 08:50:31 -0700, Peter Geoghegan wrote:
> On Thu, Jun 8, 2023 at 7:22 AM Alvaro Herrera  wrote:
> > IMO this kind of change definitely does not have place in a post-beta1
> > restructuring patch.  We rarely indulge in case-fixing exercises at any
> > other time, and I don't see any good argument why post-beta1 is a better
> > time for it.
>
> There is a glaring inconsistency. Now about half of the relevant
> functions in nbtree.h use "heaprel", while the other half use
> "heapRel". Obviously that's not the end of the world, but it's
> annoying. It's exactly the kind of case-fixing exercise that does tend
> to happen.

From what I can tell it's largely consistent with other parameters of the
respective function. E.g. btinsert(), _bt_doinsert() use camelCase for most
parameters, so heapRel fits in.  There are a few cases where it's not obvious
what the pattern is intended to be :/.



> My new plan is to commit this tomorrow, since the clear consensus is
> that we should go ahead with this for 16.

I'm not sure there is that consensus (for me half the changes shouldn't be
done, the rest should be in 17), but in the end it doesn't matter that much.



> --- a/src/include/utils/tuplesort.h
> +++ b/src/include/utils/tuplesort.h
> @@ -399,9 +399,7 @@ extern Tuplesortstate *tuplesort_begin_heap(TupleDesc tupDesc,
>                                             int workMem, SortCoordinate coordinate,
>                                             int sortopt);
>  extern Tuplesortstate *tuplesort_begin_cluster(TupleDesc tupDesc,
> -                                           Relation indexRel,
> -                                           Relation heaprel,
> -                                           int workMem,
> +                                           Relation indexRel, int workMem,
>                                             SortCoordinate coordinate,
>                                             int sortopt);

I think we should continue to provide the table here, even if we don't need it
today.

Greetings,

Andres Freund




Re: [PATCH] Missing dep on Catalog.pm in meson rules

2023-06-09 Thread Tristan Partin
Patch looks good to me!

-- 
Tristan Partin
Neon (https://neon.tech)




Re: [PATCH] Missing dep on Catalog.pm in meson rules

2023-06-09 Thread Andres Freund
Hi,

On 2023-06-02 10:13:44 -0500, Tristan Partin wrote:
> On Fri Jun 2, 2023 at 8:47 AM CDT, Andres Freund wrote:
> > Hi,
> >
> > On 2023-06-02 08:10:43 -0500, Tristan Partin wrote:
> > > > I wonder if we instead could just make perl output the files it loads 
> > > > and
> > > > handle dependencies automatically that way? But that's more work, so 
> > > > it's
> > > > probably the right thing to go for the manual path for now.
> > > 
> > > I am not familiar with Perl enough (at all haha) to know if that is
> > > possible. I don't know exactly what these Perl files do, but perhaps it
> > > might make sense to have some global lookup table that is setup near the
> > > beginning of the script.
> >
> > It'd be nice to have something more general - there are other perl modules 
> > we
> > load, e.g.
> > ./src/backend/catalog/Catalog.pm
> > ./src/backend/utils/mb/Unicode/convutils.pm
> > ./src/tools/PerfectHash.pm
> >
> >
> > > perl_files = {
> > >   'Catalog.pm': files('path/to/Catalog.pm'),
> > >   ...
> > > }
> >
> > I think you got it, but just to make sure: I was thinking of generating a
> > depfile from within perl. Something like what you propose doesn't quite
> > seem like a sufficient improvement.
> 
> Whatever I am proposing is definitely subpar to generating a depfile. So
> if that can be done, that is the best option!

I looked for a bit, but couldn't find an easy way to do so. I would still like
to pursue moving towards dep files for the perl scripts, even if that requires
explicit support in the perl scripts, but that's a change for later.


> > > Otherwise, manual as it is in the original patch seems like an alright
> > > compromise for now.
> >
> > Yea. I'm working on a more complete version, also dealing with dependencies 
> > on
> > PerfectHash.pm.
> 
> Good to hear. Happy to review any patches :).

Attached.

Greetings,

Andres Freund
From e64cd6233ee8637e2e66f9e31085f5c2c5c5c388 Mon Sep 17 00:00:00 2001
From: Andres Freund 
Date: Fri, 9 Jun 2023 11:41:42 -0700
Subject: [PATCH v2] meson: Add dependencies to perl modules to various script
 invocations
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

Reported-by: Dagfinn Ilmari Mannsåker 
Discussion: https://postgr.es/m/87v8g7s6bf@wibble.ilmari.org
---
 meson.build | 14 ++
 src/include/catalog/meson.build |  2 +-
 src/include/nodes/meson.build   |  1 +
 src/include/utils/meson.build   |  1 +
 src/common/meson.build  |  4 ++--
 src/common/unicode/meson.build  |  3 +++
 src/pl/plpgsql/src/meson.build  |  7 ---
 src/interfaces/ecpg/preproc/meson.build | 19 ---
 8 files changed, 30 insertions(+), 21 deletions(-)

diff --git a/meson.build b/meson.build
index 16b2e866462..82f2782673e 100644
--- a/meson.build
+++ b/meson.build
@@ -2681,6 +2681,20 @@ gen_export_kwargs = {
 
 
 
+###
+### Helpers for custom targets used across the tree
+###
+
+catalog_pm = files('src/backend/catalog/Catalog.pm')
+perfect_hash_pm = files('src/tools/PerfectHash.pm')
+gen_kwlist_deps = [perfect_hash_pm]
+gen_kwlist_cmd = [
+  perl, '-I', '@SOURCE_ROOT@/src/tools',
+  files('src/tools/gen_keywordlist.pl'),
+  '--output', '@OUTDIR@', '@INPUT@']
+
+
+
 ###
 ### windows resources related stuff
 ###
diff --git a/src/include/catalog/meson.build b/src/include/catalog/meson.build
index 3179be09d3d..c3fd05d0279 100644
--- a/src/include/catalog/meson.build
+++ b/src/include/catalog/meson.build
@@ -111,7 +111,7 @@ generated_catalog_headers = custom_target('generated_catalog_headers',
   output: output_files,
   install_dir: output_install,
   input: input,
-  depend_files: bki_data_f,
+  depend_files: bki_data_f + catalog_pm,
   build_by_default: true,
   install: true,
   command: [
diff --git a/src/include/nodes/meson.build b/src/include/nodes/meson.build
index 9a8e85c4a5e..626dc696d51 100644
--- a/src/include/nodes/meson.build
+++ b/src/include/nodes/meson.build
@@ -50,6 +50,7 @@ node_support_install = [
 generated_nodes = custom_target('nodetags.h',
   input: node_support_input,
   output: node_support_output,
+  depend_files: catalog_pm,
   command: [
 perl, files('../../backend/nodes/gen_node_support.pl'),
 '-o', '@OUTDIR@',
diff --git a/src/include/utils/meson.build b/src/include/utils/meson.build
index 2fed1e3f8e9..c212c4091f7 100644
--- a/src/include/utils/meson.build
+++ b/src/include/utils/meson.build
@@ -44,6 +44,7 @@ fmgrtab_output = ['fmgroids.h', 'fmgrprotos.h', 'fmgrtab.c']
 fmgrtab_target = custom_target('fmgrtab',
   input: '../catalog/pg_proc.dat',
   output : fmgrtab_output,
+  depend_files: catalog_pm,
   command: [perl, '-I', '@SOURCE_ROOT@/src/backend/catalog/', files('../../backend/utils/Gen_fmgrtab.pl'), '--include-path=@SOURCE_ROOT@/src/include', '--output=@OUTDIR@', '@INPUT@'],
   install: true,
   install_dir: [dir_include_server / 'utils', dir_include_server / 'utils', false],
diff 

Re: Meson build updates

2023-06-09 Thread Andres Freund
Hi,

On 2023-06-09 13:15:27 -0500, Tristan Partin wrote:
> On Fri Jun 9, 2023 at 11:41 AM CDT, Andres Freund wrote:
> > I like the current version better - depending on the meson version makes it
> > easy to find what needs to be removed when we upgrade the meson minimum
> > version.
> 
> I think the savings of not looking up selinux/systemd on non-Linux
> systems is pretty big. Would you accept a change of something like:

For selinux it's default disabled, so it doesn't make a practical
difference. Outside of linux newer versions of meson are more common IME, so
I'm not really worried about it for systemd.


> if meson.version().version_compare('>=0.59')
>   # do old stuff
> elif host_system == 'linux'
>   # do new stuff
> endif
> 
> Otherwise, I am happy to remove the patch.

Hm, I don't quite know how this would end up looking.


> > > From 189d3ac3d5593ce3e475813e4830a29bb4e96f70 Mon Sep 17 00:00:00 2001
> > > From: Tristan Partin 
> > > Date: Wed, 17 May 2023 10:36:52 -0500
> > > Subject: [PATCH postgres v1 16/17] Add Meson overrides
> > > 
> > > Meson has the ability to do transparent overrides when projects are used
> > > as subprojects. For instance, say I am building a Postgres extension. I
> > > can define Postgres to be a subproject of my extension given the
> > > following wrap file:
> > > 
> > > [wrap-git]
> > > url = https://git.postgresql.org/git/postgresql.git
> > > revision = master
> > > depth = 1
> > > 
> > > [provide]
> > > dependency_names = libpq
> > > 
> > > Then in my extension (root project), I can have the following line
> > > snippet:
> > > 
> > > libpq = dependency('libpq')
> > > 
> > > This will tell Meson to transparently compile libpq prior to it
> > > compiling my extension (because I depend on libpq) if libpq isn't found
> > > on the host system.
> > > 
> > > I have also added overrides for the various public-facing executables.
> > > Though I don't expect them to get much usage, might as well go ahead and
> > > override them. They can be used by adding the following line to the
> > > aforementioned wrap file:
> >
> > I think adding more boilerplate to all these places does have some cost. So
> > I'm not really convinced this is worth doing.
> 
> Could you explain more about what costs you foresee?

Repetitive code that needs to be added to each further binary we add. I don't
mind doing that if it has a use case, but I'm not sure I see the use case for
random binaries...


> > > From 5ee13f09e4101904dbc9887bd4844eb5f1cb4fea Mon Sep 17 00:00:00 2001
> > > From: Tristan Partin 
> > > Date: Wed, 17 May 2023 10:54:53 -0500
> > > Subject: [PATCH postgres v1 17/17] Remove Meson program options for 
> > > specifying
> > >  paths
> > > 
> > > Meson has a built-in way to override paths without polluting project
> > > build options called machine files.
> >
> > I think meson machine files are just about unusable. For one, you can't
> > actually change any of the options without creating a new build dir. For
> > another, it's not something that easily can be done on the commandline.
> >
> > So I really don't want to remove these options.
> 
> I felt like this would be the most controversial change. What could be
> done in upstream Meson to make this a more enjoyable experience?

I think *requiring* separate files is a really poor experience when you come
from some other system, where those could trivially be overridden on the
commandline.

The biggest change to make them more usable would be to properly reconfigure
when the contents of a machine file change. IIRC configure is rerun, but the
changes aren't taken into account.


> I do however disagree with the usability of machine files. Could you add a
> little context about what you find unusable about them?

I can quickly change a meson option and run a build and tests. Trivially
scriptable. Whereas with a machine file I need to write code to edit a machine
file, re-configure from scratch, and only then can build / run tests.

It's particularly bad for cross builds, where unfortunately cross files can't
be avoided. It's imo the one area where autoconf beats meson handily.
--host=x86_64-w64-mingw32 works for autoconf. Whereas for meson I need to
manually write a cross file with a bunch of binaries set.
https://github.com/anarazel/postgres/commit/ae53f21d06b4dadf8e6b90df84000fad9a769eaf#diff-3420ebab4f1dbe2ba7102565b0b84e4d6d01fb8b3c1e375bd439eed604e743f8R1

There are some helpers aiming to generate cross files, but I've not been able to
generate something useful for cross compiling to windows, for example.

I've been unable to generate something like referenced in the above commit in
a reasonably concise way with env2mfile.


In the end, I also just don't see a meaningful benefit in forcing the use of
machine files.

Greetings,

Andres Freund




Re: index prefetching

2023-06-09 Thread Peter Geoghegan
On Fri, Jun 9, 2023 at 3:45 AM Tomas Vondra
 wrote:
> > What the exact historical timeline is may not be that important. My
> > emphasis on ScalarArrayOpExpr is partly due to it being a particularly
> > compelling case for both parallel index scan and prefetching, in
> > general. There are many queries that have huge in() lists that
> > naturally benefit a great deal from prefetching. Plus they're common.
> >
>
> Did you mean parallel index scan or bitmap index scan?

I meant parallel index scan (also parallel bitmap index scan). Note
that nbtree parallel index scans have special ScalarArrayOpExpr
handling code.

ScalarArrayOpExpr is kind of special -- it is simultaneously one big
index scan (to the executor), and lots of small index scans (to
nbtree). Unlike the queries that you've looked at so far, which really
only have one plausible behavior at execution time, there are many
ways that ScalarArrayOpExpr index scans can be executed at runtime --
some much faster than others. The nbtree implementation can in
principle reorder how it processes ranges from the key space (i.e.
each range of array elements) with significant flexibility.

> I think I understand, although maybe my mental model is wrong. I agree
> it seems inefficient, but I'm not sure why it would make prefetching
> hopeless. Sure, it puts index scans at a disadvantage (compared to
> bitmap scans), but if we pick index scan it should still be an
> improvement, right?

Hopeless might have been too strong of a word. More like it'd fall far
short of what is possible to do with a ScalarArrayOpExpr with a given
high end server.

The quality of the implementation (including prefetching) could make a
huge difference to how well we make use of the available hardware
resources. A really high quality implementation of ScalarArrayOpExpr +
prefetching can keep the system busy with useful work, which is less
true with other types of queries, which have inherently less
predictable I/O (and often have less I/O overall). What could be more
amenable to predicting I/O patterns than a query with a large IN()
list, with many constants that can be processed in whatever order
makes sense at runtime?

What I'd like to do with ScalarArrayOpExpr is to teach nbtree to
coalesce together those "small index scans" into "medium index scans"
dynamically, where that makes sense. That's the main part that's
missing right now. Dynamic behavior matters a lot with
ScalarArrayOpExpr stuff -- that's where the challenge lies, but also
where the opportunities are. Prefetching builds on all that.

> I guess I need to do some testing on a range of data sets / queries, and
> see how it works in practice.

If I can figure out a way of getting ScalarArrayOpExpr to visit each
leaf page exactly once, that might be enough to make things work
really well most of the time. Maybe it won't even be necessary to
coordinate very much, in the end. Unsure.

I've already done a lot of work that tries to minimize the chances of
regular (non-ScalarArrayOpExpr) queries accessing more than a single
leaf page, which will help your strategy of just prefetching items
from a single leaf page at a time -- that will get you pretty far
already. Consider the example of the tenk2_hundred index from the
bt_page_items documentation. You'll notice that the high key for the
page shown in the docs (and every other page in the same index) nicely
makes the leaf page boundaries "aligned" with natural keyspace
boundaries, due to suffix truncation. That helps index scans to access
no more than a single leaf page when accessing any one distinct
"hundred" value.

We are careful to do the right thing with the "boundary cases" when we
descend the tree, too. This _bt_search behavior builds on the way that
suffix truncation influences the on-disk structure of indexes. Queries
such as "select * from tenk2 where hundred = ?" will each return 100
rows spread across almost as many heap pages. That's a fairly large
number of rows/heap pages, but we still only need to access one leaf
page for every possible constant value (every "hundred" value that
might be specified as the ? in my point query example). It doesn't
matter if it's the leftmost or rightmost item on a leaf page -- we
always descend to exactly the correct leaf page directly, and we
always terminate the scan without having to move to the right sibling
page (we check the high key before going to the right page in some
cases, per the optimization added by commit 29b64d1d).

The same kind of behavior is also seen with the TPC-C line items
primary key index, which is a composite index. We want to access the
items from a whole order in one go, from one leaf page -- and we
reliably do the right thing there too (though with some caveats about
CREATE INDEX). We should never have to access more than one leaf page
to read a single order's line items. This matters because it's quite
natural to want to access whole orders with that particular
table/workload (it's also unnatural to only

Re: Meson build updates

2023-06-09 Thread Tristan Partin
On Fri Jun 9, 2023 at 11:41 AM CDT, Andres Freund wrote:
> Hi,
>
> On 2023-05-18 10:36:59 -0500, Tristan Partin wrote:
> > From b35ecb2c8dcd71608f98af1e0ec19d965099ceab Mon Sep 17 00:00:00 2001
> > From: Tristan Partin 
> > Date: Wed, 17 May 2023 09:40:02 -0500
> > Subject: [PATCH postgres v1 12/17] Make finding pkg-config(python3) more
> >  robust
> > 
> > It is a possibility that the installation can't be found. Checking for
> > Python.h is redundant with what Meson does internally.
>
> That's not what I saw - we had cases where Python.h was missing, but the
> python dependency was found. It's possible that this is dependent on the
> meson version.

Eli corrected me on this. Please see version 2 of the patchset.
>
> > From 47394ffd113d4170e955bc033841cb7e18fd3ac4 Mon Sep 17 00:00:00 2001
> > From: Tristan Partin 
> > Date: Wed, 17 May 2023 09:44:49 -0500
> > Subject: [PATCH postgres v1 14/17] Reduce branching on Meson version
> > 
> > This code had a branch depending on Meson version. Instead, we can just
> > move the system checks to the if statement. I believe this also keeps
> > selinux and systemd from being looked for on non-Linux systems when
> > using Meson < 0.59. Before they would be checked, but obviously fail.
>
> I like the current version better - depending on the meson version makes it
> easy to find what needs to be removed when we upgrade the meson minimum
> version.

I think the savings of not looking up selinux/systemd on non-Linux
systems is pretty big. Would you accept a change of something like:

if meson.version().version_compare('>=0.59')
  # do old stuff
elif host_system == 'linux'
  # do new stuff
endif

Otherwise, I am happy to remove the patch.

> > From 189d3ac3d5593ce3e475813e4830a29bb4e96f70 Mon Sep 17 00:00:00 2001
> > From: Tristan Partin 
> > Date: Wed, 17 May 2023 10:36:52 -0500
> > Subject: [PATCH postgres v1 16/17] Add Meson overrides
> > 
> > Meson has the ability to do transparent overrides when projects are used
> > as subprojects. For instance, say I am building a Postgres extension. I
> > can define Postgres to be a subproject of my extension given the
> > following wrap file:
> > 
> > [wrap-git]
> > url = https://git.postgresql.org/git/postgresql.git
> > revision = master
> > depth = 1
> > 
> > [provide]
> > dependency_names = libpq
> > 
> > Then in my extension (root project), I can have the following line
> > snippet:
> > 
> > libpq = dependency('libpq')
> > 
> > This will tell Meson to transparently compile libpq prior to it
> > compiling my extension (because I depend on libpq) if libpq isn't found
> > on the host system.
> > 
> > I have also added overrides for the various public-facing executables.
> > Though I don't expect them to get much usage, might as well go ahead and
> > override them. They can be used by adding the following line to the
> > aforementioned wrap file:
>
> I think adding more boilerplate to all these places does have some cost. So
> I'm not really convinced this is worth doing.

Could you explain more about what costs you foresee? I thought this was
a pretty free change :). Most of the meson.build files seem to be
pretty much copy/pastes of each other, so if a new executable came
around, then someone would just get the override line for essentially
free, minus changing the binary name/executable name.

> > From 5ee13f09e4101904dbc9887bd4844eb5f1cb4fea Mon Sep 17 00:00:00 2001
> > From: Tristan Partin 
> > Date: Wed, 17 May 2023 10:54:53 -0500
> > Subject: [PATCH postgres v1 17/17] Remove Meson program options for 
> > specifying
> >  paths
> > 
> > Meson has a built-in way to override paths without polluting project
> > build options called machine files.
>
> I think meson machine files are just about unusable. For one, you can't
> actually change any of the options without creating a new build dir. For
> another, it's not something that easily can be done on the commandline.
>
> So I really don't want to remove these options.

I felt like this would be the most controversial change. What could be
done in upstream Meson to make this a more enjoyable experience? I do
however disagree with the usability of machine files. Could you add a
little context about what you find unusable about them?

Happy to revert the change after continuing the discussion, of course.

-- 
Tristan Partin
Neon (https://neon.tech)




Re: Use COPY for populating all pgbench tables

2023-06-09 Thread Tristan Partin
On Fri Jun 9, 2023 at 12:25 PM CDT, Gurjeet Singh wrote:
> On Fri, Jun 9, 2023 at 6:24 AM Gregory Smith  wrote:
> >
> > Unfortunately there's no simple command line option to change just that one 
> > thing about how pgbench runs.  You have to construct a command line that 
> > documents each and every step you want instead.  You probably just want 
> > this form:
> >
> > $ pgbench -i -I dtGvp -s 500
>
> The steps are severely under-documented in pgbench --help output.
> Grepping that output I could not find any explanation of these steps,
> so I dug through the code and found them in runInitSteps(). Just as I
> was thinking of writing a patch to remedy that, just to be sure, I
> checked online docs and sure enough, they are well-documented under
> pgbench [1].
>
> I think at least a pointer to the pgbench docs should be mentioned
> in the pgbench --help output; an average user may not rush to read the
> code to find the explanation, but a hint to where to find more details
> about what the letters in --init-steps mean would save them a lot of
> time.
>
> [1]: https://www.postgresql.org/docs/15/pgbench.html
>
> Best regards,
> Gurjeet
> http://Gurje.et

I think this would be nice to have if you wanted to submit a patch.

-- 
Tristan Partin
Neon (https://neon.tech)




Re: win32ver data in meson-built postgres.exe

2023-06-09 Thread Andres Freund
Hi,

On 2023-06-07 18:45:07 -0700, Noah Misch wrote:
> On Wed, Jun 07, 2023 at 04:47:26PM -0700, Andres Freund wrote:
> > On 2023-06-07 16:14:07 -0700, Noah Misch wrote:
> > > A postgres.exe built with meson, ninja, and MSVC lacks the version 
> > > metadata
> > > that postgres.exe gets under non-meson build systems.  Patch attached.
> >
> > I dimly recall that we discussed that and basically decided that it doesn't
> > really make sense to attach this information to postgres.exe.
>
> I looked for a discussion behind that, but I didn't find it.  A key
> user-visible consequence is whether the task manager "Name" column shows (1)
> "PostgreSQL Server" (version data present) vs. (2) "postgres.exe" (no version
> data).  While (2) is not terrible, (1) is more typical on Windows.  I don't
> see cause to migrate to (2) after N years of sending (1).  Certainly this part
> of the user experience should not depend on one's choice of build system.

I think I misremembered some details... I guess I was remembering the icon for
dlls, in
https://postgr.es/m/20220829221314.pepagj3i5mj43niy%40awork3.anarazel.de

Touching the rc stuff always makes me feel dirty enough that I want to swap it
out of my brain, if not run some deep erase tool :)


> > > This preserves two quirks of the older build systems.  First,
> > > postgres.exe is icon-free.
> >
> > We could also just change that.
>
> I would be +1 for that (only if done for all build systems).  Showing the
> elephant in task manager feels better than showing the generic-exe icon.

Let's do that then.

Greetings,

Andres Freund




Re: Use COPY for populating all pgbench tables

2023-06-09 Thread Gurjeet Singh
On Fri, Jun 9, 2023 at 6:24 AM Gregory Smith  wrote:
>
> Unfortunately there's no simple command line option to change just that one 
> thing about how pgbench runs.  You have to construct a command line that 
> documents each and every step you want instead.  You probably just want this 
> form:
>
> $ pgbench -i -I dtGvp -s 500

The steps are severely under-documented in pgbench --help output.
Grepping that output I could not find any explanation of these steps,
so I dug through the code and found them in runInitSteps(). Just as I
was thinking of writing a patch to remedy that, just to be sure, I
checked online docs and sure enough, they are well-documented under
pgbench [1].

I think at least a pointer to the pgbench docs should be mentioned
in the pgbench --help output; an average user may not rush to read the
code to find the explanation, but a hint to where to find more details
about what the letters in --init-steps mean would save them a lot of
time.

[1]: https://www.postgresql.org/docs/15/pgbench.html
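
For concreteness, the change I have in mind is tiny, something like the
sketch below (the function name and wording here are hypothetical, not
an actual patch):

#include <stdio.h>

/*
 * Hypothetical sketch: an extra paragraph appended at the end of
 * pgbench's --help output, pointing at the full documentation.
 */
static void
print_init_steps_hint(void)
{
	printf("\nEach character of --init-steps selects one initialization step;\n"
		   "see https://www.postgresql.org/docs/current/pgbench.html for the\n"
		   "full list of step characters and their meanings.\n");
}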

Best regards,
Gurjeet
http://Gurje.et




Re: Partial aggregates pushdown

2023-06-09 Thread Bruce Momjian
On Tue, Jun  6, 2023 at 03:08:50AM +, 
fujii.y...@df.mitsubishielectric.co.jp wrote:
> > I've seen this message. But introduction of new built-in function will break
> > requests to old servers only if this new function is used in the request 
> > (this
> > means that query changes). However, this patch changes the behavior of old
> > queries, which worked prior to update. This seems to be different to me.
> 
> You are right.
> However, for now, partial aggregates pushdown is mainly used when using 
> built-in sharding in PostgreSQL.
> I believe when using built-in sharding in PostgreSQL, the version of the 
> primary node server and
> the version of the remote server will usually be the same.
> So I think it would be sufficient to include in the documentation a note
> about such problems and a workaround for them.

I agree that this feature is designed for built-in sharding, but it is
possible people could be using aggregates on partitions backed by
foreign tables without sharding.  Adding a requirement for non-sharding
setups to need PG 17+ servers might be unreasonable.

Looking at previous release note incompatibilities, we don't normally
change non-administrative functions in a way that causes errors if run
on older servers.  Based on Alexander's observations, I wonder if we
need to re-add the postgres_fdw option to control partial aggregate
pushdown, and default it to enabled.

If we ever add more function breakage we might need more postgres_fdw
options.  Fortunately, such changes are rare.

Yuki, thank you for writing and updating this patch, and Alexander,
thank you for helping with this patch.

-- 
  Bruce Momjian  https://momjian.us
  EDB  https://enterprisedb.com

  Only you can decide what is important to you.




Re: Meson build updates

2023-06-09 Thread Andres Freund
Hi,

On 2023-05-18 10:36:59 -0500, Tristan Partin wrote:
> I took some time to look at the Meson build for Postgres. I contribute
> some of the time to Meson, itself. Within this patchset you will find
> pretty small changes. Most of the commits are attempting to create more
> consistency with the surrounding code. I think there is more that can be
> done to improve the build a bit like including subprojects for optional
> dependencies like lz4, zstd, etc.

Thanks for looking over these!



> From b35ecb2c8dcd71608f98af1e0ec19d965099ceab Mon Sep 17 00:00:00 2001
> From: Tristan Partin 
> Date: Wed, 17 May 2023 09:40:02 -0500
> Subject: [PATCH postgres v1 12/17] Make finding pkg-config(python3) more
>  robust
> 
> It is a possibility that the installation can't be found. Checking for
> Python.h is redundant with what Meson does internally.

That's not what I saw - we had cases where Python.h was missing, but the
python dependency was found. It's possible that this is dependent on the
meson version.


> From 47394ffd113d4170e955bc033841cb7e18fd3ac4 Mon Sep 17 00:00:00 2001
> From: Tristan Partin 
> Date: Wed, 17 May 2023 09:44:49 -0500
> Subject: [PATCH postgres v1 14/17] Reduce branching on Meson version
> 
> This code had a branch depending on Meson version. Instead, we can just
> move the system checks to the if statement. I believe this also keeps
> selinux and systemd from being looked for on non-Linux systems when
> using Meson < 0.59. Before they would be checked, but obviously fail.

I like the current version better - depending on the meson version makes it
easy to find what needs to be removed when we upgrade the meson minimum
version.



> From 189d3ac3d5593ce3e475813e4830a29bb4e96f70 Mon Sep 17 00:00:00 2001
> From: Tristan Partin 
> Date: Wed, 17 May 2023 10:36:52 -0500
> Subject: [PATCH postgres v1 16/17] Add Meson overrides
> 
> Meson has the ability to do transparent overrides when projects are used
> as subprojects. For instance, say I am building a Postgres extension. I
> can define Postgres to be a subproject of my extension given the
> following wrap file:
> 
> [wrap-git]
> url = https://git.postgresql.org/git/postgresql.git
> revision = master
> depth = 1
> 
> [provide]
> dependency_names = libpq
> 
> Then in my extension (root project), I can have the following line
> snippet:
> 
> libpq = dependency('libpq')
> 
> This will tell Meson to transparently compile libpq prior to it
> compiling my extension (because I depend on libpq) if libpq isn't found
> on the host system.
> 
> I have also added overrides for the various public-facing executables.
> Though I don't expect them to get much usage, might as well go ahead and
> override them. They can be used by adding the following line to the
> aforementioned wrap file:

I think adding more boilerplate to all these places does have some cost. So
I'm not really convinced this is worth doing.




> From 5ee13f09e4101904dbc9887bd4844eb5f1cb4fea Mon Sep 17 00:00:00 2001
> From: Tristan Partin 
> Date: Wed, 17 May 2023 10:54:53 -0500
> Subject: [PATCH postgres v1 17/17] Remove Meson program options for specifying
>  paths
> 
> Meson has a built-in way to override paths without polluting project
> build options called machine files.

I think meson machine files are just about unusable. For one, you can't
actually change any of the options without creating a new build dir. For
another, it's not something that easily can be done on the commandline.

So I really don't want to remove these options.

Greetings,

Andres Freund




Re: Views no longer in rangeTabls?

2023-06-09 Thread Tom Lane
David Steele  writes:
> Thank you, this was very helpful. I am able to get the expected result 
> now with:

> /* We only care about tables/views and can ignore subqueries, etc. */
> if (!(rte->rtekind == RTE_RELATION ||
>   (rte->rtekind == RTE_SUBQUERY && OidIsValid(rte->relid))))
>  continue;

Right, that matches places like add_rtes_to_flat_rtable().

> One thing, though, rte->relkind is not set for views, so I still need to 
> call get_rel_relkind(rte->relid). Not a big deal, but do you think it 
> would make sense to set rte->relkind for views?

If you see "rte->rtekind == RTE_SUBQUERY && OidIsValid(rte->relid)",
it's dead certain that relid refers to a view, so you could just wire
in that knowledge.

regards, tom lane




Re: ERROR: wrong varnullingrels (b 3) (expected (b)) for Var 2/1

2023-06-09 Thread Tom Lane
Richard Guo  writes:
> On Wed, May 31, 2023 at 10:47 AM Richard Guo  wrote:
>> When we transform the first form of identity 3 to the second form, we've
>> converted Pb*c to Pbc in deconstruct_distribute_oj_quals.  But we
>> neglect to consider that rel C might be a RTE_SUBQUERY and contains
>> quals that have lateral references to B.  So the B vars in such quals
>> have wrong nulling bitmaps and we'd finally notice that when we do
>> fix_upper_expr for the NestLoopParam expressions.

Right.  One question that immediately arises is whether it's even safe
to apply the identity if C has lateral references to B, because that
almost certainly means that C will produce different output when
joined to a nulled B row than when joined to a not-nulled row.
I think that's okay because if the B row will fail Pab then it doesn't
matter what row(s) C produces, but maybe I've missed something.

> We can identify in which form of identity 3 the plan is built up by
> checking the relids of the B/C join's outer rel.  If it's in the first
> form, the outer rel's relids must contain the A/B join.  Otherwise it
> should only contain B's relid.  So I'm considering that maybe we can
> adjust the nulling bitmap for nestloop parameters according to that.
> Attached is a patch for that.  Does this make sense?

Hmm.  I don't really want to do it in identify_current_nestloop_params
because that gets applied to all nestloop params, so it seems like
that risks masking bugs of other kinds.  I'd rather do it in
process_subquery_nestloop_params, which we know is only applied to
subquery LATERAL references.  So more or less as attached.

(I dropped the equal() assertions in process_subquery_nestloop_params
because they've never caught anything and it'd be too complicated to
make them still work.)

Thoughts?

regards, tom lane

diff --git a/src/backend/optimizer/plan/setrefs.c b/src/backend/optimizer/plan/setrefs.c
index 1ca26baa25..3585a703fb 100644
--- a/src/backend/optimizer/plan/setrefs.c
+++ b/src/backend/optimizer/plan/setrefs.c
@@ -2289,9 +2289,11 @@ set_join_references(PlannerInfo *root, Join *join, int rtoffset)
 			 * the outer-join level at which they are used, Vars seen in the
 			 * NestLoopParam expression may have nullingrels that are just a
 			 * subset of those in the Vars actually available from the outer
-			 * side.  Not checking this exactly is a bit grotty, but the work
-			 * needed to make things match up perfectly seems well out of
-			 * proportion to the value.
+			 * side.  Another case that can cause that to happen is explained
+			 * in the comments for process_subquery_nestloop_params.  Not
+			 * checking this exactly is a bit grotty, but the work needed to
+			 * make things match up perfectly seems well out of proportion to
+			 * the value.
 			 */
 			nlp->paramval = (Var *) fix_upper_expr(root,
    (Node *) nlp->paramval,
diff --git a/src/backend/optimizer/util/paramassign.c b/src/backend/optimizer/util/paramassign.c
index 66534c0a78..e94f7e7563 100644
--- a/src/backend/optimizer/util/paramassign.c
+++ b/src/backend/optimizer/util/paramassign.c
@@ -421,8 +421,26 @@ replace_nestloop_param_placeholdervar(PlannerInfo *root, PlaceHolderVar *phv)
  * provide these values.  This differs from replace_nestloop_param_var in
  * that the PARAM_EXEC slots to use have already been determined.
  *
+ * An additional complication is that the subplan_params may contain
+ * nullingrel markers that need adjustment.  This occurs if we have applied
+ * outer join identity 3,
+ *		(A leftjoin B on (Pab)) leftjoin C on (Pb*c)
+ *		= A leftjoin (B leftjoin C on (Pbc)) on (Pab)
+ * and C is a subquery containing lateral references to B.  It's still safe
+ * to apply the identity, but the parser will have created those references
+ * in the form "b*" (i.e., with varnullingrels listing the A/B join), while
+ * what we will have available from the nestloop's outer side is just "b".
+ * We deal with that here by stripping the nullingrels down to what is
+ * available from the outer side according to root->curOuterRels.
+ * That fixes matters for the case of forward application of identity 3.
+ * If the identity was applied in the reverse direction, we will have
+ * subplan_params containing too few nullingrel bits rather than too many.
+ * Currently, that causes no problems because setrefs.c applies only a
+ * subset check to nullingrels in NestLoopParams, but we'd have to work
+ * harder if we ever want to tighten that check.
+ *
  * Note that we also use root->curOuterRels as an implicit parameter for
- * sanity checks.
+ * sanity checks and nullingrel adjustments.
  */
 void
 process_subquery_nestloop_params(PlannerInfo *root, List *subplan_params)
@@ -449,17 +467,19 @@ process_subquery_nestloop_params(PlannerInfo *root, List *subplan_params)
 nlp = (NestLoopParam *) lfirst(lc2);
 if (nlp->paramno == pitem->paramId)
 {
-	Assert(equal(var, nlp->paramval));
 	/* Present, so 

Re: Order changes in PG16 since ICU introduction

2023-06-09 Thread Jeff Davis
On Fri, 2023-06-09 at 14:12 +0200, Daniel Verite wrote:
> >  I implemented a compromise where initdb will 
> >  change C.UTF-8 to the built-in provider
> 
> $ initdb --locale=C.UTF-8

...

> This setup is not what the user has asked for and leads to this kind of
> wrong result:
> 
> $ psql -c "select upper('é')"
>  ?column? 
> ----------
>  é
> 
> whereas in v15 we would get the correct result 'É'.

I guess where I'm confused is: why would a user actually want their
database collation to be C.UTF-8? It's slower than C, our
implementation doesn't properly version it (as you pointed out), and
the semantics don't seem great ('Z' < 'a').

If the user specifies provider=libc, then of course we should honor
that and C.UTF-8 is a valid locale for libc.

But if they don't specify the provider, isn't it much more likely they
just don't care much about the locale, and would be happier with C? 

Perhaps there's some better compromise here than the one I picked, but
I see this as a fairly small problem in comparison to the big problems
that we're solving.


> In general about the evolution of the patchset, your interpretation
> of "defaulting to ICU" seems to be "avoid libc at any cost", which
> IMV
> is unreasonably user-hostile.

The user can easily get libc behavior by specifying --locale-provider=libc,
so I don't see how you reached this conclusion.


Let me try to understand and address the points you raised here[1] in
more detail:

It looks like you are fine with 0003 applying LOCALE to whatever
provider is chosen, but you'd like to be smarter about choosing the
provider and to choose libc in at least some cases.

That is actually very much like option #2 in the list I presented
here[2], and has the same problems. How should the following behave?

  initdb --locale=C --lc-collate=fr_FR.utf8
  initdb --locale=en --lc-collate=fr_FR.utf8

If we switch to libc in the first case, then --locale will be ignored
and the collation will be fr_FR.utf8. But we will leave the second case
as ICU and the collation will be "en". I'm sure we can come up with
something there, but it feels like there's more room for confusion
along this path, and the builtin provider seems cleaner.

You also suggested that we consider switching the provider to libc any
time ICU doesn't support something. I'm not sure whether you meant a
static list (C, C.UTF-8, POSIX, ...?) or some kind of dynamic test. I'm
skeptical of being too smart here, but I'd like to hear what you mean.
I'm also not clear whether you think we should abandon the built-in
provider, or still select it for C/POSIX.

Regards,
Jeff Davis

[1]
https://www.postgresql.org/message-id/7de2dc15-5211-45b3-afcb-71dcaf7a0...@manitou-mail.org

[2]
https://www.postgresql.org/message-id/daa9f060aa2349ebc8515efece49e7b32c5d.ca...@j-davis.com





Re: Let's make PostgreSQL multi-threaded

2023-06-09 Thread Matthias van de Meent
On Fri, 9 Jun 2023 at 17:20, Dave Cramer  wrote:
>
> This is somewhat orthogonal to the topic of threading but relevant to the use 
> of resources.
>
> If we are going to undertake some hard problems perhaps we should be looking 
> at other problems that solve other long term issues before we commit to 
> spending resources on changing the process model.

-1. This and that are orthogonal and effort in one does not need to
block the other. If someone is willing to put in the effort, let them.
Last time I checked we, as a project, are not blocking bugfixes for
new features in MAIN either (or vice versa).

> One thing I can think of is upgrading. AFAIK dump and restore is the only way 
> to change the on disk format.
> Presuming that eventually we will be forced to change the on disk format it 
> would be nice to be able to do so in a manner which does not force long down 
> times

I agree that we should improve our upgrade process (and we had a great
discussion on the topic at the PGCon Unconference last week), but in
my view that's not relevant to this discussion.

Kind regards,

Matthias van de Meent
Neon, Inc.




Re: Let's make PostgreSQL multi-threaded

2023-06-09 Thread Dave Cramer
This is somewhat orthogonal to the topic of threading but relevant to the
use of resources.

If we are going to undertake some hard problems perhaps we should be
looking at other problems that solve other long term issues before we
commit to spending resources on changing the process model.

One thing I can think of is upgrading. AFAIK dump and restore is the only
way to change the on disk format.
Presuming that eventually we will be forced to change the on disk format it
would be nice to be able to do so in a manner which does not force long
down times

 Dave

>

Re: Use COPY for populating all pgbench tables

2023-06-09 Thread Tristan Partin
David,

I think you should submit this patch standalone. I don't see any reason
this shouldn't be reviewed and committed when fully fleshed out.

-- 
Tristan Partin
Neon (https://neon.tech)




Re: Use COPY for populating all pgbench tables

2023-06-09 Thread Tristan Partin
On Fri Jun 9, 2023 at 8:24 AM CDT, Gregory Smith wrote:
> On Tue, May 23, 2023 at 1:33 PM Tristan Partin  wrote:
>
> > We (Neon) have noticed that pgbench can be quite slow to populate data
> > in regard to higher latency connections. Higher scale factors exacerbate
> > this problem. Some employees work on a different continent than the
> > databases they might be benchmarking. By moving pgbench to use COPY for
> > populating all tables, we can reduce some of the time pgbench takes for
> > this particular step.
> >
>
> When latency is continent size high, pgbench should be run with server-side
> table generation instead of using COPY at all, for any table.  The default
> COPY based pgbench generation is only intended for testing where the client
> and server are very close on the network.
>
> Unfortunately there's no simple command line option to change just that one
> thing about how pgbench runs.  You have to construct a command line that
> documents each and every step you want instead.  You probably just want
> this form:
>
> $ pgbench -i -I dtGvp -s 500
>
> That's server-side table generation with all the usual steps.  I use this
> instead of COPY in pgbench-tools so much now, basically whenever I'm
> talking to a cloud system, that I have a simple 0/1 config option to switch
> between the modes, and this long weird one is the default now.
>
> Try that out, and once you see the numbers my bet is you'll see extending
> which tables get COPY isn't needed by your use case anymore.  Basically, if
> you are close enough to use COPY instead of server-side generation, you are
> close enough that every table besides accounts will not add up to enough
> time to worry about optimizing the little ones.

Thanks for your input Greg. I'm sure you're correct that server-side data
generation would probably fix the problem. I guess I am trying to
understand if there are any downsides to just committing this anyway. I
haven't done any more testing, but David's email only showed a 4%
performance drop in the workload he tested. Combine this with his hex
patch and we would see an overall performance improvement when
everything is said and done.

It seems like this patch would still be good for client-side high scale
factor data generation (when server and client are close), but I would
need to do testing to confirm.

-- 
Tristan Partin
Neon (https://neon.tech)




Re: Views no longer in rangeTabls?

2023-06-09 Thread David Steele

Hi Amit,

On 6/9/23 14:25, Amit Langote wrote:
> On Fri, Jun 9, 2023 at 17:28 David Steele wrote:
>
>> In prior versions of Postgres, views were listed in rangeTabls when
>> ExecutorCheckPerms_hook() was called but in PG16 the views are no longer
>> in this list.

> I’m not exactly sure how pgAudit’s code is searching for view relations
> in the range table, but if the code involves filtering on rtekind ==
> RTE_RELATION, then yes, such code won’t find views anymore. That’s
> because the rewriter no longer adds extraneous RTE_RELATION RTEs for
> views into the range table. Views are still there, it’s just that their
> RTEs are of kind RTE_SUBQUERY, but they do contain some RELATION fields
> like relid, rellockmode, etc.  So an extension hook’s relation RTE
> filtering code should also consider relid, not just rtekind.


Thank you, this was very helpful. I am able to get the expected result 
now with:


/* We only care about tables/views and can ignore subqueries, etc. */
if (!(rte->rtekind == RTE_RELATION ||
 (rte->rtekind == RTE_SUBQUERY && OidIsValid(rte->relid))))
continue;

One thing, though, rte->relkind is not set for views, so I still need to 
call get_rel_relkind(rte->relid). Not a big deal, but do you think it 
would make sense to set rte->relkind for views?
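
Just to make the workaround concrete, this is roughly what it looks like
in isolation (a minimal sketch only; the helper name rte_relkind is made
up here, not pgAudit's actual code):

#include "postgres.h"
#include "nodes/parsenodes.h"
#include "utils/lsyscache.h"

/* Sketch: views arrive as RTE_SUBQUERY with a valid relid but with
 * relkind left unset, so fall back to a syscache lookup for them. */
static char
rte_relkind(RangeTblEntry *rte)
{
	if (rte->rtekind == RTE_RELATION)
		return rte->relkind;
	if (rte->rtekind == RTE_SUBQUERY && OidIsValid(rte->relid))
		return get_rel_relkind(rte->relid);	/* always a view here */
	return '\0';				/* plain subquery, function, CTE, ... */
}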


> Perhaps, we are missing a comment near the hook definition mentioning
> this detail about views.


I don't see any meaningful comments near the hook definition. That would 
certainly be helpful.


Thanks!
-David




Re: Meson build updates

2023-06-09 Thread Tristan Partin
Received a review from a Meson maintainer. Here is a v2.

-- 
Tristan Partin
Neon (https://neon.tech)
From 1ebd8acb56eb0227b09bd7536e1c88ba0059c7ad Mon Sep 17 00:00:00 2001
From: Tristan Partin 
Date: Tue, 16 May 2023 07:55:03 -0500
Subject: [PATCH v2 01/17] Remove triple-quoted strings

Triple-quoted strings are for multiline strings in Meson. None of the
descriptions that got changed were multiline and the entire file uses
single-line descriptions.
---
 meson_options.txt | 10 +-
 1 file changed, 5 insertions(+), 5 deletions(-)

diff --git a/meson_options.txt b/meson_options.txt
index 5b44a8829d..1ea9729dc2 100644
--- a/meson_options.txt
+++ b/meson_options.txt
@@ -10,19 +10,19 @@ option('blocksize', type : 'combo',
 option('wal_blocksize', type : 'combo',
   choices: ['1', '2', '4', '8', '16', '32', '64'],
   value: '8',
-  description : '''WAL block size, in kilobytes''')
+  description : 'WAL block size, in kilobytes')
 
 option('segsize', type : 'integer', value : 1,
-  description : '''Segment size, in gigabytes''')
+  description : 'Segment size, in gigabytes')
 
 option('segsize_blocks', type : 'integer', value: 0,
-  description : '''Segment size, in blocks''')
+  description : 'Segment size, in blocks')
 
 
 # Miscellaneous options
 
 option('krb_srvnam', type : 'string', value : 'postgres',
-  description : '''Default Kerberos service principal for GSSAPI''')
+  description : 'Default Kerberos service principal for GSSAPI')
 
 option('system_tzdata', type: 'string', value: '',
   description: 'use system time zone data in specified directory')
@@ -32,7 +32,7 @@ option('system_tzdata', type: 'string', value: '',
 
 option('pgport', type : 'integer', value : 5432,
   min: 1, max: 65535,
-  description : '''Default port number for server and clients''')
+  description : 'Default port number for server and clients')
 
 
 # Developer options
-- 
Tristan Partin
Neon (https://neon.tech)

From ecf70897974b5651575cf7b0f729c99fd436c976 Mon Sep 17 00:00:00 2001
From: Tristan Partin 
Date: Tue, 16 May 2023 08:03:31 -0500
Subject: [PATCH v2 02/17] Use consistent casing in Meson option descriptions

Meson itself uses capital letters for option descriptions, so follow
that.
---
 meson_options.txt | 90 +++
 1 file changed, 45 insertions(+), 45 deletions(-)

diff --git a/meson_options.txt b/meson_options.txt
index 1ea9729dc2..fa823fd088 100644
--- a/meson_options.txt
+++ b/meson_options.txt
@@ -5,7 +5,7 @@
 option('blocksize', type : 'combo',
   choices : ['1', '2', '4', '8', '16', '32'],
   value : '8',
-  description: 'set relation block size in kB')
+  description: 'Set relation block size in kB')
 
 option('wal_blocksize', type : 'combo',
   choices: ['1', '2', '4', '8', '16', '32', '64'],
@@ -25,7 +25,7 @@ option('krb_srvnam', type : 'string', value : 'postgres',
   description : 'Default Kerberos service principal for GSSAPI')
 
 option('system_tzdata', type: 'string', value: '',
-  description: 'use system time zone data in specified directory')
+  description: 'Use system time zone data in specified directory')
 
 
 # Defaults
@@ -38,7 +38,7 @@ option('pgport', type : 'integer', value : 5432,
 # Developer options
 
 option('cassert', type : 'boolean', value: false,
-  description: 'enable assertion checks (for debugging)')
+  description: 'Enable assertion checks (for debugging)')
 
 option('tap_tests', type : 'feature', value : 'auto',
   description : 'Whether to enable tap tests')
@@ -47,43 +47,43 @@ option('PG_TEST_EXTRA', type : 'string', value: '',
   description: 'Enable selected extra tests')
 
 option('atomics', type : 'boolean', value: true,
-  description: 'whether to use atomic operations')
+  description: 'Whether to use atomic operations')
 
 option('spinlocks', type : 'boolean', value: true,
-  description: 'whether to use spinlocks')
+  description: 'Whether to use spinlocks')
 
 
 # Compilation options
 
 option('extra_include_dirs', type : 'array', value: [],
-  description: 'non-default directories to be searched for headers')
+  description: 'Non-default directories to be searched for headers')
 
 option('extra_lib_dirs', type : 'array', value: [],
-  description: 'non-default directories to be searched for libs')
+  description: 'Non-default directories to be searched for libs')
 
 option('extra_version', type : 'string', value: '',
-  description: 'append STRING to the PostgreSQL version number')
+  description: 'Append STRING to the PostgreSQL version number')
 
 option('darwin_sysroot', type : 'string', value: '',
-  description: 'select a non-default sysroot path')
+  description: 'Select a non-default sysroot path')
 
 option('rpath', type : 'boolean', value: true,
-  description: 'whether to embed shared library search path in executables')
+  description: 'Whether to embed shared library search path in executables')
 
 
 # External dependencies
 
 option('bonjour', type : 'feature', value: 'auto',
-  description: 'build with 

Re: Partial aggregates pushdown

2023-06-09 Thread Alexander Pyhalov

Hi.

+   An aggregate function, called the partial aggregate function for partial aggregate
+   that corresponding to the aggregate function, is defined on the primary node and
+   the postgres_fdw node.

Something is clearly wrong here.

+   When using built-in sharding feature in PostgreSQL is used,

And here.

Overall the code looks good to me, but I suppose the documentation
needs further review from a native speaker.



--
Best regards,
Alexander Pyhalov,
Postgres Professional




Re: Use COPY for populating all pgbench tables

2023-06-09 Thread Gregory Smith
On Tue, May 23, 2023 at 1:33 PM Tristan Partin  wrote:

> We (Neon) have noticed that pgbench can be quite slow to populate data
> in regard to higher latency connections. Higher scale factors exacerbate
> this problem. Some employees work on a different continent than the
> databases they might be benchmarking. By moving pgbench to use COPY for
> populating all tables, we can reduce some of the time pgbench takes for
> this particular step.
>

When latency is continent size high, pgbench should be run with server-side
table generation instead of using COPY at all, for any table.  The default
COPY based pgbench generation is only intended for testing where the client
and server are very close on the network.

Unfortunately there's no simple command line option to change just that one
thing about how pgbench runs.  You have to construct a command line that
documents each and every step you want instead.  You probably just want
this form:

$ pgbench -i -I dtGvp -s 500

That's server-side table generation with all the usual steps.  I use this
instead of COPY in pgbench-tools so much now, basically whenever I'm
talking to a cloud system, that I have a simple 0/1 config option to switch
between the modes, and this long weird one is the default now.

Try that out, and once you see the numbers my bet is you'll see extending
which tables get COPY isn't needed by your use case anymore.  Basically, if
you are close enough to use COPY instead of server-side generation, you are
close enough that every table besides accounts will not add up to enough
time to worry about optimizing the little ones.

--
Greg Smith  greg.sm...@crunchydata.com
Director of Open Source Strategy


Re: Major pgbench synthetic SELECT workload regression, Ubuntu 23.04+PG15

2023-06-09 Thread Gregory Smith
On Fri, Jun 9, 2023 at 4:06 AM Gurjeet Singh  wrote:

> There is no mention of perf or similar utilities in pgbench-tools
> docs. I'm guessing Linux is the primary platform pgbench-tools gets
> used on most. If so, I think it'd be useful to mention these tools and
> snippets in there to make others lives easier.


That's a good idea.  I've written out guides multiple times for customers
who are crashing and need to learn about stack traces, to help them become
self-sufficient with the troubleshooting parts it's impractical for me to
do for them.  If I can talk people through gdb, I can teach them perf.  I
have a lot of time to work on pgbench-tools set aside this summer, gonna
finally deprecate the gnuplot backend and make every graph as nice as the
ones I shared here.

I haven't been aggressive about pushing perf because a lot of customers at
Crunchy--a disproportionately larger number than typical I suspect--have
operations restrictions that just don't allow DBAs direct access to a
server's command line.  So perf commands are just out of reach before we
even get to the permissions it requires.  I may have to do something really
wild to help them, like see if the right permissions setup would allow
PL/python3 or similar to orchestrate a perf session in a SQL function.


Re: Order changes in PG16 since ICU introduction

2023-06-09 Thread Daniel Verite
Jeff Davis wrote:

>  I implemented a compromise where initdb will 
>  change C.UTF-8 to the built-in provider

This handling of C.UTF-8 would be felt by users as simply broken.

With the v10 patches:

$ initdb --locale=C.UTF-8

initdb: using locale provider "builtin" for ICU locale "C.UTF-8"
The database cluster will be initialized with this locale configuration:
  default collation provider:  builtin
  default collation locale:C
  LC_COLLATE:  C.UTF-8
  LC_CTYPE:C.UTF-8

This setup is not what the user has asked for and leads to this kind of
wrong result:

$ psql -c "select upper('é')"
 ?column? 
----------
 é

whereas in v15 we would get the correct result 'É'.


Then once inside that cluster, trying to create a database:

  postgres=# create database test locale='C.UTF-8';
  ERROR:  locale provider "builtin" does not support locale "C.UTF-8"
  HINT:  The built-in locale provider only supports the "C" and "POSIX"
locales.


That hardly makes sense considering that initdb stated the opposite,
that the "built-in provider" was adequate for C.UTF-8


In general about the evolution of the patchset, your interpretation
of "defaulting to ICU" seems to be "avoid libc at any cost", which IMV
is unreasonably user-hostile.



Best regards,
-- 
Daniel Vérité
https://postgresql.verite.pro/
Twitter: @DanielVerite




Re: Do we want a hashset type?

2023-06-09 Thread Joel Jacobson
On Fri, Jun 9, 2023, at 13:33, jian he wrote:
> Hi, I am quite new to C.
> I have 3 questions about the following function.
> 1. 7691,4201, I assume they are just random prime ints?

Yes, 7691 and 4201 are likely chosen as random prime numbers.
In hash functions, prime numbers are often used to help in evenly distributing
the hash values across the range and reduce the chance of collisions.

> 2. I don't get the last return set, even though the return type should be bool.

Thanks, you found a mistake!

The line

return set;

is actually unreachable and should be removed.
The function will always return either true or false within the while loop and
never reach the final return statement.

I've attached a new incremental patch with this fix.

> 3. I don't understand 13 in hash = (hash + 13) % set->maxelements;

The value 13 is used for linear probing [1] in handling hash collisions.
Linear probing sequentially checks the next slot in the array when a collision
occurs. 13, being a small prime number not near a power of 2, helps in uniformly
distributing data and ensuring that all slots are probed, as it's relatively
prime to the hash table size.

Hm, I realise we actually don't ensure the hash table size and step size (13)
are coprime. I've fixed that in the attached patch as well.
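
One simple way to ensure that (a sketch only; the attached patch may do it
differently): since 13 is prime, it is coprime to any table size that is not
itself a multiple of 13, so it suffices to avoid such sizes:

    /* keep maxelements coprime to the probe step 13 */
    if (set->maxelements % 13 == 0)
        set->maxelements++;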

[1] https://en.wikipedia.org/wiki/Linear_probing

/Joel

hashset-1.0.0-joel-0002.patch
Description: Binary data


Re: Do we want a hashset type?

2023-06-09 Thread jian he
On Fri, Jun 9, 2023 at 6:58 PM Joel Jacobson  wrote:

> On Thu, Jun 8, 2023, at 12:19, Tomas Vondra wrote:
> > Would you be interested in helping with / working on some of that? I
> > don't have immediate need for this stuff, so it's not very high on my
> > TODO list.
>
> Sure, I'm willing to help!
>
> I've attached a patch that works on some of the items on your list,
> including some additions to the README.md.
>
> There were a bunch of places where `maxelements / 8` caused bugs,
> that had to be changed to do proper integer ceiling division:
>
> -   values = (int32 *) (set->data + set->maxelements / 8);
> +   values = (int32 *) (set->data + (set->maxelements + 7) / 8);
>
> Side note: I wonder if it would be good to add CEIL_DIV and FLOOR_DIV
> macros
> to the PostgreSQL source code in general, since it's easy to make this
> mistake,
> and quite verbose/error-prone to write it out manually everywhere.
> Such macros could simplify code in e.g. numeric.c.
>
> > There's a bunch of stuff that needs to be improved to make this properly
> > usable, like:
> >
> > 1) better hash table implementation
> TODO
>
> > 2) input/output functions
> I've attempted to implement these.
> I thought comma-separated values wrapped in curly braces felt like the
> most natural format,
> example:
> SELECT '{1,2,3}'::hashset;
>
> > 3) support for other types (now it only works with int32)
> TODO
>
> > 4) I wonder if this might be done as an array-like polymorphic type.
> That would be nice!
> I guess the work-around would be to store the actual value of non-int type
> in a lookup table, and then hash the int-based primary key in such a table.
>
> Do you think later implementing polymorphic type support would
> mean a more or less complete rewrite, or can we carry on with int32-support
> and add it later on?
>
> > 5) more efficient storage format, with versioning etc.
> TODO
>
> > 6) regression tests
> I've added some regression tests.
>
> > Right. IMHO the query language is a separate thing, you still need to
> > evaluate the query somehow - which is where hashset applies.
>
> Good point, I fully agree.
>
> /Joel




Hi, I am quite new to C.
I have 3 questions about the following function.
1. 7691,4201, I assume they are just random prime ints?
2. I don't get the last return set, even though the return type should be bool.
3. I don't understand 13 in hash = (hash + 13) % set->maxelements;

static bool
hashset_contains_element(hashset_t *set, int32 value)
{
    int     byte;
    int     bit;
    uint32  hash;
    char   *bitmap;
    int32  *values;

    hash = ((uint32) value * 7691 + 4201) % set->maxelements;

    bitmap = set->data;
    values = (int32 *) (set->data + set->maxelements / 8);

    while (true)
    {
        byte = (hash / 8);
        bit = (hash % 8);

        /* found an empty slot, value is not there */
        if ((bitmap[byte] & (0x01 << bit)) == 0)
            return false;

        /* is it the same value? */
        if (values[hash] == value)
            return true;

        /* move to the next element */
        hash = (hash + 13) % set->maxelements;
    }

    return set;
}


Re: Views no longer in rangeTabls?

2023-06-09 Thread Amit Langote
Hi David,

On Fri, Jun 9, 2023 at 17:28 David Steele  wrote:

> Hackers,
>
> While updating pgAudit for PG16 I found the following (from our
> perspective) regression.
>
> In prior versions of Postgres, views were listed in rangeTabls when
> ExecutorCheckPerms_hook() was called but in PG16 the views are no longer
> in this list.


I’m not exactly sure how pgAudit’s code is searching for view relations in
the range table, but if the code involves filtering on rtekind ==
RTE_RELATION, then yes, such code won’t find views anymore. That’s because
the rewriter no longer adds extraneous RTE_RELATION RTEs for views into the
range table. Views are still there, it’s just that their RTEs are of kind
RTE_SUBQUERY, but they do contain some RELATION fields like relid,
rellockmode, etc.  So an extension hook’s relation RTE filtering code
should also consider relid, not just rtekind.
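
In rough terms, something like this (a sketch typed from memory, not copied
from core or pgAudit; audit_relation() is a stand-in for whatever the
extension does per relation):

    foreach(lc, rangeTable)
    {
        RangeTblEntry *rte = lfirst_node(RangeTblEntry, lc);

        if (rte->rtekind == RTE_RELATION)
            audit_relation(rte);            /* plain relations */
        else if (rte->rtekind == RTE_SUBQUERY && OidIsValid(rte->relid))
            audit_relation(rte);            /* views, as of PG16 */
    }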

I’m away from a computer atm, so I am not able to easily copy-paste an
example of that from the core code, but maybe you can search for code sites
that need to filter out relation RTEs from the range table.

Perhaps we are missing a comment near the hook definition mentioning this
detail about views.

> --
Thanks, Amit Langote
EDB: http://www.enterprisedb.com


Re: Do we want a hashset type?

2023-06-09 Thread Joel Jacobson
On Thu, Jun 8, 2023, at 12:19, Tomas Vondra wrote:
> Would you be interested in helping with / working on some of that? I
> don't have immediate need for this stuff, so it's not very high on my
> TODO list.

Sure, I'm willing to help!

I've attached a patch that works on some of the items on your list,
including some additions to the README.md.

There were a bunch of places where `maxelements / 8` caused bugs,
that had to be changed to do proper integer ceiling division:

-   values = (int32 *) (set->data + set->maxelements / 8);
+   values = (int32 *) (set->data + (set->maxelements + 7) / 8);

Side note: I wonder if it would be good to add CEIL_DIV and FLOOR_DIV macros
to the PostgreSQL source code in general, since it's easy to make this mistake,
and quite verbose/error-prone to write it out manually everywhere.
Such macros could simplify code in e.g. numeric.c.
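
For instance (a sketch of the proposed macros; nothing like this exists in
the tree today):

    #define CEIL_DIV(a, b)   (((a) + (b) - 1) / (b))
    #define FLOOR_DIV(a, b)  ((a) / (b))

The bitmap-offset computation above would then read:

    values = (int32 *) (set->data + CEIL_DIV(set->maxelements, 8));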

> There's a bunch of stuff that needs to be improved to make this properly
> usable, like:
>
> 1) better hash table implementation
TODO

> 2) input/output functions
I've attempted to implement these.
I thought comma-separated values wrapped in curly braces felt like the most
natural format,
example:
SELECT '{1,2,3}'::hashset;

> 3) support for other types (now it only works with int32)
TODO

> 4) I wonder if this might be done as an array-like polymorphic type.
That would be nice!
I guess the work-around would be to store the actual value of non-int type
in a lookup table, and then hash the int-based primary key in such a table.

Do you think later implementing polymorphic type support would
mean a more or less complete rewrite, or can we carry on with int32-support
and add it later on?

> 5) more efficient storage format, with versioning etc.
TODO

> 6) regression tests
I've added some regression tests.

> Right. IMHO the query language is a separate thing, you still need to
> evaluate the query somehow - which is where hashset applies.

Good point, I fully agree.

/Joel

hashset-1.0.0-joel-0001.patch
Description: Binary data


Re: index prefetching

2023-06-09 Thread Tomas Vondra



On 6/9/23 01:38, Peter Geoghegan wrote:
> On Thu, Jun 8, 2023 at 3:17 PM Tomas Vondra
>  wrote:
>> Normal index scans are an even more interesting case but I'm not
>> sure how hard it would be to get that information. It may only be
>> convenient to get the blocks from the last leaf page we looked at,
>> for example.
>>
>> So this suggests we simply started prefetching for the case where the
>> information was readily available, and it'd be harder to do for index
>> scans so that's it.
> 
> What the exact historical timeline is may not be that important. My
> emphasis on ScalarArrayOpExpr is partly due to it being a particularly
> compelling case for both parallel index scan and prefetching, in
> general. There are many queries that have huge in() lists that
> naturally benefit a great deal from prefetching. Plus they're common.
> 

Did you mean parallel index scan or bitmap index scan?

But yeah, I get the point that SAOP queries are an interesting example
of queries to explore. I'll add some to the next round of tests.

>> Even if SAOP (probably) wasn't the reason, I think you're right it may
>> be an issue for prefetching, causing regressions. It didn't occur to me
>> before, because I'm not that familiar with the btree code and/or how it
>> deals with SAOP (and didn't really intend to study it too deeply).
> 
> I'm pretty sure that you understand this already, but just in case:
> ScalarArrayOpExpr doesn't even "get the blocks from the last leaf
> page" in many important cases. Not really -- not in the sense that
> you'd hope and expect. We're senselessly processing the same index
> leaf page multiple times and treating it as a different, independent
> leaf page. That makes heap prefetching of the kind you're working on
> utterly hopeless, since it effectively throws away lots of useful
> context. Obviously that's the fault of nbtree ScalarArrayOpExpr
> handling, not the fault of your patch.
> 

I think I understand, although maybe my mental model is wrong. I agree
it seems inefficient, but I'm not sure why it would make prefetching
hopeless. Sure, it puts index scans at a disadvantage (compared to
bitmap scans), but if we pick an index scan it should still be an
improvement, right?

I guess I need to do some testing on a range of data sets / queries, and
see how it works in practice.

>> So if you're planning to work on this for PG17, collaborating on it
>> would be great.
>>
>> For now I plan to just ignore SAOP, or maybe just disabling prefetching
>> for SAOP index scans if it proves to be prone to regressions. That's not
>> great, but at least it won't make matters worse.
> 
> Makes sense, but I hope that it won't come to that.
> 
> IMV it's actually quite reasonable that you didn't expect to have to
> think about ScalarArrayOpExpr at all -- it would make a lot of sense
> if that was already true. But the fact is that it works in a way
> that's pretty silly and naive right now, which will impact
> prefetching. I wasn't really thinking about regressions, though. I was
> actually more concerned about missing opportunities to get the most
> out of prefetching. ScalarArrayOpExpr really matters here.
> 

OK

>> I guess something like this might be a "nice" bad case:
>>
>> insert into btree_test select mod(i,10), md5(i::text)
>>   from generate_series(1, $ROWS) s(i)
>>
>> select * from btree_test where a in (999, 1000, 1001, 1002)
>>
>> The values are likely colocated on the same heap page, the bitmap scan
>> is going to do a single prefetch. With index scan we'll prefetch them
>> repeatedly. I'll give it a try.
> 
> This is the sort of thing that I was thinking of. What are the
> conditions under which bitmap index scan starts to make sense? Why is
> the break-even point whatever it is in each case, roughly? And, is it
> actually because of laws-of-physics level trade-off? Might it not be
> due to implementation-level issues that are much less fundamental? In
> other words, might it actually be that we're just doing something
> stoopid in the case of plain index scans? Something that is just
> papered-over by bitmap index scans right now?
> 

Yeah, that's partially why I do this kind of testing on a wide range of
synthetic data sets - to find cases that behave in unexpected way (say,
seem like they should improve but don't).

> I see that your patch has logic that avoids repeated prefetching of
> the same block -- plus you have comments that wonder about going
> further by adding a "small lru array" in your new index_prefetch()
> function. I asked you about this during the unconference presentation.
> But I think that my understanding of the situation was slightly
> different to yours. That's relevant here.
> 
> I wonder if you should go further than this, by actually sorting the
> items that you need to fetch as part of processing a given leaf page
> (I said this at the unconference, you may recall). Why should we
> *ever* pin/access the same heap page more than once per leaf page?

Re: Error in calculating length of encoded base64 string

2023-06-09 Thread Dagfinn Ilmari Mannsåker
Tom Lane  writes:

> Gurjeet Singh  writes:
>> On Thu, Jun 8, 2023 at 7:35 AM Tom Lane  wrote:
>>> This bug is very ancient, dating to commit 79d78bb26 which
>>> added encode.c.  (The other instances were presumably copied
>>> from there.)  Still, it doesn't quite seem worth back-patching.
>
>> Is it worth investing time in trying to unify these 3 occurrences of
>> base64 length (and possibly other relevant) code to one place? If yes,
>> I can volunteer for it.
>
> I wondered about that too.  It seems really silly that we made
> a copy in src/common and did not replace the others with calls
> to that.

Also, while we're at it, how about some unit tests that both encode and
calculate the encoded length of strings of various lengths and check
that they match?
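
Something along these lines, perhaps (a rough sketch against the src/common
API; the harness around it is hypothetical):

    /* check pg_b64_enc_len() against what pg_b64_encode() actually emits */
    for (int srclen = 0; srclen < 1024; srclen++)
    {
        char   *src = palloc(srclen + 1);
        int     estimated = pg_b64_enc_len(srclen);
        char   *dst = palloc(estimated + 1);
        int     actual;

        memset(src, 'x', srclen);
        actual = pg_b64_encode(src, srclen, dst, estimated);

        if (actual != estimated)
            elog(WARNING, "length mismatch at %d: estimated %d, actual %d",
                 srclen, estimated, actual);
    }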

- ilmari




Re: index prefetching

2023-06-09 Thread Tomas Vondra
On 6/9/23 02:06, Andres Freund wrote:
> Hi,
> 
> On 2023-06-08 17:40:12 +0200, Tomas Vondra wrote:
>> At pgcon unconference I presented a PoC patch adding prefetching for
>> indexes, along with some benchmark results demonstrating the (pretty
>> significant) benefits etc. The feedback was quite positive, so let me
>> share the current patch more widely.
> 
> I'm really excited about this work.
> 
> 
>> 1) pairing-heap in GiST / SP-GiST
>>
>> For most AMs, the index state is pretty trivial - matching items from a
>> single leaf page. Prefetching that is pretty trivial, even if the
>> current API is a bit cumbersome.
>>
>> Distance queries on GiST and SP-GiST are a problem, though, because
>> those do not just read the pointers into a simple array, as the distance
>> ordering requires passing stuff through a pairing-heap :-(
>>
>> I don't know how to best deal with that, especially not in the simple
>> API. I don't think we can "scan forward" stuff from the pairing heap, so
>> the only idea I have is actually having two pairing-heaps. Or maybe
>> using the pairing heap for prefetching, but stashing the prefetched
>> pointers into an array and then returning stuff from it.
>>
>> In the patch I simply prefetch items before we add them to the pairing
>> heap, which is good enough for demonstrating the benefits.
> 
> I think it'd be perfectly fair to just not tackle distance queries for now.
> 

My concern is that if we cut this from v0 entirely, we'll end up with an
API that'll not be suitable for adding distance queries later.

> 
>> 2) prefetching from executor
>>
>> Another question is whether the prefetching shouldn't actually happen
>> even higher - in the executor. That's what Andres suggested during the
>> unconference, and it kinda makes sense. That's where we do prefetching
>> for bitmap heap scans, so why should this happen lower, right?
> 
> Yea. I think it also provides potential for further optimizations in the
> future to do it at that layer.
> 
> One thing I have been wondering around this is whether we should not have
> split the code for IOS and plain indexscans...
> 

Which code? We already have nodeIndexscan.c and nodeIndexonlyscan.c? Or
did you mean something else?

> 
>> 4) per-leaf prefetching
>>
>> The code is restricted to only prefetch items from one leaf page. If the
>> index scan needs to scan multiple (many) leaf pages, we have to process
>> the first leaf page first before reading / prefetching the next one.
>>
>> I think this is acceptable limitation, certainly for v0. Prefetching
>> across multiple leaf pages seems way more complex (particularly for the
>> cases using pairing heap), so let's leave this for the future.
> 
> Hm. I think that really depends on the shape of the API we end up with. If we
> move the responsibility more towards the executor, I think it very well
> could end up being just as simple to prefetch across index pages.
> 

Maybe. I'm open to that idea if you have an idea of how to shape the API to
make this possible (although perhaps not in v0).

> 
>> 5) index-only scans
>>
>> I'm not sure what to do about index-only scans. On the one hand, the
>> point of IOS is not to read stuff from the heap at all, so why prefetch
>> it. OTOH if there are many allvisible=false pages, we still have to
>> access that. And if that happens, this leads to the bizarre situation
>> that IOS is slower than regular index scan. But to address this, we'd
>> have to consider the visibility during prefetching.
> 
> That should be easy to do, right?
> 

It doesn't seem particularly complicated (famous last words), and we
need to do the VM checks anyway so it seems like it wouldn't add a lot
of overhead either.
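
Roughly like this, I imagine (a sketch only, reusing the existing
visibilitymap.h macro; where exactly the check would live is an open
question):

    /* only prefetch heap pages that are not all-visible */
    if (!VM_ALL_VISIBLE(heapRel, blkno, &vmbuffer))
        PrefetchBuffer(heapRel, MAIN_FORKNUM, blkno);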

> 
> 
>> Benchmark / TPC-H
>> -
>>
>> I ran the 22 queries on 100GB data set, with parallel query either
>> disabled or enabled. And I measured timing (and speedup) for each query.
>> The speedup results look like this (see the attached PDF for details):
>>
>> query    serial    parallel
>> 1  101% 99%
>> 2  119%100%
>> 3  100% 99%
>> 4  101%100%
>> 5  101%100%
>> 6   12% 99%
>> 7  100%100%
>> 8   52% 67%
>> 10 102%101%
>> 11 100% 72%
>> 12 101%100%
>> 13 100%101%
>> 14  13%100%
>> 15 101%100%
>> 16  99% 99%
>> 17  95%101%
>> 18 101%106%
>> 19  30% 40%
>> 20  99%100%
>> 21 101%100%
>> 22 101%107%
>>
>> The percentage is (timing patched / master), so <100% means faster, >100%
>> means slower.
>>
>> The different queries are affected depending on the query plan - many
>> queries are close to 100%, which means "no difference". For the serial
>> case, there

Re: Introduce WAIT_EVENT_EXTENSION and WAIT_EVENT_BUFFER_PIN

2023-06-09 Thread Drouvot, Bertrand

Hi,

On 6/9/23 11:20 AM, Masahiro Ikeda wrote:

Hi,

On 2023-06-09 13:26, Drouvot, Bertrand wrote:

Hi,

On 6/9/23 1:15 AM, Michael Paquier wrote:

On Thu, Jun 08, 2023 at 10:57:55AM +0900, Masahiro Ikeda wrote:

(Excuse me for cutting in, and this is not directly related to the thread.)
+1. I'm interested in the feature.

This is just an example and it would probably be useful for other users. IMO,
at least, it's better to improve the current design, in which the "Extension"
wait event type has only the "Extension" wait event.


I hope that nobody would counter-argue you here.  In my opinion, we
should just introduce an API that allows extensions to retrieve wait
event numbers that are allocated by the backend under
PG_WAIT_EXTENSION, in a fashion similar to GetNamedLWLockTranche().
Say something like:
int GetExtensionWaitEvent(const char *wait_event_name);


+1, that's something I have in mind to work on once/if this patch gets
committed.


Thanks for replying. If you are ok, I'll try to make a patch
to allow extensions to define custom wait events.


Great, thank you!

Regards,

--
Bertrand Drouvot
PostgreSQL Contributors Team
RDS Open Source Databases
Amazon Web Services: https://aws.amazon.com




Re: Support logical replication of DDLs

2023-06-09 Thread Ajin Cherian
On Thu, Jun 8, 2023 at 10:02 PM shveta malik  wrote:
> Please find new set of patches addressing below:
> a) Issue mentioned by Wang-san in [1],
> b) Comments from Peter given in [2]
> c) Comments from Amit given in the last 2 emails.
>
> [1]: 
> https://www.postgresql.org/message-id/OS3PR01MB62750D43D4F7F075B33BD2609E52A%40OS3PR01MB6275.jpnprd01.prod.outlook.com
> [2]: 
> https://www.postgresql.org/message-id/CAHut%2BPv9vPbUQc0fzrKmDkKOsS_bj-hup_E%2BsLHNEX%2B6F%2BSY5Q%40mail.gmail.com
>
> Thank You Vignesh for handling (a), Ajin for handling (b), Shi-san and
> Hou-san for contributing in (c).

On Fri, May 5, 2023 at 8:10 PM Peter Smith  wrote:
>
> I revisited the 0005 patch to verify all changes made to address my
> previous review comments [1][2][3][4] were OK.
>
> Not all changes were made quite as expected, and there were a few
> other things I noticed in passing.
>
> ==
>
> 1. General
>
> I previously [1] wrote a comment:
> Use consistent uppercase for JSON and DDL instead of sometimes json
> and ddl. There are quite a few random examples in the commit message
> but might be worth searching the entire patch to make all comments
> also consistent case.
>
> Now I still find a number of lowercase "json" and "ddl" strings.
>

fixed


>
> 3. Commit message
>
> Executing a non-immutable expression during the table rewrite phase is not
> allowed, as it may result in different data between publisher and subscriber.
> While some may suggest converting the rewrite inserts to updates and replicate
> them afte the ddl command to maintain data consistency. But it doesn't work if
> the replica identity column is altered in the command. This is because the
> rewrite inserts do not contain the old values and therefore cannot be 
> converted
> to update.
>
> ~
>
> 3a.
> Grammar and typo need fixing for "While some may suggest converting
> the rewrite inserts to updates and replicate them afte the ddl command
> to maintain data consistency. But it doesn't work if the replica
> identity column is altered in the command."
>
> ~
>
> 3b.
> "rewrite inserts to updates"
> Consider using uppercase for the INSERTs and UPDATEs
>
> ~~~
>
> 4.
> LIMIT:
>
> --> LIMITATIONS: (??)
>

Fixed.

>
> ==
> contrib/test_decoding/sql/ddl.sql
>
> 5.
> +SELECT 'ddl msg2' FROM pg_logical_emit_ddl_message('ddl msg2', 16394,
> 1, '{"fmt": "CREATE SCHEMA %{if_not_exists}s %{name}I
> %{authorization}s", "name": "foo", "authorization": {"fmt":
> "AUTHORIZATION %{authorization_role}I", "present": false,
> "authorization_role": null}, "if_not_exists": ""}');
>
> Previously ([4]#1) I had asked what is the point of setting a JSON
> payload here when the JSON payload is never used. You might as well
> pass the string "banana" to achieve the same thing AFAICT. I think the
> reply [5] to the question was wrong. If this faked JSON is used for
> some good reason then there ought to be a test comment to say the
> reason. Otherwise, the fake JSON just confuses the purpose of this
> test so it should be removed/simplified.
>

Added a comment explaining this.

> ==
> contrib/test_decoding/test_decoding.c
>
> 6. pg_decode_ddl_message
>
> Previously ([4] #4b) I asked if it was necessary to use
> appendBinaryStringInfo, instead of just appendStringInfo. I guess it
> doesn't matter much, but I think the question was not answered.
>

Although we're using plain strings, the API supports binary. Other plugins
could use binary.

> ==
> doc/src/sgml/catalogs.sgml
>
> 7.
> + 
> +  
> +   evtisinternal char
> +  
> +  
> +   True if the event trigger is internally generated.
> +  
> + 
>
> Why was this called a 'char' type instead of a 'bool' type?
>

fixed.

> ==
> doc/src/sgml/logical-replication.sgml
>
> 8.
> +  
> +For example, a CREATE TABLE command executed on the publisher gets
> +WAL-logged, and forwarded to the subscriber to replay; a subsequent 
> "ALTER
> +SUBSCRIPTION ... REFRESH PUBLICATION" is performed on the
> subscriber database so any
> +following DML changes on the new table can be replicated.
> +  
>
> In my previous review comments ([2] 11b) I suggested for this to say
> "then an implicit ALTER..." instead of "a subsequent ALTER...". I
> think the "implicit" part got accidentally missed.
>

fixed.

> ~~~
>
> 9.
> +
> +  
> +In ADD COLUMN ... DEFAULT clause and
> +ALTER COLUMN TYPE clause of ALTER
> +TABLE command, the functions and operators used in
> +expression must be immutable.
> +  
> +
>
> IMO this is hard to read. It might be easier if expressed as 2
> separate bullet points.
>
> SUGGESTION
> For ALTER TABLE ... ADD COLUMN ... DEFAULT, the functions and
> operators used in expressions must be immutable.
>
> For ALTER TABLE ... ADD COLUMN TYPE, the functions and operators used
> in expressions must be immutable.
>

fixed.

> ~~~
>
> 10.
> +  
> +To change the column type, first a

Re: Introduce WAIT_EVENT_EXTENSION and WAIT_EVENT_BUFFER_PIN

2023-06-09 Thread Masahiro Ikeda

Hi,

On 2023-06-09 13:26, Drouvot, Bertrand wrote:

Hi,

On 6/9/23 1:15 AM, Michael Paquier wrote:

On Thu, Jun 08, 2023 at 10:57:55AM +0900, Masahiro Ikeda wrote:
(Excuse me for cutting in, and this is not directly related to the 
thread.)

+1. I'm interested in the feature.

This is just an example and it would probably be useful for other users. IMO,
at least, it's better to improve the current design, in which the "Extension"
wait event type has only the "Extension" wait event.


I hope that nobody would counter-argue you here.  In my opinion, we
should just introduce an API that allows extensions to retrieve wait
event numbers that are allocated by the backend under
PG_WAIT_EXTENSION, in a fashion similar to GetNamedLWLockTranche().
Say something like:
int GetExtensionWaitEvent(const char *wait_event_name);


+1, that's something I have in mind to work on once/if this patch gets
committed.


Thanks for replying. If you are ok, I'll try to make a patch
to allow extensions to define custom wait events.
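
What I have in mind is that an extension would use it roughly like this
(a sketch only; GetExtensionWaitEvent() does not exist yet):

    static uint32 my_wait_event;

    void
    _PG_init(void)
    {
        /* hypothetical: look up/allocate a named wait event */
        my_wait_event = GetExtensionWaitEvent("my_extension_main");
    }

    /* ... and around the actual wait: */
    pgstat_report_wait_start(my_wait_event);
    /* wait for something */
    pgstat_report_wait_end();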

Regards,
--
Masahiro Ikeda
NTT DATA CORPORATION




Re: Inconsistent results with libc sorting on Windows

2023-06-09 Thread Daniel Verite
Juan José Santamaría Flecha wrote:

> Just to make sure we are all seeing the same problem, does the attached
> patch fix your test?

The problem of the random changes in sorting disappears for all libc
locales in pg_collation, so this is very promising.

However it persists for the default collation.


Best regards,
-- 
Daniel Vérité
https://postgresql.verite.pro/
Twitter: @DanielVerite




Remove distprep

2023-06-09 Thread Peter Eisentraut
Per discussion at the unconference[0], I started to write a patch that 
removes the make distprep target.  A more detailed explanation of the 
rationale is also in the patch.


It needs some polishing around the edges, but I wanted to put it out 
here to get things moving and avoid duplicate work.


One thing in particular that isn't clear right now is how "make world" 
should behave if the documentation tools are not found.  Maybe we should 
make a build option, like "--with-docs", to mirror the meson behavior.


[0]: 
https://wiki.postgresql.org/wiki/PgCon_2023_Developer_Unconference#Build_System

From b187ce08fe140225f9ff24bf3ae4d2e97f57221d Mon Sep 17 00:00:00 2001
From: Peter Eisentraut 
Date: Fri, 9 Jun 2023 10:53:51 +0200
Subject: [PATCH v1] Remove distprep

A PostgreSQL release tarball contains a number of prebuilt files, in
particular files produced by bison, flex, and perl, as well as html and
man documentation.  We have done this consistent with established
practice at the time to not require these tools for building from a
tarball.  Some of these tools were hard to get, or get the right
version of, from time to time, and shipping the prebuilt output was a
convenience to users.

Now this has at least two problems:

One, we have to make the build system(s) work in two modes: Building
from a git checkout and building from a tarball.  This is pretty
complicated, but it works so far for autoconf/make.  It does not
currently work for meson; you can currently only build with meson from
a git checkout.  Making meson builds work from a tarball seems very
difficult or impossible.  One particular problem is that since meson
requires a separate build directory, we cannot make the build update
files like gram.h in the source tree.  So if you were to build from a
tarball and update gram.y, you will have a gram.h in the source tree
and one in the build tree, but the way things work is that the
compiler will always use the one in the source tree.  So you cannot,
for example, make any gram.y changes when building from a tarball.
This seems impossible to fix in a non-horrible way.

Second, there is increased interest nowadays in precisely tracking the
origin of software.  We can reasonably track contributions into the
git tree, and users can reasonably track the path from a tarball to
packages and downloads and installs.  But what happens between the git
tree and the tarball is obscure and in some cases non-reproducible.

The solution for both of these issues is to get rid of the step that
adds prebuilt files to the tarball.  The tarball now only contains
what is in the git tree (*).  Getting the additional build
dependencies is no longer a problem nowadays, and the complications to
keep these dual build modes working are significant.  And of course we
want to get the meson build system working universally.

This commit removes the make distprep target altogether.  The make
dist target continues to do its job, it just doesn't call distprep
anymore.

(*) - The tarball also contains the INSTALL file that is built at make
dist time, but not by distprep.  This is unchanged for now.

The make maintainer-clean target, whose job it is to remove the
prebuilt files in addition to what make distclean does, is now just an
alias for make distclean.  (In practice, it is probably obsolete given
that git clean is available.)

The following programs are now hard build requirements in configure
(they were already required by meson.build):

- bison
- flex
- perl

FIXME: What should happen in configure if xmllint and xsltproc are not
found?
---
 GNUmakefile.in|  6 +-
 config/Makefile   |  2 -
 config/meson.build|  2 +-
 config/missing| 54 
 config/perl.m4|  9 +-
 config/programs.m4| 21 +
 configure | 62 ++
 contrib/cube/Makefile |  7 +-
 contrib/fuzzystrmatch/Makefile|  9 +-
 contrib/seg/Makefile  |  7 +-
 doc/Makefile  |  2 +-
 doc/src/Makefile  |  2 +-
 doc/src/sgml/Makefile | 31 +--
 doc/src/sgml/installation.sgml| 83 ---
 meson.build   | 56 +
 src/Makefile  |  5 +-
 src/Makefile.global.in| 32 +++
 src/backend/Makefile  | 52 ++--
 src/backend/bootstrap/Makefile|  6 +-
 src/backend/catalog/Makefile  |  8 +-
 src/backend/jit/llvm/Makefile |  2 +-
 src/backend/nls.mk|  2 +-
 src/backend/nodes/Makefile|  6 +-
 src/backend/parser/Makefile   |  8 +-
 src/backend/port/Makefile  

Re: Views no longer in rangeTabls?

2023-06-09 Thread David Steele

On 6/9/23 11:28, David Steele wrote:


It seems the thing to do here would be to scan permInfos instead, which 
works fine except that we also need access to rellockmode, which is only 
included in rangeTabls. We can add a scan of rangeTabls to get 
rellockmode when needed and we might be better off overall since 
permInfos will generally have fewer entries. I have not implemented this 
yet but it seems like it will work.


I implemented this and it does work, but it was not as straightforward
as I would have liked. To make the relationship from RTEPermissionInfo 
back to RangeTblEntry I was forced to generate my own perminfoindex.


This was not hard to do but seems a bit fragile. Perhaps we need an 
rteindex in RTEPermissionInfo? This would also avoid the need to scan 
rangeTabls.
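
For reference, the workaround looks roughly like this (a sketch of
hypothetical extension code, not the actual pgAudit change; audit_object()
is a stand-in):

    Index       permidx = 0;
    ListCell   *lc;

    foreach(lc, permInfos)
    {
        RTEPermissionInfo *perminfo = lfirst_node(RTEPermissionInfo, lc);
        ListCell   *lc2;

        permidx++;              /* perminfoindex values are 1-based */

        /* extra scan of rangeTabls, just to recover rellockmode */
        foreach(lc2, rangeTabls)
        {
            RangeTblEntry *rte = lfirst_node(RangeTblEntry, lc2);

            if (rte->perminfoindex == permidx)
            {
                audit_object(perminfo->relid, rte->rellockmode);
                break;
            }
        }
    }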


Regards,
-David




Re: Remove WindowClause PARTITION BY items belonging to redundant pathkeys

2023-06-09 Thread Richard Guo
On Fri, Jun 9, 2023 at 8:13 AM David Rowley  wrote:

> After looking again at nodeWindowAgg.c, I think it might be possible
> to do a bit more work and apply this to ORDER BY items too.  Without
> an ORDER BY clause, all rows in the partition are peers of each other,
> and if the ORDER BY column is redundant due to belonging to a
> redundant pathkey, then those rows must also be peers too since the
> redundant pathkey must mean all rows have an equal value in the
> redundant column.
>
> However, there is a case where we must be much more careful.  The
> comment you highlighted in create_windowagg_plan() does mention this.
> It reads "we must *not* remove the ordering column for RANGE OFFSET
> cases".


I see.  I tried to run the query below

select a, b, sum(a) over (order by b range between 10 preceding and current
row) from t where b = 2;
server closed the connection unexpectedly

and if we've removed redundant items from wc->orderClause, the query will
trigger the Assert in update_frameheadpos().

/* We must have an ordering column */
Assert(node->ordNumCols == 1);


>
> It might be possible to make adjustments in nodeWindowAgg.c to have
> the equality checks come out as true when there is no ORDER BY.
> update_frameheadpos() is one location that would need to be adjusted.
> It would need further study to ensure we don't accidentally break
> anything. I've not done that study, so won't be adjusting the patch
> for now.


I'm also not sure if doing that is safe in all cases.  Hmm, do you think
we can instead check wc->frameOptions to see if it is the RANGE OFFSET
case in make_pathkeys_for_window(), and decide whether or not to remove
redundant ORDER BY items based on that?
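
Something like this is what I had in mind (a sketch; the FRAMEOPTION_*
bits exist in parsenodes.h, but the surrounding logic is hypothetical):

    /* in make_pathkeys_for_window(), before dropping redundant items */
    bool    is_range_offset =
        (wc->frameOptions & FRAMEOPTION_RANGE) != 0 &&
        (wc->frameOptions & (FRAMEOPTION_START_OFFSET |
                             FRAMEOPTION_END_OFFSET)) != 0;

    if (!is_range_offset)
        remove_redundant_orderby_items(wc);     /* hypothetical helper */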

Thanks
Richard


Views no longer in rangeTabls?

2023-06-09 Thread David Steele

Hackers,

While updating pgAudit for PG16 I found the following (from our 
perspective) regression.


In prior versions of Postgres, views were listed in rangeTabls when 
ExecutorCheckPerms_hook() was called but in PG16 the views are no longer 
in this list. The permissions have been broken out into permInfos as of 
a61b1f748 and this list does include the view.


It seems the thing to do here would be to scan permInfos instead, which 
works fine except that we also need access to rellockmode, which is only 
included in rangeTabls. We can add a scan of rangeTabls to get 
rellockmode when needed and we might be better off overall since 
permInfos will generally have fewer entries. I have not implemented this 
yet but it seems like it will work.


From reading the discussion it appears this change to rangeTabls was 
intentional, but I wonder if I am missing something. For instance, is 
there a better way to get rellockmode when scanning permInfos?


It seems unlikely that we are the only ones using rangeTabls in an 
extension, so others might benefit from having an answer to this on list.


Thanks,
-David




Re: Major pgbench synthetic SELECT workload regression, Ubuntu 23.04+PG15

2023-06-09 Thread Gurjeet Singh
On Fri, Jun 9, 2023 at 12:28 AM Gregory Smith  wrote:
>
> Let me start with the happy ending to this thread:

Phew! I'm sure everyone would be relieved to know this was a false alarm.

> On Thu, Jun 8, 2023 at 9:21 PM Andres Freund  wrote:
>>
>> You might need to add --no-children to the perf report invocation, otherwise
>> it'll show you the call graph inverted.
>
>
> My problem was not writing kernel symbols out, I was only getting addresses 
> for some reason.  This worked:
>
>   sudo perf record -g --call-graph dwarf -d --phys-data -a sleep 1
>   perf report --stdio

There is no mention of perf or similar utilities in pgbench-tools
docs. I'm guessing Linux is the platform pgbench-tools gets
used on most. If so, I think it'd be useful to mention these tools and
snippets in there to make others' lives easier when they find
themselves scratching their heads.

Best regards,
Gurjeet
http://Gurje.et




Re: GTIN14 support for contrib/isn

2023-06-09 Thread Michael Kefeder


--- Original Message ---
On Thursday, June 8th, 2023 at 5:23 PM, Josef Šimánek  
wrote:


> On Thu, 8 Jun 2023 at 17:20, Michael Kefeder m...@multiwave.ch wrote:
> 
> > Am 15.03.19 um 17:27 schrieb Tom Lane:
> > 
> > > Michael Kefeder m...@multiwave.ch writes:
> > > 
> > > > For a project of ours we need GTIN14 data type support.
> > > 
> > > Hm, what is that and where would a reviewer find the specification for it?
> > 
> > specs are from GS1 here https://www.gs1.org/standards/id-keys/gtin
> > side-note EAN13 is actually called GTIN-13 now. Wikipedia has a quick
> > overview https://en.wikipedia.org/wiki/Global_Trade_Item_Number
> > 
> > > > Looking at the code I saw every format that isn-extension supports is
> > > > stored as an EAN13. Theoretically that can be changed to be GTIN14, but
> > > > that would mean quite a lot of rewrite I feared, so I chose to code only
> > > > GTIN14 I/O separetely to not interfere with any existing conversion
> > > > magic. This yields an easier to understand patch and doesn't touch
> > > > existing functionality. However it introduces redundancy to a certain
> > > > extent.
> > > 
> > > Yeah, you certainly don't get to change the on-disk format of the existing
> > > types, unfortunately. Not sure what the least messy way of dealing with
> > > that is. I guess we do want this to be part of contrib/isn rather than
> > > an independent module, if there are sane datatype conversions with the
> > > existing isn types.
> > 
> > the on-disk format does not change (it would support even longer codes;
> > it's just an integer where one bit is used as a valid/invalid flag, and I did
> > not touch that at all). Putting GTIN14 in isn makes sense, I find, and is
> > back/forward compatible.
> > 
> > > > Find my patch attached. Please let me know if there are things that need
> > > > changes, I'll do my best to get GTIN support into postgresql.
> > > 
> > > Well, two comments immediately:
> > > 
> > > * where's the documentation changes?
> > > 
> > > * simply editing the .sql file in-place is not acceptable; that breaks
> > > the versioning conventions for extensions, and leaves users with no
> > > easy upgrade path. What you need to do is create a version upgrade
> > > script that adds the new objects. For examples look for other recent
> > > patches that have added features to contrib modules, eg
> > > 
> > > https://git.postgresql.org/gitweb/?p=postgresql.git&a=commitdiff&h=eb6f29141bed9dc95cb473614c30f470ef980705
> > > 
> > > Also, I'm afraid you've pretty much missed the deadline to get this
> > > into PG v12; we've already got more timely-submitted patches than
> > > we're likely to be able to finish reviewing. Please add it to the
> > > first v13 commit fest,
> > > 
> > > https://commitfest.postgresql.org/23/
> > > 
> > > so that we don't forget about it when the time does come to look at it.
> > > 
> > > regards, tom lane
> > 
> > thanks for the feedback! will do mentioned documentation changes and
> > create a separate upgrade sql file. Making it into v13 is fine by me.
> 
> 
> Hello!
> 
> If I understand it well, this patch wasn't finished and submitted
> after this discussion. If there is still interest, I can try to polish
> the patch, rebase and submit. I'm interested in GTIN14 support.
> 

Hello Josef,

From my side you can finish the patch. Sorry that I didn't follow up on it;
the company completely switched product lines and then I forgot about it
because we no longer needed it.

br
 mike




Re: Major pgbench synthetic SELECT workload regression, Ubuntu 23.04+PG15

2023-06-09 Thread Gregory Smith
Let me start with the happy ending to this thread:

$ pgbench -S -T 10 -c 32 -j 32 -M prepared -P 1 pgbench
pgbench (15.3 (Ubuntu 15.3-1.pgdg23.04+1))
progress: 1.0 s, 1015713.0 tps, lat 0.031 ms stddev 0.007, 0 failed
progress: 2.0 s, 1083780.4 tps, lat 0.029 ms stddev 0.007, 0 failed...
progress: 8.0 s, 1084574.1 tps, lat 0.029 ms stddev 0.001, 0 failed
progress: 9.0 s, 1082665.1 tps, lat 0.029 ms stddev 0.001, 0 failed
tps = 1077739.910163 (without initial connection time)

Which even seems a whole 0.9% faster than 14 on this hardware!  The wonders
never cease.

On Thu, Jun 8, 2023 at 9:21 PM Andres Freund  wrote:

> You might need to add --no-children to the perf report invocation,
> otherwise
> it'll show you the call graph inverted.
>

My problem was not writing kernel symbols out, I was only getting addresses
for some reason.  This worked:

  sudo perf record -g --call-graph dwarf -d --phys-data -a sleep 1
  perf report --stdio

And once I looked at the stack trace I immediately saw the problem, fixed
the config option, and this report is now closed as PEBKAC on my part.
Somehow I didn't notice the 15 installs on both systems had
log_min_duration_statement=0, and that's why the performance kept dropping
*only* on the fastest runs.

What I've learned today, then, is that if someone sees osq_lock in a simple
perf top output on an oddly slow server, it's possible they are overloading a
device writing out log file data. Leaving out the boring parts, the call
trace you might see is:

EmitErrorReport
 __GI___libc_write
  ksys_write
   __fdget_pos
mutex_lock
 __mutex_lock_slowpath
  __mutex_lock.constprop.0
   71.20% osq_lock

Everyone was stuck trying to find the end of the log file to write to it,
and that was the entirety of the problem.  Hope that call trace and info
helps out some future goofball making the same mistake.  I'd wager this
will come up again.

Thanks to everyone who helped out and I'm looking forward to PG16 testing
now that I have this rusty, embarrassing warm-up out of the way.

--
Greg Smith  greg.sm...@crunchydata.com
Director of Open Source Strategy