Re: introduce dynamic shared memory registry

2024-01-21 Thread Michael Paquier
On Sun, Jan 21, 2024 at 04:13:20PM -0600, Nathan Bossart wrote:
> Oops.  I've attached an attempt at fixing this.  I took the opportunity to
> clean up the surrounding code a bit.

Thanks for the patch.  Your proposed attempt looks correct to me with
an ERROR when no segments are found.
--
Michael


signature.asc
Description: PGP signature


Re: Synchronizing slots from primary to standby

2024-01-21 Thread Bertrand Drouvot
Hi,

On Fri, Jan 19, 2024 at 05:23:53PM +0530, Amit Kapila wrote:
> On Wed, Jan 17, 2024 at 4:00 PM shveta malik  wrote:
> >
> 
> Now, the concerns related to this could be that users would probably
> need to change existing mechanisms/tools to update primary_conninfo

Yeah, for the ones that want the sync slot feature.

> and one of the alternatives proposed is to have an additional GUC like
> slot_sync_dbname. Users won't be able to drop the database this worker
> is connected to aka whatever is specified in slot_sync_dbname but as
> the user herself sets up the configuration it shouldn't be a big deal.

Same point of view here.

> Then we also discussed whether extending libpqwalreceiver's connect
> API is a good idea and whether we need to further extend it in the
> future. As far as I can see, slotsync worker's primary requirement is
> to execute SQL queries, for which the current API is sufficient, and I don't
> see something that needs any drastic change in this API. Note that
> the tablesync worker, which executes SQL, also uses these APIs, so we may
> need something in the future for either of those. Then finally we need
> a slotsync worker to also connect to a database to use SQL and fetch
> results.
>
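
For context, executing SQL through that API looks roughly like the sketch
below (the query and error handling here are illustrative only):

	Oid			retTypes[1] = {LSNOID};
	WalRcvExecResult *res;

	res = walrcv_exec(wrconn, "SELECT confirmed_flush_lsn FROM pg_replication_slots",
					  1, retTypes);
	if (res->status != WALRCV_OK_TUPLES)
		ereport(ERROR,
				(errmsg("could not fetch slot information: %s", res->err)));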

On my side, the nit-level concerns about using libpqrcv_connect / walrcv_connect
are:

- cosmetic: the "rcv" does not really align with the sync slot worker
- we're using a WalReceiverConn, while a PGconn should suffice. From what I can
see, the "overhead" is (1 byte + 7 bytes hole + 8 bytes). I don't think that's a
big deal even if we switch to a multi-sync-slot-worker design later on.

Those have already been discussed in [1] and I'm fine with them.
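
For reference, the wrapper in question looks roughly like this (a sketch based
on libpqwalreceiver.c; the exact layout may differ):

	struct WalReceiverConn
	{
		PGconn	   *streamConn;	/* the bare connection */
		bool		logical;	/* 1 byte, followed by a 7-byte hole */
		char	   *recvBuf;	/* 8 more bytes for the receive buffer */
	};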

> Now, let us consider if we extend the replication commands like
> READ_REPLICATION_SLOT and or introduce a new set of replication
> commands to fetch the required information then we don't need a DB
> connection with primary or a connection in slotsync worker. As per my
> current understanding, it is quite doable but I think we will slowly
> go in the direction of making replication commands something like SQL
> because today we need to extend it to fetch all slots info that have
> failover marked as true, the existence of a particular replication slot,
> etc. Then tomorrow, if we want to extend this work to have multiple
> slotsync workers, say one worker per db, then we have to extend the
> replication command to fetch per-database failover marked slots. To
> me, it sounds more like we are slowly adding SQL-like features to
> replication commands.

Agreed. Also, it seems to me that extending the replication commands is more of
a one-way-door change.

> Apart from this when we are reading per-db replication slots without
> connecting to a database, we probably need some additional protection
> mechanism so that the database won't get dropped.
> 
> Considering all this it seems that for now probably extending
> replication commands can simplify a few things like mentioned above
> but using SQL's with db-connection is more extendable.

I'd vote for using a SQL db-connection (like we are doing currently).
It seems more extendable and more of a two-way door (as compared to extending the
replication commands): I think it still gives us the flexibility to switch to
extending the replication commands if we want to in the future.

[1]: 
https://www.postgresql.org/message-id/ZZe6sok7IWmhKReU%40ip-10-97-1-34.eu-west-3.compute.internal

Regards,

-- 
Bertrand Drouvot
PostgreSQL Contributors Team
RDS Open Source Databases
Amazon Web Services: https://aws.amazon.com




Re: Use of backup_label not noted in log

2024-01-21 Thread Michael Paquier
On Fri, Jan 19, 2024 at 09:32:26AM -0400, David Steele wrote:
> Any status on this patch? If we do back patch it would be nice to see this
> in the upcoming minor releases. I'm in favor of a back patch, as I think
> this is minimally invasive and would be very useful for debugging recovery
> issues.

I am not sure about the backpatch part, but on a second look I'm OK
with applying it on HEAD for now with the LOG added for the startup of
recovery when the backup_label file is read, for the recovery
completed from a backup, and for the restart from a backup.

> I like the phrasing you demonstrated in [1], but it doesn't seem like there's a
> new patch for that, so I have attached one.

+   if (ControlFile->backupStartPoint != InvalidXLogRecPtr)

Nit 1: I would use XLogRecPtrIsInvalid here.

+   ereport(LOG,
+   (errmsg("completed backup recovery with redo LSN %X/%X",
+   LSN_FORMAT_ARGS(oldBackupStartPoint)));

Nit 2: How about adding backupEndPoint in this LOG?  That would give:
"completed backup recovery with redo LSN %X/%X and end LSN %X/%X".
--
Michael


signature.asc
Description: PGP signature


make dist using git archive

2024-01-21 Thread Peter Eisentraut
One of the goals is to make the creation of the distribution tarball 
more directly traceable to the git repository.  That is why we removed 
the "make distprep" step.


Here I want to take another step in that direction, by changing "make 
dist" to directly use "git archive", rather than the custom shell script 
it currently runs.


The simple summary is that it would run

git archive --format tar.gz --prefix postgresql-17devel/ HEAD -o postgresql-17devel.tar.gz


(with appropriate version numbers, of course), and that's the tarball we 
would ship.


There are analogous commands for other compression formats.

The actual command gets subtly more complicated if you need to run this 
in a separate build directory.  In my attached patch, the make version 
doesn't support vpath at the moment, just so that it's easier to 
understand for now.  The meson version is a bit hairy.


I have studied and tested this quite a bit, and I have found that the 
archives produced this way are deterministic and reproducible, meaning 
for a given commit the result file should always be bit-for-bit identical.


The exception is that if you use a git version older than 2.38.0, gzip 
records the platform in the archive, so you'd get a different output on 
Windows vs. macOS vs. "UNIX" (everything else).  In git 2.38.0, this was 
changed so that everything is recorded as "UNIX" now.  This is just 
something to keep in mind.  This issue is specific to the gzip format, 
it does not affect other compression formats.


Meson has its own distribution building command (meson dist), but I opted 
against using that.  The main problem is that the way they have 
implemented it, it is not deterministic in the above sense.  (Another 
point is of course that we probably want a "make" version for the time 
being.)


But the target name "dist" in meson is reserved for that reason, so I 
needed to call the custom target "pgdist".


I did take one idea from meson: It runs a check before git archive that 
the checkout is clean.  That way, you avoid mistakes because of 
uncommitted changes.  This works well in my "make" implementation.  In 
the meson implementation, I had to find a workaround, because a 
custom_target cannot have a dependency on a run_target.  As also 
mentioned above, the whole meson implementation is a bit ugly.


Anyway,  with the attached patch you can do

make dist

or

meson compile -C build pgdist

and it produces the same set of tarballs as before, except it's done 
differently.


The actual build scripts need some fine-tuning, but the general idea is 
correct, I think.

From 4b128faca90238d0a0bb6949a8050c2501d1bd67 Mon Sep 17 00:00:00 2001
From: Peter Eisentraut 
Date: Sat, 20 Jan 2024 21:54:36 +0100
Subject: [PATCH v0] make dist uses git archive

---
 GNUmakefile.in | 34 --
 meson.build| 38 ++
 2 files changed, 50 insertions(+), 22 deletions(-)

diff --git a/GNUmakefile.in b/GNUmakefile.in
index eba569e930e..3e04785ada2 100644
--- a/GNUmakefile.in
+++ b/GNUmakefile.in
@@ -87,29 +87,19 @@ update-unicode: | submake-generated-headers submake-libpgport
 distdir= postgresql-$(VERSION)
 dummy  = =install=
 
+GIT = git
+
 dist: $(distdir).tar.gz $(distdir).tar.bz2
-   rm -rf $(distdir)
-
-$(distdir).tar: distdir
-   $(TAR) chf $@ $(distdir)
-
-.INTERMEDIATE: $(distdir).tar
-
-distdir-location:
-   @echo $(distdir)
-
-distdir:
-   rm -rf $(distdir)* $(dummy)
-   for x in `cd $(top_srcdir) && find . \( -name CVS -prune \) -o \( -name .git -prune \) -o -print`; do \
- file=`expr X$$x : 'X\./\(.*\)'`; \
- if test -d "$(top_srcdir)/$$file" ; then \
-   mkdir "$(distdir)/$$file" && chmod 777 "$(distdir)/$$file"; \
- else \
-   ln "$(top_srcdir)/$$file" "$(distdir)/$$file" >/dev/null 2>&1 \
- || cp "$(top_srcdir)/$$file" "$(distdir)/$$file"; \
- fi || exit; \
-   done
-   $(MAKE) -C $(distdir) distclean
+
+.PHONY: check-dirty-index
+check-dirty-index:
+   $(GIT) diff-index --quiet HEAD
+
+$(distdir).tar.gz: check-dirty-index
+   $(GIT) archive --format tar.gz --prefix $(distdir)/ HEAD -o $@
+
+$(distdir).tar.bz2: check-dirty-index
+   $(GIT) -c tar.tar.bz2.command='$(BZIP2) -c' archive --format tar.bz2 --prefix $(distdir)/ HEAD -o $@
 
 distcheck: dist
rm -rf $(dummy)
diff --git a/meson.build b/meson.build
index c317144b6bc..f0d870c5192 100644
--- a/meson.build
+++ b/meson.build
@@ -3347,6 +3347,44 @@ run_target('help',
 
 
 
+###
+# Distribution archive
+###
+
+git = find_program('git', required: false, native: true, disabler: true)
+bzip2 = find_program('bzip2', required: false, native: true, disabler: true)
+
+distdir = meson.project_name() + '-' + meson.project_version()
+
+check_dirty_index = 

Re: [PoC] Improve dead tuple storage for lazy vacuum

2024-01-21 Thread Masahiko Sawada
On Mon, Jan 22, 2024 at 2:36 PM John Naylor  wrote:
>
> On Mon, Jan 22, 2024 at 10:28 AM Masahiko Sawada  
> wrote:
> >
> > On further thought, as you pointed out before, "num_tids" should not
> > be in tidstore in terms of integration with tidbitmap.c, because
> > tidbitmap.c has "lossy pages". With lossy pages, "num_tids" is no
> > longer accurate and useful. Similarly, looking at tidbitmap.c, it has
> > npages and nchunks but they will not be necessary in lazy vacuum use
> > case. Also, assuming that we support parallel heap pruning, probably
> > we need to somehow lock the tidstore while adding tids to the tidstore
> > concurrently by parallel vacuum worker. But in tidbitmap use case, we
> > don't need to lock the tidstore since it doesn't have multiple
> > writers.
>
> Not currently, and it does seem bad to require locking where it's not 
> required.
>
> (That would be a prerequisite for parallel index scan. It's been tried
> before with the hash table, but concurrency didn't scale well with the
> hash table. I have no reason to think that the radix tree would scale
> significantly better with the same global LW lock, but as you know
> there are other locking schemes possible.)
>
> > Given these facts, different statistics and different lock
> > strategies are required by different use case. So I think there are 3
> > options:
> >
> > 1. expose lock functions for tidstore and the caller manages the
> > statistics in the outside of tidstore. For example, in lazyvacuum.c we
> > would have a TidStore for tid storage as well as VacDeadItemsInfo that
> > has num_tids and max_bytes. Both are in LVRelState. For parallel
> > vacuum, we pass both to the workers via DSM and pass both to function
> > where the statistics are required. As for the exposed lock functions,
> > when adding tids to the tidstore, the caller would need to call
> > something like TidStoreLockExclusive(ts) that further calls
> > LWLockAcquire(ts->tree.shared->ctl.lock, LW_EXCLUSIVE) internally.
>
> The advantage here is that vacuum can avoid locking entirely while
> using shared memory, just like it does now, and has the option to add
> it later.

True.

> IIUC, the radix tree struct would have a lock member, but wouldn't
> take any locks internally? Maybe we still need one for
> RT_MEMORY_USAGE? For that, I see dsa_get_total_size() takes its own
> DSA_AREA_LOCK -- maybe that's enough?

I think that's a good point. So there will be no place where the radix
tree takes any locks internally.
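
For example, the shared case could lean on the DSA lock alone -- a minimal
sketch, with the struct field names assumed:

uint64
RT_MEMORY_USAGE(RT_RADIX_TREE *tree)
{
#ifdef RT_SHMEM
	/* dsa_get_total_size() takes DSA_AREA_LOCK internally */
	return dsa_get_total_size(tree->dsa);
#else
	return MemoryContextMemAllocated(tree->context, true);
#endif
}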

>
> That seems simplest, and is not very far from what we do now. If we do
> this, then the lock functions should be where we branch for is_shared.

Agreed.
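
A minimal sketch of what that could look like, following the naming in
option 1 (TidStoreIsShared() is assumed):

void
TidStoreLockExclusive(TidStore *ts)
{
	if (TidStoreIsShared(ts))
		LWLockAcquire(ts->tree.shared->ctl.lock, LW_EXCLUSIVE);
}

void
TidStoreUnlock(TidStore *ts)
{
	if (TidStoreIsShared(ts))
		LWLockRelease(ts->tree.shared->ctl.lock);
}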

>
> > 2. add callback functions to tidstore so that the caller can do its
> > work while holding a lock on the tidstore. This is like the idea we
> > just discussed for radix tree. The caller passes a callback function
> > and user data to TidStoreSetBlockOffsets(), and the callback is called
> > after setting tids. Similar to option 1, the statistics need to be
> > stored in a different area.
>
> I think we'll have to move to something like this eventually, but it
> seems like overkill right now.

Right.

>
> > 3. keep tidstore.c and tidbitmap.c separate implementations but use
> > radix tree in tidbitmap.c. tidstore.c would have "num_tids" in its
> > control object and doesn't have any lossy page support. On the other
> > hand, in tidbitmap.c we replace simplehash with radix tree. This makes
> > tidstore.c simple but we would end up having different data structures
> > for similar usage.
>
> They have so much in common that it's worth it to use the same
> interface and (eventually) value type. They just need separate paths
> for adding tids, as we've discussed.

Agreed.

>
> > I think it's worth trying option 1. What do you think, John?
>
> +1

Thanks!

Before working on this idea, since the latest patches conflict with
the current HEAD, I share the latest patch set (v54). Here is the
summary:

- As for radix tree part, it's based on v53 patch. I've squashed most
of cleanups and changes in v53 except for "DRAFT: Stop using invalid
pointers as placeholders." as I thought you might want to still work
on it. BTW it includes "#undef RT_SHMEM".
- As for tidstore, it's based on v51. That is, it still has the
control object and num_tids there.
- As for vacuum integration, it's also based on v51. But we no longer
need to change has_lpdead_items and LVPagePruneState thanks to the
recent commits c120550edb8 and e313a61137.

For the next version patch, I'll work on this idea and try to clean up
locking stuff both in tidstore and radix tree. Or if you're already
working on some of them, please let me know. I'll review it.

Regards,

-- 
Masahiko Sawada
Amazon Web Services: https://aws.amazon.com


v54-ART.tar.gz
Description: GNU Zip compressed data


Re: the s_lock_stuck on perform_spin_delay

2024-01-21 Thread Andy Fan

Andy Fan  writes:

> I found a special case about checking it in errstart. So commit 3 in v7
> is added.
>
> commit 757c67c1d4895ce6a523bcf5217af8eb2351e2a1 (HEAD -> s_stuck_v3)
> Author: yizhi.fzh 
> Date:   Mon Jan 22 07:14:29 2024 +0800
>
> Bypass SpinLock checking in SIGQUIT signal hander
> 

I used sigismember(, SIGQUIT) to detect if a process is doing a
quickdie; however, this is bad not only because it doesn't work on
Windows, but also because it performs too poorly, even though it impacts
USE_ASSERT_CHECKING builds only. In v8, I introduced a new global
variable, quickDieInProgress, to handle this.
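
Roughly, the idea looks like this (a sketch, not the actual v8 patch;
quickdie() sets the flag first thing in the handler):

volatile bool quickDieInProgress = false;

void
VerifyNoSpinLocksHeld(void)
{
#ifdef USE_ASSERT_CHECKING
	/* the SIGQUIT handler may legitimately log with a spin lock held */
	if (quickDieInProgress)
		return;

	if (last_spin_lock_file != NULL)
		elog(PANIC, "A spin lock has been held at %s:%d",
			 last_spin_lock_file, last_spin_lock_lineno);
#endif
}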

-- 
Best Regards
Andy Fan

From b6bb33994f479824a8fac6a1d076a103c16e9d69 Mon Sep 17 00:00:00 2001
From: "yizhi.fzh" 
Date: Fri, 19 Jan 2024 13:52:07 +0800
Subject: [PATCH v8 1/3] Detect more misuse of spin lock automatically

Spin locks are intended for *very* short-term locks, but it is possible
for them to be misused in many cases, e.g. acquiring another LWLock or
regular lock, allocating memory, or calling errstart while holding a spin
lock. This patch detects such misuse automatically in a
USE_ASSERT_CHECKING build.

CHECK_FOR_INTERRUPTS should be avoided as well when holding a spin lock.
Depending on what signals are left to handle, PG may raise ERROR/FATAL,
which would cause the code to jump somewhere that can hardly release the
spin lock anyway.
---
 src/backend/storage/lmgr/lock.c   |  6 
 src/backend/storage/lmgr/lwlock.c |  6 
 src/backend/storage/lmgr/spin.c   | 13 +
 src/backend/utils/error/elog.c|  7 +
 src/backend/utils/mmgr/mcxt.c | 16 +++
 src/include/miscadmin.h   | 12 +++-
 src/include/storage/spin.h| 46 +--
 7 files changed, 102 insertions(+), 4 deletions(-)

diff --git a/src/backend/storage/lmgr/lock.c b/src/backend/storage/lmgr/lock.c
index c70a1adb9a..cb9969b860 100644
--- a/src/backend/storage/lmgr/lock.c
+++ b/src/backend/storage/lmgr/lock.c
@@ -776,6 +776,12 @@ LockAcquireExtended(const LOCKTAG *locktag,
 	bool		found_conflict;
 	bool		log_lock = false;
 
+	/*
+	 * Spin lock should not be held for a long time, but the time needed here
+	 * may be too long, so let's make sure no spin lock is held now.
+	 */
+	VerifyNoSpinLocksHeld();
+
 	if (lockmethodid <= 0 || lockmethodid >= lengthof(LockMethods))
 		elog(ERROR, "unrecognized lock method: %d", lockmethodid);
 	lockMethodTable = LockMethods[lockmethodid];
diff --git a/src/backend/storage/lmgr/lwlock.c b/src/backend/storage/lmgr/lwlock.c
index 2f2de5a562..1a24687394 100644
--- a/src/backend/storage/lmgr/lwlock.c
+++ b/src/backend/storage/lmgr/lwlock.c
@@ -1209,6 +1209,12 @@ LWLockAcquire(LWLock *lock, LWLockMode mode)
 
 	Assert(mode == LW_SHARED || mode == LW_EXCLUSIVE);
 
+	/*
+	 * Spin lock should not be held for a long time, but the time needed here
+	 * may be too long, so let's make sure no spin lock is held now.
+	 */
+	VerifyNoSpinLocksHeld();
+
 	PRINT_LWDEBUG("LWLockAcquire", lock, mode);
 
 #ifdef LWLOCK_STATS
diff --git a/src/backend/storage/lmgr/spin.c b/src/backend/storage/lmgr/spin.c
index 50cb99cd3b..08cc6da5d9 100644
--- a/src/backend/storage/lmgr/spin.c
+++ b/src/backend/storage/lmgr/spin.c
@@ -47,6 +47,9 @@ PGSemaphore *SpinlockSemaArray;
 
 #endif			/* HAVE_SPINLOCKS */
 
+volatile const char *last_spin_lock_file = NULL;
+volatile int last_spin_lock_lineno = -1;
+
 /*
  * Report the amount of shared memory needed to store semaphores for spinlock
  * support.
@@ -178,3 +181,13 @@ tas_sema(volatile slock_t *lock)
 }
 
 #endif			/* !HAVE_SPINLOCKS */
+
+void
+VerifyNoSpinLocksHeld(void)
+{
+#ifdef USE_ASSERT_CHECKING
+	if (last_spin_lock_file != NULL)
+		elog(PANIC, "A spin lock has been held at %s:%d",
+			 last_spin_lock_file, last_spin_lock_lineno);
+#endif
+}
diff --git a/src/backend/utils/error/elog.c b/src/backend/utils/error/elog.c
index 2c7a20e3d3..22662955d2 100644
--- a/src/backend/utils/error/elog.c
+++ b/src/backend/utils/error/elog.c
@@ -351,6 +351,13 @@ errstart(int elevel, const char *domain)
 	bool		output_to_client = false;
 	int			i;
 
+	/*
+	 * Logging likely happens in many places without particular attention,
+	 * and it takes far more than a few dozen instructions, so it should only
+	 * be called when no spin lock is held.
+	 */
+	VerifyNoSpinLocksHeld();
+
 	/*
 	 * Check some cases in which we want to promote an error into a more
 	 * severe error.  None of this logic applies for non-error messages.
diff --git a/src/backend/utils/mmgr/mcxt.c b/src/backend/utils/mmgr/mcxt.c
index 1336944084..7a14e347aa 100644
--- a/src/backend/utils/mmgr/mcxt.c
+++ b/src/backend/utils/mmgr/mcxt.c
@@ -1028,6 +1028,13 @@ MemoryContextAlloc(MemoryContext context, Size size)
 	if (!AllocSizeIsValid(size))
 		elog(ERROR, "invalid memory alloc request size %zu", size);
 
+	/*
+	 * Memory allocation likely happens in many places without particular
+	 * attention, and it takes far more than a few dozen instructions, so it
+	 * 

Re: Add system identifier to backup manifest

2024-01-21 Thread Amul Sul
On Mon, Jan 22, 2024 at 10:08 AM Amul Sul  wrote:

>
>
> On Fri, Jan 19, 2024 at 10:36 PM Amul Sul  wrote:
>
>> On Wed, Jan 17, 2024 at 8:40 PM Robert Haas 
>> wrote:
>>
>>>
>>>
>> Updated version is attached.
>>
>
> Another updated version attached -- fix missing manifest version check in
> pg_verifybackup before system identifier validation.
>

Thinking a bit more on this, I realized parse_manifest_file() has many out
parameters. Instead, parse_manifest_file() should simply return the manifest
data, like load_backup_manifest() does.  The attached 0001 patch does that:
it removes the parser_context structure, adds manifest_data, and makes the
required adjustments to the pg_verifybackup code.
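
In other words, the shape of the change is roughly this (a sketch; the old
signature is pg_verifybackup's current one, the new one uses manifest_data
as mentioned above):

/* before: results handed back through out parameters */
static void parse_manifest_file(char *manifest_path,
								manifest_files_hash **ht_p,
								manifest_wal_range **first_wal_range_p);

/* after: everything comes back in one struct, as load_backup_manifest() does */
static manifest_data *parse_manifest_file(char *manifest_path);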

Regards,
Amul


v4-0001-pg_verifybackup-code-refactor.patch
Description: Binary data


v4-0002-Add-system-identifier-to-backup-manifest.patch
Description: Binary data


Re: index prefetching

2024-01-21 Thread Konstantin Knizhnik


On 22/01/2024 1:39 am, Tomas Vondra wrote:

Why would we prefer a covering index to a compound index? I see only two good
reasons:
1. The extra columns' types do not have the comparison functions needed by the AM.
2. The extra columns are never used in a query predicate.


Or maybe you don't want to include the columns in a UNIQUE constraint?

Do you mean that a compound index (a,b) cannot be used to enforce
uniqueness of "a"?

If so, I agree.


If you are going to use these columns in query predicates, I do not see
much sense in creating an inclusive index rather than a compound index.
Do you?


But this is also about conditions that can't be translated into index
scan keys. Consider this:

create table t (a int, b int, c int);
insert into t select 1000 * random(), 1000 * random(), 1000 * random()
from generate_series(1,1000000) s(i);
create index on t (a,b);
vacuum analyze t;

explain (analyze, buffers) select * from t where a = 10 and mod(b,10) =
111;
                                                  QUERY PLAN
---------------------------------------------------------------------------------------------------------------
  Index Scan using t_a_b_idx on t  (cost=0.42..3670.74 rows=5 width=12)
(actual time=4.562..4.564 rows=0 loops=1)
Index Cond: (a = 10)
Filter: (mod(b, 10) = 111)
Rows Removed by Filter: 974
Buffers: shared hit=980
Prefetches: blocks=901
  Planning Time: 0.304 ms
  Execution Time: 5.146 ms
(8 rows)

Notice that this still fetched ~1000 buffers in order to evaluate the
filter on "b", because it's complex and can't be transformed into a nice
scan key.


Oh yes.
Looks like I didn't understand the logic of when a predicate is included in
the index condition and when it is not.
It seemed natural to me that only a predicate which specifies some
range can be included in the index condition.

But it is not the case:

postgres=# explain select * from t where a = 10 and b in (10,20,30);
                             QUERY PLAN
-------------------------------------------------------------------
 Index Scan using t_a_b_idx on t  (cost=0.42..25.33 rows=3 width=12)
   Index Cond: ((a = 10) AND (b = ANY ('{10,20,30}'::integer[])))
(2 rows)

So I thought an ANY predicate using index keys is included in the index condition.
But it is not true (as your example shows).

But IMHO mod(b,10) = 111 and (b+1) < 100 are both quite rare predicates;
this is why I called these use cases "exotic".


In any case, if we have some columns in the index tuple, it is desirable to use
them for filtering before extracting the heap tuple.

But I'm afraid it will not be so easy to implement...



RE: speed up a logical replica setup

2024-01-21 Thread Hayato Kuroda (Fujitsu)
Dear hackers,

> 
> 15.
> I found that subscriptions cannot be started if tuples are inserted on 
> publisher
> after creating temp_replslot. After starting a subscriber, I got below output 
> on the
> log.
> 
> ```
> ERROR:  could not receive data from WAL stream: ERROR:  publication
> "pg_subscriber_5" does not exist
> CONTEXT:  slot "pg_subscriber_5_3632", output plugin "pgoutput", in the change
> callback, associated LSN 0/30008A8
> LOG:  background worker "logical replication apply worker" (PID 3669) exited
> with exit code 1
> ```
> 
> But this is strange. I confirmed that the specified publication surely exists.
> Do you know the reason?
> 
> ```
> publisher=# SELECT pubname FROM pg_publication;
>  pubname
> -
>  pg_subscriber_5
> (1 row)
> ```
>

I analyzed and found the reason. This is because publications are invisible to
some transactions.

In the first place, the operations below were executed in this case.
Tuples were inserted after getting the consistent_lsn, but before starting the
standby. After doing the workload, I confirmed again that the publication had
been created.

1. on primary, logical replication slots were created.
2. on primary, another replication slot was created.
3. ===on primary, some tuples were inserted. ===
4. on standby, a server process was started
5. on standby, the process waited until all changes have come.
6. on primary, publications were created.
7. on standby, subscriptions were created.
8. on standby, the replication progress for each subscription was set to the
given LSN (obtained at step 2).
=pg_subscriber finished here=
9. on standby, a server process was started again
10. on standby, subscriptions were enabled. They referred to slots created at
step 1.
11. on primary, decoding was started but an ERROR was raised.

In this case, tuples were inserted *before creating the publication*.
So I thought that the decoded transaction could not see the publication because
it was committed after the insertions.

One solution is to create the publication before creating the consistent slot.
Changes that came before creating the slot are surely replicated to the
standby, so upcoming transactions can see the object. We are planning a patch
set to fix the issue using this approach.
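
In pseudocode, the reordering would look roughly like this (the function
names are invented for illustration):

/* 1. publication first, so every transaction decoded later can see it */
create_publication(conn, &dbinfo);

/* 2. only then create the slot that fixes where decoding starts */
consistent_lsn = create_logical_replication_slot(conn, &dbinfo);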


Best Regards,
Hayato Kuroda
FUJITSU LIMITED





Re: [PoC] Improve dead tuple storage for lazy vacuum

2024-01-21 Thread Masahiko Sawada
On Wed, Jan 17, 2024 at 12:32 PM John Naylor  wrote:
>
> I wrote:
>
> > > Hmm, I wonder if that's a side-effect of the "create" functions doing
> > > their own allocations and returning a pointer. Would it be less tricky
> > > if the structs were declared where we need them and passed to "init"
> > > functions?
>
> If this is a possibility, I thought I'd first send the last (I hope)
> large-ish set of radix tree cleanups to avoid rebasing issues. I'm not
> including tidstore/vacuum here, because recent discussion has some
> up-in-the-air work.

Thank you for updating the patches! These updates look good to me.

>
> Should be self-explanatory, but some things are worth calling out:
> 0012 and 0013: Some time ago I started passing insertpos as a
> parameter, but now I see that is not ideal -- when growing from node16
> to node48 we don't need it at all, so it's a wasted calculation. While
> reverting that, I found that this also allows passing constants in
> some cases.
> 0014 makes a cleaner separation between adding a child and growing a
> node, resulting in more compact-looking functions.
> 0019 is a bit unpolished, but I realized that it's pointless to assign
> a zero child when further up the call stack we overwrite it anyway
> with the actual value. With this, that assignment is skipped. This
> makes some comments and names strange, so needs a bit of polish, but
> wanted to get it out there anyway.

Cool.

I'll merge these patches in the next version v54 patch set.

Regards,

-- 
Masahiko Sawada
Amazon Web Services: https://aws.amazon.com




Re: Synchronizing slots from primary to standby

2024-01-21 Thread Amit Kapila
On Fri, Jan 19, 2024 at 3:55 PM shveta malik  wrote:
>
> On Fri, Jan 19, 2024 at 10:35 AM Masahiko Sawada  
> wrote:
> >
> >
> > Thank you for updating the patch. I have some comments:
> >
> > ---
> > +latestWalEnd = GetWalRcvLatestWalEnd();
> > +if (remote_slot->confirmed_lsn > latestWalEnd)
> > +{
> > +elog(ERROR, "exiting from slot synchronization as the
> > received slot sync"
> > + " LSN %X/%X for slot \"%s\" is ahead of the
> > standby position %X/%X",
> > + LSN_FORMAT_ARGS(remote_slot->confirmed_lsn),
> > + remote_slot->name,
> > + LSN_FORMAT_ARGS(latestWalEnd));
> > +}
> >
> > IIUC GetWalRcvLatestWalEnd () returns walrcv->latestWalEnd, which is
> > typically the primary server's flush position and doesn't mean the LSN
> > where the walreceiver received/flushed up to.
>
> yes. I think it makes more sense to use something which actually tells
> flushed-position. I gave it a try by replacing GetWalRcvLatestWalEnd()
> with GetWalRcvFlushRecPtr() but I see a problem here. Lets say I have
> enabled the slot-sync feature in a running standby, in that case we
> are all good (flushedUpto is the same as actual flush-position
> indicated by LogstreamResult.Flush). But if I restart standby, then I
> observed that the startup process sets flushedUpto to some value 'x'
> (see [1]) while when the wal-receiver starts, it sets
> 'LogstreamResult.Flush' to another value (see [2]) which is always
> greater than 'x'. And we do not update flushedUpto with the
> 'LogstreamResult.Flush' value in walreceiver until we actually do an
> operation on primary. Performing a data change on primary sends WALs
> to standby which then hits XLogWalRcvFlush() and updates flushedUpto
> same as LogstreamResult.Flush. Until then we have a situation where
> slots received on standby are ahead of flushedUpto and thus the slotsync
> worker keeps erroring out. I am yet to find out why flushedUpto is
> set to a lower value than 'LogstreamResult.Flush' at the start of
> standby.  Or maybe I am using the wrong function
> GetWalRcvFlushRecPtr() and should be using something else instead?
>

Can we think of using GetStandbyFlushRecPtr()? We probably need to
expose this function, if this works for the required purpose.
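
A minimal sketch, assuming GetStandbyFlushRecPtr() gets exported with its
current signature:

	TimeLineID	tli;
	XLogRecPtr	standby_flush = GetStandbyFlushRecPtr(&tli);

	if (remote_slot->confirmed_lsn > standby_flush)
		ereport(ERROR,
				(errmsg("slot \"%s\" LSN %X/%X is ahead of standby flush position %X/%X",
						remote_slot->name,
						LSN_FORMAT_ARGS(remote_slot->confirmed_lsn),
						LSN_FORMAT_ARGS(standby_flush))));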

-- 
With Regards,
Amit Kapila.




Re: Test slots invalidations in 035_standby_logical_decoding.pl only if dead rows are removed

2024-01-21 Thread Michael Paquier
On Fri, Jan 19, 2024 at 09:03:01AM +, Bertrand Drouvot wrote:
> On Fri, Jan 19, 2024 at 09:00:01AM +0300, Alexander Lakhin wrote:
>> 15.01.2024 12:00, Alexander Lakhin wrote:
>>> If this approach looks promising to you, maybe we could add a submodule to
>>> perl/PostgreSQL/Test/ and use this functionality in other tests (e.g., in
>>> 019_replslot_limit) as well.
>>> 
>>> Personally I think that having such a functionality for using in tests
>>> might be useful not only to avoid some "problematic" behaviour but also to
>>> test the opposite cases.
>> 
>> After spending a few days on it, I've discovered two more issues:
>> https://www.postgresql.org/message-id/16d6d9cc-f97d-0b34-be65-425183ed3721%40gmail.com
>> https://www.postgresql.org/message-id/b0102688-6d6c-c86a-db79-e0e91d245b1a%40gmail.com
>> 
>> (The latter is not related to bgwriter directly, but it was discovered
>> thanks to the RUNNING_XACTS record flew in WAL in a lucky moment.)
>> 
>> So it becomes clear that the 035 test is not the only one, which might
>> suffer from bgwriter's activity,
> 
> Yeah... thanks for sharing!
> 
>> and inventing a way to stop bgwriter/
>> control it is a different subject, getting out of scope of the current
>> issue.
> 
> Agree.
> 
>> 15.01.2024 11:49, Bertrand Drouvot wrote:
>> Maybe it would be a right move to commit the fix, and then think about
>> more rare failures.
> 
> +1

Yeah, agreed to make things more stable before making them fancier.

>> 2) Shall we align the 035_standby_logical_decoding with
>> 031_recovery_conflict in regard to improving stability of vacuum?
> 
> Yeah, I think that could make sense.

Probably.  That can always be done as a separate change, after a few
runs of the slow buildfarm members able to reproduce the failure.

>> I see the following options for this:
>> a) use wait_until_vacuum_can_remove() and autovacuum = off in both tests;
>> b) use FREEZE and autovacuum = off in both tests;
>> c) use wait_until_vacuum_can_remove() in 035, FREEZE in 031, and
>>  autovacuum = off in both.
> 
> I'd vote for a) as I have the feeling it's "easier" to understand (and I'm not
> sure using FREEZE would give full "stabilization predictability", at least for
> 035_standby_logical_decoding.pl). That said, I did not test what the outcome
> would be for 031_recovery_conflict.pl by making use of a).

Yeah, I think I agree here with a), as v7 does for 035.

+# Launch $sql and wait for a new snapshot that has a newer horizon before
+# doing the vacuum with $vac_option on $to_vac.
+sub wait_until_vacuum_can_remove

This had better document what the arguments of this routine are,
because that's really unclear.  $to_vac is the relation that will be
vacuumed, for one.

Also, wouldn't it be better to document in the test why
txid_current_snapshot() is chosen?  Contrary to txid_current(), it
does not generate a Transaction/COMMIT record, which makes the test more
predictable -- something you have mentioned upthread and implied in the
test.

-  INSERT INTO flush_wal DEFAULT VALUES; -- see create table flush_wal

This removes two INSERTs on flush_wal and refactors the code to do the
INSERT in wait_until_vacuum_can_remove(), using a SQL comment to
document a reference about the reason why an INSERT is used.  Couldn't
you just use a normal comment here?

>> I've re-tested the v6 patch today and confirmed that it makes the test
>> more stable. I ran (with bgwriter_delay = 1 in temp.config) 20 tests in
>> parallel and got failures ('inactiveslot slot invalidation is logged with
>> vacuum on pg_authid') on iterations 2, 6, 6 with no patch applied.
>> (With unlimited CPU, when average test duration is around 70 seconds.)
>> 
>> But with v6 applied, 60 iterations succeeded.
> 
> Nice! Thanks for the testing!

I need to review what you have more thoroughly, but is it OK to assume
that both of you are happy with the latest version of the patch in
terms of stability gained?  That's still not the full picture, but it is a
good step towards that.
--
Michael


signature.asc
Description: PGP signature


Re: speed up a logical replica setup

2024-01-21 Thread Peter Eisentraut

On 18.01.24 10:37, Amit Kapila wrote:

The other option could be pg_createsubscriber on the lines of
pg_verifybackup and pg_combinebackup.


Yes, that spelling would be more consistent.


Yet other options could be
pg_buildsubscriber, pg_makesubscriber as 'build' or 'make' in the name
sounds like we are doing some work to create the subscriber which I
think is the case here.


I see your point here.  pg_createsubscriber is not like createuser in 
that it just runs an SQL command.  It does something different than 
CREATE SUBSCRIPTION.  So a different verb would make that clearer.  Maybe 
something from here: https://www.thesaurus.com/browse/convert






Re: POC: Extension for adding distributed tracing - pg_tracing

2024-01-21 Thread Peter Smith
2024-01 Commitfest.

Hi, This patch has a CF status of "Needs Review" [1], but it seems
there were CFbot test failures last time it was run [2]. Please have a
look and post an updated version if necessary.

==
[1] https://commitfest.postgresql.org/46/4456/
[2] https://cirrus-ci.com/task/5581154296791040

Kind Regards,
Peter Smith.




Re: Shared detoast Datum proposal

2024-01-21 Thread Peter Smith
2024-01 Commitfest.

Hi, This patch has a CF status of "Needs Review" [1], but it seems
there were CFbot test failures last time it was run [2]. Please have a
look and post an updated version if necessary.

==
[1] https://commitfest.postgresql.org/46/4759/
[2] https://cirrus-ci.com/github/postgresql-cfbot/postgresql/commitfest/46/4759

Kind Regards,
Peter Smith.




Re: Things I don't like about \du's "Attributes" column

2024-01-21 Thread Peter Smith
2024-01 Commitfest.

Hi, This patch has a CF status of "Needs Review" [1], but it seems
there were CFbot test failures last time it was run [2]. Please have a
look and post an updated version if necessary.

==
[1] https://commitfest.postgresql.org/46/4738/
[2] https://cirrus-ci.com/github/postgresql-cfbot/postgresql/commitfest/46/4738

Kind Regards,
Peter Smith.




Re: index prefetching

2024-01-21 Thread Konstantin Knizhnik


On 22/01/2024 1:47 am, Tomas Vondra wrote:

Ah, right. Well, you're right that in this case we could perhaps set just one
of those flags, but the "purpose" of the two places is quite different.

The "prefetch" flag is fully controlled by the prefetcher, and it's up
to it to change it (e.g. I can easily imagine some new logic touching
setting it to "false" for some reason).

The "data" flag is fully controlled by the custom callbacks, so whatever
the callback stores, will be there.

I don't think it's worth simplifying this. In particular, I don't think
the callback can assume it can rely on the "prefetch" flag.

Why not add an "all_visible" flag to IndexPrefetchEntry? It will not
cause any extra space overhead (because of alignment), but it allows
avoiding dynamic memory allocation (not sure whether that is critical, but
it is nice to avoid if possible).
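
Something like this (a hypothetical sketch; the real struct lives in the
patch and its fields may differ):

typedef struct IndexPrefetchEntry
{
	ItemPointerData tid;
	bool		prefetch;		/* set by the prefetcher */
	bool		all_visible;	/* fits into the existing alignment padding */
	void	   *data;			/* set by the custom callback */
} IndexPrefetchEntry;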




Re: pg_stat_statements and "IN" conditions

2024-01-21 Thread Peter Smith
2024-01 Commitfest.

Hi, This patch has a CF status of "Needs Review" [1], but it seems
there was a CFbot test failure last time it was run [2]. Please have a
look and post an updated version if necessary.

==
[1] https://commitfest.postgresql.org/46/2837/
[2] https://cirrus-ci.com/task/6688578378399744

Kind Regards,
Peter Smith.




Re: Oom on temp (un-analyzed table caused by JIT) V16.1 [Fixed Already]

2024-01-21 Thread Kirk Wolak
On Fri, Jan 19, 2024 at 7:03 PM Daniel Gustafsson  wrote:

> > On 19 Jan 2024, at 23:09, Kirk Wolak  wrote:
>
> >  From a FUTURE email, I noticed pg_jit_available() and it's set to f??
>
> Right, then this installation does not contain the necessary library to JIT
> compile the query.
>
> > Okay, so does this require a special BUILD command?
>
> Yes, it requires that you compile with --with-llvm.  If you don't have
> llvm in
> the PATH you might need to set LLVM_CONFIG to point to your llvm-config
> binary.
> With autotools that would be something like:
>
> ./configure  --with-llvm LLVM_CONFIG=/path/to/llvm-config
>
> --
> Daniel Gustafsson
>

Thank you, that made it possible to build and run...
UNFORTUNATELY this has a CLEAR memory leak (visible in htop).
I am watching it already consuming 6% of my system memory.

I am re-attaching my script (as I tweaked it), WHICH includes the settings to
FORCE JIT.  It also does an EXPLAIN so you can verify that JIT is on (this is
what I added/noticed!)  And it takes over 20 minutes to get this far.  It's
still running.

This creates 90 million rows of data, so it takes a while.
I BELIEVE that it consumes far less memory if you do not fetch any rows (I
had problems reproducing it if no rows were fetched).
So, this may be beyond the planning stages.

Thanks,

Kirk Out!
CREATE TABLE pg_temp.parts
(
seid bigint,
r_field_name_1   smallint,
fr_field_namesmallint   NOT NULL,
p1_field_namevarchar(4),
qty_field_name   integer,
p5_field_namevarchar(30),
partnum  varchar(30),
st_field_namesmallint DEFAULT 0 NOT NULL
); -- drop table pg_temp.parts;

INSERT INTO pg_temp.parts (seid, partnum, qty_field_name, fr_field_name, st_field_name)
SELECT (RANDOM() * 500 + 1)::bigint AS seid,
       trunc(RANDOM() * 1)::text AS partnum,
       CASE
           WHEN q.rnd BETWEEN 0 AND 0.45 THEN FLOOR(RANDOM() * 900) + 100 -- Random number in the range [100, 999]
           WHEN q.rnd BETWEEN 0.46 AND 0.96 THEN LEAST(TRUNC(FLOOR(RANDOM() * 99) + 1000)::int, 99::int) -- Random number in the range [1000, ]
           ELSE FLOOR(RANDOM() * 900) + 100 -- Random number in the range [10, 99]
       END AS qty_field_name,
       CASE WHEN RANDOM() < 0.72 THEN 0::smallint ELSE 1::smallint END AS fr_field_name,
       CASE WHEN RANDOM() < 0.46 THEN 1::smallint ELSE 2::smallint END AS st_field_name
  FROM (SELECT RANDOM() AS rnd, x FROM GENERATE_SERIES(1, 90_000_000) x) q;

CREATE INDEX idx_parts_supid ON pg_temp.parts USING btree (seid, p1_field_name, partnum, st_field_name, r_field_name_1, qty_field_name);
CREATE INDEX idx_parts_p5 ON pg_temp.parts USING btree (p5_field_name, seid, st_field_name, r_field_name_1, p1_field_name);
CREATE INDEX idx_parts_partnum ON pg_temp.parts USING btree (partnum, seid, st_field_name, r_field_name_1, p1_field_name);

CREATE OR REPLACE FUNCTION pg_temp.fx(asupplier bigint = 497)
RETURNS void
LANGUAGE plpgsql
AS
$function$
DECLARE
    supplier_parts CURSOR (sid bigint) FOR  -- Again, selecting with COUNT() would reduce 1 query per row!
        SELECT partnum, qty_field_name, st_field_name, sum(qty_field_name) as qty
          FROM pg_temp.parts
         WHERE seid = sid AND (st_field_name = 1)
         GROUP BY partnum, qty_field_name, st_field_name
         ORDER BY partnum, qty_field_name, st_field_name;
    supplier_part_qty_matches CURSOR (sid bigint, pnum varchar(30), pqty bigint) FOR
        SELECT DISTINCT seid, fr_field_name, partnum, st_field_name
          FROM pg_temp.parts
         WHERE seid <> sid AND partnum = pnum AND qty_field_name = pqty
         ORDER BY seid, partnum;

    a_partnum varchar(30);
    a_qty     integer;
    a_st      smallint;
    a_cnt     integer = 0;
    b_partnum varchar(30);
    b_fr      smallint;
    b_seid    bigint;
    b_st      smallint;
    b_cnt     bigint = 0;
BEGIN
    RAISE NOTICE '%', (SELECT (PG_SIZE_PRETTY(SUM(used_bytes)), PG_SIZE_PRETTY(SUM(total_bytes)), PG_SIZE_PRETTY(SUM(free_bytes))) FROM pg_get_backend_memory_contexts());
    OPEN supplier_parts (asupplier);
    LOOP
        FETCH supplier_parts INTO a_partnum, a_qty, a_st, a_qty;
        EXIT WHEN NOT FOUND;
        a_cnt := a_cnt + 1;
        OPEN supplier_part_qty_matches (sid := asupplier, pnum := a_partnum, pqty := a_qty);
        LOOP
            FETCH supplier_part_qty_matches INTO b_seid, b_fr, b_partnum, b_st;
            b_cnt := b_cnt + 1;
            EXIT WHEN TRUE;  -- no need to loop here.  One FETCH per query triggers the losses.
        END LOOP;
        CLOSE supplier_part_qty_matches;
    END LOOP;
    CLOSE supplier_parts;
    RAISE NOTICE '---after close,

Re: WIP Incremental JSON Parser

2024-01-21 Thread Peter Smith
2024-01 Commitfest.

Hi, This patch has a CF status of "Needs Review" [1], but it seems
there were CFbot test failures last time it was run [2]. Please have a
look and post an updated version if necessary.

==
[1] https://commitfest.postgresql.org/46/4725/
[2] https://cirrus-ci.com/github/postgresql-cfbot/postgresql/commitfest/46/4725

Kind Regards,
Peter Smith.




Re: Transaction timeout

2024-01-21 Thread Peter Smith
2024-01 Commitfest.

Hi, This patch has a CF status of "Needs Review" [1], but it seems
there was a CFbot test failure last time it was run [2]. Please have a
look and post an updated version if necessary.

==
[1] https://commitfest.postgresql.org/46/4040/
[2] https://cirrus-ci.com/task/4721191139672064

Kind Regards,
Peter Smith.




Re: Row pattern recognition

2024-01-21 Thread Tatsuo Ishii
> Thank you very much for providing the patch for the RPR implementation.
> 
> After applying the v12-patches, I noticed an issue that
> the rpr related parts in window clauses were not displayed in the
> view definitions (the definition column of pg_views).
> 
> To address this, I have taken the liberty of adding an additional patch
> that modifies the relevant rewriter source code.
> I have attached the rewriter patch for your review and would greatly 
> appreciate your feedback.
> 
> Thank you for your time and consideration.

Thank you so much for spotting the issue and creating the patch. I
confirmed that your patch applies cleanly and solves the issue. I will
include the patches in the upcoming v13 patch set.

Best regards,
--
Tatsuo Ishii
SRA OSS LLC
English: http://www.sraoss.co.jp/index_en/
Japanese: http://www.sraoss.co.jp




Re: remaining sql/json patches

2024-01-21 Thread John Naylor
On Mon, Nov 27, 2023 at 9:06 PM Alvaro Herrera  wrote:
> At this point one thing that IMO we cannot afford to do, is stop feature
> progress work in the name of parser speed.  I mean, parser speed is
> important, and we need to be mindful that what we add is reasonable.
> But at some point we'll probably have to fix that by parsing
> differently (a top-down parser, perhaps?  Split the parser in smaller
> pieces that each deal with subsets of the whole thing?)

I was reorganizing some old backups and rediscovered an experiment I
did four years ago when I had some extra time on my hands, to use a
lexer generator that emits a state machine driven by code, rather than
a table. It made parsing 12% faster on the above info-schema test, but
only (maybe) 3% on parsing pgbench-like queries. My quick hack ended
up a bit uglier and more verbose than Flex, but that could be
improved, and in fact small components could be shared across the
whole code base. I might work on it again; I might not.
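
To illustrate the difference (illustrative only, not code from that
experiment): instead of looking up transitions in a table, the state machine
lives directly in branches, e.g. for scanning an identifier:

#include <ctype.h>

static void
scan_ident(const char **pp)
{
	enum { S_START, S_IDENT, S_DONE } state = S_START;
	const char *p = *pp;

	while (state != S_DONE)
	{
		switch (state)
		{
			case S_START:
				if (*p == '_' || isalpha((unsigned char) *p))
				{
					p++;
					state = S_IDENT;
				}
				else
					state = S_DONE;
				break;
			case S_IDENT:
				if (*p == '_' || isalnum((unsigned char) *p))
					p++;
				else
					state = S_DONE;
				break;
			default:
				state = S_DONE;
				break;
		}
	}
	*pp = p;	/* left pointing at the first non-identifier character */
}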




Re: Moving forward with TDE [PATCH v3]

2024-01-21 Thread Peter Smith
2024-01 Commitfest.

Hi, This patch has a CF status of "Needs Review" [1], but it seems
there were CFbot test failures last time it was run [2]. Please have a
look and post an updated version if necessary.

==
[1] https://commitfest.postgresql.org/46/3985/
[2] https://cirrus-ci.com/task/5498215743619072

Kind Regards,
Peter Smith.




Re: Add \syncpipeline command to pgbench

2024-01-21 Thread Michael Paquier
On Fri, Jan 19, 2024 at 08:55:31AM +0100, Anthonin Bonnefoy wrote:
> I've updated the doc to group the commands. It does look better and
> more consistent with similar command groups like \if.

I was playing with a few meta command scenarios while looking at this
patch, and this sequence generates an error that should never happen:
$ cat /tmp/test.sql
\startpipeline
\syncpipeline
$ pgbench -n -f /tmp/test.sql -M extended
[...]
pgbench: error: unexpected transaction status 1
pgbench: error: client 0 aborted while receiving the transaction status

It looks to me that we need to be much smarter than that for the error
handling we'd need when a sync request is optionally sent when a
transaction stops at the end of pgbench.  Could you look at it?
--
Michael


signature.asc
Description: PGP signature


Re: remaining sql/json patches

2024-01-21 Thread jian he
I found two main issues regarding coercing SQL/JSON function output to
other data types.
* The RETURNING type's typmod influences the result of JSON_VALUE | JSON_QUERY.
* JSON_VALUE | JSON_QUERY handle RETURNING domain types that allow null
and that do not allow null inconsistently.

In ExecInitJsonExprCoercion, the behavior differs depending on whether
"coercion" is a JsonCoercion node or not.
For the RETURNING clause of (JSON_VALUE | JSON_QUERY), whether
"coercion" is a JsonCoercion is decided in coerceJsonFuncExprOutput.

This affects RETURNING types whose typmod is not -1.
If "coercion" is set to a JsonCoercion node, then it may call
InputFunctionCallSafe to do the coercion.
If not, it may call ExecInitFunc-related code, which is wrapped in
ExecEvalCoerceViaIOSafe.

For example:
SELECT JSON_QUERY(jsonb '[1,2]', '$' RETURNING char(3));
will go through ExecInitFunc and initialize the function
bpchar(character, integer, boolean), setting the third argument to true.
So it will initiate instructions like `select bpchar('[1, 2]', 7, true);`,
which in the end will make the result `[1,`.
However, InputFunctionCallSafe cannot handle that.
A simple demo:
create table t(a char(3));
--fail.
INSERT INTO t values ('test');
--ok
select 'test'::char(3);

However, the current ExecEvalCoerceViaIOSafe cannot handle OMIT QUOTES.

Even with my changes, it is still not bullet-proof.
For example:
create domain char3_domain_not_null as char(3) NOT NULL;
create domain hello as text NOT NULL check (value = 'hello');
create domain int42 as int check (value = 42);
CREATE TYPE comp_domain_with_typmod AS (a char3_domain_not_null, b int42);

SELECT JSON_VALUE(jsonb'{"rec": "(abcd,42)"}', '$.rec' returning
comp_domain_with_typmod);
will return NULL

however
SELECT JSON_VALUE(jsonb'{"rec": "abcd"}', '$.rec' returning
char3_domain_not_null);
will return `abc`.

I made the modification, so you can see the difference.
The attached test_coerce.sql is the test file.
test_coerce_only_v35.out is the test output of applying only v35 0001
to 0007 plus my previous changes[0].
test_coerce_v35_plus_change.out is the test output of applying v35
0001 to 0007 plus the changes (attached) and the previous changes[0].

[0] 
https://www.postgresql.org/message-id/CACJufxHo1VVk_0th3AsFxqdMgjaUDz6s0F7%2Bj9rYA3d%3DURw97A%40mail.gmail.com


test_coerce.sql
Description: application/sql


test_coerce_only_v35.out
Description: Binary data


test_coerce_v35_plus_change.out
Description: Binary data


v1-0001-make-JSON_QUERY-JSON_VALUE-returning-type-with.no-cfbot
Description: Binary data


Re: Statistics Import and Export

2024-01-21 Thread Peter Smith
2024-01 Commitfest.

Hi, This patch has a CF status of "Needs Review" [1], but it seems
there were CFbot test failures last time it was run [2]. Please have a
look and post an updated version if necessary.

==
[1] https://commitfest.postgresql.org/46/4538/
[2] https://cirrus-ci.com/github/postgresql-cfbot/postgresql/commitfest/46/4538

Kind Regards,
Peter Smith.




Re: Sequence Access Methods, round two

2024-01-21 Thread Peter Smith
2024-01 Commitfest.

Hi, This patch has a CF status of "Needs Review" [1], but it seems
there were CFbot test failures last time it was run [2]. Please have a
look and post an updated version if necessary.

==
[1] https://commitfest.postgresql.org/46/4677/
[2] https://cirrus-ci.com/task/5576959615303680

Kind Regards,
Peter Smith.




Re: Improve heapgetpage() performance, overhead from serializable

2024-01-21 Thread John Naylor
On Mon, Jul 17, 2023 at 9:58 PM Andres Freund  wrote:
> And 397->320ms
> for something as core as this, is imo worth considering on its own.

Hi Andres, this interesting work seems to have fallen off the radar --
are you still planning to move forward with this for v17?




Re: SQL:2011 application time

2024-01-21 Thread Peter Smith
2024-01 Commitfest.

Hi, This patch has a CF status of "Needs Review" [1], but it seems
there were CFbot test failures last time it was run [2]. Please have a
look and post an updated version if necessary.

==
[1] https://commitfest.postgresql.org/46/4308/
[2] https://cirrus-ci.com/github/postgresql-cfbot/postgresql/commitfest/46/4308

Kind Regards,
Peter Smith.




Re: Support "Right Semi Join" plan shapes

2024-01-21 Thread wenhui qiu
Hi vignesh, I saw this patch has passed CI
(https://cirrus-ci.com/build/6109321080078336); can we push it?

Best wishes

Richard Guo  wrote on Tue, Jan 9, 2024 at 18:49:

>
> On Sun, Jan 7, 2024 at 3:03 PM vignesh C  wrote:
>
>> One of the tests in CFBot has failed at [1] with:
>> -   Relations: (public.ft1 t1) SEMI JOIN (public.ft2 t2)
>> -   Remote SQL: SELECT r1."C 1", r1.c2, r1.c3, r1.c4, r1.c5, r1.c6,
>> r1.c7, r1.c8 FROM "S 1"."T 1" r1 WHERE ((r1."C 1" < 20)) AND EXISTS
>> (SELECT NULL FROM "S 1"."T 1" r3 WHERE ((r3."C 1" > 10)) AND
>> ((date(r3.c5) = '1970-01-17'::date)) AND ((r3.c3 = r1.c3))) ORDER BY
>> r1."C 1" ASC NULLS LAST
>> -(4 rows)
>> +   Sort Key: t1.c1
>> +   ->  Foreign Scan
>> + Output: t1.c1, t1.c2, t1.c3, t1.c4, t1.c5, t1.c6, t1.c7, t1.c8
>> + Relations: (public.ft1 t1) SEMI JOIN (public.ft2 t2)
>> + Remote SQL: SELECT r1."C 1", r1.c2, r1.c3, r1.c4, r1.c5,
>> r1.c6, r1.c7, r1.c8 FROM "S 1"."T 1" r1 WHERE ((r1."C 1" < 20)) AND
>> EXISTS (SELECT NULL FROM "S 1"."T 1" r3 WHERE ((r3."C 1" > 10)) AND
>> ((date(r3.c5) = '1970-01-17'::date)) AND ((r3.c3 = r1.c3)))
>> +(7 rows)
>
>
> Thanks.  I looked into it and have figured out why the plan differs.
> With this patch the SEMI JOIN that is pushed down to the remote server
> is now implemented using JOIN_RIGHT_SEMI, whereas previously it was
> implemented using JOIN_SEMI.  Consequently, this leads to changes in the
> costs of the paths: path with the sort pushed down to remote server, and
> path with the sort added atop the foreign join.  And at last the latter
> one wins by a slim margin.
>
> I think we can simply update the expected file to fix this plan diff, as
> attached.
>
> Thanks
> Richard
>


Re: remaining sql/json patches

2024-01-21 Thread Peter Smith
2024-01 Commitfest.

Hi, This patch has a CF status of "Needs Review" [1], but it seems
there were CFbot test failures last time it was run [2]. Please have a
look and post an updated version if necessary.

==
[1] https://commitfest.postgresql.org/46/4377/
[2] https://cirrus-ci.com/github/postgresql-cfbot/postgresql/commitfest/46/4377

Kind Regards,
Peter Smith.




Re: Row pattern recognition

2024-01-21 Thread NINGWEI CHEN
On Sat, 09 Dec 2023 07:22:58 +0900 (JST)
Tatsuo Ishii  wrote:

> > On 04.12.23 12:40, Tatsuo Ishii wrote:
> >> diff --git a/src/backend/parser/gram.y b/src/backend/parser/gram.y
> >> index d631ac89a9..5a77fca17f 100644
> >> --- a/src/backend/parser/gram.y
> >> +++ b/src/backend/parser/gram.y
> >> @@ -251,6 +251,8 @@ static Node *makeRecursiveViewSelect(char
> >> *relname, List *aliases, Node *query);
> >>DefElem*defelt;
> >>SortBy *sortby;
> >>WindowDef  *windef;
> >> +  RPCommonSyntax  *rpcom;
> >> +  RPSubsetItem*rpsubset;
> >>JoinExpr   *jexpr;
> >>IndexElem  *ielem;
> >>StatsElem  *selem;
> >> @@ -278,6 +280,7 @@ static Node *makeRecursiveViewSelect(char
> >> *relname, List *aliases, Node *query);
> >>MergeWhenClause *mergewhen;
> >>struct KeyActions *keyactions;
> >>struct KeyAction *keyaction;
> >> +  RPSkipToskipto;
> >>   }
> >> %typestmt toplevel_stmt schema_stmt routine_body_stmt
> > 
> > It is usually not the style to add an entry for every node type to the
> > %union.  Otherwise, we'd have hundreds of entries in there.
> 
> Ok, I have removed the node types and used existing node types.  Also
> I have moved RPR related %types to same place to make it easier to know
> what are added by RPR.
> 
> >> @@ -866,6 +878,7 @@ static Node *makeRecursiveViewSelect(char
> >> *relname, List *aliases, Node *query);
> >>   %nonassoc UNBOUNDED /* ideally would have same precedence as IDENT */
> >>   %nonassoc IDENT PARTITION RANGE ROWS GROUPS PRECEDING FOLLOWING CUBE
> >>   %ROLLUP
> >>SET KEYS OBJECT_P SCALAR VALUE_P WITH WITHOUT
> >> +%nonassoc MEASURES AFTER INITIAL SEEK PATTERN_P
> >>   %left Op OPERATOR /* multi-character ops and user-defined operators */
> >>   %left'+' '-'
> >>   %left'*' '/' '%'
> > 
> > It was recently discussed that these %nonassoc should ideally all have
> > the same precedence.  Did you consider that here?
> 
> No, I didn't realize it. Thanks for pointing it out. I have removed
> %nonassoc so that MEASURES etc. have the same precedence as IDENT etc.
> 
> Attached is the new diff of gram.y against master branch.

Thank you very much for providing the patch for the RPR implementation.

After applying the v12-patches, I noticed an issue that
the rpr related parts in window clauses were not displayed in the
view definitions (the definition column of pg_views).

To address this, I have taken the liberty of adding an additional patch
that modifies the relevant rewriter source code.
I have attached the rewriter patch for your review and would greatly appreciate 
your feedback.

Thank you for your time and consideration.

-- 
SRA OSS LLC
Ningwei Chen 
TEL: 03-5979-2701 FAX: 03-5979-2702
diff --git a/src/backend/utils/adt/ruleutils.c b/src/backend/utils/adt/ruleutils.c
index 0b2a164057..baded88201 100644
--- a/src/backend/utils/adt/ruleutils.c
+++ b/src/backend/utils/adt/ruleutils.c
@@ -428,6 +428,10 @@ static void get_rule_groupingset(GroupingSet *gset, List *targetlist,
 								 bool omit_parens, deparse_context *context);
 static void get_rule_orderby(List *orderList, List *targetList,
 							 bool force_colno, deparse_context *context);
+static void get_rule_pattern(List *patternVariable, List *patternRegexp,
+							 bool force_colno, deparse_context *context);
+static void get_rule_define(List *defineClause, List *patternVariables,
+							bool force_colno, deparse_context *context);
 static void get_rule_windowclause(Query *query, deparse_context *context);
 static void get_rule_windowspec(WindowClause *wc, List *targetList,
 								deparse_context *context);
@@ -6459,6 +6463,64 @@ get_rule_orderby(List *orderList, List *targetList,
}
 }
 
+/*
+ * Display a PATTERN clause.
+ */
+static void
+get_rule_pattern(List *patternVariable, List *patternRegexp,
+				 bool force_colno, deparse_context *context)
+{
+   StringInfo  buf = context->buf;
+   const char *sep;
+   ListCell   *lc_var, *lc_reg = list_head(patternRegexp);
+
+   sep = "";
+   appendStringInfoChar(buf, '(');
+   foreach(lc_var, patternVariable)
+   {
+   char *variable = strVal((String *) lfirst(lc_var));
+   char *regexp = NULL;
+
+   if (lc_reg != NULL)
+   {
+   regexp = strVal((String *) lfirst(lc_reg));
+   lc_reg = lnext(patternRegexp, lc_reg);
+   }
+
+   appendStringInfo(buf, "%s%s", sep, variable);
+   if (regexp != NULL)
+   appendStringInfoString(buf, regexp);
+
+   sep = " ";
+   }
+   appendStringInfoChar(buf, ')');
+}
+

Re: Relation bulk write facility

2024-01-21 Thread Peter Smith
2024-01 Commitfest.

Hi, This patch has a CF status of "Needs Review" [1], but it seems
there was a CFbot test failure last time it was run [2]. Please have a
look and post an updated version if necessary.

==
[1] https://commitfest.postgresql.org/46/4575/
[2] https://cirrus-ci.com/task/4990764426461184

Kind Regards,
Peter Smith.




Re: [HACKERS] make async slave to wait for lsn to be replayed

2024-01-21 Thread Dilip Kumar
On Wed, Jan 17, 2024 at 1:46 PM Kartyshov Ivan
 wrote:
>
> Add some fixes and rebase.
>
While quickly looking into the patch, I understood the idea of what we
are trying to achieve here, and I feel that it is a useful feature.  But
while looking at both patches I could not quickly differentiate between
the two approaches.  I believe that, internally, both implement similar
wait logic at the core but provide different syntaxes.  So if we want to
keep both approaches open for the sake of discussion, it would be better
to first create a patch that implements the core part, i.e. the waiting
logic and the other common code, and then add top-up patches with the
two different approaches; that would be easier to review.  I also see
that v4 has no documentation for the syntax part, which makes it even
harder to understand.

I think this thread is implementing a useful feature, so my suggestion
is to add some documentation in v4 and also make it more readable
w.r.t. what the clear differences between the two approaches are;
adding commit messages would also help.

-- 
Regards,
Dilip Kumar
EnterpriseDB: http://www.enterprisedb.com




Re: Permute underscore separated components of columns before fuzzy matching

2024-01-21 Thread Peter Smith
2024-01 Commitfest.

Hi, This patch has a CF status of "Needs Review" [1], but it seems
like there were CFbot test failures last time it was run [2]. Please
have a look and post an updated version if necessary.

==
[1] https://commitfest.postgresql.org/46/4282/
[2] https://cirrus-ci.com/github/postgresql-cfbot/postgresql/commitfest/46/4282

Kind Regards,
Peter Smith.




Re: [PoC] Improve dead tuple storage for lazy vacuum

2024-01-21 Thread John Naylor
On Mon, Jan 22, 2024 at 10:28 AM Masahiko Sawada  wrote:
>
> On further thought, as you pointed out before, "num_tids" should not
> be in tidstore in terms of integration with tidbitmap.c, because
> tidbitmap.c has "lossy pages". With lossy pages, "num_tids" is no
> longer accurate and useful. Similarly, looking at tidbitmap.c, it has
> npages and nchunks but they will not be necessary in lazy vacuum use
> case. Also, assuming that we support parallel heap pruning, probably
> we need to somehow lock the tidstore while adding tids to the tidstore
> concurrently by parallel vacuum worker. But in tidbitmap use case, we
> don't need to lock the tidstore since it doesn't have multiple
> writers.

Not currently, and it does seem bad to require locking where it's not required.

(That would be a prerequisite for parallel index scan. It's been tried
before with the hash table, but concurrency didn't scale well there. I
have no reason to think that the radix tree would scale
significantly better with the same global LW lock, but as you know
there are other locking schemes possible.)

> Given these facts, different statistics and different lock
> strategies are required by different use cases. So I think there are 3
> options:
>
> 1. expose lock functions for tidstore and the caller manages the
> statistics in the outside of tidstore. For example, in lazyvacuum.c we
> would have a TidStore for tid storage as well as VacDeadItemsInfo that
> has num_tids and max_bytes. Both are in LVRelState. For parallel
> vacuum, we pass both to the workers via DSM and pass both to function
> where the statistics are required. As for the exposed lock functions,
> when adding tids to the tidstore, the caller would need to call
> something like TidStoreLockExclusive(ts) that further calls
> LWLockAcquire(&ts->tree.shared->ctl.lock, LW_EXCLUSIVE) internally.

The advantage here is that vacuum can avoid locking entirely while
using shared memory, just like it does now, and has the option to add
it later.
IIUC, the radix tree struct would have a lock member, but wouldn't
take any locks internally? Maybe we still need one for
RT_MEMORY_USAGE? For that, I see dsa_get_total_size() takes its own
DSA_AREA_LOCK -- maybe that's enough?

That seems simplest, and is not very far from what we do now. If we do
this, then the lock functions should be where we branch for is_shared.
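To make option 1 concrete, here is a rough sketch of what the exposed
lock functions could look like, using the names floated above; note that
TidStoreIsShared() and the control-struct layout are assumptions for
illustration, not code from the patch:

void
TidStoreLockExclusive(TidStore *ts)
{
	/* local (non-shared) stores need no locking at all */
	if (TidStoreIsShared(ts))
		LWLockAcquire(&ts->tree.shared->ctl.lock, LW_EXCLUSIVE);
}

void
TidStoreUnlock(TidStore *ts)
{
	if (TidStoreIsShared(ts))
		LWLockRelease(&ts->tree.shared->ctl.lock);
}

That keeps the serial vacuum path lock-free while giving parallel
callers an explicit handle on the lock.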

> 2. add callback functions to tidstore so that the caller can do its
> work while holding a lock on the tidstore. This is like the idea we
> just discussed for radix tree. The caller passes a callback function
> and user data to TidStoreSetBlockOffsets(), and the callback is called
> after setting tids. Similar to option 1, the statistics need to be
> stored in a different area.

I think we'll have to move to something like this eventually, but it
seems like overkill right now.

> 3. keep tidstore.c and tidbitmap.c separate implementations but use
> radix tree in tidbitmap.c. tidstore.c would have "num_tids" in its
> control object and doesn't have any lossy page support. On the other
> hand, in tidbitmap.c we replace simplehash with radix tree. This makes
> tidstore.c simple but we would end up having different data structures
> for similar usage.

They have so much in common that it's worth it to use the same
interface and (eventually) value type. They just need separate paths
for adding tids, as we've discussed.

> I think it's worth trying option 1. What do you think, John?

+1




Re: [PATCH] New [relation] option engine

2024-01-21 Thread Peter Smith
2024-01 Commitfest.

Hi, This patch has a CF status of "Needs Review" [1], but it seems
there were CFbot test failures last time it was run [2]. Please have a
look and post an updated version if necessary.

==
[1] https://commitfest.postgresql.org/46/4688/
[2] https://cirrus-ci.com/task/5066432363364352

Kind Regards,
Peter Smith.




Re: Network failure may prevent promotion

2024-01-21 Thread Peter Smith
2024-01 Commitfest.

Hi, This patch has a CF status of "Needs Review" [1], but it seems
there were CFbot test failures last time it was run [2]. Please have a
look and post an updated version if necessary.

==
[1] https://commitfest.postgresql.org/46/4748/
[2] https://cirrus-ci.com/github/postgresql-cfbot/postgresql/commitfest/46/4748

Kind Regards,
Peter Smith.




Re: Move walreceiver state assignment (to WALRCV_STREAMING) in WalReceiverMain()

2024-01-21 Thread Peter Smith
2024-01 Commitfest.

Hi, This patch has a CF status of "Needs Review" [1], but it seems
there were CFbot test failures last time it was run [2]. Please have a
look and post an updated version if necessary.

==
[1] https://commitfest.postgresql.org/46/4698/
[2] https://cirrus-ci.com/task/5367036042280960

Kind Regards,
Peter Smith.




meson: Specify -Wformat as a common warning flag for extensions

2024-01-21 Thread Sutou Kouhei
Hi,

I'm an extension developer. If I use PostgreSQL built with
Meson, I get the following warning:

cc1: warning: '-Wformat-security' ignored without '-Wformat' [-Wformat-security]

Because "pg_config --cflags" includes -Wformat-security but
doesn't include -Wformat.

Can we specify -Wformat as a common warning flag too? If we
do that, "pg_config --cflags" will include both -Wformat-security
and -Wformat, so I won't get the warning.
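For anyone unfamiliar with -Wformat-security, here is a minimal
illustration of the class of bug it catches (this example is mine, not
from the patch); GCC only honors the flag when -Wformat is also active,
which is why the combination matters:

#include <stdio.h>

void
log_message(const char *msg)
{
	printf(msg);		/* warned: non-literal format string, no arguments */
	printf("%s", msg);	/* the safe spelling; no warning */
}

Compiled with -Wformat -Wformat-security, GCC flags the first call;
with -Wformat-security alone it emits only the cc1 complaint quoted
above.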


Thanks,
-- 
kou
>From 0913033512c9b75ee3d2941c89ff8696f3c5f53b Mon Sep 17 00:00:00 2001
From: Sutou Kouhei 
Date: Mon, 22 Jan 2024 13:51:58 +0900
Subject: [PATCH v1] meson: Specify -Wformat explicitly for extensions

We specify -Wformat-security as a common warning flag explicitly. Our
common warning flags are used by extensions via
pgxs/src/Makefile.global or "pg_config --cflags".

If -Wformat-security is used without -Wall/-Wformat, GCC shows the
following warning:

cc1: warning: '-Wformat-security' ignored without '-Wformat' [-Wformat-security]

We can't assume that all extensions use -Wall/-Wformat. So specifying
only -Wformat-security may cause the warning.

If we specify -Wformat explicitly, the warning isn't shown.
---
 meson.build | 8 
 1 file changed, 8 insertions(+)

diff --git a/meson.build b/meson.build
index 55184db248..de6ce778fc 100644
--- a/meson.build
+++ b/meson.build
@@ -1822,6 +1822,14 @@ common_warning_flags = [
   '-Wimplicit-fallthrough=3',
   '-Wcast-function-type',
   '-Wshadow=compatible-local',
+  # This is for preventing the "cc1: warning: '-Wformat-security'
+  # ignored without '-Wformat' [-Wformat-security]" warning. We don't
+  # need this for PostgreSQL itself. This is just for
+  # extensions. Extensions use "pg_config --cflags" to build
+  # themselves. If extensions use only -Wformat-security, the warning
+  # is shown. If we have this here, extensions use both -Wformat
+  # and -Wformat-security. So the warning isn't shown.
+  '-Wformat',
   # This was included in -Wall/-Wformat in older GCC versions
   '-Wformat-security',
 ]
-- 
2.43.0



Re: Make mesage at end-of-recovery less scary.

2024-01-21 Thread Peter Smith
2024-01 Commitfest.

Hi, This patch has a CF status of "Needs Review" [1], but it seems
there were CFbot test failures last time it was run [2]. Please have a
look and post an updated version if necessary.

==
[1] https://commitfest.postgresql.org/46/2490/
[2] https://cirrus-ci.com/github/postgresql-cfbot/postgresql/commitfest/46/2490

Kind Regards,
Peter Smith.




Re: Fix some typos

2024-01-21 Thread Michael Paquier
On Sun, Jan 21, 2024 at 08:22:01PM +0800, Yongtao Huang wrote:
> As the title said, just fix some typos.

Thanks, applied.
--
Michael


signature.asc
Description: PGP signature


Re: cleanup patches for dshash

2024-01-21 Thread Nathan Bossart
On Sun, Jan 21, 2024 at 09:51:18PM -0600, Nathan Bossart wrote:
> I did notice a cfbot failure [0].  After a quick glance, I'm assuming this
> is caused by the memcpy() in insert_into_bucket().  Even if the string is
> short, it will copy the maximum key size, which is bad.  So, 0002 is
> totally broken at the moment, and we'd need to teach insert_into_bucket()
> to strcpy() instead for string keys to fix it.  Perhaps we could add a
> field to dshash_parameters for this purpose...

I attempted to fix this in v2 of the patch set.

-- 
Nathan Bossart
Amazon Web Services: https://aws.amazon.com
>From af203e6e80f7a2843890e51fd3e3641aef8bd901 Mon Sep 17 00:00:00 2001
From: Nathan Bossart 
Date: Fri, 19 Jan 2024 15:23:35 -0600
Subject: [PATCH v2 1/2] Use NULL instead of 0 for arg parameter in
 dshash_create().

---
 src/backend/replication/logical/launcher.c | 2 +-
 src/backend/utils/activity/pgstat_shmem.c  | 2 +-
 2 files changed, 2 insertions(+), 2 deletions(-)

diff --git a/src/backend/replication/logical/launcher.c b/src/backend/replication/logical/launcher.c
index 122db0bb13..ee66c292bd 100644
--- a/src/backend/replication/logical/launcher.c
+++ b/src/backend/replication/logical/launcher.c
@@ -1013,7 +1013,7 @@ logicalrep_launcher_attach_dshmem(void)
 		last_start_times_dsa = dsa_create(LWTRANCHE_LAUNCHER_DSA);
 		dsa_pin(last_start_times_dsa);
 		dsa_pin_mapping(last_start_times_dsa);
-		last_start_times = dshash_create(last_start_times_dsa, &dsh_params, 0);
+		last_start_times = dshash_create(last_start_times_dsa, &dsh_params, NULL);
 
 		/* Store handles in shared memory for other backends to use. */
 		LogicalRepCtx->last_start_dsa = dsa_get_handle(last_start_times_dsa);
diff --git a/src/backend/utils/activity/pgstat_shmem.c b/src/backend/utils/activity/pgstat_shmem.c
index 3ce48e88a3..71410ddd3f 100644
--- a/src/backend/utils/activity/pgstat_shmem.c
+++ b/src/backend/utils/activity/pgstat_shmem.c
@@ -180,7 +180,7 @@ StatsShmemInit(void)
 		 * With the limit in place, create the dshash table. XXX: It'd be nice
 		 * if there were dshash_create_in_place().
 		 */
-		dsh = dshash_create(dsa, &dsh_params, 0);
+		dsh = dshash_create(dsa, &dsh_params, NULL);
 		ctl->hash_handle = dshash_get_hash_table_handle(dsh);
 
 		/* lift limit set above */
-- 
2.25.1

>From 1d328f0be681da7e43574350a1d3c8da8ae8ddcc Mon Sep 17 00:00:00 2001
From: Nathan Bossart 
Date: Fri, 19 Jan 2024 15:38:34 -0600
Subject: [PATCH v2 2/2] Add hash/compare functions for dshash tables with
 string keys.

---
 src/backend/lib/dshash.c   | 58 +-
 src/backend/replication/logical/launcher.c |  1 +
 src/backend/storage/ipc/dsm_registry.c |  9 ++--
 src/backend/utils/activity/pgstat_shmem.c  |  1 +
 src/backend/utils/cache/typcache.c |  2 +
 src/include/lib/dshash.h   | 19 ++-
 6 files changed, 83 insertions(+), 7 deletions(-)

diff --git a/src/backend/lib/dshash.c b/src/backend/lib/dshash.c
index b0bc0abda0..cc49b4ca51 100644
--- a/src/backend/lib/dshash.c
+++ b/src/backend/lib/dshash.c
@@ -188,6 +188,8 @@ static bool delete_item_from_bucket(dshash_table *hash_table,
 static inline dshash_hash hash_key(dshash_table *hash_table, const void *key);
 static inline bool equal_keys(dshash_table *hash_table,
 			  const void *a, const void *b);
+static inline void copy_key(dshash_table *hash_table, void *dest,
+			const void *src);
 
 #define PARTITION_LOCK(hash_table, i)			\
 	(&(hash_table)->control->partitions[(i)].lock)
@@ -583,6 +585,49 @@ dshash_memhash(const void *v, size_t size, void *arg)
 	return tag_hash(v, size);
 }
 
+/*
+ * A copy function that forwards to memcpy.
+ */
+void
+dshash_memcpy(void *dest, const void *src, size_t size, void *arg)
+{
+	(void) memcpy(dest, src, size);
+}
+
+/*
+ * A compare function that forwards to strcmp.
+ */
+int
+dshash_strcmp(const void *a, const void *b, size_t size, void *arg)
+{
+	Assert(strlen((const char *) a) < size);
+	Assert(strlen((const char *) b) < size);
+
+	return strcmp((const char *) a, (const char *) b);
+}
+
+/*
+ * A hash function that forwards to string_hash.
+ */
+dshash_hash
+dshash_strhash(const void *v, size_t size, void *arg)
+{
+	Assert(strlen((const char *) v) < size);
+
+	return string_hash((const char *) v, size);
+}
+
+/*
+ * A copy function that forwards to strcpy.
+ */
+void
+dshash_strcpy(void *dest, const void *src, size_t size, void *arg)
+{
+	Assert(strlen((const char *) src) < size);
+
+	(void) strcpy((char *) dest, (const char *) src);
+}
+
 /*
  * Sequentially scan through dshash table and return all the elements one by
  * one, return NULL when all elements have been returned.
@@ -949,7 +994,7 @@ insert_into_bucket(dshash_table *hash_table,
 hash_table->params.entry_size +
 MAXALIGN(sizeof(dshash_table_item)));
 	item = dsa_get_address(hash_table->area, item_pointer);
-	memcpy(ENTRY_FROM_ITEM(item), key, hash_table->params.key_size);
+	copy_key(hash_table, ENTRY_FROM_ITEM(item), key);
 	

Re: [HACKERS] make async slave to wait for lsn to be replayed

2024-01-21 Thread Peter Smith
2024-01 Commitfest.

Hi, This patch has a CF status of "Needs Review" [1], but it seems
there was a CFbot test failure last time it was run [2]. Please have a
look and post an updated version if necessary.

==
[1] https://commitfest.postgresql.org/46/4221/
[2] https://cirrus-ci.com/task/5618308515364864

Kind Regards,
Peter Smith.




Re: index prefetching

2024-01-21 Thread Peter Smith
2024-01 Commitfest.

Hi, This patch has a CF status of "Needs Review" [1], but it seems
like there were CFbot test failures last time it was run [2]. Please
have a look and post an updated version if necessary.

==
[1] https://commitfest.postgresql.org/46/4351/
[2] https://cirrus-ci.com/github/postgresql-cfbot/postgresql/commitfest/46/4351

Kind Regards,
Peter Smith.




Re: Adding facility for injection points (or probe points?) for more advanced tests

2024-01-21 Thread Ashutosh Bapat
On Mon, Jan 22, 2024 at 10:08 AM Michael Paquier  wrote:
>
> On Thu, Jan 18, 2024 at 10:56:09AM +0530, Ashutosh Bapat wrote:
> > There is some overlap between Dtrace functionality and this
> > functionality. But I see differences too. E.g. injection points offer
> > deeper integration whereas dtrace provides more information to the
> > probe like callstack and argument values etc. We need to assess
> > whether these functionality can co-exist and whether we need both of
> > them. If the answer to both of these questions is yes, it will be good
> > to add documentation explaining the differences and similarities and
> > also some guidance on when to use what.
>
> Perhaps. I'm not sure how much we want to do regarding that yet;
> injection points have no external dependencies and will work across
> all environments as long as dlsym() (or an equivalent) is able to
> work, while being cheaper because they don't spawn an external process
> to trace the call.

Yes. Both have their advantages and disadvantages. So I believe both
will stay, but that means the guidance is necessary. We may want to see
the reception and add the guidance later in the release cycle.

>
> > Other code changes look good. I think the documentation and comments
> > need some changes esp. considering the users point of view. Have
> > attached two patches (0003, and 0004) with those changes to be applied
> > on top of 0001 and 0002 respectively. Please review them. Might need
> > some wordsmithy and language correction. Attaching the whole patch set
> > to keep cibot happy.
>
> The CF bot was perhaps happy but your 0004 has forgotten to update the
> expected output.  There were also a few typos, some markups and edits
> required for 0002 but as a whole what you have suggested was an
> improvement.  Thanks.

Sorry for that. Glad that you found those suggestions acceptable.

-- 
Best Wishes,
Ashutosh Bapat




Re: Improve eviction algorithm in ReorderBuffer

2024-01-21 Thread Peter Smith
2024-01 Commitfest.

Hi, This patch has a CF status of "Needs Review" [1], but it seems
like there were some CFbot test failures last time it was run [2].
Please have a look and post an updated version if necessary.

==
[1] https://commitfest.postgresql.org/46/4699/
[2] https://cirrus-ci.com/github/postgresql-cfbot/postgresql/commitfest/46/4699

Kind Regards,
Peter Smith.




Re: Add system identifier to backup manifest

2024-01-21 Thread Amul Sul
On Fri, Jan 19, 2024 at 10:36 PM Amul Sul  wrote:

> On Wed, Jan 17, 2024 at 8:40 PM Robert Haas  wrote:
>
>>
>>
> Updated version is attached.
>

Another updated version is attached -- it fixes a missing manifest version
check in pg_verifybackup before the system identifier validation.

Regards,
Amul
From 7ff9e85acbf0789d16d29dc316ce2bd9382ac879 Mon Sep 17 00:00:00 2001
From: Amul Sul 
Date: Mon, 22 Jan 2024 10:05:20 +0530
Subject: [PATCH v3] Add system identifier to backup manifest

The backup manifest now includes a new item, "System-Identifier", whose
value is taken from "ControlFileData.system_identifier" when the
manifest is generated.  This helps identify the correct database server
and/or backup while taking subsequent backups.
---
 doc/src/sgml/backup-manifest.sgml | 15 ++-
 doc/src/sgml/ref/pg_verifybackup.sgml |  3 +-
 src/backend/backup/backup_manifest.c  |  9 +-
 src/backend/backup/basebackup.c   |  3 +-
 src/backend/backup/basebackup_incremental.c   | 39 
 src/bin/pg_basebackup/t/010_pg_basebackup.pl  | 20 
 src/bin/pg_combinebackup/load_manifest.c  | 62 ++--
 src/bin/pg_combinebackup/load_manifest.h  |  1 +
 src/bin/pg_combinebackup/pg_combinebackup.c   | 33 ++-
 src/bin/pg_combinebackup/t/005_integrity.pl   | 14 +++
 src/bin/pg_combinebackup/write_manifest.c |  8 +-
 src/bin/pg_combinebackup/write_manifest.h |  3 +-
 src/bin/pg_verifybackup/pg_verifybackup.c | 99 ++-
 src/bin/pg_verifybackup/t/003_corruption.pl   | 31 +-
 src/bin/pg_verifybackup/t/004_options.pl  |  2 +-
 src/bin/pg_verifybackup/t/005_bad_manifest.pl | 11 +++
 src/common/parse_manifest.c   | 83 +++-
 src/include/backup/backup_manifest.h  |  3 +-
 src/include/common/parse_manifest.h   |  6 ++
 19 files changed, 413 insertions(+), 32 deletions(-)

diff --git a/doc/src/sgml/backup-manifest.sgml b/doc/src/sgml/backup-manifest.sgml
index 771be1310a..a67462e3eb 100644
--- a/doc/src/sgml/backup-manifest.sgml
+++ b/doc/src/sgml/backup-manifest.sgml
@@ -37,7 +37,20 @@
 PostgreSQL-Backup-Manifest-Version
 
  
-  The associated value is always the integer 1.
+  The associated value is an integer. Beginning in
+  PostgreSQL 17, it is 2; in older versions,
+  it is 1.
+ 
+
+   
+
+   
+System-Identifier
+
+ 
+  The associated integer value is a unique system identifier to ensure
+  that files match up with the installation that produced them. Available
+  only when PostgreSQL-Backup-Manifest-Version is 2 or later.
  
 

diff --git a/doc/src/sgml/ref/pg_verifybackup.sgml b/doc/src/sgml/ref/pg_verifybackup.sgml
index 36335e0a18..3051517d92 100644
--- a/doc/src/sgml/ref/pg_verifybackup.sgml
+++ b/doc/src/sgml/ref/pg_verifybackup.sgml
@@ -53,7 +53,8 @@ PostgreSQL documentation
Backup verification proceeds in four stages. First,
pg_verifybackup reads the
backup_manifest file. If that file
-   does not exist, cannot be read, is malformed, or fails verification
+   does not exist, cannot be read, is malformed, fails to match the system
+   identifier in pg_control of the backup directory, or fails verification
against its own internal checksum, pg_verifybackup
will terminate with a fatal error.
   
diff --git a/src/backend/backup/backup_manifest.c b/src/backend/backup/backup_manifest.c
index 2c34e59752..612ff3add2 100644
--- a/src/backend/backup/backup_manifest.c
+++ b/src/backend/backup/backup_manifest.c
@@ -56,7 +56,8 @@ IsManifestEnabled(backup_manifest_info *manifest)
 void
 InitializeBackupManifest(backup_manifest_info *manifest,
 		 backup_manifest_option want_manifest,
-		 pg_checksum_type manifest_checksum_type)
+		 pg_checksum_type manifest_checksum_type,
+		 uint64 system_identifier)
 {
 	memset(manifest, 0, sizeof(backup_manifest_info));
 	manifest->checksum_type = manifest_checksum_type;
@@ -79,8 +80,10 @@ InitializeBackupManifest(backup_manifest_info *manifest,
 
 	if (want_manifest != MANIFEST_OPTION_NO)
 		AppendToManifest(manifest,
-		 "{ \"PostgreSQL-Backup-Manifest-Version\": 1,\n"
-		 "\"Files\": [");
+		 "{ \"PostgreSQL-Backup-Manifest-Version\": 2,\n"
+		 "\"System-Identifier\": " UINT64_FORMAT ",\n"
+		 "\"Files\": [",
+		 system_identifier);
 }
 
 /*
diff --git a/src/backend/backup/basebackup.c b/src/backend/backup/basebackup.c
index d5b8ca21b7..315efc7536 100644
--- a/src/backend/backup/basebackup.c
+++ b/src/backend/backup/basebackup.c
@@ -256,7 +256,8 @@ perform_base_backup(basebackup_options *opt, bbsink *sink,
 	backup_started_in_recovery = RecoveryInProgress();
 
	InitializeBackupManifest(&manifest, opt->manifest,
-			 opt->manifest_checksum_type);
+			 opt->manifest_checksum_type,
+			 GetSystemIdentifier());
 
 	total_checksum_failures = 0;
 
diff --git a/src/backend/backup/basebackup_incremental.c b/src/backend/backup/basebackup_incremental.c
index 

Re: Adding facility for injection points (or probe points?) for more advanced tests

2024-01-21 Thread Michael Paquier
On Thu, Jan 18, 2024 at 10:56:09AM +0530, Ashutosh Bapat wrote:
> There is some overlap between Dtrace functionality and this
> functionality. But I see differences too. E.g. injection points offer
> deeper integration whereas dtrace provides more information to the
> probe like callstack and argument values etc. We need to assess
> whether these functionality can co-exist and whether we need both of
> them. If the answer to both of these questions is yes, it will be good
> to add documentation explaining the differences and similarities and
> also some guidance on when to use what.

Perhaps. I'm not sure how much we want to do regarding that yet;
injection points have no external dependencies and will work across
all environments as long as dlsym() (or an equivalent) is able to
work, while being cheaper because they don't spawn an external process
to trace the call.

> +
> +#ifdef USE_INJECTION_POINTS
> +static bool
> +file_exists(const char *name)
> +{
> + struct stat st;
> +
> + Assert(name != NULL);
> + if (stat(name, &st) == 0)
> + return !S_ISDIR(st.st_mode);
> + else if (!(errno == ENOENT || errno == ENOTDIR))
> + ereport(ERROR,
> + (errcode_for_file_access(),
> + errmsg("could not access file \"%s\": %m", name)));
> + return false;
> +}
> 
> Shouldn't this be removed now? The code should use one from fd.c

Yep, removed that.

> Other code changes look good. I think the documentation and comments
> need some changes esp. considering the users point of view. Have
> attached two patches (0003, and 0004) with those changes to be applied
> on top of 0001 and 0002 respectively. Please review them. Might need
> some wordsmithy and language correction. Attaching the whole patch set
> to keep cibot happy.

The CF bot was perhaps happy but your 0004 has forgotten to update the
expected output.  There were also a few typos, some markups and edits
required for 0002 but as a whole what you have suggested was an
improvement.  Thanks.

> This is review of 0001 and 0002 only. Once we take care of these
> comments I think those patches will be ready for commit except one
> point of contention mentioned in [1]. We haven't heard any third
> opinion yet.

0001~0004 have been now applied, and I'm marking the CF entry as
committed.  I'll create a new thread once I have put more energy into
the regression test improvements.  Now the fun can really begin.  I am
also going to switch my buildfarm animals to use the new ./configure
switch.
--
Michael


signature.asc
Description: PGP signature


Re: In-placre persistance change of a relation

2024-01-21 Thread Peter Smith
2024-01 Commitfest.

Hi, This patch has a CF status of "Needs Review" [1], but it seems
there was a CFbot test failure last time it was run [2]. Please have a
look and post an updated version if necessary.

==
[1] https://commitfest.postgresql.org/46/3461/
[2] https://cirrus-ci.com/task/6050020441456640

Kind Regards,
Peter Smith.




Re: Removing const-false IS NULL quals and redundant IS NOT NULL quals

2024-01-21 Thread Peter Smith
2024-01 Commitfest.

Hi, This patch has a CF status of "Needs Review" [1], but it seems
there were CFbot test failures last time it was run [2]. Please
have a look and post an updated version if necessary.

==
[1] https://commitfest.postgresql.org/46/4459/
[2] https://cirrus-ci.com/github/postgresql-cfbot/postgresql/commitfest/46/4459

Kind Regards,
Peter Smith.




Re: Why is src/test/modules/committs/t/002_standby.pl flaky?

2024-01-21 Thread Peter Smith
2024-01 Commitfest.

Hi, This patch has a CF status of "Needs Review" [1], but it seems
like there were CFbot test failures last time it was run [2]. Please
have a look and post an updated version if necessary.

==
[1] https://commitfest.postgresql.org/46/3523/
[2] https://cirrus-ci.com/github/postgresql-cfbot/postgresql/commitfest/46/3523




Re: UUID v7

2024-01-21 Thread Nikolay Samokhvalov
The following review has been posted through the commitfest application:
make installcheck-world:  not tested
Implements feature:   tested, passed
Spec compliant:   not tested
Documentation:tested, passed

Manually tested uuidv7(), uuid_extract_time() – they work as expected. The 
basic docs provided look clear.

I haven't checked the tests or possible edge cases, though, so I'm
leaving it as "needs review" and waiting for more reviewers.

Re: UUID v7

2024-01-21 Thread Nikolay Samokhvalov
On Fri, Jan 19, 2024 at 10:07 AM Andrey Borodin 
wrote:

>
>
> > On 19 Jan 2024, at 13:25, Andrey Borodin  wrote:
> >
> > Also, I've added some documentation on all functions.
>
> Here's v12. Changes:
> 1. Documentation improvements
> 2. Code comments
> 3. Better commit message and reviews list
>

Thank you, Andrey! I have just checked v12 – it applies cleanly to HEAD, and
the functions work well. I especially like the fact that we keep
uuid_extract_time(..) here – this is a great thing to have for time-based
partitioning, and in many cases we will be able to decide not to have a
creation timestamp column (e.g., "created_at") at all, saving 8 bytes.

The docs and comments look great too.

Overall, the patch looks mature enough. It would be great to have it in
pg17. Yes, the RFC is not fully finalized yet, but it's very close. And
many libraries are already including implementation of UUIDv7 – here are
some examples:

- https://www.npmjs.com/package/uuidv7
- https://crates.io/crates/uuidv7
- https://github.com/google/uuid/pull/139

Nik


Re: Synchronizing slots from primary to standby

2024-01-21 Thread Amit Kapila
On Fri, Jan 19, 2024 at 5:23 PM Amit Kapila  wrote:
>
> On Wed, Jan 17, 2024 at 4:00 PM shveta malik  wrote:
> >
>
> I had some off-list discussions with Sawada-San, Hou-San, and Shveta
> on the topic of extending replication commands instead of using the
> current model where we fetch the required slot information via SQL
> using a database connection. I would like to summarize the discussion
> and would like to know the thoughts of others on this topic.
>
> In the current patch, we launch the slotsync worker on physical
> standby which connects to the specified database (currently we let
> users specify the required dbname in primary_conninfo) on the primary.
> It then fetches the required information for failover marked slots
> from the primary and also does some primitive checks on the upstream
> node via SQL (the additional checks are like whether the upstream node
> has a specified physical slot or whether the upstream node is a
> primary node or a standby node). To fetch the required information it
> uses a libpqwalreceiver API which is mostly apt for this purpose as it
> supports SQL execution but for this patch, we don't need a replication
> connection, so we extend the libpqwalreceiver connect API.
>
> Now, the concerns related to this could be that users would probably
> need to change existing mechanisms/tools to update primary_conninfo
> and one of the alternatives proposed is to have an additional GUC like
> slot_sync_dbname. Users won't be able to drop the database this worker
> is connected to aka whatever is specified in slot_sync_dbname but as
> the user herself sets up the configuration it shouldn't be a big deal.
> Then we also discussed whether extending libpqwalreceiver's connect
> API is a good idea and whether we need to further extend it in the
> future. As far as I can see, slotsync worker's primary requirement is
> to execute SQL queries which the current API is sufficient, and don't
> see something that needs any drastic change in this API. Note that
> tablesync worker that executes SQL also uses these APIs, so we may
> need something in the future for either of those. Then finally we need
> a slotsync worker to also connect to a database to use SQL and fetch
> results.
>
> Now, let us consider if we extend the replication commands like
> READ_REPLICATION_SLOT and or introduce a new set of replication
> commands to fetch the required information then we don't need a DB
> connection with primary or a connection in slotsync worker. As per my
> current understanding, it is quite doable but I think we will slowly
> go in the direction of making replication commands something like SQL
> because today we need to extend it to fetch all slots info that have
> failover marked as true, the existence of a particular replication slot,
> etc. Then tomorrow, if we want to extend this work to have multiple
> slotsync workers, say, workers per-db, then we have to extend the
> replication command to fetch per-database failover marked slots. To
> me, it sounds more like we are slowly adding SQL-like features to
> replication commands.
>
> Apart from this when we are reading per-db replication slots without
> connecting to a database, we probably need some additional protection
> mechanism so that the database won't get dropped.
>
> Considering all this it seems that for now probably extending
> replication commands can simplify a few things like mentioned above
> but using SQL's with db-connection is more extendable.
>
> Thoughts?
>

Bertrand, and others, do you have an opinion on this matter?

-- 
With Regards,
Amit Kapila.




Re: Change GUC hashtable to use simplehash?

2024-01-21 Thread Jeff Davis
On Mon, 2024-01-22 at 09:03 +0700, John Naylor wrote:
> v15-0004 is a stab at that. As an idea, it also renames zero_bytes_le
> to zero_byte_low to reflect the effect better. There might be some
> other comment edits needed to explain usage, so I plan to hold on to
> this for later. Let me know what you think.

0004 looks good to me. No urgency so feel free to hold it until a
convenient time.

Regards,
Jeff Davis





Re: Avoid computing ORDER BY junk columns unnecessarily

2024-01-21 Thread Peter Smith
2024-01 Commitfest.

Hi, This patch has a CF status of "Needs Review", but it seems like
there were some CFbot test failures last time it was run [1]. Please
have a look and post an updated version if necessary.

==
[1] https://cirrus-ci.com/github/postgresql-cfbot/postgresql/commitfest/46/4717




Re: cleanup patches for dshash

2024-01-21 Thread Nathan Bossart
On Mon, Jan 22, 2024 at 10:28:42AM +0800, Andy Fan wrote:
> Both LGTM.

Thanks for looking.

> +dshash_strcmp(const void *a, const void *b, size_t size, void *arg)
> +{
> + Assert(strlen((const char *) a) < size);
> + Assert(strlen((const char *) b) < size);
> +
> 
> Do you think the below change will be more explicitly?
> 
> #define DSMRegistryNameSize 64
> 
> DSMRegistryEntry
> {
> char name[DSMRegistryNameSize];
> 
> }
> 
> Assert(strlen((const char *) a) < DSMRegistryNameSize);

This is effectively what it's doing already.  These functions are intended
to be generic so that they could be used with other dshash tables with
string keys, possibly with different sizes.

I did notice a cfbot failure [0].  After a quick glance, I'm assuming this
is caused by the memcpy() in insert_into_bucket().  Even if the string is
short, it will copy the maximum key size, which is bad.  So, 0002 is
totally broken at the moment, and we'd need to teach insert_into_bucket()
to strcpy() instead for string keys to fix it.  Perhaps we could add a
field to dshash_parameters for this purpose...
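As a sketch of that idea, the parameters for a string-keyed table would
then carry a copy callback next to the existing compare/hash callbacks.
The copy_function slot is the hypothetical addition here; the DSM
registry names are just borrowed from this thread for illustration:

static const dshash_parameters dsh_params = {
	offsetof(DSMRegistryEntry, handle),	/* key_size, assuming name comes first */
	sizeof(DSMRegistryEntry),
	dshash_strcmp,		/* compares only up to the terminating NUL */
	dshash_strhash,		/* hashes only up to the terminating NUL */
	dshash_strcpy,		/* copies with strcpy() instead of memcpy() */
	LWTRANCHE_DSM_REGISTRY_HASH
};

insert_into_bucket() would then invoke the copy callback instead of a
bare memcpy(), touching only strlen(key) + 1 bytes.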

[0] 
https://api.cirrus-ci.com/v1/artifact/task/5124848070950912/log/src/test/modules/test_dsm_registry/log/postmaster.log

-- 
Nathan Bossart
Amazon Web Services: https://aws.amazon.com




Re: Add new protocol message to change GUCs for usage with future protocol-only GUCs

2024-01-21 Thread Peter Smith
2024-01 Commitfest.

Hi, This patch has a CF status of "Needs Review", but it seems like
there were some CFbot test failures last time it was run [1]. Please
have a look and post an updated version if necessary.

==
[1] https://cirrus-ci.com/github/postgresql-cfbot/postgresql/commitfest/46/4736

Kind Regards,
Peter Smith.




Re: [PoC] Improve dead tuple storage for lazy vacuum

2024-01-21 Thread Masahiko Sawada
On Fri, Jan 19, 2024 at 6:48 PM John Naylor  wrote:
>
> On Fri, Jan 19, 2024 at 2:26 PM Masahiko Sawada  wrote:
> >
> > On Thu, Jan 18, 2024 at 1:30 PM John Naylor  wrote:
> > > I'm not quite sure what the point of "num_items" is anymore, because
> > > it was really tied to the array in VacDeadItems. dead_items->num_items
> > > is essential to reading/writing the array correctly. If this number is
> > > wrong, the array is corrupt. There is no such requirement for the
> > > radix tree. We don't need to know the number of tids to add to it or
> > > do a lookup, or anything.
> >
> > True. Sorry I wanted to say "num_tids" of TidStore. I'm still thinking
> > we need to have the number of TIDs in a tidstore, especially in the
> > tidstore's control object.
>
> Hmm, it would be kind of sad to require explicit locking in tidstore.c
> if it's only for maintaining that one number at all times. Aside from the
> two ereports after an index scan / second heap pass, the only
> non-assert place where it's used is
>
> @@ -1258,7 +1265,7 @@ lazy_scan_heap(LVRelState *vacrel)
>   * Do index vacuuming (call each index's ambulkdelete routine), then do
>   * related heap vacuuming
>   */
> - if (dead_items->num_items > 0)
> + if (TidStoreNumTids(dead_items) > 0)
>   lazy_vacuum(vacrel);
>
> ...and that condition can be checked by doing a single step of
> iteration to see if it shows anything. But for the ereport, my idea
> for iteration + popcount is probably quite slow.

Right.

On further thought, as you pointed out before, "num_tids" should not
be in tidstore in terms of integration with tidbitmap.c, because
tidbitmap.c has "lossy pages". With lossy pages, "num_tids" is no
longer accurate and useful. Similarly, looking at tidbitmap.c, it has
npages and nchunks but they will not be necessary in lazy vacuum use
case. Also, assuming that we support parallel heap pruning, probably
we need to somehow lock the tidstore while adding tids to the tidstore
concurrently by parallel vacuum worker. But in tidbitmap use case, we
don't need to lock the tidstore since it doesn't have multiple
writers. Given these facts, different statistics and different lock
strategies are required by different use cases. So I think there are 3
options:

1. expose lock functions for tidstore and the caller manages the
statistics in the outside of tidstore. For example, in lazyvacuum.c we
would have a TidStore for tid storage as well as VacDeadItemsInfo that
has num_tids and max_bytes. Both are in LVRelState. For parallel
vacuum, we pass both to the workers via DSM and pass both to function
where the statistics are required. As for the exposed lock functions,
when adding tids to the tidstore, the caller would need to call
something like TidStoreLockExclusive(ts) that further calls
LWLockAcquire(&ts->tree.shared->ctl.lock, LW_EXCLUSIVE) internally.

2. add callback functions to tidstore so that the caller can do its
work while holding a lock on the tidstore. This is like the idea we
just discussed for radix tree. The caller passes a callback function
and user data to TidStoreSetBlockOffsets(), and the callback is called
after setting tids. Similar to option 1, the statistics need to be
stored in a different area.

3. keep tidstore.c and tidbitmap.c separate implementations but use
radix tree in tidbitmap.c. tidstore.c would have "num_tids" in its
control object and doesn't have any lossy page support. On the other
hand, in tidbitmap.c we replace simplehash with radix tree. This makes
tidstore.c simple but we would end up having different data structures
for similar usage.

I think it's worth trying option 1. What do you think, John?

>
> > IIUC lpdead_items is the total number of LP_DEAD items vacuumed during
> > the whole lazy vacuum operation whereas num_items is the number of
> > LP_DEAD items vacuumed within one index vacuum and heap vacuum cycle.
> > That is, after heap vacuum, the latter counter is reset while the
> > former counter is not.
> >
> > The latter counter is used in lazyvacuum.c as well as the ereport in
> > vac_bulkdel_one_index().
>
> Ah, of course.
>
> > Putting a copy of the key in BlocktableEntry's header is an
> > interesting idea. But the current debug code in the tidstore also
> > makes sure that the tidstore returns TIDs in the correct order during
> > an iterate operation. I think it still has a value and you can disable
> > it by removing the "#define TIDSTORE_DEBUG" line.
>
> Fair enough. I just thought it'd be less work to leave this out in
> case we change how locking is called.
>
> > > This week I tried an idea to use a callback there so that after
> > > internal unlocking, the caller received the value (or whatever else
> > > needs to happen, such as lookup an offset in the tid bitmap). I've
> > > attached a draft for that that passes radix tree tests. It's a bit
> > > awkward, but I'm guessing this would more closely match future
> > > internal atomic locking. Let me know what you think of the concept,
> > > and then do 

Re: Add last_commit_lsn to pg_stat_database

2024-01-21 Thread Peter Smith
2024-01 Commitfest.

Hi, This patch has a CF status of "Needs Review", but it seems like
there was some CFbot test failure last time it was run [1]. Please
have a look and post an updated version if necessary.

==
[1] https://cirrus-ci.com/github/postgresql-cfbot/postgresql/commitfest/46/4355

Kind Regards,
Peter Smith.




Re: Add code indentation check to cirrus-ci (was Re: Add BF member koel-like indentation checks to SanityCheck CI)

2024-01-21 Thread Peter Smith
2024-01 Commitfest.

Hi, This patch has a CF status of "Needs Review" [1], but it seems
like there was some CFbot test failure last time it was run [2].
Please have a look and post an updated version if necessary.

==
[1] https://commitfest.postgresql.org/46/4691/
[2] https://cirrus-ci.com/task/5033191522697216

Kind Regards,
Peter Smith.




Re: Opportunistically pruning page before update

2024-01-21 Thread Peter Smith
2024-01 Commitfest.

Hi, This patch has a CF status of "Needs Review" [1], but it seems
like there was some CFbot test failure last time it was run [2].
Please have a look and post an updated version if necessary.

==
[1] https://commitfest.postgresql.org/46/4384/
[2] https://cirrus-ci.com/github/postgresql-cfbot/postgresql/commitfest/46/4384




Re: Incremental View Maintenance, take 2

2024-01-21 Thread Peter Smith
2024-01 Commitfest.

Hi, This patch has a CF status of "Needs Review" [1], but it seems
like there was some CFbot test failure last time it was run [2].
Please have a look and post an updated version if necessary.

==
[1] https://commitfest.postgresql.org/46/4337/
[2] https://cirrus-ci.com/task/6607979311529984

Kind Regards,
Peter Smith.




Re: cleanup patches for dshash

2024-01-21 Thread Andy Fan


Nathan Bossart  writes:

> While working on the dynamic shared memory registry, I noticed a couple of
> potential improvements for code that uses dshash tables.
>
> * A couple of dshash_create() callers pass in 0 for the "void *arg"
>   parameter, which seemed weird.  I incorrectly assumed this was some sort
>   of flags parameter until I actually looked at the function signature.
>   IMHO we should specify NULL here if arg is not used.  0001 makes this
>   change.  This is admittedly a nitpick.
>
> * There are no dshash compare/hash functions for string keys.  0002
>   introduces some that simply forward to strcmp()/string_hash(), and it
>   uses them for the DSM registry's dshash table.  This allows us to remove
>   the hacky key padding code for lookups on this table.
>
> Thoughts?

Both LGTM.

+dshash_strcmp(const void *a, const void *b, size_t size, void *arg)
+{
+   Assert(strlen((const char *) a) < size);
+   Assert(strlen((const char *) b) < size);
+

Do you think the change below would be more explicit?

#define DSMRegistryNameSize 64

typedef struct DSMRegistryEntry
{
	char		name[DSMRegistryNameSize];
	...
} DSMRegistryEntry;

Assert(strlen((const char *) a) < DSMRegistryNameSize);


-- 
Best Regards
Andy Fan





Re: Remove unused fields in ReorderBufferTupleBuf

2024-01-21 Thread Peter Smith
2024-01 Commitfest.

Hi, this patch was marked in CF as "Needs Review" [1], but there has
been no activity on this thread for 5+ months.

Do you wish to keep this open, or can you post something to elicit
more interest in reviews for the latest patch set? Otherwise, if
nothing happens then the CF entry will be closed ("Returned with
feedback") at the end of this CF.

==
[1] https://commitfest.postgresql.org/46/4461/

Kind Regards,
Peter Smith.




Re: ALTER TABLE SET ACCESS METHOD on partitioned tables

2024-01-21 Thread Peter Smith
2024-01 Commitfest.

Hi, this patch was marked in CF as "Needs Review" [1], but there has
been no activity on this thread for 6+ months.

Is anything else planned, or can you post something to elicit more
interest in reviews for the latest patch? Otherwise, if nothing
happens then the CF entry will be closed ("Returned with feedback") at
the end of this CF.

==
[1] https://commitfest.postgresql.org/46/3727/

Kind Regards,
Peter Smith.
Fujitsu Australia




Re: Volatile write caches on macOS and Windows, redux

2024-01-21 Thread Peter Smith
2024-01 Commitfest.

Hi, this patch was marked in CF as "Needs Review" [1], but there has
been no activity on this thread for 6+ months.

Is anything else planned, or can you post something to elicit more
interest in the patch? Otherwise, if nothing happens then the CF entry
will be closed ("Returned with feedback") at the end of this CF.

==
[1] https://commitfest.postgresql.org/46/4453/

Kind Regards,
Peter Smith.




Re: MERGE ... WHEN NOT MATCHED BY SOURCE

2024-01-21 Thread Peter Smith
Hi, this patch was marked in CF as "Needs Review" [1], but there has
been no activity on this thread for 6+ months.

Is anything else planned? Can you post something to elicit more
interest in the latest patch? Otherwise, if nothing happens then the
CF entry will be closed ("Returned with feedback") at the end of this
CF.

==
[1]  https://commitfest.postgresql.org/46/4092/

Kind Regards,
Peter Smith.




Re: Support TZ format code in to_timestamp()

2024-01-21 Thread Tom Lane
Peter Smith  writes:
> Hi, this patch was marked in CF as "Needs Review" [1], but there has
> been no activity on this thread for 7+ months.
> If nothing more is planned for this thread then it will be closed
> ("Returned with feedback") at the end of this CF.

I still think it would be a good idea, but I can't deny the lack
of other interest in it.  Unless someone steps up to review,
let's close it.

(FTR, I don't agree with David's objections to the entire concept
of zone abbreviations.  We're not going to remove support for them
everywhere else, so why shouldn't to_timestamp() handle them?)

regards, tom lane




Re: Change GUC hashtable to use simplehash?

2024-01-21 Thread John Naylor
I wrote:
>   fasthash_init(&hs, sizeof(Datum), kind);
>   fasthash_accum(&hs, (char *) &value, sizeof(Datum));
>   return fasthash_final32(&hs, 0);

It occurred to me that it's strange to have two places that length can
be passed. That was a side effect of the original, which used length
to both know how many bytes to read, and to modify the internal seed.
With the incremental API, it doesn't make sense to pass the length (or
a dummy macro) up front -- with a compile-time fixed length, it can't
possibly break a tie, so it's just noise.

0001 removes the length from initialization in the incremental
interface. The standalone functions use length directly the same as
before, but after initialization. Thoughts?

Also, the fasthash_accum call is a bit verbose, because it's often
used in a loop with varlen input. For register-sized values, I think
it's simpler to say this, as done in the search path cache, so maybe a
comment to that effect would be helpful:

hs.accum = value;
fasthash_combine(&hs);
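And for the varlen loop case, a minimal sketch under the post-0001 API
(illustrative only, not code from the patch set): feed the input in
FH_SIZEOF_ACCUM-sized chunks and let the length participate only at
finalization:

static inline uint32
hash_bytes_fasthash(const char *buf, size_t len, uint64 seed)
{
	fasthash_state hs;
	size_t		remaining = len;

	fasthash_init(&hs, seed);	/* no length argument any more */
	while (remaining > 0)
	{
		size_t		chunk = Min(remaining, FH_SIZEOF_ACCUM);

		fasthash_accum(&hs, buf, chunk);	/* combines into the state */
		buf += chunk;
		remaining -= chunk;
	}
	return fasthash_final32(&hs, len);	/* length acts as the tie-breaker */
}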

I noticed that we already have a more recent, stronger 64-bit mixer
than murmur64: splitmix64, in pg_prng.c. We could put that, as well as
a better 4-byte mixer [1] in hashfn_unstable.h, for in-memory use.
Maybe with names like "hash_4bytes" etc. so it's not tied to a
specific implementation. I see one simplehash case that can use it,
even if the resowner hash table gets rid of it.

0002 and 0003 use fasthash for dynahash and GUC hash, respectively.
These cannot use the existing cstring hashing directly because of
truncation and case-folding, respectively. (Some simplehash uses can,
but that can come later)

On Sun, Jan 21, 2024 at 8:06 AM Jeff Davis  wrote:
>
> After having stepped away from this work for a couple weeks and
> returning to it, I think the comments and/or naming could be more
> clear. We first use the result of haszero64() as a boolean to break out
> of the loop, but then later use it in a more interesting way to count
> the number of remaining bytes.
>
> Perhaps you can take the comment out of the loop and just describe the
> algorithm we're using, and make a note that we have to byteswap first.
> "Indeterminate" could be explained briefly as well.

v15-0004 is a stab at that. As an idea, it also renames zero_bytes_le
to zero_byte_low to reflect the effect better. There might be some
other comment edits needed to explain usage, so I plan to hold on to
this for later. Let me know what you think.

[1] Examples of both in
https://www.boost.org/doc/libs/1_84_0/boost/container_hash/detail/hash_mix.hpp
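For readers following along, the zero-byte detector being discussed is
the classic SWAR trick; the textbook formulation looks like this (it may
differ cosmetically from what's in the header):

static inline uint64
haszero64(uint64 v)
{
	/*
	 * Nonzero iff some byte of v is zero.  The lowest set bit marks the
	 * first zero byte; bits above it can contain carry artifacts, which
	 * is why only the low end of the result is trusted.
	 */
	return (v - UINT64CONST(0x0101010101010101)) &
		~v & UINT64CONST(0x8080808080808080);
}

After byteswapping on big-endian machines so the first input byte is the
least significant, the number of bytes before the NUL falls out as
pg_rightmost_one_pos64(result) / BITS_PER_BYTE -- the "low" in the
proposed zero_byte_low name points at exactly that property.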
From ad25c43c264c5857bf41cbf056ac7d4ab0995b40 Mon Sep 17 00:00:00 2001
From: John Naylor 
Date: Sun, 21 Jan 2024 17:49:22 +0700
Subject: [PATCH v15 3/4] Use fasthash for guc_name_hash

---
 src/backend/utils/misc/guc.c | 31 ++-
 1 file changed, 22 insertions(+), 9 deletions(-)

diff --git a/src/backend/utils/misc/guc.c b/src/backend/utils/misc/guc.c
index 8f65ef3d89..e76ab52618 100644
--- a/src/backend/utils/misc/guc.c
+++ b/src/backend/utils/misc/guc.c
@@ -33,6 +33,7 @@
 #include "catalog/objectaccess.h"
 #include "catalog/pg_authid.h"
 #include "catalog/pg_parameter_acl.h"
+#include "common/hashfn_unstable.h"
 #include "guc_internal.h"
 #include "libpq/pqformat.h"
 #include "parser/scansup.h"
@@ -1324,22 +1325,34 @@ guc_name_compare(const char *namea, const char *nameb)
 static uint32
 guc_name_hash(const void *key, Size keysize)
 {
-	uint32		result = 0;
 	const char *name = *(const char *const *) key;
+	const char *const start = name;
+	fasthash_state hs;
+
+	fasthash_init(&hs, 0);
 
 	while (*name)
 	{
-		char		ch = *name++;
+		int			chunk_len = 0;
 
-		/* Case-fold in the same way as guc_name_compare */
-		if (ch >= 'A' && ch <= 'Z')
-			ch += 'a' - 'A';
+		while (chunk_len < FH_SIZEOF_ACCUM && name[chunk_len] != '\0')
+		{
+			hs.accum <<= BITS_PER_BYTE;
+			hs.accum |= name[chunk_len];
+			chunk_len++;
+		}
 
-		/* Merge into hash ... not very bright, but it needn't be */
-		result = pg_rotate_left32(result, 5);
-		result ^= (uint32) ch;
+		/* Quick ASCII-only downcasing */
+		hs.accum |= 0x2020202020202020;
+
+		/* merge into hash and reset for next iteration */
+		fasthash_combine(&hs);
+		hs.accum = 0;
+
+		name += chunk_len;
 	}
-	return result;
+
+	return fasthash_final32(&hs, name - start);
 }
 
 /*
-- 
2.43.0

From e33633ba036ff521482fb24e8984b5865c8515c8 Mon Sep 17 00:00:00 2001
From: John Naylor 
Date: Sun, 21 Jan 2024 15:33:22 +0700
Subject: [PATCH v15 4/4] WIP: comment edits

Clarify detection of zero bytes when hashing aligned C strings

Discussion: https://postgr.es/m/48e8f8bbe0be9c789f98776c7438244ab7a7cc63.camel%40j-davis.com
---
 src/include/common/hashfn_unstable.h | 38 
 1 file changed, 22 insertions(+), 16 deletions(-)

diff --git a/src/include/common/hashfn_unstable.h b/src/include/common/hashfn_unstable.h
index 8e829297fd..8c42e876be 100644
--- a/src/include/common/hashfn_unstable.h
+++ b/src/include/common/hashfn_unstable.h
@@ -209,26 +209,33 @@ 

Re: Support TZ format code in to_timestamp()

2024-01-21 Thread Peter Smith
2024-01 Commitfest.

Hi, this patch was marked in CF as "Needs Review" [1], but there has
been no activity on this thread for 7+ months.

If nothing more is planned for this thread then it will be closed
("Returned with feedback") at the end of this CF.

==
[1]  https://commitfest.postgresql.org/46/4362/

Kind Regards,
Peter Smith.




Re: Temporary tables versus wraparound... again

2024-01-21 Thread Peter Smith
2024-01 Commitfest.

Hi, this patch was marked in CF as "Needs Review", but there has been
no activity on this thread for 9+ months.

Since there doesn't seem to be much interest, I have changed the status to
"Returned with Feedback" [1]. Feel free to propose a stronger use case
for the patch and add an entry for the same.

==
[1] https://commitfest.postgresql.org/46/3358/

Kind Regards,
Peter Smith.




Re: WIP: Aggregation push-down - take2

2024-01-21 Thread Peter Smith
2024-01 Commitfest.

Hi, this patch was marked in CF as "Needs Review", but there has been
no activity on this thread for 9+ months.

Since there doesn't seem to be much interest, I have changed the status to
"Returned with Feedback" [1]. Feel free to propose a stronger use case
for the patch and add an entry for the same.

==
[1] https://commitfest.postgresql.org/46/3764/

Kind Regards,
Peter Smith.




Re: pg_ls_tmpdir to show directories and shared filesets (and pg_ls_*)

2024-01-21 Thread Peter Smith
2024-01 Commitfest.

Hi, this patch was marked in CF as "Needs Review", but there has been
no activity on this thread for 14+ months.

Since there doesn't seem to be much interest, I have changed the status to
"Returned with Feedback" [1]. Feel free to propose a stronger use case
for the patch and add an entry for the same.

==
[1] https://commitfest.postgresql.org/46/2377/

Kind Regards,
Peter Smith.




Re: pg_receivewal fail to streams when the partial file to write is not fully initialized present in the wal receiver directory

2024-01-21 Thread Peter Smith
2024-01 Commitfest.

Hi, this patch was marked in CF as "Needs Review", but there has been
no activity on this thread for 15+ months.

Since there doesn't seem to be much interest, I have changed the status to
"Returned with Feedback" [1]. Feel free to propose a stronger use case
for the patch and add an entry for the same.

==
[1] https://commitfest.postgresql.org/46/3503/

Kind Regards,
Peter Smith.




Re: generate syscache info automatically

2024-01-21 Thread John Naylor
On Fri, Jan 19, 2024 at 9:03 PM Peter Eisentraut  wrote:
>
> The DECLARE_* macros map to actual BKI commands ("declare toast",
> "declare index").  So I wanted to use a different verb to distinguish
> "generate code for this" from those BKI commands.

That makes sense to me.




Re: [PATCH] psql: \dn+ to show size of each schema (and \dA+ for AMs)

2024-01-21 Thread Peter Smith
2024-01 Commitfest.

Hi, This patch had a CF status of "Ready for Committer", but the
thread has been inactive for 5+ months.

Since the last post from Justin said "hoping to receive some feedback"
I have changed the CF status back to "Needs Review" [1].

==
[1] https://commitfest.postgresql.org/46/3256/

Kind Regards,
Peter Smith.




Re: remaining sql/json patches

2024-01-21 Thread jian he
Based on v35. For now I have only applied 0001 to 0007.
For {DEFAULT expression ON EMPTY} | {DEFAULT expression ON ERROR},
restrict the DEFAULT expression to be either a Const node or a FuncExpr
node, so these 3 SQL/JSON functions can be used in a btree expression
index.

I made some big changes to the docs (see attachment).
Listing (json_query, json_exists, json_value) as a new section may be
a good idea.

Following these two links, you can see the difference.
only apply v35, 0001 to 0007: https://v35-functions-json-html.vercel.app
apply v35, 0001 to 0007 plus my changes:
https://html-starter-seven-pied.vercel.app


minor issues:
+Note that if the path_expression
+is strict, an error is generated if it yields no
+items, provided the specified ON ERROR behavior is
+ERROR.

how about something like this:
+Note that if the path_expression
+is strict and the specified ON ERROR
+behavior is ERROR, an error is generated
+if it yields no items.

+  
+   
+SQL/JSON path expression can currently only accept values of the
+jsonb type, so it might be necessary to cast the
+context_item argument of these functions to
+jsonb.
+   
+  
should it be "SQL/JSON query functions" here?


v35-0001-only-allow-Const-node-or-scalar-returning-fun.no-cfbot
Description: Binary data


v35-0001-refactor-json_value-json_query-js.sql_json_new_section
Description: Binary data


Re: Guiding principle for dropping LLVM versions?

2024-01-21 Thread Peter Smith
2024-01 Commitfest.

Hi, This patch has a CF status of "Ready for Committer", but it looks
like it failed when the CFbot test for it was last run [1]. Please
have a look and post an updated version.

==
[1] https://cirrus-ci.com/github/postgresql-cfbot/postgresql/commitfest/46/4640

Kind Regards,
Peter Smith.




Re: index prefetching

2024-01-21 Thread Tomas Vondra



On 1/21/24 20:56, Konstantin Knizhnik wrote:
> 
> On 19/01/2024 2:35 pm, Tomas Vondra wrote:
>>
>> On 1/19/24 09:34, Konstantin Knizhnik wrote:
>>> On 18/01/2024 6:00 pm, Tomas Vondra wrote:
>>>> On 1/17/24 09:45, Konstantin Knizhnik wrote:
>>>>> I have integrated your prefetch patch in Neon and it actually works!
>>>>> Moreover, I combined it with prefetch of leaf pages for IOS and it
>>>>> also seems to work.
>>>>>
>>>> Cool! And do you think this is the right design/way to do this?
>>> I like the idea of prefetching TIDs in executor.
>>>
>>> But looking though your patch I have some questions:
>>>
>>>
>>> 1. Why is it necessary to allocate and store the all_visible flag in the
>>> data buffer? Why can't the caller of IndexPrefetchNext look at the
>>> prefetch field?
>>>
>>> +        /* store the all_visible flag in the private part of the
>>> entry */
>>> +        entry->data = palloc(sizeof(bool));
>>> +        *(bool *) entry->data = all_visible;
>>>
>> What do you mean by "prefetch field"?
> 
> 
> I mean "prefetch" field of IndexPrefetchEntry:
> 
> +
> +typedef struct IndexPrefetchEntry
> +{
> +    ItemPointerData tid;
> +
> +    /* should we prefetch heap page for this TID? */
> +    bool        prefetch;
> +
> 
> You store the same flag twice:
> 
> +        /* prefetch only if not all visible */
> +        entry->prefetch = !all_visible;
> +
> +        /* store the all_visible flag in the private part of the entry */
> +        entry->data = palloc(sizeof(bool));
> +        *(bool *) entry->data = all_visible;
> 
> My question was: why do we need to allocate something in entry->data and
> store all_visible in it, while we already stored !all_visible in
> entry->prefetch.
> 

Ah, right. Well, you're right that in this case we could perhaps set just
one of those flags, but the "purpose" of the two places is quite different.

The "prefetch" flag is fully controlled by the prefetcher, and it's up
to it to change it (e.g. I can easily imagine some new logic touching
setting it to "false" for some reason).

The "data" flag is fully controlled by the custom callbacks, so whatever
the callback stores, will be there.

I don't think it's worth simplifying this. In particular, I don't think
the callback can assume it can rely on the "prefetch" flag.
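
For illustration, a hypothetical callback could stash more than a single
flag in entry->data, which is where the separation pays off. A sketch
(made-up names, not code from the patch):

/* hypothetical per-TID state kept by an IOS-aware callback */
typedef struct IOSPrefetchData
{
    bool        all_visible;    /* VM bit at the time we checked */
    BlockNumber vm_block;       /* heap block the bit was read for */
} IOSPrefetchData;

static void
ios_prefetch_fill(IndexPrefetchEntry *entry, bool all_visible, BlockNumber blk)
{
    IOSPrefetchData *d = palloc(sizeof(IOSPrefetchData));

    d->all_visible = all_visible;
    d->vm_block = blk;

    entry->data = d;                /* owned by the callback */
    entry->prefetch = !all_visible; /* owned by the prefetcher, may change */
}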


regards

-- 
Tomas Vondra
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: index prefetching

2024-01-21 Thread Tomas Vondra



On 1/21/24 20:50, Konstantin Knizhnik wrote:
> 
> On 20/01/2024 12:14 am, Tomas Vondra wrote:
>>> Looks like I was not right: even if it is not an index-only scan, when the index
>>> condition involves only index attributes, the heap is not accessed
>>> until we find a tuple satisfying the search condition.
>>> Inclusive index case described above
>>> (https://commitfest.postgresql.org/46/4352/) is interesting but IMHO
>>> exotic case. If keys are actually used in search, then why not create
>>> a normal compound index instead?
>>>
>> Not sure I follow ...
>>
>> Firstly, I'm not convinced the example addressed by that other patch is
>> that exotic. IMHO it's quite possible it's actually quite common, but
>> the users do not realize the possible gains.
>>
>> Also, there are reasons to not want very wide indexes - it has overhead
>> associated with maintenance, disk space, etc. I think it's perfectly
>> rational to design indexes in a way that eliminates most heap fetches
>> necessary to evaluate conditions, but does not guarantee IOS (so the
>> last heap fetch is still needed).
> 
> We are comparing compound index (a,b) and covering (inclusive) index (a)
> include (b)
> These indexes have exactly the same width and size and almost the same
> maintenance overhead.
> 
> The first index has a more expensive comparison function (involving two
> columns), but I do not think that it can significantly affect
> performance and maintenance cost. Also, if the selectivity of "a" is good
> enough, then there is no need to compare "b".
> 
> Why might we prefer a covering index to a compound index? I see only two
> good reasons:
> 1. The extra columns' type does not have the comparison function needed
>    by the AM.
> 2. The extra columns are never used in query predicates.
> 

Or maybe you don't want to include the columns in a UNIQUE constraint?
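
For example (a quick sketch of that difference):

  create unique index on t (a) include (b);  -- "a" alone must be unique
  create unique index on t (a, b);           -- only the (a,b) pair must be unique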

> If you are going to use these columns in query predicates, I do not see
> much sense in creating an inclusive index rather than a compound index.
> Do you?
> 

But this is also about conditions that can't be translated into index
scan keys. Consider this:

create table t (a int, b int, c int);
insert into t select 1000 * random(), 1000 * random(), 1000 * random()
from generate_series(1,100) s(i);
create index on t (a,b);
vacuum analyze t;

explain (analyze, buffers) select * from t where a = 10 and mod(b,10) =
111;
                                   QUERY PLAN
---------------------------------------------------------------------------
 Index Scan using t_a_b_idx on t  (cost=0.42..3670.74 rows=5 width=12)
(actual time=4.562..4.564 rows=0 loops=1)
   Index Cond: (a = 10)
   Filter: (mod(b, 10) = 111)
   Rows Removed by Filter: 974
   Buffers: shared hit=980
   Prefetches: blocks=901
 Planning Time: 0.304 ms
 Execution Time: 5.146 ms
(8 rows)

Notice that this still fetched ~1000 buffers in order to evaluate the
filter on "b", because it's complex and can't be transformed into a nice
scan key. Or this:

explain (analyze, buffers) select a from t where a = 10 and (b+1) < 100
 and c < 0;


                                   QUERY PLAN
---------------------------------------------------------------------------
 Index Scan using t_a_b_idx on t  (cost=0.42..3673.22 rows=1 width=4)
(actual time=4.446..4.448 rows=0 loops=1)
   Index Cond: (a = 10)
   Filter: ((c < 0) AND ((b + 1) < 100))
   Rows Removed by Filter: 974
   Buffers: shared hit=980
   Prefetches: blocks=901
 Planning Time: 0.313 ms
 Execution Time: 4.878 ms
(8 rows)

where it's "broken" by the extra unindexed column.

FWIW these are the primary cases I had in mind for this patch.


> 
>> What do you mean by "create normal compound index"? The patch addresses
>> a limitation that not every condition can be translated into a proper
>> scan key. Even if we improve this, there will always be such conditions.
>> While the IOS can evaluate them on the index tuple, the regular index scan
>> can't do that (currently).
>>
>> Can you share an example demonstrating the alternative approach?
> 
> May be I missed something.
> 
> This is the example from
> https://www.postgresql.org/message-id/flat/N1xaIrU29uk5YxLyW55MGk5fz9s6V2FNtj54JRaVlFbPixD5z8sJ07Ite5CvbWwik8ZvDG07oSTN-usENLVMq2UAcizVTEd5b-o16ZGDIIU=@yamlcoder.me
>  :
> 
> ```
> 
> And here is the plan with index on (a,b).
> 
> Limit  (cost=0.42..4447.90 rows=1 width=12) (actual time=6.883..6.884
> rows=0 loops=1)
>   Output: a, b, d
>   Buffers: shared hit=613
>   ->  Index Scan using t_a_b_idx on public.t  (cost=0.42..4447.90 rows=1
>       width=12) (actual time=6.880..6.881 rows=0 loops=1)
>         Output: a, b, d
>         Index Cond: ((t.a > 100) AND (t.b = 4))
>         Buffers: shared hit=613
> Planning:
>   Buffers: shared hit=41
> Planning Time: 0.314 ms
> Execution Time: 6.910 ms
> ```
> 
> 
> Isn't it an optimal plan for this query?
> 
> And cite from the self-reproducible example https://dbfiddle.uk/iehtq44L :

Re: pg_rewind WAL segments deletion pitfall

2024-01-21 Thread Peter Smith
2024-01 Commitfest.

Hi. This patch has a CF status of "Ready for Committer", but it is
currently failing some CFbot tests [1]. Please have a look and post an
updated version.

==
[1] https://cirrus-ci.com/github/postgresql-cfbot/postgresql/commitfest/46/3874

Kind Regards,
Peter Smith.




Re: Evaluate arguments of correlated SubPlans in the referencing ExprState

2024-01-21 Thread Peter Smith
2024-01 Commitfest.

Hi. This patch has a CF status of "Ready for Committer", but it is
currently failing some CFbot tests [1]. Please have a look and post an
updated version.

==
[1] https://cirrus-ci.com/github/postgresql-cfbot/postgresql/commitfest/46/4209

Kind Regards,
Peter Smith.




Re: the s_lock_stuck on perform_spin_delay

2024-01-21 Thread Andy Fan

Hi,

Andy Fan  writes:

> Hi,
>
> v6 attached which addressed all the items Robert suggested except the
> following 2 open items. They are handled differently.
>
>>
>> Here is the summary of the open-items, it would be great that Andres and
>> Matthias have a look at this when they have time.
>>
 The LockBufHdr also uses the init_local_spin_delay / perform_spin_delay
 infrastructure, so it has the same issue as ${subject}; it is much
 like the code in s_lock. Based on my current knowledge, I think we
 should add the check there.
>>>
>>> I'd like to hear from Andres, if possible. @Andres: Should these
>>> sanity checks apply only to spin locks per se, or also to buffer
>>> header locks?
>
> v6 is splitted into 2 commits, one for normal SpinLock and one for
> LockBufHdr lock.
>
> commit 6276d2f66b0760053e3fdfe259971be3abba3c63
> Author: yizhi.fzh 
> Date:   Fri Jan 19 13:52:07 2024 +0800
>
> Detect more misuse of spin lock automatically
> 
> Spin locks are intended for *very* short-term locks, but they can be
> misused in many ways, e.g. acquiring another LWLock or regular lock,
> allocating memory, or calling errstart while holding a spin lock. This
> patch detects such misuse automatically in a USE_ASSERT_CHECKING build.
> 
> CHECK_FOR_INTERRUPTS should be avoided as well when holding a spin lock.
> Depending on what signals are left to handle, PG may raise an ERROR or
> FATAL, which would make the code jump somewhere else that is unlikely
> to release the spin lock.
>
> commit 590a0c6f767f62f6c83289d55de99973bc7da417 (HEAD -> s_stuck_v3)
> Author: yizhi.fzh 
> Date:   Fri Jan 19 13:57:46 2024 +0800
>
> Treat (un)LockBufHdr as a SpinLock.
> 
> LockBufHdr also uses the init_local_spin_delay / perform_spin_delay
> infrastructure, so it can likewise PANIC the system when the lock
> cannot be acquired in a short time, and its code is quite similar
> to s_lock. So treat it the same as a spin lock with regard to
> misuse-of-spinlock detection.
>
 they were put into spin.h in v1 but later moved to miscadmin.h at [1].
 [1]
 https://www.postgresql.org/message-id/CAEze2WggP-2Dhocmdhp-LxBzic%3DMXRgGA_tmv1G_9n-PDt2MQg%40mail.gmail.com
>>>
>>> I'm not entirely sure what the right thing to do is here, and the
>>> answer may depend on the previous question. But I disagree with
>>> Matthias -- I don't think miscadmin.h can be the right answer
>>> regardless.
>
I put it into spin.h this time in commit 1, and declared the extern
function VerifyNoSpinLocksHeld (defined in spin.c) in miscadmin.h, like
what we did for ProcessInterrupts. This eases the miscadmin dependency,
and the '#include xxx' changes look better than before.

I found a special case with checking it in errstart, so commit 3 is
added in v7.  I don't intend for commit 1 and commit 3 to stay two
separate commits, but keeping them separate makes them easier to discuss.

commit 757c67c1d4895ce6a523bcf5217af8eb2351e2a1 (HEAD -> s_stuck_v3)
Author: yizhi.fzh 
Date:   Mon Jan 22 07:14:29 2024 +0800

Bypass SpinLock checking in the SIGQUIT signal handler

When a process receives a SIGQUIT signal, it indicates that the system
is crashing. The process may well be holding a spin lock at that moment,
and with the current checking it would PANIC with a "misuse of spinlock"
message, which is quite prone to misunderstanding. So bypass the
spin-lock-held check in this case. This is safe since the whole system
will be restarted anyway.
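
To illustrate, this is the kind of pattern commit 1 is meant to catch (a
hypothetical sketch; MyShared is a made-up struct):

static void
bad_example(MyShared *shared)
{
    char       *ptr;

    SpinLockAcquire(&shared->mutex);

    /*
     * With commit 1 applied, each call below trips an assertion in a
     * USE_ASSERT_CHECKING build: neither memory allocation nor error
     * reporting may happen while a spin lock is held.
     */
    ptr = palloc(64);
    elog(ERROR, "jumps away without releasing the spin lock");

    SpinLockRelease(&shared->mutex);
}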

-- 
Best Regards
Andy Fan

>From b6bb33994f479824a8fac6a1d076a103c16e9d69 Mon Sep 17 00:00:00 2001
From: "yizhi.fzh" 
Date: Fri, 19 Jan 2024 13:52:07 +0800
Subject: [PATCH v7 1/3] Detect more misuse of spin lock automatically

Spin locks are intended for *very* short-term locks, but they can be
misused in many ways, e.g. acquiring another LWLock or regular lock,
allocating memory, or calling errstart while holding a spin lock. This
patch detects such misuse automatically in a USE_ASSERT_CHECKING build.

CHECK_FOR_INTERRUPTS should be avoided as well when holding a spin lock.
Depending on what signals are left to handle, PG may raise an ERROR or
FATAL, which would make the code jump somewhere else that is unlikely
to release the spin lock.
---
 src/backend/storage/lmgr/lock.c   |  6 
 src/backend/storage/lmgr/lwlock.c |  6 
 src/backend/storage/lmgr/spin.c   | 13 +
 src/backend/utils/error/elog.c|  7 +
 src/backend/utils/mmgr/mcxt.c | 16 +++
 src/include/miscadmin.h   | 12 +++-
 src/include/storage/spin.h| 46 +--
 7 files changed, 102 insertions(+), 4 deletions(-)

diff --git a/src/backend/storage/lmgr/lock.c b/src/backend/storage/lmgr/lock.c
index c70a1adb9a..cb9969b860 100644
--- a/src/backend/storage/lmgr/lock.c
+++ b/src/backend/storage/lmgr/lock.c
@@ -776,6 +776,12 @@ LockAcquireExtended(const LOCKTAG *locktag,
 

Re: introduce dynamic shared memory registry

2024-01-21 Thread Nathan Bossart
On Sun, Jan 21, 2024 at 11:21:46AM -0500, Tom Lane wrote:
> Coverity complained about this:
> 
> *** CID 1586660:  Null pointer dereferences  (NULL_RETURNS)
> /srv/coverity/git/pgsql-git/postgresql/src/backend/storage/ipc/dsm_registry.c:
>  185 in GetNamedDSMSegment()
> 179   }
> 180   else if (!dsm_find_mapping(entry->handle))
> 181   {
> 182   /* Attach to existing segment. */
> 183   dsm_segment *seg = dsm_attach(entry->handle);
> 184 
 CID 1586660:  Null pointer dereferences  (NULL_RETURNS)
 Dereferencing a pointer that might be "NULL" "seg" when calling 
 "dsm_pin_mapping".
> 185   dsm_pin_mapping(seg);
> 186   ret = dsm_segment_address(seg);
> 187   }
> 188   else
> 189   {
> 190   /* Return address of an already-attached segment. */
> 
> I think it's right --- the comments for dsm_attach explicitly
> point out that a NULL return is possible.  You need to handle
> that scenario in some way other than SIGSEGV.

Oops.  I've attached an attempt at fixing this.  I took the opportunity to
clean up the surrounding code a bit.

-- 
Nathan Bossart
Amazon Web Services: https://aws.amazon.com
>From f4c1c7a7ce8bccf7251e384f895f34beb33f839e Mon Sep 17 00:00:00 2001
From: Nathan Bossart 
Date: Sun, 21 Jan 2024 16:05:16 -0600
Subject: [PATCH v1 1/1] fix coverity complaint

---
 src/backend/storage/ipc/dsm_registry.c | 21 -
 1 file changed, 12 insertions(+), 9 deletions(-)

diff --git a/src/backend/storage/ipc/dsm_registry.c b/src/backend/storage/ipc/dsm_registry.c
index ac11f51375..c178173653 100644
--- a/src/backend/storage/ipc/dsm_registry.c
+++ b/src/backend/storage/ipc/dsm_registry.c
@@ -177,19 +177,22 @@ GetNamedDSMSegment(const char *name, size_t size,
 (errmsg("requested DSM segment size does not match size of "
 		"existing segment")));
 	}
-	else if (!dsm_find_mapping(entry->handle))
+	else
 	{
-		/* Attach to existing segment. */
-		dsm_segment *seg = dsm_attach(entry->handle);
+		dsm_segment *seg = dsm_find_mapping(entry->handle);
+
+		/* If the existing segment is not already attached, attach it now. */
+		if (seg == NULL)
+		{
+			seg = dsm_attach(entry->handle);
+			if (seg == NULL)
+elog(ERROR, "could not map dynamic shared memory segment");
+
+			dsm_pin_mapping(seg);
+		}
 
-		dsm_pin_mapping(seg);
 		ret = dsm_segment_address(seg);
 	}
-	else
-	{
-		/* Return address of an already-attached segment. */
-		ret = dsm_segment_address(dsm_find_mapping(entry->handle));
-	}
 
 	dshash_release_lock(dsm_registry_table, entry);
 	MemoryContextSwitchTo(oldcontext);
-- 
2.25.1
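
For context, a minimal sketch of how an extension would call this API
(hypothetical names; based on the GetNamedDSMSegment() signature used in
this patch series):

#include "storage/dsm_registry.h"
#include "storage/lwlock.h"

typedef struct MySharedState
{
    LWLock      lock;
    int         counter;
} MySharedState;

static void
my_init_state(void *ptr)
{
    MySharedState *state = (MySharedState *) ptr;

    LWLockInitialize(&state->lock, LWLockNewTrancheId());
    state->counter = 0;
}

static MySharedState *
attach_my_state(void)
{
    bool        found;

    /* first caller creates and initializes; later callers just attach */
    return (MySharedState *) GetNamedDSMSegment("my_extension",
                                                sizeof(MySharedState),
                                                my_init_state, &found);
}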



Re: Things I don't like about \du's "Attributes" column

2024-01-21 Thread Pavel Luzanov

Another approach based on early suggestions.

The Attributes column includes only the enabled logical attributes,
regardless of whether the attribute is enabled by default or not.
This changes the current behavior, but makes it clearer: everything
that is enabled is displayed. This principle is easy to maintain in
subsequent versions, even if there is a desire to change the default
value for any attribute. In addition, this resolves the issue with the
'LOGIN' attribute, whose default value depends on the command
(CREATE ROLE or CREATE USER).

The attribute names correspond to the keywords of the CREATE ROLE command.
The attributes are listed in the same order as in the documentation.
(I think that the LOGIN attribute should be moved to the first place,
both in the documentation and in the command.)


The "Connection limit" and "Valid until" attributes are placed in separate 
columns.
The "Password?" column has been added.

Sample output.

Patch v3:
=# \du
                                                 List of roles
 Role name |                            Attributes                             | Password? |      Valid until       | Connection limit
-----------+--------------------------------------------------------------------+-----------+------------------------+------------------
 admin     | INHERIT                                                            | no        |                        |               -1
 alice     | SUPERUSER LOGIN                                                    | yes       | infinity               |                5
 bob       | CREATEDB INHERIT LOGIN REPLICATION BYPASSRLS                       | no        | 2022-01-01 00:00:00+03 |               -1
 charlie   | CREATEROLE INHERIT LOGIN                                           | yes       |                        |                0
 postgres  | SUPERUSER CREATEDB CREATEROLE INHERIT LOGIN REPLICATION BYPASSRLS  | no        |                        |               -1
(5 rows)


The output of the command is wide, but there are other commands with
output of comparable width: \dApS, \dfS, \doS.

Small modification with newline separator for Attributes column:

Patch v3 with newlines:
=# \du
  List of roles
 Role name | Attributes  | Password? |  Valid until   | Connection limit
---+-+---++--
 admin | INHERIT | no||   -1
 alice | SUPERUSER  +| yes   | infinity   |5
   | LOGIN   |   ||
 bob   | CREATEDB   +| no| 2022-01-01 00:00:00+03 |   -1
   | INHERIT+|   ||
   | LOGIN  +|   ||
   | REPLICATION+|   ||
   | BYPASSRLS   |   ||
 charlie   | CREATEROLE +| yes   ||0
   | INHERIT+|   ||
   | LOGIN   |   ||
 postgres  | SUPERUSER  +| no||   -1
   | CREATEDB   +|   ||
   | CREATEROLE +|   ||
   | INHERIT+|   ||
   | LOGIN  +|   ||
   | REPLICATION+|   ||
   | BYPASSRLS   |   ||
(5 rows)

For comparison, here's what it looks like now:

master:
=# \du
 List of roles
 Role name | Attributes
---+
 admin | Cannot login
 alice | Superuser, No inheritance +
   | 5 connections +
   | Password valid until infinity
 bob   | Create DB, Replication, Bypass RLS+
   | Password valid until 2022-01-01 00:00:00+03
 charlie   | Create role   +
   | No connections
 postgres  | Superuser, Create role, Create DB, Replication, Bypass RLS


From my point of view, any of the proposed alternatives is better than what we 
have now.
But to move forward we need to choose an approach.

I will be glad of any opinions.
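
For anyone who wants to experiment before the patch lands, a rough sketch of
the kind of catalog query behind the proposed columns (hypothetical; the
actual patch builds its query in psql's describe.c, and pg_authid is needed
for the password column, so this is superuser-only):

SELECT r.rolname AS "Role name",
       concat_ws(' ',
                 CASE WHEN r.rolsuper THEN 'SUPERUSER' END,
                 CASE WHEN r.rolcreatedb THEN 'CREATEDB' END,
                 CASE WHEN r.rolcreaterole THEN 'CREATEROLE' END,
                 CASE WHEN r.rolinherit THEN 'INHERIT' END,
                 CASE WHEN r.rolcanlogin THEN 'LOGIN' END,
                 CASE WHEN r.rolreplication THEN 'REPLICATION' END,
                 CASE WHEN r.rolbypassrls THEN 'BYPASSRLS' END) AS "Attributes",
       CASE WHEN r.rolpassword IS NOT NULL THEN 'yes' ELSE 'no' END AS "Password?",
       r.rolvaliduntil AS "Valid until",
       r.rolconnlimit AS "Connection limit"
FROM pg_authid r
WHERE r.rolname !~ '^pg_'
ORDER BY 1;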

--
Pavel Luzanov
Postgres Professional: https://postgrespro.com
From 454ff68b64ca2ebcc2ba2e8d176f55ab018606cd Mon Sep 17 00:00:00 2001
From: Pavel Luzanov 
Date: Sun, 21 Jan 2024 23:44:33 +0300
Subject: [PATCH v3] psql: Rethinking of \du command

The Attributes column includes only the enabled logical attributes.
The attribute names correspond to the keywords of the CREATE ROLE
command. 

Re: Patch: Improve Boolean Predicate JSON Path Docs

2024-01-21 Thread David E. Wheeler
On Jan 21, 2024, at 14:58, David E. Wheeler  wrote:

> I make this interpretation based on this bit of the docs:

Sorry, that’s from my branch. Here it is in master:

   

A path expression can be a Boolean predicate, although the SQL/JSON
standard allows predicates only in filters. This is necessary for
implementation of the @@ operator. For example,
the following jsonpath expression is valid in
PostgreSQL:

$.track.segments[*].HR < 70




In any event, something to do with @@, perhaps to have some compatibility with 
`jsonb @> jsonb`? I don’t know why @@ was important to have.

David





Re: Patch: Improve Boolean Predicate JSON Path Docs

2024-01-21 Thread David E. Wheeler
On Jan 21, 2024, at 14:52, David E. Wheeler  wrote:

> This is the only way the different behaviors make sense to me. @? expects a 
> set, not a boolean, sees there is an item in the set, so returns true:

I make this interpretation based on this bit of the docs:

   
PostgreSQL's implementation of the SQL/JSON path
language has the following deviations from the SQL/JSON standard.



Boolean Predicate Check Expressions

As an extension to the SQL standard, a PostgreSQL
path expression can be a Boolean predicate, whereas the SQL standard allows
predicates only in filters. Where SQL standard path expressions return the
relevant contents of the queried JSON value, predicate check expressions
return the three-valued result of the predicate: true,
false, or unknown. Compare this
filter jsonpath expression:

= select jsonb_path_query(:'json', '$.track.segments ?(@[*].HR > 130)');
jsonb_path_query
-
{"HR": 135, "location": [47.706, 13.2635], "start time": "2018-10-14 10:39:21"}

To this predicate check expression, which returns true:

= select jsonb_path_query(:'json', '$.track.segments[*].HR > 130');
jsonb_path_query
--
true



Best,

David





Re: index prefetching

2024-01-21 Thread Konstantin Knizhnik



On 19/01/2024 2:35 pm, Tomas Vondra wrote:


On 1/19/24 09:34, Konstantin Knizhnik wrote:

On 18/01/2024 6:00 pm, Tomas Vondra wrote:

On 1/17/24 09:45, Konstantin Knizhnik wrote:

I have integrated your prefetch patch in Neon and it actually works!
Moreover, I combined it with prefetch of leaf pages for IOS and it also
seems to work.


Cool! And do you think this is the right design/way to do this?

I like the idea of prefetching TIDs in executor.

But looking though your patch I have some questions:


1. Why is it necessary to allocate and store the all_visible flag in the
data buffer? Why can the caller of IndexPrefetchNext not look at the
prefetch field?

+        /* store the all_visible flag in the private part of the entry */
+        entry->data = palloc(sizeof(bool));
+        *(bool *) entry->data = all_visible;


What do you mean by "prefetch field"?



I mean "prefetch" field of IndexPrefetchEntry:

+
+typedef struct IndexPrefetchEntry
+{
+    ItemPointerData tid;
+
+    /* should we prefetch heap page for this TID? */
+    bool        prefetch;
+

You store the same flag twice:

+        /* prefetch only if not all visible */
+        entry->prefetch = !all_visible;
+
+        /* store the all_visible flag in the private part of the entry */
+        entry->data = palloc(sizeof(bool));
+        *(bool *) entry->data = all_visible;

My question was: why do we need to allocate something in entry->data and
store all_visible in it, when we have already stored !all_visible in
entry->prefetch?







Re: Patch: Improve Boolean Predicate JSON Path Docs

2024-01-21 Thread David E. Wheeler
On Jan 21, 2024, at 14:43, Tom Lane  wrote:

> I don't entirely buy this argument --- if that is the interpretation,
> of what use are predicate check expressions?  It seems to me that we
> have to consider them as being a shorthand notation for filter
> expressions, or else they simply do not make sense as jsonpath.

I believe it becomes pretty apparent when using jsonb_path_query(). The filter 
expression returns a set (using the previous \gset example):

david=# select jsonb_path_query(:'json', '$.track.segments[*].HR ? (@ > 10)');
 jsonb_path_query 
--
 73
 135
(2 rows)

The predicate check returns a boolean:

david=# select jsonb_path_query(:'json', '$.track.segments[*].HR > 10');
 jsonb_path_query 
--
 true
(1 row)

This is the only way the different behaviors make sense to me. @? expects a 
set, not a boolean, sees there is an item in the set, so returns true:

david=# select jsonb_path_query(:'json', '$.track.segments[*].HR > 1000');
 jsonb_path_query 
--
 false
(1 row)

david=# select :'json'::jsonb @? '$.track.segments[*].HR > 1000';
 ?column? 
--
 t
(1 row)

Best,

David





Re: index prefetching

2024-01-21 Thread Konstantin Knizhnik


On 20/01/2024 12:14 am, Tomas Vondra wrote:

Looks like I was not right: even if it is not an index-only scan, when the
index condition involves only index attributes, the heap is not accessed
until we find a tuple satisfying the search condition.
The inclusive index case described above
(https://commitfest.postgresql.org/46/4352/) is interesting but IMHO an
exotic case. If keys are actually used in search, then why not create
a normal compound index instead?


Not sure I follow ...

Firstly, I'm not convinced the example addressed by that other patch is
that exotic. IMHO it's quite possible it's actually quite common, but
the users do not realize the possible gains.

Also, there are reasons to not want very wide indexes - it has overhead
associated with maintenance, disk space, etc. I think it's perfectly
rational to design indexes in a way that eliminates most heap fetches
necessary to evaluate conditions, but does not guarantee IOS (so the
last heap fetch is still needed).


We are comparing compound index (a,b) and covering (inclusive) index (a) 
include (b)
These indexes have exactly the same width and size and almost the same
maintenance overhead.


The first index has a more expensive comparison function (involving two
columns), but I do not think that it can significantly affect
performance and maintenance cost. Also, if the selectivity of "a" is good
enough, then there is no need to compare "b".


Why might we prefer a covering index to a compound index? I see only two
good reasons:

1. The extra columns' type does not have the comparison function needed by the AM.
2. The extra columns are never used in query predicates.

If you are going to use these columns in query predicates, I do not see
much sense in creating an inclusive index rather than a compound index.

Do you?



What do you mean by "create normal compound index"? The patch addresses
a limitation that not every condition can be translated into a proper
scan key. Even if we improve this, there will always be such conditions.
While the IOS can evaluate them on the index tuple, the regular index scan
can't do that (currently).

Can you share an example demonstrating the alternative approach?


May be I missed something.

This is the example from 
https://www.postgresql.org/message-id/flat/N1xaIrU29uk5YxLyW55MGk5fz9s6V2FNtj54JRaVlFbPixD5z8sJ07Ite5CvbWwik8ZvDG07oSTN-usENLVMq2UAcizVTEd5b-o16ZGDIIU=@yamlcoder.me 
:


```

And here is the plan with index on (a,b).

Limit  (cost=0.42..4447.90 rows=1 width=12) (actual time=6.883..6.884
rows=0 loops=1)
  Output: a, b, d
  Buffers: shared hit=613
  ->  Index Scan using t_a_b_idx on public.t  (cost=0.42..4447.90 rows=1
      width=12) (actual time=6.880..6.881 rows=0 loops=1)
        Output: a, b, d
        Index Cond: ((t.a > 100) AND (t.b = 4))
        Buffers: shared hit=613
Planning:
  Buffers: shared hit=41
Planning Time: 0.314 ms
Execution Time: 6.910 ms
```



Isn't it an optimal plan for this query?

And cite from self reproducible example https://dbfiddle.uk/iehtq44L :
```
create unique index t_a_include_b on t(a) include (b);
-- I'd expecd index above to behave the same as index below for this query
--create unique index on t(a,b);
```

I agree that it is natural to expect the same result for both indexes. 
So this PR definitely makes sense.
My point is only that a compound index on (a,b) is more natural and
preferable in this case.




Re: Patch: Improve Boolean Predicate JSON Path Docs

2024-01-21 Thread Tom Lane
"David E. Wheeler"  writes:
> On Jan 20, 2024, at 12:34, Tom Lane  wrote:
>> It will take a predicate, but seems to always return true:
>> 
>> regression=# select '{"a":[1,2,3,4,5]}'::jsonb @? '$.a[*] < 5' ;
>> ?column? 
>> --
>> t
>> (1 row)
>> 
>> regression=# select '{"a":[1,2,3,4,5]}'::jsonb @? '$.a[*] > 5' ;
>> ?column? 
>> --
>> t
>> (1 row)

> Just for the sake of clarity, this return value is “correct,” because @? and 
> other functions and operators that expect SQL standard statements evaluate 
> the SET returned by the JSONPath statement, but predicate check expressions 
> don’t return a set, but always a single scalar value (true, false, or 
> null). From the POV of the code expecting SQL standard JSONPath results, 
> that’s a set of one. @? sees that the set is not empty so returns true.

I don't entirely buy this argument --- if that is the interpretation,
of what use are predicate check expressions?  It seems to me that we
have to consider them as being a shorthand notation for filter
expressions, or else they simply do not make sense as jsonpath.
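
For example, reading a predicate check as shorthand for a filter would pair
these two queries (a sketch):

-- predicate check expression: a single boolean result
select jsonb_path_query('{"a":[1,2,3,4,5]}', '$.a[*] > 3');       -- true
-- the filter form it would abbreviate: the matching items
select jsonb_path_query('{"a":[1,2,3,4,5]}', '$.a[*] ? (@ > 3)'); -- 4, 5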

regards, tom lane




Re: Patch: Improve Boolean Predicate JSON Path Docs

2024-01-21 Thread Tom Lane
"David E. Wheeler"  writes:
> On Jan 20, 2024, at 12:34, Tom Lane  wrote:
>> Surely we're not helping anybody by leaving that behavior in place.
>> Making it do something useful, throwing an error, or returning NULL
>> all seem superior to this.  I observe that @@ returns NULL for the
>> path type it doesn't like, so maybe that's what to do here.

> I agree it would be far better for the behavior to be consistent, but frankly 
> would like to see them raise an error. Ideally the hit would suggest the 
> proper alternative operator or function to use, and maybe link to the docs 
> that describe the difference between SQL-standard JSONPath and "predicate 
> check expressions”, and how they have separate operators and functions.

That ship's probably sailed.  However, I spent some time poking into
the odd behavior I showed for @?, and it seems to me that it's an
oversight in appendBoolResult.  That just automatically returns jperOk
in the !found short-circuit path for any boolean result, which is not
the behavior you'd get if the boolean value were actually returned
(cf. jsonb_path_match_internal).  I experimented with making it do
what seems like the right thing, and found that there is only one
regression test case that changes behavior:

 select jsonb '2' @? '$ == "2"';
  ?column? 
 --
- t
+ f
 (1 row)
 

Now, JSON does not think that numeric 2 equals string "2", so
ISTM the expected output here is flat wrong.  It's certainly
inconsistent with @@:

regression=# select jsonb '2' @@ '$ == "2"';
 ?column? 
--
 
(1 row)

So I think we should consider a patch like the attached
(probably with some more test cases added).  I don't really
understand this code however, so maybe I missed something.

regards, tom lane

diff --git a/src/backend/utils/adt/jsonpath_exec.c b/src/backend/utils/adt/jsonpath_exec.c
index ac16f5c85d..63c46cfab5 100644
--- a/src/backend/utils/adt/jsonpath_exec.c
+++ b/src/backend/utils/adt/jsonpath_exec.c
@@ -2065,7 +2065,14 @@ appendBoolResult(JsonPathExecContext *cxt, JsonPathItem *jsp,
 	JsonbValue	jbv;
 
	if (!jspGetNext(jsp, &next) && !found)
-		return jperOk;			/* found singleton boolean value */
+	{
+		/*
+		 * We have a predicate check expression, i.e. a path ending in a bare
+		 * boolean operator, and we don't need to return the exact value(s)
+		 * found.  Just report success or failure of the boolean.
+		 */
+		return (res == jpbTrue) ? jperOk : jperNotFound;
+	}
 
 	if (res == jpbUnknown)
 	{
diff --git a/src/test/regress/expected/jsonb_jsonpath.out b/src/test/regress/expected/jsonb_jsonpath.out
index 6659bc9091..59e1b71051 100644
--- a/src/test/regress/expected/jsonb_jsonpath.out
+++ b/src/test/regress/expected/jsonb_jsonpath.out
@@ -1157,7 +1157,7 @@ select jsonb_path_query('2', '$ == "2"');
 select jsonb '2' @? '$ == "2"';
  ?column? 
 --
- t
+ f
 (1 row)
 
 select jsonb '2' @@ '$ > 1';


Re: Patch: Improve Boolean Predicate JSON Path Docs

2024-01-21 Thread David E. Wheeler
On Jan 20, 2024, at 12:34, Tom Lane  wrote:

> It will take a predicate, but seems to always return true:
> 
> regression=# select '{"a":[1,2,3,4,5]}'::jsonb @? '$.a[*] < 5' ;
> ?column? 
> --
> t
> (1 row)
> 
> regression=# select '{"a":[1,2,3,4,5]}'::jsonb @? '$.a[*] > 5' ;
> ?column? 
> --
> t
> (1 row)

Just for the sake of clarity, this return value is “correct,” because @? and 
other functions and operators that expect SQL standard statements evaluate the 
SET returned by the JSONPath statement, but predicate check expressions don’t 
return a set, but always a single scalar value (true, false, or null). From 
the POV of the code expecting SQL standard JSONPath results, that’s a set of 
one. @? sees that the set is not empty so returns true.
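
A compact way to see both behaviors side by side (a sketch):

-- the predicate is false, but @? only asks whether the result set is empty
select '{"a":[1]}'::jsonb @? '$.a[*] > 5';   -- t  (the set holds one item: false)
-- @@ inspects the boolean itself
select '{"a":[1]}'::jsonb @@ '$.a[*] > 5';   -- f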

Best,

David





Re: [17] CREATE COLLATION default provider

2024-01-21 Thread Jeff Davis
On Thu, 2024-01-18 at 11:15 +0100, Peter Eisentraut wrote:
> I wonder, however, how useful this would be in practice.  In most 
> interesting cases, you need to know what the provider is to be able
> to 
> spell out the locale name appropriately.  The cases where some
> overlap 
> exists, like the common "ll_CC", are already preloaded, so won't 
> actually need to be specified explicitly in many cases.

Good point.

> Also, I think the default should only flow one way, top-down:  The 
> default provider of CREATE COLLATION is datlocprovider.  There
> shouldn't 
> be a second, bottom-up way, based on the other specified CREATE 
> COLLATION parameters.  That's just too much logical zig-zag, IMO. 

Also a good point. I am withdrawing this patch from the CF, and we can
reconsider the idea later perhaps in some other form.

Regards,
Jeff Davis





Re: Patch: Improve Boolean Predicate JSON Path Docs

2024-01-21 Thread David E . Wheeler
On Jan 20, 2024, at 11:45, Tom Lane  wrote:

> You sure about that?  It would surprise me if we could effectively use
> a not-equal condition with an index.  If it is only == that works,
> then the preceding statement seems sufficient.

I’m not! I just assumed it in the same way creating an SQL = operator 
automatically respects NOT syntax (or so I recall). In fiddling a bit, I can’t 
get it to use an index:

CREATE TABLE MOVIES (id SERIAL PRIMARY KEY, movie JSONB NOT NULL);
\copy movies(movie) from PROGRAM 'curl -s 
https://raw.githubusercontent.com/prust/wikipedia-movie-data/master/movies.json 
| jq -c ".[]" | sed "s|||g"';
create index on movies using gin (movie);
analyze movies;

david=# explain analyze select id from movies where movie @? '$ ?(@.genre[*] != 
"Teen")';
                                      QUERY PLAN
--------------------------------------------------------------------------------------
Seq Scan on movies  (cost=0.00..3741.41 rows=4 width=4) (actual 
time=19.222..19.223 rows=0 loops=1)
  Filter: (movie @? '$?(@."genre"[*] != "Teen")'::jsonpath)
  Rows Removed by Filter: 36273
Planning Time: 1.242 ms
Execution Time: 19.247 ms
(5 rows)

But that might be because the planner knows that the query is going to fetch 
most records, anyway. If I set most records to a single value:

david=# update movies set movie =  jsonb_set(movie, '{year}', '2020'::jsonb) 
where id < 3600;
UPDATE 3599
david=# analyze movies;
ANALYZE
david=# explain analyze select id from movies where movie @? '$ ?(@.year != 
2020)';
                                      QUERY PLAN
--------------------------------------------------------------------------------------
Seq Scan on movies  (cost=0.00..3884.41 rows=32609 width=4) (actual 
time=0.065..43.730 rows=32399 loops=1)
  Filter: (movie @? '$?(@."year" != 2020)'::jsonpath)
  Rows Removed by Filter: 3874
Planning Time: 1.759 ms
Execution Time: 45.368 ms
(5 rows)

Looks like it still doesn’t use the index with !=. Pity.
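
For contrast (same table as above), equality can be turned into GIN search
entries, so one would expect something like the following to use the index,
unlike the != cases (a sketch; the actual plan depends on the statistics):

explain select id from movies where movie @? '$ ? (@.year == 1999)';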

Best,

David





Re: Collation version tracking for macOS

2024-01-21 Thread Jeff Davis
On Sat, 2024-01-20 at 07:40 +0530, vignesh C wrote:
> This thread has been idle for a year now, It has stalled after a lot
> of discussion.
> @Jeff Davis: Do you want to try to restart the discussion by posting
> an updated version and see what happens?

Thank you for following up. Yes, I'd like to find a path forward here,
but I need some validation from others on my approach.

I rendered the docs I wrote as an HTML page and attached it to this
thread, to make it easier for others to read and comment. It's
basically a tool for experts who are willing to devote effort to
managing their collations and ICU libraries. Is that what we want?

At an implementation level, did I get the extension APIs right? I
considered making the API simpler, but that would require the extension
to do quite a bit more work (including a lot of redundant work) to use
ICU properly.

Regards,
Jeff Davis





FEATURE REQUEST: Role vCPU limit/priority

2024-01-21 Thread Yoni Sade
It would be useful to have the ability to define, for a role, default vCPU
affinity limits/thread priority settings, so that more active sessions could
coexist, similar to MySQL resource groups.

Best Regards,
Yoni Sade

