Re: [PoC] pg_upgrade: allow to upgrade publisher node

2023-11-06 Thread Amit Kapila
On Tue, Nov 7, 2023 at 10:01 AM Zhijie Hou (Fujitsu)
 wrote:
>
> On Tuesday, November 7, 2023 12:14 PM Kuroda, Hayato/黒田 隼人 
>  wrote:
> >
> > Dear hackers,
> >
> > PSA the patch to solve the issue [1].
> >
> > Peter E. and Andrew kindly raised an issue that delete_old_cluster.sh is
> > generated in the source directory, even when using a VPATH/meson build.
> > This can be avoided by changing the directory explicitly.
> >
> > [1]:
> > https://www.postgresql.org/message-id/flat/7b8a9460-5668-b372-04e6-7b
> > 52e9308493%40dunslane.net#554090099bbbd12c94bf570665a6badf
>
> Thanks for the patch, I have confirmed that the files won't be generated
> in the source directory after applying the patch.
>
> After running "meson test -C build/ --suite pg_upgrade",
> the files are in the test directory:
> ./build/testrun/pg_upgrade/003_logical_slots/data/delete_old_cluster.sh
>

Thanks for the patch and verification. Pushed the fix.

-- 
With Regards,
Amit Kapila.




Re: Support run-time partition pruning for hash join

2023-11-06 Thread Richard Guo
On Mon, Nov 6, 2023 at 11:00 PM Alexander Lakhin 
wrote:

> Please look at a warning and an assertion failure triggered by the
> following script:
> set parallel_setup_cost = 0;
> set parallel_tuple_cost = 0;
> set min_parallel_table_scan_size = '1kB';
>
> create table t1 (i int) partition by range (i);
> create table t1_1 partition of t1 for values from (1) to (2);
> create table t1_2 partition of t1 for values from (2) to (3);
> insert into t1 values (1), (2);
>
> create table t2(i int);
> insert into t2 values (1), (2);
> analyze t1, t2;
>
> select * from t1 right join t2 on t1.i = t2.i;
>
> 2023-11-06 14:11:37.398 UTC|law|regression|6548f419.392cf5|WARNING:  Join
> partition pruning $0 has not been performed yet.
> TRAP: failed Assert("node->as_prune_state"), File: "nodeAppend.c", Line:
> 846, PID: 3747061
>

Thanks for the report!  I failed to take care of the parallel-hashjoin
case, and I have to admit that it's not clear to me yet how we should do
join partition pruning in that case.

For now I think it's better to just avoid performing join partition
pruning for parallel hashjoin, so that the patch doesn't become too
complex for review.  We can always extend it in the future.

I have done that in v5.  Thanks for testing!

Thanks
Richard


v5-0001-Support-run-time-partition-pruning-for-hash-join.patch
Description: Binary data


Re: Remove distprep

2023-11-06 Thread Michael Paquier
On Mon, Nov 06, 2023 at 04:21:40PM +0100, Peter Eisentraut wrote:
> done

Nice to see 721856ff24b3 in, thanks!
--
Michael


signature.asc
Description: PGP signature


Re: SQL:2011 application time

2023-11-06 Thread jian he
Hi. Based on v17, I found several doc-related issues that I didn't
look at closely before.

+ 
+  In a temporal foreign key, the delete/update will use
+  FOR PORTION OF semantics to constrain the
+  effect to the bounds being deleted/updated in the referenced row.
+ 

The first "para" should be  ?
---
There are many warnings after #define WRITE_READ_PARSE_PLAN_TREES
see: http://cfbot.cputube.org/highlights/all.html#4308
Does that mean our new change in gram.y is somehow wrong?
--
sgml/html/sql-update.html:
"range_or_period_name
The range column or period to use when performing a temporal update.
This must match the range or period used in the table's temporal
primary key."

Is the second sentence unnecessary, since a table without a temporal
primary key can still do a FOR PORTION OF update?

sgml/html/sql-update.html:
"start_time
The earliest time (inclusive) to change in a temporal update. This
must be a value matching the base type of the range or period from
range_or_period_name. It may also be the special value MINVALUE to
indicate an update whose beginning is unbounded."

probably something like the following:
"lower_bound"
The lower bound (inclusive) to change in an overlap update. This must
be a value matching the base type of the range or period from
range_or_period_name. It may also be the special value UNBOUNDED to
indicate an update whose beginning is unbounded."

Obviously the "start_time" reference also needs to change, as does the
corresponding reference in sql-delete.html.
--
UPDATE for_portion_of_test FOR PORTION OF valid_at  FROM NULL TO
"unbounded" SET name = 'NULL to NULL';
should fail, but does not. The double-quoted "unbounded" is a column
reference, I assume.

That's why I am confused by the "if (nodeTag(n) == T_ColumnRef)" part
of the function transformForPortionOfBound.
---
In create_table.sgml, you also need to add the WITHOUT OVERLAPS-related
info into 




Re: Where can I find the doxyfile?

2023-11-06 Thread John Morris
Another update, this time with an abbreviated Doxyfile. Rather than including 
the full Doxyfile, this updated version only includes modified settings. It is 
more compact and more likely to survive across future doxygen versions.


  *   John Morris


doxygen_v2.patch
Description: doxygen_v2.patch


Re: Popcount optimization using AVX512

2023-11-06 Thread Noah Misch
On Mon, Nov 06, 2023 at 09:59:26PM -0600, Nathan Bossart wrote:
> On Mon, Nov 06, 2023 at 07:15:01PM -0800, Noah Misch wrote:
> > On Mon, Nov 06, 2023 at 09:52:58PM -0500, Tom Lane wrote:
> >> Nathan Bossart  writes:
> >> > Like I said, I don't have any proposals yet, but assuming we do want to
> >> > support newer intrinsics, either open-coded or via auto-vectorization, I
> >> > suspect we'll need to gather consensus for a new policy/strategy.
> >> 
> >> Yeah.  The function-pointer solution kind of sucks, because for the
> >> sort of operation we're considering here, adding a call and return
> >> is probably order-of-100% overhead.  Worse, it adds similar overhead
> >> for everyone who doesn't get the benefit of the optimization.
> > 
> > The glibc/gcc "ifunc" mechanism was designed to solve this problem of 
> > choosing
> > a function implementation based on the runtime CPU, without incurring 
> > function
> > pointer overhead.  I would not attempt to use AVX512 on non-glibc systems, 
> > and
> > I would use ifunc to select the desired popcount implementation on glibc:
> > https://gcc.gnu.org/onlinedocs/gcc-4.8.5/gcc/Function-Attributes.html
> 
> Thanks, that seems promising for the function pointer cases.  I'll plan on
> trying to convert one of the existing ones to use it.  BTW it looks like
> LLVM has something similar [0].
> 
> IIUC this unfortunately wouldn't help for cases where we wanted to keep
> stuff inlined, such as is_valid_ascii() and the functions in pg_lfind.h,
> unless we applied it to the calling functions, but that doesn't sound
> particularly maintainable.

Agreed, it doesn't solve inline cases.  If the gains are big enough, we should
move toward packages containing N CPU-specialized copies of the postgres
binary, with bin/postgres just exec'ing the right one.




Re: Intermittent failure with t/003_logical_slots.pl test on windows

2023-11-06 Thread Kyotaro Horiguchi
At Mon, 6 Nov 2023 19:42:21 +0530, Nisha Moond  wrote 
in 
> > Appending '2>&1 test:
> > The command still results in NULL and ends up failing as no data is
> > returned. Which means even no error message is returned. The error log

Thanks for the confirmation. So, at least the child process was launched
successfully from cmd.exe's point of view.

Upon a quick check on my end with Windows' _popen, I have observed the
following:

- Once a child process has started, _popen and subsequent fgets calls do
  not detect it as an error if the process ends abnormally, with a
  non-zero exit status or even with a SEGV.

- After the child process has flushed data to stdout, it is possible
  to read from the pipe even if the child process crashes or ends
  thereafter.

- Even if fgets is called before the program starts, it will correctly
  block until the program outputs something. Specifically, when I used
  popen("sleep 5 & target.exe") and immediately performed fgets on the
  pipe, I was able to read the output of target.exe as the first line.

Therefore, based on the information available, it is conceivable that
the child process was killed by something right after it started, or
the program terminated on its own without any error messages.

By the way, in the case of the aforementioned SEGV, corresponding
Application Errors were identifiable in the Event Viewer. Additionally,
the exit statuses can be captured by using a wrapper batch file (.bat)
that records %ERRORLEVEL% after running the target program. This may
yield insights, although its effectiveness is not guaranteed.

regards.

-- 
Kyotaro Horiguchi
NTT Open Source Software Center




RE: [PoC] pg_upgrade: allow to upgrade publisher node

2023-11-06 Thread Zhijie Hou (Fujitsu)
On Tuesday, November 7, 2023 12:14 PM Kuroda, Hayato/黒田 隼人 
 wrote:
> 
> Dear hackers,
> 
> PSA the patch to solve the issue [1].
> 
> Peter E. and Andrew kindly raised an issue that delete_old_cluster.sh is
> generated in the source directory, even when using a VPATH/meson build.
> This can be avoided by changing the directory explicitly.
> 
> [1]:
> https://www.postgresql.org/message-id/flat/7b8a9460-5668-b372-04e6-7b
> 52e9308493%40dunslane.net#554090099bbbd12c94bf570665a6badf

Thanks for the patch, I have confirmed that the files won't be generated
in the source directory after applying the patch.

After running "meson test -C build/ --suite pg_upgrade",
the files are in the test directory:
./build/testrun/pg_upgrade/003_logical_slots/data/delete_old_cluster.sh

Best regards,
Hou zj




Re: [PoC] pg_upgrade: allow to upgrade publisher node

2023-11-06 Thread Peter Smith
On Tue, Nov 7, 2023 at 3:14 PM Hayato Kuroda (Fujitsu)
 wrote:
>
> Dear hackers,
>
> PSA the patch to solve the issue [1].
>
> Peter E. and Andrew kindly raised an issue that delete_old_cluster.sh is
> generated in the source directory, even when using a VPATH/meson build.
> This can be avoided by changing the directory explicitly.
>

Hi Kuroda-san,

Thanks for the patch.

I reproduced the bug, then after applying your patch I confirmed the
problem is fixed. I used a VPATH build.

~~~

BEFORE
t/001_basic.pl .. ok
t/002_pg_upgrade.pl . ok
t/003_logical_slots.pl .. ok
All tests successful.
Files=3, Tests=39, 128 wallclock secs ( 0.05 usr  0.01 sys + 12.90
cusr  7.43 csys = 20.39 CPU)
Result: PASS

OBSERVE THE BUG
Look in the source folder and notice the file that should not be there.

[postgres@CentOS7-x64 pg_upgrade]$ pwd
/home/postgres/oss_postgres_misc/src/bin/pg_upgrade
[postgres@CentOS7-x64 pg_upgrade]$ ls *.sh
delete_old_cluster.sh

~~~

AFTER
# +++ tap check in src/bin/pg_upgrade +++
t/001_basic.pl .. ok
t/002_pg_upgrade.pl . ok
t/003_logical_slots.pl .. ok
All tests successful.
Files=3, Tests=39, 128 wallclock secs ( 0.06 usr  0.01 sys + 13.02
cusr  7.28 csys = 20.37 CPU)
Result: PASS

CONFIRM THE FIX
Check the offending file is no longer in the src folder

[postgres@CentOS7-x64 pg_upgrade]$ pwd
/home/postgres/oss_postgres_misc/src/bin/pg_upgrade
[postgres@CentOS7-x64 pg_upgrade]$ ls *.sh
ls: cannot access *.sh: No such file or directory

Instead, it is found in the VPATH folder
[postgres@CentOS7-x64 pg_upgrade]$ pwd
/home/postgres/vpath_dir/src/bin/pg_upgrade
[postgres@CentOS7-x64 pg_upgrade]$ ls tmp_check/
delete_old_cluster.sh  log  results

==
Kind Regards,
Peter Smith.
Fujitsu Australia




RE: [PoC] pg_upgrade: allow to upgrade publisher node

2023-11-06 Thread Hayato Kuroda (Fujitsu)
Dear hackers,

PSA the patch to solve the issue [1].

Peter E. and Andrew kindly raised an issue that delete_old_cluster.sh is
generated in the source directory, even when using a VPATH/meson build.
This can be avoided by changing the directory explicitly.

[1]: 
https://www.postgresql.org/message-id/flat/7b8a9460-5668-b372-04e6-7b52e9308493%40dunslane.net#554090099bbbd12c94bf570665a6badf

Best Regards,
Hayato Kuroda
FUJITSU LIMITED



change_dir.patch
Description: change_dir.patch


Re: Popcount optimization using AVX512

2023-11-06 Thread Nathan Bossart
On Mon, Nov 06, 2023 at 07:15:01PM -0800, Noah Misch wrote:
> On Mon, Nov 06, 2023 at 09:52:58PM -0500, Tom Lane wrote:
>> Nathan Bossart  writes:
>> > Like I said, I don't have any proposals yet, but assuming we do want to
>> > support newer intrinsics, either open-coded or via auto-vectorization, I
>> > suspect we'll need to gather consensus for a new policy/strategy.
>> 
>> Yeah.  The function-pointer solution kind of sucks, because for the
>> sort of operation we're considering here, adding a call and return
>> is probably order-of-100% overhead.  Worse, it adds similar overhead
>> for everyone who doesn't get the benefit of the optimization.
> 
> The glibc/gcc "ifunc" mechanism was designed to solve this problem of choosing
> a function implementation based on the runtime CPU, without incurring function
> pointer overhead.  I would not attempt to use AVX512 on non-glibc systems, and
> I would use ifunc to select the desired popcount implementation on glibc:
> https://gcc.gnu.org/onlinedocs/gcc-4.8.5/gcc/Function-Attributes.html

Thanks, that seems promising for the function pointer cases.  I'll plan on
trying to convert one of the existing ones to use it.  BTW it looks like
LLVM has something similar [0].

IIUC this unfortunately wouldn't help for cases where we wanted to keep
stuff inlined, such as is_valid_ascii() and the functions in pg_lfind.h,
unless we applied it to the calling functions, but that doesn't sound
particularly maintainable.

[0] https://llvm.org/docs/LangRef.html#ifuncs

-- 
Nathan Bossart
Amazon Web Services: https://aws.amazon.com




Re: UniqueKey v2

2023-11-06 Thread zhihuifan1213


jian he  writes:

> On Fri, Oct 20, 2023 at 4:33 PM  wrote:
>>
>>
>> > i did some simple tests using text data type.
>> >
>> > it works with the primary key, not with unique indexes.
>> > it does not work when the column is unique, not null.
>> >
>> > The following is my test.
>>
>> Can you simplify your test case please? I can't understand what "doesn't
>> work" means here and for which case. FWIW, this feature has nothing to do
>> with the real data; I don't think inserting any data is helpful unless I
>> missed anything.
>
> Sorry for not explaining it very well.
> "make distinct as no-op."
> my understanding: it means that if fewer rows meet the criteria "columnX <
>  const_a", then after analyzing the table, it should use an index-only scan

No, "mark distinct as no-op" means the distinct node can be discarded
automatically since it is not needed any more. The simplest case would
be "select distinct pk from t", which should be the same as "select pk
from t".  You can check the test cases for more examples.

-- 
Best Regards
Andy Fan





Re: Atomic ops for unlogged LSN

2023-11-06 Thread Nathan Bossart
On Tue, Nov 07, 2023 at 12:57:32AM +, John Morris wrote:
> I incorporated your suggestions and added a few more. The changes are
> mainly related to catching potential errors if some basic assumptions
> aren’t met.

Hm.  Could we move that to a separate patch?  We've lived without these
extra checks for a very long time, and I'm not aware of any related issues,
so I'm not sure it's worth the added complexity.  And IMO it'd be better to
keep it separate from the initial atomics conversion, anyway.

> I found the comment about cache coherency a bit confusing. We are dealing
> with a single address, so there should be no memory ordering or coherency
> issues. (Did I misunderstand?) I see it more as a race condition. Rather
> than merely explaining why it shouldn’t happen, the new version verifies
> the assumptions and throws an Assert() if something goes wrong.

I was thinking of the comment for pg_atomic_read_u32() that I cited earlier
[0].  This comment also notes that pg_atomic_read_u32/64() has no barrier
semantics.  My interpretation of that comment is that these functions
provide no guarantee that the value returned is the most up-to-date value.
But my interpretation could be wrong, and maybe this is meant to highlight
that the value might change before we can use the return value in a
compare/exchange or something.
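For what it's worth, the distinction can be illustrated with C11 atomics, which the pg_atomic_* layer roughly maps onto (this is a standalone sketch, not PostgreSQL code, and the function names are invented): the read-modify-write used for allocation is atomic as a unit, while a relaxed load merely returns some previously written value and carries no barrier semantics with respect to other memory:

```c
#include <stdatomic.h>
#include <stdint.h>

static _Atomic uint64_t unloggedLSN;

/* Allocation: an atomic read-modify-write, so concurrent callers each
 * get a distinct range without locking. */
static uint64_t
assign_unlogged_lsn(uint64_t nbytes)
{
    return atomic_fetch_add(&unloggedLSN, nbytes);
}

/* Checkpoint-style read: a relaxed load returns a value that was
 * written at some point, but says nothing about the visibility of any
 * *other* memory the writer touched -- that is the "no barrier
 * semantics" caveat in the pg_atomic_read_u32() comment. */
static uint64_t
read_unlogged_lsn(void)
{
    return atomic_load_explicit(&unloggedLSN, memory_order_relaxed);
}
```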

I spent a little time earlier today reviewing the various underlying
implementations, but apparently I need to spend some more time looking at
those...

[0] https://postgr.es/m/20231102034006.GA85609%40nathanxps13

-- 
Nathan Bossart
Amazon Web Services: https://aws.amazon.com




Re: Show WAL write and fsync stats in pg_stat_io

2023-11-06 Thread Michael Paquier
On Mon, Nov 06, 2023 at 03:35:01PM +0300, Nazir Bilal Yavuz wrote:
> Results are:
> 
>  clock source | track_wal_io_timing=on | track_wal_io_timing=off | change
> --------------+------------------------+-------------------------+-------
>  tsc          |          514814.459170 |           519826.284139 |    1%
>  hpet         |          132116.272121 |           141820.548447 |    7%
>  acpi_pm      |          394793.092255 |           403723.874719 |    2%

Thanks for the tests.  That's indeed noticeable under this load.
Better to keep track_io_timing and track_wal_io_timing as two
separate beasts; at least that's clear.
--
Michael


signature.asc
Description: PGP signature


Re: Popcount optimization using AVX512

2023-11-06 Thread Noah Misch
On Mon, Nov 06, 2023 at 09:52:58PM -0500, Tom Lane wrote:
> Nathan Bossart  writes:
> > Like I said, I don't have any proposals yet, but assuming we do want to
> > support newer intrinsics, either open-coded or via auto-vectorization, I
> > suspect we'll need to gather consensus for a new policy/strategy.
> 
> Yeah.  The function-pointer solution kind of sucks, because for the
> sort of operation we're considering here, adding a call and return
> is probably order-of-100% overhead.  Worse, it adds similar overhead
> for everyone who doesn't get the benefit of the optimization.

The glibc/gcc "ifunc" mechanism was designed to solve this problem of choosing
a function implementation based on the runtime CPU, without incurring function
pointer overhead.  I would not attempt to use AVX512 on non-glibc systems, and
I would use ifunc to select the desired popcount implementation on glibc:
https://gcc.gnu.org/onlinedocs/gcc-4.8.5/gcc/Function-Attributes.html
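For illustration, a minimal self-contained sketch of the mechanism (the names are invented, and a real resolver for this thread would test AVX512 feature bits instead of "popcnt"): the resolver runs once while the dynamic linker binds the symbol, so later calls are direct rather than going through a visible function pointer:

```c
#include <stdint.h>

static int
my_popcount_generic(uint64_t x)
{
    int n = 0;
    while (x) { n += x & 1; x >>= 1; }
    return n;
}

static int
my_popcount_builtin(uint64_t x)
{
    return __builtin_popcountll(x);
}

/* The resolver runs before main(), during symbol binding; it must probe
 * the CPU itself since constructors haven't run yet. */
static int (*resolve_my_popcount(void))(uint64_t)
{
    __builtin_cpu_init();
    if (__builtin_cpu_supports("popcnt"))
        return my_popcount_builtin;
    return my_popcount_generic;
}

int my_popcount(uint64_t x) __attribute__((ifunc("resolve_my_popcount")));
```

This requires gcc (or clang) on a glibc target, which matches the "I would not attempt to use AVX512 on non-glibc systems" caveat above.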




Re: [PATCH] Small refactoring of inval.c and inval.h

2023-11-06 Thread Michael Paquier
On Mon, Nov 06, 2023 at 01:17:12PM +0300, Aleksander Alekseev wrote:
> Fair enough, here is the corrected patch.

Okay for me, so applied.  Thanks!
--
Michael


signature.asc
Description: PGP signature


Re: Popcount optimization using AVX512

2023-11-06 Thread Tom Lane
Nathan Bossart  writes:
> Like I said, I don't have any proposals yet, but assuming we do want to
> support newer intrinsics, either open-coded or via auto-vectorization, I
> suspect we'll need to gather consensus for a new policy/strategy.

Yeah.  The function-pointer solution kind of sucks, because for the
sort of operation we're considering here, adding a call and return
is probably order-of-100% overhead.  Worse, it adds similar overhead
for everyone who doesn't get the benefit of the optimization.  (One
of the key things you want to be able to say, when trying to sell
a maybe-it-helps-or-maybe-it-doesnt optimization to the PG community,
is "it doesn't hurt anyone who's not able to benefit".)  And you
can't argue that that overhead is negligible either, because if it
is then we're all wasting our time even discussing this.  So we need
a better technology, and I fear I have no good ideas about what.

Your comment about vectorization hints at one answer: if you can
amortize the overhead across multiple applications of the operation,
then it doesn't hurt so much.  But I'm not sure how often we can
make that answer work.

regards, tom lane




Re: 2023-11-09 release announcement draft

2023-11-06 Thread Noah Misch
On Mon, Nov 06, 2023 at 05:04:25PM -0500, Jonathan S. Katz wrote:
> The PostgreSQL Global Development Group has released an update to all 
> supported
> versions of PostgreSQL, including 16.1, 15.5, 14.10, 13.13, 12.17, and 11.22
> This release fixes over 55 bugs reported over the last several months.
> 
> This release includes fixes for indexes where in certain cases, we advise
> reindexing. Please see the "Update" section for more details.

s/"Update" section/"Updating" section/ or change section title below.

Delete lines starting here ...

> This is the **final release of PostgreSQL 11**. PostgreSQL 10 will no longer
> receive
> [security and bug fixes](https://www.postgresql.org/support/versioning/).
> If you are running PostgreSQL 10 in a production environment, we suggest that
> you make plans to upgrade.

... to here.  They're redundant with "PostgreSQL 11 EOL Notice" below:

> For the full list of changes, please review the
> [release notes](https://www.postgresql.org/docs/release/).
> 
> PostgreSQL 11 EOL Notice
> 
> 
> **This is the final release of PostgreSQL 11**. PostgreSQL 11 is now 
> end-of-life
> and will no longer receive security and bug fixes. If you are
> running PostgreSQL 11 in a production environment, we suggest that you make
> plans to upgrade to a newer, supported version of PostgreSQL. Please see our
> [versioning policy](https://www.postgresql.org/support/versioning/) for more
> information.




autovectorize page checksum code included elsewhere

2023-11-06 Thread Nathan Bossart
(Unfortunately, I'm posting this too late for the November commitfest, but
I'm hoping this will be the first in a series of proposed improvements
involving SIMD instructions for v17.)

Presently, we ask compilers to autovectorize checksum.c and numeric.c.  The
page checksum code actually lives in checksum_impl.h, and checksum.c just
includes it.  But checksum_impl.h is also used in pg_upgrade/file.c and
pg_checksums.c, and since we don't ask compilers to autovectorize those
files, the page checksum code may remain un-vectorized.

The attached patch is a quick attempt at adding CFLAGS_UNROLL_LOOPS and
CFLAGS_VECTORIZE to the CFLAGS for the aforementioned objects.  The gains
are modest (i.e., some system CPU and/or a few percentage points on the
total time), but it seemed like a no-brainer.

Separately, I'm wondering whether we should consider using CFLAGS_VECTORIZE
on the whole tree.  Commit fdea253 seems to be responsible for introducing
this targeted autovectorization strategy, and AFAICT this was just done to
minimize the impact elsewhere while optimizing page checksums.  Are there
fundamental problems with adding CFLAGS_VECTORIZE everywhere?  Or is it
just waiting on someone to do the analysis/benchmarking?

[0] https://postgr.es/m/1367013190.11576.249.camel%40sussancws0025

-- 
Nathan Bossart
Amazon Web Services: https://aws.amazon.com
diff --git a/src/bin/pg_checksums/Makefile b/src/bin/pg_checksums/Makefile
index ac736b2260..3f946ee9d6 100644
--- a/src/bin/pg_checksums/Makefile
+++ b/src/bin/pg_checksums/Makefile
@@ -22,6 +22,8 @@ OBJS = \
 	$(WIN32RES) \
 	pg_checksums.o
 
+pg_checksums.o: CFLAGS += ${CFLAGS_UNROLL_LOOPS} ${CFLAGS_VECTORIZE}
+
 all: pg_checksums
 
 pg_checksums: $(OBJS) | submake-libpgport
diff --git a/src/bin/pg_upgrade/Makefile b/src/bin/pg_upgrade/Makefile
index bde91e2beb..b344b59da2 100644
--- a/src/bin/pg_upgrade/Makefile
+++ b/src/bin/pg_upgrade/Makefile
@@ -28,6 +28,8 @@ OBJS = \
 	util.o \
 	version.o
 
+file.o: CFLAGS += ${CFLAGS_UNROLL_LOOPS} ${CFLAGS_VECTORIZE}
+
 override CPPFLAGS := -I$(srcdir) -I$(libpq_srcdir) $(CPPFLAGS)
 LDFLAGS_INTERNAL += -L$(top_builddir)/src/fe_utils -lpgfeutils $(libpq_pgport)
 


Re: A recent message added to pg_upgade

2023-11-06 Thread Michael Paquier
On Tue, Nov 07, 2023 at 07:59:46AM +0530, Amit Kapila wrote:
> Do you mean to say that if 'IsBinaryUpgrade' is true then let's not
> allow to launch launcher or apply worker? If so, I guess this won't be
> any better than prohibiting at an early stage or explicitly overriding
> those with internal values and documenting it, at least that way we
> can be consistent for both variables (max_logical_replication_workers
> and max_slot_wal_keep_size).

Yes, I mean to add an extra IsBinaryUpgrade check before registering the
apply worker launcher.  That would be consistent with what we do for
autovacuum in the postmaster.
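In other words, something like the following sketch (with stand-in definitions for illustration; the real ApplyLauncherRegister() and IsBinaryUpgrade live in the backend):

```c
#include <stdbool.h>

/* Stand-ins for backend state, for illustration only. */
static bool IsBinaryUpgrade = false;
static bool launcher_registered = false;

static void
RegisterBackgroundWorker(void)
{
    launcher_registered = true;
}

/* Proposed guard: skip registering the logical replication launcher
 * entirely when running under pg_upgrade, mirroring what the
 * postmaster already does for autovacuum. */
static void
ApplyLauncherRegister(void)
{
    if (IsBinaryUpgrade)
        return;
    RegisterBackgroundWorker();
}
```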
--
Michael


signature.asc
Description: PGP signature


Re: A recent message added to pg_upgade

2023-11-06 Thread Amit Kapila
On Sun, Nov 5, 2023 at 5:33 AM Michael Paquier  wrote:
>
> On Fri, Nov 03, 2023 at 01:33:26PM +1100, Peter Smith wrote:
> > On Fri, Nov 3, 2023 at 1:11 PM Amit Kapila  wrote:
> >> Now, that Michael also committed another similar change in commit
> >> 7021d3b176, it is better to be consistent in both cases. So, either we
> >
> > I agree. Both patches are setting a special GUC value at the command
> > line, and both of them don't want the user to somehow override that.
> > Since the requirements are the same, I felt the implementations
> > (regardless if they use a guc hook or something else) should also be
> > done the same way. Yesterday I posted a review comment on the other
> > thread [1] (#2c) trying to express the same point about consistency.
>
> Yeah, I certainly agree about consistency in the implementation for
> both sides of the coin.
>
> Nevertheless, I'm still +-0 on the GUC hook addition, as I am wondering
> if there could be a case where one would be interested in enforcing
> the state of the GUCs anyway, and we'd entirely prevent that.  Another
> thing that we can do for max_logical_replication_workers, rather than
> a GUC hook, is to add a check on IsBinaryUpgrade in
> ApplyLauncherRegister().
>

Do you mean to say that if 'IsBinaryUpgrade' is true then we should not
launch the launcher or apply worker? If so, I guess this won't be any
better than prohibiting at an early stage or explicitly overriding
those with internal values and documenting it; at least that way we
can be consistent for both variables (max_logical_replication_workers
and max_slot_wal_keep_size).

-- 
With Regards,
Amit Kapila.




Re: Popcount optimization using AVX512

2023-11-06 Thread Nathan Bossart
On Fri, Nov 03, 2023 at 12:16:05PM +0100, Matthias van de Meent wrote:
> On Thu, 2 Nov 2023 at 15:22, Amonson, Paul D  wrote:
>> This proposal showcases the speed-up provided to popcount feature when
>> using AVX512 registers. The intent is to share the preliminary results
>> with the community and get feedback for adding avx512 support for
>> popcount.
>>
>> Revisiting the previous discussion/improvements around this feature, I
>> have created a micro-benchmark based on the pg_popcount() in
>> PostgreSQL's current implementations for x86_64 using the newer AVX512
>> intrinsics. Playing with this implementation has improved performance up
>> to 46% on Intel's Sapphire Rapids platform on AWS. Such gains will
>> benefit scenarios relying on popcount.

Nice.  I've been testing out AVX2 support in src/include/port/simd.h, and
the results look promising there, too.  I intend to start a new thread for
that (hopefully soon), but one open question I don't have a great answer
for yet is how to detect support for newer intrinsics.  So far, we've been
able to use function pointers (e.g., popcount, crc32c) or deduce support
via common predefined compiler macros (e.g., we assume SSE2 is supported if
the compiler is targeting 64-bit x86).  But the former introduces a
performance penalty, and we probably want to inline most of this stuff,
anyway.  And the latter limits us to stuff that has been around for a
decade or two.
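For reference, the function-pointer approach mentioned above can be sketched like this (a standalone toy, not the actual pg_bitutils.c code): the pointer initially targets a chooser that probes the CPU once and swaps in the best implementation, so every later call pays only the indirect call:

```c
#include <stdint.h>

static int popcount64_portable(uint64_t w);
static int popcount64_choose(uint64_t w);

/* First call goes through the chooser; afterwards the pointer targets
 * the selected implementation directly. */
static int (*popcount64) (uint64_t) = popcount64_choose;

static int
popcount64_portable(uint64_t w)
{
    int n = 0;

    while (w)
    {
        n += w & 1;
        w >>= 1;
    }
    return n;
}

static int
popcount64_choose(uint64_t w)
{
    /* A real chooser would run CPUID here and pick an intrinsic-based
     * version when available; this sketch always picks the portable one. */
    popcount64 = popcount64_portable;
    return popcount64(w);
}
```

The indirect call on every invocation is exactly the overhead the rest of this thread is trying to avoid for small, hot operations.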

Like I said, I don't have any proposals yet, but assuming we do want to
support newer intrinsics, either open-coded or via auto-vectorization, I
suspect we'll need to gather consensus for a new policy/strategy.

> Apart from the two type functions bytea_bit_count and bit_bit_count
> (which are not accessed in postgres' own systems, but which could want
> to cover bytestreams of >BLCKSZ) the only popcount usages I could find
> were on objects that fit on a page, i.e. <8KiB in size. How does
> performance compare for bitstreams of such sizes, especially after any
> CPU clock implications are taken into account?

Yeah, the previous optimizations in this area appear to have used ANALYZE
as the benchmark, presumably because of visibilitymap_count().  I briefly
attempted to measure the difference with and without AVX512 support, but I
haven't noticed any difference thus far.  One complication for
visibilitymap_count() is that the data passed to pg_popcount64() is masked,
which requires a couple more instructions when you're using the intrinsics.

-- 
Nathan Bossart
Amazon Web Services: https://aws.amazon.com




Re: Making aggregate deserialization (and WAL receive) functions slightly faster

2023-11-06 Thread Amit Kapila
On Tue, Nov 7, 2023 at 3:56 AM David Rowley  wrote:
>
> On Thu, 2 Nov 2023 at 22:42, Amit Kapila  wrote:
> > The other two look good to me.
>
> Thanks for looking.
>
> I spent some time trying to see if the performance changes much with
> either of these cases. For the XLogWalRcvProcessMsg() I was unable to
> measure any difference even when replaying inserts into a table with a
> single int4 column and no indexes.  I think that change is worthwhile
> regardless as it allows us to get rid of a global variable. I was
> tempted to shorten the name of that variable a bit since it's now
> local, but didn't as it causes a bit more churn.
>
> For the apply_spooled_messages() change, I tried logical decoding but
> quickly saw apply_spooled_messages() isn't the normal case.  I didn't
> quite find a test case that caused the changes to be serialized to a
> file, but I do see that the number of bytes can be large so thought
> that it's worthwhile saving the memcpy for that case.
>

Yeah, and another reason is that the usage of StringInfo becomes
consistent with LogicalRepApplyLoop(). One can always configure a
lower value of logical_decoding_work_mem or use
debug_logical_replication_streaming for a smaller number of changes to
follow that code path. But I am not sure how much it will help in
practice, because we are reading from a file to apply the changes anyway.
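For context, the copy-avoiding trick under discussion can be sketched as follows (with a simplified stand-in for the StringInfoData struct; in PostgreSQL the helper of this shape is initReadOnlyStringInfo()): rather than appending the spooled buffer into a fresh StringInfo, the StringInfo is pointed directly at the existing buffer:

```c
#include <string.h>

/* Simplified stand-in for PostgreSQL's StringInfoData. */
typedef struct
{
    char       *data;
    int         len;
    int         maxlen;     /* 0 marks a read-only, non-enlargeable buffer */
    int         cursor;
} StringInfoData;

/* Wrap an existing buffer without copying it; the caller keeps
 * ownership and must keep the buffer alive while it is in use. */
static void
initReadOnlyStringInfo(StringInfoData *str, char *data, int len)
{
    str->data = data;
    str->len = len;
    str->maxlen = 0;
    str->cursor = 0;
}
```

This is what makes the large-message case cheaper: the memcpy is skipped entirely, at the cost of the buffer being read-only.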

-- 
With Regards,
Amit Kapila.




Re: Atomic ops for unlogged LSN

2023-11-06 Thread John Morris
I incorporated your suggestions and added a few more. The changes are mainly 
related to catching potential errors if some basic assumptions aren’t met.

There are basically three assumptions. Stated as conditions we want to avoid:

  *   We should not get an unlogged LSN before reading the control file.
  *   We should not get an unlogged LSN when shutting down.
  *   The unlogged LSN written out during a checkpoint shouldn’t be used.

Your suggestion addressed the first problem, and it took only minor changes to 
address the other two.

The essential idea is that we set a value of zero in each of the three
situations, and then throw an Assert() if somebody tries to allocate an
unlogged LSN while the value is zero.

I found the comment about cache coherency a bit confusing. We are dealing with 
a single address, so there should be no memory ordering or coherency issues. 
(Did I misunderstand?) I see it more as a race condition. Rather than merely 
explaining why it shouldn’t happen, the new version verifies the assumptions 
and throws an Assert() if something goes wrong.



atomic_lsn_v5.patch
Description: atomic_lsn_v5.patch


Re: 2023-11-09 release announcement draft

2023-11-06 Thread Jesper Pedersen

Hi,

On 11/6/23 17:04, Jonathan S. Katz wrote:
> Attached is the release announcement draft for the 2023-11-09 release
> (16.1 et al.).
>
> Please review for accuracy and notable omissions. Please have all
> feedback in by 2023-11-09 08:00 UTC at the latest (albeit the sooner
> the better).



s/PostgreSQL 10/PostgreSQL 11/g


Best regards,

 Jesper






Re: Optimizing nbtree ScalarArrayOp execution, allowing multi-column ordered scans, skip scan

2023-11-06 Thread Peter Geoghegan
On Mon, Nov 6, 2023 at 1:28 PM Matthias van de Meent
 wrote:
> I'm planning on reviewing this patch tomorrow, but in an initial scan
> through the patch I noticed there's little information about how the
> array keys state machine works in this new design. Do you have a more
> toplevel description of the full state machine used in the new design?

This is an excellent question. You're entirely right: there isn't
enough information about the design of the state machine.

In v1 of the patch, from all the way back in July, the "state machine"
advanced in the hackiest way possible: via repeated "incremental"
advancement (using logic from the function that we call
_bt_advance_array_keys() on HEAD) in a loop -- we just kept doing that
until the function I'm now calling _bt_tuple_before_array_skeys()
eventually reported that the array keys were now sufficiently
advanced. v2 greatly improved matters by totally overhauling
_bt_advance_array_keys(): it was taught to use binary searches to
advance the array keys, with limited remaining use of "incremental"
array key advancement.

However, version 2 (and all later versions to date) have somewhat
wonky state machine transitions, in one important respect: calls to
the new _bt_advance_array_keys() won't always advance the array keys
to the maximum extent possible (possible while still getting correct
behavior, that is). There were still various complicated scenarios
involving multiple "required" array keys (SK_BT_REQFWD + SK_BT_REQBKWD
scan keys that use BTEqualStrategyNumber), where one single call to
_bt_advance_array_keys() would advance the array keys to a point that
was still < caller's tuple. AFAICT this didn't cause wrong answers to
queries (that would require failing to find a set of exactly matching
array keys where a matching set exists), but it was kludgey. It was
sloppy in roughly the same way as the approach in my v1 prototype was
sloppy (just to a lesser degree).

I should be able to post v6 later this week. My current plan is to
commit the other nbtree patch first (the backwards scan "boundary
cases" one from the ongoing CF) -- since I saw your review earlier
today. I think that you should probably wait for this v6 before
starting your review. The upcoming version will have simple
preconditions and postconditions for the function that advances the
array key state machine (the new _bt_advance_array_keys). These are
enforced by assertions at the start and end of the function. So the
rules for the state machine become crystal clear and fairly easy to
keep in your head (e.g., tuple must be >= required array keys on entry
and <= required array keys on exit, the array keys must always either
advance by one increment or be completely exhausted for the top-level
scan in the current scan direction).
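As a rough illustration of those invariants (not the actual nbtree code; the names, integer keys, and forward-scan-only view are all simplifications), a binary-search advancement step with the entry and exit assertions might look like:

```c
#include <assert.h>

/*
 * Illustrative sketch only: advance a sorted array of equality keys to the
 * first element >= the caller's tuple value, asserting the pre/postconditions
 * described above.  Returns the new position, or nkeys when exhausted.
 */
static int
advance_array_keys(const int *keys, int nkeys, int cur, int tuple)
{
	int			lo,
				hi;

	/* Precondition: tuple must be >= the current (required) array key. */
	assert(cur < nkeys && tuple >= keys[cur]);

	/* Binary-search for the first key >= tuple, rather than stepping one by one. */
	lo = cur;
	hi = nkeys;
	while (lo < hi)
	{
		int			mid = lo + (hi - lo) / 2;

		if (keys[mid] < tuple)
			lo = mid + 1;
		else
			hi = mid;
	}

	/* Postcondition: keys exhausted, or tuple <= the new required key. */
	assert(lo == nkeys || tuple <= keys[lo]);
	return lo;
}
```

With assertions like these at entry and exit, a single call always advances the keys to the maximum possible extent, which is exactly the wonkiness the v6 rework is meant to remove.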

Unsurprisingly, I found that adding and enforcing these invariants led
to a simpler and more general design within _bt_advance_array_keys.
That code is still the most complicated part of the patch, but it's
much less of a bag of tricks. Another reason for you to hold off for a
few more days.

-- 
Peter Geoghegan




Re: Fix search_path for all maintenance commands

2023-11-06 Thread Isaac Morland
On Mon, 6 Nov 2023 at 15:54, Tom Lane  wrote:

> Isaac Morland  writes:
> > I still think the right default is that CREATE FUNCTION stores the
> > search_path in effect when it runs with the function, and that is the
> > search_path used to run the function (and don't "BEGIN ATOMIC" functions
> > partially work this way already?).
>
> I don't see how that would possibly fly.  Yeah, that behavior is
> often what you want, but not always; we would break some peoples'
> applications with that rule.
>

The behaviour I want is just “SET search_path FROM CURRENT”.

I agree there is a backward compatibility issue; if somebody has a schema
creation/update script with function definitions with no "SET search_path"
they would suddenly start getting the search_path from definition time
rather than the caller's search_path.

I don't like adding GUCs but a single one specifying whether no search_path
specification means "FROM CURRENT" or the current behaviour (new explicit
syntax "FROM CALLER"?) would I think address the backward compatibility
issue. This would allow a script to specify at the top which convention it
is using; a typical old script could be adapted to a new database by adding
a single line at the top to get the old behaviour.

Also, one place where it's clearly NOT what you want is while
> restoring a pg_dump script.  And we don't have any way that we could
> bootstrap ourselves out of breaking everything for everybody during
> their next upgrade --- even if you insist that people use a newer
> pg_dump, where is it going to find the info in an existing database?
>

A function with a stored search_path will have a "SET search_path" clause
in the pg_dump output, so for these functions pg_dump would be unaffected
by my preferred way of doing things. Already I don't believe pg_dump ever
puts "SET search_path FROM CURRENT" in its output; it puts the actual
search_path. A bigger problem is with existing functions that use the
caller's search_path; these would need to specify "FROM CALLER" explicitly;
but the new GUC could come into this. In effect a pg_dump created by an old
version is an old script which would need the appropriate setting at the
top.

But all this is premature if there is still disagreement on the proper
default behaviour. To me it is absolutely clear that the right default, in
the absence of an installed base with backward compatibility concerns, is
"SET search_path FROM CURRENT". This is how substantially all programming
languages work: it is quite unusual in modern programming languages to have
the meaning of a procedure definition depend on which modules the caller
has imported. The tricky bit is dealing smoothly with the installed base.
But some of the discussion here makes me think that people have a different
attitude about stored procedures.


Re: Making aggregate deserialization (and WAL receive) functions slightly faster

2023-11-06 Thread David Rowley
On Thu, 2 Nov 2023 at 22:42, Amit Kapila  wrote:
> The other two look good to me.

Thanks for looking.

I spent some time trying to see if the performance changes much with
either of these cases. For the XLogWalRcvProcessMsg() I was unable to
measure any difference even when replaying inserts into a table with a
single int4 column and no indexes.  I think that change is worthwhile
regardless as it allows us to get rid of a global variable. I was
tempted to shorten the name of that variable a bit since it's now
local, but didn't as it causes a bit more churn.

For the apply_spooled_messages() change, I tried logical decoding but
quickly saw apply_spooled_messages() isn't the normal case.  I didn't
quite find a test case that caused the changes to be serialized to a
file, but I do see that the number of bytes can be large so thought
that it's worthwhile saving the memcpy for that case.

I pushed those two changes.

David




2023-11-09 release announcement draft

2023-11-06 Thread Jonathan S. Katz

Hi,

Attached is the release announcement draft for the 2023-11-09 release 
(16.1 et al.).


Please review for accuracy and notable omissions. Please have all 
feedback in by 2023-11-09 08:00 UTC at the latest (albeit the sooner the 
better).


Thanks,

Jonathan
The PostgreSQL Global Development Group has released an update to all supported
versions of PostgreSQL, including 16.1, 15.5, 14.10, 13.13, 12.17, and 11.22.
This release fixes over 55 bugs reported over the last several months.

This release includes fixes for indexes where in certain cases, we advise
reindexing. Please see the "Update" section for more details.

This is the **final release of PostgreSQL 11**. PostgreSQL 11 will no longer
receive
[security and bug fixes](https://www.postgresql.org/support/versioning/).
If you are running PostgreSQL 11 in a production environment, we suggest that
you make plans to upgrade.

For the full list of changes, please review the
[release notes](https://www.postgresql.org/docs/release/).

PostgreSQL 11 EOL Notice
------------------------

**This is the final release of PostgreSQL 11**. PostgreSQL 11 is now end-of-life
and will no longer receive security and bug fixes. If you are
running PostgreSQL 11 in a production environment, we suggest that you make
plans to upgrade to a newer, supported version of PostgreSQL. Please see our
[versioning policy](https://www.postgresql.org/support/versioning/) for more
information.

Bug Fixes and Improvements
--------------------------

This update fixes over 55 bugs that were reported in the last several months.
The issues listed below affect PostgreSQL 16. Some of these issues may also
affect other supported versions of PostgreSQL.

* Fix issue where GiST indexes had an incorrect behavior during a "page split"
operation that could lead to incorrect results in subsequent index searches.
Please [reindex](https://www.postgresql.org/docs/current/sql-reindex.html) GiST
indexes after installing this update.
* Fix issue where B-tree indexes would incorrectly de-duplicate `interval`
columns. Please 
[reindex](https://www.postgresql.org/docs/current/sql-reindex.html)
any B-tree index that includes an `interval` column after installing this
update.
* Provide more efficient indexing of `date`, `timestamptz`, and `timestamp`
values in BRIN indexes. While not required, we recommend
[reindexing](https://www.postgresql.org/docs/current/sql-reindex.html) BRIN
indexes that include these data types after installing this update.
* Fix for bulk table insertion into partitioned tables.
* Fix handling of hash-partitioned tables with multiple partition keys during
step generation and runtime pruning, which could lead to crashes in some cases.
* Throw the correct error if 
[`pgrowlocks()`](https://www.postgresql.org/docs/current/pgrowlocks.html) is 
applied to a partitioned table.
* Fix inconsistent rechecking of concurrently-updated rows during
[`MERGE`](https://www.postgresql.org/docs/current/sql-merge.html) when using
[`READ 
COMMITTED`](https://www.postgresql.org/docs/current/transaction-iso.html#XACT-READ-COMMITTED)
mode.
* Correctly identify the target table in an inherited `UPDATE`/`DELETE`/`MERGE`
even when the parent table is excluded by constraints.
* Fix over-allocation of a constructed 
[`tsvector`](https://www.postgresql.org/docs/current/datatype-textsearch.html#DATATYPE-TSVECTOR).
* Fix [`ALTER 
SUBSCRIPTION`](https://www.postgresql.org/docs/current/sql-altersubscription.html)
to apply changes in the `run_as_owner` option.
* Several fixes for [`COPY 
FROM`](https://www.postgresql.org/docs/current/sql-copy.html).
* Several fixes for handling torn reads with 
[`pg_control`](https://www.postgresql.org/docs/current/wal-internals.html).
* Fix "could not find pathkey item to sort" errors occurring while planning
aggregate functions with `ORDER BY` or `DISTINCT` options.
* When 
[`track_io_timing`](https://www.postgresql.org/docs/current/runtime-config-statistics.html#GUC-TRACK-IO-TIMING)
is enabled, include the time taken by relation extension operations as write
time.
* Track the dependencies of cached 
[`CALL`](https://www.postgresql.org/docs/current/sql-call.html)
statements, and re-plan them when needed.
* Treat out-of-memory failures as `FATAL` while reading WAL.
* Fix [`pg_dump`](https://www.postgresql.org/docs/current/app-pgdump.html) to
dump the new `run_as_owner` option of subscriptions.
* Fix [`pg_restore`](https://www.postgresql.org/docs/current/app-pgrestore.html)
so that selective restores will include both table-level and column-level ACLs
for selected tables.
* Add logic to 
[`pg_upgrade`](https://www.postgresql.org/docs/current/pgupgrade.html)
to check for use of obsolete data types `abstime`, `reltime`, and `tinterval`.
* Fix [`vacuumdb`](https://www.postgresql.org/docs/current/app-vacuumdb.html)
to have multiple `-N` switches actually exclude tables in multiple schemas.
* [`amcheck`](https://www.postgresql.org/docs/current/amcheck.html)
will no longer report interrupted page deletion as corruption.

Re: Explicitly skip TAP tests under Meson if disabled

2023-11-06 Thread Andres Freund
Hi,

On 2023-11-06 17:46:23 +0100, Peter Eisentraut wrote:
> On 04.11.23 01:51, Andres Freund wrote:
> > I'd just use a single test() invocation here, and add an argument to 
> > testwrap
> > indicating that it should print out the skipped message. That way we a) 
> > don't
> > need two test() invocations, b) could still see the test name etc in the 
> > test
> > invocation.
> 
> Is testwrap only meant to be used with the tap protocol mode of meson's
> test()?  Otherwise, this skip option would have to produce different output for
> different protocols.

Since Daniel added tap support to pg_regress it's only used with tap. If we
add something else, we can add a format parameter?

Greetings,

Andres Freund




Re: Add recovery to pg_control and remove backup_label

2023-11-06 Thread David Steele

On 11/6/23 02:35, Michael Paquier wrote:

On Sun, Nov 05, 2023 at 01:45:39PM -0400, David Steele wrote:

Rebased on 151ffcf6.


I like this patch a lot.  Even if the backup_label file is removed, we
still have all the debug information from the backup history file,
thanks to its LABEL, BACKUP METHOD and BACKUP FROM, so no information
is lost.  It does a 1:1 replacement of the contents parsed from the
backup_label needed by recovery by fetching them from the control
file.  Sounds like a straight-forward change to me.


That's the plan, at least!


The patch is failing the recovery test 039_end_of_wal.pl.  Could you
look at the failure?


I'm not seeing this failure, and CI seems happy [1]. Can you give 
details of the error message?



  /* Build and save the contents of the backup history file */
-history_file = build_backup_content(state, true);
+history_file = build_backup_content(state);

build_backup_content() sounds like an incorrect name if it is a
routine only used to build the contents of backup history files.


Good point, I have renamed this to build_backup_history_content().


Why is there nothing updated in src/bin/pg_controldata/?


Oops, added.


+/* Clear fields used to initialize recovery */
+ControlFile->backupCheckPoint = InvalidXLogRecPtr;
+ControlFile->backupStartPointTLI = 0;
+ControlFile->backupRecoveryRequired = false;
+ControlFile->backupFromStandby = false;

These variables in the control file are cleaned up when the
backup_label file was read previously, but backup_label is renamed to
backup_label.old a bit later than that.  Your logic looks correct seen
from here, but shouldn't these variables be set much later, aka just
*after* UpdateControlFile().  This gap between the initialization of
the control file and the in-memory reset makes the code quite brittle,
IMO.


If we set these fields where backup_label was renamed, the logic would 
not be exactly the same since pg_control won't be updated until the next 
time through the loop. Since the fields should be updated before 
UpdateControlFile() I thought it made sense to keep all the updates 
together.


Overall I think it is simpler, and we don't need to acquire a lock on 
ControlFile.



-   basebackup_progress_wait_wal_archive(&state);
-   do_pg_backup_stop(backup_state, !opt->nowait);

Why is that moved?


do_pg_backup_stop() generates the updated pg_control so it needs to run 
before we transmit pg_control.



-The backup label
-file includes the label string you gave to 
pg_backup_start,
-as well as the time at which pg_backup_start was run, 
and
-the name of the starting WAL file.  In case of confusion it is therefore
-possible to look inside a backup file and determine exactly which
-backup session the dump file came from.  The tablespace map file includes
+The tablespace map file includes

It may be worth mentioning that the backup history file holds this
information on the primary's pg_wal, as well.


OK, reworded.


The changes in sendFileWithContent() may be worth a patch of its own.


Thomas included this change in his pg_basebackup changes so I did the 
same. Maybe wait a bit before we split this out? Seems like a pretty 
small change...



--- a/src/include/catalog/pg_control.h
+++ b/src/include/catalog/pg_control.h
@@ -146,6 +146,9 @@ typedef struct ControlFileData
@@ -160,14 +163,25 @@ typedef struct ControlFileData
  XLogRecPtrminRecoveryPoint;
  TimeLineIDminRecoveryPointTLI;
+XLogRecPtrbackupCheckPoint;
  XLogRecPtrbackupStartPoint;
+TimeLineIDbackupStartPointTLI;
  XLogRecPtrbackupEndPoint;
+bool backupRecoveryRequired;
+bool backupFromStandby;

This increases the size of the control file from 296B to 312B with an
8-byte alignment, as far as I can see.  The size of the control file
has been always a sensitive subject especially with the hard limit of
PG_CONTROL_MAX_SAFE_SIZE.  Well, the point of this patch is that this
is the price to pay to prevent users from doing something stupid with
a removal of a backup_label when they should not.  Do others have an
opinion about this increase in size?

Actually, grouping backupStartPointTLI and minRecoveryPointTLI should
further reduce the size with some alignment magic, no?


I thought about this, but it seemed to me that existing fields had been 
positioned to make the grouping logical rather than to optimize 
alignment, e.g. minRecoveryPointTLI. Ideally that would have been placed 
near backupEndRequired (or vice versa). But if the general opinion is to 
rearrange for alignment, I'm OK with that.



backupRecoveryRequired in the control file is switched to false for
pg_rewind and true for streamed backups.  My gut feeling is telling me
that this should be OK, as out-of-core tools would need an upgrade if
they relied on the backup_label file anyway.  I can see that this
change makes use

Re: Optimizing nbtree ScalarArrayOp execution, allowing multi-column ordered scans, skip scan

2023-11-06 Thread Matthias van de Meent
On Sat, 21 Oct 2023 at 00:40, Peter Geoghegan  wrote:
>
> On Sun, Oct 15, 2023 at 1:50 PM Peter Geoghegan  wrote:
> > Attached is v4, which applies cleanly on top of HEAD. This was needed
> > due to Alexandar Korotkov's commit e0b1ee17, "Skip checking of scan
> > keys required for directional scan in B-tree".
> >
> > Unfortunately I have more or less dealt with the conflicts on HEAD by
> > disabling the optimization from that commit, for the time being.
>
> Attached is v5, which deals with the conflict with the optimization
> added by Alexandar Korotkov's commit e0b1ee17 sensibly: the
> optimization is now only disabled in cases without array scan keys.
> (It'd be very hard to make it work with array scan keys, since an
> important principle for my patch is that we can change search-type
> scan keys right in the middle of any _bt_readpage() call).

I'm planning on reviewing this patch tomorrow, but in an initial scan
through the patch I noticed there's little information about how the
array keys state machine works in this new design. Do you have a more
toplevel description of the full state machine used in the new design?
If not, I'll probably be able to discover my own understanding of the
mechanism used in the patch, but if there is a framework to build that
understanding on (rather than having to build it from scratch) that'd
be greatly appreciated.

Kind regards,

Matthias van de Meent
Neon (https://neon.tech)




Re: XID formatting and SLRU refactorings (was: Add 64-bit XIDs into PostgreSQL 15)

2023-11-06 Thread Alexander Korotkov
On Mon, Nov 6, 2023 at 4:38 PM Aleksander Alekseev 
wrote:
> > > PFE the corrected patchset v58.
> >
> > I'd like to revive this thread.
>
> Many thanks for your comments and suggestions.
>
> > I think it's worth adding asserts here to verify there is no overflow
> > making us map different segments into the same files.
>
> Sorry, I didn't understand this one. Maybe you could provide the exact
> code?

I actually meant this.

static inline int
SlruFileName(SlruCtl ctl, char *path, int64 segno)
{
    if (ctl->long_segment_names)
    {
        /*
         * We could use 16 characters here but the disadvantage would be that
         * the SLRU segments will be hard to distinguish from WAL segments.
         *
         * For this reason we use 15 characters. It is enough but also means
         * that in the future we can't decrease SLRU_PAGES_PER_SEGMENT easily.
         */
        Assert(segno >= 0 && segno < INT64CONST(0x1000000000000000));
        return snprintf(path, MAXPGPATH, "%s/%015llX", ctl->Dir,
                        (long long) segno);
    }
    else
    {
        Assert(segno >= 0 && segno < 0x10000);
        return snprintf(path, MAXPGPATH, "%s/%04X", (ctl)->Dir,
                        (unsigned int) segno);
    }
}

As I now understand, snprintf() wouldn't just truncate the high digits; instead
it will use more characters.  But I think assertions are useful anyway.
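That snprintf() behavior is easy to confirm standalone: a width such as %04X is a minimum field width, so larger values get extra characters rather than being truncated, which is why the asserts are needed to keep file names well-formed. A hypothetical helper mirroring the short-name branch:

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/*
 * Hypothetical helper (illustration only): "%04X" zero-pads small values to
 * four characters but never truncates larger ones -- it just emits more.
 */
static int
format_short_segno(char *path, size_t len, unsigned int segno)
{
	return snprintf(path, len, "%04X", segno);
}
```

Running it shows 0xAB rendered as "00AB" (padded to four characters) while 0x12345 becomes the five-character "12345", never a truncated "2345".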

--
Regards,
Alexander Korotkov


Re: Fix search_path for all maintenance commands

2023-11-06 Thread Tom Lane
Isaac Morland  writes:
> I still think the right default is that CREATE FUNCTION stores the
> search_path in effect when it runs with the function, and that is the
> search_path used to run the function (and don't "BEGIN ATOMIC" functions
> partially work this way already?).

I don't see how that would possibly fly.  Yeah, that behavior is
often what you want, but not always; we would break some peoples'
applications with that rule.

Also, one place where it's clearly NOT what you want is while
restoring a pg_dump script.  And we don't have any way that we could
bootstrap ourselves out of breaking everything for everybody during
their next upgrade --- even if you insist that people use a newer
pg_dump, where is it going to find the info in an existing database?

regards, tom lane




Re: Version 14/15 documentation Section "Alter Default Privileges"

2023-11-06 Thread Laurenz Albe
On Mon, 2023-11-06 at 10:55 -0500, Bruce Momjian wrote:
> Okay, I think I have good wording for this.  I didn't like the wording
> of other roles, so I restructured that in the attached patch too.

> 
> !Default privileges apply only to the active role;  the default
> !privileges of member roles have no effect on object permissions.
> !SET ROLE can be used to change the active user and
> !apply their default privileges.
> !   

You don't mean member roles, but roles that the active role is a member of,
right?

How do you like my version, as attached?

Yours,
Laurenz Albe
From eb251f74ee10eff5cbd30ca9ee038a01b6f3 Mon Sep 17 00:00:00 2001
From: Laurenz Albe 
Date: Mon, 6 Nov 2023 21:44:23 +0100
Subject: [PATCH] Improve ALTER DEFAULT PRIVILEGES documentation

Rewrite the documentation to emphasize two aspects that were
previously missing, which frequently confused users:

- you cannot inherit altered default privileges

- you cannot alter the default privileges for any creating
  role by omitting FOR ROLE

Author: Bruce Momjian, Laurenz Albe
Reviewed-by: Michael Banck
Discussion: https://postgr.es/m/LV2PR12MB5725F7C1B8EB2FC38829F276E7399%40LV2PR12MB5725.namprd12.prod.outlook.com
---
 .../sgml/ref/alter_default_privileges.sgml| 52 ---
 1 file changed, 33 insertions(+), 19 deletions(-)

diff --git a/doc/src/sgml/ref/alter_default_privileges.sgml b/doc/src/sgml/ref/alter_default_privileges.sgml
index f1d54f5aa3..7fab707a0d 100644
--- a/doc/src/sgml/ref/alter_default_privileges.sgml
+++ b/doc/src/sgml/ref/alter_default_privileges.sgml
@@ -90,34 +90,47 @@ REVOKE [ GRANT OPTION FOR ]
   
ALTER DEFAULT PRIVILEGES allows you to set the privileges
that will be applied to objects created in the future.  (It does not
-   affect privileges assigned to already-existing objects.)  Currently,
-   only the privileges for schemas, tables (including views and foreign
-   tables), sequences, functions, and types (including domains) can be
-   altered.  For this command, functions include aggregates and procedures.
-   The words FUNCTIONS and ROUTINES are
-   equivalent in this command.  (ROUTINES is preferred
-   going forward as the standard term for functions and procedures taken
-   together.  In earlier PostgreSQL releases, only the
-   word FUNCTIONS was allowed.  It is not possible to set
-   default privileges for functions and procedures separately.)
-  
-
-  
-   You can change default privileges only for objects that will be created by
-   yourself or by roles that you are a member of.  The privileges can be set
-   globally (i.e., for all objects created in the current database),
-   or just for objects created in specified schemas.
+   affect privileges assigned to already-existing objects.)  Privileges can be
+   set globally (i.e., for all objects created in the current database), or
+   just for objects created in specified schemas.
   
 
   
As explained in ,
the default privileges for any object type normally grant all grantable
permissions to the object owner, and may grant some privileges to
PUBLIC as well.  However, this behavior can be changed by
altering the global default privileges with
ALTER DEFAULT PRIVILEGES.
   
 
+  
+   As a non-superuser, you can change default privileges only on objects created
+   by yourself or by roles that you are a member of.  However, you don't inherit
+   altered default privileges from roles you are a member of; objects you create
+   will receive the default privileges for your current role.
+  
+
+  
+   There is no way to change the default privileges for objects created by
+   arbitrary roles.  You have to run ALTER DEFAULT PRIVILEGES
+   for any role that can create objects whose default privileges should be
+   modified.
+  
+
+  
+   Currently,
+   only the privileges for schemas, tables (including views and foreign
+   tables), sequences, functions, and types (including domains) can be
+   altered.  For this command, functions include aggregates and procedures.
+   The words FUNCTIONS and ROUTINES are
+   equivalent in this command.  (ROUTINES is preferred
+   going forward as the standard term for functions and procedures taken
+   together.  In earlier PostgreSQL releases, only the
+   word FUNCTIONS was allowed.  It is not possible to set
+   default privileges for functions and procedures separately.)
+  
+
   
Default privileges that are specified per-schema are added to whatever
the global default privileges are for the particular object type.
@@ -136,8 +149,9 @@ REVOKE [ GRANT OPTION FOR ]
 target_role
 
  
-  The name of an existing role of which the current role is a member.
-  If FOR ROLE is omitted, the current role is assumed.
+  Default privileges are changed for objects created by the
+  target_role, or the current
+  role if unspecified.
  
 

-- 
2.41.0



Re: Atomic ops for unlogged LSN

2023-11-06 Thread Nathan Bossart
On Wed, Nov 01, 2023 at 10:40:06PM -0500, Nathan Bossart wrote:
> Since this isn't a tremendously performance-sensitive area, IMHO we should
> code defensively to eliminate any doubts about correctness and to make it
> easier to reason about.

Concretely, like this.

-- 
Nathan Bossart
Amazon Web Services: https://aws.amazon.com
diff --git a/src/backend/access/transam/xlog.c b/src/backend/access/transam/xlog.c
index b541be8eec..1068f96c2d 100644
--- a/src/backend/access/transam/xlog.c
+++ b/src/backend/access/transam/xlog.c
@@ -469,9 +469,8 @@ typedef struct XLogCtlData
 
 	XLogSegNo	lastRemovedSegNo;	/* latest removed/recycled XLOG segment */
 
-	/* Fake LSN counter, for unlogged relations. Protected by ulsn_lck. */
-	XLogRecPtr	unloggedLSN;
-	slock_t		ulsn_lck;
+	/* Fake LSN counter, for unlogged relations. */
+	pg_atomic_uint64 unloggedLSN;
 
 	/* Time and LSN of last xlog segment switch. Protected by WALWriteLock. */
 	pg_time_t	lastSegSwitchTime;
@@ -4294,14 +4293,7 @@ DataChecksumsEnabled(void)
 XLogRecPtr
 GetFakeLSNForUnloggedRel(void)
 {
-	XLogRecPtr	nextUnloggedLSN;
-
-	/* increment the unloggedLSN counter, need SpinLock */
-	SpinLockAcquire(&XLogCtl->ulsn_lck);
-	nextUnloggedLSN = XLogCtl->unloggedLSN++;
-	SpinLockRelease(&XLogCtl->ulsn_lck);
-
-	return nextUnloggedLSN;
+	return pg_atomic_fetch_add_u64(&XLogCtl->unloggedLSN, 1);
 }
 
 /*
@@ -4714,7 +4706,7 @@ XLOGShmemInit(void)
 
 	SpinLockInit(&XLogCtl->Insert.insertpos_lck);
 	SpinLockInit(&XLogCtl->info_lck);
-	SpinLockInit(&XLogCtl->ulsn_lck);
+	pg_atomic_init_u64(&XLogCtl->unloggedLSN, 0);
 }
 
 /*
@@ -5319,9 +5311,9 @@ StartupXLOG(void)
 	 * the unlogged LSN counter can be reset too.
 	 */
 	if (ControlFile->state == DB_SHUTDOWNED)
-		XLogCtl->unloggedLSN = ControlFile->unloggedLSN;
+		pg_atomic_write_u64(&XLogCtl->unloggedLSN, ControlFile->unloggedLSN);
 	else
-		XLogCtl->unloggedLSN = FirstNormalUnloggedLSN;
+		pg_atomic_write_u64(&XLogCtl->unloggedLSN, FirstNormalUnloggedLSN);
 
 	/*
 	 * Copy any missing timeline history files between 'now' and the recovery
@@ -6902,10 +6894,12 @@ CreateCheckPoint(int flags)
 	 * Persist unloggedLSN value. It's reset on crash recovery, so this goes
 	 * unused on non-shutdown checkpoints, but seems useful to store it always
 	 * for debugging purposes.
+	 *
+	 * pg_atomic_read_u64() isn't guaranteed to return the most up-to-date
+	 * value, so this is implemented via a compare/exchange with 0 to ensure
+	 * the necessary cache coherency interactions.
 	 */
-	SpinLockAcquire(&XLogCtl->ulsn_lck);
-	ControlFile->unloggedLSN = XLogCtl->unloggedLSN;
-	SpinLockRelease(&XLogCtl->ulsn_lck);
+	pg_atomic_compare_exchange_u64(&XLogCtl->unloggedLSN, &ControlFile->unloggedLSN, 0);
 
 	UpdateControlFile();
 	LWLockRelease(ControlFileLock);
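The compare/exchange-as-read idiom used in CreateCheckPoint() above can be illustrated with plain C11 atomics (a sketch only; PostgreSQL uses its own pg_atomic_* wrappers): on failure, compare_exchange writes the counter's current value into the expected variable, and the operation carries full-barrier semantics either way.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdint.h>

/* Shared counter; 0 plays the role of a value callers normally never see. */
static _Atomic uint64_t counter;

/*
 * Read the counter with full-barrier semantics via compare-and-exchange.
 * On failure, C11's compare_exchange stores the current value into
 * "expected"; on success (counter == 0) it rewrites 0 with 0, a no-op.
 * Either way, "expected" ends up holding the counter's current value.
 */
static uint64_t
read_via_cas(void)
{
	uint64_t	expected = 0;

	atomic_compare_exchange_strong(&counter, &expected, 0);
	return expected;
}
```

A plain relaxed load would not guarantee seeing the most recent increment from another process; the read-modify-write form does, at the cost of a heavier instruction in a path that only runs at checkpoint time.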


Re: Fix search_path for all maintenance commands

2023-11-06 Thread Isaac Morland
On Thu, 2 Nov 2023 at 14:22, Jeff Davis  wrote:

> On Tue, 2023-10-31 at 13:16 -0400, Isaac Morland wrote:
>
> > Perhaps the search_path for running a maintenance command should be
> > the search_path set for the table owner (ALTER ROLE … SET search_path
> > …)?
>
> That's an interesting idea; I hadn't considered that, or at least not
> very deeply. I feel like it came up before but I can't remember what
> (if anything) was wrong with it.
>
> If we expanded this idea a bit, I could imagine it applying to SECURITY
> DEFINER functions as well, and that would make writing SECURITY DEFINER
> functions a lot safer.
>

I still think the right default is that CREATE FUNCTION stores the
search_path in effect when it runs with the function, and that is the
search_path used to run the function (and don't "BEGIN ATOMIC" functions
partially work this way already?). But I suggest the owner search_path as
an option which is clearly better than using the caller's search_path in
most cases.

I think the problems are essentially the same for security invoker vs.
security definer. The difference is that the problems are security problems
only for security definers.

>  After that, change search_path on function invocation as usual
> > rather than having special rules for what happens when a function is
> > invoked during a maintenance command.
>
> I don't follow what you mean here.
>

I’m referring to the idea that the search_path during function execution
should be determined at function creation time (or, at least, not at
function execution time). While this is a security requirement for security
definer functions, I think it’s what is wanted about 99.9% of the time
for security invoker functions as well. So when the maintenance command
ends up running a function, the search_path in effect during the function
execution will be the one established at function definition time; or if we
go with this "search_path associated with function owner" idea, then again
the search_path is determined by the usual rule (function owner), rather
than by any special rules associated with maintenance commands.


Re: Wrong security context for deferred triggers?

2023-11-06 Thread Laurenz Albe
On Mon, 2023-11-06 at 18:29 +0100, Tomas Vondra wrote:
> On 11/6/23 14:23, Laurenz Albe wrote:
> > This behavior looks buggy to me.  What do you think?
> > I cannot imagine that it is a security problem, though.
> 
> How could code getting executed under the wrong role not be a security
> issue? Also, does this affect just the role, or are there some other
> settings that may unexpectedly change (e.g. search_path)?

Perhaps it is a security issue, and I am just lacking imagination.

Yes, changes to "search_path" should also have an effect.

Yours,
Laurenz Albe




Re: Adding facility for injection points (or probe points?) for more advanced tests

2023-11-06 Thread Nazir Bilal Yavuz
Hi,

On Wed, 25 Oct 2023 at 07:13, Michael Paquier  wrote:
>
> Hi all,
>
> I don't remember how many times in the last few years when I've had to
> hack the backend to produce a test case that involves a weird race
> condition across multiple processes running in the backend, to be able
> to prove a point or just test a fix (one recent case: 2b8e5273e949).
> Usually, I come to hardcoding stuff for the following situations:
> - Trigger a PANIC, to force recovery.
> - A FATAL, to take down a session, or just an ERROR.
> - palloc() failure injection.
> - Sleep to slow down a code path.
> - Pause and release with condition variable.

I liked the idea; thanks for working on this!

What do you think about creating a function for updating the already
created injection point's callback or name (mostly callback)? For now,
you need to drop and recreate the injection point to change the
callback or the name.
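
To picture the suggested "update" operation: keep the name-to-callback
mapping, and add an update that swaps the callback in place instead of
requiring a drop/recreate cycle. A toy Python sketch, purely illustrative;
the patch's registry is a C hash table in shared memory, and these method
names are invented:

```python
class InjectionPoints:
    def __init__(self):
        self._points = {}            # name -> callback

    def attach(self, name, callback):
        if name in self._points:
            raise ValueError(f"injection point {name!r} already exists")
        self._points[name] = callback

    def detach(self, name):
        # Silently ignores unknown names, mirroring the "do nothing?"
        # branch quoted in the review below.
        self._points.pop(name, None)

    def update(self, name, callback):
        """Replace the callback without a detach/attach cycle."""
        if name not in self._points:
            raise KeyError(f"injection point {name!r} not found")
        self._points[name] = callback

    def run(self, name):
        cb = self._points.get(name)
        if cb is not None:
            cb(name)


calls = []
reg = InjectionPoints()
reg.attach("checkpoint-start", lambda n: calls.append(("error", n)))
reg.run("checkpoint-start")
reg.update("checkpoint-start", lambda n: calls.append(("notice", n)))
reg.run("checkpoint-start")
```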

Here is my code correctness review:

diff --git a/meson_options.txt b/meson_options.txt
+option('injection_points', type: 'boolean', value: true,
+  description: 'Enable injection points')
+

It is enabled by default while building with meson.


diff --git a/src/backend/utils/misc/injection_point.c
b/src/backend/utils/misc/injection_point.c
+LWLockRelease(InjectionPointLock);
+
+/* If not found, do nothing? */
+if (!found)
+return;

It would be good to log a warning message here.


I tried to compile that with -Dwerror=true -Dinjection_points=false
and got some errors (warnings):

injection_point.c: In function ‘InjectionPointShmemSize’:
injection_point.c:59:1: error: control reaches end of non-void
function [-Werror=return-type]

injection_point.c: At top level:
injection_point.c:32:14: error: ‘InjectionPointHashByName’ defined but
not used [-Werror=unused-variable]

test_injection_points.c: In function ‘test_injection_points_run’:
test_injection_points.c:69:21: error: unused variable ‘name’
[-Werror=unused-variable]


The test_injection_points test runs and passes although I set
-Dinjection_points=false. That could be misleading; IMO the test
should be skipped if Postgres is not compiled with injection
points.

Regards,
Nazir Bilal Yavuz
Microsoft




Re: CRC32C Parallel Computation Optimization on ARM

2023-11-06 Thread Nathan Bossart
On Fri, Nov 03, 2023 at 10:46:57AM +, Xiang Gao wrote:
> On Date: Thu, 2 Nov 2023 09:35:50AM -0500, Nathan Bossart wrote:
>> The idea is that we don't want to start forcing runtime checks on builds
>> where we aren't already doing runtime checks.  IOW if the compiler can use
>> the ARMv8 CRC instructions with the default compiler flags, we should only
>> use vmull_p64() if it can also be used with the default compiler flags.
>
> This is the newest patch. I think the code for choosing which CRC
> algorithm to use is a bit complicated. Maybe we can just use
> USE_ARMV8_VMULL only, and do runtime checks on the vmull_p64 instruction
> at all times. This will not affect the existing builds, because this is
> a new instruction and new logic. In addition, it can also reduce the
> complexity of the code.

I don't think we can.  AFAICT a runtime check necessitates a function
pointer or a branch, both of which incurred an impact on performance in my
tests.  It looks like this latest patch still does the runtime check even
for the USE_ARMV8_CRC32C case.
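
The shape of that trade-off can be sketched as follows: the capability
probe runs once at startup and its result is baked into a function
pointer, so each call pays only the indirect call rather than a per-call
branch. A Python rendering with stand-in implementations (zlib's CRC32 is
a different polynomial than CRC-32C; it is only a placeholder here):

```python
import zlib


def crc32c_portable(crc: int, data: bytes) -> int:
    # Stand-in for the byte-at-a-time software loop.
    return zlib.crc32(data, crc)


def crc32c_armv8_vmull(crc: int, data: bytes) -> int:
    # Stand-in for the ARMv8 CRC32C + vmull_p64 path.
    return zlib.crc32(data, crc)


def have_vmull_p64() -> bool:
    # Stand-in for the one-time runtime capability probe (e.g. getauxval).
    return False


# Resolved once at startup: after this assignment, each CRC computation
# pays only the indirect call, with no capability branch on the hot path.
pg_comp_crc32c = crc32c_armv8_vmull if have_vmull_p64() else crc32c_portable
```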

-- 
Nathan Bossart
Amazon Web Services: https://aws.amazon.com




Re: Add bump memory context type and use it for tuplesorts

2023-11-06 Thread Matthias van de Meent
On Tue, 11 Jul 2023 at 01:51, David Rowley  wrote:
>
> On Tue, 27 Jun 2023 at 21:19, David Rowley  wrote:
> > I've attached the bump allocator patch and also the script I used to
> > gather the performance results in the first 2 tabs in the attached
> > spreadsheet.
>
> I've attached a v2 patch which changes the BumpContext a little to
> remove some of the fields that are not really required.  There was no
> need for the "keeper" field as the keeper block always comes at the
> end of the BumpContext as these are allocated in a single malloc().
> The pointer to the "block" also isn't really needed. This is always
> the same as the head element in the blocks dlist.

Neat idea, +1.

I think it would make sense to split the "add a bump allocator"
changes from the "use the bump allocator in tuplesort" patches.

Tangent: Do we have specific notes on worst-case memory usage of
memory contexts with various allocation patterns? This new bump
allocator seems to be quite efficient, but in a worst-case allocation
pattern it can still waste about 1/3 of its allocated memory due to
never using free space on previous blocks after an allocation didn't
fit on that block.
It probably isn't going to be a huge problem in general, but it seems
worth documenting as a potential issue, so that someone deciding which
allocator to use can compare it against allocators that handle varied
allocation sizes more gracefully.
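
A toy simulation of that worst case, assuming a fixed block size and
ignoring chunk/block header overhead (the real context grows block sizes,
so the arithmetic differs in detail): requests just over 2/3 of a block
leave about a third of every block stranded.

```python
BLOCK_SIZE = 1024


class BumpContext:
    """Toy bump allocator: only the newest block is ever allocated from."""

    def __init__(self):
        self.blocks = []                 # bytes used in each block

    def alloc(self, size: int) -> None:
        assert size <= BLOCK_SIZE
        if not self.blocks or self.blocks[-1] + size > BLOCK_SIZE:
            # Tail space of the previous block is abandoned, never reused.
            self.blocks.append(0)
        self.blocks[-1] += size


ctx = BumpContext()
for _ in range(1000):
    ctx.alloc(2 * BLOCK_SIZE // 3 + 1)   # each request just over 2/3 block

allocated = len(ctx.blocks) * BLOCK_SIZE
wasted = 1 - sum(ctx.blocks) / allocated
print(f"wasted fraction: {wasted:.3f}")  # -> wasted fraction: 0.333
```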

> +++ b/src/backend/utils/mmgr/bump.c
> +BumpBlockIsEmpty(BumpBlock *block)
> +{
> +/* it's empty if the freeptr has not moved */
> +return (block->freeptr == (char *) block + Bump_BLOCKHDRSZ);
> [...]
> +static inline void
> +BumpBlockMarkEmpty(BumpBlock *block)
> +{
> +#if defined(USE_VALGRIND) || defined(CLOBBER_FREED_MEMORY)
> +char   *datastart = ((char *) block) + Bump_BLOCKHDRSZ;

These two use different definitions of the start pointer. Is that deliberate?

> +++ b/src/include/utils/tuplesort.h
> @@ -109,7 +109,8 @@ typedef struct TuplesortInstrumentation
>  * a pointer to the tuple proper (might be a MinimalTuple or IndexTuple),
>  * which is a separate palloc chunk --- we assume it is just one chunk and
>  * can be freed by a simple pfree() (except during merge, when we use a
> - * simple slab allocator).  SortTuples also contain the tuple's first key
> + * simple slab allocator and when performing a non-bounded sort where we
> + * use a bump allocator).  SortTuples also contain the tuple's first key

I'd go with something like the following:

+ * ...(except during merge *where* we use a
+ * simple slab allocator, and during a non-bounded sort where we
+ * use a bump allocator).

Kind regards,

Matthias van de Meent
Neon (https://neon.tech)




Re: pgsql: jit: Support opaque pointers in LLVM 16.

2023-11-06 Thread Alvaro Herrera
On 2023-Oct-18, Thomas Munro wrote:

> jit: Support opaque pointers in LLVM 16.
> 
> Remove use of LLVMGetElementType() and provide the type of all pointers
> to LLVMBuildXXX() functions when emitting IR, as required by modern LLVM
> versions[1].
> 
>  * For LLVM <= 14, we'll still use the old LLVMBuildXXX() functions.

I have LLVM 14 (whatever Debian ships[*]), and running headerscheck results
in a bunch of warnings from this:

In file included from /tmp/headerscheck.s89Gdv/test.c:2:
/pgsql/source/master/src/include/jit/llvmjit_emit.h: In function ‘l_call’:
/pgsql/source/master/src/include/jit/llvmjit_emit.h:141:9: warning: 
‘LLVMBuildCall’ is deprecated [-Wdeprecated-declarations]
  141 | return LLVMBuildCall(b, fn, args, nargs, name);
  | ^~
In file included from /usr/include/llvm-c/Core.h:18,
 from /pgsql/source/master/src/include/jit/llvmjit_emit.h:18:
/usr/include/llvm-c/Core.h:3991:1: note: declared here
 3991 | LLVM_ATTRIBUTE_C_DEPRECATED(
  | ^~~


These warnings go away if I change the conditional from
LLVM_VERSION_MAJOR < 16 to 14.

Let's ... do that?  As in the attached patch.

In 13, there's a comment about it being deprecated, but no macro to make
the compiler whine:
https://github.com/hdoc/llvm-project/blob/release/13.x/llvm/include/llvm-c/Core.h#L3953

This changed in 14:
https://github.com/hdoc/llvm-project/blob/release/14.x/llvm/include/llvm-c/Core.h#L3898


[*] apt policy llvm:
llvm:
  Installed: 1:14.0-55.7~deb12u1
  Candidate: 1:14.0-55.7~deb12u1
  Version table:
 *** 1:14.0-55.7~deb12u1 500
500 http://ftp.de.debian.org/debian bookworm/main amd64 Packages
100 /var/lib/dpkg/status

-- 
Álvaro HerreraBreisgau, Deutschland  —  https://www.EnterpriseDB.com/
>From d151e2aa327e865f7c6f68dd0493b52ae1736f82 Mon Sep 17 00:00:00 2001
From: Alvaro Herrera 
Date: Mon, 6 Nov 2023 19:08:39 +0100
Subject: [PATCH] use non-deprecated LLVM functions starting with its 14, not
 16

---
 src/include/jit/llvmjit_emit.h | 8 
 1 file changed, 4 insertions(+), 4 deletions(-)

diff --git a/src/include/jit/llvmjit_emit.h b/src/include/jit/llvmjit_emit.h
index b1f0ea56c0..e140da2152 100644
--- a/src/include/jit/llvmjit_emit.h
+++ b/src/include/jit/llvmjit_emit.h
@@ -107,7 +107,7 @@ l_pbool_const(bool i)
 static inline LLVMValueRef
 l_struct_gep(LLVMBuilderRef b, LLVMTypeRef t, LLVMValueRef v, int32 idx, const char *name)
 {
-#if LLVM_VERSION_MAJOR < 16
+#if LLVM_VERSION_MAJOR < 14
 	return LLVMBuildStructGEP(b, v, idx, "");
 #else
 	return LLVMBuildStructGEP2(b, t, v, idx, "");
@@ -117,7 +117,7 @@ l_struct_gep(LLVMBuilderRef b, LLVMTypeRef t, LLVMValueRef v, int32 idx, const c
 static inline LLVMValueRef
 l_gep(LLVMBuilderRef b, LLVMTypeRef t, LLVMValueRef v, LLVMValueRef *indices, int32 nindices, const char *name)
 {
-#if LLVM_VERSION_MAJOR < 16
+#if LLVM_VERSION_MAJOR < 14
 	return LLVMBuildGEP(b, v, indices, nindices, name);
 #else
 	return LLVMBuildGEP2(b, t, v, indices, nindices, name);
@@ -127,7 +127,7 @@ l_gep(LLVMBuilderRef b, LLVMTypeRef t, LLVMValueRef v, LLVMValueRef *indices, in
 static inline LLVMValueRef
 l_load(LLVMBuilderRef b, LLVMTypeRef t, LLVMValueRef v, const char *name)
 {
-#if LLVM_VERSION_MAJOR < 16
+#if LLVM_VERSION_MAJOR < 14
 	return LLVMBuildLoad(b, v, name);
 #else
 	return LLVMBuildLoad2(b, t, v, name);
@@ -137,7 +137,7 @@ l_load(LLVMBuilderRef b, LLVMTypeRef t, LLVMValueRef v, const char *name)
 static inline LLVMValueRef
 l_call(LLVMBuilderRef b, LLVMTypeRef t, LLVMValueRef fn, LLVMValueRef *args, int32 nargs, const char *name)
 {
-#if LLVM_VERSION_MAJOR < 16
+#if LLVM_VERSION_MAJOR < 14
 	return LLVMBuildCall(b, fn, args, nargs, name);
 #else
 	return LLVMBuildCall2(b, t, fn, args, nargs, name);
-- 
2.39.2



Re: Add the ability to limit the amount of memory that can be allocated to backends.

2023-11-06 Thread Stephen Frost
Greetings,

* Andres Freund (and...@anarazel.de) wrote:
> On 2023-10-31 17:11:26 +, John Morris wrote:
> > Postgres memory reservations come from multiple sources.
> > 
> >   *   Malloc calls made by the Postgres memory allocators.
> >   *   Static shared memory created by the postmaster at server startup,
> >   *   Dynamic shared memory created by the backends.
> >   *   A fixed amount (1Mb) of “initial” memory reserved whenever a process 
> > starts up.
> > 
> > Each process also maintains an accurate count of its actual memory
> > allocations. The process-private variable “my_memory” holds the total
> > allocations for that process. Since there can be no contention, each process
> > updates its own counters very efficiently.
> 
> I think this will introduce measurable overhead in low concurrency cases and
> very substantial overhead / contention when there is a decent amount of
> concurrency.  This makes all memory allocations > 1MB contend on a single
> atomic.  Massive amounts of energy have been spent writing multi-threaded
> allocators that have far less contention than this - the current state is to
> never contend on shared resources on any reasonably common path. This gives
> away one of the few major advantages our process model has.

We could certainly adjust the size of each reservation to reduce the
frequency of having to hit the atomic.  Specific suggestions about how
to benchmark and see the regression that's being worried about here
would be great.  Certainly my hope has generally been that when we do a
larger allocation, it's because we're about to go do a bunch of work,
meaning that hopefully the time spent updating the atomic is minor
overall.
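
The batching idea can be sketched as follows: each backend keeps an exact
private total and folds it into the shared counter only when the
unreported delta reaches a quantum, so small allocations never touch
shared state. A simplified Python model (the shared "atomic" is a plain
object here; freeing and the limit check are omitted):

```python
QUANTUM = 1024 * 1024                    # report in 1 MB steps


class SharedCounter:
    """Stands in for a pg_atomic_uint64 in shared memory."""

    def __init__(self):
        self.value = 0
        self.updates = 0                 # how often the "atomic" was touched

    def add(self, n: int) -> None:
        self.value += n
        self.updates += 1


class Backend:
    def __init__(self, shared: SharedCounter):
        self.shared = shared
        self.my_memory = 0               # exact, process-private total
        self.reported = 0                # portion already in the shared total

    def account(self, nbytes: int) -> None:
        self.my_memory += nbytes
        if self.my_memory - self.reported >= QUANTUM:
            self.shared.add(self.my_memory - self.reported)
            self.reported = self.my_memory


shared = SharedCounter()
backend = Backend(shared)
for _ in range(2048):                    # 2048 x 1 kB = 2 MB of allocations
    backend.account(1024)
print(f"shared total: {shared.value} bytes after {shared.updates} updates")
```

Making the quantum larger cuts the update frequency further, at the cost
of a coarser global total; that is the knob being discussed above.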

> The patch doesn't just introduce contention when limiting is enabled - it
> introduces it even when memory usage is just tracked. It makes absolutely no
> sense to have a single contended atomic in that case - just have a per-backend
> variable in shared memory that's updated. It's *WAY* cheaper to compute the
> overall memory usage during querying than to keep a running total in shared
> memory.

Agreed that we should avoid the contention when limiting isn't being
used, certainly easy to do so, and had actually intended to but that
seems to have gotten lost along the way.  Will fix.

Other than that change inside update_global_reservation though, the code
for reporting per-backend memory usage and querying it does work as
you're outlining above inside the stats system.

That said- I just want to confirm that you would agree that querying the
amount of memory used by every backend, to add it all up to enforce an
overall limit, surely isn't something we're going to want to do during
an allocation and that having a global atomic for that is better, right?

> > Pgstat now includes global memory counters. These shared memory counters
> > represent the sum of all reservations made by all Postgres processes. For
> > efficiency, these global counters are only updated when new reservations
> > exceed a threshold, currently 1 Mb for each process. Consequently, the
> > global reservation counters are approximate totals which may differ from the
> > actual allocation totals by up to 1 Mb per process.
> 
> I see that you added them to the "cumulative" stats system - that doesn't
> immediately makes sense to me - what you're tracking here isn't an
> accumulating counter, it's something showing the current state, right?

Yes, this is current state, not an accumulation.

> > The max_total_memory limit is checked whenever the global counters are
> > updated. There is no special error handling if a memory allocation exceeds
> > the global limit. That allocation returns a NULL for malloc style
> > allocations or an ENOMEM for shared memory allocations. Postgres has
> > existing mechanisms for dealing with out of memory conditions.
> 
> I still think it's extremely unwise to do tracking of memory and limiting of
> memory in one patch.  You should work towards and acceptable patch that just
> tracks memory usage in an as simple and low overhead way as possible. Then we
> later can build on that.

Frankly, while tracking is interesting, the limiting is the feature
that's needed more urgently imv.  We could possibly split it up but
there's an awful lot of the same code that would need to be changed and
that seems less than ideal.  Still, we'll look into this.

> > For sanity checking, pgstat now includes the pg_backend_memory_allocation
> > view showing memory allocations made by the backend process. This view
> > includes a scan of the top memory context, so it compares memory allocations
> > reported through pgstat with actual allocations. The two should match.
> 
> Can't you just do this using the existing pg_backend_memory_contexts view?

Not and get a number that you can compare to the local backend number
due to the query itself happening and performing allocations and
creating new contexts.  We wanted to be able to show that we are
accounting correctly.

Re: Wrong security context for deferred triggers?

2023-11-06 Thread Tomas Vondra
On 11/6/23 14:23, Laurenz Albe wrote:
> ...
> 
> This behavior looks buggy to me.  What do you think?
> I cannot imagine that it is a security problem, though.
> 

How could code getting executed under the wrong role not be a security
issue? Also, does this affect just the role, or are there some other
settings that may unexpectedly change (e.g. search_path)?


regards

-- 
Tomas Vondra
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: Wrong security context for deferred triggers?

2023-11-06 Thread Isaac Morland
On Mon, 6 Nov 2023 at 11:58, Laurenz Albe  wrote:

> Become a superuser again and commit:
> >
> >  RESET ROLE;
> >
> >  COMMIT;
> >  NOTICE:  current_user = postgres
> >
> >
> > So a deferred constraint trigger does not run with the same security
> context
> > as an immediate trigger.  This is somewhat nasty in combination with
> > SECURITY DEFINER functions: if that function performs an operation, and
> that
> > operation triggers a deferred trigger, that trigger will run in the wrong
> > security context.
> >
> > This behavior looks buggy to me.  What do you think?
> > I cannot imagine that it is a security problem, though.
>

This looks to me like another reason that triggers should run as the
trigger owner. Which role owns the trigger won’t change as a result of
constraints being deferred or not, or any role setting done during the
transaction, including that relating to security definer functions.

Right now triggers can’t do anything that those who can
INSERT/UPDATE/DELETE (i.e., cause the trigger to fire) can’t do, which in
particular breaks the almost canonical example of using triggers to log
changes — I can’t do it without also allowing users to make spurious log
entries.

Also if I cause a trigger to fire, I’ve just given the trigger owner the
opportunity to run arbitrary code as me.


> I just realized one problem with running a deferred constraint trigger as
> the triggering role: that role might have been dropped by the time the
> trigger
> executes.  But then we could still error out.
>

This problem is also fixed by running triggers as their owners: there
should be a dependency between an object and its owner. So the
trigger-executing role can’t be dropped without dropping the trigger.


Re: psql not responding to SIGINT upon db reconnection

2023-11-06 Thread Tristan Partin

On Thu Nov 2, 2023 at 4:03 AM CDT, Shlok Kyal wrote:

Hi,

> That sounds like a much better solution. Attached you will find a v4
> that implements your suggestion. Please let me know if there is
> something that I missed. I can confirm that the patch works.
>
> $ ./build/src/bin/psql/psql -h pg.neon.tech
> NOTICE:  Welcome to Neon!
> Authenticate by visiting:
> https://console.neon.tech/psql_session/xxx
>
>
> NOTICE:  Connecting to database.
> psql (17devel, server 15.3)
> SSL connection (protocol: TLSv1.3, cipher: TLS_AES_256_GCM_SHA384, 
compression: off)
> Type "help" for help.
>
> tristan957=> \c
> NOTICE:  Welcome to Neon!
> Authenticate by visiting:
> https://console.neon.tech/psql_session/yyy
>
>
> ^Cconnection to server at "pg.neon.tech" (3.18.6.96), port 5432 
failed:
> Previous connection kept
> tristan957=>

I went through CFbot and found out that some tests are timing out.
Links:
https://cirrus-ci.com/task/6735437444677632
https://cirrus-ci.com/task/4536414189125632
https://cirrus-ci.com/task/5046587584413696
https://cirrus-ci.com/task/6172487491256320
https://cirrus-ci.com/task/5609537537835008

Some tests which are timing out are as follows:
[00:48:49.243] Summary of Failures:
[00:48:49.243]
[00:48:49.243] 4/270 postgresql:regress / regress/regress TIMEOUT 1000.01s
[00:48:49.243] 6/270 postgresql:pg_upgrade / pg_upgrade/002_pg_upgrade
TIMEOUT 1000.02s
[00:48:49.243] 34/270 postgresql:recovery /
recovery/027_stream_regress TIMEOUT 1000.02s
[00:48:49.243] 48/270 postgresql:plpgsql / plpgsql/regress TIMEOUT 1000.02s
[00:48:49.243] 49/270 postgresql:plperl / plperl/regress TIMEOUT 1000.01s
[00:48:49.243] 61/270 postgresql:dblink / dblink/regress TIMEOUT 1000.03s
[00:48:49.243] 89/270 postgresql:postgres_fdw / postgres_fdw/regress
TIMEOUT 1000.01s
[00:48:49.243] 93/270 postgresql:test_decoding / test_decoding/regress
TIMEOUT 1000.02s
[00:48:49.243] 110/270 postgresql:test_extensions /
test_extensions/regress TIMEOUT 1000.03s
[00:48:49.243] 123/270 postgresql:unsafe_tests / unsafe_tests/regress
TIMEOUT 1000.02s
[00:48:49.243] 152/270 postgresql:pg_dump / pg_dump/010_dump_connstr
TIMEOUT 1000.03s

Just want to make sure you are aware of the issue.


Hi Shlok!

Thanks for taking a look. I will check these failures out to see if 
I can reproduce.


--
Tristan Partin
Neon (https://neon.tech)




Re: PoC: prefetching index leaf pages (for inserts)

2023-11-06 Thread Tomas Vondra
Seems cfbot was not entirely happy about the patch, for two reasons:

1) enable_insert_prefetching definition was inconsistent (different
boot/default values, missing in .conf and so on)

2) stupid bug in execReplication, inserting index entries twice

The attached v3 should fix all of that, I believe.


As for the path forward, I think the prefetching is demonstrably
beneficial. There are cases where it can't help or even harms
performance. I think the success depends on three areas:

(a) reducing the costs of the prefetching - For example right now we
build the index tuples twice (once for prefetch, once for the insert),
but maybe there's a way to do that only once? There are also predicate
indexes, and so on.

(b) being smarter about when to prefetch - For example if we only have
one "prefetchable" index, it's somewhat pointless to prefetch (for
single-row cases). And so on.

(c) not prefetching when already cached - This is somewhat related to
the previous case, but perhaps it'd be cheaper to first check if the
data is already cached. For shared buffers it should not be difficult,
for page cache we could use preadv2 with RWF_NOWAIT flag. The question
is if this is cheap enough to be cheaper than just doing posix_fadvise
(which however only deals with shared buffers).
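
Idea (c) can be sketched with Python's binding of preadv2: passing
RWF_NOWAIT makes the kernel return EAGAIN instead of waiting for I/O, so
a failed read tells us the block is not in the page cache. This is a
Linux-only illustration (os.RWF_NOWAIT needs Linux >= 4.14 and Python
>= 3.7), not the form a backend-side check would actually take:

```python
import errno
import os
import tempfile


def block_is_cached(fd: int, offset: int, length: int = 8192):
    """Probe the OS page cache without blocking; returns None if the
    platform or filesystem cannot answer."""
    if not hasattr(os, "RWF_NOWAIT"):
        return None                      # non-Linux: cannot tell
    buf = bytearray(length)
    try:
        os.preadv(fd, [buf], offset, os.RWF_NOWAIT)
        return True                      # satisfied without waiting for I/O
    except BlockingIOError:
        return False                     # would block: a prefetch could help
    except OSError as e:
        if e.errno == errno.EOPNOTSUPP:
            return None                  # filesystem rejects the flag
        raise


with tempfile.NamedTemporaryFile() as f:
    f.write(b"x" * 8192)
    f.flush()
    cached = block_is_cached(f.fileno(), 0)
    print("cached:", cached)             # freshly written data is usually cached
```

The open question above is exactly whether a probe like this is cheap
enough per block to beat issuing posix_fadvise unconditionally.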


regards

-- 
Tomas Vondra
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

>From 5b2c18986422f5dda3cd0dfebd16e22443051d92 Mon Sep 17 00:00:00 2001
From: Tomas Vondra 
Date: Mon, 6 Nov 2023 17:42:48 +0100
Subject: [PATCH] v3

---
 src/backend/access/brin/brin.c|   1 +
 src/backend/access/gin/ginutil.c  |   1 +
 src/backend/access/gist/gist.c|   1 +
 src/backend/access/hash/hash.c|   1 +
 src/backend/access/index/indexam.c|  22 
 src/backend/access/nbtree/nbtinsert.c |  43 +++
 src/backend/access/nbtree/nbtree.c|  21 
 src/backend/access/nbtree/nbtsearch.c |  95 
 src/backend/access/spgist/spgutils.c  |   1 +
 src/backend/commands/copyfrom.c   |  21 
 src/backend/executor/execIndexing.c   | 106 ++
 src/backend/executor/execReplication.c|  12 ++
 src/backend/executor/nodeModifyTable.c|  22 
 src/backend/utils/misc/guc_tables.c   |  10 ++
 src/backend/utils/misc/postgresql.conf.sample |   1 +
 src/include/access/amapi.h|   8 ++
 src/include/access/genam.h|   5 +
 src/include/access/nbtree.h   |   8 ++
 src/include/executor/executor.h   |   6 +
 src/include/optimizer/cost.h  |   1 +
 src/test/regress/expected/sysviews.out|   3 +-
 21 files changed, 388 insertions(+), 1 deletion(-)

diff --git a/src/backend/access/brin/brin.c b/src/backend/access/brin/brin.c
index 25338a90e29..3b9c22847a8 100644
--- a/src/backend/access/brin/brin.c
+++ b/src/backend/access/brin/brin.c
@@ -117,6 +117,7 @@ brinhandler(PG_FUNCTION_ARGS)
 	amroutine->ambuild = brinbuild;
 	amroutine->ambuildempty = brinbuildempty;
 	amroutine->aminsert = brininsert;
+	amroutine->amprefetch = NULL;
 	amroutine->ambulkdelete = brinbulkdelete;
 	amroutine->amvacuumcleanup = brinvacuumcleanup;
 	amroutine->amcanreturn = NULL;
diff --git a/src/backend/access/gin/ginutil.c b/src/backend/access/gin/ginutil.c
index 7a4cd93f301..666d58a750f 100644
--- a/src/backend/access/gin/ginutil.c
+++ b/src/backend/access/gin/ginutil.c
@@ -64,6 +64,7 @@ ginhandler(PG_FUNCTION_ARGS)
 	amroutine->ambuild = ginbuild;
 	amroutine->ambuildempty = ginbuildempty;
 	amroutine->aminsert = gininsert;
+	amroutine->amprefetch = NULL;
 	amroutine->ambulkdelete = ginbulkdelete;
 	amroutine->amvacuumcleanup = ginvacuumcleanup;
 	amroutine->amcanreturn = NULL;
diff --git a/src/backend/access/gist/gist.c b/src/backend/access/gist/gist.c
index 8ef5fa03290..3ed72cce448 100644
--- a/src/backend/access/gist/gist.c
+++ b/src/backend/access/gist/gist.c
@@ -86,6 +86,7 @@ gisthandler(PG_FUNCTION_ARGS)
 	amroutine->ambuild = gistbuild;
 	amroutine->ambuildempty = gistbuildempty;
 	amroutine->aminsert = gistinsert;
+	amroutine->amprefetch = NULL;
 	amroutine->ambulkdelete = gistbulkdelete;
 	amroutine->amvacuumcleanup = gistvacuumcleanup;
 	amroutine->amcanreturn = gistcanreturn;
diff --git a/src/backend/access/hash/hash.c b/src/backend/access/hash/hash.c
index 7a025f33cfe..4f92fb4e115 100644
--- a/src/backend/access/hash/hash.c
+++ b/src/backend/access/hash/hash.c
@@ -83,6 +83,7 @@ hashhandler(PG_FUNCTION_ARGS)
 	amroutine->ambuild = hashbuild;
 	amroutine->ambuildempty = hashbuildempty;
 	amroutine->aminsert = hashinsert;
+	amroutine->amprefetch = NULL;
 	amroutine->ambulkdelete = hashbulkdelete;
 	amroutine->amvacuumcleanup = hashvacuumcleanup;
 	amroutine->amcanreturn = NULL;
diff --git a/src/backend/access/index/indexam.c b/src/backend/access/index/indexam.c
index b25b03f7abc..fac126da421 100644
--- a/src/ba

Re: Regression on pg_restore to 16.0: DOMAIN not available to SQL function

2023-11-06 Thread Tom Lane
Mark Hills  writes:
> On Fri, 3 Nov 2023, Tom Lane wrote:
>> Right, so the 910eb61b2 fix explains it.  I guess I'd better
>> expand the release note entry, because we'd not foreseen this
>> particular failure mode.

> Indeed, and curiosity got the better of me so I constructed a minimal test 
> case (see below)

I checked this against 16 branch tip (about to become 16.1),
and it works, as expected.

> I assumed I'd need at least one row of data to trigger the bug (to call on 
> a default), but that's not the case and here it is with an empty table.

Right.  The step 16.0 is taking that fails is not evaluating the
default expression, but merely prepping it for execution during COPY
startup.  This error occurs while trying to inline the inline-able
outer() function.

We worked around this by skipping the expression prep step when it's
clear from the COPY arguments that we won't need it, which should
be true for all of pg_dump's uses of COPY.

regards, tom lane




Re: Wrong security context for deferred triggers?

2023-11-06 Thread Laurenz Albe
On Mon, 2023-11-06 at 14:23 +0100, Laurenz Albe wrote:
>  CREATE FUNCTION trig() RETURNS trigger
> LANGUAGE plpgsql AS
>  $$BEGIN
> RAISE NOTICE 'current_user = %', current_user;
> RETURN NEW;
>  END;$$;
> 
>  CREATE CONSTRAINT TRIGGER trig AFTER INSERT ON tab
> DEFERRABLE INITIALLY IMMEDIATE
> FOR EACH ROW EXECUTE FUNCTION trig();
> 
>  SET ROLE duff;
> 
>  BEGIN;
> 
>  INSERT INTO tab VALUES (1);
>  NOTICE:  current_user = duff
> 
> That looks ok; the current user is "duff".
> 
>  SET CONSTRAINTS ALL DEFERRED;
> 
>  INSERT INTO tab VALUES (2);
> 
> Become a superuser again and commit:
> 
>  RESET ROLE;
> 
>  COMMIT;
>  NOTICE:  current_user = postgres
> 
> 
> So a deferred constraint trigger does not run with the same security context
> as an immediate trigger.  This is somewhat nasty in combination with
> SECURITY DEFINER functions: if that function performs an operation, and that
> operation triggers a deferred trigger, that trigger will run in the wrong
> security context.
> 
> This behavior looks buggy to me.  What do you think?
> I cannot imagine that it is a security problem, though.

I just realized one problem with running a deferred constraint trigger as
the triggering role: that role might have been dropped by the time the trigger
executes.  But then we could still error out.

Yours,
Laurenz Albe




Re: Explicitly skip TAP tests under Meson if disabled

2023-11-06 Thread Peter Eisentraut

On 04.11.23 01:51, Andres Freund wrote:

I'd just use a single test() invocation here, and add an argument to testwrap
indicating that it should print out the skipped message. That way we a) don't
need two test() invocations, b) could still see the test name etc in the test
invocation.


Is testwrap only meant to be used with the tap protocol mode of meson's 
test()?  Otherwise, this skip option would have to produce different output 
for different protocols.






Re: Regression on pg_restore to 16.0: DOMAIN not available to SQL function

2023-11-06 Thread Mark Hills
On Fri, 3 Nov 2023, Tom Lane wrote:

> Mark Hills  writes:
> > On Fri, 3 Nov 2023, Tom Lane wrote:
> >> However, then it's not clear why it would've worked
> >> in 15.4 which does the same thing.  I wonder whether you are
> >> using this function in a column default for the troublesome
> >> table.
> 
> > Yes, it's just a simple DEFAULT:
> 
> >   CREATE TABLE authentic (
> >key hash NOT NULL UNIQUE DEFAULT gen_hash(32),
> 
> > and so every row would have a value.
> 
> Right, so the 910eb61b2 fix explains it.  I guess I'd better
> expand the release note entry, because we'd not foreseen this
> particular failure mode.

Indeed, and curiosity got the better of me, so I constructed a minimal test 
case (see below).

This minimal test demonstrates a database which will pg_dump but cannot 
restore (custom with pg_restore, or plain SQL with psql.)

I assumed I'd need at least one row of data to trigger the bug (to call on 
a default), but that's not the case and here it is with an empty table.

I then tested REL_16_STABLE branch (e24daa94b) the problem does not occur, 
as expected.

Also, the stable branch version was able to restore the pg_dump from 16.0 
release, which is as expected and is probably important (and helpful)

Thanks

-- 
Mark


==> test.sql <==
CREATE FUNCTION inner()
RETURNS integer AS
$$
SELECT 1;
$$ LANGUAGE SQL;

CREATE FUNCTION outer()
RETURNS integer AS
$$
SELECT inner();
$$ LANGUAGE SQL;

CREATE TABLE test (
v integer NOT NULL DEFAULT outer()
);


$ createdb test
$ psql test < test.sql


$ pg_dump --format custom --file test.pgdump test
$ createdb restore
$ pg_restore --dbname restore test.pgdump
pg_restore: error: could not execute query: ERROR:  function inner() does not 
exist
LINE 2: SELECT inner();
   ^
HINT:  No function matches the given name and argument types. You might 
need to add explicit type casts.
QUERY:
SELECT inner();

CONTEXT:  SQL function "outer" during inlining
Command was: COPY public.test (v) FROM stdin;
pg_restore: warning: errors ignored on restore: 1


$ pg_dump --format plain --file test.pgdump test
$ createdb restore
$ psql restore < test.pgdump
SET
SET
SET
SET
SET
 set_config 
------------
 
(1 row)

SET
SET
SET
SET
CREATE FUNCTION
ALTER FUNCTION
CREATE FUNCTION
ALTER FUNCTION
SET
SET
CREATE TABLE
ALTER TABLE
ERROR:  function inner() does not exist
LINE 2: SELECT inner();
   ^
HINT:  No function matches the given name and argument types. You might 
need to add explicit type casts.
QUERY:
SELECT inner();

CONTEXT:  SQL function "outer" during inlining
invalid command \.




Re: Moving forward with TDE [PATCH v3]

2023-11-06 Thread David Christensen
On Fri, Nov 3, 2023 at 9:53 PM Andres Freund  wrote:

> On 2023-11-02 19:32:28 -0700, Andres Freund wrote:
> > > From 327e86d52be1df8de9c3a324cb06b85ba5db9604 Mon Sep 17 00:00:00 2001
> > > From: David Christensen 
> > > Date: Fri, 29 Sep 2023 15:16:00 -0400
> > > Subject: [PATCH v3 5/5] Add encrypted/authenticated WAL
> > >
> > > When using an encrypted cluster, we need to ensure that the WAL is also
> > > encrypted. While we could go with an page-based approach, we use
> instead a
> > > per-record approach, using GCM for the encryption method and storing
> the AuthTag
> > > in the xl_crc field.
>
> What was the reason for this decision?
>

This was mainly to prevent IV reuse by using per-record encryption rather
than per-page encryption, since partial writes out of the WAL buffer would
result in IV reuse there.  This was somewhat of an experiment, since
authenticated data per record was basically equivalent in function to the
CRC.
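
To illustrate the uniqueness argument (with a hypothetical derivation,
not the patch's actual one): if the 96-bit GCM IV is built from, say, the
timeline ID plus the record's start LSN, then distinct records
necessarily encrypt under distinct IVs, whereas a per-page scheme
re-encrypts a partially filled page under the same IV after each partial
write.

```python
import struct


def record_iv(timeline_id: int, start_lsn: int) -> bytes:
    """Hypothetical 96-bit GCM IV: 4-byte timeline ID + 8-byte record LSN.
    Only illustrates the uniqueness property; not the patch's derivation."""
    return struct.pack(">IQ", timeline_id, start_lsn)


lsns = [0x016A2DF8, 0x016A2E40, 0x016A2EA8]      # three records, three LSNs
ivs = {record_iv(1, lsn) for lsn in lsns}
assert len(ivs) == len(lsns)                     # distinct LSN -> distinct IV
assert all(len(iv) == 12 for iv in ivs)          # GCM's recommended IV size
```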

There was a switch here so normal clusters use the crc field with the
existing CRC implementation, only encrypted clusters use this alternate
approach.


Re: Moving forward with TDE [PATCH v3]

2023-11-06 Thread David Christensen
Hi, thanks for the detailed feedback here.

I do think it's worth addressing the question Stephen raised about what
we use for the IV[1]: whether LSN or something else entirely, and if so,
what.  The choice of LSN here is fairly fundamental to the existing
implementation, so if we decide to do something different some of this
might be moot.

Best,

David

[1]
https://www.mail-archive.com/pgsql-hackers@lists.postgresql.org/msg154613.html


Re: Moving forward with TDE [PATCH v3]

2023-11-06 Thread Bruce Momjian
On Mon, Nov  6, 2023 at 09:56:37AM -0500, Stephen Frost wrote:
> The gist is, without a suggestion of things to try, we're left
> to our own devices to try and figure out things which might be
> successful, only to have those turned down too when we come back with
> them, see [1] for what feels like an example of this.  Given your
> feedback overall, which I'm very thankful for, I'm hopeful that you see
> that this is, indeed, a useful feature that people are asking for and
> therefore are willing to spend some time on it, but if the feedback is
> that nothing on the page is acceptable to use for the nonce, we can't
> put the nonce somewhere else, and we can't change the page format, then
> everything else is just moving deck chairs around on the titanic that
> has been this effort.

Yeah, I know the feeling, though I thought XTS was immune enough to
nonce/LSN reuse that it was acceptable.

What got the feature sunk for me was the complexity of adding temporary
file encryption support; that tipped the scales into the negative when
weighing the feature's community value against its added complexity.
(Yeah, I used a Titanic reference in the last sentence. ;-) )  However,
I am open to
the community value and complexity values changing over time.  My blog
post on the topic:

https://momjian.us/main/blogs/pgblog/2023.html#October_19_2023

-- 
  Bruce Momjian  https://momjian.us
  EDB  https://enterprisedb.com

  Only you can decide what is important to you.




Re: apply pragma system_header to python headers

2023-11-06 Thread Tom Lane
Andrew Dunstan  writes:
> On 2023-11-06 Mo 09:57, Tom Lane wrote:
>> +1 for the concept --- I was just noticing yesterday that my buildfarm
>> warning scraping script is turning up some of these.  However, we ought
>> to try to minimize the amount of our own code that is subject to the
>> pragma.  So I think a prerequisite ought to be to get this out of
>> plpython.h:
>> 
>> /*
>> * Used throughout, so it's easier to just include it everywhere.
>> */
>> #include "plpy_util.h"
>> 
>> Alternatively, is there a way to reverse the effect of the
>> pragma after we've included what we need?

> There's "GCC diagnostic push" and "GCC diagnostic pop" but I don't know 
> if they apply to "GCC system_header". Instead of using "GCC 
> system_header" we could just ignore the warnings we're seeing. e.g. "GCC 
> diagnostic ignored \"-Wdeclaration-after-statement\""

Probably a better way is to invent a separate header "plpython_system.h"
that just includes the Python headers, to scope the pragma precisely.
(I guess it could have the fixup #defines we're wrapping those headers
in, too.)

The same idea would work in plperl.

regards, tom lane




Re: Moving forward with TDE [PATCH v3]

2023-11-06 Thread Bruce Momjian
On Thu, Nov  2, 2023 at 07:32:28PM -0700, Andres Freund wrote:
> On 2023-10-31 16:23:17 -0500, David Christensen wrote:
> > +Implementation
> > +--
> > +
> > +To enable cluster file encryption, the initdb option
> > +--cluster-key-command must be used, which specifies a command to
> > +retrieve the KEK.
> 
> FWIW, I think "cluster file encryption" is somewhat ambiguous. That could also
> mean encrypting on the file system level or such.

We could call it:

*  cluster data file encryption
*  cluster data encryption
*  database cluster encryption

-- 
  Bruce Momjian  https://momjian.us
  EDB  https://enterprisedb.com

  Only you can decide what is important to you.




Re: Version 14/15 documentation Section "Alter Default Privileges"

2023-11-06 Thread Bruce Momjian
On Mon, Nov  6, 2023 at 09:44:14AM +0100, Laurenz Albe wrote:
> On Sat, 2023-11-04 at 21:14 -0400, Bruce Momjian wrote:
> > > It is not the role that is modified.  Perhaps:
> > > 
> > >    [...]; if omitted, the current role is used.
> > 
> > Sure, attached.  Here is the issue I have though, we are really not
> > changing default privileges for objects created in the future, we are
> > changing the role _now_ so future objects will have different default
> > privileges, right?  I think wording like the above is kind of odd.
> 
> I see what you mean.  The alternative is to be precise, at the risk of
> repeating ourselves:
> 
>   if omitted, default privileges will be changed for objects created by
>   the current role.

Okay, I think I have good wording for this.  I didn't like the wording
of other roles, so I restructured that in the attached patch too.

-- 
  Bruce Momjian  https://momjian.us
  EDB  https://enterprisedb.com

  Only you can decide what is important to you.
diff --git a/doc/src/sgml/ref/alter_default_privileges.sgml b/doc/src/sgml/ref/alter_default_privileges.sgml
new file mode 100644
index 8a60061..a868779
*** a/doc/src/sgml/ref/alter_default_privileges.sgml
--- b/doc/src/sgml/ref/alter_default_privileges.sgml
*** REVOKE [ GRANT OPTION FOR ]
*** 90,112 

 ALTER DEFAULT PRIVILEGES allows you to set the privileges
 that will be applied to objects created in the future.  (It does not
!affect privileges assigned to already-existing objects.)  Currently,
!only the privileges for schemas, tables (including views and foreign
!tables), sequences, functions, and types (including domains) can be
!altered.  For this command, functions include aggregates and procedures.
!The words FUNCTIONS and ROUTINES are
!equivalent in this command.  (ROUTINES is preferred
!going forward as the standard term for functions and procedures taken
!together.  In earlier PostgreSQL releases, only the
!word FUNCTIONS was allowed.  It is not possible to set
!default privileges for functions and procedures separately.)

  

!You can change default privileges only for objects that will be created by
!yourself or by roles that you are a member of.  The privileges can be set
!globally (i.e., for all objects created in the current database),
!or just for objects created in specified schemas.

  

--- 90,113 

 ALTER DEFAULT PRIVILEGES allows you to set the privileges
 that will be applied to objects created in the future.  (It does not
!affect privileges assigned to already-existing objects.)   Privileges can be
!set globally (i.e., for all objects created in the current database), or
!just for objects created in specified schemas.

  

!Default privileges apply only to the active role;  the default
!privileges of member roles have no effect on object permissions.
!SET ROLE can be used to change the active user and
!apply their default privileges.
!   
! 
!   
!As a non-superuser, you can change your own default privileges and
!the defaults of roles that you are a member of.  There is no way to
!set default privileges for a role and all its members with a single
!command;  individual ALTER DEFAULT PRIVILEGES
!commands must be run to achieve this.

  

*** REVOKE [ GRANT OPTION FOR ]
*** 119,124 
--- 120,138 

  

+Currently,
+only the privileges for schemas, tables (including views and foreign
+tables), sequences, functions, and types (including domains) can be
+altered.  For this command, functions include aggregates and procedures.
+The words FUNCTIONS and ROUTINES are
+equivalent in this command.  (ROUTINES is preferred
+going forward as the standard term for functions and procedures taken
+together.  In earlier PostgreSQL releases, only the
+word FUNCTIONS was allowed.  It is not possible to set
+default privileges for functions and procedures separately.)
+   
+ 
+   
 Default privileges that are specified per-schema are added to whatever
 the global default privileges are for the particular object type.
 This means you cannot revoke privileges per-schema if they are granted
*** REVOKE [ GRANT OPTION FOR ]
*** 136,147 
  target_role
  
   
!   The name of an existing role of which the current role is a member.
!   Default access privileges are not inherited, so member roles
!   must use SET ROLE to access these privileges,
!   or ALTER DEFAULT PRIVILEGES must be run for
!   each member role.  If FOR ROLE is omitted,
!   the current role is assumed.
   
  
 
--- 150,158 
  target_role
  
   
!   Change default privileges for objects created by the
!   target_role, or the current
!   role if unspecified.
   
  
 


Re: apply pragma system_header to python headers

2023-11-06 Thread Andrew Dunstan



On 2023-11-06 Mo 09:57, Tom Lane wrote:

Peter Eisentraut  writes:

Analogous to 388e80132c (which was for Perl) but for Python, I propose
adding #pragma GCC system_header to plpython.h.  Without it, you get
tons of warnings about -Wdeclaration-after-statement, starting with
Python 3.12.  (In the past, I have regularly sent feedback to Python to
fix their header files, but this is getting old, and we have an easier
solution now.)

+1 for the concept --- I was just noticing yesterday that my buildfarm
warning scraping script is turning up some of these.  However, we ought
to try to minimize the amount of our own code that is subject to the
pragma.  So I think a prerequisite ought to be to get this out of
plpython.h:

/*
  * Used throughout, so it's easier to just include it everywhere.
  */
#include "plpy_util.h"

Alternatively, is there a way to reverse the effect of the
pragma after we've included what we need?



There's "GCC diagnostic push" and "GCC diagnostic pop" but I don't know 
if they apply to "GCC system_header". Instead of using "GCC 
system_header" we could just ignore the warnings we're seeing. e.g. "GCC 
diagnostic ignored \"-Wdeclaration-after-statement\""



cheers


andrew

--
Andrew Dunstan
EDB: https://www.enterprisedb.com





Re: Version 14/15 documentation Section "Alter Default Privileges"

2023-11-06 Thread Bruce Momjian
On Mon, Nov  6, 2023 at 09:32:27AM +0100, Michael Banck wrote:
> Hi,
> 
> On Sat, Nov 04, 2023 at 09:14:01PM -0400, Bruce Momjian wrote:
> > +   There is no way to change the default privileges for objects created by
> > +   any role.  You have run ALTER DEFAULT PRIVILEGES for 
> > all
> > +   roles that can create objects whose default privileges should be 
> > modified.
> 
> That second sentence is broken, it should be "You have to run [...]" I
> think.

Agreed, fixed, thanks.

-- 
  Bruce Momjian  https://momjian.us
  EDB  https://enterprisedb.com

  Only you can decide what is important to you.




Re: Remove distprep

2023-11-06 Thread Peter Eisentraut

On 02.11.23 23:34, Andres Freund wrote:

On 2023-11-01 16:39:24 -0400, Peter Eisentraut wrote:

OTOH, it seems somewhat unlikely that maintainer-clean is utilized much in
extensions. I see it in things like postgis, but that has its own configure
etc, even though it also invokes pgxs.


I thought about this.  I don't think this is something that any extension
would use.  If they care about the distinction between distclean and
maintainer-clean, are they also doing their own distprep and dist?  Seems
unlikely.  I mean, if some extension is actually affected, I'm happy to
accommodate, but we can deal with that when we learn about it.  Moreover, if
we are moving forward in this direction, we would presumably also like the
extensions to get rid of their distprep step.

So I think we are ready to move ahead with this patch.  There have been some
light complaints earlier in this thread that people wanted to keep some way
to clean only some of the files.  But there hasn't been any concrete
follow-up on that, as far as I can see, so I don't know what to do about
that.


+1, let's do this. We can add dedicated target for more specific cases later
if we decide we want that.


done





Re: Support run-time partition pruning for hash join

2023-11-06 Thread Alexander Lakhin

Hello Richard,

06.11.2023 06:05, Richard Guo wrote:


Fixed this issue in v4.



Please look at a warning and an assertion failure triggered by the
following script:
set parallel_setup_cost = 0;
set parallel_tuple_cost = 0;
set min_parallel_table_scan_size = '1kB';

create table t1 (i int) partition by range (i);
create table t1_1 partition of t1 for values from (1) to (2);
create table t1_2 partition of t1 for values from (2) to (3);
insert into t1 values (1), (2);

create table t2(i int);
insert into t2 values (1), (2);
analyze t1, t2;

select * from t1 right join t2 on t1.i = t2.i;

2023-11-06 14:11:37.398 UTC|law|regression|6548f419.392cf5|WARNING: Join 
partition pruning $0 has not been performed yet.
TRAP: failed Assert("node->as_prune_state"), File: "nodeAppend.c", Line: 846, 
PID: 3747061

Best regards,
Alexander

Re: pg_rewind WAL segments deletion pitfall

2023-11-06 Thread Alexander Kukushkin
Hi Torikoshia,


On Thu, 2 Nov 2023 at 04:24, torikoshia  wrote:

>
>
> > +extern void preserve_file(char *filepath);
>
> Is this necessary?
> This function was defined in older version patch, but no longer seems to
> exist.
>
> +# We use "perl -e 'exit(1)'" as a alternative to "false", because the
> last one
> 'a' should be 'an'?
>
>
Thanks for the feedback

Please find the new version attached.

Regards,
--
Alexander Kukushkin

On Thu, 2 Nov 2023 at 04:24, torikoshia  wrote:

> On 2023-10-31 00:26, Alexander Kukushkin wrote:
> > Hi,
> >
> > On Wed, 18 Oct 2023 at 08:50, torikoshia 
> > wrote:
> >
> >> I have very minor questions on the regression tests mainly regarding
> >> the
> >> consistency with other tests for pg_rewind:
> >
> > Please find attached a new version of the patch. It addresses all your
> > comments.
>
> Thanks for updating the patch!
>
> > +extern void preserve_file(char *filepath);
>
> Is this necessary?
> This function was defined in older version patch, but no longer seems to
> exist.
>
> +# We use "perl -e 'exit(1)'" as a alternative to "false", because the
> last one
> 'a' should be 'an'?
>
>
> --
> Regards,
>
> --
> Atsushi Torikoshi
> NTT DATA Group Corporation
>


-- 
Regards,
--
Alexander Kukushkin
From 9b4e41c91312d8a79a03c929224cd4dfe99e7802 Mon Sep 17 00:00:00 2001
From: Alexander Kukushkin 
Date: Mon, 06 Aug 2023 15:56:39 +0100
Subject: [PATCH] Be more picky with WAL segment deletion in pg_rewind

Make pg_rewind a bit wiser in terms of creating the filemap:
preserve on the target all WAL segments that contain records between the
last common checkpoint and the point of divergence.

Co-authored-by: Polina Bungina 
---
 src/bin/pg_rewind/filemap.c   | 62 +-
 src/bin/pg_rewind/filemap.h   |  3 +
 src/bin/pg_rewind/parsexlog.c | 22 +++
 src/bin/pg_rewind/pg_rewind.c |  3 +
 src/bin/pg_rewind/t/010_keep_recycled_wals.pl | 65 +++
 5 files changed, 153 insertions(+), 2 deletions(-)
 create mode 100644 src/bin/pg_rewind/t/010_keep_recycled_wals.pl

diff --git a/src/bin/pg_rewind/filemap.c b/src/bin/pg_rewind/filemap.c
index ecadd69dc5..ebaebfb149 100644
--- a/src/bin/pg_rewind/filemap.c
+++ b/src/bin/pg_rewind/filemap.c
@@ -64,6 +64,27 @@ static file_entry_t *lookup_filehash_entry(const char *path);
 static int	final_filemap_cmp(const void *a, const void *b);
 static bool check_file_excluded(const char *path, bool is_source);
 
+typedef struct skipwal_t {
+	const char *path;
+	uint32		status;
+} skipwal_t;
+
+#define SH_PREFIX		keepwalhash
+#define SH_ELEMENT_TYPE	skipwal_t
+#define SH_KEY_TYPE		const char *
+#define	SH_KEY			path
+#define SH_HASH_KEY(tb, key)	hash_string_pointer(key)
+#define SH_EQUAL(tb, a, b)		(strcmp(a, b) == 0)
+#define	SH_SCOPE		static inline
+#define SH_RAW_ALLOCATOR	pg_malloc0
+#define SH_DECLARE
+#define SH_DEFINE
+#include "lib/simplehash.h"
+
+static keepwalhash_hash *keepwalhash = NULL;
+
+static bool keepwalhash_entry_exists(const char *path);
+
 /*
  * Definition of one element part of an exclusion list, used to exclude
  * contents when rewinding.  "name" is the name of the file or path to
@@ -207,6 +228,35 @@ lookup_filehash_entry(const char *path)
 	return filehash_lookup(filehash, path);
 }
 
+/* Initialize a hash table to store WAL file names that must be kept */
+void
+keepwalhash_init(void)
+{
+	keepwalhash = keepwalhash_create(FILEHASH_INITIAL_SIZE, NULL);
+}
+
+/* Prevent a given file deletion during rewind */
+void
+insert_keepwalhash_entry(const char *path)
+{
+	skipwal_t   *entry;
+	bool		found;
+
+	/* Should only be called with keepwalhash initialized */
+	Assert(keepwalhash);
+
+	entry = keepwalhash_insert(keepwalhash, path, &found);
+
+	if (!found)
+		entry->path = pg_strdup(path);
+}
+
+static bool
+keepwalhash_entry_exists(const char *path)
+{
+	return keepwalhash_lookup(keepwalhash, path) != NULL;
+}
+
 /*
  * Callback for processing source file list.
  *
@@ -682,7 +732,16 @@ decide_file_action(file_entry_t *entry)
 	}
 	else if (entry->target_exists && !entry->source_exists)
 	{
-		/* File exists in target, but not source. Remove it. */
+		/* File exists in target, but not source. */
+
+		if (keepwalhash_entry_exists(path))
+		{
+			/* This is a WAL file that should be kept. */
+			pg_log_debug("Not removing %s because it is required for recovery", path);
+			return FILE_ACTION_NONE;
+		}
+
+		/* Otherwise remove an unexpected file. */
 		return FILE_ACTION_REMOVE;
 	}
 	else if (!entry->target_exists && !entry->source_exists)
@@ -818,7 +877,6 @@ decide_file_actions(void)
 	return filemap;
 }
 
-
 /*
  * Helper function for filemap hash table.
  */
diff --git a/src/bin/pg_rewind/filemap.h b/src/bin/pg_rewind/filemap.h
index 988d4590e0..611b8cf49e 100644
--- a/src/bin/pg_rewind/filemap.h
+++ b/src/bin/pg_rewind/filemap.h
@@ -110,4 +110,7 @@ extern filemap_t *decide_file_actions(void);
 extern void calculate_totals(filemap_t *filemap);

Re: apply pragma system_header to python headers

2023-11-06 Thread Tom Lane
Peter Eisentraut  writes:
> Analogous to 388e80132c (which was for Perl) but for Python, I propose 
> adding #pragma GCC system_header to plpython.h.  Without it, you get 
> tons of warnings about -Wdeclaration-after-statement, starting with 
> Python 3.12.  (In the past, I have regularly sent feedback to Python to 
> fix their header files, but this is getting old, and we have an easier 
> solution now.)

+1 for the concept --- I was just noticing yesterday that my buildfarm
warning scraping script is turning up some of these.  However, we ought
to try to minimize the amount of our own code that is subject to the
pragma.  So I think a prerequisite ought to be to get this out of
plpython.h:

/*
 * Used throughout, so it's easier to just include it everywhere.
 */
#include "plpy_util.h"

Alternatively, is there a way to reverse the effect of the
pragma after we've included what we need?

(I'm not too happy about the state of plperl.h on this point, either.)

regards, tom lane




Re: Moving forward with TDE [PATCH v3]

2023-11-06 Thread Stephen Frost
Greetings,

Thanks for your feedback on this.

* Andres Freund (and...@anarazel.de) wrote:
> I still am quite quite unconvinced that using the LSN as a nonce is a good
> design decision.

This is a really important part of the overall path to moving this
forward, so I wanted to jump to it and have a specific discussion
around this.  I agree that there are downsides to using the LSN, some of
which we could possibly address (eg: include the timeline ID in the IV),
but others that would be harder to deal with.

The question then is- what's the alternative?

One approach would be to change the page format to include space for an
explicit nonce.  I don't see the community accepting such a new page
format as the only format we support though as that would mean no
pg_upgrade support along with wasted space if TDE isn't being used.
Ideally, we'd instead be able to support multiple page formats where
users could decide when they create their cluster what features they
want- and luckily we even have such an effort underway with patches
posted for review [1].  Certainly, with the base page-special-feature
patch, we could have an option for users to choose that they want a
better nonce than the LSN, or we could bundle that assumption in with,
say, the authenticated-encryption feature (if you want authenticated
encryption, then you want more from the encryption system than the
basics, and therefore we presume you also want a better nonce than the
LSN).

Another approach would be a separate fork, but that then has a number of
downsides too- every write has to touch that too, and a single page of
nonces would cover a pretty large number of pages also.

Ultimately, I agree with you that the LSN isn't perfect and we really
shouldn't be calling it 'ideal' as it isn't, and we can work to fix that
language in the patch, but the lack of any alternative being proposed
that might be acceptable makes this feel a bit like rock management [2].

My apologies if there's something good that's already been specifically
pushed and I just missed it; if so, a link to that suggestion and
discussion would be greatly appreciated.

Thanks again!

Stephen

[1]: https://commitfest.postgresql.org/45/3986/
[2]: https://en.wikipedia.org/wiki/Wikipedia:Bring_me_a_rock ; though
that isn't great for a quick summary (which I tried to find on an
isolated page somewhere and didn't).

The gist is, without a suggestion of things to try, we're left
to our own devices to try and figure out things which might be
successful, only to have those turned down too when we come back with
them, see [1] for what feels like an example of this.  Given your
feedback overall, which I'm very thankful for, I'm hopeful that you see
that this is, indeed, a useful feature that people are asking for and
therefore are willing to spend some time on it, but if the feedback is
that nothing on the page is acceptable to use for the nonce, we can't
put the nonce somewhere else, and we can't change the page format, then
everything else is just moving deck chairs around on the titanic that
has been this effort.


signature.asc
Description: PGP signature


Re: Add recovery to pg_control and remove backup_label

2023-11-06 Thread David Steele

On 11/6/23 01:05, Michael Paquier wrote:

On Fri, Oct 27, 2023 at 10:10:42AM -0400, David Steele wrote:

We are still planning to address this issue in the back branches.


FWIW, redesigning the backend code in charge of doing base backups in
the back branches is out of scope.  Based on a read of the proposed
patch, it includes catalog changes which would require a catversion
bump, so that's not going to work anyway.


I did not mean this patch -- rather some variation of what Thomas has 
been working on, more than likely.


Regards,
-David




Re: Compiling warnings on old GCC

2023-11-06 Thread Tom Lane
Richard Guo  writes:
> On Mon, Nov 6, 2023 at 2:51 PM David Rowley  wrote:
>> There's some relevant discussion in
>> https://postgr.es/m/flat/20220602024243.GJ29853%40telsasoft.com

> It seems that the controversial '-Og' coupled with the old GCC version
> (4.8) makes it not worth fixing.  So please ignore this thread.

Although nobody tried to enunciate a formal policy in the other thread,
I think where we ended up is that we'd consider suppressing warnings
seen with -Og, but only with reasonably-modern compiler versions.
For LTS compilers we are only going to care about warnings seen with
production flags (i.e., -O2 or better).

regards, tom lane




Re: XID formatting and SLRU refactorings (was: Add 64-bit XIDs into PostgreSQL 15)

2023-11-06 Thread Aleksander Alekseev
Alexander,

> > PFE the corrected patchset v58.
>
> I'd like to revive this thread.

Many thanks for your comments and suggestions.

> I think it worth adding asserts here to verify there is no overflow making us 
> mapping different segments into the same files.

Sorry, I didn't understand this one. Maybe you could provide the exact code?

-- 
Best regards,
Aleksander Alekseev




Re: XID formatting and SLRU refactorings (was: Add 64-bit XIDs into PostgreSQL 15)

2023-11-06 Thread Aleksander Alekseev
Hi,

> > > If I remember right, the compiler will make equivalent code from
> > > inline functions and macros, and functions has an additional benefit:
> > > the compiler will report type mismatch if any. That was the only
> > > reason.
> >
> > Then it's OK to leave it as an inline function.

+1

> > BTW, I'm a bit puzzled on who should be the first author now?
> Thanks! It's long, I agree, but the activity around this was huge and
> involved many people, the patch itself already has more than
> half-hundred iterations to date. I think at least people mentioned in
> the commit message of v58 are fair to have author status.
>
> As for the first, I'm not sure, it's hard for me to evaluate what is
> more important, the initial prototype, or the final improvement
> iterations. I don't think the existing order in a commit message has
> some meaning. Maybe it's worth throwing a dice.

Personally I was not aware that the order of the authors in a commit
message carries any information. To my knowledge it does not. I
believe this patchset is a team effort, so I suggest keeping the
commit message as is or shuffling the authors randomly. I believe we
should do our best to reflect the input of people who previously
contributed to the effort, if anyone is aware of them, and add them
to the commit message if they are not there yet. Pretty sure Git will
forgive us if the list ends up being long. Hopefully so will the
people we mistakenly forget.

-- 
Best regards,
Aleksander Alekseev




Re: XID formatting and SLRU refactorings (was: Add 64-bit XIDs into PostgreSQL 15)

2023-11-06 Thread Pavel Borisov
On Mon, 6 Nov 2023 at 18:01, Alexander Korotkov  wrote:
>
> On Mon, Nov 6, 2023 at 3:42 PM Pavel Borisov  wrote:
> >
> > On Mon, 6 Nov 2023 at 17:07, Alexander Korotkov  
> > wrote:
> > > On Wed, Jul 5, 2023 at 4:46 PM Aleksander Alekseev 
> > >  wrote:
> > > > PFE the corrected patchset v58.
> > >
> > > I'd like to revive this thread.
> > >
> > > This patchset is extracted from a larger patchset implementing 64-bit 
> > > xids.  It converts page numbers in SLRUs into 64 bits.  Most SLRUs keep 
> > > the same file naming scheme, thus their on-disk representation 
> > > remains the same.  However, the patch 0002 converts pg_notify to actually 
> > > use a new naming scheme.  Therefore pg_notify can benefit from 
> > > simplification and getting rid of wraparound.
> > >
> > > -#define TransactionIdToCTsPage(xid) \
> > > -   ((xid) / (TransactionId) COMMIT_TS_XACTS_PER_PAGE)
> > > +
> > > +/*
> > > + * Although we return an int64 the actual value can't currently exceeed 
> > > 2**32.
> > > + */
> > > +static inline int64
> > > +TransactionIdToCTsPage(TransactionId xid)
> > > +{
> > > +   return xid / (int64) COMMIT_TS_XACTS_PER_PAGE;
> > > +}
> > >
> > > Is there any reason we transform macro into a function?  If not, I 
> > > propose to leave this as a macro.  BTW, there is a typo in a word 
> > > "exceeed".
> > If I remember right, the compiler will make equivalent code from
> > inline functions and macros, and functions have an additional benefit:
> > the compiler will report type mismatch if any. That was the only
> > reason.
>
> Then it's OK to leave it as an inline function.
>
> > Also, I looked at v58-0001 and don't quite agree with mentioning the
> > authors of the original 64-xid patch, from which the current patch is
> > derived, as just "privious input" persons.
>
> +1, for converting all "previous input" persons as additional authors.
> That would be a pretty long list of authors though.
> BTW, I'm a bit puzzled on who should be the first author now?
Thanks! It's long, I agree, but the activity around this was huge and
involved many people, the patch itself already has more than
half-hundred iterations to date. I think at least people mentioned in
the commit message of v58 are fair to have author status.

As for the first, I'm not sure, it's hard for me to evaluate what is
more important, the initial prototype, or the final improvement
iterations. I don't think the existing order in a commit message has
some meaning. Maybe it's worth throwing a dice.

Regards, Pavel




Re: Intermittent failure with t/003_logical_slots.pl test on windows

2023-11-06 Thread Nisha Moond
On Fri, Nov 3, 2023 at 5:02 PM Nisha Moond  wrote:
>
> On Thu, Nov 2, 2023 at 11:52 AM Kyotaro Horiguchi
>  wrote:
> >
> > At Tue, 31 Oct 2023 18:11:48 +0530, vignesh C  wrote in
> > > Few others are also facing this problem with similar code like in:
> > > https://stackoverflow.com/questions/15882799/fgets-returning-error-for-file-returned-by-popen
> >
> > I'm inclined to believe that the pipe won't enter the EOF state until
> > the target command terminates (then the top-level cmd.exe). The target
> > command likely terminated prematurely due to an error before priting
> > any output.
> >
> > If we append "2>&1" to the command line, we can capture the error
> > message through the returned pipe if any. Such error messages will
> > cause the subsequent code to fail with an error such as "unexpected
> > string: 'the output'". I'm not sure, but if this is permissive, the
> > returned error messages could potentially provide insight into the
> > underlying issue, paving the way for a potential solution.
> >
>
> Appending '2>&1' test:
> The command still results in NULL and ends up failing as no data is
> returned. Which means even no error message is returned. The error log
> with appended '2>&1' is -
>
> no data was returned by command
> ""D:/Project/pg1/postgres/tmp_install/bin/pg_resetwal" -V 2>&1"
>
> check for "D:/Project/pg1/postgres/tmp_install/bin/pg_resetwal"
> failed: cannot execute
> Failure, exiting
>
> Further observations:
> 1. To make sure the forked process completes before fgets(), I tested
> with Sleep(100) before fgets() call.
> ...
> ...
> if ((pgver = popen(cmd, "r")) == NULL)
> {
> perror("popen failure");
> return NULL;
> }
>
> errno = 0;
> Sleep(100);
> if (fgets(line, maxsize, pgver) == NULL)
> {
> if (feof(pgver))
> fprintf(stderr, "no data was returned by command \"%s\"\n", cmd);
> ...
> ...
>
> This also doesn't resolve the issue, the error is still seen intermittently.
>
> 2.  Even though fgets() fails, the output is still getting captured in
> 'line' string.
> Tested with printing the 'line' in case of failure:
> ...
> ...
> if ((pgver = popen(cmd, "r")) == NULL)
> {
> perror("popen failure");
> return NULL;
> }
>
> errno = 0;
> if (fgets(line, maxsize, pgver) == NULL)
> {
> if (line)
>   fprintf(stderr, "cmd output - %s\n", line);
>
> if (feof(pgver))
>   fprintf(stderr, "no data was returned by command \"%s\"\n", cmd);
> …
> …
> And the log looks like -
> cmd output - postgres (PostgreSQL) 17devel
> no data was returned by command
> ""D:/Project/pg1/postgres/tmp_install/bin/pg_controldata" -V"
>
> check for "D:/Project/pg1/postgres/tmp_install/bin/pg_controldata"
> failed: cannot execute
> Failure, exiting
>
> Attached test result log for the same - "regress_log_003_logical_slots".

The same failure is observed intermittently with the t/002_pg_upgrade.pl
test too, so it is not limited to the t/003_logical_slots.pl test alone.
It appears to be associated with running the pg_upgrade command itself.

--
Thanks,
Nisha Moond




Re: XID formatting and SLRU refactorings (was: Add 64-bit XIDs into PostgreSQL 15)

2023-11-06 Thread Alexander Korotkov
On Mon, Nov 6, 2023 at 3:42 PM Pavel Borisov  wrote:
>
> On Mon, 6 Nov 2023 at 17:07, Alexander Korotkov  wrote:
> > On Wed, Jul 5, 2023 at 4:46 PM Aleksander Alekseev 
> >  wrote:
> > > PFE the corrected patchset v58.
> >
> > I'd like to revive this thread.
> >
> > This patchset is extracted from a larger patchset implementing 64-bit xids. 
> >  It converts page numbers in SLRUs into 64 bits.  Most SLRUs keep the 
> > same file naming scheme, thus their on-disk representation remains the 
> > same.  However, the patch 0002 converts pg_notify to actually use a new 
> > naming scheme.  Therefore pg_notify can benefit from simplification and 
> > getting rid of wraparound.
> >
> > -#define TransactionIdToCTsPage(xid) \
> > -   ((xid) / (TransactionId) COMMIT_TS_XACTS_PER_PAGE)
> > +
> > +/*
> > + * Although we return an int64 the actual value can't currently exceeed 
> > 2**32.
> > + */
> > +static inline int64
> > +TransactionIdToCTsPage(TransactionId xid)
> > +{
> > +   return xid / (int64) COMMIT_TS_XACTS_PER_PAGE;
> > +}
> >
> > Is there any reason we transform macro into a function?  If not, I propose 
> > to leave this as a macro.  BTW, there is a typo in a word "exceeed".
> If I remember right, the compiler will make equivalent code from
inline functions and macros, and functions have an additional benefit:
> the compiler will report type mismatch if any. That was the only
> reason.

Then it's OK to leave it as an inline function.
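For what it's worth, the type-checking benefit can be shown with a tiny standalone sketch (simplified types and a made-up entries-per-page constant, not the actual PostgreSQL definitions):

```c
#include <assert.h>
#include <stdint.h>

#define XACTS_PER_PAGE 819		/* illustrative value only */

/* Macro version: silently accepts any arithmetic argument. */
#define TransactionIdToPageMacro(xid) \
	((xid) / (int64_t) XACTS_PER_PAGE)

/* Inline-function version: the parameter has a declared type, so the
 * compiler can diagnose a mismatched argument at the call site. */
static inline int64_t
TransactionIdToPage(uint32_t xid)
{
	return xid / (int64_t) XACTS_PER_PAGE;
}
```

With the macro, a bad argument type only surfaces (if at all) inside the expansion; with the function, the diagnostic points directly at the caller.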

> Also, I looked at v58-0001 and don't quite agree with mentioning the
> authors of the original 64-xid patch, from which the current patch is
> derived, as just "privious input" persons.

+1 for converting all "previous input" persons to additional authors.
That would be a pretty long list of authors though.

BTW, I'm a bit puzzled on who should be the first author now?

--
Regards,
Alexander Korotkov




Re: POC, WIP: OR-clause support for indexes

2023-11-06 Thread Alena Rybakina

On 30.10.2023 17:06, Alexander Korotkov wrote:

On Mon, Oct 30, 2023 at 3:40 PM Robert Haas  wrote:

On Thu, Oct 26, 2023 at 5:05 PM Peter Geoghegan  wrote:

On Thu, Oct 26, 2023 at 12:59 PM Robert Haas  wrote:

Alexander's example seems to show that it's not that simple. If I'm
reading his example correctly, with things like aid = 1, the
transformation usually wins even if the number of things in the OR
expression is large, but with things like aid + 1 * bid = 1, the
transformation seems to lose at least with larger numbers of items. So
it's not JUST the number of OR elements but also what they contain,
unless I'm misunderstanding his point.

Alexander said "Generally, I don't see why ANY could be executed
slower than the equivalent OR clause". I understood that this was his
way of expressing the following idea:

"In principle, there is no reason to expect execution of ANY() to be
slower than execution of an equivalent OR clause (except for
noise-level differences). While it might not actually look that way
for every single type of plan you can imagine right now, that doesn't
argue for making a cost-based decision. It actually argues for fixing
the underlying issue, which can't possibly be due to some kind of
fundamental advantage enjoyed by expression evaluation with ORs".

This is also what I think of all this.

I agree with that, with some caveats, mainly that the reverse is to
some extent also true. Maybe not completely, because arguably the
ANY() formulation should just be straight-up easier to deal with, but
in principle, the two are equivalent and it shouldn't matter which
representation we pick.

But practically, it may, and we need to be sure that we don't put in
place a translation that is theoretically a win but in practice leads
to large regressions. Avoiding regressions here is more important than
capturing all the possible gains. A patch that wins in some scenarios
and does nothing in others can be committed; a patch that wins in even
more scenarios but causes serious regressions in some cases probably
can't.

+1
Sure, I've identified two cases where the patch shows a regression [1].  The
first one (quadratic complexity of expression processing) should be
already addressed by usage of hash.  The second one (planning
regression with Bitmap OR) is not yet addressed.

Links
1. 
https://www.postgresql.org/message-id/CAPpHfduJtO0s9E%3DSHUTzrCD88BH0eik0UNog1_q3XBF2wLmH6g%40mail.gmail.com

I also support this approach. I have almost finished writing a patch 
that fixes the first problem related to the quadratic complexity of 
processing expressions by adding a hash table.


I also added a check: if the number of groups is equal to the number of 
OR expressions, we assume that no expressions need to be converted and 
interrupt further execution.
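For reference, here is a hypothetical standalone sketch of the single-pass grouping idea, keying each OR arm by its textual left-hand expression. It is not the actual patch code, which works on parse nodes rather than strings:

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

#define NBUCKETS 64				/* enough for a small demo */

typedef struct HashSet
{
	const char *keys[NBUCKETS];
} HashSet;

static unsigned
hash_str(const char *s)
{
	unsigned	h = 5381;

	while (*s)
		h = h * 33 + (unsigned char) *s++;
	return h;
}

/* Open-addressing insert; returns true if the key was not seen before. */
static bool
hashset_add(HashSet *set, const char *key)
{
	unsigned	i = hash_str(key) % NBUCKETS;

	while (set->keys[i])
	{
		if (strcmp(set->keys[i], key) == 0)
			return false;		/* duplicate left-hand expression */
		i = (i + 1) % NBUCKETS;
	}
	set->keys[i] = key;
	return true;
}

/* Count distinct left-hand expressions among n OR arms in one pass. */
static int
count_or_groups(const char **lhs, int n)
{
	HashSet		set = {{0}};
	int			groups = 0;

	for (int i = 0; i < n; i++)
		if (hashset_add(&set, lhs[i]))
			groups++;
	return groups;
}
```

When count_or_groups() returns n, every arm has a distinct left-hand side, so there is nothing to merge and the transformation can be skipped early.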


Now I am trying to fix the last problem in this patch: three tests have 
indicated a problem related to incorrect conversion. I don't think it 
can be serious, but I haven't figured out where the mistake is yet.


The error in the log looks like this: ERROR:  unrecognized node type: 0.

--
Regards,
Alena Rybakina
Postgres Professional
From 3062a2408af258b9e327219020221b75501e8530 Mon Sep 17 00:00:00 2001
From: Alena Rybakina 
Date: Mon, 6 Nov 2023 16:48:00 +0300
Subject: [PATCH] Replace OR clauses with ANY expressions. Replace (X=N1) OR
 (X=N2) ... with X = ANY(N1, N2) at the optimizer stage, while we are
 still working with the expression tree. Firstly, we do not attempt the
 transformation for "non-or" expressions or inequalities, and a relation
 with "or" expressions is created according to the same scenario as before.
 Secondly, we do not transform if there are fewer expressions than
 or_transform_limit. Thirdly, note that we consider "or"
 expressions only at the current level.

Authors: Alena Rybakina , Andrey Lepikhov 
Reviewed-by: Peter Geoghegan , Ranier Vilela 
Reviewed-by: Alexander Korotkov , Robert Haas 
---
 src/backend/parser/parse_expr.c   | 299 +-
 src/backend/utils/misc/guc_tables.c   |  10 +
 src/include/parser/parse_expr.h   |   1 +
 src/test/regress/expected/create_index.out| 115 +++
 src/test/regress/expected/guc.out |   3 +-
 src/test/regress/expected/join.out|  50 +++
 src/test/regress/expected/partition_prune.out | 179 +++
 src/test/regress/expected/tidscan.out |  17 +
 src/test/regress/sql/create_index.sql |  32 ++
 src/test/regress/sql/join.sql |  10 +
 src/test/regress/sql/partition_prune.sql  |  22 ++
 src/test/regress/sql/tidscan.sql  |   6 +
 src/tools/pgindent/typedefs.list  |   1 +
 13 files changed, 743 insertions(+), 2 deletions(-)

diff --git a/src/backend/parser/parse_expr.c b/src/backend/parser/parse_expr.c
index 64c582c344c..25a4235dbd9 100644
--- a/src/backend/parser/parse_expr.c
+++ b/src/backend/parser/parse_expr.c
@@ -18,6 +18,7 @@
 #include "catalog/pg_aggregate.h

Re: XID formatting and SLRU refactorings (was: Add 64-bit XIDs into PostgreSQL 15)

2023-11-06 Thread Pavel Borisov
On Mon, 6 Nov 2023 at 17:07, Alexander Korotkov  wrote:
>
> Hi!
>
> On Wed, Jul 5, 2023 at 4:46 PM Aleksander Alekseev  
> wrote:
> > PFE the corrected patchset v58.
>
> I'd like to revive this thread.
>
> This patchset is extracted from a larger patchset implementing 64-bit xids.  
> It converts page numbers in SLRUs into 64 bits.  Most SLRUs keep the same 
> file naming scheme, thus their on-disk representation remains the same.  
> However, the patch 0002 converts pg_notify to actually use a new naming 
> scheme.  Therefore pg_notify can benefit from simplification and getting rid 
> of wraparound.
>
> -#define TransactionIdToCTsPage(xid) \
> -   ((xid) / (TransactionId) COMMIT_TS_XACTS_PER_PAGE)
> +
> +/*
> + * Although we return an int64 the actual value can't currently exceeed 
> 2**32.
> + */
> +static inline int64
> +TransactionIdToCTsPage(TransactionId xid)
> +{
> +   return xid / (int64) COMMIT_TS_XACTS_PER_PAGE;
> +}
>
> Is there any reason we transform macro into a function?  If not, I propose to 
> leave this as a macro.  BTW, there is a typo in a word "exceeed".
If I remember right, the compiler will make equivalent code from
inline functions and macros, and functions have an additional benefit:
the compiler will report type mismatch if any. That was the only
reason.

Also, I looked at v58-0001 and don't quite agree with mentioning the
authors of the original 64-xid patch, from which the current patch is
derived, as just "privious input" persons.

Regards, Pavel Borisov




Re: apply pragma system_header to python headers

2023-11-06 Thread Andrew Dunstan



On 2023-11-06 Mo 07:02, Peter Eisentraut wrote:
Analogous to 388e80132c (which was for Perl) but for Python, I propose 
adding #pragma GCC system_header to plpython.h.  Without it, you get 
tons of warnings about -Wdeclaration-after-statement, starting with 
Python 3.12.  (In the past, I have regularly sent feedback to Python 
to fix their header files, but this is getting old, and we have an 
easier solution now.)



WFM


cheers


andrew

--
Andrew Dunstan
EDB: https://www.enterprisedb.com





Wrong security context for deferred triggers?

2023-11-06 Thread Laurenz Albe
Create a table and a deferrable constraint trigger:

 CREATE TABLE tab (i integer);

 CREATE FUNCTION trig() RETURNS trigger
LANGUAGE plpgsql AS
 $$BEGIN
RAISE NOTICE 'current_user = %', current_user;
RETURN NEW;
 END;$$;

 CREATE CONSTRAINT TRIGGER trig AFTER INSERT ON tab
DEFERRABLE INITIALLY IMMEDIATE
FOR EACH ROW EXECUTE FUNCTION trig();

Create a role and allow it INSERT on the table:

 CREATE ROLE duff;

 GRANT INSERT ON tab TO duff;

Now become that role and try some inserts:

 SET ROLE duff;

 BEGIN;

 INSERT INTO tab VALUES (1);
 NOTICE:  current_user = duff

That looks ok; the current user is "duff".

 SET CONSTRAINTS ALL DEFERRED;

 INSERT INTO tab VALUES (2);

Become a superuser again and commit:

 RESET ROLE;

 COMMIT;
 NOTICE:  current_user = postgres


So a deferred constraint trigger does not run with the same security context
as an immediate trigger.  This is somewhat nasty in combination with
SECURITY DEFINER functions: if that function performs an operation, and that
operation triggers a deferred trigger, that trigger will run in the wrong
security context.

This behavior looks buggy to me.  What do you think?
I cannot imagine that it is a security problem, though.

Yours,
Laurenz Albe




Re: Optimizing "boundary cases" during backward scan B-Tree index descents

2023-11-06 Thread Matthias van de Meent
On Sun, 15 Oct 2023 at 22:56, Peter Geoghegan  wrote:
>
> On Mon, Sep 18, 2023 at 4:58 PM Peter Geoghegan  wrote:
> > Attached is v3, which is a straightforward rebase of v2. v3 is needed
> > to get the patch to apply cleanly against HEAD - so no real changes
> > here.
>
> Attached is v4. Just to keep CFTester happy.

> @@ -402,10 +405,27 @@ _bt_binsrch(Relation rel,
> +if (unlikely(key->backward))
> +return OffsetNumberPrev(low);
> +
> return low;

I wonder if this is (or can be) optimized to the mostly equivalent
"return low - (OffsetNumber) key->backward", as that would remove an
"unlikely" branch that isn't very unlikely during page deletion, even
if page deletion by itself is quite rare.
I'm not sure it's worth the additional cognitive overhead, or if there
are any significant performance implications for the hot path.
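For illustration, both forms can be compared in a standalone sketch (simplified types, not the actual nbtree code); the branch-free variant relies on bool promoting to 0 or 1:

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

typedef uint16_t OffsetNumber;

#define OffsetNumberPrev(off) ((OffsetNumber) ((off) - 1))

/* Conditional form, as in the patch. */
static OffsetNumber
binsrch_result_branch(OffsetNumber low, bool backward)
{
	if (backward)
		return OffsetNumberPrev(low);
	return low;
}

/* Branch-free form proposed above: subtracting the bool (0 or 1)
 * yields the same result without a conditional jump. */
static OffsetNumber
binsrch_result_branchless(OffsetNumber low, bool backward)
{
	return low - (OffsetNumber) backward;
}
```

Whether a modern compiler already turns the conditional into a cmov here would need to be checked against the generated assembly; the two functions are semantically identical either way.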

> @@ -318,9 +318,12 @@ _bt_moveright(Relation rel,
> [...]
>  * On a leaf page, _bt_binsrch() returns the OffsetNumber of the first
> [...]
> + * key >= given scankey, or > scankey if nextkey is true for forward scans.
> + * _bt_binsrch() also "steps back" by one item/tuple on the leaf level in the
> + * case of backward scans.  (NOTE: this means it is possible to return a 
> value
> + * that's 1 greater than the number of keys on the leaf page.  It also means
> + * that we can return an item 1 less than the first non-pivot tuple on any
> + * leaf page.)

I think this can use a bit more wordsmithing: the use of "also" with
"steps back" implies we also step back in other cases, which aren't
mentioned. Could you update the wording to be more clear about this?

> @@ -767,7 +787,7 @@ _bt_compare(Relation rel,
> [...]
> - * Most searches have a scankey that is considered greater than a
> + * Forward scans have a scankey that is considered greater than a

Although it's not strictly an issue for this patch, the comment here
doesn't describe backward scans in as much detail as forward scans
here. The concepts are mostly "do the same but in reverse", but the
difference is noticeable.

Apart from these comments, no further noteworthy comments. Looks good.

Kind regards,

Matthias van de Meent
Neon (https://neon.tech)




Re: XID formatting and SLRU refactorings (was: Add 64-bit XIDs into PostgreSQL 15)

2023-11-06 Thread Alexander Korotkov
Hi!

On Wed, Jul 5, 2023 at 4:46 PM Aleksander Alekseev 
wrote:
> PFE the corrected patchset v58.

I'd like to revive this thread.

This patchset is extracted from a larger patchset implementing 64-bit
xids.  It converts page numbers in SLRUs into 64 bits.  Most SLRUs keep
the same file naming scheme, thus their on-disk representation remains the
same.  However, the patch 0002 converts pg_notify to actually use a new
naming scheme.  Therefore pg_notify can benefit from simplification and
getting rid of wraparound.

-#define TransactionIdToCTsPage(xid) \
-   ((xid) / (TransactionId) COMMIT_TS_XACTS_PER_PAGE)
+
+/*
+ * Although we return an int64 the actual value can't currently exceeed
2**32.
+ */
+static inline int64
+TransactionIdToCTsPage(TransactionId xid)
+{
+   return xid / (int64) COMMIT_TS_XACTS_PER_PAGE;
+}

Is there any reason we transform macro into a function?  If not, I propose
to leave this as a macro.  BTW, there is a typo in a word "exceeed".

+static int inline
+SlruFileName(SlruCtl ctl, char *path, int64 segno)
+{
+   if (ctl->long_segment_names)
+   /*
+* We could use 16 characters here but the disadvantage would be
that
+* the SLRU segments will be hard to distinguish from WAL segments.
+*
+* For this reason we use 15 characters. It is enough but also means
+* that in the future we can't decrease SLRU_PAGES_PER_SEGMENT
easily.
+*/
+   return snprintf(path, MAXPGPATH, "%s/%015llX", ctl->Dir,
+   (long long) segno);
+   else
+   return snprintf(path, MAXPGPATH, "%s/%04X", (ctl)->Dir,
+   (unsigned int) segno);
+}

I think it is worth adding asserts here to verify there is no overflow that
would make us map different segments to the same file.
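Something along these lines, perhaps (a simplified standalone sketch with made-up directory names, not the actual slru.c code): the short format must not let the cast to unsigned int truncate the segment number, and the long format must fit in 15 hex digits:

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <stdio.h>

#define MAXPGPATH 1024

/* Hypothetical sketch of SlruFileName() with the overflow checks
 * suggested above; directory names are placeholders. */
static int
slru_file_name(char *path, int64_t segno, bool long_segment_names)
{
	if (long_segment_names)
	{
		/* 15 hex digits can represent at most 2^60 - 1 */
		assert(segno >= 0 && segno < ((int64_t) 1 << 60));
		return snprintf(path, MAXPGPATH, "pg_notify/%015llX",
						(long long) segno);
	}
	else
	{
		/* the cast to unsigned int must not truncate the segment number */
		assert(segno >= 0 && segno <= (int64_t) UINT32_MAX);
		return snprintf(path, MAXPGPATH, "pg_xact/%04X",
						(unsigned int) segno);
	}
}
```

Without the second assert, two segments whose numbers differ only above bit 31 would silently map to the same file name.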

+   return occupied == max_notify_queue_pages;

I'm not sure if the current code could actually allow occupying more than
max_notify_queue_pages.  Probably not, even in extreme cases.  But I still
think it would be safer and easier to read to write "occupied >=
max_notify_queue_pages" here.

diff --git a/src/test/modules/test_slru/test_slru.c
b/src/test/modules/test_slru/test_slru.c

The actual 64-bitness of SLRU pages isn't much exercised in our automated
tests.  It would be too expensive to make pg_notify actually use page
numbers higher than 2**32.  Thus, I think test/modules/test_slru is a good
place to give high page numbers a good test.

--
Regards,
Alexander Korotkov


Re: Show WAL write and fsync stats in pg_stat_io

2023-11-06 Thread Nazir Bilal Yavuz
Hi,

On Tue, 31 Oct 2023 at 16:57, Nazir Bilal Yavuz  wrote:
> On Thu, 26 Oct 2023 at 09:28, Michael Paquier  wrote:
> >
> > And perhaps just putting that everything that calls
> > pgstat_count_io_op_time() under track_io_timing is just natural?
> > What's the performance regression you would expect if both WAL and
> > block I/O are controlled by that, still one would expect only one of
> > them?
>
> I will check these and I hope I will come back with something meaningful.

I applied the patches on upstream postgres and then ran pgbench for each
available clock source a couple of times:
# Set fsync = off and track_io_timing = on
# pgbench -i -s 100 test
# pgbench -M prepared -c16 -j8 -f <( echo "SELECT
pg_logical_emit_message(true, \:client_id::text, '1234567890');") -T60 test

Results are:

╔═════════╦═══════════════╦═══════════════╦════════╗
║  clock  ║  track_wal_io ║  track_wal_io ║ change ║
║ source  ║  _timing = on ║ _timing = off ║        ║
╠═════════╬═══════════════╬═══════════════╬════════╣
║   tsc   ║ 514814.459170 ║ 519826.284139 ║   1%   ║
╠═════════╬═══════════════╬═══════════════╬════════╣
║   hpet  ║ 132116.272121 ║ 141820.548447 ║   7%   ║
╠═════════╬═══════════════╬═══════════════╬════════╣
║ acpi_pm ║ 394793.092255 ║ 403723.874719 ║   2%   ║
╚═════════╩═══════════════╩═══════════════╩════════╝

Regards,
Nazir Bilal Yavuz
Microsoft


Re: Synchronizing slots from primary to standby

2023-11-06 Thread Amit Kapila
On Mon, Nov 6, 2023 at 1:57 PM Amit Kapila  wrote:
>
> On Mon, Nov 6, 2023 at 7:01 AM Zhijie Hou (Fujitsu)
>  wrote:
> >

+static void
+WalSndGetStandbySlots(List **standby_slots, bool force)
+{
+ if (!MyReplicationSlot->data.failover)
+ return;
+
+ if (standby_slot_names_list == NIL && strcmp(standby_slot_names, "") != 0)
+ SlotSyncInitConfig();
+
+ if (force || StandbySlotNamesPreReload == NULL ||
+ strcmp(StandbySlotNamesPreReload, standby_slot_names) != 0)
+ {
+ list_free(*standby_slots);
+
+ if (StandbySlotNamesPreReload)
+ pfree(StandbySlotNamesPreReload);
+
+ StandbySlotNamesPreReload = pstrdup(standby_slot_names);
+ *standby_slots = list_copy(standby_slot_names_list);
+ }
+}

I find this code a bit difficult to understand. I think we don't need to
maintain a global variable like StandbySlotNamesPreReload. We can use
a local variable for it on the lines of what we do in
StartupRereadConfig(). Then, can we think of maintaining
standby_slot_names_list in something related to decoding like
LogicalDecodingContext as this will be used during decoding only?

-- 
With Regards,
Amit Kapila.




apply pragma system_header to python headers

2023-11-06 Thread Peter Eisentraut
Analogous to 388e80132c (which was for Perl) but for Python, I propose 
adding #pragma GCC system_header to plpython.h.  Without it, you get 
tons of warnings about -Wdeclaration-after-statement, starting with 
Python 3.12.  (In the past, I have regularly sent feedback to Python to 
fix their header files, but this is getting old, and we have an easier 
solution now.)From 7ca9e82d3c5e30ea21fe5d428b3fa0d6e5b3e942 Mon Sep 17 00:00:00 2001
From: Peter Eisentraut 
Date: Mon, 6 Nov 2023 12:57:54 +0100
Subject: [PATCH] python: Hide warnings inside Python headers when using gcc
 compatible compiler

Analogous to 388e80132c but for Python.  This is necessary from Python
3.12 to get a warning-free build.
---
 src/pl/plpython/plpython.h | 10 ++
 1 file changed, 10 insertions(+)

diff --git a/src/pl/plpython/plpython.h b/src/pl/plpython/plpython.h
index 91f6f8d860..eb01e62fb6 100644
--- a/src/pl/plpython/plpython.h
+++ b/src/pl/plpython/plpython.h
@@ -25,6 +25,16 @@
  */
 #define HAVE_SNPRINTF 1
 
+/*
+ * Newer versions of the Python headers trigger a lot of warnings with our
+ * compiler flags (at least -Wdeclaration-after-statement is known to be
+ * problematic). The system_header pragma hides warnings from within the rest
+ * of this file, if supported.
+ */
+#ifdef HAVE_PRAGMA_GCC_SYSTEM_HEADER
+#pragma GCC system_header
+#endif
+
 #if defined(_MSC_VER) && defined(_DEBUG)
 /* Python uses #pragma to bring in a non-default libpython on VC++ if
  * _DEBUG is defined */
-- 
2.42.0



Re: SLRU optimization - configurable buffer pool and partitioning the SLRU lock

2023-11-06 Thread Dilip Kumar
On Mon, Nov 6, 2023 at 4:44 PM Andrey M. Borodin  wrote:
> > On 6 Nov 2023, at 14:31, Alvaro Herrera  wrote:
> >
> > dynahash is notoriously slow, which is why we have simplehash.h since
> > commit b30d3ea824c5.  Maybe we could use that instead.
>
> Dynahash has lock partitioning. Simplehash has not, AFAIK.

Yeah, Simplehash doesn't have partitioning, so with simplehash we would
be stuck with the centralized control lock, which is one of the main
problems we are trying to solve here.

-- 
Regards,
Dilip Kumar
EnterpriseDB: http://www.enterprisedb.com




Re: [PATCH] Add XMLText function (SQL/XML X038)

2023-11-06 Thread Jim Jones


On 06.11.23 11:49, Daniel Gustafsson wrote:
> I took another look at this today, fixes the above mentioned typos and some
> tiny cosmetic things and pushed it.
>
> --
> Daniel Gustafsson
>
Awesome! Thanks Daniel and Vik for reviewing and pushing this patch :)

-- 
Jim





Re: SLRU optimization - configurable buffer pool and partitioning the SLRU lock

2023-11-06 Thread Andrey M. Borodin



> On 6 Nov 2023, at 14:31, Alvaro Herrera  wrote:
> 
> dynahash is notoriously slow, which is why we have simplehash.h since
> commit b30d3ea824c5.  Maybe we could use that instead.

Dynahash has lock partitioning. Simplehash has not, AFAIK.
The thing is we do not really need a hash function - pageno is already the best 
hash function itself. And we do not need to cope with collisions much - we can 
evict a collided buffer.

Given this we do not need a hashtable at all. That's exactly the reasoning that 
led to banks: I started implementing the dynahash patch in April 2021 and found 
out that the “banks” approach is cleaner. However, the term “bank” is not common 
in software; it's taken from hardware caches.
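For readers new to the thread, the bank idea can be sketched like this (hypothetical simplified code, not the actual patch): the page number itself picks the bank, and lookup and eviction are confined to that bank's slots:

```c
#include <assert.h>
#include <stdint.h>

#define BANK_SIZE 8				/* slots per bank (illustrative) */
#define NBANKS    16			/* number of banks (illustrative) */

/* The page number itself acts as the hash: it selects the bank. */
static int
page_to_bank(int64_t pageno)
{
	return (int) (pageno % NBANKS);
}

/* A lookup only ever scans one bank's worth of slots, starting here;
 * on a miss, a victim inside the same bank is evicted, so no global
 * hash table (and no global lock) is needed. */
static int
bank_first_slot(int64_t pageno)
{
	return page_to_bank(pageno) * BANK_SIZE;
}
```

The locking then naturally partitions per bank, which is the property dynahash provides but simplehash does not.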


Best regards, Andrey Borodin.



Re: Statistics Import and Export

2023-11-06 Thread Shubham Khanna
On Mon, Nov 6, 2023 at 4:16 PM Corey Huinker  wrote:
>>
>>
>> Yeah, that use makes sense as well, and if so then postgres_fdw would likely 
>> need to be aware of the appropriate query for several versions back - they 
>> change, not by much, but they do change. So now we'd have each query text in 
>> three places: a system view, postgres_fdw, and the bin/scripts pre-upgrade 
>> program. So I probably should consider the best way to share those in the 
>> codebase.
>>
>
> Attached is v2 of this patch.

While applying the patches, I noticed a few indentation issues:
1) D:\Project\Postgres>git am v2-0003-Add-pg_import_rel_stats.patch
.git/rebase-apply/patch:1265: space before tab in indent.
errmsg("invalid statistics
format, stxndeprs must be array or null");
.git/rebase-apply/patch:1424: trailing whitespace.
   errmsg("invalid statistics format,
stxndistinct attnums elements must be strings, but one is %s",
.git/rebase-apply/patch:1315: new blank line at EOF.
+
warning: 3 lines add whitespace errors.
Applying: Add pg_import_rel_stats().

2) D:\Project\Postgres>git am v2-0004-Add-pg_export_stats-pg_import_stats.patch
.git/rebase-apply/patch:282: trailing whitespace.
const char *export_query_v14 =
.git/rebase-apply/patch:489: trailing whitespace.
const char *export_query_v12 =
.git/rebase-apply/patch:648: trailing whitespace.
const char *export_query_v10 =
.git/rebase-apply/patch:826: trailing whitespace.

.git/rebase-apply/patch:1142: trailing whitespace.
result = PQexec(conn,
warning: squelched 4 whitespace errors
warning: 9 lines add whitespace errors.
Applying: Add pg_export_stats, pg_import_stats.

Thanks and Regards,
Shubham Khanna.




Re: [PATCH] Add XMLText function (SQL/XML X038)

2023-11-06 Thread Daniel Gustafsson
> On 4 Nov 2023, at 15:01, Vik Fearing  wrote:
> 
> On 11/3/23 21:28, Jim Jones wrote:
>> On 03.11.23 19:05, Vik Fearing wrote:
>>> I was thinking of something much shorter than that.  Such as
>>> 
>>> X038XMLText YES supported except for RETURNING
>> v6 attached includes this change and the doc addition from Daniel.
> 
> There are some typos in the commit message, but otherwise this looks 
> commitable to me.

I took another look at this today, fixes the above mentioned typos and some
tiny cosmetic things and pushed it.

--
Daniel Gustafsson





Re: ResourceOwner refactoring

2023-11-06 Thread Peter Eisentraut
It looks like this patch set needs a bit of surgery to adapt to the LLVM 
changes in 9dce22033d.  The cfbot is reporting compiler warnings about 
this, and also some crashes, which might also be caused by this.


I do like the updated APIs.  (Maybe the repeated ".DebugPrint = NULL, 
 /* default message is fine */" lines could be omitted?)


I like that one can now easily change the elog(WARNING) in 
ResourceOwnerReleaseAll() to a PANIC or something to get automatic 
verification during testing.  I wonder if we should make this the 
default if assertions are on?  This would need some adjustments to 
src/test/modules/test_resowner because it would then fail.






Re: Introduction and Inquiry on Beginner-Friendly Issues

2023-11-06 Thread Aleksander Alekseev
Hi,

> I'm reaching out to request your guidance in identifying beginner-friendly 
> issues that I can start working on. I'm eager to become an active contributor 
> to the PostgreSQL community and help improve the project. Whether it's bug 
> fixes, documentation updates, or other tasks, I'm open to getting involved in 
> various areas.
>
> I would greatly appreciate any recommendations or pointers to where I can 
> find suitable issues or specific resources that can aid my journey as a 
> contributor.

You can find my thoughts on the subject here [1].

I wish you all the best on the journey and look forward to your contributions!

[1]: 
https://www.timescale.com/blog/how-and-why-to-become-a-postgresql-contributor/

-- 
Best regards,
Aleksander Alekseev




Re: Moving forward with TDE [PATCH v3]

2023-11-06 Thread Matthias van de Meent
On Sat, 4 Nov 2023 at 03:38, Andres Freund  wrote:
>
> Hi,
>
> On 2023-11-02 22:09:40 +0100, Matthias van de Meent wrote:
> > I'm quite surprised at the significant number of changes being made
> > outside the core storage manager files. I thought that changing out
> > mdsmgr with an encrypted smgr (that could wrap mdsmgr if so desired)
> > would be the most obvious change to implement cluster-wide encryption
> > with the least code touched, as relations don't need to know whether
> > the files they're writing are encrypted, right? Is there a reason to
> > not implement this at the smgr level that I overlooked in the
> > documentation of these patches?
>
> You can't really implement encryption transparently inside an smgr without
> significant downsides. You need a way to store an initialization vector
> associated with the page (or you can store that elsewhere, but then you've
> doubled the worst case amount of random reads/writes). The patch uses the LSN
> as the IV (which I doubt is a good idea). For authenticated encryption further
> additional storage space is required.

I am unaware of any user of the smgr API that doesn't also use the
buffer cache, and thus implicitly the Page layout with PageHeader
[^1]. The API of smgr is also tailored to page-sized quanta of data
with mostly relation-level information. I don't see why there would be
a veil covering the layout of Page for smgr when all other information
already points to the use of PageHeader and Page layouts. In my view,
it would even make sense to allow the smgr to get exclusive access to
some part of the page in the current Page layout.

Yes, I agree that there will be an impact on usable page size if you
want authenticated encryption, and that AMs will indeed need to
account for storage space now being used by the smgr - inconvenient,
but it serves a purpose. That would happen regardless of whether smgr
or some higher system decides where to store the data for encryption -
as long as it is on the page, the AM effectively can't use those
bytes.
But I'd say that's best solved by making the Page documentation and
PageInit API explicit about the potential use of that space by the
chosen storage method (encrypted, plain, ...) instead of requiring the
various AMs to manually consider encryption when using Postgres' APIs
for writing data to disk without hitting shared buffers; page space
management is already a task of AMs, but handling the actual
encryption is not.

Should the AM really care whether the data on disk is encrypted or
not? I don't think so. When the disk contains encrypted bytes, but
smgrread() and smgrwrite() both produce and accept plaintext data,
who's going to complain? Requiring AMs to be mindful about encryption
on all common paths only adds pitfalls where encryption would be
forgotten by the developer of AMs in one path or another.
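To make the "transparent at the smgr level" idea concrete, here is a hypothetical sketch of a wrapping smgr where callers see plaintext and only the bytes crossing the md layer are transformed. A trivial XOR stands in for a real cipher, all names are made up, and the IV/authentication-data storage problem raised upthread is deliberately ignored:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define BLCKSZ 8192

/* Toy symmetric transform: XOR with a repeating 16-byte keystream.
 * A real implementation would use an authenticated cipher with a
 * per-page IV, which is exactly the hard part being debated. */
static void
page_xform(uint8_t *page, const uint8_t *keystream)
{
	for (size_t i = 0; i < BLCKSZ; i++)
		page[i] ^= keystream[i % 16];
}

static void
encrypting_smgrwrite(uint8_t *page, const uint8_t *key)
{
	page_xform(page, key);		/* encrypt before handing to md.c */
	/* ... the underlying md smgrwrite would be called here ... */
}

static void
encrypting_smgrread(uint8_t *page, const uint8_t *key)
{
	/* ... the underlying md smgrread would be called here ... */
	page_xform(page, key);		/* decrypt before returning to caller */
}

/* Round trip: the caller-visible page contents are unchanged. */
static int
roundtrip_ok(void)
{
	uint8_t		page[BLCKSZ] = {1, 2, 3};
	uint8_t		key[16] = {42};

	encrypting_smgrwrite(page, key);
	encrypting_smgrread(page, key);
	return page[0] == 1 && page[1] == 2 && page[2] == 3;
}
```

In this shape the AMs never see ciphertext, which is the division of responsibility argued for above; the open question remains where the IV and authentication tag live on the page.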

> To be able to to use the LSN as the IV, the patch needs to ensure that the LSN
> increases in additional situations (a more aggressive version of
> wal_log_hint_bits) - which can't be done below smgr, where we don't know about
> what WAL logging was done. Nor can you easily just add space on the page below
> md.c, for the purpose of storing an LSN independent IV and the authentication
> data.

I think that getting PageInit to allocate the smgr-specific area would
take some effort, too (which would potentially require adding some
relational context to PageInit, so that it knows which page of which
relation it is going to initialize), but IMHO that would be more
natural than requiring all index and table AMs to be aware the actual
encryption of its pages and require manual handling of that encryption
when the page needs to be written to disk, when it otherwise already
conforms to the various buffer management and file extension APIs
currently in use in PostgreSQL. I would expect "transparent" data
encryption to be handled at the file write layer (i.e. smgr), not
inside the AMs.

Kind regards,

Matthias van de Meent

[^1] ReadBuffer_common uses PageIsVerifiedExtended which verifies that
a page conforms with Postgres' Page layout if checksums are enabled.
Furthermore, all builtin index AMs utilize pd_special, further
implying the use of a PageInit/PageHeader-based page layout.
Additionally, the heap tableAM also complies, and both FSM and VM also
use postgres' Page layout.
As for other AMs that I could check: bloom, rum, and pgvector's
ivfflat and hnsw all use page layouts.




Re: [PATCH] Small refactoring of inval.c and inval.h

2023-11-06 Thread Aleksander Alekseev
Hi,

>  extern void InvalidateSystemCaches(void);
> -extern void InvalidateSystemCachesExtended(bool debug_discard);
>
> Indeed, that looks a bit strange, but is there a strong need in
> removing it, as you are proposing?  There is always a risk that this
> could be called by some out-of-core code.  This is one of these
> things where we could break something just for the sake of breaking
> it, so there is no real benefit IMO.

Fair enough, here is the corrected patch.

-- 
Best regards,
Aleksander Alekseev


v3-0001-Refactor-inval.c.patch
Description: Binary data


Re: Query execution in Perl TAP tests needs work

2023-11-06 Thread Jehan-Guillaume de Rorthais
On Wed, 18 Oct 2023 18:25:01 +0200
Alvaro Herrera  wrote:

> On 2023-Oct-18, Robert Haas wrote:
> 
> > Without FFI::Platypus, we have to write Perl code that can speak the
> > wire protocol directly. Basically, we're writing our own PostgreSQL
> > driver for Perl, though we might need only a subset of the things a
> > real driver would need to handle, and we might add some extra things,
> > like code that can send intentionally botched protocol messages.  
> 
> We could revive the old src/interfaces/perl5 code, which was a libpq
> wrapper -- at least the subset of it that the tests need.  It was moved
> to gborg by commit 9a0b4d7f8474 and a couple more versions were made
> there, which can be seen at
> https://www.postgresql.org/ftp/projects/gborg/pgperl/stable/,
> version 2.1.1 being apparently the latest.  The complete driver was
> about 3000 lines, judging by the commit that removed it.  Presumably we
> don't need the whole of that.

+1 to test this. I can give it some time to revive it and post results here if
you agree and no one thinks of a show stopper.




Re: meson documentation build open issues

2023-11-06 Thread Christoph Berg
Re: Andres Freund
> > > The reason for that is simply that the docs take too long to build.
> >
> > That why I'd prefer to be able to separate arch:all and arch:any
> > builds, yes.
> 
> What's stopping you from doing that?  I think the only arch:all content we
> have is the docs, and those you can build separately? Doc builds do trigger
> generation of a handful of files besides the docs, but not more.

Historically, .deb files have been required to contain the manpages
for all executables even when there's a separate -doc package. This
means we'd need a separate (hopefully fast) manpage build even when
the arch:any binaries are built. We might get around that by
introducing a new postgresql-manpages-XX arch:all package, but that
might be too much micropackaging.

The install-doc-man target will likely fix it, will play with it a bit
more, thanks.

> > > I'm working on merging it. Having it for core PG isn't a huge difficulty, 
> > > the
> > > extension story is what's been holding me back...
> >
> > In-core extensions or external ones?
> 
> Both, although the difficulty of doing it is somewhat separate for each.

I'd think most external extensions could stay with pgxs.mk for the
time being.


> + 
> +  -Dselinux={ disabled | auto | enabled }
> +  
> +   
> +Build with selinux support, enabling the 
> +extension.

This option defaults to ... auto?


> index 90e2c062fa8..003b57498bb 100644
> --- a/doc/src/sgml/meson.build
> +++ b/doc/src/sgml/meson.build
> @@ -142,6 +142,7 @@ if docs_dep.found()
>'--install-dir-contents', dir_doc_html, html],
>  build_always_stale: true, build_by_default: false,
>)
> +  alias_target('doc-html', install_doc_html)
>alias_target('install-doc-html', install_doc_html)

Shouldn't this just build the html docs, without installing?

> +  alias_target('doc-man', install_doc_html)
>alias_target('install-doc-man', install_doc_man)

... same


> + 
> +   install-install-world

install-world

> + 
> +   install-doc-html
> +   
> +
> + Install documentation in man page format.

install-doc-man

> +   
> +Documentation Targets

> + 
> +   docs
> +   doc-html
> +   
> +
> + Build documentation in multi-page HTML format.  Note that
> + docs does not include building
> + man page documentation, as man page generation seldom fails when
> + building HTML documentation succeeds.

Why is that a reason for not building the manpages?

> +   
> +Code Targets

I would have expected the sections to be in the order
build-docs-install. Having install first seems weird to me.

> +   
> +Other Targets
> +
> +
> +
> + 
> +   clean
> +   
> +
> + Remove all build products
> +
> +   
> + 
> +
> + 
> +   test
> +   
> +
> + Run all enabled tests. Support for some classes of tests can be
> + enabled / disabled with 
> + and .

This should explicitly say if contrib tests are included (or there
needs to be a separate test-world target.)


> Subject: [PATCH v1 5/5] meson: Add -Dpkglibdir option

Will give that a try, thanks!

Christoph




Re: SLRU optimization - configurable buffer pool and partitioning the SLRU lock

2023-11-06 Thread Alvaro Herrera
On 2023-Nov-06, Dilip Kumar wrote:

> Yeah, so we can see that with a small bank size (<= 16 slots) fetching
> a page with the hash is ~30% slower than the sequential search, but
> beyond 32 slots the sequential search becomes slower as the number of
> slots grows, whereas the hash stays constant as expected.  But, as you
> said, if we keep the lock partitioning separate from the bank size then
> we might not have any problem with having more banks, and with that we
> can keep the bank size small, like 16.  Let me put some more thought
> into this and get back.
> Any other opinions on this?

dynahash is notoriously slow, which is why we have simplehash.h since
commit b30d3ea824c5.  Maybe we could use that instead.

-- 
Álvaro HerreraBreisgau, Deutschland  —  https://www.EnterpriseDB.com/
"Escucha y olvidarás; ve y recordarás; haz y entenderás" (Confucio)




Re: collect_corrupt_items_vacuum.patch

2023-11-06 Thread Alexander Korotkov
Hi!

On Tue, Jul 4, 2023 at 10:21 AM Daniel Gustafsson  wrote:
> This patch has been waiting on the author for about a year now, so I will 
> close
> it as Returned with Feedback.  Plesae feel free to resubmit to a future CF 
> when
> there is renewed interest in working on this.

I'd like to revive this thread.  While the proposed patch definitely
makes things better, as pointed out by Robert and Tom it doesn't avoid
all false reports.  The reason is that the way we currently calculate
the oldest xmin, it can move backwards (see the comments for
ComputeXidHorizons()).  The attached patch implements its own function
to calculate a strict oldest xmin, which should always be greater than
or equal to any xid horizon calculated before.  I had to make the
following changes compared to what ComputeXidHorizons() does.

1. Ignore processes' xmins, because they take into account connections
to other databases, which were ignored before.
2. Ignore KnownAssignedXids, because they are not database-aware,
while the primary can compute its horizons database-aware.
3. Ignore walsender xmins, because they can go backward if some
replication connections don't use replication slots.

Surely this significantly sacrifices accuracy, but we have to do so in
order to avoid reporting false errors.

Any thoughts?

--
Regards,
Alexander Korotkov


0001-Fix-false-reports-in-pg_visibility-v2.patch
Description: Binary data


Re: SLRU optimization - configurable buffer pool and partitioning the SLRU lock

2023-11-06 Thread Dilip Kumar
On Mon, Nov 6, 2023 at 1:05 PM Andrey M. Borodin  wrote:

> > On 6 Nov 2023, at 09:09, Dilip Kumar  wrote:
> >
> >
> >> Having hashtable to find SLRU page in the buffer IMV is too slow. Some 
> >> comments on this approach can be found here [0].
> >> I'm OK with having HTAB for that if we are sure performance does not 
> >> degrade significantly, but I really doubt this is the case.
> >> I even think SLRU buffers used HTAB in some ancient times, but I could not 
> >> find commit when it was changed to linear search.
> >
> > The main intention of having this buffer mapping hash is to find the
> > SLRU page faster than sequence search when banks are relatively bigger
> > in size, but if we find the cases where having hash creates more
> > overhead than providing gain then I am fine to remove the hash because
> > the whole purpose of adding hash here to make the lookup faster.  So
> > far in my test I did not find the slowness.  Do you or anyone else
> > have any test case based on the previous research on whether it
> > creates any slowness?
> PFA test benchmark_slru_page_readonly(). In this test we run 
> SimpleLruReadPage_ReadOnly() (essential part of TransactionIdGetStatus())
> before introducing HTAB for buffer mapping I get
> Time: 14837.851 ms (00:14.838)
> with buffer HTAB I get
> Time: 22723.243 ms (00:22.723)
>
> This hash table makes getting transaction status ~50% slower.
>
> Benchmark script I used:
> make -C $HOME/postgresMX -j 8 install && (pkill -9 postgres; rm -rf test; 
> ./initdb test && echo "shared_preload_libraries = 'test_slru'">> 
> test/postgresql.conf && ./pg_ctl -D test start && ./psql -c 'create extension 
> test_slru' postgres && ./pg_ctl -D test restart && ./psql -c "SELECT 
> count(test_slru_page_write(a, 'Test SLRU'))
>   FROM generate_series(12346, 12393, 1) as a;" -c '\timing' -c "SELECT 
> benchmark_slru_page_readonly(12377);" postgres)

With this test, I got below numbers,

nslots   no-hash   hash
  8        10s     13s
 16        10s     13s
 32        15s     13s
 64        17s     13s

Yeah, so we can see that with a small bank size (<= 16 slots) fetching
a page with the hash is ~30% slower than the sequential search, but
beyond 32 slots the sequential search becomes slower as the number of
slots grows, whereas the hash stays constant as expected.  But, as you
said, if we keep the lock partitioning separate from the bank size then
we might not have any problem with having more banks, and with that we
can keep the bank size small, like 16.  Let me put some more thought
into this and get back.
Any other opinions on this?

-- 
Regards,
Dilip Kumar
EnterpriseDB: http://www.enterprisedb.com




Re: Reuse child_relids in try_partitionwise_join was Re: Assert failure on bms_equal(child_joinrel->relids, child_joinrelids)

2023-11-06 Thread Ashutosh Bapat
explain.out in 0001 needed some adjustments; without them, cfbot shows
failures.  Fixed in the attached patchset.  0001 is just for diagnosis,
not for actual commit.  0002, which is the actual patch, has no changes
wrt the previous version.

On Tue, Oct 31, 2023 at 11:06 AM Ashutosh Bapat
 wrote:
>
> On Fri, Sep 8, 2023 at 3:22 PM Ashutosh Bapat
>  wrote:
> >
> > On Fri, Jul 28, 2023 at 3:16 PM Ashutosh Bapat
> >  wrote:
> > >
> > > Hi Tom, Richard,
> > >
> > > On Mon, Jul 24, 2023 at 8:17 AM Richard Guo  
> > > wrote:
> > > >
> > > > Thanks for pushing it!
> > >
> > > With this fix, I saw a noticeable increase in the memory consumption
> > > of the planner while running the experiments mentioned in [1].  The
> > > reason is that the Bitmapsets created by bms_union() are not freed
> > > during planning, and when there are thousands of child joins involved,
> > > the bitmapsets take up a lot of memory and there are a large number of
> > > them.
> > >
> > > The attached 0002 patch fixes the memory consumption by calculating
> > > appinfos only once and using them twice. The numbers look like below
> > >
> > > Number of tables joined | without patch | with patch |
> > > --
> > >   2 |  40.8 MiB |   40.3 MiB |
> > >   3 | 151.6 MiB |  146.9 MiB |
> > >   4 | 463.9 MiB |  445.5 MiB |
> > >   5 |1663.9 MiB | 1563.3 MiB |
> > >
> > > The memory consumption is prominent at higher numbers of joins, as
> > > that exponentially increases the number of child joins.
> > >
> > > Attached setup.sql and queries.sql and patch 0001 were used to measure
> > > memory similar to [1].
> > >
> > > I don't think it's a problem with the patch itself. We should be able
> > > to use Bitmapset APIs similar to what patch is doing. But there's a
> > > problem with our Bitmapset implementation. It's not space efficient
> > > for thousands of partitions. A quick calculation reveals this.
> > >
> > > If the number of partitions is 1000, the matching partitions will
> > > usually be 1000, 2000, 3000 and so on. Thus the bitmapset representing
> > > the relids will be {b 1000, 2000, 3000, ...}. To represent a single
> > > 1000th digit current Bitmapset implementation will allocate 1000/8 =
> > > 125 bytes of memory. A 5 way join will require 125 * 5 = 625 bytes of
> > > memory. This is even true for lower relid numbers since they will be
> > > 1000 bits away e.g. (1, 1001, 2001, 3001, ...). So 1000 such child
> > > joins require 625KiB memory. Doing this as many times as the number of
> > > joins we get 120 * 625 KiB = 75 MiB which is closer to the memory
> > > difference we see above.
> > >
> > > Even if we allocate a list to hold 5 integers it will not take 625
> > > bytes. I think we need to improve our Bitmapset representation to be
> > > efficient in such cases. Of course whatever representation we choose
> > > has to be space efficient for a small number of tables as well and
> > > should gel well with our planner logic. So I guess some kind of
> > > dynamic representation which changes the underlying layout based on
> > > the contents is required. I have looked up past hacker threads to see
> > > if this has been discussed previously.
> > >
> > > I don't think this is the thread to discuss it and also I am not
> > > planning to work on it in the immediate future. But I thought I would
> > > mention it where I found it.
> > >
> > > [1] 
> > > https://www.postgresql.org/message-id/caexhw5stmouobe55pmt83r8uxvfcph+pvo5dnpdrvcsbgxe...@mail.gmail.com
> > >
> >
> > Adding this small patch to the commitfest in case somebody finds it
> > worth fixing this specific memory consumption. With a new subject.
>
> Rebased patches.
> 0001 - is same as the squashed version of patches at
> https://www.postgresql.org/message-id/CAExHW5sCJX7696sF-OnugAiaXS=Ag95=-m1csrjcmyyj8pd...@mail.gmail.com.
> 0002 is the actual fix described earlier
>
> --
> Best Wishes,
> Ashutosh Bapat



-- 
Best Wishes,
Ashutosh Bapat
From 84d59123ba6cec8717368ce9cefa290b711d7b3d Mon Sep 17 00:00:00 2001
From: Ashutosh Bapat 
Date: Wed, 26 Jul 2023 12:08:55 +0530
Subject: [PATCH 2/2] Reuse child_relids bitmapset in partitionwise joinrels

Bitmapsets containing child relids consume large memory when thousands of
partitions are involved. Create them once, use multiple times and free them up
once their use is over.

Ashutosh Bapat
---
 src/backend/optimizer/path/joinrels.c | 10 ++
 src/backend/optimizer/util/relnode.c  | 18 +++---
 src/include/optimizer/pathnode.h  |  3 ++-
 3 files changed, 11 insertions(+), 20 deletions(-)

diff --git a/src/backend/optimizer/path/joinrels.c b/src/backend/optimizer/path/joinrels.c
index d03ace50a1..1feb1f99c0 100644
--- a/src/backend/optimizer/path/joinrels.c
+++ b/src/backend/optimizer/path/joinrels.c
@@ -1526,6 +1526,7 @@ try_partitionwise_join(PlannerInfo *root, RelOptInfo *rel1, RelOptInfo *rel2,
 		RelOptIn

Re: Version 14/15 documentation Section "Alter Default Privileges"

2023-11-06 Thread Laurenz Albe
On Sat, 2023-11-04 at 21:14 -0400, Bruce Momjian wrote:
> > It is not the role that is modified.  Perhaps:
> > 
> >    [...]; if omitted, the current role is used.
> 
> Sure, attached.  Here is the issue I have though, we are really not
> changing default privileges for objects created in the future, we are
> changing the role _now_ so future objects will have different default
> privileges, right?  I think wording like the above is kind of odd.

I see what you mean.  The alternative is to be precise, at the risk of
repeating ourselves:

  if omitted, default privileges will be changed for objects created by
  the current role.

Yours,
Laurenz Albe




Re: pg_resetwal tests, logging, and docs update

2023-11-06 Thread Peter Eisentraut

On 01.11.23 12:12, Aleksander Alekseev wrote:

Hi,


Hmm.  I think maybe we should fix the behavior of
GetDataDirectoryCreatePerm() to be more consistent between Windows and
non-Windows.  This is usually the first function a program uses on the
proposed data directory, so it's also responsible for reporting if the
data directory does not exist.  But then on Windows, because the
function does nothing, those error scenarios end up on quite different
code paths, and I'm not sure if those are really checked that carefully.
   I think we can make this more robust if we have
GetDataDirectoryCreatePerm() still run the stat() call on the proposed
data directory and report the error.  See attached patch.


Yep, that would be much better.

Attaching all three patches together in order to make sure cfbot is
still happy with them while the `master` branch is evolving.

Assuming cfbot will have no complaints I suggest merging them.


Done, thanks.





Re: Version 14/15 documentation Section "Alter Default Privileges"

2023-11-06 Thread Michael Banck
Hi,

On Sat, Nov 04, 2023 at 09:14:01PM -0400, Bruce Momjian wrote:
> +   There is no way to change the default privileges for objects created by
> +   any role.  You have run ALTER DEFAULT PRIVILEGES for 
> all
> +   roles that can create objects whose default privileges should be modified.

That second sentence is broken; it should be "You have to run [...]", I
think.


Michael




Re: Add new option 'all' to pg_stat_reset_shared()

2023-11-06 Thread Bharath Rupireddy
On Mon, Nov 6, 2023 at 11:39 AM torikoshia  wrote:
>
> Thanks all for the comments!
>
> On Fri, Nov 3, 2023 at 5:17 AM Matthias van de Meent
>  wrote:
> > Knowing that your metrics have a shared starting point can be quite
> > valuable, as it allows you to do some math that would otherwise be
> > much less accurate when working with stats over a short amount of
> > time. I've not used these stats systems much myself, but skew between
> > metrics caused by different reset points can be difficult to detect
> > and debug, so I think an atomic call to reset all these stats could be
> > worth implementing.
>
> Since each of these stats, except wal_prefetch, is reset while holding
> an LWLock, the attached PoC patch makes the call atomic by using these
> LWLocks.
>
> If this is the right direction, I'll try to make wal_prefetch also take
> an LWLock.

+// Acquire LWLocks
+LWLock *locks[] = {&stats_archiver->lock,  &stats_bgwriter->lock,
+   &stats_checkpointer->lock, &stats_wal->lock};
+
+for (int i = 0; i < STATS_SHARED_NUM_LWLOCK; i++)
+LWLockAcquire(locks[i], LW_EXCLUSIVE);
+
+for (int i = 0; i < BACKEND_NUM_TYPES; i++)
+{
+LWLock*bktype_lock = &stats_io->locks[i];
+LWLockAcquire(bktype_lock, LW_EXCLUSIVE);
+}

Well, that's a total of ~17 LWLocks this new function takes to make
the stats reset atomic. I'm not sure this atomicity is worth the
effort, and it can easily be misused - what if someone runs something
like SELECT pg_stat_reset_shared() FROM generate_series(1,
10n); to cause heavy lock acquisition and release cycles?

IMV, atomicity is not something that applies to the stats reset
operation, because stats are approximate numbers by nature after all.
If pg_stat_reset_shared() resets stats for only some of the stats
types and then fails, it's basic application programming practice that
when a query fails, the application retries it. FWIW, atomicity doesn't
apply today if someone wants to reset stats in a loop for all stats
types.

> On 2023-11-04 10:49, Andres Freund wrote:
>
> > Yes, agreed. However, I'd suggest adding pg_stat_reset_shared(),
> > instead of
> > pg_stat_reset_shared('all') for this purpose.
>
> In the attached PoC patch the shared statistics are reset by calling
> pg_stat_reset_shared() not pg_stat_reset_shared('all').

Some quick comments:

1.
+/*
+pg_stat_reset_shared_all(PG_FUNCTION_ARGS)
+{
+pgstat_reset_shared_all();
+PG_RETURN_VOID();
+}

IMO, simpler way is to move the existing code in
pg_stat_reset_shared() to a common internal function like
pgstat_reset_shared(char *target) and the pg_stat_reset_shared_all()
can just loop over all the stats targets.

2.
+{ oid => '8000',
+  descr => 'statistics: reset collected statistics shared across the cluster',
+  proname => 'pg_stat_reset_shared', provolatile => 'v', prorettype => 'void',
+  proargtypes => '', prosrc => 'pg_stat_reset_shared_all' },

Why a new function consuming the oid? Why can't we just do the trick
of proisstrict => 'f' and if (PG_ARGISNULL(0)) { reset all stats} else
{reset specified stats kind} like the pg_stat_reset_slru()?

3. I think the new reset all stats function must also consider
resetting all SLRU stats, no?
/* stats for fixed-numbered objects */
PGSTAT_KIND_ARCHIVER,
PGSTAT_KIND_BGWRITER,
PGSTAT_KIND_CHECKPOINTER,
PGSTAT_KIND_IO,
PGSTAT_KIND_SLRU,
PGSTAT_KIND_WAL,

4. I think the new reset all stats function must also consider
resetting recovery_prefetch.

5.
+If no argument is specified, reset all these views at once.
+Note current patch is WIP and pg_stat_recovery_prefetch is not reset.


How about "If the argument is NULL, all counters shown in all of these
views are reset."?

6. Add a test case to cover the code in stats.sql.

--
Bharath Rupireddy
PostgreSQL Contributors Team
RDS Open Source Databases
Amazon Web Services: https://aws.amazon.com




Re: Synchronizing slots from primary to standby

2023-11-06 Thread Amit Kapila
On Mon, Nov 6, 2023 at 7:01 AM Zhijie Hou (Fujitsu)
 wrote:
>
> On Friday, November 3, 2023 7:32 PM Amit Kapila 
> >
> > 5.
> > @@ -228,6 +230,28 @@ pg_logical_slot_get_changes_guts(FunctionCallInfo
> > fcinfo, bool confirm, bool bin
> >   NameStr(MyReplicationSlot->data.plugin),
> >   format_procedure(fcinfo->flinfo->fn_oid;
> > ..
> > + if (XLogRecPtrIsInvalid(upto_lsn))
> > + wal_to_wait = end_of_wal;
> > + else
> > + wal_to_wait = Min(upto_lsn, end_of_wal);
> > +
> > + /* Initialize standby_slot_names_list */ SlotSyncInitConfig();
> > +
> > + /*
> > + * Wait for specified streaming replication standby servers (if any)
> > + * to confirm receipt of WAL upto wal_to_wait.
> > + */
> > + WalSndWaitForStandbyConfirmation(wal_to_wait);
> > +
> > + /*
> > + * The memory context used to allocate standby_slot_names_list will be
> > + * freed at the end of this call. So free and nullify the list in
> > + * order to avoid usage of freed list in the next call to this
> > + * function.
> > + */
> > + SlotSyncFreeConfig();
> >
> > What if there is an error in WalSndWaitForStandbyConfirmation() before 
> > calling
> > SlotSyncFreeConfig()? I think the problem you are trying to avoid by 
> > freeing it
> > here can occur. I think it is better to do this in a logical decoding 
> > context and
> > free the list along with it as we are doing in commit c7256e6564(see PG15).
>
> I will analyze more about this case and update in next version.
>

Okay, thanks for considering it.

> > Also,
> > it is better to allocate this list somewhere in
> > WalSndWaitForStandbyConfirmation(), probably in WalSndGetStandbySlots,
> > that will make the code look neat and also avoid allocating this list when
> > failover is not enabled for the slot.
>
> Changed as suggested.
>

After doing this, do we need to call SlotSyncInitConfig() from other
places as below?

+ SlotSyncInitConfig();
+ WalSndGetStandbySlots(&standby_slot_cpy, false);

Can we entirely get rid of calling SlotSyncInitConfig() from all
places except WalSndGetStandbySlots()? Also, after that or otherwise,
the comments atop also need modification.

-- 
With Regards,
Amit Kapila.



