Re: libpqrcv_connect() leaks PGconn

2023-01-21 Thread Noah Misch
On Sat, Jan 21, 2023 at 12:04:53PM -0800, Andres Freund wrote:
> On 2023-01-21 08:16:42 -0800, Noah Misch wrote:
> > On Fri, Jan 20, 2023 at 06:50:37PM -0800, Andres Freund wrote:
> > > It seems we don't have any tests for creating a subscription that fails
> > > during connection establishment? That doesn't seem optimal - I guess
> > > there may have been concern around portability of the error messages?
> > 
> > Perhaps.  We have various (non-subscription) tests using "\set VERBOSITY
> > sqlstate" for that problem.  If even the sqlstate varies, a DO block is the
> > next level of error swallowing.
> 
> That's a good trick I need to remember. And the errcode for an invalid
> connection string luckily differs from the one for a non-working one.
> 
> 
> I think I found an even easier way - port=-1 is rejected during PQconnectPoll()
> and will never even open a socket. That'd make it reasonable for the test to
> happen in subscription.sql, instead of a tap test, I think (faster, easier to
> maintain). It may be that we'll one day move that error into the
> PQconninfoParse() phase, but I don't think we need to worry about it now.
> 
> Any reason not to go for that?

No, a port=-1 test in subscription.sql sounds ideal.

> If not, I'll add a test for an invalid conninfo and a non-working connection
> string to subscription.sql.
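
A sketch of what such tests might look like in subscription.sql (the
subscription and publication names here are invented, and the expected
output would need to be confirmed against an actual run):

```sql
-- Hypothetical sketch, not the committed test. Show only the SQLSTATE
-- in case the error text itself varies across platforms.
\set VERBOSITY sqlstate

-- Invalid conninfo: fails while parsing the connection string.
CREATE SUBSCRIPTION regress_sub CONNECTION 'i_dont_exist=param'
    PUBLICATION regress_pub;

-- Syntactically valid, but port=-1 is rejected during PQconnectPoll()
-- before any socket is opened, so it fails quickly and portably.
CREATE SUBSCRIPTION regress_sub CONNECTION 'port=-1'
    PUBLICATION regress_pub;

\set VERBOSITY default
```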




Re: run pgindent on a regular basis / scripted manner

2023-01-21 Thread Andres Freund
Hi,

On 2023-01-21 23:21:22 -0500, Bruce Momjian wrote:
> On Sat, Jan 21, 2023 at 08:08:34PM -0800, Peter Geoghegan wrote:
> > On Sat, Jan 21, 2023 at 2:05 PM Peter Geoghegan  wrote:
> > > There is one thing about clang-format that I find mildly infuriating:
> > > it can indent function declarations in the way that I want it to, and
> > > it can indent variable declarations in the way that I want it to. It
> > > just can't do both at the same time, because they're both controlled
> > > by AlignConsecutiveDeclarations.
> > 
> > Looks like I'm not the only one that doesn't like this behavior:
> > 
> > https://github.com/llvm/llvm-project/issues/55605
> 
> Wow, that is very weird.  When I work with other open source projects, I
> am regularly surprised by their low quality requirements.

I don't think not fulfilling precisely our needs is the same as low quality
requirements.

Greetings,

Andres Freund




Re: Remove source code display from \df+?

2023-01-21 Thread Justin Pryzby
On Sun, Jan 22, 2023 at 12:18:34AM -0500, Isaac Morland wrote:
> On Thu, 19 Jan 2023 at 13:02, Isaac Morland  wrote:
> > On Thu, 19 Jan 2023 at 11:30, Justin Pryzby  wrote:
> >> On Wed, Jan 18, 2023 at 10:27:46AM -0500, Isaac Morland wrote:
> >> >
> >> > I thought I had: https://commitfest.postgresql.org/42/4133/
> >>
> >> This is failing tests:
> >> http://cfbot.cputube.org/isaac-morland.html
> >>
> >> It seems like any "make check" would fail .. but did you also try
> >> cirrusci from your own github account?
> >> ./src/tools/ci/README
> >
> > I definitely ran "make check" but I did not realize there is also
> > cirrusci. I will look at that. I'm having trouble interpreting the cfbot
> > page to which you linked but I'll try to run cirrusci myself before
> > worrying too much about that.
> >
> > Running "make check" the first time I was surprised to see no failures -
> > so I added tests for \df+, which passed when I did "make check".
> >
> It turns out that my tests wanted the owner to be “vagrant” rather than
> “postgres”. This is apparently because I was running as that user (in a
> Vagrant VM) when running the tests. Then I took that output and just made
> it the expected output. I’ve re-worked my build environment a bit so that I
> run as “postgres” inside the Vagrant VM.
> 
> What I don’t understand is why that didn’t break all the other tests.

Probably because the other tests avoid showing the owner, exactly
because it varies depending on who runs the tests.  Most of the "plus"
commands aren't tested, at least in the sql regression tests.

We should probably change one of the CI usernames to something other
than "postgres" to catch the case that someone hardcodes "postgres".

> proper value in the Owner column so let’s see what CI does with it.

Or better: see above about using it from your github account.

-- 
Justin




Re: Remove source code display from \df+?

2023-01-21 Thread Tom Lane
Isaac Morland  writes:
> It turns out that my tests wanted the owner to be “vagrant” rather than
> “postgres”. This is apparently because I was running as that user (in a
> Vagrant VM) when running the tests. Then I took that output and just made
> it the expected output. I’ve re-worked my build environment a bit so that I
> run as “postgres” inside the Vagrant VM.

Nope, that is not going to get past the buildfarm (hint: a lot of the
BF animals run under "buildfarm" or some similar username).  You have
to make sure that your tests do not care what the name of the bootstrap
superuser is.
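
One way to satisfy that (a sketch; the role and function names are
invented) is to pin the owner of the displayed objects to a role with a
fixed name, so the expected output never mentions the bootstrap
superuser:

```sql
-- Hypothetical sketch: make the Owner column independent of whoever
-- runs the tests by assigning ownership to a known role.
CREATE ROLE regress_df_owner;
CREATE FUNCTION regress_df_func() RETURNS int
    LANGUAGE sql AS 'SELECT 1';
ALTER FUNCTION regress_df_func() OWNER TO regress_df_owner;

\df+ regress_df_func

DROP FUNCTION regress_df_func();
DROP ROLE regress_df_owner;
```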

> What I don’t understand is why that didn’t break all the other tests.

Because all the committed tests are independent of that.

regards, tom lane




Re: Remove source code display from \df+?

2023-01-21 Thread Isaac Morland
On Thu, 19 Jan 2023 at 13:02, Isaac Morland  wrote:

> On Thu, 19 Jan 2023 at 11:30, Justin Pryzby  wrote:
>
>> On Wed, Jan 18, 2023 at 10:27:46AM -0500, Isaac Morland wrote:
>> >
>> > I thought I had: https://commitfest.postgresql.org/42/4133/
>>
>> This is failing tests:
>> http://cfbot.cputube.org/isaac-morland.html
>>
>> It seems like any "make check" would fail .. but did you also try
>> cirrusci from your own github account?
>> ./src/tools/ci/README
>>
>
> I definitely ran "make check" but I did not realize there is also
> cirrusci. I will look at that. I'm having trouble interpreting the cfbot
> page to which you linked but I'll try to run cirrusci myself before
> worrying too much about that.
>
> Running "make check" the first time I was surprised to see no failures -
> so I added tests for \df+, which passed when I did "make check".
>
>>
It turns out that my tests wanted the owner to be “vagrant” rather than
“postgres”. This is apparently because I was running as that user (in a
Vagrant VM) when running the tests. Then I took that output and just made
it the expected output. I’ve re-worked my build environment a bit so that I
run as “postgres” inside the Vagrant VM.

What I don’t understand is why that didn’t break all the other tests. I
would have expected “postgres” to show up all over the expected results and
be changed to “vagrant” by what I did; so I should have seen all sorts of
test failures in the existing tests. Anyway, my new tests now have the
proper value in the Owner column so let’s see what CI does with it.

>> BTW, I think "ELSE NULL" could be omitted.
>
> Looks like; I'll update. Might as well reduce the visual size of the code.
>

I did this. I’m ambivalent about this because I usually think of CASE and
similar constructs as needing to explicitly cover all possible cases but
the language does provide for the NULL default case so may as well use the
feature where applicable.


0001-Remove-source-code-display-from-df-v3.patch
Description: Binary data


Re: run pgindent on a regular basis / scripted manner

2023-01-21 Thread Bruce Momjian
On Sat, Jan 21, 2023 at 08:08:34PM -0800, Peter Geoghegan wrote:
> On Sat, Jan 21, 2023 at 2:05 PM Peter Geoghegan  wrote:
> > There is one thing about clang-format that I find mildly infuriating:
> > it can indent function declarations in the way that I want it to, and
> > it can indent variable declarations in the way that I want it to. It
> > just can't do both at the same time, because they're both controlled
> > by AlignConsecutiveDeclarations.
> 
> Looks like I'm not the only one that doesn't like this behavior:
> 
> https://github.com/llvm/llvm-project/issues/55605

Wow, that is very weird.  When I work with other open source projects, I
am regularly surprised by their low quality requirements.

-- 
  Bruce Momjian  https://momjian.us
  EDB  https://enterprisedb.com

Embrace your flaws.  They make you human, rather than perfect,
which you will never be.




Re: run pgindent on a regular basis / scripted manner

2023-01-21 Thread Peter Geoghegan
On Sat, Jan 21, 2023 at 2:05 PM Peter Geoghegan  wrote:
> There is one thing about clang-format that I find mildly infuriating:
> it can indent function declarations in the way that I want it to, and
> it can indent variable declarations in the way that I want it to. It
> just can't do both at the same time, because they're both controlled
> by AlignConsecutiveDeclarations.

Looks like I'm not the only one that doesn't like this behavior:

https://github.com/llvm/llvm-project/issues/55605

-- 
Peter Geoghegan




Re: Decoupling antiwraparound autovacuum from special rules around auto cancellation

2023-01-21 Thread Peter Geoghegan
On Sat, Jan 21, 2023 at 6:54 PM Andres Freund  wrote:
> Is 
> https://www.postgresql.org/message-id/CAH2-WzmytCuSpaMEhv8H-jt8x_9whTi0T5bjNbH2gvaR0an2Pw%40mail.gmail.com
> the last / relevant version of the patch to look at?

Yes. I'm mostly just asking about v5-0001-* right now.

-- 
Peter Geoghegan




Re: run pgindent on a regular basis / scripted manner

2023-01-21 Thread Peter Geoghegan
On Sat, Jan 21, 2023 at 3:39 PM Jelte Fennema  wrote:
> I was unable to get either of them to produce the weird alignment of
> declarations that pgindent outputs. Maybe that's just because I don't
> understand what this alignment is supposed to do. Because to me the
> current alignment seems completely random most of the time (and like
> Andres said also not very helpful). For clang-format you should use
> at least clang-format 15, otherwise it has some bugs in the alignment
> logic.

Really? I have been using 14, which is quite recent. Did you just
figure this out recently? If this is true, then it's certainly
discouraging.

I don't have a problem with the current pgindent alignment of function
parameters, so not sure what you mean about that. It *was* terrible
prior to commit e3860ffa, but that was back in 2017 (pg_bsd_indent 2.0
fixed that problem).

-- 
Peter Geoghegan




Re: Parallelize correlated subqueries that execute within each worker

2023-01-21 Thread James Coleman
On Wed, Jan 18, 2023 at 9:34 PM James Coleman  wrote:
>
> On Wed, Jan 18, 2023 at 2:09 PM Tomas Vondra
>  wrote:
> >
> > Hi,
> >
> > This patch hasn't been updated since September, and it got broken by
> > 4a29eabd1d91c5484426bc5836e0a7143b064f5a which changed the incremental sort
> > stuff a little bit. But the breakage was rather limited, so I took a
> > stab at fixing it - attached is the result, hopefully correct.
>
> Thanks for fixing this up; the changes look correct to me.
>
> > I also added a couple minor comments about stuff I noticed while
> > rebasing and skimming the patch, I kept those in separate commits.
> > There's also a couple pre-existing TODOs.
>
> I started work on some of these, but wasn't able to finish this
> evening, so I don't have an updated series yet.
>
> > James, what's your plan with this patch. Do you intend to work on it for
> > PG16, or are there some issues I missed in the thread?
>
> I'd love to see it get into PG16. I don't have any known issues, but
> reviewing activity has been light. Originally Robert had had some
> concerns about my original approach; I think my updated approach
> resolves those issues, but it'd be good to have that sign-off.
>
> Beyond that I'm mostly looking for review and evaluation of the
> approach I've taken; of note is my description of that in [1].

Here's an updated patch version incorporating feedback from Tomas as
well as some additional comments and tweaks.

While working through Tomas's comment about a conditional in the
max_parallel_hazard_walker being guaranteed true I realized that in the
current version of the patch the safe_param_ids tracking in
is_parallel_safe isn't actually needed any longer. That seemed
suspicious, and so I started digging, and I found out that in the
current approach all of the tests pass with only the changes in
clauses.c. That doesn't mean the other changes are unnecessary;
rather, I believe there isn't yet a test case exercising them, but I
realize that means I can't prove they're needed. I spent some time
poking at this, but at least with my current level of imagination I
haven't stumbled across a query that would exercise these checks. One
of the reasons I'm fairly confident that this is true is that the
original approach (which was significantly more invasive) definitely
required rechecking parallel safety at each level until we reached the
point where the subquery was known to be generated within the current
worker through the safe_param_ids tracking mechanism. Of course it is
possible that that complexity is actually required and this simplified
approach isn't feasible (but I don't have a good reason to suspect
that currently). It's also possible that the restrictions on
subqueries just aren't necessary...but that isn't compelling because
it would require proving that you can never have a query level with
as-yet unsatisfied lateral rels.

Note: all of the existing tests for "you can't parallelize a
correlated subquery" are simple versions which are not actually
parallel unsafe in theory. I assume they were added to show that the
code excluded that broad case, and there wasn't any finer grain of
detail required since the code simply didn't support making the
decision with that granularity anyway. But that means there weren't
any existing test cases to exercise the granularity I'm now trying to
achieve.
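
For concreteness, the broad case those existing tests cover is
something like the following (an invented example, not taken from the
patch's test suite):

```sql
-- Hypothetical example of the class of query under discussion: a
-- correlated subquery whose parameter (t1.id) could, in principle,
-- be generated entirely within a single parallel worker.
EXPLAIN (COSTS OFF)
SELECT t1.id,
       (SELECT max(t2.val) FROM t2 WHERE t2.id = t1.id)
FROM t1;
```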

If someone is willing to help out what I'd like help with currently is
finding such a test case (where a gather or gather merge path would
otherwise be created but at the current plan level not all of the
required lateral rels are yet available -- meaning that we can't
perform all of the subqueries within the current worker). In support
of that patch 0004 converts several of the new parallel safety checks
into WARNING messages instead to make it easy to see if a query
happens to encounter any of those checks.

James Coleman


v7-0004-Warning-messages-for-finding-test-cases.patch
Description: Binary data


v7-0001-Add-tests-before-change.patch
Description: Binary data


v7-0003-Possible-additional-checks.patch
Description: Binary data


v7-0002-Parallelize-correlated-subqueries.patch
Description: Binary data


Re: Decoupling antiwraparound autovacuum from special rules around auto cancellation

2023-01-21 Thread Andres Freund
Hi,

On 2023-01-21 18:37:59 -0800, Peter Geoghegan wrote:
> On Tue, Jan 17, 2023 at 10:02 AM Andres Freund  wrote:
> > I think with a bit of polish "Add autovacuum trigger instrumentation." ought
> > to be quickly mergeable.
> 
> Any thoughts on v5-0001-*?
> 
> It would be nice to get the uncontentious part of all this (which is
> the instrumentation patch) out of the way soon.

Is 
https://www.postgresql.org/message-id/CAH2-WzmytCuSpaMEhv8H-jt8x_9whTi0T5bjNbH2gvaR0an2Pw%40mail.gmail.com
the last / relevant version of the patch to look at?

Greetings,

Andres Freund




Re: Decoupling antiwraparound autovacuum from special rules around auto cancellation

2023-01-21 Thread Peter Geoghegan
On Tue, Jan 17, 2023 at 10:02 AM Andres Freund  wrote:
> I think with a bit of polish "Add autovacuum trigger instrumentation." ought
> to be quickly mergeable.

Any thoughts on v5-0001-*?

It would be nice to get the uncontentious part of all this (which is
the instrumentation patch) out of the way soon.

-- 
Peter Geoghegan




GUCs to control abbreviated sort keys

2023-01-21 Thread Jeff Davis
The attached patch adds GUCs to control the use of the abbreviated keys
optimization when sorting. Also, I changed the TRUST_STRXFRM from a
#define into a GUC.

One reason for these GUCs is to make it easier to diagnose any issues
that come up with my collation work. Another is that I observed cases
with ICU where the abbreviated keys optimization resulted in a ~8-10%
regression, and it would be good to have some basic control over it.

I made them developer options because they are more about diagnosing
and I don't expect users to change these in production. If the issues
with abbreviated keys get more complex (where maybe we need to consider
costing each provider?), we can make it more user-facing.

This is fairly simple, so I plan to commit soon.
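
For diagnosis, the GUCs would presumably be used along these lines (a
sketch based on the patch below; the table and column names are
invented):

```sql
-- Sketch only: compare sort performance with and without the
-- abbreviated keys optimization.
SET sort_abbreviated_keys = off;
EXPLAIN ANALYZE SELECT * FROM t ORDER BY textcol;
RESET sort_abbreviated_keys;

-- Allow abbreviated keys with libc collations; superuser-only by
-- default because a broken strxfrm() could give wrong results.
SET trust_strxfrm = on;
EXPLAIN ANALYZE SELECT * FROM t ORDER BY textcol;
```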

-- 
Jeff Davis
PostgreSQL Contributor Team - AWS


From 7699f28634f04772c718ac465bbbff48b849f2bc Mon Sep 17 00:00:00 2001
From: Jeff Davis 
Date: Sat, 21 Jan 2023 12:44:07 -0800
Subject: [PATCH v1] Introduce GUCs to control abbreviated keys sort
 optimization.

The setting sort_abbreviated_keys turns the optimization on or off
overall. The optimization relies on collation providers, which are
complex dependencies, and the performance of the optimization may rely
on many factors. Introducing a GUC allows easier diagnosis when this
optimization results in worse performance.

The setting trust_strxfrm replaces the define TRUST_STRXFRM, allowing
users to experiment with the abbreviated keys optimization when using
the libc provider. Previously, the optimization only applied to
collations using the ICU provider unless specially compiled. By
default, allowed only for superusers (because an incorrect setting
could lead to wrong results), but can be granted to others.
---
 doc/src/sgml/config.sgml   | 40 ++
 src/backend/utils/adt/varlena.c| 10 +++---
 src/backend/utils/misc/guc_tables.c| 24 +
 src/backend/utils/sort/tuplesortvariants.c | 17 ++---
 4 files changed, 82 insertions(+), 9 deletions(-)

diff --git a/doc/src/sgml/config.sgml b/doc/src/sgml/config.sgml
index dc9b78b0b7..47ee3ec6e7 100644
--- a/doc/src/sgml/config.sgml
+++ b/doc/src/sgml/config.sgml
@@ -11248,6 +11248,46 @@ dynamic_library_path = 'C:\tools\postgresql;H:\my_project\lib;$libdir'
   
  
 
+ 
+  sort_abbreviated_keys (boolean)
+  
+   sort_abbreviated_keys configuration parameter
+  
+  
+  
+   
+Enables or disables the use of abbreviated sort keys, an optimization,
+if applicable. The default is true. Disabling may
+be useful to diagnose problems or measure performance.
+   
+  
+ 
+
+ 
+  trust_strxfrm (boolean)
+  
+   trust_strxfrm configuration parameter
+  
+  
+  
+   
+Abbreviated keys, a sort optimization, depend on correct behavior of
+the operating system function strxfrm() when
+using a collation with the libc provider. On some
+platforms strxfrm() does not return results
+consistent with strcoll(), which means the
+optimization could return wrong results. Set to
+true if certain that strxfrm()
+can be trusted.
+   
+   
+The default value is false. This setting has no
+effect if  is set to
+false.
+   
+  
+ 
+
  
   trace_locks (boolean)
   
diff --git a/src/backend/utils/adt/varlena.c b/src/backend/utils/adt/varlena.c
index 33ffdb013a..8b5b467205 100644
--- a/src/backend/utils/adt/varlena.c
+++ b/src/backend/utils/adt/varlena.c
@@ -45,6 +45,9 @@
 /* GUC variable */
 int			bytea_output = BYTEA_OUTPUT_HEX;
 
+/* GUC to enable use of strxfrm() for abbreviated keys */
+bool		trust_strxfrm = false;
+
 typedef struct varlena unknown;
 typedef struct varlena VarString;
 
@@ -2092,7 +2095,7 @@ varstr_sortsupport(SortSupport ssup, Oid typid, Oid collid)
 	 * other libc other than Cygwin has so far been shown to have a problem,
 	 * we take the conservative course of action for right now and disable
 	 * this categorically.  (Users who are certain this isn't a problem on
-	 * their system can define TRUST_STRXFRM.)
+	 * their system can set the trust_strxfrm GUC to true.)
 	 *
 	 * Even apart from the risk of broken locales, it's possible that there
 	 * are platforms where the use of abbreviated keys should be disabled at
@@ -2105,10 +2108,9 @@ varstr_sortsupport(SortSupport ssup, Oid typid, Oid collid)
 	 * categorically, we may still want or need to disable it for particular
 	 * platforms.
 	 */
-#ifndef TRUST_STRXFRM
-	if (!collate_c && !(locale && locale->provider == COLLPROVIDER_ICU))
+	if (!trust_strxfrm && !collate_c &&
+		!(locale && locale->provider == COLLPROVIDER_ICU))
 		abbreviate = false;
-#endif
 
 	/*
 	 * If we're using abbreviated keys, or if we're using a locale-aware
diff --git a/src/backend/utils/misc/guc_tables.c b/src/backend/utils/misc/guc_tables.c
index 

Re: run pgindent on a regular basis / scripted manner

2023-01-21 Thread Jelte Fennema
Attached are a .clang-format file and an uncrustify.cfg file that I
whipped up to match the current style at least somewhat.

I was unable to get either of them to produce the weird alignment of
declarations that pgindent outputs. Maybe that's just because I don't
understand what this alignment is supposed to do. Because to me the
current alignment seems completely random most of the time (and like
Andres said also not very helpful). For clang-format you should use
at least clang-format 15, otherwise it has some bugs in the alignment
logic.

One thing that clang-format really really wants to change in the
codebase, is making it consistent on what to do when
arguments/parameters don't fit on a single line: You can either choose
to minimally pack everything on the minimum number of lines, or to put
all arguments on their own separate line. Uncrustify is a lot less
strict about that and will leave most things as they currently are.

Overall it seems that uncrustify is a lot more customizable, whether
that's a good thing is debatable. Personally I think the fewer
possible debates I have about codestyle the better, so I usually like
the tools with fewer options better. But if the customizability allows
for closer matching of existing codestyle then it might be worth the
extra debates and effort in customization in this case.

> 1) revert the re-indent commits in $backbranch
> 2) merge $backbranch-with-revert into $forkbranch
> 3) re-indent $forkbranch

The git commands to achieve this are something like the following.
I've used such git commands in the past to make big automatic
refactors much easier to get in and it has worked quite well so far.

git checkout forkbranch
git rebase {commit-before-re-indent}
# now, get many merge conflicts
git rebase {re-indent-commit}
# keep your own changes (it's --theirs instead of --ours because rebase
# flips it around)
git checkout --theirs .
run-new-reindent
git add .
git rebase --continue

On Sat, 21 Jan 2023 at 23:47, Andres Freund  wrote:
>
> Hi,
>
> On 2023-01-21 17:20:35 -0500, Tom Lane wrote:
> > I don't feel wedded to every last detail of what pgindent does (and
> > especially not the bugs).  But I think if the new tool is not a pretty
> > close match we'll be in for years of back-patching pain.  We have made
> > changes in pgindent itself in the past, and the patching consequences
> > weren't *too* awful, but the changes weren't very big either.
>
> Perhaps we could backpatch formatting changes in a way that doesn't
> inconvenience forks too much. One way to deal with such changes is to
>
> 1) revert the re-indent commits in $backbranch
> 2) merge $backbranch-with-revert into $forkbranch
> 3) re-indent $forkbranch
>
> After that future changes should be mergable again.
>
> Certainly doesn't do away with the pain entirely, but it does make it perhaps
> bearable.
>
> Greetings,
>
> Andres Freund


uncrustify.cfg
Description: Binary data


.clang-format
Description: Binary data


Re: run pgindent on a regular basis / scripted manner

2023-01-21 Thread Peter Geoghegan
On Sat, Jan 21, 2023 at 2:43 PM Andres Freund  wrote:
> Unless I miss something, I don't think clang-format actually does that level
> of C parsing - you can't pass include paths etc, so it really can't.

It's hard to keep track of, since I also use clangd, which is
influenced by .clang-format for certain completions. It clearly does
plenty of stuff that requires an AST, since it requires a
compile_commands.json. You're the LLVM committer, not me.

Attached is my .clang-format, since you asked for it. It was
originally based on stuff that both you and Peter E posted several
years back, I believe. Plus the timescaledb one in one or two places.
I worked a couple of things out through trial and error. It's
relatively hard to follow the documentation, and there have been
features added to newer LLVM versions.

-- 
Peter Geoghegan


clang-format
Description: Binary data


Re: run pgindent on a regular basis / scripted manner

2023-01-21 Thread Andres Freund
Hi,

On 2023-01-21 17:20:35 -0500, Tom Lane wrote:
> I don't feel wedded to every last detail of what pgindent does (and
> especially not the bugs).  But I think if the new tool is not a pretty
> close match we'll be in for years of back-patching pain.  We have made
> changes in pgindent itself in the past, and the patching consequences
> weren't *too* awful, but the changes weren't very big either.

Perhaps we could backpatch formatting changes in a way that doesn't
inconvenience forks too much. One way to deal with such changes is to

1) revert the re-indent commits in $backbranch
2) merge $backbranch-with-revert into $forkbranch
3) re-indent $forkbranch

After that future changes should be mergable again.

Certainly doesn't do away with the pain entirely, but it does make it perhaps
bearable.

Greetings,

Andres Freund




Re: run pgindent on a regular basis / scripted manner

2023-01-21 Thread Andres Freund
Hi,

On 2023-01-21 14:05:41 -0800, Peter Geoghegan wrote:
> On Sat, Jan 21, 2023 at 12:59 PM Tom Lane  wrote:
> > Hmm, that could be a deal-breaker.  It's not going to be acceptable
> > to have to pgindent different parts of the system on different platforms
> > ... at least not unless we can segregate them on the file level, and
> > even that would have a large PITA factor.

Unless I miss something, I don't think clang-format actually does that level
of C parsing - you can't pass include paths etc, so it really can't.


> It's probably something that could be worked around. My remarks are
> based on some dim memories of dealing with the tool before I arrived
> at a configuration that works well enough for me.

Could you share your .clang-format?



> > Still, we won't know unless someone makes a serious experiment with it.
> 
> There is one thing about clang-format that I find mildly infuriating:
> it can indent function declarations in the way that I want it to, and
> it can indent variable declarations in the way that I want it to. It
> just can't do both at the same time, because they're both controlled
> by AlignConsecutiveDeclarations.
> 
> Of course the way that I want to do things is (almost by definition)
> the pgindent way, at least right now -- it's not necessarily about my
> fixed preferences (though it can be hard to tell!). It's really not
> surprising that clang-format cannot quite perfectly simulate pgindent.
> How flexible can we be about stuff like that? Obviously there is no
> clear answer right now.

I personally find the current indentation of variable assignments deeply
unhelpful - but changing it would be a very noisy change.


Greetings,

Andres Freund




Re: run pgindent on a regular basis / scripted manner

2023-01-21 Thread Tom Lane
Peter Geoghegan  writes:
> Of course the way that I want to do things is (almost by definition)
> the pgindent way, at least right now -- it's not necessarily about my
> fixed preferences (though it can be hard to tell!). It's really not
> surprising that clang-format cannot quite perfectly simulate pgindent.
> How flexible can we be about stuff like that? Obviously there is no
> clear answer right now.

I don't feel wedded to every last detail of what pgindent does (and
especially not the bugs).  But I think if the new tool is not a pretty
close match we'll be in for years of back-patching pain.  We have made
changes in pgindent itself in the past, and the patching consequences
weren't *too* awful, but the changes weren't very big either.

As I said upthread, this is really impossible to answer without a
concrete proposal of how to configure clang-format and a survey of
what diffs we'd wind up with.

regards, tom lane




Re: Non-superuser subscription owners

2023-01-21 Thread Andres Freund
Hi,

On 2023-01-20 11:08:54 -0500, Robert Haas wrote:
>  /*
> - * Validate connection info string (just try to parse it)
> + * Validate connection info string, and determine whether it might cause
> + * local filesystem access to be attempted.
> + *
> + * If the connection string can't be parsed, this function will raise
> + * an error and will not return. If it can, it will return true if local
> + * filesystem access may be attempted and false otherwise.
>   */
> -static void
> +static bool
>  libpqrcv_check_conninfo(const char *conninfo)
>  {
>   PQconninfoOption *opts = NULL;
> + PQconninfoOption *opt;
>   char   *err = NULL;
> + boolresult = false;
>  
>   opts = PQconninfoParse(conninfo, );
>   if (opts == NULL)
> @@ -267,7 +274,40 @@ libpqrcv_check_conninfo(const char *conninfo)
> 			 errmsg("invalid connection string syntax: %s", errcopy)));
>   }
>  
> + for (opt = opts; opt->keyword != NULL; ++opt)
> + {
> + /* Ignore connection options that are not present. */
> + if (opt->val == NULL)
> + continue;
> +
> + /* For all these parameters, the value is a local filename. */
> + if (strcmp(opt->keyword, "passfile") == 0 ||
> + strcmp(opt->keyword, "sslcert") == 0 ||
> + strcmp(opt->keyword, "sslkey") == 0 ||
> + strcmp(opt->keyword, "sslrootcert") == 0 ||
> + strcmp(opt->keyword, "sslcrl") == 0 ||
> + strcmp(opt->keyword, "sslcrldir") == 0 ||
> + strcmp(opt->keyword, "service") == 0)
> + {
> + result = true;
> + break;
> + }

Do we need to think about 'options' allowing anything bad? I don't
immediately see a problem, but ...


> +
> +	/*
> +	 * For the host parameter, the value might be a local filename.
> +	 * It might also be a reference to the local host's abstract UNIX
> +	 * socket namespace, which we consider equivalent to a local pathname
> +	 * for security purposes.
> +	 */
> +	if (strcmp(opt->keyword, "host") == 0 && is_unixsock_path(opt->val))
> + {
> + result = true;
> + break;
> + }
> + }

Hm, what about kerberos / gss / SSPI? Aren't those essentially also tied to
the local filesystem / user?

Greetings,

Andres Freund




Re: run pgindent on a regular basis / scripted manner

2023-01-21 Thread Peter Geoghegan
On Sat, Jan 21, 2023 at 12:59 PM Tom Lane  wrote:
> Hmm, that could be a deal-breaker.  It's not going to be acceptable
> to have to pgindent different parts of the system on different platforms
> ... at least not unless we can segregate them on the file level, and
> even that would have a large PITA factor.

It's probably something that could be worked around. My remarks are
based on some dim memories of dealing with the tool before I arrived
at a configuration that works well enough for me. Importantly,
clang-format doesn't require you to futz around with Makefiles or
objdump or anything like that -- that's a huge plus. It doesn't seem
to impose any requirements on how I build Postgres at all (I generally
use gcc, not clang).

Even if these kinds of issues proved to be a problem for the person
tasked with running clang-format against the whole tree periodically,
they still likely wouldn't affect most of us. It's quite convenient to
use clang-format from an editor -- it can be invoked very
incrementally, against a small range of lines at a time. It's pretty
much something that I can treat like the built-in indent for my
editor. It's vastly different to the typical pgindent workflow.

> Still, we won't know unless someone makes a serious experiment with it.

There is one thing about clang-format that I find mildly infuriating:
it can indent function declarations in the way that I want it to, and
it can indent variable declarations in the way that I want it to. It
just can't do both at the same time, because they're both controlled
by AlignConsecutiveDeclarations.
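For concreteness, the single-knob problem looks like this in a hypothetical .clang-format fragment. The option names come from clang-format's documented style options; the surrounding values are guesses at a pgindent-ish setup, not a vetted configuration.

```yaml
# Hypothetical pgindent-approximating fragment -- illustrative only.
BasedOnStyle: LLVM
UseTab: Always
IndentWidth: 4
# One option governs BOTH function declarations and variable
# declarations; there is no separate per-kind switch, so matching
# pgindent's treatment of one kind mis-renders the other.
AlignConsecutiveDeclarations: true
```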

Of course the way that I want to do things is (almost by definition)
the pgindent way, at least right now -- it's not necessarily about my
fixed preferences (though it can be hard to tell!). It's really not
surprising that clang-format cannot quite perfectly simulate pgindent.
How flexible can we be about stuff like that? Obviously there is no
clear answer right now.

-- 
Peter Geoghegan




Re: Non-superuser subscription owners

2023-01-21 Thread Andres Freund
Hi,

On 2023-01-20 08:25:46 -0500, Robert Haas wrote:
> Worse still, I have always felt that the security vulnerability that
> led to these controls being installed is pretty much fabricated: it's
> an imaginary problem. Today I went back and found the original CVE at
> https://nvd.nist.gov/vuln/detail/CVE-2007-3278 and it seems that at
> least one other person agrees. The Red Hat vendor statement on that
> page says: "Red Hat does not consider this do be a security issue.
> dblink is disabled in default configuration of PostgreSQL packages as
> shipped with Red Hat Enterprise Linux versions 2.1, 3, 4 and 5, and it
> is a configuration decision whether to grant local users arbitrary
> access." I think whoever wrote that has an excellent point. I'm unable
> to discern any legitimate security purpose for this restriction. What
> I think it mostly does is (a) inconvenience users or (b) force them to
> rely on a less-secure authentication method than they would otherwise
> have chosen.

FWIW, I've certainly seen situations where having the checks prevented easy
paths to privilege escalations. That's not to say that I like the checks, but
I also don't think we can get away without them (or a better replacement, of
course).

There are good reasons to have 'peer' authentication set up for the user
running postgres, so admin scripts can connect without issues. Which
unfortunately then also means that postgres_fdw etc can connect to the current
database as superuser, without that check. Which imo clearly is an issue.

Why do you think this is a fabricated issue?


The solution we have is quite bad, of course. Just because the user isn't a
superuser "immediately" doesn't mean it doesn't have the rights to become
one somehow.


> > > The basic idea that by looking at which connection string properties are 
> > > set
> > > we can tell what kinds of things the connection string is going to do 
> > > seems
> > > sound to me.
> >
> > I don't think you *can* check it purely based on existing connection string
> > properties, unfortunately. Think of e.g. a pg_hba.conf line of "local all 
> > user
> > peer" (quite reasonable config) or "host all all 127.0.0.1/32 trust" (less 
> > so).
> >
> > Hence the hack with dblink_security_check().
> >
> > I think there might be a discussion somewhere about adding an option to 
> > force
> > libpq to not use certain auth methods, e.g. plaintext password/md5. It's
> > possible this could be integrated.
> 
> I still think you're talking about a different problem here. I'm
> talking about the problem of knowing whether local files are going to
> be accessed by the connection string.

Why is this only about local files, rather than e.g. also using the local
user?

Greetings,

Andres Freund




isolation test for postgres_fdw interruptability

2023-01-21 Thread Andres Freund
Hi,

Due to [1] I thought it'd be a good idea to write an isolation test for
testing postgres_fdw interruptability during connection establishment.

I was able to make that work - but unfortunately doing so requires preventing
a login from completing. The only way I could see to achieve that is to lock
one of the important tables. I ended up with
  step s2_hang_logins { LOCK pg_db_role_setting; }

However, I'm a bit worried that that might cause problems. It'll certainly
block progress in concurrent tests, given it's a shared relation. But locking
relevant non-shared relations causes more problems, because it'll e.g. prevent
querying pg_stat_activity.

Does anybody see another way to cause a login to hang as part of an isolation
test, in a controllable manner?  Or, if not, do you think we can get away with
locking pg_db_role_setting?


The other complexity is that isolationtester won't see the wait edge going
through postgres_fdw. My approach for that is to do that one wait in a DO
block loop, matching on application_name = 'isolation/interrupt/s1'.

I don't think we can teach isolationtester to understand such edges. I guess
we could teach it to wait for certain wait events though? But I'm not sure how
generally useful that is. IIRC Tom concluded in the past that it didn't get us
very far.


The test currently tests one termination case, because isolationtester will
just fail the next permutation if a connection is gone. I don't see an issue
fixing that?


I attached my current WIP patch for the test.


Note that the test will only work with the patches from [1] applied.


Greetings,

Andres Freund

[1] 
https://www.postgresql.org/message-id/20220925232237.p6uskba2dw6fnwj2%40awork3.anarazel.de
From e188d6dc6c5bc0dea5ff33692b0f44a607f8e855 Mon Sep 17 00:00:00 2001
From: Andres Freund 
Date: Sat, 21 Jan 2023 13:27:40 -0800
Subject: [PATCH v4 6/6] wip: test that postgres_fdw can be interrupted

---
 contrib/postgres_fdw/Makefile   |  1 +
 contrib/postgres_fdw/expected/interrupt.out | 95 +
 contrib/postgres_fdw/meson.build|  5 ++
 contrib/postgres_fdw/specs/interrupt.spec   | 68 +++
 4 files changed, 169 insertions(+)
 create mode 100644 contrib/postgres_fdw/expected/interrupt.out
 create mode 100644 contrib/postgres_fdw/specs/interrupt.spec

diff --git a/contrib/postgres_fdw/Makefile b/contrib/postgres_fdw/Makefile
index c1b0cad453f..eb141442dee 100644
--- a/contrib/postgres_fdw/Makefile
+++ b/contrib/postgres_fdw/Makefile
@@ -17,6 +17,7 @@ EXTENSION = postgres_fdw
 DATA = postgres_fdw--1.0.sql postgres_fdw--1.0--1.1.sql
 
 REGRESS = postgres_fdw
+ISOLATION = interrupt
 
 ifdef USE_PGXS
 PG_CONFIG = pg_config
diff --git a/contrib/postgres_fdw/expected/interrupt.out b/contrib/postgres_fdw/expected/interrupt.out
new file mode 100644
index 000..fe32e8ea27b
--- /dev/null
+++ b/contrib/postgres_fdw/expected/interrupt.out
@@ -0,0 +1,95 @@
+Parsed test spec with 3 sessions
+
+starting permutation: s2_begin s2_hang_logins s1_select_remote s3_cancel_s1 s2_commit
+step s2_begin: BEGIN;
+step s2_hang_logins: LOCK pg_db_role_setting;
+step s1_select_remote: SELECT count(*) FROM remote_t1; 
+step s3_cancel_s1: 
+  SELECT wait_for_s1();
+  SELECT pg_cancel_backend(pid) FROM pg_stat_activity WHERE datname = current_database() AND application_name = 'isolation/interrupt/s1';
+
+wait_for_s1
+---
+   
+(1 row)
+
+pg_cancel_backend
+-
+t
+(1 row)
+
+step s1_select_remote: <... completed>
+ERROR:  canceling statement due to user request
+step s2_commit: COMMIT;
+
+starting permutation: s2_begin s2_hang_logins s1_select_remote s2_commit
+step s2_begin: BEGIN;
+step s2_hang_logins: LOCK pg_db_role_setting;
+step s1_select_remote: SELECT count(*) FROM remote_t1; 
+step s2_commit: COMMIT;
+step s1_select_remote: <... completed>
+count
+-
+0
+(1 row)
+
+
+starting permutation: s2_begin s2_hang_select s1_select_remote s3_cancel_s1 s2_commit
+step s2_begin: BEGIN;
+step s2_hang_select: LOCK local_t1;
+step s1_select_remote: SELECT count(*) FROM remote_t1; 
+step s3_cancel_s1: 
+  SELECT wait_for_s1();
+  SELECT pg_cancel_backend(pid) FROM pg_stat_activity WHERE datname = current_database() AND application_name = 'isolation/interrupt/s1';
+
+wait_for_s1
+---
+   
+(1 row)
+
+pg_cancel_backend
+-
+t
+(1 row)
+
+step s1_select_remote: <... completed>
+ERROR:  canceling statement due to user request
+step s2_commit: COMMIT;
+
+starting permutation: s2_begin s2_hang_select s1_select_remote s2_commit
+step s2_begin: BEGIN;
+step s2_hang_select: LOCK local_t1;
+step s1_select_remote: SELECT count(*) FROM remote_t1; 
+step s2_commit: COMMIT;
+step s1_select_remote: <... completed>
+count
+-
+0
+(1 row)
+
+
+starting permutation: s2_begin s2_hang_logins s1_select_remote s3_terminate_s1 s2_commit
+step s2_begin: BEGIN;
+step s2_hang_logins: LOCK pg_db_role_setting;

Re: Implement missing join selectivity estimation for range types

2023-01-21 Thread Tomas Vondra
Hi Mahmoud,

I finally had time to properly read the paper today - the general
approach mostly matches how I imagined the estimation would work for
inequalities, but it's definitely nice to see the algorithm properly
formalized and analyzed.
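For readers who haven't seen the paper, the core idea for an inequality join can be sketched in a few lines of Python. This is a toy discretization (average, over one side's histogram bounds, of the fraction of the other side's bounds above it), not the algorithm as formalized in the paper or implemented in the patch.

```python
from bisect import bisect_right

def ineq_join_selectivity(hist_a, hist_b):
    """Toy estimate of the selectivity of the join clause 'a < b',
    given sorted (equi-depth) histogram bounds for each side: for
    every bound in hist_a, take the fraction of hist_b strictly above
    it, and average.  This discretizes 'integrate P(b > a) over the
    distribution of a'."""
    if not hist_a or not hist_b:
        return 0.5  # no statistics: arbitrary default guess
    total = 0.0
    for a in hist_a:
        total += (len(hist_b) - bisect_right(hist_b, a)) / len(hist_b)
    return total / len(hist_a)
```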

What seems a bit strange to me is that the patch only deals with range
types, leaving the scalar cases unchanged. I understand why (not having
an MCV simplifies it a lot), but I'd bet joins on range types are way
less common than inequality joins on scalar types. I don't even remember
seeing an inequality join on a range column, TBH.

That doesn't mean the patch is wrong, of course. But I'd expect users to
be surprised we handle range types better than "old" scalar types (which
range types build on, in some sense).

Did you have any plans to work on improving estimates for the scalar
case too? Or did you do the patch needed for the paper, and have no
plans to continue working on this?

I'm also wondering about not having MCV for ranges. I was a bit
surprised we don't build MCV in compute_range_stats(), and perhaps we
should start building those - if there are common ranges, this might
significantly improve some of the estimates (just like for scalar
columns). Which would mean the estimates for range types are just as
complex as for scalars. Of course, we don't do that now.
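A minimal sketch of what building an MCV from a sample of range values could look like (a hypothetical helper, loosely modeled on the "more common than average" cutoff used for scalar MCVs; the real compute_range_stats() does nothing like this today):

```python
from collections import Counter

def build_mcv(sample, max_entries=10):
    """Toy MCV builder: keep the values occurring more often than an
    'average' value would (1 / number of distinct values), up to
    max_entries, paired with their observed frequencies.  Works for
    any hashable value, so ranges-as-tuples like (lo, hi) are fine."""
    n = len(sample)
    if n == 0:
        return []
    counts = Counter(sample)
    cutoff = 1.0 / len(counts)
    return [(value, c / n)
            for value, c in counts.most_common(max_entries)
            if c / n > cutoff]
```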


regards

-- 
Tomas Vondra
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: run pgindent on a regular basis / scripted manner

2023-01-21 Thread Tom Lane
Peter Geoghegan  writes:
> In practice this approach tends to run into problems when the relevant
> AST isn't available. For example, if there's code that only builds on
> Windows, maybe it won't work at all (at least on my Linux system).

Hmm, that could be a deal-breaker.  It's not going to be acceptable
to have to pgindent different parts of the system on different platforms
... at least not unless we can segregate them on the file level, and
even that would have a large PITA factor.

Still, we won't know unless someone makes a serious experiment with it.

regards, tom lane




Re: run pgindent on a regular basis / scripted manner

2023-01-21 Thread Peter Geoghegan
On Sat, Jan 21, 2023 at 9:30 AM Bruce Momjian  wrote:
> I don't see uncrustify or clang-format supporting typedef lists so maybe
> they implemented this feedback loop.  It would be good to see if we can
> get either of these tools to match our formatting.

I personally use clang-format for Postgres development work, since it
integrates nicely with my text editor, and can be configured to
produce approximately the same result as pgindent (certainly good
enough when working on a patch that's still far from a committable
state). I'm fairly sure that clang-format has access to a full AST
from the clang compiler, which is the ideal approach - at least in
theory.

In practice this approach tends to run into problems when the relevant
AST isn't available. For example, if there's code that only builds on
Windows, maybe it won't work at all (at least on my Linux system).
This doesn't really bother me currently, since I only rely on
clang-format as a first pass sort of thing. Maybe I could figure out a
better way to deal with such issues, but right now I don't have much
incentive to.

Another advantage of clang-format is that it's a known quantity. For
example there is direct support for it built into meson, with bells
and whistles such as CI support:

https://mesonbuild.com/Code-formatting.html

My guess is that moving to clang-format would require giving up some
flexibility, but getting much better integration with text editors and
tools like meson in return. It would probably make it practical to
have much stronger rules about how committed code must be indented --
rules that are practical, and can actually be enforced. That trade-off
seems likely to be worth it in my view, though it's not something that
I feel too strongly about.

-- 
Peter Geoghegan




Re: libpqrcv_connect() leaks PGconn

2023-01-21 Thread Andres Freund
Hi,

On 2023-01-21 08:16:42 -0800, Noah Misch wrote:
> On Fri, Jan 20, 2023 at 06:50:37PM -0800, Andres Freund wrote:
> > On 2023-01-20 17:12:37 -0800, Andres Freund wrote:
> > > We have code like this in libpqrcv_connect():
> > It's a bit worse than I earlier thought: we use walrcv_connect() during CREATE
> > SUBSCRIPTION. One can easily exhaust file descriptors right now.  So I think
> > we need to fix this.
> > 
> > I also noticed the following in libpqrcv_connect, added in 11da97024abb:
> 
> > The attached patch fixes both issues.
> 
> Looks good.  I'm not worried about a superuser hosing their own session via
> CREATE SUBSCRIPTION failures in a loop.  At the same time, this fix is plenty
> safe to back-patch.

Yea, I'm not worried about it from a security perspective and more from a
usability perspective (but even there not terribly). File descriptors that
leaked, particularly when not reserved (AcquireExternalFD() etc), can lead to
weird problems down the line. And I think it's not that rare to need a few
attempts at getting the connection string, permissions, etc right.

Thanks for looking at the fix!


> > It seems we don't have any tests for creating a subscription that fails 
> > during
> > connection establishment? That doesn't seem optimal - I guess there may have
> > been concern around portability of the error messages?
> 
> Perhaps.  We have various (non-subscription) tests using "\set VERBOSITY
> sqlstate" for that problem.  If even the sqlstate varies, a DO block is the
> next level of error swallowing.

That's a good trick I need to remember. And the errcode for an invalid
connection string luckily differs from the one for a not working one.


I think I found an even easier way - port=-1 is rejected during PQconnectPoll()
and will never even open a socket. That'd make it reasonable for the test to
happen in subscription.sql, instead of a tap test, I think (faster, easier to
maintain). It may be that we'll one day move that error into the
PQconninfoParse() phase, but I don't think we need to worry about it now.

Any reason not to go for that?

If not, I'll add a test for an invalid conninfo and a non-working connection
string to subscription.sql.
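The phase distinction the test would rely on — a syntactically invalid conninfo fails at parse time with one errcode, while port=-1 parses fine and fails during connection establishment with another — can be modeled with a toy parser. This is illustrative Python, not libpq's actual behavior, keyword list, or error codes.

```python
KNOWN_KEYWORDS = {"host", "port", "dbname", "user", "connect_timeout"}

class ConninfoSyntaxError(ValueError):
    """Stands in for a parse-phase failure (PQconninfoParse-like)."""

class ConnectionFailed(RuntimeError):
    """Stands in for a failure during connection establishment."""

def parse_conninfo(s):
    """Split 'key=value key=value' pairs, rejecting unknown keywords."""
    opts = {}
    for item in s.split():
        key, sep, val = item.partition("=")
        if not sep or key not in KNOWN_KEYWORDS:
            raise ConninfoSyntaxError(item)
        opts[key] = val
    return opts

def connect(s):
    """Parse first; then validate the port, like port=-1 being
    rejected before any socket is ever opened."""
    opts = parse_conninfo(s)
    port = int(opts.get("port", "5432"))
    if not 1 <= port <= 65535:
        raise ConnectionFailed("invalid port number: %d" % port)
    return opts

def error_phase(conninfo):
    """Classify which phase a given conninfo fails in, if any."""
    try:
        connect(conninfo)
        return "ok"
    except ConninfoSyntaxError:
        return "syntax"
    except ConnectionFailed:
        return "connect"
```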

Greetings,

Andres Freund




Re: Reduce timing overhead of EXPLAIN ANALYZE using rdtsc?

2023-01-21 Thread Andres Freund
Hi,

On 2023-01-20 21:31:57 -0800, Andres Freund wrote:
> On 2023-01-20 20:16:13 -0800, Andres Freund wrote:
> > I'm planning to push most of my changes soon, had hoped to get to it a bit
> > sooner, but ...
>
> I pushed the int64-ification commits.

There's an odd compilation failure on AIX.

https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=hoverfly&dt=2023-01-21%2007%3A01%3A42

/opt/IBM/xlc/16.1.0/bin/xlc_r -D_LARGE_FILES=1 -DRANDOMIZE_ALLOCATED_MEMORY 
-qnoansialias -g -O2 -qmaxmem=33554432 -qsuppress=1500-010:1506-995 
-qsuppress=1506-010:1506-416:1506-450:1506-480:1506-481:1506-492:1506-944:1506-1264
 -qinfo=all:nocnd:noeff:noext:nogot:noini:noord:nopar:noppc:norea:nouni:nouse 
-qinfo=nounset  -qvisibility=hidden -I. -I. -I/opt/freeware/include/python3.5m 
-I../../../src/include -I/home/nm/sw/nopath/icu58.3-64/include   
-I/home/nm/sw/nopath/libxml2-64/include/libxml2  
-I/home/nm/sw/nopath/uuid-64/include -I/home/nm/sw/nopath/openldap-64/include 
-I/home/nm/sw/nopath/icu58.3-64/include -I/home/nm/sw/nopath/libxml2-64/include 
 -c -o plpy_cursorobject.o plpy_cursorobject.c
"../../../src/include/portability/instr_time.h", line 116.9: 1506-304 (I) No 
function prototype given for "clock_gettime".
"../../../src/include/portability/instr_time.h", line 116.23: 1506-045 (S) 
Undeclared identifier CLOCK_REALTIME.
: recipe for target 'plpy_cursorobject.o' failed

but files including instr_time.h *do* build successfully, e.g. instrument.c:

/opt/IBM/xlc/16.1.0/bin/xlc_r -D_LARGE_FILES=1 -DRANDOMIZE_ALLOCATED_MEMORY 
-qnoansialias -g -O2 -qmaxmem=33554432 -qsuppress=1500-010:1506-995 
-qsuppress=1506-010:1506-416:1506-450:1506-480:1506-481:1506-492:1506-944:1506-1264
 -qinfo=all:nocnd:noeff:noext:nogot:noini:noord:nopar:noppc:norea:nouni:nouse 
-qinfo=nounset -I../../../src/include -I/home/nm/sw/nopath/icu58.3-64/include   
-I/home/nm/sw/nopath/libxml2-64/include/libxml2  
-I/home/nm/sw/nopath/uuid-64/include -I/home/nm/sw/nopath/openldap-64/include 
-I/home/nm/sw/nopath/icu58.3-64/include -I/home/nm/sw/nopath/libxml2-64/include 
 -c -o instrument.o instrument.c


Before the change the clock_gettime() call was in a macro and thus could be
referenced even without a prior declaration, as long as places using
INSTR_TIME_SET_CURRENT() had all the necessary includes and defines.


Argh:

There's nice bit in plpython.h:

/*
 * Include order should be: postgres.h, other postgres headers, plpython.h,
 * other plpython headers.  (In practice, other plpython headers will also
 * include this file, so that they can compile standalone.)
 */
#ifndef POSTGRES_H
#error postgres.h must be included before plpython.h
#endif

/*
 * Undefine some things that get (re)defined in the Python headers. They aren't
 * used by the PL/Python code, and all PostgreSQL headers should be included
 * earlier, so this should be pretty safe.
 */
#undef _POSIX_C_SOURCE
#undef _XOPEN_SOURCE


the relevant stuff in time.h is indeed guarded by
#if _XOPEN_SOURCE>=500


I don't think the plpython code actually follows the rule about including all
postgres headers earlier.

plpy_typeio.h:

#include "access/htup.h"
#include "fmgr.h"
#include "plpython.h"
#include "utils/typcache.h"

plpy_cursorobject.c:

#include "access/xact.h"
#include "catalog/pg_type.h"
#include "mb/pg_wchar.h"
#include "plpy_cursorobject.h"
#include "plpy_elog.h"
#include "plpy_main.h"
#include "plpy_planobject.h"
#include "plpy_procedure.h"
#include "plpy_resultobject.h"
#include "plpy_spi.h"
#include "plpython.h"
#include "utils/memutils.h"


It strikes me as a uh, not good idea to undefine _POSIX_C_SOURCE,
_XOPEN_SOURCE.

The include order aspect was perhaps feasible when there just was plpython.c,
but with the split into many different C files and many headers, it seems hard
to maintain. There's a lot of violations afaics.

The undefines were added in a11cf433413, the split in 147c2482542.



Greetings,

Andres Freund




Re: pgindent vs variable declaration across multiple lines

2023-01-21 Thread Thomas Munro
On Fri, Jan 20, 2023 at 2:43 PM Tom Lane  wrote:
> Andres Freund  writes:
> > There's a few places in the code that try to format a variable definition 
> > like this
>
> > ReorderBufferChange *next_change =
> > dlist_container(ReorderBufferChange, node, next);
>
> > but pgindent turns that into
>
> > ReorderBufferChange *next_change =
> > dlist_container(ReorderBufferChange, node, next);
>
> Yeah, that's bugged me too.  I suspect that the triggering factor is
> use of a typedef name within the assigned expression, but I've not
> tried to run it to ground.
>
> > I assume we'd again have to dive into pg_bsd_indent's code to fix it :(
>
> Yeah :-(.  That's enough of a rat's nest that I've not really wanted to.
> But I'd support applying such a fix if someone can figure it out.

This may be a clue: the place where declarations are treated
differently seems to be (strangely) in io.c:

ps.ind_stmt = ps.in_stmt & ~ps.in_decl; /* next line should be
 * indented if we have not
 * completed this stmt and if
 * we are not in the middle of
 * a declaration */

If you just remove "& ~ps.in_decl" then it does the desired thing for
that new code in predicate.c, but it also interferes with declarations
with commas, ie int i, j; where i and j currently line up, now j just
gets one indentation level.  It's probably not the right way to do it
but you can fix that with a last token kluge, something like:

#include "indent_codes.h"

ps.ind_stmt = ps.in_stmt && (!ps.in_decl || ps.last_token != comma);

That improves a lot of code in our tree IMHO but of course there is
other collateral damage...




Re: pg_stats and range statistics

2023-01-21 Thread Egor Rogov

Hi Tomas,

On 21.01.2023 00:50, Tomas Vondra wrote:

> Hi Egor,
>
> While reviewing a patch improving join estimates for ranges [1] I
> realized we don't show stats for ranges in pg_stats, and I recalled we
> had this patch.
>
> I rebased the v2, and I decided to take a stab at showing separate
> histograms for lower/upper histogram bounds. I believe it makes it way
> more readable, which is what pg_stats is about IMHO.



Thanks for looking into this.

I have to admit it looks much better this way, so +1.



> This simply adds two functions, accepting/producing anyarray - one for
> lower bounds, one for upper bounds. I don't think it can be done with a
> plain subquery (or at least I don't know how).



Anyarray is an alien to SQL, so functions are well justified here. What 
makes me a bit uneasy is two almost identical functions. Should we 
consider other options like a function with an additional parameter or a 
function returning an array of bounds arrays (which is somewhat 
wasteful, but probably it doesn't matter much here)?




> Finally, it renames the empty_range_frac to start with range_, per the
> earlier discussion. I wonder if the new column names for lower/upper
> bounds (range_lower_bounds_histograms/range_upper_bounds_histograms) are
> too long ...



It seems so. The ending -s should be left out since it's a single 
histogram now. And I think that 
range_lower_histogram/range_upper_histogram are descriptive enough.


I'm adding one more patch to shorten the column names, refresh the docs, 
and make 'make check' happy (unfortunately, we have to edit 
src/test/regress/expected/rules.out every time the pg_stats definition changes).





regards

[1] https://commitfest.postgresql.org/41/3821/
From b339c0eab5616cec61e9d9e85398034861608d30 Mon Sep 17 00:00:00 2001
From: Tomas Vondra 
Date: Fri, 20 Jan 2023 20:50:41 +0100
Subject: [PATCH 1/3] Display length and bounds histograms in pg_stats

Values corresponding to STATISTIC_KIND_RANGE_LENGTH_HISTOGRAM and
STATISTIC_KIND_BOUNDS_HISTOGRAM were not exposed to pg_stats when these
slot kinds were introduced in 918eee0c49.

This commit adds the missing fields to pg_stats.

TODO: catalog version bump
---
 doc/src/sgml/catalogs.sgml   | 32 
 src/backend/catalog/system_views.sql | 23 +++-
 src/include/catalog/pg_statistic.h   |  3 +++
 src/test/regress/expected/rules.out  | 26 +-
 4 files changed, 82 insertions(+), 2 deletions(-)

diff --git a/doc/src/sgml/catalogs.sgml b/doc/src/sgml/catalogs.sgml
index c1e4048054e..c8bd84c56eb 100644
--- a/doc/src/sgml/catalogs.sgml
+++ b/doc/src/sgml/catalogs.sgml
@@ -9634,6 +9634,38 @@ SCRAM-SHA-256$iteration 
count:
User mapping specific options, as keyword=value strings
   
  
+
+ 
+  
+   range_length_histogram anyarray
+  
+  
+   A histogram of the lengths of non-empty and non-null range values of a
+   range type column. (Null for non-range types.)
+  
+ 
+
+ 
+  
+   empty_range_frac float4
+  
+  
+   Fraction of column entries whose values are empty ranges.
+   (Null for non-range types.)
+  
+ 
+
+ 
+  
+   range_bounds_histograms anyarray
+  
+  
+   Histograms of lower and upper bounds of non-empty, non-null ranges,
+   combined into a single array of range values. The lower and upper bounds
+   of each value correspond to the histograms of lower and upper bounds
+   respectively. (Null for non-range types.)
+  
+ 
 

   
diff --git a/src/backend/catalog/system_views.sql 
b/src/backend/catalog/system_views.sql
index 8608e3fa5b1..ccd6c7ffdb7 100644
--- a/src/backend/catalog/system_views.sql
+++ b/src/backend/catalog/system_views.sql
@@ -243,7 +243,28 @@ CREATE VIEW pg_stats WITH (security_barrier) AS
 WHEN stakind3 = 5 THEN stanumbers3
 WHEN stakind4 = 5 THEN stanumbers4
 WHEN stakind5 = 5 THEN stanumbers5
-END AS elem_count_histogram
+END AS elem_count_histogram,
+CASE
+WHEN stakind1 = 6 THEN stavalues1
+WHEN stakind2 = 6 THEN stavalues2
+WHEN stakind3 = 6 THEN stavalues3
+WHEN stakind4 = 6 THEN stavalues4
+WHEN stakind5 = 6 THEN stavalues5
+END AS range_length_histogram,
+CASE
+WHEN stakind1 = 6 THEN stanumbers1[1]
+WHEN stakind2 = 6 THEN stanumbers2[1]
+WHEN stakind3 = 6 THEN stanumbers3[1]
+WHEN stakind4 = 6 THEN stanumbers4[1]
+WHEN stakind5 = 6 THEN stanumbers5[1]
+END AS empty_range_frac,
+CASE
+WHEN stakind1 = 7 THEN stavalues1
+WHEN stakind2 = 7 THEN stavalues2
+WHEN stakind3 = 7 THEN stavalues3
+WHEN stakind4 = 7 THEN stavalues4
+WHEN stakind5 = 7 THEN stavalues5
+END AS range_bounds_histograms
 FROM pg_statistic s JOIN pg_class c ON (c.oid = 

Re: Unicode grapheme clusters

2023-01-21 Thread Bruce Momjian
On Sat, Jan 21, 2023 at 01:17:27PM -0500, Tom Lane wrote:
> Bruce Momjian  writes:
> > I just checked if wcswidth() would honor graphene clusters, though
> > wcwidth() does not, but it seems wcswidth() treats characters just like
> > wcwidth():
> 
> Well, that's at least potentially fixable within libc, while wcwidth
> clearly can never do this right.
> 
> Probably our long-term answer is to avoid depending on wcwidth
> and use wcswidth instead.  But it's hard to get excited about
> doing the legwork for that until popular libc implementations
> get it right.

Agreed.

-- 
  Bruce Momjian  https://momjian.us
  EDB  https://enterprisedb.com

Embrace your flaws.  They make you human, rather than perfect,
which you will never be.




Re: Unicode grapheme clusters

2023-01-21 Thread Tom Lane
Bruce Momjian  writes:
> I just checked if wcswidth() would honor grapheme clusters, though
> wcwidth() does not, but it seems wcswidth() treats characters just like
> wcwidth():

Well, that's at least potentially fixable within libc, while wcwidth
clearly can never do this right.

Probably our long-term answer is to avoid depending on wcwidth
and use wcswidth instead.  But it's hard to get excited about
doing the legwork for that until popular libc implementations
get it right.

regards, tom lane




Re: Unicode grapheme clusters

2023-01-21 Thread Bruce Momjian
On Sat, Jan 21, 2023 at 12:37:30PM -0500, Bruce Momjian wrote:
> Well, as one of the URLs I quoted said:
> 
>   This is by design. wcwidth() is utterly broken. Any terminal or
>   terminal application that uses it is also utterly broken. Forget
>   about emoji wcwidth() doesn't even work with combining characters,
>   zero width joiners, flags, and a whole bunch of other things.
> 
> So, either we have to find a function in the library that will do the
> looping over the string for us, or we need to identify the special
> Unicode characters that create grapheme clusters and handle them in our
> code.

I just checked if wcswidth() would honor grapheme clusters, though
wcwidth() does not, but it seems wcswidth() treats characters just like
wcwidth():

$ LANG=en_US.UTF-8 grapheme_test
wcswidth len=7

bytes_consumed=4, wcwidth len=2
bytes_consumed=4, wcwidth len=2
bytes_consumed=3, wcwidth len=0
bytes_consumed=3, wcwidth len=1
bytes_consumed=3, wcwidth len=0
bytes_consumed=4, wcwidth len=2

C test program attached.  This is on Debian 11.

-- 
  Bruce Momjian  https://momjian.us
  EDB  https://enterprisedb.com

Embrace your flaws.  They make you human, rather than perfect,
which you will never be.
#define _XOPEN_SOURCE
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <wchar.h>
#include <wctype.h>
#include <locale.h>

int
main (int argc, char *argv[])
{
	char *cp = "‍⚕️喙";
	wchar_t wch[100];
	int i = 0;
	
	setlocale(LC_ALL, "en_US.UTF-8");

	mbstowcs(wch, cp, 100);
	printf("wcswidth len=%d\n\n", wcswidth(wch, 100));

	while (cp[i])
	{
		int res = mbtowc(wch, cp + i, 100);

		printf("bytes_consumed=%d, ", res);
	
		int len = wcwidth(wch[0]);
		printf("wcwidth len=%d\n", len);
		i += res;
	}

	return 0;
}


Re: Unicode grapheme clusters

2023-01-21 Thread Bruce Momjian
On Sat, Jan 21, 2023 at 11:20:39AM -0500, Greg Stark wrote:
> On Fri, 20 Jan 2023 at 00:07, Pavel Stehule  wrote:
> >
> > I partially watch the progress in VTE - one of the widely used terminal libs,
> > and I am very sceptical that there will be support in the next two years.
> >
> > Maybe the new Microsoft terminal will give this area a new dynamic, but
> > currently only a few people on the planet are working on fixing or enhancing
> > terminal's technologies. Unfortunately there is too much historical ballast.
> 
> Fwiw this isn't really about terminal emulators. psql is also used to
> generate text files for reports or for display in various ways.
> 
> I think it's worth using whatever APIs we have available to implement
> better alignment for grapheme clusters and just assume whatever will
> eventually be used to display the output will display it "properly".
> 
> I do not think it's worth trying to implement this ourselves if the
> libraries aren't there yet. And I don't think it's worth trying to
> adapt to the current state of the current terminal. We don't know that
> that's the only place the output will be viewed and it'll all be
> wasted effort when the terminals eventually implement full support.

Well, as one of the URLs I quoted said:

This is by design. wcwidth() is utterly broken. Any terminal or
terminal application that uses it is also utterly broken. Forget
about emoji wcwidth() doesn't even work with combining characters,
zero width joiners, flags, and a whole bunch of other things.

So, either we have to find a function in the library that will do the
looping over the string for us, or we need to identify the special
Unicode characters that create grapheme clusters and handle them in our
code.
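A rough sketch of that second option — special-casing the characters that glue grapheme clusters together — in Python. Zero-width joiner, variation selector 16, and combining marks are counted as zero columns; a real implementation would do full UAX #29 grapheme segmentation, so treat this as an approximation only.

```python
import unicodedata

ZWJ = "\u200d"    # zero-width joiner
VS16 = "\ufe0f"   # variation selector 16 (emoji presentation)

def display_width(s: str) -> int:
    """Rough column-width estimate: combining marks, ZWJ and VS16
    count as zero columns; a character joined to its predecessor by
    ZWJ is folded into that cluster; East Asian Wide/Fullwidth
    characters count as two columns, everything else as one."""
    width = 0
    for i, ch in enumerate(s):
        if unicodedata.combining(ch) or ch in (ZWJ, VS16):
            continue                      # zero-width glue
        if i > 0 and s[i - 1] == ZWJ:
            continue                      # joined into previous cluster
        width += 2 if unicodedata.east_asian_width(ch) in ("W", "F") else 1
    return width
```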

-- 
  Bruce Momjian  https://momjian.us
  EDB  https://enterprisedb.com

Embrace your flaws.  They make you human, rather than perfect,
which you will never be.




Re: run pgindent on a regular basis / scripted manner

2023-01-21 Thread Bruce Momjian
On Fri, Jan 20, 2023 at 10:43:50AM +0100, Jelte Fennema wrote:
> Side-question: What's the reason why pgindent is used instead of some
> more "modern" code formatter that doesn't require keeping
> typedefs.list up to date for good looking output? (e.g. uncrustify or
> clang-format) Because that would also allow for easy editor
> integration.

One reason the typedef list is required is a quirk of the C syntax. 
Most languages have a lexer/scanner, which tokenizes, and a parser,
which parses.  The communication is usually one-way, lexer to parser. 
For C, typedefs require the parser to feed new typedefs back into the
lexer:

http://calculist.blogspot.com/2009/02/c-typedef-parsing-problem.html

BSD indent doesn't have that feedback mechanism, probably because it
doesn't fully parse the C file.  Therefore, we have to supply typedefs
manually, and for Postgres we pull them from debug-enabled binaries in
our buildfarm.  The problem with that is you often import typedefs from
system headers, and the typedefs apply to all C files, not just the ones
where the typedefs are visible.
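The feedback loop can be sketched with a toy lexer/parser pair in Python — the parser registers each typedef name into a set that the lexer consults, which is exactly the channel BSD indent lacks. Hypothetical code, not how pg_bsd_indent works.

```python
import re

def tokenize(code, typedefs):
    """Toy C-ish lexer: an identifier is a TYPE token only if the
    parser has already fed it back through the shared 'typedefs'
    set -- a one-way lexer->parser pipe is not enough for C."""
    tokens = []
    for word in re.findall(r"[A-Za-z_]\w*|[^\sA-Za-z_]", code):
        if word in typedefs:
            tokens.append(("TYPE", word))
        elif re.fullmatch(r"[A-Za-z_]\w*", word):
            tokens.append(("IDENT", word))
        else:
            tokens.append(("PUNCT", word))
    return tokens

def parse_typedef(stmt, typedefs):
    """When the 'parser' recognizes 'typedef <type> <name>;', it
    registers <name>, changing how the lexer classifies it later."""
    words = stmt.replace(";", "").split()
    if words and words[0] == "typedef":
        typedefs.add(words[-1])
```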

I don't see uncrustify or clang-format supporting typedef lists so maybe
they implemented this feedback loop.  It would be good to see if we can
get either of these tools to match our formatting.

-- 
  Bruce Momjian  https://momjian.us
  EDB  https://enterprisedb.com

Embrace your flaws.  They make you human, rather than perfect,
which you will never be.




Re: Unicode grapheme clusters

2023-01-21 Thread Tom Lane
Greg Stark  writes:
> (If we were really crazy about this we could use terminal escape codes
> to query the current cursor position after emitting multicharacter
> graphemes. But as I said, I don't even think that would be useful,
> even if there weren't other reasons it would be a bad idea)

Yeah, use of a pager would be enough to break that.

regards, tom lane




Re: Unicode grapheme clusters

2023-01-21 Thread Pavel Stehule
On Sat, Jan 21, 2023 at 5:21 PM Greg Stark  wrote:

> On Fri, 20 Jan 2023 at 00:07, Pavel Stehule 
> wrote:
> >
> > I partially watch the progress in VTE - one of the widely used terminal
> libs, and I am very sceptical that there will be support in the next two
> years.
> >
> > Maybe the new Microsoft terminal will give this area a new dynamic, but
> currently only a few people on the planet are working on fixing or enhancing
> terminal technologies. Unfortunately there is too much historical ballast.
>
> Fwiw this isn't really about terminal emulators. psql is also used to
> generate text files for reports or for display in various ways.
>
> I think it's worth using whatever APIs we have available to implement
> better alignment for grapheme clusters and just assume whatever will
> eventually be used to display the output will display it "properly".
>
> I do not think it's worth trying to implement this ourselves if the
> libraries aren't there yet. And I don't think it's worth trying to
> adapt to the current state of the current terminal. We don't know that
> that's the only place the output will be viewed and it'll all be
> wasted effort when the terminals eventually implement full support.
>
> (If we were really crazy about this we could use terminal escape codes
> to query the current cursor position after emitting multicharacter
> graphemes. But as I said, I don't even think that would be useful,
> even if there weren't other reasons it would be a bad idea)
>

+1

Pavel

>
>
> --
> greg
>


Re: run pgindent on a regular basis / scripted manner

2023-01-21 Thread Tom Lane
... btw, can we get away with making the diff run be "diff -upd"
not just "diff -u"?  I find diff output for C files noticeably
more useful with those options, but I'm unsure about their
portability.

regards, tom lane




Re: Unicode grapheme clusters

2023-01-21 Thread Greg Stark
On Fri, 20 Jan 2023 at 00:07, Pavel Stehule  wrote:
>
> I partially watch the progress in VTE - one of the widely used terminal libs,
> and I am very sceptical that there will be support in the next two years.
>
> Maybe the new Microsoft terminal will give this area a new dynamic, but
> currently only a few people on the planet are working on fixing or enhancing
> terminal technologies. Unfortunately there is too much historical ballast.

Fwiw this isn't really about terminal emulators. psql is also used to
generate text files for reports or for display in various ways.

I think it's worth using whatever APIs we have available to implement
better alignment for grapheme clusters and just assume whatever will
eventually be used to display the output will display it "properly".

I do not think it's worth trying to implement this ourselves if the
libraries aren't there yet. And I don't think it's worth trying to
adapt to the current state of the current terminal. We don't know that
that's the only place the output will be viewed and it'll all be
wasted effort when the terminals eventually implement full support.

(If we were really crazy about this we could use terminal escape codes
to query the current cursor position after emitting multicharacter
graphemes. But as I said, I don't even think that would be useful,
even if there weren't other reasons it would be a bad idea)


-- 
greg




Re: libpqrcv_connect() leaks PGconn

2023-01-21 Thread Noah Misch
On Fri, Jan 20, 2023 at 06:50:37PM -0800, Andres Freund wrote:
> On 2023-01-20 17:12:37 -0800, Andres Freund wrote:
> > We have code like this in libpqrcv_connect():
> > 
> > conn = palloc0(sizeof(WalReceiverConn));
> > conn->streamConn = PQconnectStartParams(keys, vals,
> > 
> >  /* expand_dbname = */ true);
> > if (PQstatus(conn->streamConn) == CONNECTION_BAD)
> > {
> > *err = pchomp(PQerrorMessage(conn->streamConn));
> > return NULL;
> > }
> > 
> > [try to establish connection]
> > 
> > if (PQstatus(conn->streamConn) != CONNECTION_OK)
> > {
> > *err = pchomp(PQerrorMessage(conn->streamConn));
> > return NULL;
> > }
> > 
> > 
> > Am I missing something, or are we leaking the libpq connection in case of
> > errors?
> > 
> > It doesn't matter really for walreceiver, since it will exit anyway, but we
> > also use libpqwalreceiver for logical replication, where it might?
> > 
> > 
> > Seems pretty clear that we should do a PQfinish() before returning NULL? I
> > lean towards thinking that this isn't worth backpatching given the current
> > uses of libpq, but I could easily be convinced otherwise.
> > 
> 
> It's a bit worse than I thought earlier: We use walrcv_connect() during CREATE
> SUBSCRIPTION. One can easily exhaust file descriptors right now.  So I think
> we need to fix this.
> 
> I also noticed the following in libpqrcv_connect, added in 11da97024abb:

> The attached patch fixes both issues.

Looks good.  I'm not worried about a superuser hosing their own session via
CREATE SUBSCRIPTION failures in a loop.  At the same time, this fix is plenty
safe to back-patch.

> It seems we don't have any tests for creating a subscription that fails during
> connection establishment? That doesn't seem optimal - I guess there may have
> been concern around portability of the error messages?

Perhaps.  We have various (non-subscription) tests using "\set VERBOSITY
sqlstate" for that problem.  If even the sqlstate varies, a DO block is the
next level of error swallowing.
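Both tricks can be illustrated with a generic failing statement (division by zero stands in for the subscription failure; this sketch is mine, not from any posted patch):

```sql
-- Trick 1: "\set VERBOSITY sqlstate" makes psql print only the stable
-- SQLSTATE, hiding platform-dependent error message text.
\set VERBOSITY sqlstate
SELECT 1/0;
-- ERROR:  22012

-- Trick 2: when even the SQLSTATE could vary, a DO block swallows the
-- error entirely and emits a fixed NOTICE instead.
DO $$
BEGIN
    PERFORM 1/0;
EXCEPTION WHEN OTHERS THEN
    RAISE NOTICE 'statement failed, as expected';
END;
$$;
```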

> I think we can control
> for that in a tap test, by failing to connect due to a non-existent database,
> then the error is under our control. Whereas e.g. an invalid hostname would
> contain an error from gai_strerror().

That sounds fine.




Re: On login trigger: take three

2023-01-21 Thread Pavel Stehule
On Fri, Jan 20, 2023 at 19:46, Mikhail Gribkov wrote:

>   Hi Pavel,
>
> On Mon, Jan 16, 2023 at 9:10 AM Pavel Stehule 
> wrote:
>
>>
>>
>> On Sun, Jan 15, 2023 at 7:32, Pavel Stehule wrote:
>>
>>> Hi
>>>
>>>
 On Thu, Jan 12, 2023 at 9:51 AM Pavel Stehule 
 wrote:

> Hi
>
> I checked this patch and it looks well. All tests passed. Together
> with https://commitfest.postgresql.org/41/4013/ it can be a good
> feature.
>
> I re-tested impact on performance and for the worst case looks like
> less than 1% (0.8%). I think it is acceptable. Tested pgbench scenario
> "SELECT 1"
>
> pgbench -f ~/test.sql -C -c 3 -j 5 -T 100 -P10 postgres
>
> 733 tps (master), 727 tps (patched).
>
> I think raising an exception inside should be better tested - now it
> is only in 001_stream_rep.pl - generally more tests are welcome -
> there is no testing of exception handling.
>

>>> Thank you
>>>
>>> check-world passed without problems
>>> build doc passed without problems
>>> I think the tests are now enough
>>>
>>> I'll mark this patch as ready for committer
>>>
>>
>> Unfortunately, I forgot one important point. There aren't any tests
>> related to backup.
>>
>> I miss pg_dump related tests.
>>
>> I mark this patch as waiting on the author.
>>
>>
>
> Thanks for noticing this.
> I have added sections to pg_dump tests. Attached v37 patch contains these
> additions along with the fresh rebase on master.
>

Thank you

marked as ready for committer

Regards

Pavel


>
> --
>  best regards,
> Mikhail A. Gribkov
>
> e-mail: youzh...@gmail.com
> http://www.flickr.com/photos/youzhick/albums
> http://www.strava.com/athletes/5085772
> phone: +7(916)604-71-12
> Telegram: @youzhick
>
>
>
>


Re: run pgindent on a regular basis / scripted manner

2023-01-21 Thread Tom Lane
Andrew Dunstan  writes:
>>> I think we could do better with some automation tooling for committers
>>> here. One low-risk and simple change would be to provide a
>>> non-destructive mode for pgindent that would show you the changes if any
>>> it would make. That could be worked into a git pre-commit hook that
>>> committers could deploy. I can testify to the usefulness of such hooks -
>>> I have one that while not perfect has saved me on at least two occasions
>>> from forgetting to bump the catalog version.

That sounds like a good idea from here.  I do not think we want a
mandatory commit filter, but if individual committers care to work
this into their process in some optional way, great!  I can think
of ways I'd use it during patch development, too.

(One reason not to want a mandatory filter is that you might wish
to apply pgindent as a separate commit, so that you can then
put that commit into .git-blame-ignore-revs.  This could be handy
for example when a patch needs to change the nesting level of a lot
of pre-existing code, without making changes in it otherwise.)
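The ignore-revs mechanism is a standard git feature (git 2.23+). A throwaway-repo demonstration (this demo is mine, not from the thread): once the hash of a whitespace-only "pgindent" commit is listed in .git-blame-ignore-revs and blame.ignoreRevsFile points at it, git blame attributes the reindented line to the commit that last changed its substance.

```shell
set -e
repo=$(mktemp -d)
cd "$repo"
git init -q
git config user.email you@example.com   # placeholder identity for the demo
git config user.name  you
printf 'int x;\n' > f.c
git add f.c && git commit -qm 'add f.c'
printf 'int y;\n' > f.c
git add f.c && git commit -qm 'real change'
second=$(git rev-parse HEAD)
printf 'int  y;\n' > f.c                # whitespace-only "pgindent" commit
git add f.c && git commit -qm 'pgindent'
git rev-parse HEAD > .git-blame-ignore-revs
git config blame.ignoreRevsFile .git-blame-ignore-revs
blamed=$(git blame --line-porcelain f.c | head -n1 | cut -d' ' -f1)
echo "$blamed"   # the "real change" commit, not the pgindent one
```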

>> This time with patch.

> ... with typo fixed.

Looks reasonable, but you should also update
src/tools/pgindent/pgindent.man, which AFAICT is our only
documentation for pgindent switches.  (Is it time for a
--help option in pgindent?)

regards, tom lane




Re: run pgindent on a regular basis / scripted manner

2023-01-21 Thread Andrew Dunstan

On 2023-01-21 Sa 10:02, Andrew Dunstan wrote:
> On 2023-01-21 Sa 10:00, Andrew Dunstan wrote:
>> On 2023-01-21 Sa 08:26, Andrew Dunstan wrote:
>>> On 2023-01-20 Fr 13:19, Tom Lane wrote:
 Andres Freund  writes:
> On 2023-01-20 12:09:05 -0500, Tom Lane wrote:
>> The core problem here is that requiring that would translate to
>> requiring every code contributor to have a working copy of pg_bsd_indent.
> Wouldn't just every committer suffice?
 Not if we have cfbot complaining about it.

 (Another problem here is that there's a sizable subset of committers
 who clearly just don't care, and I'm not sure we can convince them to.)
>>> I think we could do better with some automation tooling for committers
>>> here. One low-risk and simple change would be to provide a
>>> non-destructive mode for pgindent that would show you the changes if any
>>> it would make. That could be worked into a git pre-commit hook that
>>> committers could deploy. I can testify to the usefulness of such hooks -
>>> I have one that while not perfect has saved me on at least two occasions
>>> from forgetting to bump the catalog version.
>>>
>>> I'll take a look at fleshing this out, for my own if no-one else's use.
>>>
>>>
>> Here's a quick patch for this. I have it in mind to use like this in a
>> pre-commit hook:
>>
>> # only do this on master
>> test `git rev-parse --abbrev-ref HEAD` = "master" || exit 0
>>
>> src/tools/pgindent/pg_indent --silent `git diff --cached --name-only` || 
>> \
>>
>>   { echo "Need a pgindent run" >&2 ; exit 1; }
>>
>>
>> The committer could then run
>>
>> src/tools/pgindent/pg_indent --show-diff `git diff --cached --name-only`
>>
>> to see what changes it thinks are needed.
>>
>>
> This time with patch.
>
>
... with typo fixed.


cheers


andrew

--
Andrew Dunstan
EDB: https://www.enterprisedb.com
diff --git a/src/tools/pgindent/pgindent b/src/tools/pgindent/pgindent
index 741b0ccb58..af0ddd1944 100755
--- a/src/tools/pgindent/pgindent
+++ b/src/tools/pgindent/pgindent
@@ -21,7 +21,8 @@ my $indent_opts =
 
 my $devnull = File::Spec->devnull;
 
-my ($typedefs_file, $typedef_str, $code_base, $excludes, $indent, $build);
+my ($typedefs_file, $typedef_str, $code_base, $excludes, $indent, $build,
+	$show_diff, $silent_diff);
 
 my %options = (
 	"typedefs=s" => \$typedefs_file,
@@ -29,9 +30,14 @@ my %options = (
 	"code-base=s"=> \$code_base,
 	"excludes=s" => \$excludes,
 	"indent=s"   => \$indent,
-	"build"  => \$build,);
+	"build"  => \$build,
+"show-diff"  => \$show_diff,
+"silent_diff"=> \$silent_diff,);
 GetOptions(%options) || die "bad command line argument\n";
 
+die "Cannot have both --silent-diff and --show-diff"
+  if $silent_diff && $show_diff;
+
 run_build($code_base) if ($build);
 
 # command line option wins, then first non-option arg,
@@ -283,30 +289,19 @@ sub run_indent
 
 }
 
-
-# for development diagnostics
-sub diff
+sub show_diff
 {
-	my $pre   = shift;
-	my $post  = shift;
-	my $flags = shift || "";
+	my $indented   = shift;
+	my $source_filename  = shift;
 
-	print STDERR "running diff\n";
-
-	my $pre_fh  = new File::Temp(TEMPLATE => "pgdiffbX");
 	my $post_fh = new File::Temp(TEMPLATE => "pgdiffaX");
 
-	print $pre_fh $pre;
-	print $post_fh $post;
+	print $post_fh $indented;
 
-	$pre_fh->close();
 	$post_fh->close();
 
-	system( "diff $flags "
-		  . $pre_fh->filename . " "
-		  . $post_fh->filename
-		  . " >&2");
-	return;
+	my $diff = `diff -u $source_filename $post_fh->filename  2>&1`;
+	return $diff;
 }
 
 
@@ -404,6 +399,8 @@ push(@files, @ARGV);
 
 foreach my $source_filename (@files)
 {
+	# ignore anything that's not a .c or .h file
+	next unless $source_filename =~/\.[ch]$/;
 
 	# Automatically ignore .c and .h files that correspond to a .y or .l
 	# file.  indent tends to get badly confused by Bison/flex output,
@@ -429,7 +426,24 @@ foreach my $source_filename (@files)
 
 	$source = post_indent($source, $source_filename);
 
-	write_source($source, $source_filename) if $source ne $orig_source;
+	if ($source ne $orig_source)
+	{
+		if ($silent_diff)
+		{
+			exit 2;
+		}
+		elsif ($show_diff)
+		{
+			print show_diff($source, $source_filename);
+		}
+		else
+		{
+			write_source($source, $source_filename);
+		}
+	}
+
 }
 
 build_clean($code_base) if $build;
+
+exit 0;


Re: run pgindent on a regular basis / scripted manner

2023-01-21 Thread Andrew Dunstan

On 2023-01-21 Sa 10:00, Andrew Dunstan wrote:
> On 2023-01-21 Sa 08:26, Andrew Dunstan wrote:
>> On 2023-01-20 Fr 13:19, Tom Lane wrote:
>>> Andres Freund  writes:
 On 2023-01-20 12:09:05 -0500, Tom Lane wrote:
> The core problem here is that requiring that would translate to
> requiring every code contributor to have a working copy of pg_bsd_indent.
 Wouldn't just every committer suffice?
>>> Not if we have cfbot complaining about it.
>>>
>>> (Another problem here is that there's a sizable subset of committers
>>> who clearly just don't care, and I'm not sure we can convince them to.)
>> I think we could do better with some automation tooling for committers
>> here. One low-risk and simple change would be to provide a
>> non-destructive mode for pgindent that would show you the changes if any
>> it would make. That could be worked into a git pre-commit hook that
>> committers could deploy. I can testify to the usefulness of such hooks -
>> I have one that while not perfect has saved me on at least two occasions
>> from forgetting to bump the catalog version.
>>
>> I'll take a look at fleshing this out, for my own if no-one else's use.
>>
>>
> Here's a quick patch for this. I have it in mind to use like this in a
> pre-commit hook:
>
> # only do this on master
> test `git rev-parse --abbrev-ref HEAD` = "master" || exit 0
>
> src/tools/pgindent/pg_indent --silent `git diff --cached --name-only` || \
>
>   { echo "Need a pgindent run" >&2 ; exit 1; }
>
>
> The committer could then run
>
> src/tools/pgindent/pg_indent --show-diff `git diff --cached --name-only`
>
> to see what changes it thinks are needed.
>
>
This time with patch.


cheers


andrew

--
Andrew Dunstan
EDB: https://www.enterprisedb.com
diff --git a/src/tools/pgindent/pgindent b/src/tools/pgindent/pgindent
index 741b0ccb58..2834f5c9fc 100755
--- a/src/tools/pgindent/pgindent
+++ b/src/tools/pgindent/pgindent
@@ -21,7 +21,8 @@ my $indent_opts =
 
 my $devnull = File::Spec->devnull;
 
-my ($typedefs_file, $typedef_str, $code_base, $excludes, $indent, $build);
+my ($typedefs_file, $typedef_str, $code_base, $excludes, $indent, $build,
+	$show_diff, $silent_diff);
 
 my %options = (
 	"typedefs=s" => \$typedefs_file,
@@ -29,9 +30,14 @@ my %options = (
 	"code-base=s"=> \$code_base,
 	"excludes=s" => \$excludes,
 	"indent=s"   => \$indent,
-	"build"  => \$build,);
+	"build"  => \$build,
+"show-diff"  => \$show_diff,
+"silent_diff"=> \$silent_diff,);
 GetOptions(%options) || die "bad command line argument\n";
 
+die "Cannot have both --silent-diff and --show-diff"
+  if $silent_diff && $show_diff;
+
 run_build($code_base) if ($build);
 
 # command line option wins, then first non-option arg,
@@ -283,30 +289,19 @@ sub run_indent
 
 }
 
-
-# for development diagnostics
-sub diff
+sub show_diff
 {
-	my $pre   = shift;
-	my $post  = shift;
-	my $flags = shift || "";
+	my $indented   = shift;
+	my $source_filename  = shift;
 
-	print STDERR "running diff\n";
-
-	my $pre_fh  = new File::Temp(TEMPLATE => "pgdiffbX");
 	my $post_fh = new File::Temp(TEMPLATE => "pgdiffaX");
 
-	print $pre_fh $pre;
-	print $post_fh $post;
+	print $post_fh $indented;
 
-	$pre_fh->close();
 	$post_fh->close();
 
-	system( "diff $flags "
-		  . $pre_fh->filename . " "
-		  . $post_fh->filename
-		  . " >&2");
-	return;
+	my $diff = `diff -u $source_filename $post_fh->filename  2>&1`);
+	return $diff;
 }
 
 
@@ -404,6 +399,8 @@ push(@files, @ARGV);
 
 foreach my $source_filename (@files)
 {
+	# ignore anything that's not a .c or .h file
+	next unless $source_filename =~/\.[ch]$/;
 
 	# Automatically ignore .c and .h files that correspond to a .y or .l
 	# file.  indent tends to get badly confused by Bison/flex output,
@@ -429,7 +426,24 @@ foreach my $source_filename (@files)
 
 	$source = post_indent($source, $source_filename);
 
-	write_source($source, $source_filename) if $source ne $orig_source;
+	if ($source ne $orig_source)
+	{
+		if ($silent_diff)
+		{
+			exit 2;
+		}
+		elsif ($show_diff)
+		{
+			print show_diff($source, $source_filename);
+		}
+		else
+		{
+			write_source($source, $source_filename);
+		}
+	}
+
 }
 
 build_clean($code_base) if $build;
+
+exit 0;


Re: run pgindent on a regular basis / scripted manner

2023-01-21 Thread Andrew Dunstan


On 2023-01-21 Sa 08:26, Andrew Dunstan wrote:
> On 2023-01-20 Fr 13:19, Tom Lane wrote:
>> Andres Freund  writes:
>>> On 2023-01-20 12:09:05 -0500, Tom Lane wrote:
 The core problem here is that requiring that would translate to
 requiring every code contributor to have a working copy of pg_bsd_indent.
>>> Wouldn't just every committer suffice?
>> Not if we have cfbot complaining about it.
>>
>> (Another problem here is that there's a sizable subset of committers
>> who clearly just don't care, and I'm not sure we can convince them to.)
>
> I think we could do better with some automation tooling for committers
> here. One low-risk and simple change would be to provide a
> non-destructive mode for pgindent that would show you the changes if any
> it would make. That could be worked into a git pre-commit hook that
> committers could deploy. I can testify to the usefulness of such hooks -
> I have one that while not perfect has saved me on at least two occasions
> from forgetting to bump the catalog version.
>
> I'll take a look at fleshing this out, for my own if no-one else's use.
>
>

Here's a quick patch for this. I have it in mind to use like this in a
pre-commit hook:

# only do this on master
test `git rev-parse --abbrev-ref HEAD` = "master" || exit 0

src/tools/pgindent/pg_indent --silent `git diff --cached --name-only` || \

  { echo "Need a pgindent run" >&2 ; exit 1; }


The committer could then run

src/tools/pgindent/pg_indent --show-diff `git diff --cached --name-only`

to see what changes it thinks are needed.


cheers


andrew

--
Andrew Dunstan
EDB: https://www.enterprisedb.com





Re: MERGE ... WHEN NOT MATCHED BY SOURCE

2023-01-21 Thread Ted Yu
On Sat, Jan 21, 2023 at 3:05 AM Dean Rasheed 
wrote:

> On Tue, 10 Jan 2023 at 14:43, Dean Rasheed 
> wrote:
> >
> > Rebased version attached.
> >
>
> Rebased version, following 8eba3e3f02 and 5d29d525ff.
>
> Regards,
> Dean
>
Hi,
In transform_MERGE_to_join():

+   if (action->matchKind == MERGE_WHEN_NOT_MATCHED_BY_SOURCE)
+   tgt_only_tuples = true;
+   if (action->matchKind == MERGE_WHEN_NOT_MATCHED_BY_TARGET)

There should be an `else` in front of the second `if`.
When tgt_only_tuples and src_only_tuples are both true, we can come out of
the loop.
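The suggested control flow can be sketched in a self-contained form. MatchKind and the plain array below stand in for the real parse structures in transform_MERGE_to_join(); the sketch shows the two points raised: `else if` avoids re-testing an action that already matched, and once both flags are set the scan can stop early.

```c
#include <assert.h>
#include <stdbool.h>

typedef enum
{
	MATCHED,
	NOT_MATCHED_BY_SOURCE,
	NOT_MATCHED_BY_TARGET
} MatchKind;

/* Returns the index at which the scan stopped. */
int
scan_actions(const MatchKind *actions, int n)
{
	bool		tgt_only = false,
				src_only = false;
	int			i;

	for (i = 0; i < n; i++)
	{
		if (actions[i] == NOT_MATCHED_BY_SOURCE)
			tgt_only = true;
		else if (actions[i] == NOT_MATCHED_BY_TARGET)
			src_only = true;
		if (tgt_only && src_only)
			break;				/* both kinds seen: nothing left to learn */
	}
	return i;
}
```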

Cheers


Re: Doc: Rework contrib appendix -- informative titles, tweaked sentences

2023-01-21 Thread Karl O. Pinc
Attached are 2 v9 patch versions.  I don't think I like them.
I think the v8 versions are better.  But I thought it
wouldn't hurt to show them to you.

On Fri, 20 Jan 2023 14:22:25 -0600
"Karl O. Pinc"  wrote:

> Attached are 2 alternatives:
> (They touch separate files so the ordering is meaningless.)
> 
> 
> v8-0001-List-trusted-and-obsolete-extensions.patch
> 
> Instead of putting [trusted] and [obsolete] in the titles
> of the modules, like v7 does, add a list of them into the text.

v9 puts the list in vertical format, 5 columns.

But the column spacing in HTML is ugly, and I don't
see a parameter to set to change it.  I suppose we could
do more work on the stylesheets, but this seems excessive.

It looks good in PDF, but the page break in the middle
of the paragraph is ugly. (US-Letter)  Again (without forcing a hard
page break by frobbing the stylesheet and adding a processing
instruction), I don't see a good way to fix the page break.

(sagehill.net says that soft page breaks don't work.  I didn't
try it.)

> v8-0002-Page-break-before-sect1-in-contrib-appendix-when-pdf.patch
> 
> This frobs the PDF style sheet so that when sect1 is used
> in the appendix for the contrib directory, there is a page
> break before every sect1.  This puts each module/extension
> onto a separate page, but only for the contrib appendix.
> 
> Aside from hardcoding the "contrib" id, which I suppose isn't
> too bad since it's publicly exposed as a HTML anchor (or URL 
> component?) and unlikely to change, this also means that the 
>> contrib documentation can't use <section> instead of <sect1>.

v9 supports using <section> instead of just <sect1>.  But
I don't know that it's worth it -- the appendix is committed
to sect* entities. Once you start with sect* the stylesheet
does not allow "section" use to be interspersed.  All the 
sect*s would have to be changed to "section" throughout 
the appendix and I don't see that happening.

Regards,

Karl 
Free Software:  "You don't pay back, you pay forward."
 -- Robert A. Heinlein
diff --git a/doc/src/sgml/stylesheet-fo.xsl b/doc/src/sgml/stylesheet-fo.xsl
index 0c4dff92c4..9ce36c7279 100644
--- a/doc/src/sgml/stylesheet-fo.xsl
+++ b/doc/src/sgml/stylesheet-fo.xsl
@@ -132,4 +132,12 @@
   
 
 
+
+
+
+  
+  
+
+
 
diff --git a/doc/src/sgml/contrib.sgml b/doc/src/sgml/contrib.sgml
index 12c79b798b..077184d90d 100644
--- a/doc/src/sgml/contrib.sgml
+++ b/doc/src/sgml/contrib.sgml
@@ -84,6 +84,31 @@ CREATE EXTENSION extension_name;
   provide access to outside-the-database functionality.
  
 
+ These are the trusted extensions:
+   
+ 
+ 
+ 
+ 
+ 
+ 
+ 
+ 
+ 
+ 
+ 
+ 
+ 
+ 
+ 
+ 
+ 
+ 
+ 
+ 
+
+ 
+
  
   Many extensions allow you to install their objects in a schema of your
   choice.  To do that, add SCHEMA
@@ -100,6 +125,15 @@ CREATE EXTENSION extension_name;
   component for details.
  
 
+ 
+   These modules and extensions are obsolete:
+
+   
+ 
+ 
+   
+ 
+
  
  
  


Re: run pgindent on a regular basis / scripted manner

2023-01-21 Thread Andrew Dunstan


On 2023-01-20 Fr 13:19, Tom Lane wrote:
> Andres Freund  writes:
>> On 2023-01-20 12:09:05 -0500, Tom Lane wrote:
>>> The core problem here is that requiring that would translate to
>>> requiring every code contributor to have a working copy of pg_bsd_indent.
>> Wouldn't just every committer suffice?
> Not if we have cfbot complaining about it.
>
> (Another problem here is that there's a sizable subset of committers
> who clearly just don't care, and I'm not sure we can convince them to.)


I think we could do better with some automation tooling for committers
here. One low-risk and simple change would be to provide a
non-destructive mode for pgindent that would show you the changes if any
it would make. That could be worked into a git pre-commit hook that
committers could deploy. I can testify to the usefulness of such hooks -
I have one that while not perfect has saved me on at least two occasions
from forgetting to bump the catalog version.

I'll take a look at fleshing this out, for my own if no-one else's use.


cheers


andrew

--
Andrew Dunstan
EDB: https://www.enterprisedb.com





Re: Authentication fails for md5 connections if ~/.postgresql/postgresql.{crt and key} exist

2023-01-21 Thread Jim Jones

Hi Jacob,

> I think the sslcertmode=disable option that I introduced in [1] 
solves this issue too;


Well, I see there is indeed a significant overlap between our patches -
but yours has a much more comprehensive approach! If I got it right,
the new sslcertmode=disable would indeed cancel the existing certs in
~/.postgresql/ in case they exist. Right?

+    if (conn->sslcertmode[0] == 'd') /* disable */
+    {
+        /* don't send a client cert even if we have one */
+        have_cert = false;
+    }
+    else if (fnbuf[0] == '\0')

My idea was rather to use the existing sslmode with a new option
"no-clientcert" that does actually the same:

    /* sslmode no-clientcert */
    if (conn->sslmode[0] == 'n')
    {
        fnbuf[0] = '\0';
    }

    ...

    if (fnbuf[0] == '\0')
    {
        /* no home directory, proceed without a client cert */
        have_cert = false;
    }

I wish I had found your patchset some months ago. Now I hate myself
for the duplication of efforts :D

What is the status of your patchset?

Cheers
Jim





Re: [Proposal] Add foreign-server health checks infrastructure

2023-01-21 Thread Ted Yu
On Sat, Jan 21, 2023 at 4:03 AM Hayato Kuroda (Fujitsu) <
kuroda.hay...@fujitsu.com> wrote:

> Dear Katsuragi-san,
>
> Thank you for reviewing! PSA new patch set.
>
> > ## v24-0001-Add-PQConncheck-and-PQCanConncheck-to-libpq.patch
> > +
> > +
> > PQCanConncheckPQCan
> > Conncheck
> > + 
> > +  
> > +   Returns the status of the socket.
> >
> > Is this description right? I think this description is for
> > PQConncheck. Something like "Checks whether PQConncheck is
> > available on this platform." seems better.
>
> It might be a copy-and-paste error. Thanks for reporting.
> According to other parts, the sentence should start with "Returns
> ...".
> So I followed the style and made a cosmetic change.
>
> > +/* Check whether the postgres server is still alive or not */
> > +extern int PQConncheck(PGconn *conn);
> > +extern int PQCanConncheck(void);
> >
> > Should the names of these functions be in the form of PQconnCheck?
> > Not PQConncheck (c.f. The section of fe-misc.c in libpq-fe.h).
>
> Agreed, fixed.
>
> > ## v24-0002-postgres_fdw-add-postgres_fdw_verify_connection_.patch
> > +PG_FUNCTION_INFO_V1(postgres_fdw_verify_connection_states);
> > +PG_FUNCTION_INFO_V1(postgres_fdw_verify_connection_states_all);
> > +PG_FUNCTION_INFO_V1(postgres_fdw_can_verify_connection_states);
> >
> > This patch adds new functions to postgres_fdw for PostgreSQL 16.
> > So, I think it is necessary to update the version of postgres_fdw (v1.1
> > to v1.2).
>
> I checked the postgres_fdw commit log, and it seems that the version was
> updated when SQL functions were added. Fixed.
>
> > +postgres_fdw_verify_connection_states_all() returns
> > boolean
> > +
> > + 
> > +  This function checks the status of remote connections that are
> > established
> > +  by postgres_fdw from the local session to
> > the foreign
> > +  servers. This check is performed by polling the socket and allows
> >
> > It seems better to add a description that states this function
> > checks all the connections. Like the description of
> > postgres_fdw_disconnect_all(). For example, "This function
> > checks the status of 'all the' remote connections..."?
>
> I checked the docs and fixed. Moreover, some inconsistent statements were
> also fixed.
>
> Best Regards,
> Hayato Kuroda
> FUJITSU LIMITED
>
> Hi,
For v25-0001-Add-PQConnCheck-and-PQCanConnCheck-to-libpq.patch ,
`pqConnCheck_internal` only has one caller which is quite short.
Can pqConnCheck_internal and PQConnCheck be merged into one func ?

+int
+PQCanConnCheck(void)

It seems the return value should be of bool type.

Cheers


RE: [Proposal] Add foreign-server health checks infrastructure

2023-01-21 Thread Hayato Kuroda (Fujitsu)
Dear Katsuragi-san,

Thank you for reviewing! PSA new patch set.

> ## v24-0001-Add-PQConncheck-and-PQCanConncheck-to-libpq.patch
> +
> +
> PQCanConncheckPQCan
> Conncheck
> + 
> +  
> +   Returns the status of the socket.
> 
> Is this description right? I think this description is for
> PQConncheck. Something like "Checks whether PQConncheck is
> available on this platform." seems better.

It might be a copy-and-paste error. Thanks for reporting.
According to other parts, the sentence should start with "Returns ...".
So I followed the style and made a cosmetic change.

> +/* Check whether the postgres server is still alive or not */
> +extern int PQConncheck(PGconn *conn);
> +extern int PQCanConncheck(void);
> 
> Should the names of these functions be in the form of PQconnCheck?
> Not PQConncheck (c.f. The section of fe-misc.c in libpq-fe.h).

Agreed, fixed.

> ## v24-0002-postgres_fdw-add-postgres_fdw_verify_connection_.patch
> +PG_FUNCTION_INFO_V1(postgres_fdw_verify_connection_states);
> +PG_FUNCTION_INFO_V1(postgres_fdw_verify_connection_states_all);
> +PG_FUNCTION_INFO_V1(postgres_fdw_can_verify_connection_states);
> 
> This patch adds new functions to postgres_fdw for PostgreSQL 16.
> So, I think it is necessary to update the version of postgres_fdw (v1.1
> to v1.2).

I checked the postgres_fdw commit log, and it seems that the version was
updated when SQL functions were added. Fixed.

> +postgres_fdw_verify_connection_states_all() returns
> boolean
> +
> + 
> +  This function checks the status of remote connections that are
> established
> +  by postgres_fdw from the local session to
> the foreign
> +  servers. This check is performed by polling the socket and allows
> 
> It seems better to add a description that states this function
> checks all the connections. Like the description of
> postgres_fdw_disconnect_all(). For example, "This function
> checks the status of 'all the' remote connections..."?

I checked the docs and fixed. Moreover, some inconsistent statements were
also fixed.

Best Regards,
Hayato Kuroda
FUJITSU LIMITED



v25-0001-Add-PQConnCheck-and-PQCanConnCheck-to-libpq.patch
Description:  v25-0001-Add-PQConnCheck-and-PQCanConnCheck-to-libpq.patch


v25-0002-postgres_fdw-add-postgres_fdw_verify_connection_.patch
Description:  v25-0002-postgres_fdw-add-postgres_fdw_verify_connection_.patch


v25-0003-add-test.patch
Description: v25-0003-add-test.patch


Re: MERGE ... WHEN NOT MATCHED BY SOURCE

2023-01-21 Thread Dean Rasheed
On Tue, 10 Jan 2023 at 14:43, Dean Rasheed  wrote:
>
> Rebased version attached.
>

Rebased version, following 8eba3e3f02 and 5d29d525ff.

Regards,
Dean
diff --git a/doc/src/sgml/mvcc.sgml b/doc/src/sgml/mvcc.sgml
new file mode 100644
index b87ad5c..1482ede
--- a/doc/src/sgml/mvcc.sgml
+++ b/doc/src/sgml/mvcc.sgml
@@ -396,8 +396,8 @@
 originally matched appears later in the list of actions.
 On the other hand, if the row is concurrently updated or deleted so
 that the join condition fails, then MERGE will
-evaluate the condition's NOT MATCHED actions next,
-and execute the first one that succeeds.
+evaluate the condition's NOT MATCHED [BY TARGET]
+actions next, and execute the first one that succeeds.
 If MERGE attempts an INSERT
 and a unique index is present and a duplicate row is concurrently
 inserted, then a uniqueness violation error is raised;
diff --git a/doc/src/sgml/ref/merge.sgml b/doc/src/sgml/ref/merge.sgml
new file mode 100644
index 0995fe0..8ef121a
--- a/doc/src/sgml/ref/merge.sgml
+++ b/doc/src/sgml/ref/merge.sgml
@@ -33,7 +33,8 @@ USING dat
 and when_clause is:
 
 { WHEN MATCHED [ AND condition ] THEN { merge_update | merge_delete | DO NOTHING } |
-  WHEN NOT MATCHED [ AND condition ] THEN { merge_insert | DO NOTHING } }
+  WHEN NOT MATCHED BY SOURCE [ AND condition ] THEN { merge_update | merge_delete | DO NOTHING } |
+  WHEN NOT MATCHED [BY TARGET] [ AND condition ] THEN { merge_insert | DO NOTHING } }
 
 and merge_insert is:
 
@@ -70,7 +71,9 @@ DELETE
from data_source to
target_table_name
producing zero or more candidate change rows.  For each candidate change
-   row, the status of MATCHED or NOT MATCHED
+   row, the status of MATCHED,
+   NOT MATCHED BY SOURCE,
+   or NOT MATCHED [BY TARGET]
is set just once, after which WHEN clauses are evaluated
in the order specified.  For each candidate change row, the first clause to
evaluate as true is executed.  No more than one WHEN
@@ -226,16 +229,37 @@ DELETE
   At least one WHEN clause is required.
  
  
+  The WHEN clause may specify WHEN MATCHED,
+  WHEN NOT MATCHED BY SOURCE, or
+  WHEN NOT MATCHED [BY TARGET].
+  Note that the SQL standard only defines
+  WHEN MATCHED and WHEN NOT MATCHED
+  (which is defined to mean no matching target row).
+  WHEN NOT MATCHED BY SOURCE is an extension to the
+  SQL standard, as is the option to append
+  BY TARGET to WHEN NOT MATCHED, to
+  make its meaning more explicit.
+ 
+ 
   If the WHEN clause specifies WHEN MATCHED
-  and the candidate change row matches a row in the
+  and the candidate change row matches a row in the source to a row in the
   target_table_name,
   the WHEN clause is executed if the
   condition is
   absent or it evaluates to true.
  
  
-  Conversely, if the WHEN clause specifies
-  WHEN NOT MATCHED
+  If the WHEN clause specifies
+  WHEN NOT MATCHED BY SOURCE and the candidate change
+  row represents a row in the
+  target_table_name that does
+  not match a source row, the WHEN clause is executed
+  if the condition is
+  absent or it evaluates to true.
+ 
+ 
+  If the WHEN clause specifies
+  WHEN NOT MATCHED [BY TARGET]
   and the candidate change row does not match a row in the
   target_table_name,
   the WHEN clause is executed if the
@@ -257,7 +281,10 @@ DELETE
  
   A condition on a WHEN MATCHED clause can refer to columns
   in both the source and the target relations. A condition on a
-  WHEN NOT MATCHED clause can only refer to columns from
+  WHEN NOT MATCHED BY SOURCE clause can only refer to
+  columns from the target relation, since by definition there is no matching
+  source row. A condition on a WHEN NOT MATCHED [BY TARGET]
+  clause can only refer to columns from
   the source relation, since by definition there is no matching target row.
   Only the system attributes from the target table are accessible.
  
@@ -382,8 +409,10 @@ DELETE
   WHEN MATCHED clause, the expression can use values
   from the original row in the target table, and values from the
   data_source row.
-  If used in a WHEN NOT MATCHED clause, the
-  expression can use values from the data_source.
+  If used in a WHEN NOT MATCHED BY SOURCE clause, the
+  expression can only use values from the original row in the target table.
+  If used in a WHEN NOT MATCHED [BY TARGET] clause, the
+  expression can only use values from the data_source.
  
 

@@ -452,8 +481,9 @@ MERGE tot

 
  
-  Evaluate whether each row is MATCHED or
-  NOT MATCHED.
+  Evaluate whether each row is MATCHED,
+  NOT MATCHED BY SOURCE, or
+  NOT MATCHED [BY TARGET].
  
 
 
@@ -528,7 +558,8 @@ MERGE tot
   
If a WHEN