Re: [PATCH] pg_convert improvement

2023-11-27 Thread Yurii Rashkovskii
 Hi Bertrand,


> Did some minor changes in the attached:
>
> - Started the multi-line comment with an upper case and finished
> it with a "." and re-worded a bit.
> - Ran pgindent
>
> What do you think about the attached?
>

It looks great!


>
> Also, might be good to create a CF entry [1] so that the patch proposal
> does not get lost
> and gets visibility.
>

Just submitted it to SF. Thank you for the review!

-- 
Y.


Re: [PATCH] pg_convert improvement

2023-11-24 Thread Yurii Rashkovskii
Hi Bertrand,

On Fri, Nov 24, 2023 at 6:26 AM Drouvot, Bertrand <
bertranddrouvot...@gmail.com> wrote:

>
> The patch is pretty straightforward, I just have one remark:
>
> +   /* if no actual conversion happened, return the original string */
> +   /* (we are checking pointers to strings instead of encodings
> because
> +  `pg_do_encoding_conversion` above covers more cases than just
> +  encoding equality) */
>
> I think this could be done in one single comment and follow the preferred
> style
> for multi-line comment, see [1].
>

Thank you for your feedback. I've attached a revised patch.

-- 
Y.


v2-0001-Don-t-copy-bytea-text-in-convert_from-to-unless-conv.patch
Description: Binary data


[PATCH] pg_convert improvement

2023-11-24 Thread Yurii Rashkovskii
Hi,

I propose a patch that ensures `pg_convert` doesn't allocate and copy data
when no conversion is done. It is an unnecessary overhead, especially when
such conversions are done frequently and for large values.

I've tried measuring the performance impact, and the patched version has a
small but non-zero gain.

The patch builds against `master` and `make check` succeeds.

Happy to hear any feedback!

-- 
Y.


v1-0001-Don-t-copy-bytea-text-in-convert_from-to-unless-conv.patch
Description: Binary data


Re: SET ROLE documentation improvement

2023-09-26 Thread Yurii Rashkovskii
On Mon, Sep 25, 2023 at 3:09 PM Nathan Bossart 
wrote:

> On Fri, Sep 15, 2023 at 02:36:16PM -0700, Yurii Rashkovskii wrote:
> > On Fri, Sep 15, 2023 at 1:47 PM Nathan Bossart  >
> > wrote:
> >> I think another issue is that the aforementioned note doesn't mention
> the
> >> new SET option added in 3d14e17.
> >
> > How do you think we should word it in that note to make it useful?
>
> Maybe something like this:
>
> The current session user must have the SET option for the specified
> role_name, either directly or indirectly via a chain of memberships
> with the SET option.
>

This is a good start, indeed. I've amended my patch to include it.

-- 
Y.


V2-0001-Minor-improvement-to-SET-ROLE-documentation.patch
Description: Binary data


Re: SET ROLE documentation improvement

2023-09-15 Thread Yurii Rashkovskii
On Fri, Sep 15, 2023 at 1:47 PM Nathan Bossart 
wrote:

> On Fri, Sep 15, 2023 at 11:26:16AM -0700, Yurii Rashkovskii wrote:
> > I believe SET ROLE documentation makes a slightly incomplete statement
> > about what happens when a superuser uses SET ROLE.
> >
> > The documentation reading suggests that the superuser would lose all
> their
> > privileges. However, they still retain the ability to use `SET ROLE`
> again.
> >
> > The attached patch adds this bit to the documentation.
>
> IMO this is arguably covered by the following note:
>
>The specified role_name
>must be a role that the current session user is a member of.
>(If the session user is a superuser, any role can be selected.)
>
>
I agree that this may be considered sufficient coverage, but I believe that
giving contextual clarification goes a long way to help people understand.
Documentation reading can be challenging.


> But I don't see a big issue with clarifying things further as you propose.
>
> I think another issue is that the aforementioned note doesn't mention the
> new SET option added in 3d14e17.
>

How do you think we should word it in that note to make it useful?


-- 
Y.


Re: ALTER ROLE documentation improvement

2023-09-15 Thread Yurii Rashkovskii
On Fri, Sep 15, 2023 at 1:53 PM Nathan Bossart 
wrote:

> On Fri, Sep 15, 2023 at 11:46:35AM -0700, Yurii Rashkovskii wrote:
> > It appears that 16.0 improved some of the checks in ALTER ROLE.
> Previously,
> > it was possible to do the following (assuming current_user is a bootstrap
> > user):
> >
> > ```
> > ALTER ROLE current_user NOSUPERUSER
> > ```
> >
> > As of 16.0, this produces an error:
> >
> > ```
> > ERROR:  permission denied to alter role
> > DETAIL:  The bootstrap user must have the SUPERUSER attribute.
> > ```
> >
> > The attached patch documents this behavior by providing a bit more
> > clarification to the following statement:
> >
> > "Database superusers can change any of these settings for any role."
>
> I think this could also be worth a mention in the glossary [0].  BTW the
> glossary calls this role the "bootstrap superuser", but the DETAIL message
> calls it the "bootstrap user".  Perhaps we should standardize on one name.
>
> [0] https://www.postgresql.org/docs/devel/glossary.html
>
>
Thank you for the feedback. I've updated the glossary and updated the
terminology to be consistent. Please see the new patch attached.

-- 
Y.


0001-Improve-ALTER-ROLE-documentation-to-document-current-v2.patch
Description: Binary data


ALTER ROLE documentation improvement

2023-09-15 Thread Yurii Rashkovskii
Hi,

It appears that 16.0 improved some of the checks in ALTER ROLE. Previously,
it was possible to do the following (assuming current_user is a bootstrap
user):

```
ALTER ROLE current_user NOSUPERUSER
```

As of 16.0, this produces an error:

```
ERROR:  permission denied to alter role
DETAIL:  The bootstrap user must have the SUPERUSER attribute.
```

The attached patch documents this behavior by providing a bit more
clarification to the following statement:

"Database superusers can change any of these settings for any role."


-- 
Y.


0001-Improve-ALTER-ROLE-documentation-to-document-current.patch
Description: Binary data


SET ROLE documentation improvement

2023-09-15 Thread Yurii Rashkovskii
Hi,

I believe SET ROLE documentation makes a slightly incomplete statement
about what happens when a superuser uses SET ROLE.

The documentation reading suggests that the superuser would lose all their
privileges. However, they still retain the ability to use `SET ROLE` again.

The attached patch adds this bit to the documentation.

-- 
Y.


0001-Minor-improvement-to-SET-ROLE-documentation.patch
Description: Binary data


Re: [PATCH] Extend the length of BackgroundWorker.bgw_library_name

2023-07-04 Thread Yurii Rashkovskii
Nathan,

On Mon, Jul 3, 2023 at 8:12 PM Nathan Bossart 
wrote:

> On Mon, Jul 03, 2023 at 06:00:12PM -0700, Yurii Rashkovskii wrote:
> > Great, thank you! The reason I was leaving the other constant in place to
> > make upgrading extensions trivial (so that they don't need to adjust for
> > this), but if you think this is a better way, I am fine with it.
>
> Sorry, I'm not following.  Which constant are you referring to?
>

Apologies, I misread the final patch. All good!

-- 
Y.


Re: [PATCH] Extend the length of BackgroundWorker.bgw_library_name

2023-07-03 Thread Yurii Rashkovskii
Nathan,

On Mon, Jul 3, 2023 at 3:08 PM Nathan Bossart 
wrote:

> On Sun, Jul 02, 2023 at 04:37:52PM -0700, Yurii Rashkovskii wrote:
> > Thank you for revising the patch. While this is relatively minor, I think
> > it should be set to MAXPGPATH directly to clarify their relationship.
>
> Committed.  I set the size to MAXPGPATH directly instead of inventing a new
> macro with the same value.
>

Great, thank you! The reason I was leaving the other constant in place to
make upgrading extensions trivial (so that they don't need to adjust for
this), but if you think this is a better way, I am fine with it.

-- 
Y.


Re: Size vs size_t or, um, PgSize?

2023-07-03 Thread Yurii Rashkovskii
Daniel,

On Mon, Jul 3, 2023 at 12:20 PM Daniel Gustafsson  wrote:

> > On 3 Jul 2023, at 21:14, Yurii Rashkovskii  wrote:
>
> > That being said, going ahead with the global renaming of Size to size_t
> will mostly eliminate this clash in, say, five years when old versions will
> be gone. At least it'll be fixed then. Otherwise, it'll never be fixed at
> all. To me, having the problem gone in the future beats having the problem
> forever.
>
> I would also like all Size instances gone, but the cost during backpatching
> will likely be very high.  There are ~1300 or so of them in the code, and
> that's a lot of potential conflicts during the coming 5 years of
> backpatches.
>
>
I understand. How about a workaround for extension builders? Something like

```
/* Use this if you run into Size type redefinition */
#ifdef DONT_TYPEDEF_SIZE
#define Size size_t
#else
typedef size_t Size;
#endif
```
This way, extension developers can specify DONT_TYPEDEF_SIZE. However, this
would have to be backported, but to minimal/no effect if I am not missing
anything.

Not beautiful, but better than freezing the status quo forever?

-- 
Y.


Re: Size vs size_t or, um, PgSize?

2023-07-03 Thread Yurii Rashkovskii
Hi Thomas,

On Mon, Jul 3, 2023 at 12:03 PM Thomas Munro  wrote:

> On Tue, Jul 4, 2023 at 6:46 AM Daniel Gustafsson  wrote:
> > > On 3 Jul 2023, at 20:32, Yurii Rashkovskii  wrote:
> > > If there's a willingness to try this out, I am happy to prepare a
> patch.
> >
> > This has been discussed a number of times in the past, and the
> conclusion from
> > last time IIRC was to use size_t for new code and only change the
> existing
> > instances when touched for other reasons to avoid churn.
>
> One such earlier discussion:
>
>
> https://www.postgresql.org/message-id/flat/CAEepm%3D1eA0vsgA7-2oigKzqg10YeXoPWiS-fCuQRDLwwmgMXag%40mail.gmail.com
>
> I personally wouldn't mind if we just flipped to standard types
> everywhere, but I guess it wouldn't help with your problem with
> extensions on macOS as you probably also want to target released
> branches, not just master/17+.  But renaming in the back branches
> doesn't sound like something we'd do...
>

Of course, it would have been great to have it backported in the ideal
world, but it isn't realistic, as you say.

That being said, going ahead with the global renaming of Size to size_t
will mostly eliminate this clash in, say, five years when old versions will
be gone. At least it'll be fixed then. Otherwise, it'll never be fixed at
all. To me, having the problem gone in the future beats having the problem
forever.


-- 
Y.


Size vs size_t or, um, PgSize?

2023-07-03 Thread Yurii Rashkovskii
Hi,

I've run into an issue of a name clash with system libraries. Specifically,
the `Size` type seems to be just an alias for `size_t` and, at least on
macOS, it clashes with the core SDK, as it is also defined by MacTypes.h,
which is used by some of the libraries one may want to use from within a
Postgres extension.

While in my case, I believe I have a workaround, I couldn't find any
rationale as to why we might want to have this alias and not use size_t.
Any insight on this would be appreciated.

Would there be any sense in changing it all to size_t or renaming it to
something else?

I understand that they will break some extensions, so if we don't want them
to have to go through with the renaming, can we enable backward
compatibility with a macro?

If there's a willingness to try this out, I am happy to prepare a patch.

-- 
Y.


Re: [PATCH] Extend the length of BackgroundWorker.bgw_library_name

2023-07-02 Thread Yurii Rashkovskii
Hi Nathan,

On Fri, Jun 30, 2023 at 2:39 PM Nathan Bossart 
wrote:

>
> In v4, I've introduced a new BGW_LIBLEN macro and set it to the default
> value of MAXPGPATH (1024).  This way, the value can live in bgworker.h like
> the other BGW_* macros do.  Plus, this should make the assertion that
> checks for backward compatibility unnecessary.  Since bgw_library_name is
> essentially a path, I can see the argument that we should just set
> BGW_LIBLEN to MAXPGPATH directly.  I'm curious what folks think about this.
>

Thank you for revising the patch. While this is relatively minor, I think
it should be set to MAXPGPATH directly to clarify their relationship.

-- 
Y.


Re: [PATCH] Allow Postgres to pick an unused port to listen

2023-05-11 Thread Yurii Rashkovskii
On Thu, May 11, 2023 at 10:36 AM Alvaro Herrera 
wrote:

> On 2023-May-11, Yurii Rashkovskii wrote:
>
> > ```
> > 127.0.0.1=5432 ::1=54321
> > ```
> >
> > Basically, a space-delimited set of address/port pairs (delimited by `=`
> to
> > allow IPv6 addresses to use a colon).
>
> This seems a bit too creative.  I'd rather have the IPv6 address in
> square brackets, which clues the parser immediately as to the address
> family and use colons to separate the port number.  If we do go with a
> separate file, which to me sounds easier than cramming it into the PID
> file, then one per line is likely better, if only because line-oriented
> Unix text tooling has an easier time that way.
>

Just a general caution here that using square brackets to denote IPv6
addresses will make it (unnecessarily?) harder to process this with a shell
script.

-- 
Y.


Re: [PATCH] Allow Postgres to pick an unused port to listen

2023-05-11 Thread Yurii Rashkovskii
Alvaro, Tom,

On Mon, May 8, 2023 at 4:49 PM Tom Lane  wrote:

> Alvaro Herrera  writes:
> > This made me wonder if storing the unadorned port number is really the
> > best way.  Suppose we did extend things so that we listen on different
> > ports on different interfaces; how would this scheme work at all?
>
> Yeah, the probability that that will happen someday is one of the
> things bothering me about this proposal.  I'd rather change the
> file format to support that first (it can be dummy for now, with
> all lines showing the same port), and then document it second.
>

How soon do you think the change will occur that will allow for choosing
different ports on different interfaces? I am happy to help address this.

Relying on a variable number of lines may be counter-productive here if we
want postmaster.pid to be easily readable by shell scripts. What if we
improved the port line to be something like this?

```
127.0.0.1=5432 ::1=54321
```

Basically, a space-delimited set of address/port pairs (delimited by `=` to
allow IPv6 addresses to use a colon). If we allow the address side to be
dropped, the current format (`5432`) will also be correct parsing-wise.

-- 
Y.


Re: [PATCH] Allow Postgres to pick an unused port to listen

2023-05-10 Thread Yurii Rashkovskii
Hi Denis,

Great catch. I've amended the patch to fix this issue with the
documentation (V3).



On Tue, May 9, 2023 at 2:25 PM Denis Laxalde 
wrote:

> The documentation fails to build for me:
>
> $ ninja docs
> [1/2] Generating doc/src/sgml/postgres-full.xml with a custom command
> FAILED: doc/src/sgml/postgres-full.xml
> /usr/bin/python3 ../postgresql/doc/src/sgml/xmltools_dep_wrapper
> --targetname doc/src/sgml/postgres-full.xml --depfile
> doc/src/sgml/postgres-full.xml.d --tool /usr/bin/xmllint -- --nonet
> --noent --valid --path doc/src/sgml -o doc/src/sgml/postgres-full.xml
> ../postgresql/doc/src/sgml/postgres.sgml
> ../postgresql/doc/src/sgml/postgres.sgml:685: element para: validity
> error : Element entry is not declared in para list of possible children
> ninja: build stopped: subcommand failed.
>
>
> Removing the  tag resolves the issue:
> diff --git a/doc/src/sgml/config.sgml b/doc/src/sgml/config.sgml
> index cd07bad3b5..f71859f710 100644
> --- a/doc/src/sgml/config.sgml
> +++ b/doc/src/sgml/config.sgml
> @@ -684,7 +684,7 @@ include_dir 'conf.d'
>  
>  
>   The port can be set to 0 to make Postgres pick an unused port
> number.
> -The assigned port number can be then retrieved from
> postmaster.pid.
> +The assigned port number can be then retrieved from
> postmaster.pid.
>  
> 
>
>
>

-- 
Y.


V3-0001-Allow-listening-port-to-be-0.patch
Description: Binary data


Re: [PATCH] Allow Postgres to pick an unused port to listen

2023-05-07 Thread Yurii Rashkovskii
Hi Cary,

Thank you so much for the review. It's very valuable, and you caught an
important issue with it that I somehow missed (not updating the .pid file
with the selected port number). I'm not sure how it escaped me (perhaps I
was focusing too much on the log file to validate the behaviour).

I've amended the patch to ensure the port number is in the lock file. I've
attached V2.

Yurii


On Sat, May 6, 2023 at 12:31 AM Cary Huang  wrote:

> The following review has been posted through the commitfest application:
> make installcheck-world:  tested, passed
> Implements feature:   tested, failed
> Spec compliant:   not tested
> Documentation:not tested
>
> Hello
>
> This is one of those features that is beneficial to a handful of people in
> specific test cases. It may not benefit the majority of the users but is
> certainly not useless either. As long as it can be disabled and enough
> tests have been run to ensure it won't have a significant impact on working
> components while disabled, it should be fine in my opinion. Regarding where
> the selected port shall be saved (postmaster.pid, read by pg_ctl or saved
> in a dedicated file), I see that postmaster.pid already contains a port
> number in line number 4, so adding a port number into there is nothing new;
> port number is already there and we can simply replace the port number with
> the one selected by the system.
>
> I applied and tested the patch and found that the system can indeed start
> when port is set to 0, but line 4 of postmaster.pid does not store the port
> number selected by the system, rather, it stored 0, which is the same as
> configured. So I am actually not able to find out the port number that my
> PG is running on, at least not in a straight-forward way.
>
> thank you
> ==
> Cary Huang
> HighGo Software
> www.highgo.ca



-- 
Y.


V2-0001-Allow-listening-port-to-be-0.patch
Description: Binary data


Re: [PATCH] Support % wildcard in extension upgrade filenames

2023-05-01 Thread Yurii Rashkovskii
On Mon, May 1, 2023 at 11:05 PM Eric Ridge  wrote:

>
> > We expect the .so's own minor version number to suffice for that,
> > but I realize that that might not be the most user-friendly answer.
>
> One of my desires is that the on-disk .so's filename be associated with
> the pg_extension entry and not Each. Individual. Function.  There's a few
> extensions that like to version the on-disk .so's filename which means a
> CREATE OR REPLACE for every function on every extension version bump.  That
> forces an upgrade script even if the schema didn't technically change and
> also creates the need for bespoke tooling around extension.sql and
> upgrade.sql scripts.
>

I understand the potential pain with this (though I suppose MODULE_PATHNAME
can somewhat alleviate it). However, I'd like to highlight the fact
that, besides the fact that control files contain a reference to a .so
file, there's very little in the way of actual dependency of the extension
mechanism on shared libraries, as that part relies on functions being able
to use C language for their implementation.  Nothing I am aware of would
stop you from having multiple .so files in a given extension (for one
reason or another reason), and I think that's actually a great design,
incidentally or not. This does allow for a great deal of flexibility and an
escape hatch for when the straightforward case doesn't work [1]

If the extension subsystem wasn't replacing MODULE_PATHNAME, I don't think
there would be a reason for having `module_pathname` in the control file.
It doesn't preload the file or call anything from it. It's what `create
function` in the scripts will do. And that's actually great.

I am referencing this not because I don't think you know this but to try
and capture the higher-level picture here.

This doesn't mean we shouldn't improve the UX/DX of one of the common and
straightforward cases (having a single .so file with no other
complications) where possible.

[1]
https://yrashk.com/blog/2023/04/10/avoiding-postgres-extensions-limitations/
-- 
https://omnigr.es
Yurii.


Re: [PATCH] Extend the length of BackgroundWorker.bgw_library_name

2023-04-24 Thread Yurii Rashkovskii
You're absolutely right. Here's v3.


On Mon, Apr 24, 2023 at 6:30 PM Aleksander Alekseev <
aleksan...@timescale.com> wrote:

> Hi,
>
> > This is a very good catch and a suggestion. I've slightly reworked the
> patch, and I also made this static assertion to have less indirection and,
> therefore, make it easier to understand the premise.
> > Version 2 is attached.
>
> ```
> sizeof((BackgroundWorker *)NULL)->bgw_library_name
> ```
>
> I'm pretty confident something went wrong with the parentheses in v2.
>
>
> --
> Best regards,
> Aleksander Alekseev
>


-- 
Y.


v3-0001-Extend-the-length-of-BackgroundWorker.bgw_library_na.patch
Description: Binary data


Re: [PATCH] Extend the length of BackgroundWorker.bgw_library_name

2023-04-24 Thread Yurii Rashkovskii
Aleksander,

On Mon, Apr 24, 2023 at 1:01 PM Aleksander Alekseev <
aleksan...@timescale.com> wrote:

> > The patch is backwards-compatible and ensures that bgw_library_name
> stays *at least* as long as BGW_MAXLEN. Existing external code that uses
> BGW_MAXLEN is a length boundary (for example, in `strncpy`) will continue
> to work as expected.
>
> There is a mistake in the comment though:
>
> ```
> +/*
> + * Ensure bgw_function_name's size is backwards-compatible and sensible
> + */
> +StaticAssertDecl(MAXPGPATH >= BGW_MAXLEN, "MAXPGPATH must be at least
> equal to BGW_MAXLEN");
> ```
>
> library_name, not function_name. Also I think the comment should be
> more detailed, something like "prior to PG17 we used ... but since
> PG17 ... which may cause problems if ...".
>

This is a very good catch and a suggestion. I've slightly reworked the
patch, and I also made this static assertion to have less indirection and,
therefore, make it easier to understand the premise.
Version 2 is attached.

-- 
Y.


v2-0001-Extend-the-length-of-BackgroundWorker.bgw_library_na.patch
Description: Binary data


Re: [PATCH] Allow Postgres to pick an unused port to listen

2023-04-20 Thread Yurii Rashkovskii
Aleksander,

On Thu, Apr 20, 2023 at 1:22 PM Aleksander Alekseev <
aleksan...@timescale.com> wrote:

> Hi,
>
> > Also, I don't think there's a case for distributed systems here because
> we're only managing a single computer's resource: the allocation of local
> ports.
>
> I don't suggest building a distributed system but rather using
> well-known solutions from this area. For the described case the
> "resource manager" will be as simple a single Consul instance (a
> single binary file, since Consul is written in Go) running locally.
> The "complexity" would be for the test framework to use a few extra
> REST queries. Arguably not that complicated.
>

Bringing in a process that works over REST API (requiring itself to have a
port, by the way) and increasing the rollout of such an environment is
antithetical to simplicity
and, thus, will make it only worse. If this is the alternative, I'd rather
have a few retries or some other small hacks.

Bringing in a new dependency with Python is also a heavy solution I'd
rather avoid. I find that this is rather a problem with a relatively small
surface. If the patch won't go through,
I'll just find a workaround to live with, but I'd rather stay away from
blowing the development environment out of proportion for something so
minuscule.


>
> > using a KV store to lease a port does not guarantee the port's
> availability
>
> I assume you don't have random processes doing random things (like
> listening random ports) on a CI machine. You know that certain ports
> are reserved for the tests and are going to be used only for this
> purpose using the same leasing protocol.
>

This is intended to be used by CI and development workstations, where all
bets are kind of off.


>
> > For example, I'd suggest adding an option to Postgres to receive sockets
> it should listen [...]
>
> Not sure if I fully understood the idea, but it looks like you are
> suggesting to build in certain rather complicated functionality for an
> arguably rare use case so that a QA engineer didn't have one extra
> small dependency to worry about in this rare case. I'm quite skeptical
> that this is going to happen.
>

My suggestion was to simply allow listening for a wildcard port and be able
to read it out in some way. Nothing particularly complicated. The fact that
the process may die before it is connected to is rather a strange argument
as the same can happen outside of this use case.


-- 
Y.


Re: [PATCH] Allow Postgres to pick an unused port to listen

2023-04-19 Thread Yurii Rashkovskii
Alexander,

On Wed, Apr 19, 2023 at 11:44 PM Aleksander Alekseev <
aleksan...@timescale.com> wrote:

> Hi,
>
> Here are my two cents.
>
> > > I would like to suggest a patch against master (although it may be
> worth
> > > backporting it) that makes it possible to listen on any unused port.
> >
> > I think this is a bad idea, mainly because this:
> >
> > > Instead, with this patch, one can specify `port` as `0` (the "wildcard"
> > > port) and retrieve the assigned port from postmaster.pid
> >
> > is a horrid way to find out what was picked, and yet there could
> > be no other.
>
> What personally I dislike about this approach is the fact that it is
> not guaranteed to work in the general case.
>
> Let's say the test framework started Postgres on a random port. Then
> the framework started to do something else, building a Docker
> container for instance. While the framework is busy PostgreSQL crashes
> (crazy, I know, but not impossible). Both PID and the port will be
> reused eventually by another process. How soon is the implementation

detail of the given OS and its setting.
>

Let's say Postgres crashed, and the port was not reused. In this case, the
connection will fail. The test bench script can then, at the very least,
try checking the log files to see if there's any indication of a crash
there and report if one occurred. If the port was reused by something other
than Postgres, the script should (ideally) fail to communicate with it
using Postgres protocol.  If it was reused by another Postgres instance, it
gets a bit tougher, but then the test bench can, upon connection, verify
that it is the same system by comparing the system identifier on the file
system (retrieved using pg_controldata) and over the wire (retrieved
using `select system_identifier from pg_control_system()`)

I also suspect that this problem has a bigger scope than port retrieval. If
one is to use postmaster.pid only for PID retrieval, then there's still no
guarantee that between the time we retrieved the PID from the file and used
it,
Postgres didn't crash, and the PID was not re-used by a different process,
potentially even another postgres process launched in parallel by the test
bench.

There are tools mentioned previously by me in the thread that allow
inspecting which ports are opened by a given PID, and one can use those to
provide an extra determination as to whether we're still on the right
track. These tools
can also tell us what is the process name.

Ultimately, there's no transactionality in POSIX API, so we're always
exposed to the chance of discrepancies between the inspection time and the
next step.

>
> A bullet-proof approach would be (approximately) for the test
> framework to lease the ports on the given machine, for instance by
> using a KV value with CAS support like Consul or etcd (or another
> PostgreSQL instance), as this is done for leader election in
> distributed systems (so called leader lease). After leasing the port
> the framework knows no other testing process on the given machine will
> use it (and also it keeps telling the KV storage that the port is
> still leased) and specifies it in postgresql.conf as usual.
>

The approach you suggest introduces a significant amount of complexity but
seemingly fails to address one of the core issues: using a KV store to
lease a port does not guarantee the port's availability. I don't believe
this is a sound way to address this issue, let alone a bulletproof one.

Also, I don't think there's a case for distributed systems here because
we're only managing a single computer's resource: the allocation of local
ports.

If I were to go for a more bulletproof approach, I would probably consider
a different technique that would not necessitate provisioning and running
additional software for port leasing.

For example, I'd suggest adding an option to Postgres to receive sockets it
should listen on from a UNIX socket (using SCM_RIGHTS message) and then
have another program acquire the sockets using whatever algorithm (picking
pre-set one, unused wildcard port, etc.) and start Postgres passing the
sockets using the aforementioned UNIX socket. This program will be your
leaseholder and can perhaps print out the PID so that the testing scripts
can immediately use it. The leaseholder should watch for the Postgres
process to crash. This is still a fairly complicated solution that needs
some refining, but it does allocate ports flawlessly, relying on OS being
the actual leaseholder and not requiring fighting against race conditions.
I didn't go for anything like this because of the sheer complexity of it.

The proposed solution is, I believe, a simple one that gets you there in an
awful majority of cases. If one starts running out in the error cases like
port reuse or listener disappearance, the logic I described above may get
them a step further.


Re: [PATCH] Allow Postgres to pick an unused port to listen

2023-04-18 Thread Yurii Rashkovskii
Stephen,

> You could just drop another file into the data directory that just
> contains
> > the port number ($PGDATA/port).  However, if we ever do multiple ports,
> that
> > would still require a change in the format of that file, so I don't know
> if
> > that's actually better than a).
>

I find it difficult to get anything done under the restriction of "what if
we ever need to change X?" as it is difficult to address something that
doesn't exist or hasn't been planned.

A fine and delicate balance of anticipating what may happen theoretically
and what's more likely happen is an art. It's also important to consider
the impact of a breaking change. It's one thing if we have to break, say,
an SQL function signature or SQL syntax itself, and another if it is a
relatively small feature related to the administration of a server (in this
case, more like scripting a test bench).


>
> If we did a port per line then it wouldn't be changing the format of the
> first line, so that might not be all that bad.
>

If we consider this path, then (if we assume the format of the file is
still to be private), we can make the port line accept multiple ports using
a delimiter like `:` so that the next line still remains the same.

That being said, if the format is private to Postgres, it's all minor
considerations.


> > I don't think involving pg_ctl is necessary or desirable, since it would
> > make any future changes like that even more complicated.
>
> I'm a bit confused by this- if pg_ctl is invoked then we have
> more-or-less full control over parsing and reporting out the answer, so
> while it might be a bit more complicated for us, it seems surely simpler
> for the end user.  Or maybe you're referring to something here that I'm
> not thinking of?
>

I would love to learn about this as well.


>
> Independent of the above though ... this hand-wringing about what we
> might do in the relative near-term when we haven't done much in the past
> many-many years regarding listen_addresses or port strikes me as
> unlikely to be necessary.  Let's pick something and get it done and
> accept that we may have to change it at some point in the future, but
> that's kinda what major releases are for, imv anyway.
>

That's how I see it, too. I tried to make this change as small as possible
to appreciate the fact that all of this may change one day if or when that
portion of Postgres will be due for a major redesign. I'd be happy to
contribute to that process, but in the meantime, I am looking for the
simplest reasonable way to achieve a relatively specific use case.

Personally, I am fine with reading the `.pid` file and accepting that it
_may_ change in the future; I am also fine with amending the patch to add
functionality to pg_ctl or adding a new file.

To keep everybody's cognitive load low, I'd rather not flood the thread
with multiple alternative implementations (unless that's desirable) and
just go for something we can agree on.

(I consider this feature so small that it doesn't deserve such a lengthy
discussion. However, I also get Tom's point about how we document this
feature's use, which is very valid and valuable. If it was up to me
entirely, I'd probably just document `postmaster.pid` and call it a day. If
it ever breaks, that's a major release territory. Otherwise, amending
`pg_ctl` to access information like this in a uniform way is also a good
approach if we want to keep the format of the pid file private.)

-- 
Y.


Re: [PATCH] Allow Postgres to pick an unused port to listen

2023-04-12 Thread Yurii Rashkovskii
Hi Tom,

On Thu, Apr 13, 2023 at 9:17 AM Tom Lane  wrote:

> Yurii Rashkovskii  writes:
> > Thank you all for the feedback. It's quite useful. I think it is
> important
> > to separate this into two concerns:
>
> > 1. Letting Postgres pick an unused port.
> > 2. Retrieving the port it picked.
>
> Yeah, those are distinguishable implementation concerns, but ...
>
> > The bottom line is this decouples (1) from (2), and we can resolve them
> > separately if there's too much (understandable) hesitation to commit to a
> > particular approach to it (documenting postmaster.pid, changing its
> format,
> > amending pg_ctl functionality, etc.)
>
> ... AFAICS, there is exactly zero value in committing a solution for (1)
> without also committing a solution for (2).  I don't think any of the
> alternative methods you proposed are attractive or things we should
> recommend.
>

I disagree that zero value exists in (1) without (2). As my examples show,
they make it possible to pick a port without synchronization scripting. Are
they perfect? Of course, not. But they are better than lock file-based
scripts IMO. They are not exposed to race conditions.

But getting your agreement is important to get this in; I am willing to
play along and resolve both (1) and (2) in one go. As for the
implementation approach for (2), which of the following options would you
prefer?

a) Document postmaster.pid as it stands today
b) Expose the port number through pg_ctl (*my personal favorite)
c) Redesign its content below line 1 to make it extensible (make unnamed
lines named, for example)

If none of the above options suit you, do you have a strategy you'd prefer?


Re: [PATCH] Allow Postgres to pick an unused port to listen

2023-04-12 Thread Yurii Rashkovskii
Tom, Robert, Greg, Andrew,

On Thu, Apr 13, 2023 at 12:56 AM Tom Lane  wrote:

> Robert Haas  writes:
> > On Wed, Apr 12, 2023 at 1:31 PM Greg Stark  wrote:
> >> I don't object to using the pid file as the mechanism -- but it is a
> >> bit of an awkward UI for shell scripting. I imagine it would be handy
> >> if pg_ctl had an option to just print the port number so you could get
> >> it with a simple port=`pg_ctl -D  status-port`
>
> > That's not a bad idea, and would provide some additional isolation to
> > reduce direct dependency on the PID file format.
>
> Yeah.  My main concern here is with limiting our ability to change
> the pidfile format in future.  If we can keep the dependencies on that
> localized to code we control, it'd be much better.
>
>
Thank you all for the feedback. It's quite useful. I think it is important
to separate this into two concerns:

1. Letting Postgres pick an unused port.
2. Retrieving the port it picked.

If I get this right, there's no significant opposition to (1) as this is
common functionality we're relying on. The most contention is around (2)
because I suggested using postmaster.pid
file, which may be considered private for the most part, at least for the
time being.

With this in mind, I still think that proceeding with (1) is a good idea,
as retrieving the port being listened on is still much easier than
involving a more complex lock file script. For example, on UNIX-like
systems, `lsof` can be typically used to do this:

```
# For IPv4
lsof  -a -w -FPn -p $(head -n 1 postmaster.pid) -i4TCP -sTCP:LISTEN -P -n |
tail -n 1 | awk -F: '{print $NF}'
# For IPv6
lsof  -a -w -FPn -p $(head -n 1postmaster.pid) -i6TCP -sTCP:LISTEN -P -n |
tail -n 1 | awk -F: '{print $NF}'
```

(There are also other tools that can be used to achieve much of the same)

On Windows, this can be done using PowerShell (and perhaps netstat, too):

```
# IPv4
PS> Get-NetTCPConnection -State Listen -OwningProcess (Get-Content
"postmaster.pid" -First 1) | Where-Object { $_.LocalAddress -notmatch ':' }
| Select-Object -ExpandProperty LocalPort
5432
PS> Get-NetTCPConnection -State Listen -OwningProcess (Get-Content
"postmaster.pid" -First 1) | Where-Object { $_.LocalAddress -match ':' } |
Select-Object -ExpandProperty LocalPort
5432
```

The above commands can be worked on to extract multiple ports should that
ever become a feature.

The bottom line is this decouples (1) from (2), and we can resolve them
separately if there's too much (understandable) hesitation to commit to a
particular approach to it (documenting postmaster.pid, changing its format,
amending pg_ctl functionality, etc.) I will be happy to participate in the
discovery and resolution of (2) as well.

This would allow people like myself or Mark (above in the thread) to let
Postgres pick the unused port and extract it using a oneliner for the time
being. When a better approach for server introspection will be agreed on,
we can use that.

I'll be happy to address any [styling or other] issues with the currently
proposed patch.


--
http://omnigres.org
Yurii


Re: [PATCH] Allow Postgres to pick an unused port to listen

2023-04-11 Thread Yurii Rashkovskii
Hi Robert,

On Tue, Apr 11, 2023 at 12:54 AM Robert Haas  wrote:

> On Fri, Apr 7, 2023 at 5:34 PM Yurii Rashkovskii  wrote:
> > I'm trying to understand what's wrong with reading port from the pid
> file (if Postgres writes information there, it's surely so that somebody
> can read it, otherwise, why write it in the first placd)? The proposed
> solution uses operating system's functionality to achieve collision-free
> mechanics with none of the complexity introduced in the commit.
>
> I agree. We don't document the exact format of the postmaster.pid file
> to my knowledge, but storage.sgml lists all the things it contains,
> and runtime.sgml documents that the first line contains the postmaster
> PID, so this is clearly not some totally opaque file that nobody
> should ever touch. Consequently, I don't agree with Tom's statement
> that this would be a "a horrid way to find out what was picked." There
> is some question in my mind about whether this is a feature that we
> want PostgreSQL to have, and if we do want it, there may be some room
> for debate about how it's implemented, but I reject the idea that
> extracting the port number from postmaster.pid is intrinsically a
> terrible plan. It seems like a perfectly reasonable plan.
>
>
I appreciate your support on the pid file concern. What questions do you
have about this feature with regard to its desirability and/or
implementation? I'd love to learn from your insight and address any of
those if I can.

-- 
Y.


Re: [PATCH] Allow Postgres to pick an unused port to listen

2023-04-07 Thread Yurii Rashkovskii
Hi Andrew,

On Fri, Apr 7, 2023, 7:07 p.m. Andrew Dunstan  wrote:

>
> On 2023-03-29 We 07:55, Tom Lane wrote:
>
> Yurii Rashkovskii   writes:
>
> I would like to suggest a patch against master (although it may be worth
> backporting it) that makes it possible to listen on any unused port.
>
> I think this is a bad idea, mainly because this:
>
>
> Instead, with this patch, one can specify `port` as `0` (the "wildcard"
> port) and retrieve the assigned port from postmaster.pid
>
> is a horrid way to find out what was picked, and yet there could
> be no other.
>
> Our existing design for this sort of thing is to let the testing
> framework choose the port, and I don't really see what's wrong
> with that approach.  Yes, I know it's theoretically subject to
> race conditions, but that hasn't seemed to be a problem in
> practice.  It's especially not a problem given that modern
> testing practice tends to not open any TCP port at all, just
> a Unix socket in a test-private directory, so that port
> conflicts are a non-issue.
>
>
> For TAP tests we have pretty much resolved the port collisions issue for
> TCP ports too. See commit 9b4eafcaf4
>
> Perhaps the OP could adapt that logic to his use case.
>

Thank you for referencing this commit. The point why I am suggesting my
patch is that I believe that my solution is a much better way to avoid
collisions in the first place. Implementing an algorithm similar to the one
in the referenced commit is error-pfone and can be difficult in
environments like shell script.

I'm trying to understand what's wrong with reading port from the pid file
(if Postgres writes information there, it's surely so that somebody can
read it, otherwise, why write it in the first placd)? The proposed solution
uses operating system's functionality to achieve collision-free mechanics
with none of the complexity introduced in the commit.


Re: [PATCH] Allow Postgres to pick an unused port to listen

2023-04-06 Thread Yurii Rashkovskii
Hi Tom,

On Wed, Mar 29, 2023 at 6:55 PM Tom Lane  wrote:

> Yurii Rashkovskii  writes:
> > I would like to suggest a patch against master (although it may be worth
> > backporting it) that makes it possible to listen on any unused port.
>
> I think this is a bad idea, mainly because this:
>
> > Instead, with this patch, one can specify `port` as `0` (the "wildcard"
> > port) and retrieve the assigned port from postmaster.pid
>
> is a horrid way to find out what was picked, and yet there could
> be no other.
>

I answered you before (
https://www.postgresql.org/message-id/CA+RLCQwYw-Er-E_RGNCDfA514w+1YL8HGhNstxX=y1glaab...@mail.gmail.com),
but I am wondering whether you missed that response. I would really be
interested to learn why you think reading port from the pid file is a
"horrid way" to find out what was picked.

I've outlined my reasoning for this feature in detail in the referenced
message. Hope you can consider it.

-- 
http://omnigres.org
Yurii


Re: [PATCH] Support % wildcard in extension upgrade filenames

2023-04-02 Thread Yurii Rashkovskii
I want to chime in on the issue of lower-number releases that are released
after higher-number releases. The way I see this particular problem is that
we always put upgrade SQL files in release "packages," and they obviously
become static resources.

While I [intentionally] overlook some details here, what if (as a
convention, for projects where it matters) we shipped extensions with
non-upgrade SQL files only, and upgrades were available as separate
downloads? This way, we're not tying releases themselves to upgrade paths.
This also requires no changes to Postgres.

I know this may be a big delivery layout departure for well-established
projects; I also understand that this won't solve the problem of having to
have these files in the first place (though in many cases, they can be
automatically generated once, I suppose, if they are trivial).


On Tue, Jan 10, 2023 at 5:52 AM Tom Lane  wrote:

> I continue to think that this is a fundamentally bad idea.  It creates
> all sorts of uncertainties about what is a valid update path and what
> is not.  Restrictions like
>
> + Such wildcard update
> + scripts will only be used when no explicit path is found from
> + old to target version.
>
> are just band-aids to try to cover up the worst problems.
>
> Have you considered the idea of instead inventing a "\include" facility
> for extension scripts?  Then, if you want to use one-monster-script
> to handle different upgrade cases, you still need one script file for
> each supported upgrade step, but those can be one-liners including the
> common script file.  Plus, such a facility could be of use to people
> who want intermediate factorization solutions (that is, some sharing
> of code without buying all the way into one-monster-script).
>
> regards, tom lane
>
>
>

-- 
Y.


Re: [PATCH] Allow Postgres to pick an unused port to listen

2023-03-29 Thread Yurii Rashkovskii
Hi Tom,

Thank you for your feedback. Below are my comments.

On Wed, Mar 29, 2023 at 6:55 PM Tom Lane  wrote:

> Yurii Rashkovskii  writes:
> > I would like to suggest a patch against master (although it may be worth
> > backporting it) that makes it possible to listen on any unused port.
>
> I think this is a bad idea, mainly because this:
>
> > Instead, with this patch, one can specify `port` as `0` (the "wildcard"
> > port) and retrieve the assigned port from postmaster.pid
>
> is a horrid way to find out what was picked, and yet there could
> be no other.
>

Can you elaborate on why reading postmaster.pid is a horrid way to discover
the port, given that it is a pretty simple format with a fixed line number
for the port?


> Our existing design for this sort of thing is to let the testing
> framework choose the port, and I don't really see what's wrong
> with that approach.  Yes, I know it's theoretically subject to
> race conditions, but that hasn't seemed to be a problem in
> practice.  It's especially not a problem given that modern
> testing practice tends to not open any TCP port at all, just
> a Unix socket in a test-private directory, so that port
> conflicts are a non-issue.
>

I keep running into this race condition nearly daily, which is why I
proposed to address it with this patch. Yes, I know that one can get around
this with UNIX sockets,
but they have limited capabilities (not accessible outside of the local
machine, to begin with). Here's a real-world example of why I need to be
able to use TCP ports:

I have an extension that allows managing the lifecycle of [Docker/OCI]
containers, and it passes Postgres connection details to these containers
as environment variables.
These containers can now connect to Postgres using any program that can
communicate using the wire protocol. I test this functionality in an
automated test that is executed
concurrently with others. Testing that the extension can supply the correct
connection information to the containers is important.

If we forget the importance of testing this specific part of the
functionality for a bit, the rest of my issue can be _theoretically_
resolved by passing the UNIX socket in `PGHOST` instead.

However, it won't work in a typical Docker Desktop for macOS setup as it
utilizes a virtual machine, and therefore, I can't simply use a UNIX socket
between them.

Here's an example:

1. Create a UNIX socket listener:

```
socat unix-l:test.sock,fork system:'echo hello'
```

2. Verify that it works locally:

```
$ socat test.sock -
hello
```

3. Now, while being on macOS and using Docker Desktop, let Docker mount the
directory with the socket and try to connect it from there:

```
$ docker run -v /path/to/sock/dir:/sock -ti ubuntu bash
# apt update && apt install -y socat
# socat /sock/test.sock -
2023/03/29 23:34:48 socat[451] E connect(5, AF=1 "/sock/test.sock", 17):
Connection refused
```

I get that the UNIX socket around works for many cases, but it does not
work for mine. Hence the proposal. Allowing a (fairly common) practice of a
wildcard port with the discovery of it via
postmaster.pid resolves all the above concerns without having to resort to
a rather race-condition-prone way to pick a port (or a complicated way to
do so with proper locking).

-- 
http://omnigres.org
Yurii


[PATCH] Allow Postgres to pick an unused port to listen

2023-03-28 Thread Yurii Rashkovskii
Hi,

I would like to suggest a patch against master (although it may be worth
backporting it) that makes it possible to listen on any unused port.

The main motivation is running colocated instances of Postgres (such as
test benches) without having to coordinate port allocation in an
unnecessarily complicated way.

Instead, with this patch, one can specify `port` as `0` (the "wildcard"
port) and retrieve the assigned port from postmaster.pid

I believe there is no significant performance or another impact as it is a
tiny bit of conditional functionality executed during startup.

The patch builds and `make check` succeeds. The patch does not add a test;
however, I am trying to figure out if this behaviour can be tested
automatically.


-- 
http://omnigres.org
Y.


V1-0001-Allow-listening-port-to-be-0.patch
Description: Binary data


Re: [PATCH] Extend the length of BackgroundWorker.bgw_library_name

2023-03-13 Thread Yurii Rashkovskii
Nathan,

Thank you for your review.

Indeed, my motivation for doing the change the way I did it was that only
bgw_library_name is expected to be longer, whereas it is much less of a
concern for other fields. If we have increased BGW_MAXLEN, it would have
increased the size of BackgroundWorker for little to no benefit.

On Mon, Mar 13, 2023 at 10:35 AM Nathan Bossart 
wrote:

> On Mon, Mar 13, 2023 at 07:57:47AM -0700, Yurii Rashkovskii wrote:
> > However, there are use cases where [potentially] longer names are
> > expected/desired; for example, test benches (where library files may not
> > [or can not] be copied to Postgres installation) or alternative library
> > installation methods that do not put them into $libdir.
> >
> > The patch is backwards-compatible and ensures that bgw_library_name stays
> > *at least* as long as BGW_MAXLEN. Existing external code that uses
> > BGW_MAXLEN is a length boundary (for example, in `strncpy`) will continue
> > to work as expected.
>
> I see that BGW_MAXLEN was originally set to 64 in 2013 (7f7485a) [0], but
> was increased to 96 in 2018 (3a4b891) [1].  It seems generally reasonable
> to me to increase the length of bgw_library_name further for the use-case
> you describe, but I wonder if it'd be better to simply increase BGW_MAXLEN
> again.  However, IIUC bgw_library_name is the only field that is likely to
> be used for absolute paths, so only increasing that one to MAXPGPATH makes
> sense.
>
> [0]
> https://postgr.es/m/CA%2BTgmoYtQQ-JqAJPxZg3Mjg7EqugzqQ%2BZBrpnXo95chWMCZsXw%40mail.gmail.com
> [1]
> https://postgr.es/m/304a21ab-a9d6-264a-f688-912869c0d7c6%402ndquadrant.com
>
> --
> Nathan Bossart
> Amazon Web Services: https://aws.amazon.com
>


--
http://omnigres.org
Yurii


[PATCH] Extend the length of BackgroundWorker.bgw_library_name

2023-03-13 Thread Yurii Rashkovskii
Hi,

I want to suggest a patch against master (it may also be worth backporting
it) that makes it possible to use longer filenames (such as those with
absolute paths) in `BackgroundWorker.bgw_library_name`.

`BackgroundWorker.bgw_library_name` currently allows names up to
BGW_MAXLEN-1, which is generally sufficient if `$libdir` expansion is used.

However, there are use cases where [potentially] longer names are
expected/desired; for example, test benches (where library files may not
[or can not] be copied to Postgres installation) or alternative library
installation methods that do not put them into $libdir.

The patch is backwards-compatible and ensures that bgw_library_name stays
*at least* as long as BGW_MAXLEN. Existing external code that uses
BGW_MAXLEN is a length boundary (for example, in `strncpy`) will continue
to work as expected.

The trade-off of this patch is that the `BackgroundWorker` structure
becomes larger. From my perspective, this is a reasonable cost (less than a
kilobyte of extra space per worker).

The patch builds and `make check` succeeds.

Any feedback is welcome!

-- 
http://omnigres.org
Yurii


v1-0001-Extend-the-length-of-BackgroundWorker.bgw_library_name.patch
Description: Binary data