Re: doc: Move enum storage commentary to top of section

2022-07-06 Thread David G. Johnston
On Wed, Jul 6, 2022 at 10:24 AM Matthias van de Meent <
boekewurm+postg...@gmail.com> wrote:

> On Thu, 9 Jun 2022 at 18:12, David G. Johnston
>  wrote:
> >
> > Per suggestion over on -docs:
> >
> >
> https://www.postgresql.org/message-id/bl0pr06mb4978f6c0b69f3f03aebed0fbb3...@bl0pr06mb4978.namprd06.prod.outlook.com
>
> I agree with moving the size of an enum into the top, but I don't
> think that the label length documentation also should be included in
> the top section. It seems excessively detailed for that section with
> its reference to NAMEDATALEN.
>
> -Matthias
>

Agreed.

Tangentially: It does seem a bit unusual to call the fact that the values
are both case-sensitive and limited to the length of a system identifier an
implementation detail.  But if anything the length is more of one than the
case-sensitivity.  Specifying NAMEDATALEN here seems like it breaks
encapsulation; the docs could instead compare the limit to that of an
identifier, and only those that care can learn how that length might be
changed in a custom build of PostgreSQL.

David J.


Re: doc: Fix description of how the default user name is chosen

2022-07-09 Thread David G. Johnston
On Friday, July 8, 2022, Bruce Momjian  wrote:

> On Fri, Jul  8, 2022 at 10:17:11PM -0400, Tom Lane wrote:
> > Bruce Momjian  writes:
> > > On Tue, Jul  5, 2022 at 08:20:25PM -0400, Tom Lane wrote:
> > >> I agree this phrasing needs some work, but "resolved" doesn't seem
> > >> helpful, since it's not defined here or nearby.  Maybe "The default
> > >> database name is the specified (or defaulted) user name." ?
> >
> > > I am not seeing much improvement in the proposed patch either.  I
> wonder
> > > if we should be calling this the "session" or "connection" user name.
> > > When the docs say "if you do not specify a database name, it defaults
> to
> > > the database user name", there is so much "database" in there that the
> > > meaning is unclear, and in this context, the user name is a property of
> > > the connection or session, not of the database.
> >
> > Umm ... you could make the exact same statement with respect to the
> > user's operating-system login session, so I doubt that "session" or
> > "connection" adds any clarity.
>
> Well, one confusion is that there is a database name and a database user
> name.  We don't have different operating system names that users can
> connect to, usually.
>
>
Maybe invoke the wording from the libpq docs and say:

The default database name is the same as the user connection parameter.

https://www.postgresql.org/docs/current/libpq-connect.html#LIBPQ-PARAMKEYWORDS

That page doesn’t feel the need to qualify user name and I don’t think it
hurts comprehension; hence the wording “user parameter” there, instead of
“user name”, since the parameter is simply “user”, not “username”.

David J.


Re: doc: Fix description of how the default user name is chosen

2022-07-09 Thread David G. Johnston
On Sat, Jul 9, 2022, 08:16 Bruce Momjian  wrote:

> On Sat, Jul  9, 2022 at 08:06:21AM -0700, David G. Johnston wrote:
> > Maybe invoke the wording from the libpq docs and say:
> >
> > The default database name is the same as the user connection parameter.
> >
> >
> https://www.postgresql.org/docs/current/libpq-connect.html#LIBPQ-PARAMKEYWORDS
> >
> > That page doesn’t feel the need to qualify user name and I don’t think
> it hurts
> > comprehension; hence the wording “user parameter” there, instead of “user
> name”,
> > since the parameter is simply “user”, not “username”.
>
> Well, it could be the login OS name if the user connection parameter is
> unspecified, right?
>
>
No.  It is always the user parameter.  It just so happens that parameter
also has a default.  And so while there is a transitive aspect, the
resolution of the user parameter happens first (using the OS user if
needed); then the dbname parameter is resolved, using the user parameter if
needed to supply the default.
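The two-step resolution described above can be sketched as follows (a toy
model for illustration only, not actual libpq code; the os_user argument
stands in for the operating-system login name):

```python
def resolve_connection(user=None, dbname=None, os_user="vagrant"):
    """Toy model of the defaulting order: user resolves first, then dbname."""
    # Step 1: the user parameter defaults to the OS login name.
    resolved_user = user if user is not None else os_user
    # Step 2: dbname defaults to the already-resolved user parameter,
    # never directly to the OS login name.
    resolved_dbname = dbname if dbname is not None else resolved_user
    return resolved_user, resolved_dbname
```

So even when the OS name ends up as the database name, it gets there
transitively through the user parameter.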

David J.


Re: doc: Clarify Routines and Extension Membership

2022-07-14 Thread David G. Johnston
On Thu, Jul 14, 2022 at 2:41 PM Bruce Momjian  wrote:

> On Fri, Jul  8, 2022 at 10:55:55PM -0400, Bruce Momjian wrote:
> > > The/that inconsistency ... choose one.  Or actually, the "an ... the"
> > > combination you used elsewhere doesn't grate on the ear either.
> > >
> > > +  For each extension, refuse to drop anything if any objects
> (other than the
> > > +  extensions listed) depend on it.  However, its own member
> objects, and routines
> > > +  that are explicitly dependent on this extension, are skipped.
> > > +  This is the default.
> > >
> > > "skipped" seems like a horrible choice of word; it could easily be
> read as
> > > "they don't get dropped".  I am not convinced that mentioning the
> member
> > > objects here is an improvement either.  In the first sentence you are
> > > treating each extension as a monolithic object; why not in the second?
> >
> > I created a modified patch based on this feedback;  patch attached.  I
> > rewrote the last change.
>
> Patch applied to PG 13 and later, where extension dependency was added.
> Thank you for the patch.
>

Thank you and apologies for being quiet here and on a few of the
other threads. I've been on vacation and flagged as ToDo some of the
non-simple feedback items that have come this way.

The change to restrict and description in drop extension needs to be fixed
up (the other pages look good).

"This option prevents the specified extensions from being dropped if there
exists non-extension-member objects that depends on any the extensions.
This is the default."

At minimum: "...that depend on any of the extensions."

I did just now confirm that if any of the named extensions fails to be
dropped the entire command fails.  There is no partial success mode.

I'd like to avoid non-extension-member, and one of the main points is that
the routine dependency is member-like, not actual membership.  Hence the
separate wording.

I thus propose to replace the drop extension / restrict paragraph and
replace it with the following:

"This option prevents the specified extensions from being dropped if other
objects - besides these extensions, their members, and their explicitly
dependent routines - depend on them.  This is the default."

Also, I'm thinking to change, on the same page (description):

"Dropping an extension causes its component objects,"

to be:

"Dropping an extension causes its member objects,"

I'm not sure why I originally chose component over member...

David J.


Re: Doc about how to set max_wal_senders when setting minimal wal_level

2022-07-15 Thread David G. Johnston
On Fri, Jul 15, 2022 at 6:27 AM Japin Li  wrote:

>
> >
> > +servers.  If setting max_wal_senders to
> > +0 consider also reducing the amount of WAL
> produced
> > +by changing wal_level to
> minimal.
> >
> > I don't think this is great advice.  It will encourage people to use
> > wal_level = minimal even if they have other requirements that weigh
> > against it.  If they feel that their system is producing too much
> > WAL, I doubt they'll have a hard time finding the wal_level knob.
> >
>
> Agreed. It isn't good advice.  We can remove the suggestion.
>
>
Yeah, I wrote that thinking that max_wal_senders being set to 0 implied
archive_mode = off, but that isn't the case.

David J.


Re: MERGE and parsing with prepared statements

2022-07-15 Thread David G. Johnston
On Fri, Jul 15, 2022 at 11:40 AM Alvaro Herrera 
wrote:

> On 2022-Jul-15, Justin Pryzby wrote:
>
> > It seems a bit odd that it's impossible to use merge with prepared
> statements
> > without specifically casting the source types (which I did now to
> continue my
> > experiment).
>
> I have no comments on this.  Maybe it can be improved, but I don't know
> how.
>
>
Not tested, but the example prepare command fails to make use of the
optional data types specification.  Using that should remove the need to
cast the parameter placeholder within the query itself.

That said, in theory the INSERT specification of the MERGE could be used to
either resolve unknowns or even forcibly convert the data types of the
relation produced by the USING clause to match the actual types required
for the INSERT (since that will happen at insert time anyway).  This would
make the UPDATE behavior slightly different than a top-level UPDATE command
though, since that does not have the same context information.

David J.


Re: Move Section 9.27.7 (Data Object Management Functions) to System Information Chapter

2022-07-15 Thread David G. Johnston
On Thu, Jul 14, 2022 at 3:57 PM Tom Lane  wrote:

> Bruce Momjian  writes:
> > On Mon, Apr 25, 2022 at 08:33:47AM -0700, David G. Johnston wrote:
> >> Both the location and name of the linked to section make no sense to me:
> >> https://www.postgresql.org/docs/current/functions-admin.html#
> >> FUNCTIONS-ADMIN-DBOBJECT
> >> Neither of the tables listed there manage (cause to change) anything.
> They are
> >> pure informational functions - size and path of objects respectively.
> It
> >> belongs in the previous chapter "System Information Functions and
> Operators"
> >> with a different name.
>
> > So, the section title is:
> >   9.27.7. Database Object Management Functions
> > I think the idea is that they _help_ to manage database objects by
> > reporting their size or location.  I do think it is in the right
> > chapter, but maybe needs a better title?  I can't think of one.
>
> I'm hesitant to move functions to a different documentation page
> without a really solid reason.  Just a couple days ago I fielded a
> complaint from somebody who couldn't find string_to_array anymore
> because we'd moved it from "array functions" to "string functions".
>
> I'd be the first to say that the division between 9.26 and 9.27 is
> pretty arbitrary ... but without a clearer demarcation rule,
> moving functions between the two pages seems more likely to
> add confusion than subtract it.
>
>
I'm not going to fight the prevailing winds on this one, much...but I've
probably been sitting on this annoyance for years since I use the ToC to
find stuff fairly quickly in the docs.  This case seems much more clear-cut
to me than deciding whether a function that converts a string into an array
belongs in the string chapter or the array chapter.

On a related note, why itemize 9.27 in the table of contents but not 9.26?

I would ask that we at least rename it to:

Disk Usage Functions

Since this would show in the ToC, the names of the functions that let one
compute disk usage (a question I probably see once a year, and what
motivates this request) would be much easier to find.  As it stands, one
has to skim the entire 9.26 chapter (since those table headings do not
appear in the ToC), fail to find it, and then stumble upon it in a table
that deals only with sizes but whose header says nothing about sizes.

David J.


Re: MERGE and parsing with prepared statements

2022-07-15 Thread David G. Johnston
On Fri, Jul 15, 2022 at 12:40 PM Justin Pryzby  wrote:

> On Fri, Jul 15, 2022 at 12:17:51PM -0700, David G. Johnston wrote:
> > On Fri, Jul 15, 2022 at 11:40 AM Alvaro Herrera 
> wrote:
> > > On 2022-Jul-15, Justin Pryzby wrote:
> > >
> > > > It seems a bit odd that it's impossible to use merge with prepared
> statements
> > > > without specifically casting the source types (which I did now to
> continue my
> > > > experiment).
> > >
> > > I have no comments on this.  Maybe it can be improved, but I don't know
> > > how.
> >
> > Not tested, but the example prepare command fails to make use of the
> > optional data types specification.  Using that should remove the need to
> > cast the parameter placeholder within the query itself.
>
> What optional data type specification ?
>

The one documented here:

https://www.postgresql.org/docs/current/sql-prepare.html

PREPARE name [ ( data_type [, ...] ) ] AS statement
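Applied to Justin's example from later in this thread, using that type list
would look something like this (an untested sketch; table and column names
are taken from his example):

```sql
-- Untested sketch: declare the parameter types in PREPARE itself,
-- so $1 and $2 are not resolved as unknown/text inside the query
-- and no casts are needed within the MERGE.
PREPARE p (integer, numeric) AS
MERGE INTO CustomerAccount CA
USING (SELECT $1 AS CustomerId, $2 AS TransactionValue) AS T
ON CA.CustomerId = T.CustomerId
WHEN NOT MATCHED THEN
  INSERT (CustomerId, Balance)
  VALUES (T.CustomerId, T.TransactionValue)
WHEN MATCHED THEN
  UPDATE SET Balance = Balance + T.TransactionValue;
```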

David J.


Re: MERGE and parsing with prepared statements

2022-07-15 Thread David G. Johnston
On Fri, Jul 15, 2022 at 12:40 PM Justin Pryzby  wrote:

>
> That appears to be copied from the INSERT page.
> What does that mean, if not that data types will be resolved as needed ?
>

Yep, and the system needs to resolve the type at a point where there is no
contextual information and so it chooses text.


> Note that if I add casts to the "ON" condition, MERGE complains about the
> INSERT VALUES.
>
> PREPARE p AS
> MERGE INTO CustomerAccount CA
> USING (SELECT $1 AS CustomerId, $2 AS TransactionValue) AS T
> ON CA.CustomerId = T.CustomerId::int
> WHEN NOT MATCHED THEN
>   INSERT (CustomerId, Balance)
>   VALUES (T.CustomerId, T.TransactionValue)
> WHEN MATCHED THEN
>   UPDATE SET Balance = Balance + TransactionValue;
>
> ERROR:  column "customerid" is of type integer but expression is of type
> text
> LINE 7:   VALUES (T.CustomerId, T.TransactionValue)
>
>
Noted.  Not surprised.  That error was always present; it's just that the
join happens first.  Since your fix narrowly targeted the join, this error
remained to be discovered.

David J.


Re: doc: Bring mention of unique index forced transaction wait behavior outside of the internal section

2022-07-15 Thread David G. Johnston
On Thu, Jul 14, 2022 at 12:18 PM Bruce Momjian  wrote:

> On Mon, Jul 11, 2022 at 05:22:41PM +0300, Aleksander Alekseev wrote:
> > Hi Bruce,
> >
> > > I was not happy with putting this in the Transaction Isolation section.
> > > I rewrote it and put it in the INSERT section, right before ON CONFLICT;
> > > patch attached.
> >
> > Looks good.
>
> Applied to all supported PG versions.
>
>
Sorry for the delayed response on this but I'm not a fan.  A comment of
some form in transaction isolation seems to make sense (even if not my
original thought...that patch got messed up a bit anyhow), and while having
something in INSERT makes sense this doesn't seem precise enough.

Comments about locking and modifying rows don't make sense here (the issue
isn't limited to ON CONFLICT; simple inserts will wait if they happen to
choose the same key to insert).

I would also phrase it as simply "Tables with a unique index will..." and
not even mention tables that lack a unique index - those don't really exist
and inference of their behavior by contrast seems sufficient.

Sticking close to what you proposed then:

INSERT into tables with a unique index might block when concurrent sessions
are inserting conflicting rows (i.e., have identical values for the unique
index columns) or when there already exists a conflicting row which is in
the process of being deleted.  Details are covered in .

I can modify my original patch to be shorter and more on-point for
inclusion in the MVCC chapter if there is interest in having a pointer from
there to index-unique-checks as well.  I think such a note regarding
concurrency on an index naturally fits into one of the main pages for
learning about concurrency in PostgreSQL.

David J.


Re: Select Reference Page - Make Join Syntax More Prominent

2022-07-15 Thread David G. Johnston
On Thu, Jul 7, 2022 at 2:33 PM Tom Lane  wrote:

> "David G. Johnston"  writes:
> > Looking again at the SELECT Reference page while helping a novice user I
> > was once again annoyed but how the most common query syntax form for the
> > FROM clause is buried within a bunch of "how to generate a table" detail.
>
> Hmm ... I'm good with the concept of making JOIN syntax more prominent,
> but I don't much like this patch.  I think it's fundamentally wrong to
> describe from_item as disjoint from join_expression, and you're
> going to make people more confused not less so by doing that.
>

I'm not so sure about this - but in any case, as you note below, some of
this probably would be better placed in Chapter 7.  My impression is that
this aspect is dense and confusing enough as-is that people are learning
the syntax elsewhere and not worrying about how we express it here.  My
desire is to bring some design theory aspects to the docs and have the
syntax expression align with the design.  I'll do another pass to try and
hopefully find some middle ground here; or be more convincing.


> IMO there's nothing really wrong with the synopsis.  The problem is
> in the "FROM Clause" section, which dives headfirst into the weedy
> details without any context.  What do you think of adding an
> introductory para or two in that section, saying that the FROM
> clause is built from base tables possibly joined into join expressions?
> You sort of have that here, but it's pretty terse still.  Maybe it
> would be helpful also to separate the subsequent list of syntax
> details into base-table options and join syntax.
>

I really don't think focusing on base tables is the best choice here.
Frankly, those end up being fairly trivial.  When you start adding lateral
(especially if introduced by comma instead of a join), and subqueries more
generally, understanding how the pieces compose becomes more useful.
And by disjoining the from_item and join_item it becomes easier to give
each a purpose and describe how they relate to each other, rather than
trying to paper over their differences.  If anything, I think that having
from_item be used both within the join_item syntax and directly under FROM
may be detrimental.  Instead, have a term just for the items
comma-separated in the FROM clause, the ones that are CROSS JOINed and use
the WHERE clause for restrictions (or maybe not...).


>
> Not sure what I think about moving LATERAL up.  That's a sufficiently
> weird/obtuse thing that I think we're best off dealing with it last,
>

Strongly disagree given how useful set returning functions can be.  Part of
the reason I'm doing this now is a recent spate of questions where the
advice that I gave was to write an SRF in a joined lateral, which I
consider to be the idiomatic query structure for that use case.



> There's also an argument that the reference page *should* be terse
> and the place to cater to novices is 7.2's "Table Expressions"
> discussion (which we could, but don't, link from the ref page).
> I'm not sure if there's any good way to rework that material to make
> it clearer, but there are definitely bits of it that I don't find
> very well-written.  There might be an argument for jettisoning
> some details (like the obsolete "table*" notation) from 7.2
> altogether and covering those only in the ref page.
>
>
Agreed, I will see about allocating material between the two sections
better on the next pass.

I probably need to understand ROWS FROM syntax better as well to make sure
it fits into the mental model I am trying to create.

David J.


Re: Move Section 9.27.7 (Data Object Management Functions) to System Information Chapter

2022-07-16 Thread David G. Johnston
On Fri, Jul 15, 2022 at 12:36 PM David G. Johnston <
david.g.johns...@gmail.com> wrote:

>
> I would ask that we at least rename it to:
>
> Disk Usage Functions
>
>
Nevermind...I identified the scope of that header incorrectly and the
rename wouldn't be appropriate for the other tables in that section.

David J.


Re: doc: Make selectivity example match wording

2022-07-16 Thread David G. Johnston
On Sat, Jul 2, 2022 at 12:42 PM Dian M Fay  wrote:

> On Thu Jun 9, 2022 at 11:57 AM EDT, David G. Johnston wrote:
> > Reposting this to its own thread.
> >
> >
> https://www.postgresql.org/message-id/flat/CAKFQuwby1aMsJDMeibaBaohgoaZhivAo4WcqHC1%3D9-GDZ3TSng%40mail.gmail.com
> >
> > doc: make unique non-null join selectivity example match the prose
> >
> > The description of the computation for the unique, non-null,
> > join selectivity describes a division by the maximum of two values,
> > while the example shows a multiplication by their reciprocal.  While
> > equivalent the max phrasing is easier to understand; which seems
> > more important here than precisely adhering to the formula used
> > in the code (for which either variant is still an approximation).
>
> Should n_distinct and num_rows be d in the text?
>

Thanks for the review.  I generally like everything you said but it made me
realize that I still didn't really understand the intent behind the
formula.  I spent way too much time working that out for myself, then
turned what I found useful into this v2 patch.

It may need some semantic markup still but figured I'd see if the idea
makes sense.

I basically rewrote, in a bit different style, the same material into the
code comments, then proceeded to rework the proof that was already present
there.

I did do this in somewhat of a vacuum.  I'm not inclined to learn this all
start-to-end though.  If the abrupt style change is unwanted so be it.  I'm
not really sure how much benefit the proof really provides.  The comments
in the docs are probably sufficient for the code as well - just define why
the three pieces of the formula exist and are packaged into a single
multiplier called selectivity as an API choice.  I suspect once someone
gets to that comment it is fair to assume some prior knowledge.
Admittedly, I didn't really come into this that way...
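For reference, the equivalence the patch wording is about can be checked
numerically: dividing by the maximum of the two distinct counts gives the
same selectivity as multiplying by the reciprocals' minimum (a simplified
illustration of the unique, non-null join selectivity estimate; the
function and variable names here are mine, not the code's):

```python
def join_selectivity(d1, d2):
    # The "division by the maximum" phrasing: 1 / max(d1, d2).
    return 1.0 / max(d1, d2)

def join_selectivity_reciprocal(d1, d2):
    # The "multiplication by the reciprocal" phrasing: min(1/d1, 1/d2).
    return min(1.0 / d1, 1.0 / d2)

def estimated_join_rows(rows1, d1, rows2, d2):
    # Estimated join size = rows1 * rows2 * selectivity.
    return rows1 * rows2 * join_selectivity(d1, d2)
```

Both phrasings pick out the side with more distinct values, which is why
the docs can use whichever reads better.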

David J.


0002-doc-Make-selectivity-example-match-text-rewrite-proo.patch
Description: Binary data


Re: Proposal to introduce a shuffle function to intarray extension

2022-07-16 Thread David G. Johnston
On Sat, Jul 16, 2022 at 7:25 PM Martin Kalcher <
martin.kalc...@aboutsource.net> wrote:

>
> - I added a second function sample(), because it is a lot faster to take
>some elements from an array than to shuffle the whole array and
>slice it. This function can be removed if it is not wanted. The
>important one is shuffle().
>
>
 +SELECT sample('{1,2,3,4,5,6,7,8,9,10,11,12}', 6) !=
sample('{1,2,3,4,5,6,7,8,9,10,11,12}', 6);
+ ?column?
+--
+ t
+(1 row)
+

While small, there is a non-zero chance for both samples to be equal.  This
test should probably just go; I don't see what it tests that isn't covered
by other tests or even trivial usage.

Same goes for:

+SELECT shuffle('{1,2,3,4,5,6,7,8,9,10,11,12}') !=
shuffle('{1,2,3,4,5,6,7,8,9,10,11,12}');
+ ?column?
+--
+ t
+(1 row)
+
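That "small but non-zero" chance is easy to quantify (back-of-the-envelope
arithmetic, assuming each outcome is equally likely):

```python
import math

# A 12-element array has 12! equally likely orderings, so two independent
# shuffles coincide with probability 1/12!.
p_shuffle_equal = 1 / math.factorial(12)

# An ordered 6-element sample from 12 elements has P(12, 6) = 12!/6!
# equally likely outcomes, so two independent samples coincide with
# probability 1/P(12, 6).
p_sample_equal = 1 / math.perm(12, 6)
```

Rare enough in practice, but a test asserting inequality is still
probabilistic rather than deterministic, which is the objection above.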


David J.


Re: Proposal to introduce a shuffle function to intarray extension

2022-07-16 Thread David G. Johnston
On Sat, Jul 16, 2022 at 8:18 PM Tom Lane  wrote:

> Martin Kalcher  writes:
>
> > - I added a second function sample(), because it is a lot faster to take
> >some elements from an array than to shuffle the whole array and
> >slice it. This function can be removed if it is not wanted.
>
> I have no opinion about whether this one is valuable enough to include in
> intarray, but I do feel like sample() is a vague name, and easily confused
> with marginally-related operations like TABLESAMPLE.  Can we think of a
> more on-point name?  Something like "random_subset" would be pretty
> clear, but it's also clunky.  It's too late here for me to think of
> le mot juste...
>
>
choose(input anyarray, size integer, with_replacement boolean default
false, algorithm text default 'default')?

David J.


Re: [PATCH] Introduce array_shuffle() and array_sample()

2022-07-18 Thread David G. Johnston
On Mon, Jul 18, 2022 at 3:18 PM Tom Lane  wrote:

>
> Independently of the dimensionality question --- I'd imagined that
> array_sample would select a random subset of the array elements
> but keep their order intact.  If you want the behavior shown
> above, you can do array_shuffle(array_sample(...)).  But if we
> randomize it, and that's not what the user wanted, she has no
> recourse.
>
>
And for those that want to know in what order those elements were chosen,
they have no recourse in the other setup.

I really think this function needs to grow an algorithm argument that can
be used to specify stuff like ordering, replacement/without-replacement,
etc...just some enums separated by commas that can be added to the call.
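The distinction at issue can be illustrated in a few lines (a sketch in
Python, not the proposed C implementation; function names are mine):

```python
import random

def sample_keep_order(arr, k, rng=random):
    # Tom's expectation: choose k positions, then emit the chosen
    # elements in their original array order.
    idx = sorted(rng.sample(range(len(arr)), k))
    return [arr[i] for i in idx]

def sample_shuffled(arr, k, rng=random):
    # The patch's behavior: the chosen elements come back in random order
    # (random.sample already randomizes the order of the selection).
    return rng.sample(arr, k)
```

With the order-kept variant the shuffled behavior is recoverable by
shuffling the result; the reverse is not true, which is the argument for
either keeping order or making the behavior selectable.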

David J.


Re: Doc about how to set max_wal_senders when setting minimal wal_level

2022-07-18 Thread David G. Johnston
On Mon, Jul 18, 2022 at 6:27 PM Japin Li  wrote:

>
> On Tue, 19 Jul 2022 at 03:58, Bruce Momjian  wrote:
> > On Fri, Jul 15, 2022 at 09:29:20PM +0800, Japin Li wrote:
> >>
> >> On Fri, 15 Jul 2022 at 08:49, Bruce Momjian  wrote:
> >> > On Tue, Jul  5, 2022 at 08:02:33PM -0400, Tom Lane wrote:
> >> >> "Precondition" is an overly fancy word that makes things less clear
> >> >> not more so.  Does it mean that setting wal_level = minimal will fail
> >> >> if you don't do these other things, or does it just mean that you
> >> >> won't be getting the absolute minimum WAL volume?  If the former,
> >> >> I think it'd be better to say something like "To set wal_level to
> minimal,
> >> >> you must also set [these variables], which has the effect of
> disabling
> >> >> both WAL archiving and streaming replication."
> >> >
> >> > I have created the attached patch to try to improve this text.
> >>
> >> IMO we can add the following sentence for wal_level description, since
> >> if wal_level = minimal and max_wal_senders > 0, we cannot start the
> database.
> >>
> >> To set wal_level to minimal, you must also set max_wal_senders to 0,
> >> which has the effect of disabling both WAL archiving and streaming
> >> replication.
> >
> > Okay, text added in the attached patch.
>
> Thanks for updating the patch! LGTM.
>
>
+0.90

Consider changing:

"makes any base backups taken before this unusable"

to:

"makes existing base backups unusable"

As I try to justify this, though, it isn't quite true, maybe:

"makes point-in-time recovery, using existing base backups, unable to
replay future WAL."

David J.


Re: doc: New cumulative stats subsystem obsoletes comment in maintenance.sgml

2022-07-18 Thread David G. Johnston
On Thu, Jul 14, 2022 at 4:31 PM Andres Freund  wrote:

> Hi,
>
> I had missed David's original email on this topic...
>
> On 2022-07-14 18:58:09 -0400, Bruce Momjian wrote:
> > On Wed, Apr 20, 2022 at 04:40:44PM -0700, David G. Johnston wrote:
> > > The new cumulative stats subsystem no longer has a "lost under heavy
> load"
> > > problem so that parenthetical should go (or at least be modified).
> > >
> > > These stats can be reset so some discussion about how the system uses
> them
> > > given that possibility seems like it would be good to add here.  I'm
> not sure
> > > what that should look like though.
> > >
> > > diff --git a/doc/src/sgml/maintenance.sgml
> b/doc/src/sgml/maintenance.sgml
> > > index 04a04e0e5f..360807c8f9 100644
> > > --- a/doc/src/sgml/maintenance.sgml
> > > +++ b/doc/src/sgml/maintenance.sgml
> > > @@ -652,9 +652,8 @@ vacuum insert threshold = vacuum base insert
> threshold +
> > > vacuum insert scale fac
> > >  tuples to be frozen by earlier vacuums.  The number of obsolete
> tuples and
> > >  the number of inserted tuples are obtained from the cumulative
> statistics
> > > system;
> > >  it is a semi-accurate count updated by each
> UPDATE,
> > > -DELETE and INSERT
> operation.  (It is
> > > -only semi-accurate because some information might be lost under
> heavy
> > > -load.)  If the relfrozenxid value of
> the table
> > > +DELETE and INSERT operation.
> > > +If the relfrozenxid value of the table
> > >  is more than vacuum_freeze_table_age
> transactions old,
> > >  an aggressive vacuum is performed to freeze old tuples and advance
> > >  relfrozenxid; otherwise, only pages
> that have
> > > been modified
> >
> > Yes, I agree and plan to apply this patch soon.
>
> It might make sense to still say semi-accurate, but adjust the explanation
> to
> say that stats reporting is not instantaneous?
>
>
Unless that delay manifests as executing an UPDATE in a session, then
looking at these views in the same session and not seeing that update
reflected, I wouldn't mention it.  Concurrency aspects are reasonably
expected here.  But if we do want to mention it, maybe:

 "...it is an eventually-consistent count updated by..."

David J.


Re: Doc about how to set max_wal_senders when setting minimal wal_level

2022-07-18 Thread David G. Johnston
On Mon, Jul 18, 2022 at 8:16 PM Bruce Momjian  wrote:

> On Mon, Jul 18, 2022 at 07:39:55PM -0700, David G. Johnston wrote:
> > On Mon, Jul 18, 2022 at 6:27 PM Japin Li  wrote:
> >
> >
> > +0.90
> >
> > Consider changing:
> >
> > "makes any base backups taken before this unusable"
> >
> > to:
> >
> > "makes existing base backups unusable"
> >
> > As I try to justify this, though, it isn't quite true, maybe:
> >
> > "makes point-in-time recovery, using existing base backups, unable to
> replay
> > future WAL."
>
> I went with simpler wording.
>
>
+1

Thanks!

David J.


Re: let's disallow ALTER ROLE bootstrap_superuser NOSUPERUSER

2022-07-21 Thread David G. Johnston
On Thu, Jul 21, 2022 at 9:28 AM Tom Lane  wrote:

> Robert Haas  writes:
> > Currently, it's possible to remove the rolissuper bit from the
> > bootstrap superuser, but this leaves that user - and the system in
> > general - in an odd state. The bootstrap user continues to own all of
> > the objects it owned before, e.g. all of the system catalogs. Direct
> > DML on system catalogs is blocked by pg_class_aclmask_ext(), but it's
> > possible to do things like rename a system function out of the way and
> > create a new function with the same signature. Therefore, creating a
> > new superuser and making the original one a non-superuser is probably
> > not viable from a security perspective, because anyone who gained
> > access to that role would likely have little difficulty mounting a
> > Trojan horse attack against the current superusers.
>
> True, but what if the idea is to have *no* superusers?  I seem
> to recall people being interested in setups like that.
>


> On the whole I don't have any objection to your proposal, I just
> worry that somebody else will.
>
> Of course there's always "UPDATE pg_authid SET rolsuper = false",
> which makes it absolutely clear that you're breaking the glass cover.
>
>
I would expect an initdb option (once this is possible) to specify this
desire and we just never set one up in the first place.  It seems
impractical to remove one after it already exists.  Though we could enable
the option (or a function) tied to the specific predefined role that, say,
permits catalog changes, when that day comes.

David J.


Undocumented Order By vs Target List Volatile Function Behavior

2022-07-21 Thread David G. Johnston
Hey,

This came up today on twitter as a claimed POLA violation:

postgres=# select random(), random() order by random();
   random|   random
-+-
 0.08176638503720679 | 0.08176638503720679
(1 row)

Which was explained long ago by Tom as:

https://www.postgresql.org/message-id/9570.1193941378%40sss.pgh.pa.us

The parser makes it behave equivalent to:

SELECT random() AS foo ORDER BY foo;

Which apparently extends to any column, even aliased ones, that use the
same expression:

postgres=# select random() as foo, random() as foo2 order by foo;
foo |foo2
+
 0.7334292196943459 | 0.7334292196943459
(1 row)

The documentation does say:

"A query using a volatile function will re-evaluate the function at every
row where its value is needed."

https://www.postgresql.org/docs/current/xfunc-volatility.html

That sentence is insufficient to explain why, without the order by, the
system chooses to evaluate random() twice, while with order by it does so
only once.

I propose extending the existing ORDER BY paragraph in the SELECT Command
Reference as follows:

"A limitation of this feature is that an ORDER BY clause applying to the
result of a UNION, INTERSECT, or EXCEPT clause can only specify an output
column name or number, not an expression."

Add:

A side-effect of this feature is that ORDER BY expressions containing
volatile functions will execute the volatile function only once for the
entire row; thus any column expressions using the same function will reuse
the same function result.  By way of example, note the output differences
for the following two queries:

postgres=# select random() as foo, random()*1 as foo2 from
generate_series(1,2) order by foo;
        foo         |        foo2
--------------------+--------------------
 0.2631492904302788 | 0.2631492904302788
 0.9019166692448664 | 0.9019166692448664
(2 rows)

postgres=# select random() as foo, random() as foo2 from
generate_series(1,2);
        foo         |        foo2
--------------------+--------------------
 0.7763978178239725 | 0.3569212477832773
 0.7360531822096732 | 0.7028952103643864
(2 rows)
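If independent values are wanted even in the presence of ORDER BY, wrapping
the target list in a subquery that cannot be pulled up appears to work (a
sketch; OFFSET 0 is the conventional optimization fence):

```sql
-- Each random() is evaluated separately inside the subquery; the outer
-- ORDER BY then sorts the already-computed foo values.
select foo, foo2
from (select random() as foo, random() as foo2
      from generate_series(1,2) offset 0) as sub
order by foo;
```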

David J.


Interpretation of docs for \copy ... from stdin inaccurate when using -c

2022-07-22 Thread David G. Johnston
This works:

vagrant@vagrant:/usr/local/pgsql/bin$ echo 'value1' | ./psql -d postgres -c
'\copy csvimport from stdin;'
COPY 1

However:

For \copy ... from stdin, data rows are read from the same source that
issued the command

and

When either -c or -f is specified, psql does not read commands from
standard input;

So the meta-command is not read from standard input, thus standard input is
not the source of the command, yet the copy data sitting on standard input
is indeed read and used for the copy.

The behavior when the \copy command is in --file does conform to the
descriptions.  Thus one must write pstdin as the source to make the
following work:

vagrant@vagrant:/usr/local/pgsql/bin$ echo 'value1' | ./psql -d postgres -f
<(echo '\copy csvimport from pstdin;')
COPY 1

This also shows up with SQL COPY ... FROM since one is able to write:

vagrant@vagrant:/usr/local/pgsql/bin$ echo 'value1' | ./psql -d postgres -c
'copy csvimport from stdin;'
COPY 1

but not:

vagrant@vagrant:/usr/local/pgsql/bin$ echo 'value1' | ./psql -d postgres -f
<(echo 'copy csvimport from stdin;')
COPY 0

This last form is especially useful for COPY ... TO STDOUT but considerably
less so for COPY ... FROM; though you lose the flexibility to target
pstdout (which is likewise minor).  It is an inconsistency but an
understandable one.

Should we amend \copy to read:

"For \copy ... from stdin, data rows are read from the same source that
issued the command (for --command the related source is stdin, see Notes),
continuing..."

and then in Notes:

"The accessibility of the psql command's standard input varies slightly
depending on whether --command or --file is specified as the source of
commands.  For --command, the input is accessible both via pstdin and
stdin, but when using --file it can only be accessed via pstdin.  This most
often arises when using the \copy meta-command or SQL COPY, the latter
being unable to access pstdin."

?

David J.

p.s. For now I've taken the position that figuring out what works when both
--command and --file are specified is an exercise for the reader.


Re: doc: Clarify Savepoint Behavior

2022-07-26 Thread David G. Johnston
On Thu, Jul 14, 2022 at 12:44 PM Bruce Momjian  wrote:

> On Sat, Jul  9, 2022 at 12:59:23PM -0400, Bruce Momjian wrote:
> > On Sun, Jun 26, 2022 at 09:14:56AM -0700, David G. Johnston wrote:
> > > So leave the "release" behavior implied from the rollback behavior?
> > >
> > > On the whole I'm slightly in favor of your proposed wording (mostly
> due to the
> > > better fitting of the ROLLBACK command, though at the omission of
> RELEASE...)
> > > but are you seeing anything beyond personal style as to why you feel
> one is
> > > better than the other?  Is there some existing wording in the docs
> that I
> > > should be conforming to here?
> >
> > I have developed the attached patch based on the discussion here.  I
> > tried to simplify the language and example to clarify the intent.
> >
> > I was confused why the first part of the patch added a mention of
> > releasing savepoints to the ROLLBACK TO manual page --- I have removed
> > that and improved the text in RELEASE SAVEPOINT.
>
> Patch applied to all supported versions.
>
>
Bruce,

Thanks for committing this and the other patches.  Should I go into the
commitfest and mark the entries for these as committed or does protocol
dictate I remind you and you do that?

David J.


Re: predefined role(s) for VACUUM and ANALYZE

2022-07-26 Thread David G. Johnston
On Tue, Jul 26, 2022 at 10:37 AM Robert Haas  wrote:

> On Mon, Jul 25, 2022 at 9:47 PM Kyotaro Horiguchi
>  wrote:
> > One arguable point would be whether we will need to put restriction
> > the target relations that Bob can vacuum/analyze.
>


> But for a command with a target, you really ought to have a
> permission on the object, not just a general permission. On the other
> hand, we do have things like pg_read_all_tables, so we could have
> pg_vacuum_all_tables too.


I'm still more likely to create a specific security definer function owned
by the relevant table owner to give out ANALYZE (and maybe VACUUM)
permission to ETL-performing roles.
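For the archives, the workaround I have in mind looks roughly like this (a
sketch only; table and role names are illustrative):

```sql
-- The table owner wraps ANALYZE in a SECURITY DEFINER function and grants
-- EXECUTE to the ETL role; plpgsql EXECUTE is used since ANALYZE is a
-- utility command.
create function analyze_mytable() returns void
language plpgsql security definer set search_path = pg_catalog as
$$ begin execute 'ANALYZE public.mytable'; end $$;

revoke execute on function analyze_mytable() from public;
grant execute on function analyze_mytable() to etl_role;
```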

Still, it seems somewhat appealing to give
> people fine-grained control over this, rather than just "on" or "off".
>
>
Appealing enough to consume a couple of permission bits?

https://www.postgresql.org/message-id/CAKFQuwZ6dhjTFV7Bwmehe1N3%3Dk484y4mM22zuYjVEU2dq9V1aQ%40mail.gmail.com

David J.


Re: explain_regress, explain(MACHINE), and default to explain(BUFFERS) (was: BUFFERS enabled by default in EXPLAIN (ANALYZE))

2022-07-26 Thread David G. Johnston
On Mon, Jan 24, 2022 at 9:54 AM Justin Pryzby  wrote:

> I'm renaming this thread for better visibility, since buffers is a small,
> optional part of the patches I sent.
>
> I made a CF entry here.
> https://commitfest.postgresql.org/36/3409/
>
> On Wed, Dec 01, 2021 at 06:58:20PM -0600, Justin Pryzby wrote:
> > On Mon, Nov 15, 2021 at 01:09:54PM -0600, Justin Pryzby wrote:
> > > Some time ago, I had a few relevant patches:
> > > 1) add explain(REGRESS) which is shorthand for (BUFFERS OFF, TIMING
> OFF, COSTS OFF, SUMMARY OFF)
> > > 2) add explain(MACHINE) which elides machine-specific output from
> explain;
> > >for example, Heap Fetches, sort spaceUsed, hash nbuckets, and
> tidbitmap stuff.
> > >
> > >
> https://www.postgresql.org/message-id/flat/20200306213310.gm...@telsasoft.com
> >
> > The attached patch series now looks like this (some minor patches are not
> > included in this list):
> >
> >  1. add GUC explain_regress, which disables TIMING, SUMMARY, COSTS;
>


> >  2. enable explain(BUFFERS) by default (but disabled by explain_regress);
>

+1


> >  3. Add explain(MACHINE) - which is disabled by explain_regress.
> > This elides various machine-specific output like Memory and Disk use.
> > Maybe it should be called something else like "QUIET" or
> "VERBOSE_MINUS_1"
> > or ??
>

INCIDENTALS ?

Aside from being 4 syllables and, for me at least, a pain to remember how
to spell, it seems to fit.

RUNTIME is probably easier, and we just need to point out the difficulty in
naming things since actual rows is still shown, and timings have their own
independent option.

>
> > The regression tests now automatically run with explain_regress=on,
> which is
> > shorthand for TIMING OFF, SUMMARY OFF, COSTS OFF, and then BUFFERS OFF.
> >
> > There's a further option called explain(MACHINE) which can be disabled
> to hide
> > portions of the output that are unstable, like Memory/Disk/Batches/
> > Heap Fetches/Heap Blocks.  This allows "explain analyze" to be used more
> easily
> > in regression tests, and simplifies some existing tests that currently
> use
> > plpgsql functions to filter the output.  But it doesn't handle all the
> > variations from parallel workers.
> >
> > (3) is optional, but simplifies some regression tests.  The patch series
> could
> > be rephrased with (3) first.
>

I like the idea of encapsulating the logic about what to show into the
generation code instead of a post-processing routine.

I don't see any particular ordering of the three changes being most clear.

>
> > Unfortunately, "COSTS OFF" breaks postgres_fdw remote_estimate.  If
> specifying
> > "COSTS ON" in postgres_fdw.c is considered to be a poor fix


Given that we have version tests scattered throughout the postgres_fdw.c
code I'd say protect it with version >= 9.0 and call it good.

But I'm assuming that we are somehow also sending over the GUC to the
remote server (which will be v16+) but I am unsure why that would be the
case.  I'm done code reading for now, so I'm leaving this as an open
request/question.

, then I suppose
> > this patch series could do as suggested and enable buffers by default
> only when
> > ANALYZE is specified.  Then postgres_fdw is not affected, and the
> > explain_regress GUC is optional: instead, we'd need to specify BUFFERS
> OFF in
> > ~100 regression tests which use EXPLAIN ANALYZE.

I'm not following the transition from the prior sentences about COSTS to
this one regarding BUFFERS.

  (3) still seems useful on its
> > own.
>

It is an orthogonal feature with a different section of our user base being
catered to, so I'd agree by default.

Patch Review Comments:


The following change in the machine commit seems out-of-place; are we
fixing a bug here?

explain.c:
   /* Show buffer usage in planning */
- if (bufusage)
+ if (bufusage && es->buffers)



Typo (trailing comment // without comment):

ExplainPropertyFloat("Actual Rows", NULL, rows, 0, es); //


I did a pretty thorough look-through of the locations of the various
changes and everything seems consistent with the established code in this
area (took me a bit to get to the level of comprehension though).  Whether
there may be something missing I'm less able to judge.

From reading the other main thread I'm not finding this particular choice
of GUC usage to be a problem anyone has raised and it does seem the
cleanest solution, both for us and for anyone out there with a similar
testing framework.

The machine output option covers an obvious need since we've already
written ad-hoc parsing code to deal with the problem.

And, as someone who sees first-hand the kinds of conversations that occur
surrounding beginner's questions, and getting more into performance
questions specifically, I'm sympathetic with the desire to have BUFFERS
something that is output by default.  Given existing buy-in for that idea
I'd hope that at minimum the "update 100 .sql and expected output files,
then change the default" commit happens even if the rest does not.

Re: Question about ExplainOneQuery_hook

2022-07-26 Thread David G. Johnston
On Tue, Jul 26, 2022 at 1:54 PM Zhihong Yu  wrote:

> Hi,
> I was looking at ExplainOneQuery() where ExplainOneQuery_hook is called.
>
> Currently the call to the hook is in if block and normal processing is in
> else block.
>
> What if the hook doesn't want to duplicate the whole code printing
> execution plan ?
>
> Please advise.
>
>
What kind of advice are you looking for, especially knowing we don't know
anything except that you find the existing hook unusable?

https://github.com/postgres/postgres/commit/604ffd280b955100e5fc24649ee4d42a6f3ebf35

My advice is pretend the hook doesn't even exist since it was created 15
years ago for a specific purpose that isn't what you are doing.

I'm hoping that you already have some idea of how to interact with the open
source PostgreSQL project when it doesn't have a feature that you want.
Otherwise that generic discussion probably is best done on -general with a
better subject line.

David J.


Re: Expand palloc/pg_malloc API

2022-07-26 Thread David G. Johnston
On Tue, Jul 26, 2022 at 2:32 PM Tom Lane  wrote:

>
> 2. I don't like the "palloc_ptrtype" name at all.  I see that you
> borrowed that name from talloc, but I doubt that's a precedent that
> very many people are familiar with.



> To me it sounds like it might
> allocate something that's the size of a pointer, not the size of the
> pointed-to object.  I have to confess though that I don't have an
> obviously better name to suggest.  "palloc_pointed_to" would be
> clear perhaps, but it's kind of long.
>

I agree that ptrtype reads "the type of a pointer".

This may not be a C-idiom but the pointed-to thing is a "reference" (hence
pass by value vs pass by reference).  So:

palloc_ref(myvariablepointer)

will allocate using the type of the referenced object.  Just like _array
and _obj, which name the thing being used as a size template as opposed to
instantiate which seems more like another word for "allocate/palloc".

David J.
P.S.

Admittedly I'm still getting my head around reading pointer-using code (I
get the general concept but haven't had to code them)

- lockrelid = palloc(sizeof(*lockrelid));
+ lockrelid = palloc_ptrtype(lockrelid);

// This definitely seems like an odd idiom until I remembered about
short-lived memory contexts; the lost pointers are soon destroyed there.

So lockrelid (no star) is a pointer that has an underlying reference that
the macro (and the original code) resolves via the *

I cannot reason out whether the following would be equivalent to the above:

lockrelid = palloc_obj(*lockrelid);

I assume not because:  typeof(lockrelid) != (*lockrelid *)


Official Windows Installer and Documentation

2022-07-27 Thread David G. Johnston
Hey,

Just interacted with a frustrated user on Slack trying to upgrade from v13
to v14 on Windows.  Our official download page for the Windows installer
claims the core documentation as its official reference - can someone
responsible for this area please suggest and test some changes to make this
reality more acceptable.

The particular point that was brought up is our documentation for
pg_upgrade says:

RUNAS /USER:postgres "CMD.EXE"
SET PATH=%PATH%;C:\Program Files\PostgreSQL\14\bin;

The problem is apparently (I haven't personally tested) that our
official installer doesn't bother to create the postgres operating system
user account.

It is also unclear whether the defaults for pg_hba.conf add some kind of
bad interaction here should one fix this particular problem.

And then there is the issue of file ownership.

Assuming we want better documentation for this specific issue for
back-patching what would that look like?

Going forward should our installer be creating the postgres user for
consistency with other platforms or not?

I suggest adding relevant discussion about this particular official binary
distribution to:

https://www.postgresql.org/docs/current/install-binaries.html

David J.


Re: Proposal: add a debug message about using geqo

2022-07-27 Thread David G. Johnston
On Fri, Jul 22, 2022 at 1:20 PM Jacob Champion 
wrote:

> On Wed, Jun 1, 2022 at 11:09 PM KAWAMOTO Masaya 
> wrote:
> > That sounds a nice idea. But I don't think that postgres shows in the
> > EXPLAIN output why the plan is selected. Would it be appropriate to
> > show that GEQO is used in EXPLAIN output?
>
> I'm reminded of Greenplum's "Optimizer" line in its EXPLAIN output
> [1], so from that perspective I think it's intuitive.
>
> > As a test, I created a patch that add information about GEQO to
> > EXPLAIN output by the GEQO option. The output example is as follows.
> > What do you think about the location and content of information about
> GEQO?


> I am a little surprised to see GeqoDetails being printed for a plan
> that didn't use GEQO, but again that's probably because I'm used to
> GPDB's Optimizer output. And I don't have a lot of personal experience
> using alternative optimizers.
>

I agree this should be part of explain output.

I would not print the current value of geqo_threshold and leave setting
display the exclusive purview of the settings option.

The presentation of only a single geqo result seems incorrect given that
multiple trees can exist.  In the first example below the full outer join
causes 3 relations to be seen as a single relation at the top level (hence
max join nodes = 4) while in the inner join case we see all 6 join nodes.
There should be two outputs of GEQO in the first explain, one with join
nodes of 3 and the existing one with 4.

I also don't see the point of labelling them "max"; "join nodes" seems
sufficient.

While it can probably be figured out from the rest of the plan, listing the
names of the join nodes may be useful (and give join nodes some company).

David J.

postgres=# explain (verbose, geqo) with gs2 (v2) as materialized ( select *
from generate_series(1,1) ) select * from gs2 as gs4 full outer join
(select gs2a.v2 from gs2 as gs2a, gs2 as gs2b) as gs5 using (v2),
generate_series(1, 1) as gs (v1) cross join gs2 as gs3 where v1 IN (select
v2 from gs2);
                                  QUERY PLAN
----------------------------------------------------------------------------
 Nested Loop  (cost=0.07..0.21 rows=1 width=12)
   Output: COALESCE(gs4.v2, gs2a.v2), gs.v1, gs3.v2
   CTE gs2
 ->  Function Scan on pg_catalog.generate_series  (cost=0.00..0.01
rows=1 width=4)
   Output: generate_series.generate_series
   Function Call: generate_series(1, 1)
   ->  Nested Loop  (cost=0.06..0.16 rows=1 width=12)
 Output: gs.v1, gs4.v2, gs2a.v2
 ->  Nested Loop  (cost=0.02..0.06 rows=1 width=4)
   Output: gs.v1
   Join Filter: (gs.v1 = gs2.v2)
   ->  Function Scan on pg_catalog.generate_series gs
 (cost=0.00..0.01 rows=1 width=4)
 Output: gs.v1
 Function Call: generate_series(1, 1)
   ->  HashAggregate  (cost=0.02..0.03 rows=1 width=4)
 Output: gs2.v2
 Group Key: gs2.v2
 ->  CTE Scan on gs2  (cost=0.00..0.02 rows=1 width=4)
   Output: gs2.v2
 ->  Hash Full Join  (cost=0.03..0.10 rows=1 width=8)
   Output: gs4.v2, gs2a.v2
   Hash Cond: (gs2a.v2 = gs4.v2)
   ->  Nested Loop  (cost=0.00..0.05 rows=1 width=4)
 Output: gs2a.v2
 ->  CTE Scan on gs2 gs2b  (cost=0.00..0.02 rows=1
width=0)
   Output: gs2b.v2
 ->  CTE Scan on gs2 gs2a  (cost=0.00..0.02 rows=1
width=4)
   Output: gs2a.v2
   ->  Hash  (cost=0.02..0.02 rows=1 width=4)
 Output: gs4.v2
 ->  CTE Scan on gs2 gs4  (cost=0.00..0.02 rows=1
width=4)
   Output: gs4.v2
   ->  CTE Scan on gs2 gs3  (cost=0.00..0.02 rows=1 width=4)
 Output: gs3.v2
 GeqoDetails: GEQO: used, geqo_threshold: 2, Max join nodes: 4
(35 rows)

postgres=# explain (verbose, geqo) with gs2 (v2) as materialized ( select *
from generate_series(1,1) ) select * from gs2 as gs4 join (select gs2a.v2
from gs2 as gs2a, gs2 as gs2b) as gs5 using (v2), generate_series(1, 1) as
gs (v1) cross join gs2 as gs3 where v1 IN (select v2 from gs2);
                                 QUERY PLAN
----------------------------------------------------------------------------
 Nested Loop  (cost=0.02..0.18 rows=1 width=12)
   Output: gs4.v2, gs.v1, gs3.v2
   CTE gs2
 ->  Function Scan on pg_catalog.generate_series  (cost=0.00..0.01
rows=1 width=4)
   Output: generate_series.generate_series
   Function Call: generate_series(1, 1)
   ->  Nested Loop  (cost=0.00..0.14 rows=1 width=12)
 Output: gs.v1, gs4.v2, gs3.v2
 ->  Nested Loop  (cost=0.00..0.11 rows=1 width=8)
   Output: gs.v1, gs4.v2
   ->  Nested Loop Semi

Re: Official Windows Installer and Documentation

2022-07-27 Thread David G. Johnston
On Wed, Jul 27, 2022 at 6:42 PM Julien Rouhaud  wrote:

> Hi,
>
> On Wed, Jul 27, 2022 at 11:36:11PM +0200, Thomas Kellerer wrote:
> > David G. Johnston schrieb am 27.07.2022 um 21:21:
> > > And then there is the issue of file ownership.
> > >
> > > Assuming we want better documentation for this specific issue for
> > > back-patching what would that look like?
> > >
> > > Going forward should our installer be creating the postgres user for
> > > consistency with other platforms or not?
> >
> > Didn't the installer used to do that in earlier releases and that
> > was removed when Postgres was able to "drop privileges" when the
> > service is started?
> >
> > I remember a lot of problems around the specific Postgres service
> > account when that still was the case.
>
> Note that there's no "official" Windows installer, and companies providing
> one
> are free to implement it the way they want, which can contradict the
> official
> documentation.  The download section of the website clearly says that this
> is a
> third-party installer.
>
> For now there's only the EDB installer that remains, but I think that some
> time
> ago there was 2 or 3 different providers.
>
> For the EDB installer, I'm not sure why or when it was changed, but it
> indeed
> used to have a dedicated local account and now relies on "Local System
> Account"
> or something like that.  But IIRC, when it used to create a local account
> the
> name could be configured, so there was no guarantee of a local "postgres"
> account by then either.
>
>
Our technical definition aside, the fact is our users consider the sole EDB
installer to be official.

If the ultimate solution is to update:

https://www.enterprisedb.com/downloads/postgres-postgresql-downloads

to have its own installation and upgrade supplement to the official
documentation then I'd be fine with that.  But as of now the "Installation
Guide" points back to the official documentation, which has no actual
distribution specific information while simultaneously reinforcing the fact
that it is an official installer.

I get sending people to the EDB web services team for download issues since
we don't host the binaries.  That aspect of them being third-party doesn't
seem to be an issue.

But for documentation, given the current state of things, whether we amend
our docs or highly encourage the people who are benefiting financially from
being our de facto official Windows installer provider to provide separate
documentation to address this apparent short-coming that is harming our
image in the Windows community, I don't really care, as long as something
changes.

In the end the problem is ours and cannot be simply assigned to a
third-party.  So let's resolve it here (on this list, whatever the
solution) where representatives from all parties are present.

David J.


Re: Official Windows Installer and Documentation

2022-07-27 Thread David G. Johnston
On Wednesday, July 27, 2022, Julien Rouhaud  wrote:

> On Wed, Jul 27, 2022 at 07:02:51PM -0700, David G. Johnston wrote:
> >
> > In the end the problem is ours and cannot be simply assigned to a
> > third-party.  So let's resolve it here (on this list, whatever the
> > solution) where representatives from all parties are present.
>
> We could amend the pg_upgrade (and maybe other if needed, but I don't see
> any
> other occurences of RUNAS) documentation to be a bit more general, like the
> non-windows part of it, maybe something like
>
> For Windows users, you must be logged into an administrative account, and
> then
> start a shell as the user running the postgres service and set the proper
> path.
> Assuming a user named postgres and the binaries installed in C:\Program
> Files\PostgreSQL\14:
>
> RUNAS /USER:postgres "CMD.EXE"
> SET PATH=%PATH%;C:\Program Files\PostgreSQL\14\bin;
>
> It's ultimately up to the users to adapt the commands to match their
> environment.
>

Ultimately we do our users the best service if when they operate an
installation using defaults that they have documentation showing how to
perform something like an upgrade that works with those defaults.  I don’t
see much point making that change in isolation until it is obvious nothing
better is forthcoming. If the o/s user postgres doesn’t exist then you need
to supply -U postgres cause the install user for PostgresSQL is still
postgres.  So why not assume the user is whatever the EDB installer uses
and make that the example?  If someone has an install on Windows that uses
the postgres account adapting the command for them should be trivial and
the majority installer users get a command sequence that works.

David J.


Re: Official Windows Installer and Documentation

2022-07-27 Thread David G. Johnston
On Wed, Jul 27, 2022 at 8:22 PM Tom Lane  wrote:

> I wrote:
> > If EDB isn't adequately filling in the documentation for the behavior
> > of their packaging, that's on them.
>
> Having now looked more closely at the pg_upgrade documentation,
> I don't think this is exactly EDB's fault; it's text that should
> never have been there to begin with.  ISTM we need to simply rip out
> lines 431..448 of pgupgrade.sgml, that is all the Windows-specific
> text starting with
>
>  For Windows users, you must be logged into an administrative account,
> and
>
> That has got nothing to recommend it: we do not generally provide
> platform-specific details in these man pages, and to the extent it
> provides details, those details are likely to be wrong.


I mean, we do provide platform-specific details/examples; it's just that
the platform is a source-installed Linux platform (though pathless).

Does the avoidance of dealing with other platforms also apply to NET STOP
or do you find that an acceptable variance?  Or are you suggesting that
basically all O/S commands should be zapped?  If not, then rewriting 442 to
446 to just be the command seems worthwhile.  I'd say pg_upgrade warrants
an examples section like pg_basebackup has (though obviously pg_upgrade is
procedural).

I do have another observation:

https://github.com/postgres/postgres/blob/4fc6b6eefcf98f79211bb790ee890ebcb05c178d/src/bin/pg_upgrade/check.c#L665

    if (PQntuples(res) != 1 ||
        atooid(PQgetvalue(res, 0, 1)) != BOOTSTRAP_SUPERUSERID)
        pg_fatal("database user \"%s\" is not the install user",
                 os_info.user);

Any reason to not inform the DBA the name of the install user here?  Sure,
it is almost certainly postgres, but it also seems like an easy win in
order for them, and anyone they may ask for help, to know exactly the name
of install user in the clusters should that end up being the issue.
Additionally, from what I can tell, if that check does fail (or any of the
checks really) it is not possible to tell whether the check was being
performed against the old or new server.  The user does not know that
checks against the old server are performed first and then checks against
the new one, and there are no banners saying "checking old/new".
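For context (my reading of the code, not something the patch changes): the
install user is simply the bootstrap superuser, whose OID is fixed at 10
(BOOTSTRAP_SUPERUSERID), so a DBA can identify it manually with:

```sql
select rolname from pg_catalog.pg_roles where oid = 10;
```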

David J.


Re: pg_auth_members.grantor is bunk

2022-07-28 Thread David G. Johnston
On Thu, Jul 28, 2022 at 12:09 PM Robert Haas  wrote:

> On Tue, Jul 26, 2022 at 12:46 PM Robert Haas 
> wrote:
> > I believe that these patches are mostly complete, but I think that
> > dumpRoleMembership() probably needs some more work. I don't know what
> > exactly, but there's nothing to cause it to dump the role grants in an
> > order that will create dependent grants after the things that they
> > depend on, which seems essential.
>
> OK, so I fixed that, and also updated the documentation a bit more. I
> think these patches are basically done, and I'd like to get them
> committed before too much more time goes by, because I have other
> things that depend on this which I also want to get done for this
> release. Anybody object?
>
> I'm hoping not, because, while this is a behavior change, the current
> state of play in this area is just terrible. To my knowledge, this is
> the only place in the system where we allow a dangling OID reference
> in a catalog table to persist after the object to which it refers has
> been dropped. I believe it's also the object type where multiple
> grants by different grantors aren't tracked separately, and where the
> grantor need not themselves have the permission being granted. It
> doesn't really look like any of these things were intentional behavior
> so much as just ... nobody ever bothered to write the code to make it
> work properly. I'm hoping the fact that I have now done that will be
> viewed as a good thing, but maybe that won't turn out to be the case.
>
>
I suggest changing \du memberof to output something like this:

select r.rolname,
array(
  select format('%s:%s/%s',
b.rolname,
case when m.admin_option then 'admin' else 'member' end,
g.rolname)
  from pg_catalog.pg_auth_members m
  join pg_catalog.pg_roles b on (m.roleid = b.oid)
  join pg_catalog.pg_roles g on (m.grantor = g.oid)
  where m.member = r.oid
) as memberof
from pg_catalog.pg_roles r where r.rolname !~ '^pg_';

 rolname |              memberof
---------+------------------------------------
 vagrant | {}
 o   | {}
 a   | {o:admin/p,o:admin/vagrant}
 x   | {o:admin/a,p:member/vagrant}
 b   | {o:admin/a}
 p   | {o:admin/vagrant}
 y   | {x:member/vagrant}
 q   | {}
 r   | {q:admin/vagrant}
 s   | {}
 t   | {q:admin/vagrant,s:member/vagrant}


(needs sorting, tried to model it after ACL - column privileges
specifically)

=> \dp mytable
  Access privileges
 Schema |  Name   | Type  |   Access privileges   |   Column privileges   |
Policies
--------+---------+-------+-----------------------+-----------------------+----------
 public | mytable | table | miriam=arwdDxt/miriam+| col1:+|
| |   | =r/miriam+|   miriam_rw=rw/miriam |
| |   | admin=arw/miriam  |   |
(1 row)

If we aren't dead set on having \du and \dg be aliases for each other I'd
rather redesign \dg (or add a new meta-command) to be a group-centric view
of this exact same data instead of user-centric one.  Namely it has a
"members" column instead of "memberof" and have it output, one line per
member:

user=[admin|member]/grantor
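
Something like the following (an untested sketch that mirrors the memberof
query above) would produce that group-centric view:

```sql
select g.rolname,
array(
  select format('%s=%s/%s',
                u.rolname,
                case when m.admin_option then 'admin' else 'member' end,
                gr.rolname)
  from pg_catalog.pg_auth_members m
  join pg_catalog.pg_roles u on (m.member = u.oid)
  join pg_catalog.pg_roles gr on (m.grantor = gr.oid)
  where m.roleid = g.oid
) as members
from pg_catalog.pg_roles g where g.rolname !~ '^pg_';
```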

I looked over the rest of the patch and played with the circularity a bit,
which motivated the expanded info in \du, and the confirmation that two
separate admin grants that are not circular can exist.

I don't have any meaningful insight as to breaking things with these
changes but I am strongly in favor of tightening this up and formalizing it.

David J.


Re: pg_auth_members.grantor is bunk

2022-07-31 Thread David G. Johnston
On Sun, Jul 31, 2022 at 11:18 AM Stephen Frost  wrote:

> Greetings,
>
> * Robert Haas (robertmh...@gmail.com) wrote:
> > On Tue, Jul 26, 2022 at 12:46 PM Robert Haas 
> wrote:
>
> +   }
> +
> +   /*
> +* Disallow attempts to grant ADMIN OPTION back to a user who
> granted it
> +* to you, similar to what check_circularity does for ACLs. We
> want the
> +* chains of grants to remain acyclic, so that it's always
> possible to use
> +* REVOKE .. CASCADE to clean up all grants that depend on the one
> being
> +* revoked.
> +*
> +* NB: This check might look redundant with the check for
> membership
> +* loops above, but it isn't. That's checking for role-member loop
> (e.g.
> +* A is a member of B and B is a member of A) while this is
> checking for
> +* a member-grantor loop (e.g. A gave ADMIN OPTION to X to B and
> now B, who
> +* has no other source of ADMIN OPTION on X, tries to give ADMIN
> OPTION
> +* on X back to A).
> +*/
>
> With this exact scenario, wouldn't it just be a no-op as A must have
> ADMIN OPTION already on X?  The spec says that no cycles of role
> authorizations are allowed.


Role A must have admin option for X to grant membership in X (with or
without admin option) to B.  But that doesn't preclude A from getting
another admin option from someone else.  That someone else cannot be
someone to whom they gave admin option to however. So B cannot grant admin
option back to A but role P could if it was basically a sibling of A (i.e.,
both getting their initial admin option from someone else).

If they do have admin option twice it should be possible to drop one of
them; the prohibition should be on dropping the only admin option
permission a role has for some other role.  The commit message for patch 2
contemplates this, though I haven't gone through the revocation code in
detail.
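To spell out the sibling-grantor case described above (role names are
illustrative; this is a sketch of the intended behavior, not output from
the patch):

```sql
-- vagrant grants ADMIN OPTION on x to both a and p, making them "siblings".
grant x to a with admin option;  -- executed as vagrant
grant x to p with admin option;  -- executed as vagrant
set role a;
grant x to b with admin option;  -- a is now upstream in b's grant chain
set role p;
grant x to a with admin option;  -- allowed: p's admin option did not come from a
set role b;
grant x to a with admin option;  -- should be rejected: would create a cycle
```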


> I'm trying to get this review out sooner than later and so I might be
> missing something, but looking at the regression test for this and these
> error messages, feels like the 'circular' error message makes more sense
> than the 'your own grantor' message that actually ends up being
> returned in that regression test.
>

Having a more specific error seems reasonable, faster to track down what
the problem is.

I think that the whole graph dynamic of this might need some presentation
work (messages and/or psql and/or functions); but assuming the errors are
handled, improved messages and/or presentation of graphs can be a separate
enhancement.

David J.


Re: Triggers should work in isolation, with a final conflict detection step

2022-08-01 Thread David G. Johnston
On Sunday, July 31, 2022, Gianluca Calcagni  wrote:

>
> The real drawback is that such approach is forgoing the natural principle
> of *"separation of concerns"*! I have been looking into using trigger
> frameworks to solve this problem, but there is no trigger framework that is
> able to meet my main expectations: in short, *isolation* and *conflict
> detection*.
>
> You may notice that I took inspiration from GIT for most of the concepts
> above (e.g. "isolation" actually means "forking").
>
> I realize that this is a huge request,
>

So install a single C-language trigger function that does all of that and
provide some functions to manage adding delegation commands in user-space.
Make it all work with CREATE EXTENSION.  Users will have to adhere to the
suggested guidelines.

I am against having core incorporate such code into the main project.  The
benefit-to-cost/complexity ratio is too small.

IOW, I suspect the only realistic way this gets into core is if you, its
champion, pay to have it developed; but even then I don't think we should
accept such a feature, even if it were well written.  So if you go that route
it should leverage the extension mechanisms.  We may add some code hooks,
though, if those are requested and substantiated.

David J.


Re: doc: New cumulative stats subsystem obsoletes comment in maintenance.sgml

2022-08-12 Thread David G. Johnston
On Fri, Aug 12, 2022 at 12:48 PM Bruce Momjian  wrote:

> On Mon, Jul 18, 2022 at 08:04:12PM -0700, Andres Freund wrote:
> > Hi,
> >
> > On 2022-07-18 19:47:39 -0700, David G. Johnston wrote:
> > > On Thu, Jul 14, 2022 at 4:31 PM Andres Freund 
> wrote:
> > > > It might make sense to still say semi-accurate, but adjust the
> explanation
> > > > to
> > > > say that stats reporting is not instantaneous?
> > > >
> > > >
> > > Unless that delay manifests in executing an UPDATE in a session then
> > > looking at these views in the same session and not seeing that update
> > > reflected I wouldn't mention it.
> >
> > Depending on which stats you're looking at, yes, that could totally
> happen. I
> > don't think the issue is not seeing changes from the current transaction
> > though - it's that *after* commit you might not see them for a while
> (the're
> > transmitted not more than once a second, and can be delayed up to 60s if
> > there's contention).
>
> So the docs don't need any changes, I assume.
>
>
I dislike using the word accurate here now; the count will be accurate, but
we don't promise perfect timeliness.  So it needs to change:

 diff --git a/doc/src/sgml/maintenance.sgml b/doc/src/sgml/maintenance.sgml
index 04a04e0e5f..360807c8f9 100644
--- a/doc/src/sgml/maintenance.sgml
+++ b/doc/src/sgml/maintenance.sgml
@@ -652,9 +652,8 @@ vacuum insert threshold = vacuum base insert threshold + vacuum insert scale factor * number of inserted tuples
 tuples to be frozen by earlier vacuums.  The number of obsolete tuples and
 the number of inserted tuples are obtained from the cumulative statistics system;
 it is a semi-accurate count updated by each UPDATE,
-DELETE and INSERT operation.  (It is
-only semi-accurate because some information might be lost under heavy
-load.)  If the relfrozenxid value of the table
+DELETE and INSERT operation.
+If the relfrozenxid value of the table
 is more than vacuum_freeze_table_age transactions old,
 an aggressive vacuum is performed to freeze old tuples and advance
 relfrozenxid; otherwise, only pages that have been modified

However, also replace the remaining instance of "a semi-accurate count"
with "an eventually-consistent count".

...it is an eventually-consistent count updated by each UPDATE, DELETE, and
INSERT operation.

David J.


Re: Correct handling of blank/commented lines in PSQL interactive-mode history

2021-09-06 Thread David G. Johnston
On Mon, Sep 6, 2021 at 7:13 AM Greg Nancarrow  wrote:

> I've attached a patch that corrects the behaviour.
> For the type of lines mentioned, the patch makes the history behave
> more like Bash history.
>

I have my doubts that you've really fixed anything here since Bash is a
line-oriented shell while psql is a statement-oriented one.  This is a
feature.

What you are observing is, I think, a side-effect of the fact that
comments cannot terminate statements.  That seems reasonable.  In short,
your BEFORE results make sense and don't require fixing.

David J.


Re: [BUG?] SET TIME ZONE doesn't work with abbreviations

2021-09-07 Thread David G. Johnston
On Tuesday, September 7, 2021, Aleksander Alekseev 
wrote:

> Hi hackers,
>
> I noticed that `SET TIME ZONE` / `SET timezone TO` don't work with
> abbreviations:
>
> Is it a bug or this behavior is intentional (something to do with SQL
> standard, perhaps)?
>
>
Well, given that the limitation is documented I’d have to say it is
intentional:

You cannot set the configuration parameters TimeZone or log_timezone to
a time zone abbreviation, but you can use abbreviations in date/time input
values and with the AT TIME ZONE operator.

https://www.postgresql.org/docs/current/datatype-datetime.html#DATATYPE-TIMEZONES

David J.


Re: Confusing messages about index row size

2021-09-11 Thread David G. Johnston
On Sunday, September 12, 2021, Jaime Casanova 
wrote:
>
>
> So, what is it? the index row size could be upto 8191 or cannot be
> greater than 2704?
>

The wording doesn’t change between the two: the size cannot be greater than
8191 regardless of the index type being used.  This check comes first,
probably because it is cheaper, and is just normal abstraction layering; it
doesn’t preclude individual index types imposing their own constraint, as
evidenced by the lower maximum of 2704 in this specific setup.

It may be non-ideal from a UX perspective to have a moving target in the
error messages, but they are consistent and accurate, and it doesn’t seem
worthwhile to expend much effort on usability since the errors themselves
should be rare.

David J.


Re: Access last_sqlstate from libpq

2021-09-17 Thread David G. Johnston
On Friday, September 17, 2021, Daniel Frey  wrote:
>
>
> However, this is not possible in a couple of other cases where I don't
> have a PGresult*, only the PGconn* is available:
>
> * PQconnectdb (and variants)
>
> * PQputCopyData
> * PQputCopyEnd
> * PQgetCopyData
>
> * lo_* (large object functions)
>
> After some research, it appears that PGconn* does have a field called
> last_sqlstate - it just can't be accessed.
> Are there any problems adding a simple accessor to libpq? Or is there some
> way to access it that I'm missing?
>

I suspect the reason for the omission is that there isn’t any usable data
to be gotten.  Those interfaces are not SQL interfaces and thus do not have
a relevant last_sqlstate to report.

David J.


Re: Access last_sqlstate from libpq

2021-09-17 Thread David G. Johnston
On Friday, September 17, 2021, Daniel Frey  wrote:
>
>
> Are you sure or are you guessing?


>
Guessing regarding the implementations of these interfaces.

David J.


Re: Improved PostgreSQL Mathematics Support.

2021-09-19 Thread David G. Johnston
On Sunday, September 19, 2021, A Z  wrote:

>
> Is there someone on this email list which could please have a look
> at the specifications that I have posted, and reply and get back to
> me?
>

Given the number of posts you’ve made I would have to conclude that the
answer to that question is no.  There is presently no interest in this from
the people who read these mailing lists.

David J.


Re: pg_dump does not dump tables created in information_schema schema

2021-10-07 Thread David G. Johnston
On Thursday, October 7, 2021, vignesh C  wrote:

>
> Should tables be allowed to create in "information_schema" schema, if
> yes should the tables/publications be dumped while dumping database
> contents?
>
>
I presume you have to be a superuser to do this.  If so, this would seem to
fit under the “we don’t stop you, but you shouldn’t” advice that we apply
throughout the system, as in, say, modifying things in pg_catalog.  The
information_schema is an internal schema that is attached to, and static
for, a given release.

David J.


Re: Proposal: allow database-specific role memberships

2021-10-10 Thread David G. Johnston
On Sun, Oct 10, 2021 at 2:29 PM Kenaniah Cerny  wrote:

> In building off of prior art regarding the 'pg_read_all_data' and
> 'pg_write_all_data' roles, I would like to propose an extension to roles
> that would allow for database-specific role memberships (for the purpose of
> granting database-specific privileges) as an additional layer of
> abstraction.
>
> = Problem =
>
> There is currently no mechanism to grant the privileges afforded by the
> default roles on a per-database basis. This makes it difficult to cleanly
> accomplish permissions such as 'db_datareader' and 'db_datawriter' (which
> are database-level roles in SQL Server that respectively grant read and
> write access within a specific database).
>
> The recently-added 'pg_read_all_data' and 'pg_write_all_data' work
> similarly to 'db_datareader' and 'db_datawriter', but work cluster-wide.
>

My first impression is that this is more complex than just restricting
which databases users are allowed to connect to.  The added flexibility
this would provide has some benefit but doesn't seem worth the added
complexity.

David J.


Re: DROP relation IF EXISTS Docs and Tests - Bug Fix

2021-07-13 Thread David G. Johnston
On Tue, Jul 13, 2021 at 3:30 AM Ibrar Ahmed  wrote:

>
>
> On Tue, Mar 9, 2021 at 9:01 PM David Steele  wrote:
>
>> On 3/9/21 10:08 AM, David G. Johnston wrote:
>> >
>> > On Tuesday, March 9, 2021, David Steele > > <mailto:da...@pgmasters.net>> wrote:
>> >
>> > Further, I think we should close this entry at the end of the CF if
>> > it does not attract committer interest. Tom is not in favor of the
>> > patch and it appears Alexander decided not to commit it.
>> >
>> > Pavel re-reviewed it and was fine with ready-to-commit so that status
>> > seems fine.
>>
>> Ah yes, that was my mistake.
>>
>> Regards,
>> --
>> -David
>> da...@pgmasters.net
>>
>>
>>
> The status of the patch is "Need Review" which was previously "Ready for
> Committer ''. After @David G
> and @David Steele  comments, it's not clear whether
> it should be "Read for commit" or "Need Review".
>
>
I changed it to Ready to Commit based on the same logic as my reply to
David quoted above.

David J.


Re: pg_upgrade does not upgrade pg_stat_statements properly

2021-07-14 Thread David G. Johnston
On Wednesday, July 14, 2021, Dave Cramer  wrote:

>
>
> Notice the upgraded version is 1.5 and the new version is 1.8
>
> I would think somewhere in the upgrade of the schema there should have
> been a create extension pg_stat_statements ?
>

That would be a faulty assumption.  Modules do not get upgraded during a
server version upgrade.  This is a good thing, IMO.

David J.


Re: pg_upgrade does not upgrade pg_stat_statements properly

2021-07-14 Thread David G. Johnston
On Wed, Jul 14, 2021 at 11:59 AM Dave Cramer  wrote:

>
>
> On Wed, 14 Jul 2021 at 14:47, David G. Johnston <
> david.g.johns...@gmail.com> wrote:
>
>> On Wednesday, July 14, 2021, Dave Cramer  wrote:
>>
>>>
>>>
>>> Notice the upgraded version is 1.5 and the new version is 1.8
>>>
>>> I would think somewhere in the upgrade of the schema there should have
>>> been a create extension pg_stat_statements ?
>>>
>>
>> That would be a faulty assumption.  Modules do not get upgraded during a
>> server version upgrade.  This is a good thing, IMO.
>>
>
> This is from the documentation of pg_upgrade
>
> Install any custom shared object files (or DLLs) used by the old cluster
> into the new cluster, e.g., pgcrypto.so, whether they are from contrib or
> some other source. Do not install the schema definitions, e.g., CREATE
> EXTENSION pgcrypto, because these will be upgraded from the old cluster.
> Also, any custom full text search files (dictionary, synonym, thesaurus,
> stop words) must also be copied to the new cluster.
>
> If indeed modules do not get upgraded then the above is confusing at best,
> and misleading at worst.
>
>
"Install ... files used by the old cluster" (which must be binary
compatible with the new cluster as noted elsewhere on that page) supports
the claim that it is the old cluster's version that is going to result.
But I agree that saying "because these will be upgraded from the old
cluster" is poorly worded and should be fixed to be more precise here.

Something like, "... because the installed extensions will be copied from
the old cluster during the upgrade."

David J.


Re: pg_upgrade does not upgrade pg_stat_statements properly

2021-07-14 Thread David G. Johnston
On Wed, Jul 14, 2021 at 12:21 PM Dave Cramer  wrote:

>
> On Wed, 14 Jul 2021 at 15:09, David G. Johnston <
> david.g.johns...@gmail.com> wrote:
>
>>
>> Something like, "... because the installed extensions will be copied from
>> the old cluster during the upgrade."
>>
>
> This is still rather opaque. Without intimate knowledge of what changes
> have occurred in each extension I have installed; how would I know what I
> have to fix after the upgrade.
>

The point of this behavior is that you don't have to fix anything after an
upgrade - so long as your current extension version works on the new
cluster.  If you are upgrading in such a way that the current extension and
new cluster are not compatible you need to not do that.  Upgrade instead to
a lesser version where they are compatible.  Then upgrade your extension to
its newer version, changing any required user code that such an upgrade
requires, then upgrade the server again.

David J.


Re: pg_upgrade does not upgrade pg_stat_statements properly

2021-07-15 Thread David G. Johnston
On Thursday, July 15, 2021, Dave Cramer  wrote:

>
> As a first step I propose the following
>
> diff --git a/doc/src/sgml/ref/pgupgrade.sgml b/doc/src/sgml/ref/pgupgrade.
> sgml
> index a83c63cd98..f747a4473a 100644
> --- a/doc/src/sgml/ref/pgupgrade.sgml
> +++ b/doc/src/sgml/ref/pgupgrade.sgml
> @@ -305,9 +305,10 @@ make prefix=/usr/local/pgsql.new install
>   Install any custom shared object files (or DLLs) used by the old
> cluster
>   into the new cluster, e.g., pgcrypto.so,
>   whether they are from contrib
> - or some other source. Do not install the schema definitions, e.g.,
> - CREATE EXTENSION pgcrypto, because these will be
> upgraded
> - from the old cluster.
> + or some other source. Do not execute CREATE EXTENSION on the new
> cluster.
> + The extensions will be upgraded from the old cluster. However it may
> be
> + necessary to recreate the extension on the new server after the
> upgrade
> + to ensure compatibility with the new library.
>   Also, any custom full text search files (dictionary, synonym,
>   thesaurus, stop words) must also be copied to the new cluster.
>  
>
>

I think this needs some work to distinguish between core extensions where
we know the new server already has a library installed and external
extensions where it’s expected that the library that is added to the new
cluster is compatible with the version being migrated (not upgraded).  In
short, it should never be necessary to recreate the extension.  My
uncertainty revolves around core extensions since it seems odd to tell the
user to overwrite them with versions from an older version of PostgreSQL.

David J.


Re: pg_upgrade does not upgrade pg_stat_statements properly

2021-07-15 Thread David G. Johnston
On Thursday, July 15, 2021, David G. Johnston 
wrote:

> On Thursday, July 15, 2021, Dave Cramer  wrote:
>
>>
>>   Install any custom shared object files (or DLLs) used by the old
>> cluster
>>   into the new cluster, e.g., pgcrypto.so,
>>   whether they are from contrib
>> - or some other source.
>> However it may be
>> + necessary to recreate the extension on the new server after the
>> upgrade
>> + to ensure compatibility with the new library.
>>
>>
>
>  My uncertainty revolves around core extensions since it seems odd to tell
> the user to overwrite them with versions from an older version of
> PostgreSQL.
>

Ok. Just re-read the docs a third time…no uncertainty regarding contrib
now…following the first part of the instructions means that before one
could re-run create extension they would need to restore the original
contrib library files to avoid the new extension code using the old
library.  So that whole part about recreation is inconsistent with the
existing unchanged text.

David J.


Re: pg_upgrade does not upgrade pg_stat_statements properly

2021-07-15 Thread David G. Johnston
On Thursday, July 15, 2021, Dave Cramer  wrote:

> Well clearly my suggestion was not clear if you interpreted that as over
> writing them with versions from an older version of PostgreSQL.
>
>>
>>
Ignoring my original interpretation as being moot; the section immediately
preceding your edit says to do exactly that.

David J.


Re: pg_upgrade does not upgrade pg_stat_statements properly

2021-07-15 Thread David G. Johnston
On Thursday, July 15, 2021, Dave Cramer  wrote:

>
> I'm thinking at this point we need something a bit more sophisticated like
>
> ALTER EXTENSION ... UPGRADE. And the extension knows how to upgrade itself.
>

I’m not familiar with what hoops extensions jump through to facilitate
upgrades but even if it was as simple as “create extension upgrade” I
wouldn’t have pg_upgrade execute that command (or at least not by
default).  I would maybe have pg_upgrade help move the libraries over from
the old server (and we must be dealing with different databases having
different extension versions in some manner…).

David J.


Re: pg_upgrade does not upgrade pg_stat_statements properly

2021-07-15 Thread David G. Johnston
On Thu, Jul 15, 2021 at 8:43 AM Dave Cramer  wrote:

>
> On Thu, 15 Jul 2021 at 11:29, David G. Johnston <
> david.g.johns...@gmail.com> wrote:
>
>>
>> I’m not familiar with what hoops extensions jump through to facilitate
>> upgrades but even if it was as simple as “create extension upgrade” I
>> wouldn’t have pg_upgrade execute that command (or at least not by
>> default).  I would maybe have pg_upgrade help move the libraries over from
>> the old server (and we must be dealing with different databases having
>> different extension versions in some manner…).
>>
>
> Well IMHO the status quo is terrible. Perhaps you have a suggestion on how
> to make it better ?
>

To a certain extent it is beyond pg_upgrade's purview to care about
extension explicitly - it considers them "data" on the database side and
copies over the schema and, within reason, punts on the filesystem by
saying "ensure that the existing versions of your extensions in the old
cluster can correctly run in the new cluster" (which basically just takes a
simple file copy/install and the assumption you are upgrading to a server
version that is supported by the extension in question - also a reasonable
requirement).  In short, I don't have a suggestion on how to improve that
and don't really consider it a terrible flaw in pg_upgrade.

I'll readily admit that I lack sufficient knowledge here to make such
suggestions, as I don't hold any opinions that things are "quite terrible"
and haven't been presented with concrete problems to consider alternatives
for.

David J.


Re: pg_upgrade does not upgrade pg_stat_statements properly

2021-07-15 Thread David G. Johnston
On Thu, Jul 15, 2021 at 9:16 AM Dave Cramer  wrote:

> Eh, and
>>   pg_upgrade [other switches] --upgrade-extensions
>> sounds good too ...
>>
>
> Ultimately I believe this is the solution, however we still need to teach
> extensions how to upgrade themselves or emit a message saying they can't,
> or even ignore if it truly is a NOP.
>
>
If it's opt-in and simple I don't really care but I doubt I would use it as
personally I'd rather the upgrade not touch my application at all (to the
extent possible) and just basically promise that I'll get a reliable
upgrade.  Then I'll go ahead and ensure I have the backups of the new
version and that my application works correctly, then just run the "ALTER
EXTENSION" myself.  But anything that will solve pain points for
same-PostgreSQL-version extension upgrading is great.

I would say that it probably should be "--upgrade-extension=aaa
--upgrade-extension=bbb" though, if we are going to the effort to offer
something.

David J.


Re: pg_upgrade does not upgrade pg_stat_statements properly

2021-07-15 Thread David G. Johnston
On Thu, Jul 15, 2021 at 9:36 AM Jan Wieck  wrote:

>
> I am a bit confused here. From the previous exchange I get the feeling
> that you haven't created and maintained a single extension that survived
> a single version upgrade of itself or PostgreSQL (in the latter case
> requiring code changes to the extension due to internal API changes
> inside the PostgreSQL version).
>

Correct.

>
> I have. PL/Profiler to be explicit.
>
> Can you please elaborate what experience your opinion is based on?
>
>
I am an application developer who operates on the principle of "change only
one thing at a time".

David J.


Re: pg_upgrade does not upgrade pg_stat_statements properly

2021-07-15 Thread David G. Johnston
On Thu, Jul 15, 2021 at 9:58 AM Jan Wieck  wrote:

> On 7/15/21 12:46 PM, David G. Johnston wrote:
>
> > I am an application developer who operates on the principle of "change
> > only one thing at a time".
>
> Which pg_upgrade by definition isn't. It is bringing ALL the changes
> from one major version to the target version, which may be multiple at
> once. Including, but not limited to, catalog schema changes, SQL
> language changes, extension behavior changes and utility command
> behavior changes.
>
> On that principle, you should advocate against using pg_upgrade in the
> first place.
>
>
Not that I use extensions a whole lot (yes, my overall experience here is
slim) but I would definitely prefer those that allow me to stay on a single
PostgreSQL major version while migrating between major versions of their
own product.  Extensions that don't fit this model (i.e., choose to treat
their major version as being the same as the major version of PostgreSQL
they were developed for) must by necessity be upgraded simultaneously with
the PostgreSQL server.  But while PostgreSQL doesn't really have a choice
here - it cannot be expected to subdivide itself - extensions (at least
external ones - PostGIS is one I have in mind presently) - can and often do
attempt to support multiple versions of PostgreSQL for whatever major
versions of their product they are offering.  For these it is possible to
adhere to the "change one thing at a time principle" and to treat the
extensions as not being part of "ALL the changes from one major version to
the target version..."

David J.


Re: pg_upgrade does not upgrade pg_stat_statements properly

2021-07-15 Thread David G. Johnston
On Thu, Jul 15, 2021 at 11:14 AM Jan Wieck  wrote:

> On 7/15/21 1:10 PM, David G. Johnston wrote:
> > ... But while
> > PostgreSQL doesn't really have a choice here - it cannot be expected to
> > subdivide itself - extensions (at least external ones - PostGIS is one I
> > have in mind presently) - can and often do attempt to support multiple
> > versions of PostgreSQL for whatever major versions of their product they
> > are offering.  For these it is possible to adhere to the "change one
> > thing at a time principle" and to treat the extensions as not being part
> > of "ALL the changes from one major version to the target version..."
>
> You may make that exception for an external extension like PostGIS. But
> I don't think it is valid for one distributed in sync with the core
> system in the contrib package, like pg_stat_statements. Which happens to
> be the one named in the subject line of this entire discussion.
>
>
Yep, and IIUC running "CREATE EXTENSION pg_stat_statements VERSION '1.5';"
works correctly in v13 as does executing "ALTER EXTENSION
pg_stat_statements UPDATE;" while version 1.5 is installed.  So even
without doing the copying of the old contrib libraries to the new server
such a "one at a time" procedure would work just fine for this particular
contrib extension.

And since the OP was unaware of the presence of the existing ALTER
EXTENSION UPDATE command I'm not sure at what point a "lack of features"
complaint here is due to lack of knowledge or actual problems (yes, I did
forget too but at least this strengthens my position that one-at-a-time
methods are workable, even today).

David J.


Re: pg_upgrade does not upgrade pg_stat_statements properly

2021-07-15 Thread David G. Johnston
On Thu, Jul 15, 2021 at 11:52 AM Dave Cramer 
wrote:

>
> On Thu, 15 Jul 2021 at 14:31, David G. Johnston <
> david.g.johns...@gmail.com> wrote:
>
>>
>> Yep, and IIUC running "CREATE EXTENSION pg_stat_statements VERSION
>> '1.5';" works correctly in v13 as does executing
>>
> While it does work there are issues with dumping and restoring a database
> with the old version of pg_stat_statements in it that would only be found
> by dumping and restoring.
>

I'm unsure how this impacts the broader discussion.  At worse you'd have a
chance to manually run the update command on the new cluster before using
dump/restore.


>
>> "ALTER EXTENSION pg_stat_statements UPDATE;" while version 1.5 is
>> installed.
>>
>
>
>
>> So even without doing the copying of the old contrib libraries to the new
>> server such a "one at a time" procedure would work just fine for this
>> particular contrib extension.
>>
>
> You cannot copy the old contrib libraries into the new server. This will
> fail due to changes in the API and various exported variables either not
> being there or being renamed.
>

If this is true then the docs have a bug.  It sounds more like the
documentation should say "ensure that the new cluster has extension
libraries installed that are compatible with the version of the extension
installed on the old cluster".  Whether we want to be even more specific
with regards to contrib I cannot say - it seems like newer versions largely
retain backward compatibility so this is basically a non-issue for contrib
(though maybe individual extensions have their own requirements?)

but it fails to say that the versions that are installed may need to be
> updated.
>

OK, especially as this seems useful outside of pg_upgrade, and if done
separately is something pg_upgrade could just run as part of its new
cluster evaluation scripts.  Knowing whether an extension is outdated
doesn't require the old cluster.

David J.


Re: pg_upgrade does not upgrade pg_stat_statements properly

2021-07-15 Thread David G. Johnston
On Thursday, July 15, 2021, Jan Wieck  wrote:

> On 7/15/21 4:24 PM, David G. Johnston wrote:
>
> OK, especially as this seems useful outside of pg_upgrade, and if done
>> separately is something pg_upgrade could just run as part of its new
>> cluster evaluation scripts.  Knowing whether an extension is outdated
>> doesn't require the old cluster.
>>
>
> Knowing that (the extension is outdated) exactly how? Can you give us an
> example query, maybe a few SQL snippets explaining what exactly you are
> talking about? Because at this point you completely lost me.
>

I was mostly going off other people saying it was possible.  In any case,
looking at pg_available_extension_versions (once you figure out how to
order by the version text column) would let you check whether any installed
extensions are not at their latest version.

David J.


Re: Micro-optimizations to avoid some strlen calls.

2021-07-20 Thread David G. Johnston
On Tue, Jul 20, 2021 at 5:28 PM Michael Paquier  wrote:

> On Mon, Jul 19, 2021 at 07:48:55PM -0300, Ranier Vilela wrote:
> > There are some places, where strlen can have an overhead.
> > This patch tries to fix this.
> >
> > Pass check-world at linux ubuntu (20.04) 64 bits.
>
> Why does it matter?  No code paths you are changing here are
> performance-critical, meaning that such calls won't really show up
> high in profiles.
>
> I don't think there is anything to change here.
>

Agreed.  To borrow from a nearby email of a similar nature (PGconn
information retrieval, IIRC): it is not generally a benefit to avoid
calling a function multiple times in a block by substituting in a saved
local variable.  The function call tends to be more readable than having
yet one more unimportant name to keep in short-term memory.  As much code
already conforms to that, the status quo is the preferred state unless
there is a demonstrable performance gain to be had.  The readability, and
lack of churn, is otherwise more important.

David J.


Re: pg_upgrade does not upgrade pg_stat_statements properly

2021-07-28 Thread David G. Johnston
On Wed, Jul 28, 2021 at 6:52 PM Bruce Momjian  wrote:

> I came up with the attached patch.
>

Thank you.  It is an improvement but I think more could be done here (not
exactly sure what - though removing the "copy binaries for contrib modules
from the old server" seems like a decent second step.)

I'm not sure it really needs a parenthetical, and I personally dislike
using "Consider" to start the sentence.

"Bringing extensions up to the newest version available on the new server
can be done later using ALTER EXTENSION UPGRADE (after ensuring the correct
binaries are installed)."

David J.


Re: pg_upgrade does not upgrade pg_stat_statements properly

2021-07-29 Thread David G. Johnston
On Thu, Jul 29, 2021 at 7:56 AM Bruce Momjian  wrote:

> On Wed, Jul 28, 2021 at 09:35:28PM -0700, David G. Johnston wrote:
> > On Wed, Jul 28, 2021 at 6:52 PM Bruce Momjian  wrote:
> >
> > I came up with the attached patch.
> >
> >
> > Thank you.  It is an improvement but I think more could be done here (not
> > exactly sure what - though removing the "copy binaries for contrib
> modules from
> > the old server" seems like a decent second step.)
>
> Uh, I don't see that text.
>
>
"""
 5. Install custom shared object files

Install any custom shared object files (or DLLs) used by the old cluster
into the new cluster, e.g., pgcrypto.so, whether they are from contrib or
some other source. Do not install the schema definitions, e.g., CREATE
EXTENSION pgcrypto, because these will be upgraded from the old cluster.
Also, any custom full text search files (dictionary, synonym, thesaurus,
stop words) must also be copied to the new cluster.
"""
I have an issue with the fragment "whether they are from contrib" - my
understanding at this point is that because of the way we package and
version contrib it should not be necessary to copy those shared object
files from the old to the new server (maybe, just maybe, with a
qualification that you are upgrading between two versions that were in
support during the same time period).

David J.


Re: pg_upgrade does not upgrade pg_stat_statements properly

2021-07-29 Thread David G. Johnston
On Thu, Jul 29, 2021 at 8:36 AM Dave Cramer  wrote:

>
>
> I have an issue with the fragment "whether they are from contrib" - my
>> understanding at this point is that because of the way we package and
>> version contrib it should not be necessary to copy those shared object
>> files from the old to the new server (maybe, just maybe, with a
>> qualification that you are upgrading between two versions that were in
>> support during the same time period).
>>
>
> Just to clarify. In no case are binaries copied from the old server to the
> new server. Whether from contrib or otherwise.
>
>
I had used "binaries" when I should have written "shared object files".  I
just imagine a DLL as being a binary file so it seems accurate but we use
the term differently I suppose?

David J.


Re: pg_upgrade does not upgrade pg_stat_statements properly

2021-07-29 Thread David G. Johnston
On Thu, Jul 29, 2021 at 8:42 AM Bruce Momjian  wrote:

> On Thu, Jul 29, 2021 at 11:36:19AM -0400, Dave Cramer wrote:
> >
> >
> > I have an issue with the fragment "whether they are from contrib" -
> my
> > understanding at this point is that because of the way we package and
> > version contrib it should not be necessary to copy those shared
> object
> > files from the old to the new server (maybe, just maybe, with a
> > qualification that you are upgrading between two versions that were
> in
> > support during the same time period).
> >
> >
> > Just to clarify. In no case are binaries copied from the old server to
> the new
> > server. Whether from contrib or otherwise.
>
> Right.  Those are _binaries_ and therefore made to match a specific
> Postgres binary.  They might work or might not, but copying them is
> never a good idea --- they should be recompiled to match the new server
> binary, even if the extension had no version/API changes.
>

Ok, looking at the flow again, where exactly would the user even be able to
execute "CREATE EXTENSION" meaningfully?  The relevant databases do not
exist (I'm not totally sure what happens to the postgres database created
during the initdb step...), so at the point where the user is "installing
the extension" all they can reasonably do is a server-level install.  (They
could perhaps run CREATE EXTENSION in the postgres database, but does that
even matter?)

So, I'd propose simplifying this all to something like:

Install extensions on the new server

Any extensions that are used by the old cluster need to be installed into
the new cluster.  Each database in the old cluster will have its current
version of all extensions migrated to the new cluster as-is.  You can use
the ALTER EXTENSION command, on a per-database basis, to update its
extensions post-upgrade.

David J.


Re: pg_upgrade does not upgrade pg_stat_statements properly

2021-07-29 Thread David G. Johnston
On Thu, Jul 29, 2021 at 9:28 AM Jan Wieck  wrote:

> On 7/29/21 12:00 PM, David G. Johnston wrote:
> > Ok, looking at the flow again, where exactly would the user even be able
> > to execute "CREATE EXTENSION" meaningfully?  The relevant databases do
> > not exist (not totally sure what happens to the postgres database
> > created during the initdb step...) so at the point where the user is
> > "installing the extension" all they can reasonably do is a server-level
> > install (they could maybe create extension in the postgres database, but
> > does that even matter?).
> >
> > So, I'd propose simplifying this all to something like:
> >
> > Install extensions on the new server
>
> Extensions are not installed on the server level. Their binary
> components (shared objects) are, but the actual catalog modifications
> that make them accessible are performed per database by CREATE
> EXTENSION, which executes the SQL files associated with the extension.
> And they can be performed differently per database, like for example
> placing one and the same extension into different schemas in different
> databases.
>
> pg_upgrade is not (and should not be) concerned with placing the
> extension's installation components into the new version's lib and share
> directories. But it is pg_upgrade's job to perform the correct catalog
> modification per database during the upgrade.
>

That is exactly the point I am making.  The section is informing the user
of things to do that the server will not do.  Which is "install extension
code into the O/S" and that mentioning CREATE EXTENSION at this point in
the process is talking about something that is simply out-of-scope.



> > Any extensions that are used by the old cluster need to be installed
> > into the new cluster.  Each database in the old cluster will have its
> > current version of all extensions migrated to the new cluster as-is.
> > You can use the ALTER EXTENSION command, on a per-database basis, to
> > update its extensions post-upgrade.
>
> That assumes that the extension SQL files are capable of detecting a
> server version change and perform the necessary (if any) steps to alter
> the extension's objects accordingly.
>
> Off the top of my head I don't remember what happens when one executes
> ALTER EXTENSION ... UPGRADE ... when it is already on the latest version
> *of the extension*. Might be an error or a no-op.
>
> And to make matters worse, it is not possible to work around this with a
> DROP EXTENSION ... CREATE EXTENSION. There are extensions that create
> objects, like user defined data types and functions, that will be
> referenced by end user objects like tables and views.
>
>
These are all excellent points - but at present pg_upgrade simply doesn't
care and hopes that the extension author's documentation deals with these
possibilities in a sane manner.

David J.


Re: pg_upgrade does not upgrade pg_stat_statements properly

2021-07-29 Thread David G. Johnston
On Thu, Jul 29, 2021 at 9:34 AM Bruce Momjian  wrote:

> On Thu, Jul 29, 2021 at 12:22:59PM -0400, Dave Cramer wrote:
> > On Thu, 29 Jul 2021 at 12:16, Bruce Momjian  wrote:
> > Can you review the text I just posted?  Thanks.  I think we are making
> > progress.  ;-)
> >
> >
> > I am OK with Everything except
> >
> > Do not load the schema definitions,
> > e.g., CREATE EXTENSION pgcrypto, because these
> > will be recreated from the old cluster.  (The extensions may be
> > upgraded later using ALTER EXTENSION ... UPGRADE.)
> >
> >  I take issue with the word "recreated". This implies something new is
> > created, when in fact the old definitions are simply copied over.
> >
> > As I said earlier; using the wording "may be upgraded" is not nearly
> > cautionary enough.
>
> OK, I changed it to "copy" though I used "recreated" earlier since I was
> worried "copy" would be confused with file copy.  I changed the
> recommendation to "should be".
>
>
I'm warming up to "should" but maybe add a "why" such as "the old versions
are considered unsupported on the newer server".

I dislike "usually via operating system commands", just offload this to the
extension, i.e., "must be installed in the new cluster via installation
procedures specific to, and documented by, each extension (for contrib it
is usually enough to ensure the -contrib package was chosen to be installed
along with the -server package for your operating system.)"

I would simplify the first two sentences to just:

If the old cluster used extensions those same extensions must be installed
in the new cluster via installation procedures specific to, and documented
by, each extension.  For contrib extensions it is usually enough to install
the -contrib package via the same method you used to install the PostgreSQL
server.

I would consider my suggestion for "copy as-is/alter extension" wording in
my previous email in lieu of the existing third and fourth sentences, with
the "should" and "why" wording possibly worked in.  But the existing works
ok.

David J.


Re: pg_upgrade does not upgrade pg_stat_statements properly

2021-07-29 Thread David G. Johnston
On Thu, Jul 29, 2021 at 9:55 AM Jan Wieck  wrote:

> How exactly do you envision that we, the PostgreSQL project, make sure
> that extension developers provide those mechanisms in the future?
>
>
I have no suggestions here, and don't really plan to get deeply involved in
this area of the project anytime soon.  But it is definitely a topic worthy
of discussion on a new thread.

David J.


Re: pg_upgrade does not upgrade pg_stat_statements properly

2021-07-29 Thread David G. Johnston
On Thu, Jul 29, 2021 at 11:28 AM Bruce Momjian  wrote:

> I am sorry but none of your suggestions are exciting me --- they seem to
> get into too much detail for the context.
>

Fair, I still need to consider Alvaro's anyway, but given the amount of
general angst surrounding performing a pg_upgrade I do not feel that being
detailed is necessarily a bad thing, so long as the detail is relevant.
But I'll keep this in mind for my next reply.

David J.


Re: pg_upgrade does not upgrade pg_stat_statements properly

2021-07-29 Thread David G. Johnston
On Thu, Jul 29, 2021 at 11:35 AM Bruce Momjian  wrote:

>
> Oh, and you can't use the same installation procedures as when you
> installed the extension because that probably included CREATE EXTENSION.
> This really highlights why this is tricky to explain --- we need the
> binaries, but not the SQL that goes with it.
>

Maybe...but given that "installation to the O/S" is cluster-wide and
"CREATE EXTENSION" is database-specific, I believe this will generally take
care of itself in practice, especially if we keep the parenthetical "(but
ignore any installation instructions that advise executing CREATE
EXTENSION)".

David J.


Re: pg_upgrade does not upgrade pg_stat_statements properly

2021-07-29 Thread David G. Johnston
On Thu, Jul 29, 2021 at 12:28 PM Dave Cramer  wrote:

> So back to the original motivation for bringing this up. Recall that a
> cluster was upgraded. Everything appeared to work fine, except that the
> definitions of the functions were slightly different enough to cause a
> fatal issue on restoring a dump from pg_dump.
> Since pg_upgrade is now part of the core project, we need to make sure
> this is not possible or be much more insistent that the user needs to
> upgrade any extensions that require it.
>

I'm missing something here because I do not recall that level of detail
being provided.  The first email was simply an observation that the
pg_upgraded version and the create extension version were different in the
new cluster - which is working as designed (opinions of said design, at
least right here and now, are immaterial).

From your piecemeal follow-on descriptions I do see that pg_dump seems to
be involved - though a self-contained demonstration is not available that I
can find.  But so far as pg_dump is concerned it just needs to export the
current version each database is running for a given extension, and
pg_restore issue a CREATE EXTENSION for the same version when prompted.  I
presume it does this correctly but admittedly haven't checked.  IOW, if
pg_dump is failing here it is more likely its own bug and should be fixed
rather than blaming pg_upgrade.  Or it's pg_stat_statements' bug and it should
be fixed.

In theory the procedure and requirements imposed by pg_upgrade here seem
reasonable.  Fewer moving parts during the upgrade is strictly better.  The
documentation was not clear on how things worked, and so its being cleaned
up, but the how hasn't been shown yet to be a problem nor that simply
running alter extension would be an appropriate solution for this single
case let alone in general.  Since running alter extension manually is
simple constructing such a test case and proving that the alter extension
at least works for it should be straight-forward.

Without that I cannot support changing the behavior or even saying that
users must run alter extension manually to overcome a limitation in
pg_upgrade.  They should do so in order to keep their code base current and
running supported code - but that is a judgement we've always left to the
DBA, with the exception of strongly discouraging not updating to the newest
point release and getting off unsupported major releases.

David J.




Re: Replace l337sp34k in comments.

2021-07-31 Thread David G. Johnston
On Saturday, July 31, 2021, Peter Geoghegan  wrote:

> On Sat, Jul 31, 2021 at 11:22 AM Andrey Borodin 
> wrote:
> > FWIW, my 2 cents.
> > I do not see much difference between up2date, up-to-date, up to date,
> current, recent, actual, last, newest, correct, fresh etc.
>
> +1.
>
> To me it seems normal to debate wording/terminology with new code
> comments, but that's about it. I find this zeal to change old code
> comments misguided.
>

Maybe in general I would agree, but I believe this warrants an exception.
While maybe not explicitly stated, the use of "up2date" as a term is
against the de facto style guide for our project and should be corrected
regardless of how long it took to discover the violation.  We fix other
unimportant but obvious typos all the time and this is no different.  We
don’t ask people to police this but we also don’t turn down well-written
patches.

David J.


Re: Commitfest overflow

2021-08-07 Thread David G. Johnston
On Tue, Aug 3, 2021 at 8:53 AM Simon Riggs 
wrote:

> There are 273 patches in the queue for the Sept Commitfest already, so
> it seems clear the queue is not being cleared down each CF as it was
> before. We've been trying hard, but it's overflowing.
>

I think one simple change here would be of some help.  Every patch should
come on to the commitfest in "Proposed" Status.  Someone besides the author
of the patch then needs to move it from "Proposed" to "Needs Review".  That
person does not have to be a reviewer or committer and is not obligating
themselves to working on the patch (though likely they would provide
input).  They are simply, in the language of Robert's Rules of Order,
seconding the motion.

Removal from the CF could be automatic after the 3rd CF, though the CF
managers involved should, before returning it as "No Interest at this
Time", skim the thread and ask whether any commenters on the thread would
like to see it kept alive in the commitfest by seconding it.  If so
removed, the author should be encouraged to re-submit during the next
version's development cycle.

David J.


Re: Refactor ECPGconnect and allow IPv6 connection

2021-08-07 Thread David G. Johnston
On Mon, Jun 21, 2021 at 3:46 AM kuroda.hay...@fujitsu.com <
kuroda.hay...@fujitsu.com> wrote:

> I will try to remake patches based on the idea.
>

Based upon this comment, and the ongoing discussion about commitfest volume
and complexity, I've moved this to "Waiting on Author".

David J.


Re: Commitfest overflow

2021-08-07 Thread David G. Johnston
On Tue, Aug 3, 2021 at 8:53 AM Simon Riggs 
wrote:

> There are 273 patches in the queue for the Sept Commitfest already, so
> it seems clear the queue is not being cleared down each CF as it was
> before. We've been trying hard, but it's overflowing.
>
> Of those, about 50 items have been waiting more than one year, and
> about 25 entries waiting for more than two years.
>

The "Last Modified" field should probably be omitted.  The fact that "Moved
to Next CF" changes this date is unhelpful.  I'd much rather replace it
with a "last patch provided" date, or, better, a "days since" calculation
instead.  Having the system record the email addresses that replied to the
thread instead of strictly looking at people who interact with the
commitfest application would probably also be useful information for
commitfest people looking to see where limited attention would be best
spent.

David J.


Re: Commitfest overflow

2021-08-07 Thread David G. Johnston
On Thu, Aug 5, 2021 at 7:36 AM Robert Haas  wrote:

> Patches that are not being updated regularly have no
> business being part of a CommitFest.
>

As the main issue seems to be "Needs Review" patches getting punted, the
patch author rightly expects feedback before supplying new patches.  If
they have been waiting for months, that isn't on them.


> I don't think a patch should be
> rejected on the strength of a single -1, but when 2 or 3 people have
> shown up to say either that they don't like the idea or that the
> implementation is way off base, it's not helpful to leave things in a
> state that suggests it needs more eyeballs.
>
>
I would agree.  One of those 3 people should then state in the thread where
they put their -1 that they are, in addition, going to mark the patch as
RwF or Rejected in the CF, and then go do just that.  If that isn't done,
and the CfM comes across this situation, they should probably request the
person with the first -1 to make the change instead of doing it for them.
And it doesn't have to be explicit -1s to qualify.

David J.


Re: ECPG bug fix: DECALRE STATEMENT and DEALLOCATE, DESCRIBE

2021-08-09 Thread David G. Johnston
On Mon, Aug 9, 2021 at 1:38 PM Michael Meskes  wrote:

> > I don't want to upset anybody for any reason. I regret that my words
> > have upset you, but I think that they were misinterpreted in a way
> > that I couldn't possibly have predicted. The particular aspect of
>
> I strongly object to that. It's pretty obvious to me that addressing
> people in third person is very offending.
>

And using the third person to avoid making things personal, and to properly
represent one's role as a representative as opposed to an individual, is
something at least two of us consider to be "professional".  If others
taking on a professional/formal tone with you is offensive, I politely
suggest you at least cut them some slack when you haven't informed them of
this previously.  Cultural differences happen in both directions.

> > How could anybody on the RMT judge what was going on sensibly? There
> > was *zero* information from you (the original committer, our point of
> > contact) about an item that is in a totally unfamiliar part of the
> > code to every other committer. We were effectively forced to make
> > very
> > conservative assumptions about the deadline. I think that it's very
> > likely that this could have been avoided if only you'd engaged to
> > some
> > degree -- if you had said it was a short deadline then we'd likely
> > have taken your word for it, as the relevant subject matter expert
> > and
> > committer in charge of the item. But we were never given that choice.
>
> The same holds the other way round, I only understood later that you
> wanted more information. Had I known that earlier, I would have gladly
> given them.


There is clearly an expectation from the RMT, and at least myself, that:

"The RMT discussed this recently. We are concerned about this issue,
including how it has been handled so far.

If you're unable to resolve the issue (or at least commit time to
resolving the issue) then the RMT will call for the revert of the
original feature patch."

is expected to elicit a response from the comitter in a timely fashion.
Really, any email sent to -hackers from the RMT about a specific commit; or
even any email sent to -hackers by a core team member, is expected to be
responded to in a timely manner.  These teams should not be getting
involved with the day-to-day operations and being responsive to them is
part of the obligation of being a committer.

In hindsight the quoted email above should probably have been worded:

"If you're unable to resolve the issue, or communicate a timely plan for
doing so, within the next week please revert the patch."

Making it clear that the committer should be the one performing the
revert.  Then, absent feedback or a revert, the second email and the RMT
team performing the revert, would be appropriate.

David J.


Re: DROP relation IF EXISTS Docs and Tests - Bug Fix

2021-08-10 Thread David G. Johnston
On Tue, Aug 10, 2021 at 12:04 PM Robert Haas  wrote:

> On Tue, Mar 9, 2021 at 10:09 AM David G. Johnston
>  wrote:
> > Frankly, I am hoping for a bit more constructive feedback and even
> collaboration from a committer, specifically Tom, on this one given the
> outstanding user complaints received on the topic, our disagreement
> regarding fixing it (which motivates the patch to better document and add
> tests), and professional courtesy given to a fellow consistent community
> contributor.
> >
> > So, no, making it just go away because one of the dozens of committers
> can’t make time to try and make it work doesn’t sit well with me.  If a
> committer wants to actively reject the patch with an explanation then so be
> it.
>
> I have reviewed this patch and my opinion is that we should reject it,
>

Thank you for the feedback.

> So, the only change here is deleting the word "essentially."


I do tend to find this wishy-washy language to be more annoying than the
community at large.


> I'd suggest keeping
> the existing text and adding a sentence like: "Note that the command
> can still fail for other reasons; for example, it is an error if a
> type with the provided name exists but is not a domain."
>

I would at least like to see this added in response to the various bug
reports that found the shared namespace among types, and the fact that it
causes an error, to be a surprise.

> The final portion of the patch adds new regression tests. I'm not
> going to say that this is completely without merit, because it can be
> useful to know if some patch changes the behavior, but I guess I don't
> really see a whole lot of value in it, either.
>

I'd say the Bug #16492 tests warrant keeping, independent of the opinion
that demonstrating the complicated interplay between 10+ SQL commands isn't
worth the test suite time.  I'd say that probably half of the tests
demonstrate non-intuitive behavior from my perspective: the bug test noted
above, plus the demonstration that a table in a non-first schema in the
search_path will not prevent a CREATE <object type> command from
succeeding but will cause a DROP <object type> IF EXISTS to error out.
Does it need to test all 5 types?  Probably not, but it should at least
test DROP VIEW IF EXISTS test_rel_exists.
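A minimal sketch of the surprising interaction described above (hypothetical schema and relation names; this reflects the behavior complained about in the thread, where the IF EXISTS check raises an error on a relation of a different kind instead of being a no-op):

```sql
CREATE SCHEMA a;
CREATE SCHEMA b;
SET search_path = a, b;
CREATE TABLE b.test_rel_exists (id int);

-- Succeeds: the view is created in schema a (first in search_path);
-- the table of the same name in schema b does not block creation.
CREATE VIEW test_rel_exists AS SELECT 1 AS x;
DROP VIEW test_rel_exists;

-- Now the name resolves to the table in schema b, and instead of being
-- a no-op the command errors: '"test_rel_exists" is not a view'.
DROP VIEW IF EXISTS test_rel_exists;
```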

What about the inherent confusion of having both DROP DOMAIN and DROP TYPE,
when DROP TYPE will also drop domains?  The doc change for that doesn't
really fit into your buckets.  It would include:

drop_domain.sgml
+   This duplicates the functionality provided by
+   <xref linkend="sql-droptype"/> but restricts
+   the type's type to domain.

drop_type.sgml
+   This includes domains, though they can be removed specifically
+   by using the <xref linkend="sql-dropdomain"/> command.

Adding sql-droptype to "See Also" on all the domain related command pages
as well.

After looking at this again I will say I do agree that the procedural
nature of the doc changes for the main issue were probably overkill and a
"oh-by-the-way" note as to the fact that we ERROR on a namespace conflict
would address that concern in a user-facing way adequately.  Looking back
and what I went through to put the test script together I don't regret
doing the work and feel that someone like myself would benefit from its
existence as a whole.  It's more useful than a README that would
communicate the same information.

David J.


Re: use-regular-expressions-to-simplify-less_greater-and-not_equals.patch

2021-08-10 Thread David G. Johnston
On Tue, Aug 10, 2021 at 8:36 PM Tom Lane  wrote:

> "孙诗浩(思才)" 
> writes:
> >  we can use regular expressions (<>|!=) to cover "<>" and "!=".
> There is no
> > need to have two definitions less_greater and not_equals, because it
> will confuse developer.
> > So, we can use only not_equals to cover this operator set.
>
> I do not find this an improvement.


Agreed, though mostly on a separation of responsibilities principle.
Labelling the distinctive tokens and then assigning them meaning are two
different things.

> Yeah, it's a bit shorter, but it's less clear; not least because the
> comment explaining the <>-means-!= behavior is no longer anywhere near
> the code that implements that behavior.


The patch should just remove the comment for not_equals as well: if both
tokens are given the same name they would have to be treated identically.
The fact that they share a name makes the equivalency immediate and
impossible to miss (without having to go and check how "less_greater" and
"not_equals" are interpreted).

> It would also get in the way if we ever had reason to treat <> and !=
> as something other than exact equivalents.
>
>
This is unconvincing both on the odds of being able to treat them
differently and the fact that the degree of effort to overcome this
obstacle seems minimal - about the same amount as reverting this patch, not
accounting for the coding of the new behavior.

In short, for me: is a principled separation of concerns worth the slight
loss in clarity?  I'll stick with my status quo choice, but not strongly.
Some more context as to exactly what kind of confusion this setup is
causing could help tip the scale.

David J.


Re: DROP relation IF EXISTS Docs and Tests - Bug Fix

2021-08-11 Thread David G. Johnston
On Wed, Aug 11, 2021 at 5:54 AM Robert Haas  wrote:

> Yeah. I tend to feel like this is the kind of thing where it's not
> likely to be 100% obvious to users how it all works no matter what we
> put in the documentation, and that some of the behavior of a system
> has to be learned just by trying out the system.
>

I see where you are coming from and agree with a good portion of it, and
accept that you are accurately representing reality for most of the rest,
regardless of my feelings on that reality.  I'm not invested in this enough
to try and change mindsets.  I have withdrawn the patch from the commitfest
and do not plan on putting forth a replacement.

Thank you for your thorough review, I do appreciate it.  It did reaffirm
some of the suspicions I had about the wording I had chosen at least.

I will add that when I finished the patch I felt it was of more value to
future developers than it would be to our end users.  I never really
worried that a patch could be written to fill in the missing pieces that
prompted the various bug reports.  I went to the extra effort because the
underlying interactions seemed complicated and I wanted to see in practice
exactly how they behaved and what that meant for usability. I also still
disagree with our choice to emit an error on a namespace collision, and
generally feel this area could use improvement.  Thus I keep the tests
around, which basically communicate that "this is how things work and it is
intentional" and also are useful to have should future changes, however
unlikely to materialize, be made.  Sure, that isn't "regression" but in my
unfamiliarity I didn't know of a better existing place to put them and
didn't think to figure out (or create) a better location, one that doesn't
run on every build.

David J.


Re: Insert Documentation - Returning Clause and Order

2021-08-11 Thread David G. Johnston
On Mon, Dec 14, 2020 at 7:09 AM Ashutosh Bapat 
wrote:

> But we write what's guaranteed. Anything not written in
> the documents is not guaranteed.
>

In the case of LIMIT we go to great lengths to write what isn't
guaranteed.  I suggest that this is similar enough in nature to warrant the
same emphasis.

"Thus, using different LIMIT/OFFSET values to select different subsets of a
query result will give inconsistent results unless you enforce a
predictable result ordering with ORDER BY. This is not a bug; it is an
inherent consequence of the fact that SQL does not promise to deliver the
results of a query in any particular order unless ORDER BY is used to
constrain the order.

It is even possible for repeated executions of the same LIMIT query to
return different subsets of the rows of a table, if there is not an ORDER
BY to enforce selection of a deterministic subset. Again, this is not a
bug; determinism of the results is simply not guaranteed in such a case."

I'd go so far as to say that it's more important here since the observed
behavior is that things are ordered, and expected to be ordered, while with
limit the non-determinism seems more obvious.
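By analogy with the quoted LIMIT wording, the ordering concern for RETURNING can be illustrated as follows (hypothetical table; the point is what is *not* guaranteed):

```sql
CREATE TABLE items (id serial PRIMARY KEY, val text);

-- The RETURNING rows typically come back in insertion order, but nothing
-- in the documentation guarantees that ordering; code should not rely
-- on it any more than it relies on un-ORDERed SELECT output order.
INSERT INTO items (val)
VALUES ('a'), ('b'), ('c')
RETURNING id, val;
```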

David J.


Re: Default to TIMESTAMP WITH TIME ZONE?

2021-08-13 Thread David G. Johnston
On Fri, Aug 13, 2021 at 12:33 AM Gavin Flower 
wrote:

>
> I always use the tz version, except when I forget.


I find it nearly impossible for me to forget how this works.  But that is
probably because I just pretend that the standard, multi-word, data types
don't even exist.  It's not that "timestamp" defaults to "WITHOUT TIME
ZONE" but rather that there are only two types in existence: timestamp and
timestamptz.  It's self-evident which one doesn't handle time zones.


> Suspect that it is extremely rare when one would not want to use the TZ
> option, which suggests to me that using TZ should be the default.
>
>
I would agree, but at this point I'd leave it up to documentation and
educational materials to teach/encourage, not runtime behaviors.

David J.


Re: Default to TIMESTAMP WITH TIME ZONE?

2021-08-13 Thread David G. Johnston
On Fri, Aug 13, 2021 at 9:28 AM Simon Riggs 
wrote:

>
> The only hope is to eventually change the default, so probably
> the best thing is to apply pressure via the SQL Std process.
>
>
Then there is no hope because this makes the situation worse.

If anything I'd suggest the SQL standard should probably just admit this
"default behavior of timestamp" is a bad idea and deprecate its existence.
IOW, the only two standard conforming syntaxes are the
explicit WITH/WITHOUT TIME ZONE ones.  Any database implementation that
implements "timestamp" as a type alias is doing so in an implementation
dependent way.  Code that wants to be SQL standard conforming portable
needs to use the explicit types.

David J.


Re: Default to TIMESTAMP WITH TIME ZONE?

2021-08-13 Thread David G. Johnston
On Friday, August 13, 2021, Tom Lane  wrote:
>
> The one thing I could potentially see us doing is more strongly
> encouraging the use of the names "timestamp" and "timestamptz",
> up to and including changing what format_type() et al. put out.


 +1.  Having the canonical form be timestamptz would make pretending the
“with time zone” version doesn’t exist much easier.

David J.


Re: Allow composite foreign keys to reference a superset of unique constraint columns?

2021-08-16 Thread David G. Johnston
On Mon, Aug 16, 2021 at 4:37 PM Paul Martinez  wrote:

>
> It seems like a somewhat useful feature. If people think it would be
> useful to
> implement, I might take a stab at it when I have time.
>
>
This doesn't seem useful enough for us to be the only implementation to go
above and beyond the SQL Standard's specification for the references
feature (I assume that is what this proposal suggests).

This example does a good job of explaining, but its assumptions aren't that
impactful and thus it isn't that good at inducing desirability.

David J.


Re: [PATCH] Proof of concept for GUC improvements

2021-08-19 Thread David G. Johnston
On Thu, Aug 19, 2021 at 3:44 PM David Christensen <
david.christen...@crunchydata.com> wrote:

> Functionality-wise, any thoughts on the overall approach or the specific
> patch?
>

If this information was exposed only by an addition to pg_settings, and
thus not changeable via a GUC or affecting SHOW, I think it would be more
likely to get accepted.  Being more liberal in the accepting of these
values as input seems less problematic so that aspect could just stay.  But
the display changes should be done with new outputs, not repurposing
existing ones.

I'm at -0.5 as to whether such a patch would actually be an improvement or
whether the added possibilities would just be confusing and, because it is
all optional, indefinitely so.

David J.


Re: Improving some plpgsql error messages

2021-08-20 Thread David G. Johnston
On Fri, Aug 20, 2021 at 8:50 AM Tom Lane  wrote:

> Thoughts?  Should I back-patch this into v14 where 2f48ede08
> came in, or just do it in HEAD?
>

I'd say back-patch in the interest of preventing probably quite a few
emails from novices at plpgsql coding dealing with all the interplay
between queries and variables and returning syntaxes.

David J.


Re: ON ERROR in json_query and the like

2024-06-12 Thread David G. Johnston
On Wednesday, June 12, 2024, Markus Winand  wrote:
>
>
> 10.14 SR 1: The declared type of the <value expression> simply contained
> in the <JSON context item> immediately contained in the <JSON API common
> syntax> shall be a string type or a JSON type.


> It might be best to think of it as two separate functions, overloaded:
>
> JSON_VALUE(context_item JSONB, path_expression …)
> JSON_VALUE(context_item TEXT, path_expression …)


Yes, we need to document that we deviate from (fail to fully implement) the
standard here in that we only provide jsonb parameter functions, not text
ones.
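A sketch of the deviation being described (PostgreSQL 17 syntax; the column and table names are hypothetical). A string literal still works because it is coerced to jsonb, but there is no separate text-typed variant applying the standard's string-type rules:

```sql
-- The context item is treated as jsonb:
SELECT JSON_VALUE('{"a": 1}'::jsonb, '$.a');

-- A value of type text must be cast explicitly; there is no
-- JSON_VALUE(text, ...) overload:
SELECT JSON_VALUE(some_text_column::jsonb, '$.a') FROM my_table;
```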

David J.


Re: ON ERROR in json_query and the like

2024-06-12 Thread David G. Johnston
On Tuesday, May 28, 2024, Markus Winand  wrote:

>
> 2. EMPTY [ARRAY|OBJECT] ON ERROR implies ERROR ON EMPTY
>
>17beta1=# SELECT JSON_QUERY('[]', '$[*]' EMPTY ARRAY ON ERROR) a;
> a
>
> []
>(1 row)
>
>As NULL ON EMPTY is implied, it should give the same result as
>explicitly adding NULL ON EMPTY:
>
>17beta1=# SELECT JSON_QUERY('[]', '$[*]' NULL ON EMPTY EMPTY ARRAY ON
> ERROR) a;
> a
>---
>
>(1 row)
>
>Interestingly, Oracle DB gives the same (wrong) results. Db2 (LUW)
>on the other hand returns NULL for both queries.
>
>I don’t think that PostgreSQL should follow Oracle DB's suit here
>but again, in case this is intentional it should be made explicit
>in the docs.
>

The docs here don’t seem to cover the on empty clause at all nor fully
cover all options.

Where do you find the claim that the one implies the other?  Is it a typo
that your example says "implies NULL ON EMPTY" but the subject line says
"implies ERROR ON EMPTY"?

Without those clauses a result is either empty or an error - they are
mutually exclusive (ignoring matches).  I would not expect one clause to
imply or affect the behavior of the other.  There is no chaining.  The
original result is transformed to the new result specified by the clause.

I’d need to figure out whether the example you show is actually producing
empty or error; but it seems correct if the result is empty.  The first
query ignores the error clause - the empty array row seems to be the
representation of empty here; the second one matches the empty clause and
outputs null instead of the empty array.
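
A sketch of the distinction, assuming PostgreSQL 17 syntax and the
interpretation above — each clause rewrites only its own outcome, with no
chaining between them:

```sql
-- $[*] over an empty array produces an empty result, so only ON EMPTY fires:
SELECT JSON_QUERY('[]', '$[*]' NULL ON EMPTY EMPTY ARRAY ON ERROR);
-- strict mode turns a missing member into an error, so only ON ERROR fires:
SELECT JSON_QUERY('{}', 'strict $.a' EMPTY ARRAY ON ERROR);
```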

David J.


Re: Shouldn't jsonpath .string() Unwrap?

2024-06-12 Thread David G. Johnston
On Sat, Jun 8, 2024 at 3:50 PM David E. Wheeler 
wrote:

> Hackers,
>
> Most of the jsonpath methods auto-unwrap in lax mode:
>
> david=# select jsonb_path_query('[-2,5]', '$.abs()');
>  jsonb_path_query
> --
>  2
>  5
> (2 rows)
>
> The obvious exceptions are size() and type(), which apply directly to
> arrays, so no need to unwrap:
>
> david=# select jsonb_path_query('[-2,5]', '$.size()');
>  jsonb_path_query
> --
>  2
> (1 row)
>
> david=# select jsonb_path_query('[-2,5]', '$.type()');
>  jsonb_path_query
> --
>  "array"
>
> But what about string()? Is there some reason it doesn’t unwrap?
>
> david=# select jsonb_path_query('[-2,5]', '$.string()');
> ERROR:  jsonpath item method .string() can only be applied to a bool,
> string, numeric, or datetime value
>
> What I expect:
>
> david=# select jsonb_path_query('[-2,5]', '$.string()');
>  jsonb_path_query
> ------------------
>  "2"
>  "5"
> (2 rows)
>
> However, I do see a test[1] for this behavior, so maybe there’s a reason
> for it?
>
>
Adding Andrew.

I'm willing to call this an open item against this feature as I don't see
any documentation explaining that string() behaves differently than the
others.

David J.


Re: improve predefined roles documentation

2024-06-13 Thread David G. Johnston
On Thu, Jun 13, 2024 at 12:48 PM Nathan Bossart 
wrote:

> I think we could improve matters by abandoning the table and instead
> documenting these roles more like we document GUCs, i.e., each one has a
> section below it where we can document it in as much detail as we want.
>
>
One of the main attributes for the GUCs is their category.  If we want to
improve organization we'd need to assign categories first.  We already
implicitly do so in the description section where we do group them together
and explain why - but it is all informal.  But getting rid of those
groupings and descriptions and isolating each role so it can be linked to
more easily seems like a net loss in usability.

I'm against getting rid of the table.  If we do add authoritative
subsection anchors we should just do like we do in System Catalogs and make
the existing table name values hyperlinks to those newly added anchors.
Breaking the one table up into multiple tables along category lines is
something to consider.

David J.


Re: jsonpath: Missing Binary Execution Path?

2024-06-13 Thread David G. Johnston
On Thu, Jun 13, 2024 at 6:10 PM Chapman Flack  wrote:

> On 06/13/24 16:43, David E. Wheeler wrote:
> > Paging Mr. Eisentraut!
>
> I'm not Mr. Eisentraut, but I have at last talked my way into some
> access to the standard, so ...
>
> Note 487 emphasizes that JSON path predicates "are not expressions;
> instead they form a separate language that can only be invoked within
> a <JSON filter expression>".
>
> The only operators usable in a general expression (that is, a
> <JSON path wff>) are binary + - and binary * / % and unary + -
> over a <JSON unary expression>.
>
> Inside a filter, you get to use a <JSON path predicate>. That's where
> you can use ! and && and ||. But ! can only be applied to a
> <JSON delimited predicate>: either a <JSON exists path predicate>,
> or any other <JSON path predicate> wrapped in parentheses.
>
> On 06/13/24 11:32, David E. Wheeler wrote:
> > david=# select jsonb_path_query('true', '$ && $');
> > david=# select jsonb_path_query('true', '$.boolean() && $.boolean()');
>
> Those don't work because, as you recognized, they're not inside filters.
>

I'm content that the operators in the 'filter operators' table need to be
within filter but then I cannot reconcile why this example worked:

david=# select jsonb_path_query('1', '$ >= 1');
 jsonb_path_query
--
 true
(1 row)
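
For contrast, the standard-conforming way to use the comparison is inside a
filter; PostgreSQL additionally accepts it at the top level. A sketch of the
two forms:

```sql
-- Filter form (standard): the comparison lives inside ? ( ... )
SELECT jsonb_path_query('[1, 2, 3]', '$[*] ? (@ >= 2)');   -- 2, 3
-- Top-level predicate (PostgreSQL extension): yields true/false/null
SELECT jsonb_path_query('1', '$ >= 1');                    -- true
```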

David J.


Re: jsonpath: Missing Binary Execution Path?

2024-06-13 Thread David G. Johnston
On Thursday, June 13, 2024, Chapman Flack  wrote:

> On 06/13/24 21:24, David G. Johnston wrote:
> > I'm content that the operators in the 'filter operators' table need to be
> > within filter but then I cannot reconcile why this example worked:
> >
> > david=# select jsonb_path_query('1', '$ >= 1');
>
> Good point. I can't either. No way I can see to parse that as
> a <JSON path wff>.
>


Whether we note it as non-standard or not is an open question then, but it
does work and opens up a documentation question.  It seems like it needs to
appear in table T9.50.  Whether it also should appear in T9.51 is the
question.  It seems like anything in T9.50 is allowed in a filter while the
stuff in T9.51 should be limited to those things only allowed in a filter.
Which suggests moving it from T9.51 to T9.50.

David J.


Re: jsonpath: Missing Binary Execution Path?

2024-06-13 Thread David G. Johnston
On Thursday, June 13, 2024, Chapman Flack  wrote:

> On 06/13/24 21:46, David G. Johnston wrote:
> >>> david=# select jsonb_path_query('1', '$ >= 1');
> >>
> >> Good point. I can't either. No way I can see to parse that as
> >> a <JSON path wff>.
> >
> > Whether we note it as non-standard or not is an open question then, but
> it
> > does work and opens up a documentation question.
>
> Does the fact that it does work raise any potential concern that our
> grammar is nonconformant in some way that could present a headache
> somewhere else, or down the road with a later standard edition?
>

This isn’t new in v17 nor, to my knowledge, has the behavior changed, so I
think we just need to live with whatever, likely minimal, chance of
headache there is.

I don’t get why the outcome of a boolean-producing operation isn’t just
generally allowed to be produced, and would hope the standard would move
toward allowing that across the board and, in doing so, end up matching what
we already have implemented.

David J.


Re: create role manual

2024-06-15 Thread David G. Johnston
On Sat, Jun 15, 2024 at 7:25 PM Tatsuo Ishii  wrote:

>The rules for which initial
>role membership options are enabled described below in the
>IN ROLE, ROLE, and
>ADMIN clauses.
>
> Maybe we need "are" in front of "described"?
>
>
Agreed.

David J.


Re: Document NULL

2024-06-17 Thread David G. Johnston
On Sat, May 11, 2024 at 11:00 AM David G. Johnston <
david.g.johns...@gmail.com> wrote:

> Though I haven’t settled on a phrasing I really like.  But I’m trying to
> avoid a parenthetical.
>
>
Settled on this:

The cardinal rule, a null value is neither
   equal nor unequal
   to any value, including other null values.

I've been tempted to just say, "to any value.", but cannot quite bring
myself to do it...
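
The rule can be demonstrated directly; these results are standard SQL
behavior:

```sql
SELECT NULL = NULL;                      -- null, not true
SELECT NULL <> NULL;                     -- null, not false
SELECT NULL IS NOT DISTINCT FROM NULL;   -- true: DISTINCT treats nulls as equal
```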

David J.


Re: Document NULL

2024-06-18 Thread David G. Johnston
On Tue, Jun 18, 2024 at 8:34 PM Yugo NAGATA  wrote:

>
> It may be a trivial thing but I am not sure we need to mention case
> insensitivity
> here, because all keywords and unquoted identifiers are case-insensitive in
> PostgreSQL and it is not specific to NULL.
>

But it is neither a keyword nor an identifier.  It behaves more like:
SELECT 1 as one;  A constant, which has no implied rules - mainly because
numbers don't have case.  Which suggests adding some specific mention there
- and also probably bringing it up, along with its "untyped" nature, in the
syntax chapter, probably here:

https://www.postgresql.org/docs/current/sql-syntax-lexical.html#SQL-SYNTAX-CONSTANTS-GENERIC


> Also, I found the other parts of the documentation use "case-insensitive"
> in which
> words are joined with hyphen, so I wonder it is better to use the same
> form if we
> leave the description.
>
>
Typo on my part, fixed.

I'm not totally against just letting this content be assumed to be learned
from elsewhere in the documentation but it also seems reasonable to
include.  I'm going to leave it for now.

David J.


Re: Document NULL

2024-06-18 Thread David G. Johnston
On Tuesday, June 18, 2024, Tom Lane  wrote:

> Yugo NAGATA  writes:
> > On Tue, 18 Jun 2024 20:56:58 -0700
> > "David G. Johnston"  wrote:
> >> But it is neither a keyword nor an identifier.
>
> The lexer would be quite surprised by your claim that NULL isn't
> a keyword.  Per src/include/parser/kwlist.h, NULL is a keyword,
> and a fully reserved one at that.
>
>
>

Can’t it be both a value and a keyword?  I figured the not null constraint
and is null predicates are why it’s a keyword but the existence of those
doesn’t cover its usage as a literal value that can be stuck anywhere you
have an expression.
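
A small sketch of the dual role being described: NULL is a reserved keyword
in the grammar, yet in an expression it acts like an untyped constant, much
as 1 does:

```sql
SELECT 1 AS one;          -- an untyped numeric constant
SELECT NULL AS nothing;   -- NULL usable anywhere an expression is allowed
SELECT pg_typeof(NULL);   -- "unknown" until context supplies a type
```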

David J.


Re: Extension security improvement: Add support for extensions with an owned schema

2024-06-19 Thread David G. Johnston
On Wed, Jun 19, 2024 at 8:19 AM Jelte Fennema-Nio  wrote:


> Because part of it would
> only be relevant once we support upgrading from PG18. So for now the
> upgrade_code I haven't actually run.
>

Does it apply against v16?  If so, branch off there, apply it, then upgrade
from the v16 branch to master.

David J.


Re: SQL/JSON query functions context_item doc entry and type requirement

2024-06-19 Thread David G. Johnston
On Wed, Jun 19, 2024 at 8:29 AM jian he  wrote:

> On Mon, Jun 17, 2024 at 9:05 PM Chapman Flack  wrote:
> >
> > Hi,
> >
> > On 06/17/24 02:43, Amit Langote wrote:
> > > context_item expression can be a value of
> > > any type that can be cast to jsonb. This includes types
> > > such as char,  text, bpchar,
> > > character varying, and bytea (with
> > > ENCODING UTF8), as well as any domains over these types.
> >
> > Reading this message in conjunction with [0] makes me think that we are
> > really talking about a function that takes a first parameter of type
> jsonb,
> > and behaves exactly that way (so any cast required is applied by the
> system
> > ahead of the call). Under those conditions, this seems like an unusual
> > sentence to add in the docs, at least until we have also documented that
> > tan's argument can be of any type that can be cast to double precision.
> >
>
> I guess it would be fine to add an unusual sentence to the docs.
>
> imagine a function: array_avg(anyarray) returns anyelement.
> array_avg calculates the average of an array's elements, e.g.
> array_avg('{1,2,3}'::int[]) returns 2,
> but array_avg won't make sense if the input argument is a date array.
> so mentioning in the doc that array_avg can accept anyarray, but anyarray
> cannot be a date array,
> seems ok.
>

There is existing wording for this:

"The expression can be of any JSON type, any character string type, or
bytea in UTF8 encoding."

If you add this sentence to the paragraph the link that already exists,
which simply points the reader to this sentence, becomes redundant and
should be removed.

As for table 9.16.3 - it is unwieldy already.  Let's try to make the core
syntax shorter, not longer.  We already have precedent in the subsequent
json_table section - give each major clause item a name, then below the
table define the syntax and meaning for those names.  Unlike in that
section - which probably should be modified too - context_item should have
its own description line.

David J.


Re: ON ERROR in json_query and the like

2024-06-20 Thread David G. Johnston
On Thu, Jun 20, 2024 at 2:19 AM Amit Langote 
wrote:

> Hi,
>
> On Mon, Jun 17, 2024 at 10:07 PM Markus Winand 
> wrote:
> > > On 17.06.2024, at 08:20, Amit Langote  wrote:
> > > Agree that the documentation needs to be clear about this. I'll update
> > > my patch at [1] to add a note next to table 9.16.3. SQL/JSON Query
> > > Functions.
> >
> > Considering another branch of this thread [1] I think the
> > "Supported Features” appendix of the docs should mention that as well.
> >
> > The way I see it is that the standards defines two overloaded
> > JSON_QUERY functions, of which PostgreSQL will support only one.
> > In case of valid JSON, the implied CAST makes it look as though
> > the second variant of these function was supported as well but that
> > illusion totally falls apart once the JSON is not valid anymore.
> >
> > I think it affects the following feature IDs:
> >
> >   - T821, Basic SQL/JSON query operators
> >  For JSON_VALUE, JSON_TABLE and JSON_EXISTS
> >   - T828, JSON_QUERY
> >
> > Also, how hard would it be to add the functions that accept
> > character strings? Is there, besides the effort, any thing else
> > against it? I’m asking because I believe once released it might
> > never be changed — for backward compatibility.
>
> Hmm, I'm starting to think that adding the implied cast to json wasn't
> such a great idea after all, because it might mislead the users to
> think that JSON parsing is transparent (respects ON ERROR), which is
> what you are saying, IIUC.
>
>
Actually, the implied cast is exactly the correct thing to do here - the
issue is that we aren't using the newly added soft errors infrastructure
[1] to catch the result of that cast and instead produce whatever output ON
ERROR tells us to produce.  This seems to be in the realm of doability so
we should try in the interest of being standard conforming.  I'd even argue
to make this an open item unless and until the attempt is agreed upon to
have failed (or it succeeds of course).
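
The behavior being argued for, sketched (hypothetical — this is what routing
the cast failure through the soft-error infrastructure would make possible,
not what PostgreSQL currently does):

```sql
-- Today the implied text -> jsonb cast raises on invalid JSON; with soft
-- errors the failure would instead be handled by the ON ERROR clause:
SELECT JSON_QUERY('not valid json', '$' EMPTY ARRAY ON ERROR);
-- desired result: []  (rather than "invalid input syntax for type json")
```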

David J.

[1] d9f7f5d32f201bec61fef8104aafcb77cecb4dcb

