Re: Document DateStyle effect on jsonpath string()

2024-07-04 Thread David E. Wheeler
On Jul 4, 2024, at 04:28, jian he  wrote:

> Do you need to reset the datestyle?

Wouldn’t hurt but it’s not necessary, no. It’s set only for the execution of 
this file, and there are no more calls that rely on it.

> also the above query is time zone sensitive, maybe the time zone is
> set in another place, but that's not explicit?

It’s implicit in how PostgreSQL runs its test suite; other tests later change 
it.

>
> -String value converted from a JSON boolean, number, string, or 
> datetime
> +String value converted from a JSON boolean, number, string, or
> +datetime. Note that the string output of datetimes is determined by
> +the  parameter.
>
> imho, your patch has just too many examples.

I’m confused. There are no examples in my patch, or this bit you cite. 

> for explaining the above sentence, the following example should be enough.

Are you referring to the tests? I made them comprehensive so that we reliably 
demonstrate the behavior of the string() method on all the date/time data 
types. They are not examples, not in the documentation sense at least.

Best,

David





Document DateStyle effect on jsonpath string()

2024-07-02 Thread David E. Wheeler
Hackers,

In fuzing around trying to work out what’s going on with the formatting of 
timestamptz values cast by the timestamp_tz() jsonpath method[1], I noticed 
that the formatting of the string() method applied to date and time objects was 
not fully tested, or how the output is determined by the DateStyle method.

The attached path aims to rectify this situation by adding tests that chain 
string() after the jsonpath date/time methods, both with the default testing 
“PostreSQL” DateStyle and “ISO”. It also mentions the impact of the DateStyle 
parameter in the string() documentation.

Also available to review as a pull request[2].

Best,

David

[1]: 
https://www.postgresql.org/message-id/7DE080CE-6D8C-4794-9BD1-7D9699172FAB%40justatheory.com
[2]: https://github.com/theory/postgres/pull/7/files



v1-0001-Document-impact-of-datestyle-on-jsonpath-string.patch
Description: Binary data




Re: jsonpath: Inconsistency of timestamp_tz() Output

2024-07-02 Thread David E. Wheeler
On Jul 1, 2024, at 11:02, David E. Wheeler  wrote:

> Anyway, should the output of timestamptz JSONB values be made more 
> consistent? I’m happy to make a patch to do so, but could use a hand figuring 
> out where the behavior varies.

I think if the formatting was more consistent, the test output would be:


``` patch
--- a/src/test/regress/expected/jsonb_jsonpath.out
+++ b/src/test/regress/expected/jsonb_jsonpath.out
@@ -2914,7 +2914,7 @@ HINT:  Use *_tz() function for time zone support.
 select jsonb_path_query_tz('"2023-08-15"', '$.timestamp_tz()'); -- should work
  jsonb_path_query_tz 
 -
- "2023-08-15T07:00:00+00:00"
+ "2023-08-15T00:00:00-07:00"
 (1 row)
 
 select jsonb_path_query('"12:34:56"', '$.timestamp_tz()');
@@ -3003,7 +3003,7 @@ HINT:  Use *_tz() function for time zone support.
 select jsonb_path_query_tz('"2023-08-15 12:34:56"', '$.timestamp_tz()'); -- 
should work
  jsonb_path_query_tz 
 -
- "2023-08-15T12:34:56+00:00"
+ "2023-08-15T12:34:56+10:00"
 (1 row)
 
 select jsonb_path_query('"10-03-2017 12:34"', '$.datetime("dd-mm- 
HH24:MI")');
```

That second example is a bit different than I noticed up-thread, not just a 
formatting issue but the offset is never applied!. That test run under tz +10, 
and this is how it works with the non-JSONB data types:

david=# set time zone '+10';
SET
Time: 0.689 ms
david=# select '2023-08-15 12:34:56'::timestamptz;
  timestamptz   

 2023-08-15 12:34:56+10
(1 row)

Time: 0.491 ms
david=# select '2023-08-15 12:34:56'::timestamp::timestamptz;
  timestamptz   

 2023-08-15 12:34:56+10
(1 row)


Best,

David





jsonpath: Inconsistency of timestamp_tz() Output

2024-07-01 Thread David E. Wheeler
Hackers,

There’s an odd difference in the behavior of timestamp_tz() outputs. Running 
with America/New_York as my TZ, it looks fine for a full timestamptz, identical 
to how casting the types directly works:

david=# set time zone 'America/New_York';
SET

david=# select '2024-08-15 12:34:56-04'::timestamptz;
  timestamptz   

 2024-08-15 12:34:56-04
(1 row)

david=# select jsonb_path_query_tz('"2024-08-15 12:34:56-04"', 
'$.timestamp_tz()');
 jsonb_path_query_tz 
-
 "2024-08-15T12:34:56-04:00"

Both show the time in America/New_York, which is great. But when casting from a 
date, the behavior differs. Casting directly:

david=# select '2024-08-15'::date::timestamptz;
  timestamptz   

 2024-08-15 00:00:00-04

It stringifies to the current zone setting again, as expected. But look at the 
output from a path query:

david=# select jsonb_path_query_tz('"2023-08-15"', '$.timestamp_tz()');
 jsonb_path_query_tz 
-
 "2023-08-15T04:00:00+00:00"

It’s using UTC for the display output! Shouldn’t it be using America/New_York?

Note that I’m comparing a cast from date to timestamptz because that’s how the 
jsonpath parsing works[1]: it ultimately uses 
date2timestamptz_opt_overflow()[2] to make the conversion, which appears to set 
the offset from the time zone GUC, so I’m not sure where it’s converted to UTC 
before stringifying.

Maybe an argument is missing from the stringification path?

FWIW, explicitly calling the string() jsonpath method does produce a result in 
the current TZ:

david=# select jsonb_path_query_tz('"2023-08-15"', '$.timestamp_tz().string()');
   jsonb_path_query_tz
--
 "2023-08-15 00:00:00-04"

That bit uses timestamptz_out to format the output, but JSONB has its own 
stringification[4] (called here[5]), but I can’t tell what might be different 
between a timestamptz cast from a date and one not cast from a date.

Note the same divergency in behavior occurs when the source value is a 
timestamp, too. Compare:

david=# select '2024-08-15 12:34:56'::timestamp::timestamptz;
  timestamptz   

 2024-08-15 12:34:56-04
(1 row)

david=# select jsonb_path_query_tz('"2023-08-15 12:34:56"', '$.timestamp_tz()');
 jsonb_path_query_tz 
-
 "2023-08-15T16:34:56+00:00"
(1 row)

Anyway, should the output of timestamptz JSONB values be made more consistent? 
I’m happy to make a patch to do so, but could use a hand figuring out where the 
behavior varies.

Best,

David

[1]: 
https://github.com/postgres/postgres/blob/3497c87/src/backend/utils/adt/jsonpath_exec.c#L2708-L2718
[2]: 
https://github.com/postgres/postgres/blob/3497c87/src/backend/utils/adt/date.c#L613-L698
[3]: 
https://github.com/postgres/postgres/blob/3fb59e789dd9f21610101d1ec106ad58095e24f3/src/backend/utils/adt/jsonpath_exec.c#L1650-L1653
[4]: 
https://github.com/postgres/postgres/blob/3fb59e789dd9f21610101d1ec106ad58095e24f3/src/backend/utils/adt/json.c#L369-L407
[5]: 
https://github.com/postgres/postgres/blob/3fb59e7/src/backend/utils/adt/jsonb.c#L743-L748





Re: Proposal: Document ABI Compatibility

2024-06-27 Thread David E. Wheeler
On Jun 27, 2024, at 17:48, Jeremy Schneider  wrote:

> Minor nit - misspelled “considerd"

Thank you, Jeremy. V3 attached.

Best,

David



v3-0001-Add-API-an-ABI-guidance-to-the-C-language-docs.patch
Description: Binary data


Re: Patch bug: Fix jsonpath .* on Arrays

2024-06-27 Thread David E. Wheeler
On Jun 27, 2024, at 04:17, Michael Paquier  wrote:

> The tests of jsonb_jsonpath.sql include a lot of patterns for @? and
> jsonb_path_query with the lax and strict modes, so shouldn't these
> additions be grouped closer to the existing tests rather than added at 
> the end of the file?

I think you could argue that they should go with other tests for array 
unwrapping, though it’s kind of mixed throughout. But that’s more the bit I was 
testing; almost all the tests are lax, and it’s less the strict behavior to 
test here than the explicit behavior of unwrapping in lax mode.

But ultimately I don’t care where they go, just that we have them.

D





Re: Proposal: Document ABI Compatibility

2024-06-26 Thread David E. Wheeler
On Jun 26, 2024, at 15:20, David E. Wheeler  wrote:

> CF: https://commitfest.postgresql.org/48/5080/
> PR: https://github.com/theory/postgres/pull/6

Aaaand v2 without the unnecessary formatting of unrelated documentation 臘‍♂️.

Best,

David



v2-0001-Add-API-an-ABI-guidance-to-the-C-language-docs.patch
Description: Binary data




Re: Proposal: Document ABI Compatibility

2024-06-26 Thread David E. Wheeler
On Jun 26, 2024, at 15:14, David E. Wheeler  wrote:

> Okay here’s a patch that adds the proposed API and ABI guidance to the C 
> Language docs. The content is the same as Peter proposed, with some light 
> copy-editing.

CF: https://commitfest.postgresql.org/48/5080/
PR: https://github.com/theory/postgres/pull/6

D





Re: Proposal: Document ABI Compatibility

2024-06-26 Thread David E. Wheeler
On Jun 25, 2024, at 13:55, David E. Wheeler  wrote:

> Oh man this is fantastic, thank you! I’d be more than happy to just turn this 
> into a patch. But where should it go? Upthread I assumed xfunc.sgml, and 
> still think that’s a likely candidate. Perhaps I’ll just start there --- 
> unless someone thinks it should go somewhere other than the docs.

Okay here’s a patch that adds the proposed API and ABI guidance to the C 
Language docs. The content is the same as Peter proposed, with some light 
copy-editing.

Best,

David



v1-0001-Add-API-an-ABI-guidance-to-the-C-language-docs.patch
Description: Binary data




Re: Patch bug: Fix jsonpath .* on Arrays

2024-06-26 Thread David E. Wheeler
On Jun 25, 2024, at 13:48, David E. Wheeler  wrote:

> I have since realized it’s not a complete fix for the issue, and hacked 
> around it in my Go version. Would be fine to remove that bit, but IIRC this 
> was the only execution function that would return `jperNotFound` when it in 
> fact adds items to the `found` list. The current implementation only looks at 
> one or the other, so it’s not super important, but I found the inconsistency 
> annoying and sometimes confusing.

I’ve removed this change.

>> [1] select jsonb_path_query('[1,2,3,{"b": [3,4,5]}]', 'strict $.*');
>> I propose adding a similar test with explicitly specified lax mode: select 
>> jsonb_path_query('[1,2,3,{"b": [3,4,5]}]', 'lax $.*'); to show what lax mode 
>> is set by default.
> 
> Very few of the other tests do so; I can add it if it’s important for this 
> case, though.

Went ahead and added lax.

> @? suppresses a number of errors. Perhaps I should add a variant of the 
> error-raising query that passes the silent arg, which would also suppress the 
> error.

Added a variant where the silent param suppresses the error, too.

V2 attached and the PR updated:

  https://github.com/theory/postgres/pull/4/files

Best,

David



v2-0001-Add-tests-for-jsonpath-.-on-arrays.patch
Description: Binary data




Re: Proposal: Document ABI Compatibility

2024-06-26 Thread David E. Wheeler
On Jun 26, 2024, at 04:48, Laurenz Albe  wrote:

> Perhaps such information should go somewhere here:
> https://www.postgresql.org/support/versioning/

This seems deeper and more detailed than what’s there now, but I can certainly 
imagine wanting to include this policy on the web site. That said, it didn’t 
occur to me to look under support when trying to find a place to put this; I 
was looking under Developers, on the principle that extension developers would 
look there.

Best,

David





Re: RFC: Additional Directory for Extensions

2024-06-25 Thread David E . Wheeler
On Jun 25, 2024, at 10:43 AM, Robert Haas  wrote:

> If we want to work on making the sorts of changes that
> you're proposing, let's do it on a separate thread. It's not going to
> be meaningfully harder to move in that direction after some patch like
> this than it is today.

I appreciate this separation of concerns, Robert.

In other news, here’s an updated patch that expands the documentation to record 
that the destination directory is a prefix, and full paths should be used under 
it. Also take the opportunity to document the PGXS DESTDIR variable as the 
thing to use to install files under the destination directory.

It still requires a server restart; I can change it back to superuser-only if 
that’s the consensus.

For those who prefer a GitHub patch review experience, see this PR:

  https://github.com/theory/postgres/pull/3/files

Best,

David



v4-0001-Add-extension_destdir-GUC.patch
Description: Binary data




Re: Proposal: Document ABI Compatibility

2024-06-25 Thread David E. Wheeler
On Jun 25, 2024, at 7:33 AM, Peter Eisentraut  wrote:

> I took at a stab at this, using some of your text, but discussing API and ABI 
> separately.

Oh man this is fantastic, thank you! I’d be more than happy to just turn this 
into a patch. But where should it go? Upthread I assumed xfunc.sgml, and still 
think that’s a likely candidate. Perhaps I’ll just start there --- unless 
someone thinks it should go somewhere other than the docs.

Best,

David





Re: Patch bug: Fix jsonpath .* on Arrays

2024-06-25 Thread David E. Wheeler
On Jun 25, 2024, at 12:46 AM, Степан Неретин  wrote:

> Hi! Looks good to me, but I have several comments.

Thanks for your review!

> Your patch improves tests, but why did you change formatting in 
> jsonpath_exec.c? What's the motivation?

It’s not just formatting. From the commit message:

> While at it, teach `executeAnyItem()` to return `jperOk` when `found`
> exist, not because it will be used (the result and `found` are inspected
> by different functions), but because it seems like the proper thing to
> return from `executeAnyItem()` when considered in isolation.


I have since realized it’s not a complete fix for the issue, and hacked around 
it in my Go version. Would be fine to remove that bit, but IIRC this was the 
only execution function that would return `jperNotFound` when it in fact adds 
items to the `found` list. The current implementation only looks at one or the 
other, so it’s not super important, but I found the inconsistency annoying and 
sometimes confusing.

>  [1] select jsonb_path_query('[1,2,3,{"b": [3,4,5]}]', 'strict $.*');
> I propose adding a similar test with explicitly specified lax mode: select 
> jsonb_path_query('[1,2,3,{"b": [3,4,5]}]', 'lax $.*'); to show what lax mode 
> is set by default.

Very few of the other tests do so; I can add it if it’s important for this 
case, though.

> Odd empty result for the test: select jsonb '[1,2,3,{"b": [3,4,5]}]' @? 
> 'strict $.*';
> I expected an error like in test [1]. This behavior is not obvious to me.

@? suppresses a number of errors. Perhaps I should add a variant of the 
error-raising query that passes the silent arg, which would also suppress the 
error.

> Everything else is cool. Thanks to the patch and the discussion above, I 
> began to understand better how wildcards in JSON work.

Yeah, it’s kind of wild TBH.

Best,

David





Re: RFC: Additional Directory for Extensions

2024-06-25 Thread David E. Wheeler
On Jun 24, 2024, at 5:32 PM, Jelte Fennema-Nio  wrote:

> Still, for the sake of completeness it might make sense to support
> this whole list in extension_destdir. (assuming it's easy to do)

It should be with the current patch, which just uses a prefix to paths in 
`pg_config`. So if SHAREDIR is set to /usr/share/postgresql/16 and 
extension_destdir is set to /mount/ext, then Postgres will look for files in 
/mount/ext/usr/share/postgresql/16. The same rule applies (or should apply) for 
all other pg_config directory configs and where the postmaster looks for 
specific files. And PGXS already supports installing files in these locations, 
thanks to its DESTDIR param.

(I don’t know how it works on Windows, though.)

That said, this is very much a pattern designed for RPM and Debian package 
management patterns, and not for actually installing and managing extensions. 
And maybe that’s fine for now, as it can still be used to address the 
immutability problems descried in the original post in this thread.

Ultimately, I’d like to figure out a way to more tidily organize installed 
extension files, but I think that, too, might be a separate thread.

Best,

David





Re: RFC: Additional Directory for Extensions

2024-06-24 Thread David E. Wheeler
On Jun 24, 2024, at 17:17, Jelte Fennema-Nio  wrote:

> If you want to only change $libdir during CREATE EXTENSION (or ALTER
> EXTENSION UPDATE), then why not just change it there. And really you'd
> only want to change it when creating an extension from which the
> control file is coming from extension_destdir.

IIUC, the postmaster needs to load an extension on first use in every session 
unless it’s in shared_preload_libraries.

> However, I can also see a case for really always changing $libdir.
> Because some extensions in shared_preload_libraries, might want to
> trigger loading other libraries that they ship with dynamically. And
> these libraries are probably also in extension_destdir.

Right, it can be more than just the DSOs for the extension itself.

Best,

David





Re: RFC: Additional Directory for Extensions

2024-06-24 Thread David E. Wheeler
On Jun 24, 2024, at 4:28 PM, Robert Haas  wrote:

> As long as the GUC is superuser-only, I'm not sure what else there is
> to do here. The only question is whether there's some reason to
> disallow this even from the superuser, but I'm not quite seeing such a
> reason.

I can switch it back from requiring a restart to allowing a superuser to set it.

>> I sketched them quickly, so agree they can be better. Reading the code, I 
>> now see that it appears to be the former case. I’d like to advocate for the 
>> latter.
> 
> Sounds good.

Yeah, though then I have a harder time deciding how it should work. pg_config’s 
paths are absolute. With your first example, we just use them exactly as they 
are, but prefix them with the destination directory. So if it’s set to 
`/my/temp/root/`, then files go into

/my/temp/root/$(pg_conifg --sharedir)
/my/temp/root/$(pg_conifg --pkglibdir)
/my/temp/root/$(pg_conifg --bindir)
# etc.

Which is exactly how RPM and Apt packages are built, but seems like an odd 
configuration for general use.

>> BINDIR
>> DOCDIR
>> HTMLDIR
>> PKGINCLUDEDIR
>> LOCALEDIR
>> MANDIR
>> 
>> I can imagine an extension wanting or needing to use any and all of these.
> 
> Are these really all relevant to backend code?

Oh I think so. Especially BINDIR; lots of extensions ship with binary 
applications. And most ship with docs, too (PGXS puts items listed in DOCS into 
DOCDIR). Some might also produce man pages (for their binaries), HTML docs, and 
other stuff. Maybe an FTE extension would include locale files?

I find it pretty easy to imagine use cases for all of them. So much so that I 
wrote an extension binary distribution RFC[1] and its POC[2] around them.

Best,

David

[1]: https://github.com/orgs/pgxn/discussions/2
[1]: https://justatheory.com/2024/06/trunk-poc/





Re: Proposal: Document ABI Compatibility

2024-06-24 Thread David E. Wheeler
On Jun 19, 2024, at 05:41, Peter Eisentraut  wrote:

> This is probably a bit confusing.  This might as well mean client application 
> code against libpq.  Better something like "server plugin code that uses the 
> PostgreSQL server APIs".

That works.

> But now we're talking about API.  That might be subject of another document 
> or another section in this one, but it seems confusing to mix this with the 
> ABI discussion.

Hrm. They’re super closely-related in my mind, as an extension developer. I 
need to know both! I guess I’m taking of this policy as what I can expect may 
be changed (and how to adapt to it) and what won’t.

That said, I’m fine to remove the API stuff if there’s consensus objecting to 
it, to be defined in a separate policy (perhaps on the same doc page).

>> PostgreSQL avoids unnecessary API changes in major releases, but usually 
>> ships a few necessary API changes, including deprecation, renaming, and 
>> argument variation.
> 
> Obviously, as a practical matter, there won't be random pointless changes.  
> But I wouldn't go as far as writing anything down about how these APIs are 
> developed.

Fair enough, was trying to give some idea of the sorts of changes. Don’t have 
to include them.

>> In such cases the incompatible changes will be listed in the Release Notes.
> 
> I don't think anyone is signing up to do that.

It needn’t be comprehensive. Just mention that an ABI or API changed in the 
release note item. Unless they almost *all* make such changes.

>> Minor Releases
>> --

> I think one major problem besides actively avoiding or managing such 
> minor-version ABI breaks is some automated detection.  Otherwise, this just 
> means "we try, but who knows”.

I think you *do* try, and the fact that there are so few issues means you 
succeed at that. I’m not advocating for an ABI guarantee here, just a 
description of the policy committees already follow.

Here’s an update based on all the feedback, framing things more from the 
perspective of “do I need to recompile this or change my code”. Many thanks!

``` md
ABI Policy
==

Changes to the the PostgreSQL server APIs may require recompilation of server 
plugin code that uses them. This policy describes the core team's approach to 
such changes, and what server API users can expect.

Major Releases
--

Applications that use server APIs must be compiled for each major release 
supported by the application. The inclusion of `PG_MODULE_MAGIC` ensures that 
code compiled for one major version will rejected by other major versions. 
Developers needing to support multiple versions of PostgreSQL with incompatible 
APIs should use the `PG_VERSION_NUM` constant to adjust code as appropriate. 
For example:

``` c
#if PG_VERSION_NUM >= 16
#include "varatt.h"
#endif
```

The core team avoids unnecessary breakage, but users of the server APIs should 
expect and be prepared to make adjustments and recompile for every major 
release.

Minor Releases
--

PostgreSQL makes an effort to avoid server API and ABI breaks in minor 
releases. In general, an application compiled against any minor release will 
work with any other minor release, past or future. In the absence of automated 
detection of such changes, this is not a guarantee, but history such breaking 
changes have been extremely rare.

When a change *is* required, PostgreSQL will choose the least invasive change 
possible, for example by squeezing a new field into padding space or appending 
it to the end of a struct. This sort of change should not impact dependent 
applications unless they use `sizeof(the struct)` on a struct with an appended 
field, or create their own instances of such structs --- patterns best avoided.

In rare cases, however, even such non-invasive changes may be impractical or 
impossible. In such an event, the change will be documented in the Release 
Notes, and details on the issue will also be posted to [TBD; mail list? Blog 
post? News item?].

To minimize issues and catch changes early, the project strongly recommends 
that developers adopt continuous integration testing at least for the latest 
minor release all major versions of Postgres they support.
```


Best,

David





Re: Proposal: Document ABI Compatibility

2024-06-24 Thread David E . Wheeler
On Jun 24, 2024, at 14:51, Robert Haas  wrote:

> I suppose that it's true that we try to avoid gratuitous breakage, but
> I feel like it would be weird to document that.

I see how that can seem weird to a committer deeply familiar with the 
development process and how things happen. But people outside the -hackers 
bubble have very little idea. It’s fair to say it needn’t be a long statement 
for major versions: a single sentence such as “we try to avoid gratuitous 
breakage” is a perfectly reasonable framing. But I’d say, in the interest of 
completeness, it would be useful to document the policy for major release *as 
well as* minor releases.

Best,

David





Re: Proposal: Document ABI Compatibility

2024-06-24 Thread David E. Wheeler
On Jun 19, 2024, at 05:42, Peter Eisentraut  wrote:

>>> https://postgr.es/m/CAH2-Wzm-W6hSn71sUkz0Rem=qdeu7tnfmc7_jg2djrlfef_...@mail.gmail.com
>>> 
>>> Theoretically anybody can do this themselves. In practice they don't.
>>> So something as simple as providing automated reports about ABI
>>> changes might well move the needle here.
>> What would be required to make such a thing? Maybe it’d make a good Summer 
>> of Code project.
> 
> The above thread contains a lengthy discussion about what one could do.

I somehow missed that GSoC 2024 is already going with contributors. Making a 
mental note to add an item for 2025.

D





Re: RFC: Additional Directory for Extensions

2024-06-24 Thread David E. Wheeler
On Jun 24, 2024, at 1:53 PM, Robert Haas  wrote:

> Is "tighten up what the superuser can do" on our list of objectives?
> Personally, I think we should be focusing mostly, and maybe entirely,
> on letting non-superusers do more things, with appropriate security
> controls. The superuser can ultimately do anything, since they can
> cause shell commands to be run and load arbitrary code into the
> backend and write code in untrusted procedural languages and mutilate
> the system catalogs and lots of other terrible things.

I guess the question then is what security controls are appropriate for this 
feature, which after all tells the postmaster what directories to read files 
from. It feels a little outside the scope of a regular user to even be aware of 
the file system undergirding the service. But perhaps there’s a non-superuser 
role for whom it is appropriate?

> Now, I think there are environments where people have used things like
> containers to try to lock down the superuser, and while I'm not sure
> that can ever be particularly water-tight, if it were the case that
> this patch would make it a whole lot easier for a superuser to bypass
> the kinds of controls that people are imposing today, that might be an
> argument against this patch. But ... off-hand, I'm not seeing such an
> exposure.

Yeah I’m not even sure I follow. Containers are immutable, other than mutable 
mounted volumes --- which is one use case this patch is attempting to enable.

> On the patch itself, I find the documentation for this to be fairly
> hard to understand. I think it could benefit from an example. I'm
> confused about whether this is intended to let me search for
> extensions in /my/temp/root/usr/lib/postgresql/... by setting
> extension_directory=/my/temp/dir, or whether it's intended me to
> search both /usr/lib/postgresql as I normally would and also
> /some/other/place.

I sketched them quickly, so agree they can be better. Reading the code, I now 
see that it appears to be the former case. I’d like to advocate for the latter. 

> If the latter, I wonder why we don't handle shared
> libraries by setting dynamic_library_path and then just have an
> analogue of that for control files.

The challenge is that it applies not just to shared object libraries and 
control files, but also extension SQL files and any other SHAREDIR files an 
extension might include. But also, I think it should support all the pg_config 
installation targets that extensions might use, including:

BINDIR
DOCDIR
HTMLDIR
PKGINCLUDEDIR
LOCALEDIR
MANDIR

I can imagine an extension wanting or needing to use any and all of these.

Best,

David







Re: Inconsistent Parsing of Offsets with Seconds

2024-06-24 Thread David E. Wheeler
On Jun 22, 2024, at 14:10, David E. Wheeler  wrote:

> I believe the former issue is caused by the latter: The jsonpath 
> implementation uses the formatting strings to parse the timestamps[1], and 
> since there is no formatting to support offsets with seconds, it doesn’t work 
> at all in JSON timestamp parsing.
> 
> [1]: 
> https://github.com/postgres/postgres/blob/70a845c/src/backend/utils/adt/jsonpath_exec.c#L2420-L2442

A side-effect of this implementation of date/time parsing using the to_char 
templates is that only time zone offsets and abbreviations are supported. I 
find the behavior a little surprising TBH:

david=# select to_timestamp('2024-06-03 12:35:00America/New_York', '-MM-DD 
HH24:MI:SSTZ');
ERROR:  invalid value "America/New_York" for "TZ"
DETAIL:  Time zone abbreviation is not recognized.

Unless the SQL standard only supports offsets and abbreviations, I wonder if 
we’d be better off updating the above parsing code to also try the various 
date/time input functions, as well as the custom formats that *are* defined by 
the standard.

Best,






Re: Inconsistent Parsing of Offsets with Seconds

2024-06-22 Thread David E. Wheeler
On Jun 22, 2024, at 13:15, Tom Lane  wrote:

> It's hard to get excited about this.

I freely admit I’m getting into the weeds here. :-)

>> The corresponding jsonpath methods don’t like offsets with seconds *at all*:
> 
> Perhaps that should be fixed, but it's pretty low-priority IMO.
> I doubt there is any standard saying that JSON timestamps need
> to be able to include that.
> 
>> I see from the source[1] that offsets between plus or minus 15:59:59
>> are allowed; should the `OF` and `TZ formats be able to parse them?
> 
> I'd vote no.  to_date/to_char already have enough trouble with format
> strings being squishier than one might expect.

I believe the former issue is caused by the latter: The jsonpath implementation 
uses the formatting strings to parse the timestamps[1], and since there is no 
formatting to support offsets with seconds, it doesn’t work at all in JSON 
timestamp parsing.

[1]: 
https://github.com/postgres/postgres/blob/70a845c/src/backend/utils/adt/jsonpath_exec.c#L2420-L2442

So if we were to fix the parsing of offsets in jsonpath, we’d either have to 
change the parsing code there or augment the to_timestamp() formats and use 
them.

Totally agree not a priority; happy to just pretend offsets with seconds don’t 
exist in any practical sense.

Best,

David





Inconsistent Parsing of Offsets with Seconds

2024-06-22 Thread David E. Wheeler
Hackers,

The treatment of timestamptz (and timetz) values with offsets that include 
seconds seems a bit inconsistent. One can create such timestamps through the 
input function:

david=# select '2024-06-22T12:35:00+02:30:15'::timestamptz;
  timestamptz   

 2024-06-22 10:04:45+00

But the offset seconds are dropped (or rounded away?) by to_timestamp()’s `OF` 
and `TZ` formats[2]:

david=# select to_timestamp('2024-06-03 12:35:00+02:30:15', '-MM-DD 
HH24:MI:SSOF');
  to_timestamp  

 2024-06-03 10:05:00+00

david=# select to_timestamp('2024-06-03 12:35:00+02:30:15', '-MM-DD 
HH24:MI:SSTZ');
  to_timestamp  

 2024-06-03 02:05:00-08

The corresponding jsonpath methods don’t like offsets with seconds *at all*:

david=# select jsonb_path_query('"2024-06-03 12:35:00+02:30:15"', 
'$.datetime("-MM-DD HH24:MI:SSOF")');
ERROR:  trailing characters remain in input string after datetime format

david=# select jsonb_path_query('"2024-06-03 12:35:00+02:30:15"', 
'$.timestamp_tz()');
ERROR:  timestamp_tz format is not recognized: "2024-06-03 12:35:00+02:30:15"

I see from the source[1] that offsets between plus or minus 15:59:59 are 
allowed; should the `OF` and `TZ formats be able to parse them? Or perhaps 
there should be a `TZS` format to complement `TZH` and `TZM`?

Best,

David

[1] 
https://github.com/postgres/postgres/blob/70a845c/src/include/datatype/timestamp.h#L136-L142
[2]: https://www.postgresql.org/docs/16/functions-formatting.html





Re: call for applications: mentoring program for code contributors

2024-06-20 Thread David E. Wheeler
On Jun 20, 2024, at 13:12, Robert Haas  wrote:

> I'm working to start a mentoring program where code contributors can
> be mentored by current committers. Applications are now open:
> https://forms.gle/dgjmdxtHYXCSg6aB7

This is amazing! Thank you for putting it together, Robert.

Best,

David





Re: Extension security improvement: Add support for extensions with an owned schema

2024-06-20 Thread David E. Wheeler
On Jun 19, 2024, at 13:50, Jelte Fennema-Nio  wrote:

> This indeed does sound like the behaviour that pretty much every
> existing extension wants to have. One small addition/clarification
> that I would make to your definition: fully qualified references to
> other objects should still be allowed.

Would be tricky for referring to objects from other extensions  with no defined 
schema, or are relatable.

> 1. To have a safe search_path that can be used in SET search_path on a
> function (see also [1]).
> 2. To make it easy for extension authors to avoid conflicts with other
> extensions/UDFs.

These would indeed be nice improvements IMO.

Best,

David






Re: Extension security improvement: Add support for extensions with an owned schema

2024-06-20 Thread David E. Wheeler
On Jun 19, 2024, at 11:28, Robert Haas  wrote:

> But I wonder if there might also be another possible approach: could
> we, somehow, prevent object references in extension scripts from
> resolving to anything other than the system catalogs and the contents
> of that extension? Perhaps with a control file setting to specify a
> list of trusted extensions which we're also allowed to reference?

It would also have to allow access to other extensions it depends upon.

D





Re: jsonpath Time and Timestamp Special Cases

2024-06-20 Thread David E. Wheeler
On Apr 29, 2024, at 20:45, David E. Wheeler  wrote:

> I noticed that the jsonpath date/time functions (.time() and timestamp(), et 
> al.) don’t support some valid but special-case PostgreSQL values, notably 
> `infinity`, `-infinity`, and, for times, '24:00:00`:

Looking at ECMA-404[1], “The JSON data interchange syntax”, it says, of numbers:

> Numeric values that cannot be represented as sequences of digits (such as 
> Infinity and NaN) are not
permitted.

So it makes sense that the JSON path standard would be the same, since such 
JSON explicitly cannot represent these values as numbers.

Still not sure about `24:00:00` as a time, though. I presume the jsonpath 
standard disallows it.

Best,

David

[1]: 
https://ecma-international.org/wp-content/uploads/ECMA-404_2nd_edition_december_2017.pdf





Re: jsonpath: Missing regex_like && starts with Errors?

2024-06-17 Thread David E. Wheeler
On Jun 17, 2024, at 6:44 PM, Chapman Flack  wrote:

> The current implementation seems to have made each of our
> s responsible for swallowing its own errors, which
> is one perfectly cromulent way to satisfy the SQL standard behavior saying
> all errors within a  should be swallowed.

Naw, executePredicate does it for all of them, as for the left operand here[1].

> The standard says nothing on how they should behave outside of a
> , because as far as the standard's concerned,
> they can't appear there.
> 
> Ours currently behave the same way, and swallow their errors.

Yes, and they’re handled consistently, at least.

> It would have been possible to write them in such a way as to raise errors,
> but not when inside a , and that would also satisfy
> the standard, but it would also give us the errors you would like from our
> nonstandard "predicate check expressions". And then you could easily use
> silent => true if you wanted them silent.

I’m okay without the errors, as long as the behaviors are consistent. I mean it 
might be cool to have a way to get them, but the consistency I thought I saw 
was the bit that seemed like a bug.

> I'd be leery of changing that, though, as we've already documented that
> a "predicate check expression" returns true, false, or unknown, so having
> it throw by default seems like a change of documented behavior.

Right, same for using jsonb_path_match().

> The current situation can't make much use of 'silent', since it's already
> false by default; you can't make it any falser to make predicate-check
> errors show up.

EXTREAMLY FALSE! 

> Would it be a thinkable thought to change the 'silent' default to null?
> That could have the same effect as false for SQL standard expressions, and
> the same effect seen now for "predicate check expressions", and you could
> pass it explicitly false if you wanted errors from the predicate checks.

Thaat seems like it’d be confusing TBH.

> If that's no good, I don't see an obvious solution other than adding
> another nonstandard construct to what's nonstandard already, and allowing
> something like nonsilent(predicate check expression).

The only options I can think of are a GUC to turn on SUPER STRICT mode or 
something (yuck, action at a distance) or introduce new functions with the new 
behavior. I advocate for neither (at this point).

Best,

David

[1]: 
https://github.com/postgres/postgres/blob/82ed67a/src/backend/utils/adt/jsonpath_exec.c#L2058-L2059





Re: Proposal: Document ABI Compatibility

2024-06-17 Thread David E. Wheeler
On Jun 12, 2024, at 11:57, Peter Geoghegan  wrote:

> That having been said, it would be useful if there was a community web
> resource for this -- something akin to coverage.postgresql.org, but
> with differential ABI breakage reports. You can see an example report
> here:
> 
> https://postgr.es/m/CAH2-Wzm-W6hSn71sUkz0Rem=qdeu7tnfmc7_jg2djrlfef_...@mail.gmail.com
> 
> Theoretically anybody can do this themselves. In practice they don't.
> So something as simple as providing automated reports about ABI
> changes might well move the needle here.

What would be required to make such a thing? Maybe it’d make a good Summer of 
Code project.

Best,

David





Re: Proposal: Document ABI Compatibility

2024-06-17 Thread David E. Wheeler
On Jun 12, 2024, at 11:30, Peter Geoghegan  wrote:

> I'm a little surprised that we don't seem to have all that many
> problems with ABI breakage, though. Although we theoretically have a
> huge number of APIs that extension authors might choose to use, that
> isn't really true in practical terms. The universe of theoretically
> possible problems is vastly larger than the areas where we see
> problems in practice. You have to be pragmatic about it.

Things go wrong far less often than one might fear! Given this relative 
stability, I think it’s reasonable to document what heretofore assumed the 
policy is so that the fears can largely be put to rest by clear expectations.

Best,

David





Re: Proposal: Document ABI Compatibility

2024-06-17 Thread David E. Wheeler
On Jun 12, 2024, at 11:20, Andres Freund  wrote:

>> The PostgreSQL core team maintains two application binary interface (ABI) 
>> guarantees: one for major releases and one for minor releases.
> 
> I.e. for major versions it's "there is none"?

Is it? ISTM that there is the intention not to break things that don’t need to 
be broken, though that doesn’t rule out interface improvements.

>> In such cases the incompatible changes will be listed in the Release Notes.
> 
> I don't think we actually exhaustively list all of them.

Should they be? I can maybe see the argument not to for major releases. But 
I’ve also has the experience of a new failure on a major release and having to 
go find the reason for it and the fix, often requiring the attention of someone 
on a mailing list who might rather tap the “compatibility changes” sign in the 
latest change log. :-)

> s/every/a reasonable/ or just s/every/an/

✅

> The padding case doesn't affect sizeof() fwiw.

✅

> I think there's too often not an alternative to using sizeof(), potentially
> indirectly (via makeNode() or such. So this sounds a bit too general.

Is there some other way to avoid the issue? Or would constructor APIs need to 
be added to core?


Updated with your suggestions:


``` md

ABI Policy
==

The PostgreSQL core team maintains two application binary interface (ABI) 
guarantees: one for major releases and one for minor releases.

Major Releases
--

Applications that use the PostgreSQL APIs must be compiled for each major 
release supported by the application. The inclusion of `PG_MODULE_MAGIC` 
ensures that code compiled for one major version will rejected by other major 
versions.

Furthermore, new releases may make API changes that require code changes. Use 
the `PG_VERSION_NUM` constant to adjust code in a backwards compatible way:

``` c
#if PG_VERSION_NUM >= 16
#include "varatt.h"
#endif
```

PostgreSQL avoids unnecessary API changes in major releases, but usually ships 
a few necessary API changes, including deprecation, renaming, and argument 
variation. In such cases the incompatible changes will be listed in the Release 
Notes.

Minor Releases
--

PostgreSQL makes an effort to avoid both API and ABI breaks in minor releases. 
In general, an application compiled against any minor release will work with 
any other minor release, past or future.

When a change *is* required, PostgreSQL will choose the least invasive way 
possible, for example by squeezing a new field into padding space or appending 
it to the end of a struct. This sort of change should not impact dependent 
applications unless, they use `sizeof(the struct)` on a struct with an appended 
field, or create their own instances of such structs --- patterns best avoided.

In rare cases, however, even such non-invasive changes may be impractical or 
impossible. In such an event, the change will be documented in the Release 
Notes, and details on the issue will also be posted to [TBD; mail list? Blog 
post? News item?].

The project strongly recommends that developers adopt continuous integration 
testing at least for the latest minor release all major versions of Postgres 
they support.
```

Best,

David





Re: jsonpath: Missing regex_like && starts with Errors?

2024-06-17 Thread David E. Wheeler
On Jun 16, 2024, at 11:52, David E. Wheeler  wrote:

> I think that’s how it should be; I prefer that it raises errors by default 
> but you can silence them:
> 
> david=# select jsonb_path_query(target => '{"x": "hi"}', path => 
> '$.integer()', silent => false);
> ERROR:  jsonpath item method .integer() can only be applied to a string or 
> numeric value
> 
> david=# select jsonb_path_query(target => '{"x": "hi"}', path => 
> '$.integer()', silent => true);
> jsonb_path_query 
> --
> (0 rows)
> 
> I suggest that the same behavior be adopted for `like_regex` and `starts 
> with`.

Okay, I think I’ve figured this out, and the key is that I am, once again, 
comparing predicate path queries to SQL standard queries. If I update the first 
example to use a comparison I no longer get an error:

david=# select jsonb_path_query('{"x": "hi"}', '$.integer() == 1');
 jsonb_path_query 
--
 null

So I think that’s the key: There’s not a difference between the behavior of 
`like_regex` and `starts with` vs other predicate expressions.

This dichotomy continues to annoy. I would very much like some way to have 
jsonb_path_query() raise an error (or even a warning!) if passed a predate 
expression, and for jsonb_path_match() to raise an error or warning if its path 
is not a predicate expression. Because I keep confusing TF out of myself.

Best,

David





Re: FYI: LLVM Runtime Crash

2024-06-17 Thread David E. Wheeler
On Jun 17, 2024, at 16:37, Andres Freund  wrote:

> I suspect the issue might be that the version of clang and LLVM are diverging
> too far. Does it work if you pass CLANG=/opt/homebrew/opt/llvm/bin/clang to
> configure?

It does! It didn’t occur to me this would be the issue, but I presumes 
/usr/bin/clang is not compatible with the latest LLVM installed from Homebrew. 
Interesting! I’ll update that issue.

Thanks,

David





Re: FYI: LLVM Runtime Crash

2024-06-17 Thread David E. Wheeler
On Jun 17, 2024, at 11:52, David E. Wheeler  wrote:

> I don’t *think* it’s something that can be fixed in Postgres core. This is 
> mostly in FYI in case anyone else runs into this issue or knows something 
> more about it.

Okay, a response to the issue[1] says the bug is in Postgres:

> The error message is LLVM reporting the backend can't handle the particular 
> form of "probe-stack" attribute in the input LLVM IR. So this is likely a bug 
> in the way postgres is generating LLVM IR: please file a bug against 
> Postgres. (Feel free to reopen if you have some reason to believe the issue 
> is on the LLVM side.)

Would it be best for me to send a report to pgsql-bugs?

Best,

David

[1]: https://github.com/llvm/llvm-project/issues/95804#issuecomment-2174310977





FYI: LLVM Runtime Crash

2024-06-17 Thread David E. Wheeler
Hackers,

FYI, I wanted to try using PostgreSQL with LLVM on my Mac, but the backend 
repeatedly crashes during `make check`. I found the same issue in master and 
REL_16_STABLE. The crash message is:

FATAL:  fatal llvm error: Unsupported stack probing method
server closed the connection unexpectedly
This probably means the server terminated abnormally
before or while processing the request.
connection to server was lost

Same thing appears in the log output. I poked around and found a similar bug[1] 
was fixed in LLVM in December, but as I’m getting it with the latest release 
from 2 weeks ago, something else might be going on. I’ve opened a new LLVM bug 
report[2] for the issue.

I don’t *think* it’s something that can be fixed in Postgres core. This is 
mostly in FYI in case anyone else runs into this issue or knows something more 
about it.

In the meantime I’ll be building without --with-llvm.

Best,

David

[1]: https://github.com/llvm/llvm-project/issues/57171
[2]: https://github.com/llvm/llvm-project/issues/95804





Re: jsonpath: Missing regex_like && starts with Errors?

2024-06-16 Thread David E. Wheeler
On Jun 15, 2024, at 12:23, Chapman Flack  wrote:

> I see. Yes, that documentation now says "predicate check expressions return
> the single three-valued result of the predicate: true, false, or unknown".

It has been there since jsonpath was introduced in v12[1]:

> A path expression can be a Boolean predicate, although the SQL/JSON standard 
> allows predicates only in filters. This is necessary for implementation of 
> the @@ operator. For example, the following jsonpath expression is valid in 
> PostgreSQL:
> 
> '$.track.segments[*].HR < 70'



> (Aside: are all readers of the docs assumed to have learned the habit
> of calling SQL null "unknown" when speaking of a boolean? They can flip
> back to 8.6 Boolean Type and see 'a third state, “unknown”, which is
> represented by the SQL null value'. But would it save them some page
> flipping to add " (represented by SQL null)" to the sentence here?)

In 9.16.2[2] it says:

> The unknown value plays the same role as SQL NULL and can be tested for with 
> the is unknown predicate.

> As Unknown is typically what the predicates return within a filter (where
> errors get trapped) when an error has occurred, the existing docs seem to
> suggest they behave the same way in a "predicate check expression", so a
> change to that behavior now would be a change to what we've documented.

It’s reasonable to ask, then, whether `starts with` and `like_regex` are 
correct and the others shouldn’t throw errors in predicate check expressions, 
yes. I don’t know the answer, but would like it to be consistent.

> Can't really overload jsonb_path_query's 'silent' parameter for that,
> because it is already false by default. If predicate check expressions
> were nonsilent by default, the existing 'silent' parameter would be a
> perfect way to silence them.

I think that’s how it should be; I prefer that it raises errors by default but 
you can silence them:

david=# select jsonb_path_query(target => '{"x": "hi"}', path => '$.integer()', 
silent => false);
ERROR:  jsonpath item method .integer() can only be applied to a string or 
numeric value

david=# select jsonb_path_query(target => '{"x": "hi"}', path => '$.integer()', 
silent => true);
 jsonb_path_query 
--
(0 rows)

I suggest that the same behavior be adopted for `like_regex` and `starts with`.

> No appetite to add yet another optional boolean parameter to
> jsonb_path_query for the sole purpose of controlling the silence of
> our nonstandard syntax extension 

You don’t need it IMO, the existing silent parameter already does it existing 
error-raising operators.

Best,

David

[1]: 
https://www.postgresql.org/docs/12/functions-json.html#FUNCTIONS-SQLJSON-PATH
[2]: 
https://www.postgresql.org/docs/devel/functions-json.html#FUNCTIONS-SQLJSON-PATH






Re: Shouldn't jsonpath .string() Unwrap?

2024-06-15 Thread David E. Wheeler
On Jun 15, 2024, at 12:48, Andrew Dunstan  wrote:

> Not really needed, I will commit shortly.

Ah, okay, I wasn’t sure so just defaulted to making sure it was tracked. :-)

Thanks Andrew,

D





Re: Shouldn't jsonpath .string() Unwrap?

2024-06-15 Thread David E. Wheeler
On Jun 15, 2024, at 10:39, David E. Wheeler  wrote:

>> The changes are straightforward and look good to me. However, I have kept 
>> the existing test of an empty array as is, assuming that it is one of the 
>> valid tests. It now returns zero rows instead of an error. Your added test 
>> case covers this issue.
> 
> Makes sense, thank you.

Added https://commitfest.postgresql.org/48/5039/.

D





Re: jsonpath: Missing regex_like && starts with Errors?

2024-06-15 Thread David E. Wheeler
On Jun 14, 2024, at 22:29, Chapman Flack  wrote:

> So I should go look at our code to see what grammar we've implemented,
> exactly. It is beginning to seem as if we have simply added
>  as another choice for an expression, not restricted
> to only appearing in a filter. If so, and we add documentation about how
> we diverge from the standard, that's probably the way to say it.

Yes, if I understand correctly, these are predicate check expressions, 
supported and documented as an extension to the standard since Postgres 12[1]. 
I found their behavior quite confusing for a while, and spent some time 
figuring it out and submitting a doc patch (committed in 7014c9a[2]) to 
hopefully clarify things in Postgres 17.

> So that's where the errors went.

Ah, great, that explains the error suppression in filters. Thank you. I still 
think the supression of `like_regex` and `starts with` errors in predicate path 
queries is odd, though.

> The question of what should happen to the errors when a
>  appears outside of a 
> of course isn't answered in the standard, because that's not supposed
> to be possible. So if we're allowing predicates to appear on their own
> as expressions, it's also up to us to say what should happen with errors
> when they do.

Right, and I think there’s an inconsistency right now.

Best,

David

[1]: 
https://www.postgresql.org/docs/devel/functions-json.html#FUNCTIONS-SQLJSON-CHECK-EXPRESSIONS
[2]: https://github.com/postgres/postgres/commit/7014c9a





Re: Shouldn't jsonpath .string() Unwrap?

2024-06-15 Thread David E. Wheeler
On Jun 15, 2024, at 10:27, Jeevan Chalke  wrote:

> Sorry, I have missed this in the original patch. I am surprised how that 
> happened. But thanks for catching the same and fixing it.

No worries. :-)

> The changes are straightforward and look good to me. However, I have kept the 
> existing test of an empty array as is, assuming that it is one of the valid 
> tests. It now returns zero rows instead of an error. Your added test case 
> covers this issue.

Makes sense, thank you.

D





jsonpath: Missing regex_like && starts with Errors?

2024-06-14 Thread David E. Wheeler
Hackers,

I noticed that neither `regex_like` nor `starts with`, the jsonpath operators, 
raise an error when the operand is not a string (or array of strings):

david=# select jsonb_path_query('true', '$ like_regex "^hi"');
 jsonb_path_query 
--
 null
(1 row)

david=# select jsonb_path_query('{"x": "hi"}', '$ starts with "^hi"');
 jsonb_path_query 
--
 null
(1 row)

This is true in strict and lax mode, and with verbosity enabled (as in these 
examples). Most other operators raise an error when they can’t operate on the 
operand:

david=# select jsonb_path_query('{"x": "hi"}', '$.integer()');
ERROR:  jsonpath item method .integer() can only be applied to a string or 
numeric value

david=# select jsonb_path_query('{"x": "hi"}', '$+$');
ERROR:  left operand of jsonpath operator + is not a single numeric value

Should `like_regex` and `starts with` adopt this behavior, too?

I note that filter expressions seem to suppress these sorts of errors, but I 
assume that’s by design:

david=# select jsonb_path_query('{"x": "hi"}', 'strict $ ?(@ starts with 
"^hi")');
 jsonb_path_query 
--
(0 rows)

david=# select jsonb_path_query('{"x": "hi"}', 'strict $ ?(@ like_regex 
"^hi")');
 jsonb_path_query 
--
(0 rows)

david=# select jsonb_path_query('{"x": "hi"}', 'strict $ ?(@.integer() == 1)');
 jsonb_path_query 
--
(0 rows)

D





Re: Shouldn't jsonpath .string() Unwrap?

2024-06-14 Thread David E. Wheeler


> On Jun 14, 2024, at 11:25, Chapman Flack  wrote:
> 
> I would s/extepsions/exceptions/ in the added documentation. :)

Bah, fixed and attached, thanks.

> Offhand (as GitHub PRs aren't really The PG Way), if they were The Way,
> I would find this one a little hard to follow, being based at a point
> 28 unrelated commits ahead of the ref it's opened against. I suspect
> 'master' on theory/postgres could be fast-forwarded to f1affb6 and then
> the PR would look much more approachable.

Yeah, I pushed the PR and branch before I synced master, and GitHub was taking 
a while to notice and update the PR. I fixed it with `git commit --all --amend 
--date now --reedit-message HEAD` and force-pushed (then fixed the typo and 
fixed again).

D



v2-0001-Teach-jsonpath-string-to-unwrap-in-lax-mode.patch
Description: Binary data




Re: Shouldn't jsonpath .string() Unwrap?

2024-06-14 Thread David E. Wheeler
On Jun 13, 2024, at 21:55, Chapman Flack  wrote:

> My opinion is yes, that should be done. 9.46, umm, General
> Rule 11 g ii 6) A) says just "if MODE is lax and  is not
> type or size, then let BASE be Unwrap(BASE)." No special exemption
> there for string(), nor further below at C) XV) for the operation
> of string().

Thank you! Cited that bit in the commit message in the attached patch (also 
available as a GitHub PR[1]).

D

[1]: https://github.com/theory/postgres/pull/5




v1-0001-Teach-jsonpath-string-to-unwrap-in-lax-mode.patch
Description: Binary data


Re: jsonpath: Missing Binary Execution Path?

2024-06-14 Thread David E. Wheeler
On Jun 13, 2024, at 22:31, Chapman Flack  wrote:

> It's baked right into the standard grammar: || can only have a
>  on its right and a 
> on its left.
> 
> && can only have a  on its right and a
>  on its left.

Wow. 

> The case for ! is even more limiting: it can't be applied to anything
> but a . That can be either the exists predicate,
> or, any other  but wrapped in parentheses.
> 
> The language seems sort of gappy in the same way XPath 1.0 was. XPath 2.0
> became much more consistent and conceptually unified, only by that time,
> XML was old school, and JSON was cool, and apparently started inventing
> a path language.

I suppose that’s the reason for this design. But if these sorts of limitations 
were changed in XPath, perhaps SQL-Next could fix them, too.

Thanks for citing the standard; super helpful.

D





Re: jsonpath: Missing Binary Execution Path?

2024-06-13 Thread David E. Wheeler
On Jun 13, 2024, at 21:58, Chapman Flack  wrote:

 david=# select jsonb_path_query('1', '$ >= 1');
>>> 
>>> Good point. I can't either. No way I can see to parse that as
>>> a .
>> 
>> Whether we note it as non-standard or not is an open question then, but it
>> does work and opens up a documentation question.
> 
> Does the fact that it does work raise any potential concern that our
> grammar is nonconformant in some way that could present a headache
> somewhere else, or down the road with a later standard edition?

I believe this case is already covered in the docs as a Postgres-specific 
feature: predicate path expressions.

But even inside filters I don’t understand why &&, ||, at least, currently only 
work if their operands are predicate expressions. Seems weird; and your notes 
above suggest that rule applies only to !, which makes slightly more sense.

D





Re: Shouldn't jsonpath .string() Unwrap?

2024-06-13 Thread David E . Wheeler
On Jun 13, 2024, at 3:53 PM, Andrew Dunstan  wrote:

> Hmm. You might be right. Many of these items have this code, but the string() 
> branch does not:
> if (unwrap && JsonbType(jb) == jbvArray)
>return executeItemUnwrapTargetArray(cxt, jsp, jb, found,
>false);

Exactly, would be pretty easy to add. I can work up a patch this weekend.

D





Re: jsonpath: Missing Binary Execution Path?

2024-06-13 Thread David E. Wheeler
On Jun 13, 2024, at 3:33 PM, Andrew Dunstan  wrote:

> What does the spec say about these? What do other implementations do?

Paging Mr. Eisentraut!

:-)

D






Re: jsonpath: Missing Binary Execution Path?

2024-06-13 Thread David E. Wheeler
On Jun 13, 2024, at 11:32, David E. Wheeler  wrote:

> Should && and || not also work on scalar operands?

I see the same issue for unary !, too:

david=# select jsonb_path_query('true', '!$');
ERROR:  syntax error at or near "$" of jsonpath input
LINE 1: select jsonb_path_query('true', '!$');
^
david=# select jsonb_path_query('[1, 3, 7]', '$[*] ? (!true)');
ERROR:  syntax error at end of jsonpath input
LINE 1: select jsonb_path_query('[1, 3, 7]', '$[*] ? (!true)');
 ^
david=# select jsonb_path_query('[1, 3, 7]', '$[*] ? (!@.boolean())');
ERROR:  syntax error at or near "@" of jsonpath input
LINE 1: select jsonb_path_query('[1, 3, 7]', '$[*] ? (!@.boolean())'...
 ^

Best,

David





jsonpath: Missing Binary Execution Path?

2024-06-13 Thread David E. Wheeler
Hackers,

Another apparent inconsistency I’ve noticed in jsonpath queries is the 
treatment of the && and || operators: They can’t operate on scalar functions, 
only on other expressions. Some examples:

david=# select jsonb_path_query('true', '$ && $');
ERROR:  syntax error at or near "&&" of jsonpath input
LINE 1: select jsonb_path_query('true', '$ && $');
^
david=# select jsonb_path_query('true', '$.boolean() && $.boolean()');
ERROR:  syntax error at or near "&&" of jsonpath input
LINE 1: select jsonb_path_query('true', '$.boolean() && $.boolean()'...
^
The only place I’ve seen them work is inside filters with binary or unary 
operands:

jsonb_path_query('[1, 3, 7]', '$[*] ? (@ > 1 && @ < 5)');
 jsonb_path_query 
--
 3

It doesn’t even work with boolean methods!

david=# select jsonb_path_query('[1, 3, 7]', '$[*] ? (@.boolean() && 
@.boolean())');
ERROR:  syntax error at or near "&&" of jsonpath input
LINE 1: select jsonb_path_query('[1, 3, 7]', '$[*] ? (@.boolean() &&...
 ^
Other binary operators work just fine in these sorts of contexts:

david=# select jsonb_path_query('1', '$ >= 1');
 jsonb_path_query 
--
 true
(1 row)

david=# select jsonb_path_query('[1, 3, 7]', '$[*] ? (@ > 1)');
 jsonb_path_query 
--
 3
 7
(2 rows)

Should && and || not also work on scalar operands?

Best,

David






Re: Shouldn't jsonpath .string() Unwrap?

2024-06-12 Thread David E. Wheeler
On Jun 12, 2024, at 4:02 PM, David G. Johnston  
wrote:

> Adding Andrew.

Thank you.

> I'm willing to call this an open item against this feature as I don't see any 
> documentation explaining that string() behaves differently than the others.

Maybe there’s some wording in the standard on this topic?

I’m happy to provide a patch to auto-unwrap .string() in lax mode. Seems pretty 
straightforward.

D





Re: Proposal: Document ABI Compatibility

2024-06-12 Thread David E. Wheeler
On Jun 12, 2024, at 10:47, Robert Haas  wrote:

> What I think would be useful to document is our usual practices e.g.
> adding new struct members at the end of structs, trying to avoid
> changing public function signatures. If we document promises to
> extension authors, I don't know how much difference that will make:
> we'll probably end up needing to violate them at some point for one
> reason or another.

I think that’s fine if there is some sort of notification process. The policy I 
drafted upthread starts with making sure the such a break is mentioned in the 
release notes.

> But if we document what committers should do, then
> we might do better than we're now, because committers will be more
> likely to do it right, and extension authors can also read those
> instructions to understand what our practices are.

Yes, this, thank you!

D





Re: Proposal: Document ABI Compatibility

2024-06-12 Thread David E. Wheeler
On Jun 12, 2024, at 8:58 AM, Jelte Fennema-Nio  wrote:

> While not strictly an ABI break I guess, the backport of 32d5a4974c81
> broke building Citus against 13.10 and 14.7[1].
> 
> [1]: https://github.com/citusdata/citus/pull/6711

Interesting one. We might want to advise projects to use deferent names if they 
copy code from the core, use an extension-specific prefix perhaps. That way if 
it gets backported by the core, as in this example, it won’t break anything, 
and the extension can choose to switch.

D





Re: Proposal: Document ABI Compatibility

2024-06-12 Thread David E. Wheeler
On Jun 12, 2024, at 8:43 AM, Peter Eisentraut  wrote:

>> Right, it’s just that extension authors could use some notification that 
>> such a change is coming so they can update their code, if necessary.
> 
> I think since around 6 years ago we have been much more vigilant about 
> avoiding ABI breaks.  So if there aren't any more recent examples of 
> breakage, then maybe that was ultimately successful, and the upshot is, 
> continue to be vigilant at about the same level?

That sounds great to me. I’d like to get it documented, though, so that 
extension and other third party developers are aware of it, and not just making 
wild guesses and scaring each other over (perhaps) misconceptions.

D





Re: Proposal: Document ABI Compatibility

2024-06-11 Thread David E. Wheeler
On Jun 10, 2024, at 15:39, Andres Freund  wrote:

> That's 6 years ago, not sure we can really learn that much from that.
> 
> And it's not like it's actually impossible, #ifdefs aren't great, but they are
> better than nothing.

Right, it’s just that extension authors could use some notification that such a 
change is coming so they can update their code, if necessary.

>> Or, to David C’s point, perhaps it would be better to say there are some 
>> categories of APIs that are not subject to any guarantees in minor releases?
> 
> I'm honestly very dubious that this is a good point to introduce a bunch of
> formalism. It's a already a lot of work to maintain them, if we make it even
> harder we'll end up more fixes not being backported, because it's not worth
> the pain.

Well it’s a matter of distributing the work. I don’t want to increase anyone’s 
workload unnecessarily, but as it is stuff like this can be surprising to 
extension maintainers with some expectation of minor release stability who had 
no warning of the change. That kind of thing can dissuade some people from 
deciding to write or maintain extensions, and lead others to recompile and 
distribute binaries for every single minor release.

> To be blunt, the number of examples raised here doesn't seem to indicate that
> this is an area where we need to invest additional resources. We are already
> severely constrained as a project by committer bandwidth, there are plenty
> other things that seem more important to focus on.

So my question is, what’s the least onerous thing for committers to commit to 
doing that we can write down to properly set expectations? That’s where I want 
to start: can we publish a policy that reflects what committers already adhere 
to? And is there some way to let people know that an incompatible change is 
being released? Even if it just starts out in the release notes?

Based on this thread, I’ve drafted the sort of policy I have in mind. Please 
don’t assume I’m advocating for exactly the wording here! Let’s workshop this 
until it’s something the committers and core team can agree to. (At that point 
I’ll turn it into a doc patch) Have a look and let me know what you think.

``` md

ABI Policy
==

The PostgreSQL core team maintains two application binary interface (ABI) 
guarantees: one for major releases and one for minor releases.

Major Releases
--

Applications that use the PostgreSQL APIs must be compiled for each major 
release supported by the application. The inclusion of `PG_MODULE_MAGIC` 
ensures that code compiled for one major version will rejected by other major 
versions.

Furthermore, new releases may make API changes that require code changes. Use 
the `PG_VERSION_NUM` constant to adjust code in a backwards compatible way:

``` c
#if PG_VERSION_NUM >= 16
#include "varatt.h"
#endif
```

PostgreSQL avoids unnecessary API changes in major releases, but usually ships 
a few necessary API changes, including deprecation, renaming, and argument 
variation. In such cases the incompatible changes will be listed in the Release 
Notes.

Minor Releases
--

PostgreSQL makes every effort to avoid both API and ABI breaks in minor 
releases. In general, an application compiled against any minor release will 
work with any other minor release, past or future.

When a change *is* required, PostgreSQL will choose the least invasive way 
possible, for example by squeezing a new field into padding space or appending 
it to the end of a struct. This sort of change should not impact dependent 
applications unless they use `sizeof(the struct)` or create their own instances 
of such structs --- patterns best avoided.

In rare cases, however, even such non-invasive changes may be impractical or 
impossible. In such an event, the change will be documented in the Release 
Notes, and details on the issue will also be posted to [TBD; mail list? Blog 
post? News item?].

The project strongly recommends that developers adopt continuous integration 
testing at least for the latest minor release all major versions of Postgres 
they support.
```

Best,

David





Re: Proposal: Document ABI Compatibility

2024-06-10 Thread David E. Wheeler
On Jun 4, 2024, at 03:18, Peter Eisentraut  wrote:

> This could possibly be avoided by renaming the symbol in backbranches. Maybe 
> something like
> 
> #define InitResultRelInfo InitResultRelInfo2
> 
> Then you'd get a specific error message when loading the module, rather than 
> a crash.

That sounds more useful, yes. Is that a practice the project would consider 
adopting?

There’s also oracle_fdw@d137d15[1], which says:

> An API break in PostgreSQL 10.4 and 9.6.9 makes it impossible
> to use these versions: the "extract_actual_join_clauses" function
> gained an additional parameter.

The 10.4 commit is 68fab04, and it does indeed add a new function:

``` patch
--- a/src/include/optimizer/restrictinfo.h
+++ b/src/include/optimizer/restrictinfo.h
@@ -36,6 +36,7 @@ extern List *get_actual_clauses(List *restrictinfo_list);
 extern List *extract_actual_clauses(List *restrictinfo_list,
   bool pseudoconstant);
 extern void extract_actual_join_clauses(List *restrictinfo_list,
+   Relids joinrelids,
List **joinquals,
List **otherquals);
 extern bool join_clause_is_movable_to(RestrictInfo *rinfo, RelOptInfo 
*baserel);
```

I wonder if that sort of change could be avoided in backpatches, maybe by 
adding and using a `extract_actual_join_clauses_compat` function and using that 
internally instead?

Or, to David C’s point, perhaps it would be better to say there are some 
categories of APIs that are not subject to any guarantees in minor releases?

Best,

David

[1]: 
https://github.com/laurenz/oracle_fdw/commit/d137d15edca8c67df1e5a01f417f4833b028
[2]: 
https://github.com/postgres/postgres/commit/68fab04f7c2a07c5308e3d2957198ccd7a80ebc5#diff-bb6fa74cb115e19684092f0938131cd5d99b26fa2d49480f7ea7f28e937a7fb4






Shouldn't jsonpath .string() Unwrap?

2024-06-08 Thread David E. Wheeler
Hackers,

Most of the jsonpath methods auto-unwrap in lax mode:

david=# select jsonb_path_query('[-2,5]', '$.abs()');
 jsonb_path_query
--
 2
 5
(2 rows)

The obvious exceptions are size() and type(), which apply directly to arrays, 
so no need to unwrap:

david=# select jsonb_path_query('[-2,5]', '$.size()');
 jsonb_path_query
--
 2
(1 row)

david=# select jsonb_path_query('[-2,5]', '$.type()');
 jsonb_path_query
--
 "array"

But what about string()? Is there some reason it doesn’t unwrap?

david=# select jsonb_path_query('[-2,5]', '$.string()');
ERROR:  jsonpath item method .string() can only be applied to a bool, string, 
numeric, or datetime value

What I expect:

david=# select jsonb_path_query('[-2,5]', '$.string()');
 jsonb_path_query
—
 "2"
 "5"
(2 rows)

However, I do see a test[1] for this behavior, so maybe there’s a reason for it?

Best,

David

[1]: 
https://github.com/postgres/postgres/blob/REL_17_BETA1/src/test/regress/expected/jsonb_jsonpath.out#L2527-L2530





Re: Patch bug: Fix jsonpath .* on Arrays

2024-06-07 Thread David E. Wheeler
On Jun 7, 2024, at 10:23, David E. Wheeler  wrote:

> Rebased and moved the new tests to the end of the file.

Bah, sorry, that was the previous patch. Here’s v3.

D



v3-0001-Add-tests-for-jsonpath-.-on-arrays.patch
Description: Binary data


Re: Patch bug: Fix jsonpath .* on Arrays

2024-06-07 Thread David E. Wheeler
On Jun 4, 2024, at 20:45, David E. Wheeler  wrote:

> Oh FFS, unwrapping still breaks my brain. You’re right, of course. Here’s a 
> new patch that demonstrates that behavior, since that code path is not 
> currently represented in tests AFAICT (I would have expected to have broken 
> it with this patch).

Rebased and moved the new tests to the end of the file.

D


v2-0001-Add-tests-for-jsonpath-.-on-arrays.patch
Description: Binary data


Re: Patch bug: Fix jsonpath .* on Arrays

2024-06-05 Thread David E. Wheeler
On Jun 4, 2024, at 20:45, David E. Wheeler  wrote:

> Here’s a new patch that demonstrates that behavior, since that code path is 
> not currently represented in tests AFAICT (I would have expected to have 
> broken it with this patch).

Commitfest link:

  https://commitfest.postgresql.org/48/5017/

D





Re: Patch bug: Fix jsonpath .* on Arrays

2024-06-04 Thread David E . Wheeler
On Jun 4, 2024, at 12:28 PM, David G. Johnston  
wrote:

> This seems to be working correctly. Lax mode causes the first array level to 
> unwrap and produce new context item values.  Then the wildcard member 
> accessor is applied to each.  Numbers don’t have members so no matches exist 
> in the first example.  The object in the second indeed has a single member 
> and so matches the wildcard and its value, the array, is returned.

Oh FFS, unwrapping still breaks my brain. You’re right, of course. Here’s a new 
patch that demonstrates that behavior, since that code path is not currently 
represented in tests AFAICT (I would have expected to have broken it with this 
patch).

D



0001-Add-tests-for-jsonpath-.-on-arrays.patch
Description: Binary data


Patch bug: Fix jsonpath .* on Arrays

2024-06-04 Thread David E. Wheeler
Hackers,

The behavior of the .* jpiAnyKey jsonpath selector seems incorrect.

```
select jsonb_path_query('[1,2,3]', '$.*');
jsonb_path_query 
--
(0 rows)

select jsonb_path_query('[1,2,3,{"b": [3,4,5]}]', '$.*');
jsonb_path_query 
--
[3, 4, 5]
```

The first example might be expected, since .* is intended for object keys, but 
the handing of `jpiAnyKey` has a branch for unwrapping arrays. The second 
example, however, just seems weird: this is .*, not .**.

The attached patch fixes it by passing the next node to `executeAnyItem()` (via 
`executeItemUnwrapTargetArray()`) and then properly setting `jperOk` when 
`executeAnyItem()` finds values and there is no current (next) node.

I took this approach given what appears to be the intended behavior or $* on 
arrays in lax mode. However, I could see an argument that .* should not apply 
to arrays at all. If so, I can submit a new patch removing the branch that 
unwraps an array with .*.

Best,

David



0001-Fix-behavior-of-jsonpath-.-on-arrays.patch
Description: Binary data


Re: Proposal: Document ABI Compatibility

2024-06-03 Thread David E. Wheeler
On Jun 3, 2024, at 5:56 PM, Andres Freund  wrote:

> I don't see how this would trigger random crashes.
> 
> Unfortunately [4] doesn't seem to take me to a relevant message (pruned chat
> history?), so I can't infer more from that context.

You can use [4] to join the Slack (if you haven’t already) and [5] for the 
relevant post.

> Regardless of ABI issues, it's probably a good idea to continually run tests
> against in-development minor versions, just to prevent something breaking from
> creeping in. IIRC there were a handful of cases where we accidentally broke
> some extension, because they relied on some implementation details.

Oh yeah, I run regular tests against the latest minor release of all supported 
Postgres version for my extensions, using pgxn-tools[6], which looks like 
this[7]. Which I consider absolutely essential. But it doesn’t mean that 
something compiled against .4 will work with .3 and vice versa. That’s what we 
could use the guidance/guarantees on.

>> Sure, probably not a problem, but if that’s the sole qualifier for making
>> binary changes, I think it’s worth saying, as opposed to “we don’t make
>> any”. Something like “Only changes to padding, which you never used anyway,
>> right?” :-)
> 
> IDK, to me something like this seems to promise more than we actually can.

What I’d like to do is figure out exactly what we *can* promise and perhaps 
some guidelines, and start with that.

Best,

David

[4]: https://pgtreats.info/slack-invite
[5]: 
https://postgresteam.slack.com/archives/C056ZA93H1A/p1716502630690559?thread_ts=1716500801.036709=C056ZA93H1A
[6]: https://github.com/pgxn/docker-pgxn-tools
[7]: https://github.com/pgxn/docker-pgxn-tools/actions/runs/9351752462





Re: Proposal: Document ABI Compatibility

2024-06-03 Thread David E. Wheeler
On Jun 3, 2024, at 14:58, Andres Freund  wrote:

> Hi,

Hello Andres.

> Are there notes for the session?

Yes, but not posted yet. Here’s what Andreas 'ads' Scherbaum sent me for that 
bit of the conversation:

*   Core is focused on core ABI stability
*   David: No "statement of stability" in Core
*   David/Jeremy/Tom: coding guidelines, style guidelines
*useful to have docs in core about what's stable and what's not, what 
you should compile against or not, and ABI guarantees
*   Abigale: there are hooks, but no overall concept for extensions
*   Tom: Peter Eisentraut is working on tests for extensions stability
*   Jeremy: nothing is preventing people from installing incompatible versions


> It'd be nice if the slides for the talk could be uploaded...

He also talked about it at Mini-Summit Five[1], for which I posted slides[2] 
(slides 11-14) and video[3] (starting at 29:08).

> I don't think we can really make this a hard guarantee. Yes, we try hard to
> avoid ABI breaks, but there IIRC have been a few cases over the years where
> that wasn't practical for some reason. If we have to decide between a bad bug
> and causing an ABI issue that's unlikely to affect anybody, we'll probably
> choose the ABI issue.

We can document that too, and perhaps a policy for letting people know. I 
thought I recalled something like this in the past, but Rob Treat did some 
spelunking through change logs and found only CVEs that needed repairs by 
manually running some SQL. So some sense of its rarity would also be useful.

> Thus I am not really on board with this statement as-is.

That’s fine, we should get it to where there’s consensus on the ordering and 
agreement on what, exactly, it should be.

> Extensions in general can do lots of stuff, guaranteeing that bug fixes don't
> cause any problems is just not feasible.
> 
> It'd be interesting to see a few examples of actual minor-version-upgrade
> extension breakages, so we can judge what caused them.

In the community Slack[4], Matthias van de Meent writes[5]:

> Citus’ pg_version_compat.h[7] where it re-implements a low-level function 
> that was newly introduced in PG14.7. If you build against PG14.7+ headers, 
> you may get random crashes when running on 14.6.

I suppose it would work fine on 14.7 if compiled on 14.6 though. I suspect 
there aren’t many examples, though, mostly just a lot of anxiety, and some have 
decided that extensions must be recompiled for every minor release in order to 
avoid the issue. StackGres[7] is one example, but I suspect Omni (Yurii’s 
company) may follow.

> I don't think it's common for such new-fields-in-padding to cause problems
> when using an earlier minor PG version. For that the extension would need to
> actually rely on the presence of the new field, but typically that'd not be
> the case when we introduce a new field in a minor version.

That’s what I was thinking, so “compatibility assured only if you don’t use 
padding yourself” would be super helpful.

>> Unless, that is, we can provide a complete list of things not to do (like
>> make use of padding) to avoid it. Is that feasible?
> 
> You can't really rely on the contents of padding, in general. So I don't think
> this is really something that needs to be called out.

Sure, probably not a problem, but if that’s the sole qualifier for making 
binary changes, I think it’s worth saying, as opposed to “we don’t make any”. 
Something like “Only changes to padding, which you never used anyway, right?” 
:-)

D

[1]: https://justatheory.com/2024/05/mini-summit-five/
[2]: 
https://justatheory.com/shared/extension-ecosystem-summit/omni-universally-buildable-extensions.pdf
[3]: https://youtu.be/R5ijx8IJyaM
[4]: https://pgtreats.info/slack-invite
[5]: 
https://postgresteam.slack.com/archives/C056ZA93H1A/p1716502630690559?thread_ts=1716500801.036709=C056ZA93H1A
[6] 
https://github.com/citusdata/citus/blob/main/src/include/pg_version_compat.h#L236-L248
[7]: https://stackgres.io/doc/latest/intro/extensions/





Proposal: Document ABI Compatibility

2024-06-03 Thread David E. Wheeler
Hackers,

At the PGConf Unconference session on improving extension support in core, we 
talked quite a bit about the recent anxiety among extension developers about a 
lack of an ABI compatibility guarantee in Postgres. Yurii Rashkovskii did a 
little header file spelunking and talked[1] about a few changes in minor 
version releases, including to apparent field order in structs. Jeremy 
Schneider posted the example on Twitter[2], and Peter G replied[3]:

> You must be referring to my commit 714780dc. The new field is stored within 
> alignment padding (though only on back branches). Has this been tied to a 
> known problem?

At the Unconference, Tom Lane said that this approach is pretty well drilled 
into the heads of every committer, and new ones pick it up through experience. 
The goal, IIUC, is to never introduce binary incompatibilities into the C APIs 
in minor releases. This standard would be good to document, to let extension 
developers know exactly what the guarantees are.

I’m happy to start a doc patch to add an ABI compatibility guarantee 
(presumably in xfunc.sgml), but want to be sure the details are right, namely:

* There are no source or binary compatibility guarantees for major releases.

* The ABI is guaranteed to change only in backward compatible ways in minor 
releases. If for some reason it doesn’t it’s a bug that will need to be fixed.

This ensures that an extension compiled against an earlier minor release will 
continue to work without recompilation on later minor releases of the same 
major version.

But if I understand correctly, the use of techniques like adding a new field in 
padding does not mean that extensions compiled on later minor releases will 
work on earlier minor releases of the same major version. Unless, that is, we 
can provide a complete list of things not to do (like make use of padding) to 
avoid it. Is that feasible?

Are there other details it should include?

Thanks,

David

[1]: https://www.pgevents.ca/events/pgconfdev2024/schedule/session/14
[2]: https://x.com/jer_s/status/1785717368804815026
[3]: https://x.com/petervgeoghegan/status/1785720228237717627





jsonpath Time and Timestamp Special Cases

2024-04-29 Thread David E. Wheeler
Hello hackers,

I noticed that the jsonpath date/time functions (.time() and timestamp(), et 
al.) don’t support some valid but special-case PostgreSQL values, notably 
`infinity`, `-infinity`, and, for times, '24:00:00`:

❯ psql
psql (17devel)
Type "help" for help.

david=# select jsonb_path_query(to_jsonb('24:00:00'::time), '$.time()');
ERROR:  time format is not recognized: "24:00:00"

david=# select jsonb_path_query(to_jsonb('infinity'::timestamptz), 
'$.timestamp_tz()');
ERROR:  timestamp_tz format is not recognized: "infinity"

I assume this is because the standard doesn’t support these, or references 
JavaScript-only values or some such. Is that right?

Best,

David





Re: New committers: Melanie Plageman, Richard Guo

2024-04-26 Thread David E. Wheeler
On Apr 26, 2024, at 07:54, Jonathan S. Katz  wrote:

> The Core Team would like to extend our congratulations to Melanie Plageman 
> and Richard Guo, who have accepted invitations to become our newest 
> PostgreSQL committers.
> 
> Please join us in wishing them much success and few reverts!

Great news, congratulations!

Best,

David





Re: Q: Escapes in jsonpath Idents

2024-04-24 Thread David E. Wheeler
On Apr 24, 2024, at 3:22 PM, Erik Wienhold  wrote:

> Thanks Peter!  But what is the definition of the entire path expression?
> Perhaps something like:
> 
> ::=  { "."  }
> 
> That would imply that "$.$foo" is a valid path that accesses a variable
> member (but I guess the path evaluation is also specified somewhere).

I read it as “if it starts with a dollar sign, it’s a variable and not a path 
identifier”, and I assume any `.foo` expression is a path identifier.

> What bugs me about this description, after reading it a couple of times,
> is that it's not clear what is meant by ."$varname".  It could mean two
> things: (1) the double-quoting masks $varname in order to not interpret
> those characters as a variable or (2) an interpolated string that
> resolves $varname and yields a dynamic member accessor.

My understanding is that if it’s in double quotes it’s never anything other 
than a string (whether a string literal or a path identifier string literal). 
IOW, variables don’t interpolate inside strings.

> Under case (2) I'd expect that query to return 456 (because $foo
> resolves to "bar").  (Similar to how psql would resolve :'foo' to
> 'bar'.)

Yes, I suspect this is the correct interpretation, but agree the wording could 
use some massaging, especially since path identifiers cannot start with a 
dollar sign anyway. Perhaps:

"If the key name starts with $ or does not meet the JavaScript rules for an 
identifier, it must be enclosed in double quotes to make it a string literal."

> Variables already work in array accessors and table 8.25 says that "The
> specified index can be an integer, as well as an expression returning a
> single numeric value [...]".  A variable is such an expression.
> 
>=> select jsonb_path_query('[2,3,5]', '$[$i]', '{"i":1}');
> jsonb_path_query
>--
> 3
>(1 row)
> 
> So I'd expect a similar behavior for member accessors as well when
> seeing ."$varname" in the same table.

Oh, interesting point! Now I wonder if the standard has this inconsistency (and 
is aware of it).

> Yes, I think so.  That would be case C in the spec excerpt provided by
> Peter.  So it's just a key name that happens to contain (but not start
> with) the dollar sign.

Exactly. It also matches the doc you quote above. Something would have to 
change in src/backend/utils/adt/jsonpath_scan.l to fix that, but that file 
makes my eyes water, so I’m not gonna take a stab at it. :-)

D





Re: Q: Escapes in jsonpath Idents

2024-04-24 Thread David E. Wheeler
On Apr 24, 2024, at 05:46, Peter Eisentraut  wrote:

> I have committed this patch, and backpatched it, as a bug fix, because the 
> existing description was wrong.  To keep the patch minimal for backpatching, 
> I didn't do the conversion to a list.  I'm not sure I like that anyway, 
> because it tends to draw more attention to that part over the surrounding 
> parts, which didn't seem appropriate in this case.  But anyway, if you have 
> any more non-bug-fix editing in this area, which would then target PG18, 
> please send more patches.

Makes sense, that level of detail gets into the weeks so maybe doesn’t need to 
be quite so prominent as a list. Thank you!

David





Re: Q: Escapes in jsonpath Idents

2024-04-24 Thread David E. Wheeler
On Apr 24, 2024, at 05:51, Peter Eisentraut  wrote:

>A  is classified as follows.
> 
>Case:
> 
>a) A  that is a  is acontext variable>.
> 
>b) A  that begins with  is a
>   .
> 
>c) Otherwise, a  is a .
> 
> Does this help?  I wasn't following all the discussion to see if there is 
> anything wrong with the implementation.

Yes, it does, as it ties the special meaning of the dollar sign to the 
*beginning* of an expression. So it makes sense that this would be an error:

david=# select '$.$foo'::jsonpath;
ERROR: syntax error at or near "$foo" of jsonpath input
LINE 1: select '$.$foo'::jsonpath;
   ^

But I’m less sure when a dollar sign is used in the *middle* (or end) of a json 
path identifier:

david=# select '$.xx$foo'::jsonpath;
ERROR:  syntax error at or near "$foo" of jsonpath input
LINE 1: select '$.xx$foo'::jsonpath;
   ^

Perhaps that should be valid?

Best,

David





Re: RFC: Additional Directory for Extensions

2024-04-11 Thread David E. Wheeler
On Apr 4, 2024, at 1:20 PM, David E. Wheeler  wrote:

> I’ve added some docs based on your GUC description; updated patch attached.

Here’s a rebase.

I realize this probably isn’t going to happen for 17, given the freeze, but I 
would very much welcome feedback and pointers to address concerns about 
providing a second directory for extensions and DSOs. Quite a few people have 
talked about the need for this in the Extension Mini Summits[1], so I’m sure I 
could get some collaborators to make improvements or look at a different 
approach.

Best,

David

[1] 
https://justatheory.com/2024/02/extension-ecosystem-summit/#extension-ecosystem-mini-summit




v3-0001-Add-extension_directory-GUC.patch
Description: Binary data




Re: ❓ JSON Path Dot Precedence

2024-04-10 Thread David E. Wheeler
On Apr 10, 2024, at 10:29, Peter Eisentraut  wrote:

> So the whole thing is
> 
>   
> 
> The syntax of  and  is then punted to 
> ECMAScript 5.1.
> 
> 0x2 is a HexIntegerLiteral.  (There can be no dots in that.)
> 
> p10 is an Identifier.
> 
> So I think this is all correct.

That makes sense, thanks. It’s just a little odd to me that the resulting path 
isn’t a query at all. To Erik’s point: what path can `'0x2.p10` even select?

Best,

David





Re: ❓ JSON Path Dot Precedence

2024-04-07 Thread David E. Wheeler
On Apr 7, 2024, at 15:46, Erik Wienhold  wrote:

> I guess jsonpath assumes that hex, octal, and binary literals are
> integers.  So there's no ambiguity about any fractional part that might
> follow.

Yeah, that’s what the comment in the flex file says:

https://github.com/postgres/postgres/blob/b4a71cf/src/backend/utils/adt/jsonpath_scan.l#L102-L105


> Also, is there even a use case for path "0x2.p10"?  The path has to
> start with "$" or ("@" in case of a filter expression), doesn't it?  And
> it that case it doesn't parse:
> 
>test=# select '$.0x2.p10'::jsonpath;
>ERROR:  trailing junk after numeric literal at or near ".0x" of jsonpath 
> input
>LINE 1: select '$.0x2.p10'::jsonpath;
> 
> Even with extra whitespace:
> 
>test=# select '$ . 0x2 . p10'::jsonpath;
>ERROR:  syntax error at or near "0x2" of jsonpath input
>LINE 1: select '$ . 0x2 . p10'::jsonpath;
> 
> Or should it behave like an array accessor?  Similar to:
> 
>test=# select jsonb_path_query('[0,1,{"p10":42},3]', 
> '$[0x2].p10'::jsonpath);
> jsonb_path_query
>--
> 42
>(1 row)

I too am curious why these parse successfully, but don’t appear to be useful.

Best,

David





❓ JSON Path Dot Precedence

2024-04-07 Thread David E. Wheeler
Hello Hackers,

A question about the behavior of the JSON Path parser. The docs[1] have this to 
say about numbers:

>  Numeric literals in SQL/JSON path expressions follow JavaScript rules, which 
> are different from both SQL and JSON in some minor details. For example, 
> SQL/JSON path allows .1 and 1., which are invalid in JSON.

In other words, this is valid:

david=# select '2.'::jsonpath;
 jsonpath 
--
 2

But this feature creates a bit of a conflict with the use of a dot for path 
expressions. Consider `0x2.p10`. How should that be parsed? As an invalid 
decimal expression ("trailing junk after numeric literal”), or as a valid 
integer 2 followed by the path segment “p10”? Here’s the parser’s answer:

david=# select '0x2.p10'::jsonpath;
 jsonpath  
---
 (2)."p10"

So it would seem that, other things being equal, a path key expression (`.foo`) 
is slightly higher precedence than a decimal expression. Is that 
intentional/correct?

Discovered while writing my Go lexer and throwing all of Go’s floating point 
literal examples[2] at it and comparing to the Postgres path parser. Curiously, 
this is only an issue for 0x/0o/0b numeric expressions; a decimal expression 
does not behave in the same way:

david=# select '2.p10'::jsonpath;
ERROR:  trailing junk after numeric literal at or near "2.p" of jsonpath input
LINE 1: select '2.p10'::jsonpath;

Which maybe seems a bit inconsistent.

Thoughts on what the “correct” behavior should be?

Best,

David

  [1]: 
https://www.postgresql.org/docs/devel/datatype-json.html#DATATYPE-JSONPATH
  [2]: https://tip.golang.org/ref/spec#Floating-point_literals



Re: RFC: Additional Directory for Extensions

2024-04-04 Thread David E. Wheeler
On Apr 3, 2024, at 12:46 PM, Christoph Berg  wrote:

> Thanks for bringing this up, I should have submitted this years ago.
> (The patch is originally from September 2020.)

That’s okay, it’s still 2020 in some ways. 

> I designed the patch to require a superuser to set it, so it doesn't
> matter very much by which mechanism it gets updated. There should be
> little reason to vary it at run-time, so I'd be fine with requiring a
> restart, but otoh, why restrict the superuser from reloading it if
> they know what they are doing?

I think that’s fair. I’ll keep it requiring a restart now, on the theory it 
would be easier to loosen it later than have to tighten it later.

> I'm not sure the concept of "core libraries" exists. PG happens to
> dlopen things at run time, and it doesn't know/care if they were
> installed by users or by the original PG server. Also, an exploited
> libpgtypes library is not worse than any other untrusted "user"
> library, so you really don't want to allow users to provide their own
> .so files, no matter by what mechanism.

Yes, I guess my concern is whether it could be used to “shadow” core libraries. 
Maybe it’s no different, really.

>> This would also allow the lookup of extension libraries prefixed by the 
>> directory field from the control file, which would enable much tidier 
>> extension installation: The control file, SQL scripts, and DSOs could all be 
>> in a single directory for an extension.
> 
> Nice idea, but that would mean installing .so files into PGSHAREDIR.
> Perhaps the whole extension stuff would have to move to PKGLIBDIR
> instead.

Yes, I was just poking around the code, and realized that, when extension 
functions are created they may or may not not use `MODULE_PATHNAME`, but in any 
event, there is nothing different about loading an extension DSO than any other 
DSO. I was hoping to find a path where it knows it’s opening a DSO for the 
purpose of an extension, so we could limit the lookup there. But that does not 
(currently) exist.

Maybe we could add an `$extensiondir` variable to complement `$libdir`?

Or is PGKLIBDIR is the way to go? I’m not familiar with it. It looks like 
extension JIT files are put there already.

> Fwiw, I wrote this patch to solve the problem of testing extensions at
> build-time where the build process does not have write access to
> PGSHAREDIR. It solves that problem quite well, almost all PG
> extensions have build-time test coverage now (where there was
> basically 0 before).

Yeah, good additional use case.

On Apr 3, 2024, at 1:03 PM, Christoph Berg  wrote:

> Also, it's called extension "destdir" because it behaves like DESTDIR
> in Makefiles: It prepends the given path to the path that PG is trying
> to open when set. So it doesn't allow arbitrary new locations as of
> now, just /home/build/foo-1/debian/foo/usr/share/postgresql/17/extension
> in addition to /usr/share/postgresql/17/extension. (That is what the
> Debian package build process needs, so that restriction/design choice
> made sense.

Right, this makes perfect sense, in that you don’t have to copy all the 
extension files from the destdir to the SHAREDIR to test them, which I imagine 
could be a PITA.

> That's also included in the current GUC description:
> 
>   This directory is prepended to paths when loading extensions
>   (control and SQL files), and to the '$libdir' directive when
>   loading modules that back functions. The location is made
>   configurable to allow build-time testing of extensions that do not
>   have been installed to their proper location yet.
> 
> Perhaps I should have included a more verbose "NOT FOR PRODUCTION"
> there.

The use cases I described upthread are very much production use cases. Do you 
think it’s not for production just because we need to really think it through?

I’ve added some docs based on your GUC description; updated patch attached.

Best,

David



v2-0001-Add-extension_directory-GUC.patch
Description: Binary data


Re: RFC: Additional Directory for Extensions

2024-04-03 Thread David E. Wheeler
On Apr 3, 2024, at 8:54 AM, David E. Wheeler  wrote:

> Yes, I like the suggestion to make it require a restart, which lets the 
> sysadmin control it and not limited to whatever the person who compiled it 
> thought would make sense.

Here’s a revision of the Debian patch that requires a server start.

However, in studying the patch, it appears that the `extension_directory` is 
searched for *all* shared libraries, not just those being loaded for an 
extension. Am I reading the `expand_dynamic_library_name()` function right?

If so, this seems like a good way for a bad actor to muck with things, by 
putting an exploited libpgtypes library into the extension directory, where it 
would be loaded in preference to the core libpgtypes library, if they couldn’t 
exploit the original.

I’m thinking it would be better to have the dynamic library lookup for 
extension libraries (and LOAD libraries?) separate, so that the 
`extension_directory` would not be used for core libraries.

This would also allow the lookup of extension libraries prefixed by the 
directory field from the control file, which would enable much tidier extension 
installation: The control file, SQL scripts, and DSOs could all be in a single 
directory for an extension.

Thoughts?

Best,

David




v1-0001-Add-extension_directory-GUC.patch
Description: Binary data


Re: RFC: Additional Directory for Extensions

2024-04-03 Thread David E. Wheeler
On Apr 3, 2024, at 8:54 AM, David E. Wheeler  wrote:

> Yes, I like the suggestion to make it require a restart, which lets the 
> sysadmin control it and not limited to whatever the person who compiled it 
> thought would make sense.

Or SIGHUP?

D



Re: RFC: Additional Directory for Extensions

2024-04-03 Thread David E. Wheeler
On Apr 3, 2024, at 3:57 AM, walt...@technowledgy.de wrote:

> I can also imagine that it would be very helpful in a container setup to be 
> able to set an environment variable with this path instead of having to 
> recompile all of postgres to change it.

Yes, I like the suggestion to make it require a restart, which lets the 
sysadmin control it and not limited to whatever the person who compiled it 
thought would make sense.

Best,

David





RFC: Additional Directory for Extensions

2024-04-02 Thread David E. Wheeler
Hackers,

In the Security lessons from liblzma thread[1], walther broached the subject of 
an extension directory path[1]:

> Also a configurable directoy to look up extensions, possibly even to be 
> changed at run-time like [2]. The patch says this:
> 
>> This directory is prepended to paths when loading extensions (control and 
>> SQL files), and to the '$libdir' directive when loading modules that back 
>> functions. The location is made configurable to allow build-time testing of 
>> extensions that do not have been installed to their proper location yet.
> 
> This seems like a great thing to have. This might also be relevant in 
> light of recent discussions in the ecosystem around extension management.


That quotation comes from this Debian patch[2] maintained by Christoph Berg. 
I’d like to formally propose integrating this patch into the core. And not only 
because it’s overhead for package maintainers like Christoph, but because a 
number of use cases have emerged since we originally discussed something like 
this back in 2013[3]:

Docker Immutability
---

Docker images must be immutable. In order for users of a Docker image to 
install extensions that persist, they must create a persistent volume, map it 
to SHAREDIR/extensions, and copy over all the core extensions (or muck with 
symlink magic[4]). This makes upgrades trickier, because the core extensions 
are mixed in with third party extensions. 

By supporting a second directory pretended to the list of directories to 
search, as the Debian patch does, users of Docker images can keep extensions 
they install separate from core extensions, in a directory mounted to a 
persistent volume with none of the core extensions. Along with tweaking 
dynamic_library_path to support additional directories for shared object 
libraries, which can also be mounted to a separate path, we can have a 
persistent and clean separation of immutable core extensions and extensions 
installed at runtime.

Postgres.app


The Postgres.app project also supports installing extensions. However, because 
they must go into the SHAREDIR/extensions, once a user installs one the package 
has been modified and the Apple bundle signature will be broken. The OS will no 
longer be able to validate that the app is legit.

If the core supported an additional extension (and PKGLIBDIR), it would allow 
an immutable PostgreSQL base package and still allow extensions to be installed 
into directories outside the app bundle, and thus preserve bundle signing on 
macOS (and presumably other systems --- is this the nix issue, too?)

RFC
---

I know there was some objection to changes like this in the past, but the 
support I’m seeing in the liblzma thread for making pkglibdir configurable  me 
optimistic that this might be the right time to support additional 
configuration for the extension directory, as well, starting with the Debian 
patch, perhaps.

Thoughts?

I would be happy to submit a clean version of the Debian patch[2].

Best,

David

[1] 
https://www.postgresql.org/message-id/99c41b46-616e-49d0-9ffd-a29432cec818%40technowledgy.de
[2] 
https://salsa.debian.org/postgresql/postgresql/-/blob/17/debian/patches/extension_destdir?ref_type=heads
[3] https://www.postgresql.org/message-id/flat/51ae0845.8010...@ocharles.org.uk
[4] https://speakerdeck.com/ongres/postgres-extensions-in-kubernetes?slide=14





Re: Security lessons from liblzma

2024-04-01 Thread David E. Wheeler
On Apr 1, 2024, at 06:55, walt...@technowledgy.de wrote:

> Also a configurable directoy to look up extensions, possibly even to be 
> changed at run-time like [2]. The patch says this:
> 
>> This directory is prepended to paths when loading extensions (control and 
>> SQL files), and to the '$libdir' directive when loading modules that back 
>> functions. The location is made configurable to allow build-time testing of 
>> extensions that do not have been installed to their proper location yet.
> 
> This seems like a great thing to have. This might also be relevant in light 
> of recent discussions in the ecosystem around extension management.
> 
> All the path-related issues have in common, that while it's easy to move 
> files around to their proper locations later, they all need to adjust 
> pg_config's output.

Funny timing, I was planning to resurrect this old patch[1] and propose that 
patch this week. One of motivators is the increasing use of Docker images in 
Kubernetes to run Postgres, where there’s a desire to keep the core service and 
extensions immutable, and to have a second directory mounted to a persistent 
volume into which other extensions can be installed and preserved independently 
of the Docker image.

The current approach involves symlinking shenanigans[2] that complicate things 
pretty substantially, making it more difficult to administer. A second 
directory fit for purpose would be far better.

There are some other motivators, so I’ll do some additional diligence and start 
a separate thread (or reply to the original[3]).

Best,

David

[1] https://commitfest.postgresql.org/5/170/
[2] https://speakerdeck.com/ongres/postgres-extensions-in-kubernetes?slide=14
[3] 
https://www.postgresql.org/message-id/flat/51AE0845.8010600%40ocharles.org.uk





Re: Patch: Add parse_type Function

2024-03-20 Thread David E. Wheeler
On Mar 20, 2024, at 17:23, Tom Lane  wrote:

> Pushed with some editorialization.  Mostly, I whacked the
> documentation around pretty heavily: we have a convention for what
> examples in function descriptions should look like, and this wasn't
> it.  Not entirely your fault, since some nearby entries in that
> table hadn't gotten the word either.

Ah, great, and your wording on the parser error issue is much better, thank you!

Best,

David





Re: Q: Escapes in jsonpath Idents

2024-03-19 Thread David E. Wheeler
On Mar 17, 2024, at 20:09, Erik Wienhold  wrote:
> 
> On 2024-03-17 20:50 +0100, David E. Wheeler wrote:
>> On Mar 17, 2024, at 15:12, Erik Wienhold  wrote:
>>> So I think it makes sense to reword the entire backslash part of the
>>> paragraph and remove references to JSON entirely.  The attached patch
>>> does that and also formats the backslash escapes as a bulleted list for
>>> readability.
>> 
>> Ah, it’s JavaScript format, not JSON! This does clarify things quite
>> nicely, thank you. Happy to add my review once it’s in a commit fest.
> 
> Thanks.  https://commitfest.postgresql.org/48/4899/

Applies cleanly, `make -C doc/src/sgml check` runs without error. Doc 
improvement welcome and much clearer than before.

> I had the same reasoning while writing my first reply but scrapped that
> part because I found it obvious:  That jsonpath is parsed before calling
> jsonb_path_exists() and therefore the parser has no context about any
> variables, which might not even be hardcoded but may result from a
> query.

Right, there’s a chicken/egg problem.

> Unfortunately, I don't have access to that part of the SQL spec.  So I
> don't know how the jsonpath grammar is specified.

Seems quite logical; I think it should be documented, but I’d also be 
interested to know what the 2016 and 2023 standards say, exactly.

> Also checked git log src/backend/utils/adt/jsonpath_scan.l for some
> insights but haven't found any yet.

Everybody’s taking shortcuts relative to the standard, AFAICT. For example, 
jsonpath_scan.l matches unqouted identifiers with these two regular expressions:

{other}+
\/\*
\\.

Plus the backslash escapes. {other} is defined as:

/* "other" means anything that's not special, blank, or '\' or '"' */
other [^\?\%\$\.\[\]\{\}\(\)\|\&\!\=\<\>\@\#\,\*:\-\+\/\\\" \t\n\r\f]

Which is wy more liberal than the ECMA standard[1], by my reading, but the 
MSDN[2] description is quite succinct (thanks for the links!):

> In JavaScript, identifiers are commonly made of alphanumeric characters, 
> underscores (_), and dollar signs ($). Identifiers are not allowed to start 
> with numbers. However, JavaScript identifiers are not only limited to ASCII — 
> many Unicode code points are allowed as well. Namely, any character in the 
> ID_Start category can start an identifier, while any character in the 
> ID_Continue category can appear after the first character.


ID_Start[3] and ID_Continue[4] point to the unicode standard codes lister, 
nether of which reference Emoji. Sure enough, in Safari:

> x = {"": true}
< {: true}
> x.
< SyntaxError: Invalid character '\ud83c’

But in Postgres jsonpath:

david=# select '$.'::jsonpath;
 jsonpath 
--
 $.""

If the MSDN references to ID_Start and ID_Continue are correct, then the 
Postgres path parser is being overly-liberal. Maybe that’s totally fine? Not 
sure what should be documented and what’s not worth it.

Aside: I’m only digging into these details because I’m busy porting the path 
parser, so trying to figure out where to be compatible and where not to. So far 
I’m rejecting '$' (but allowing '\$' and '\u0024') but taking advantage of the 
unicode support in Go to specifically validate against ID_Start and ID_Continue.

Best,

David

[1] https://262.ecma-international.org/#sec-identifier-names
[2] 
https://developer.mozilla.org/en-US/docs/Web/JavaScript/Reference/Lexical_grammar#identifiers
[3] 
https://util.unicode.org/UnicodeJsps/list-unicodeset.jsp?a=%5Cp%7BID_Start%7D
[4] 
https://util.unicode.org/UnicodeJsps/list-unicodeset.jsp?a=%5Cp%7BID_Continue%7D











Re: Q: Escapes in jsonpath Idents

2024-03-17 Thread David E. Wheeler
On Mar 17, 2024, at 15:12, Erik Wienhold  wrote:

> Hi David,

Hey Erik. Thanks for the detailed reply and patch!

> So I think it makes sense to reword the entire backslash part of the
> paragraph and remove references to JSON entirely.  The attached patch
> does that and also formats the backslash escapes as a bulleted list for
> readability.

Ah, it’s JavaScript format, not JSON! This does clarify things quite nicely, 
thank you. Happy to add my review once it’s in a commit fest.

> The first case ($.$foo) is in line with the restriction on member
> accessors that you quoted first.

Huh, that’s now how I read it. Here it is again:

>> Member accessor that returns an object member with the specified
>> key. If the key name matches some named variable starting with $ or
>> does not meet the JavaScript rules for an identifier, it must be
>> enclosed in double quotes to make it a string literal.


Note that in my example `$foo` does not match a variable. I mean it looks like 
a variable, but none is used here. I guess it’s being conservative because it 
might be used in one of the functions, like jsonb_path_exists(), to which 
variables might be passed.

> The error message 'syntax error at or near "$oo" of jsonpath input' for
> the second case ($.f$oo), however, looks as if the scanner identifies
> '$oo' as a variable instead of contiuing the scan of identifier (f$oo)
> for the member accessor.  Looks like a bug to me because a variable
> doesn't even make sense in that place.

Right. Maybe the docs should be updated to say that a literal dollar sign isn’t 
supported in identifiers, unlike in JavaScript, except through escapes like 
this:

> What works though, besides double quoting, is escaping the dollar sign:
> 
> regress=# select '$.\u0024foo'::jsonpath;
> jsonpath
> --
> $."$foo"
> (1 row)
> 
> And we've come full circle :)



Best,

David






Re: Q: Escapes in jsonpath Idents

2024-03-16 Thread David E. Wheeler
On Mar 16, 2024, at 14:39, David E. Wheeler  wrote:

> I went looking for the JavaScript rules for an identifier and found this in 
> the MDN docs[2]:
> 
>> In JavaScript, identifiers can contain Unicode letters, $, _, and digits 
>> (0-9), but may not start with a digit. An identifier differs from a string 
>> in that a string is data, while an identifier is part of the code. In 
>> JavaScript, there is no way to convert identifiers to strings, but sometimes 
>> it is possible to parse strings into identifiers.

Coda: Dollar signs don’t work at all outside double-quoted string identifiers:

david=# select '$.$foo'::jsonpath;
ERROR:  syntax error at or near "$foo" of jsonpath input
LINE 1: select '$.$foo'::jsonpath;
   ^

david=# select '$.f$oo'::jsonpath;
ERROR:  syntax error at or near "$oo" of jsonpath input
LINE 1: select '$.f$oo'::jsonpath;
   ^

david=# select '$."$foo"'::jsonpath;
 jsonpath 
--
 $."$foo"

This, too, contradicts the MDM definition an identifier (and some quick browser 
tests).

Best,

David





Q: Escapes in jsonpath Idents

2024-03-16 Thread David E. Wheeler
Hackers,

The jsonpath doc[1] has an excellent description of the format of strings, but 
for unquoted path keys, it simply says:

> Member accessor that returns an object member with the specified key. If the 
> key name matches some named variable starting with $ or does not meet the 
> JavaScript rules for an identifier, it must be enclosed in double quotes to 
> make it a string literal.

I went looking for the JavaScript rules for an identifier and found this in the 
MDN docs[2]:

> In JavaScript, identifiers can contain Unicode letters, $, _, and digits 
> (0-9), but may not start with a digit. An identifier differs from a string in 
> that a string is data, while an identifier is part of the code. In 
> JavaScript, there is no way to convert identifiers to strings, but sometimes 
> it is possible to parse strings into identifiers.


However, the Postgres parsing of jsonpath keys appears to follow the same rules 
as strings, allowing backslash escapes:

david=# select '$.fo\u00f8 == $x'::jsonpath;
 jsonpath   ---
 ($."foø" == $"x")

This would seem to contradict the documentation. Is this behavior required by 
the SQL standard? Do the docs need updating? Or should the code actually follow 
the JSON identifier behavior?

Thanks,

David

PS: Those excellent docs on strings mentions support for \v, but the grammar in 
the right nav of https://www.json.org/json-en.html does not. Another bonus 
feature?

[1]: https://www.postgresql.org/docs/16/datatype-json.html#DATATYPE-JSONPATH
[2]: https://developer.mozilla.org/en-US/docs/Glossary/Identifier





Re: Patch: Add parse_type Function

2024-03-08 Thread David E. Wheeler
On Mar 7, 2024, at 23:39, Erik Wienhold  wrote:

> I think you need to swap the examples.  The text mentions the error case
> first and the NULL case second.

臘‍♂️

Thanks, fixed in the attached patch.

David



v11-0001-Add-to_regtypemod-SQL-function.patch
Description: Binary data


Re: Patch: Add parse_type Function

2024-03-07 Thread David E. Wheeler
Hello Hackers,

On Feb 25, 2024, at 13:00, David E. Wheeler  wrote:

>> postgres=# SELECT to_regtypemod('timestamp(-4)');
>> ERROR:  syntax error at or near "-"
>> LINE 1: SELECT to_regtypemod('timestamp(-4)');
>>  ^
>> CONTEXT:  invalid type name "timestamp(-4)"
>> 
>> postgres=# SELECT to_regtypemod('text(-4)');
>> ERROR:  type modifier is not allowed for type "text"
> 
> Yeah, there was quite a bit of discussion of this issue back in September[1].
> 
>> This behaviour is mentioned in the documentation, so I'd say it is ok.
> 
> This is my attempt to make it clearer that it can return an error, but I 
> don’t love the wording TBH.

I’ve rebased the patch and, in an attempt to clarify this behavior, added a 
couple of examples to the docs for to_regtype. Updated patch attached.

Best,

David



v10-0001-Add-to_regtypemod-SQL-function.patch
Description: Binary data


Re: An improved README experience for PostgreSQL

2024-02-28 Thread David E. Wheeler
On Feb 28, 2024, at 1:51 PM, Alvaro Herrera  wrote:
 
> *IF* people don't go overboard, yes.  I agree, but let's keep an eye so
> that it doesn't become an unreadable mess.  I've seen some really
> horrible markdown files that I'm sure most of you would object to.

Markdown++

IME the keys to decent-looking Markdown are:

1. Wrapping lines to a legible width (76-80 chars)
2. Link references rather than inline links

I try to follow these for my blog; posts end up looking like this:

https://justatheory.com/2024/02/extension-metadata-typology.text

(Append `.text` to any post to see the raw(ish) Markdown.

Best,

David





Re: Patch: Add parse_type Function

2024-02-25 Thread David E. Wheeler
On Feb 24, 2024, at 19:11, Jim Jones  wrote:

>> What’s the protocol for marking a patch ready for committer?
> 
> I guess after the review of the last assigned reviewer

Oh, I didn’t realize someone was assigned. :-)

> The fact that a completely invalid type returns NULL ..
> 
> SELECT to_regtypemod('foo');
>  to_regtypemod
> ---
>   
> (1 row)
> 
> 
> .. but a "partially" valid one returns an error might be confusing
> 
> postgres=# SELECT to_regtypemod('timestamp(-4)');
> ERROR:  syntax error at or near "-"
> LINE 1: SELECT to_regtypemod('timestamp(-4)');
>   ^
> CONTEXT:  invalid type name "timestamp(-4)"
> 
> postgres=# SELECT to_regtypemod('text(-4)');
> ERROR:  type modifier is not allowed for type "text"

Yeah, there was quite a bit of discussion of this issue back in September[1].

> This behaviour is mentioned in the documentation, so I'd say it is ok.

This is my attempt to make it clearer that it can return an error, but I don’t 
love the wording TBH.

> I would personally prefer either NULL or an error in both cases, but I
> can totally live with the current design.

SAME.

Best,

David

[1] 
https://www.postgresql.org/message-id/flat/09f9cad6-5096-43cc-b6a7-685703e47...@justatheory.com





Re: Patch: Add parse_type Function

2024-02-24 Thread David E. Wheeler
On Feb 21, 2024, at 19:13, David E. Wheeler  wrote:

> Thanks. Anyone else? Or is it ready for committer?

What’s the protocol for marking a patch ready for committer?

Thanks,

David





Re: Patch: Add parse_type Function

2024-02-21 Thread David E. Wheeler
On Feb 21, 2024, at 17:19, Erik Wienhold  wrote:

> Thanks David!  LGTM.

Thanks. Anyone else? Or is it ready for committer?

Best,

David





Re: Patch: Add parse_type Function

2024-02-21 Thread David E. Wheeler
On Feb 21, 2024, at 16:43, Erik Wienhold  wrote:

> The docs still state that to_regtypemod() has a named parameter, which
> is not the case per pg_proc.dat.

Bah, I cleaned it up before but somehow put it back. Thanks for the catch. 
Fixed.

Best,

David



v9-0001-Add-to_regtypemod-SQL-function.patch
Description: Binary data


Re: to_regtype() Raises Error

2024-02-21 Thread David E. Wheeler
On Feb 21, 2024, at 11:54 AM, David Wheeler  wrote:

> Merged this change into the [to_regtypemod 
> patch](https://commitfest.postgresql.org/47/4807/), which has exactly the 
> same issue.
> 
> The new status of this patch is: Needs review

Bah, withdrawn.

D





Re: Patch: Add parse_type Function

2024-02-21 Thread David E. Wheeler
On Feb 21, 2024, at 11:18, Erik Wienhold  wrote:

> Thanks.  But it's an applefile again, not a patch :P

WTF. I still have that feature disabled.

Oh, I think I deleted the file after dragged it into Mail but before sending, 
because it’s empty everywhere I look. 臘‍♂️ Let’s try that again.

Best,

David


v8-0001-Add-to_regtypemod-SQL-function.patch
Description: Binary data


Re: Patch: Add parse_type Function

2024-02-21 Thread David E. Wheeler
On Feb 20, 2024, at 21:09, Erik Wienhold  wrote:

> The references are printed as "???" right now.  Can be fixed with
> xreflabel="format_type" and xreflabel="to_regtype" on those 
> elements.

Thanks.

> The docs show parameter name "type" but pg_proc.data does not define
> proargnames.  So the to_regtypemod() cannot be called using named
> notation:
> 
> test=> select to_regtypemod(type => 'int'::text);
> ERROR:  function to_regtypemod(type => text) does not exist

Yes, this format is identical to that of to_regtype():

david=# select to_regtype(type => 'int'::text);
ERROR:  function to_regtype(type => text) does not exist

> +Parses a string of text, extracts a potential type name from it, and
>> +translates its typmod, the modifier for the type, if any. Failure to
> 
> s/typmod, the modifier of the type/type modifier/
> 
> Because format_type() already uses "type modifier" and I think it helps
> searchability to use the same wording.

Yes, definitely better wording, thanks. V8 attached.

Best,

David



v8-0001-Add-to_regtypemod-SQL-function.patch
Description: application/applefile




Re: to_regtype() Raises Error

2024-02-20 Thread David E. Wheeler


On Feb 5, 2024, at 09:01, David E. Wheeler  wrote:

> Ah, thank you. Updated patch attached.

I’ve moved this patch into the to_regtype patch thread[1], since it exhibits 
the same behavior.

Best,

David

[1] 
https://www.postgresql.org/message-id/60ef4e11-bc1c-4034-b37b-695662d28...@justatheory.com





Re: Patch: Add parse_type Function

2024-02-20 Thread David E. Wheeler
On Feb 20, 2024, at 01:30, jian he  wrote:

> the second hint `-- grammar error expected` seems to contradict with
> the results?

Quite right, thank you, that’s actually a trapped error. I’ve tweaked the 
comments and their order in v7, attached.

This goes back to the discussion of the error raising of to_regtype[1], so I’ve 
incorporated the patch from that thread into this patch, and set up the docs 
for to_regtypemod() with similar information. The wording is still a little 
opaque for my taste, though, written more for someone who knows a bit about the 
internals, but it’s a start.

I’ve also fixed the wayward parameter in the function signature in the docs, 
and added a note about why I’ve also patched genbki.pl.

Best,

David

[1] 
https://www.postgresql.org/message-id/flat/57E1FDDC-5A38-452D-82D7-A44DA2E13862%40justatheory.com#1ae0b11634bc33c7ad3cd728e43d504e



v7-0001-Add-to_regtypemod-SQL-function.patch
Description: Binary data


Re: Patch: Add parse_type Function

2024-02-19 Thread David E. Wheeler
On Feb 19, 2024, at 21:58, Erik Wienhold  wrote:

> See the patch I wrote for my benchmarks.  But it's pretty easy anyway to
> cut down parse_type() ;)

LOL, I missed that, just wrote it myself in the last hour. :-) v6 attached.

> But you don't actually need reformat_type() in pgTAP.  You can just get
> the type OID and modifier of the want_type and have_type and compare
> those.  Then use format_type() for the error message.  Looks a bit
> cleaner to me than doing the string comparison.

Fair.

> On second thought, I guess comparing the reformatted type names is
> necessary in order to have a uniform API on older Postgres releases
> where pgTAP has to provide its own to_regtypmod() based on typmodin
> functions.

Maybe. Worth playing with.

>> For the latter, it could easily be an example in the docs.
> 
> Can be mentioned right under format_type().

Well I included it in the to_regtypemod docs here, but could so either.

Best,

David



v6-0001-Add-to_regtypemod-SQL-function.patch
Description: Binary data


Re: Patch: Add parse_type Function

2024-02-19 Thread David E. Wheeler
On Feb 19, 2024, at 15:47, Tom Lane  wrote:

>> 1. Add a to_regtypmod() for those who just want the typemod.
> 
> Seems like there's a good case for doing that.

I’ll work on that.

> I'm less thrilled about that, mainly because I can't think of
> a good name for it ("format_type_string" is certainly not that).
> Is the use-case for this functionality really strong enough that
> we need to provide it as a single function rather than something
> assembled out of spare parts?

Not essential for pgTAP, no, as we can just use the parts. It was the typmod 
bit that was missing.

On Feb 19, 2024, at 17:13, Tom Lane  wrote:

> After some time ruminating, a couple of possibilities occurred to
> me:
> reformat_type(text)
> canonical_type_name(text)

I was just thinking about hitting the thesaurus entry for “canonical”, but I 
quite like reformat_type. It’s super clear and draws the parallel to 
format_type() more clearly. Will likely steal the name for pgTAP if we don’t 
add it to core. :-)

> I'm still unconvinced about that, though.

I guess the questions are:

* What are the other use cases for it?
* How obvious is it how to do it?

For the latter, it could easily be an example in the docs.

Best,

David





Re: Patch: Add parse_type Function

2024-02-19 Thread David E. Wheeler
On Feb 18, 2024, at 15:55, Erik Wienhold  wrote:

>> The overhead of parse_type_and_format can be related to higher planning
>> time. PL/pgSQL can assign composite without usage FROM clause.
> 
> Thanks, didn't know that this makes a difference.  In that case both
> variants are on par.

Presumably this is a side-effect of the use of a RECORD here, which requires a 
FROM clause in pure SQL, yes?

The only way I can think of to avoid that is to:

1. Add a to_regtypmod() for those who just want the typemod.

2. Add a format_type_string() function that returns a string, the equivalent of 
this function but in C:

CREATE FUNCTION format_type_string(text) RETURNS TEXT AS $$
   SELECT format_type(to_regtype($1), to_regtypmod($1))
$$;

We could also keep the record-returning function for users who want both the 
regtype and the regytypemod at once, but with the other two I consider it less 
essential.

Thoughts?

Best,

David





  1   2   >