Re: pgsql: Add more SQL/JSON constructor functions

2024-05-28 Thread Amit Langote
Hi Alvaro,

On Mon, May 27, 2024 at 7:10 PM Alvaro Herrera  wrote:
>
> On 2024-May-27, Alvaro Herrera wrote:
>
> > > JSON_SERIALIZE()
>
> I just noticed this behavior, which looks like a bug to me:
>
> select json_serialize('{"a":1, "a":2}' returning varchar(5));
>  json_serialize
> 
>  {"a":
>
> I think this function should throw an error if the destination type
> doesn't have room for the output json.  Otherwise, what good is the
> serialization function?

I remember using the reasoning mentioned by David G when testing
json_query() et al with varchar(n), so you get:

select json_query('{"a":1, "a":2}', '$' returning varchar(5));
 json_query

 {"a":
(1 row)

which is the same as:

select '{"a":1, "a":2}'::varchar(5);
 varchar
-
 {"a":
(1 row)

Also,

select json_value('{"a":"abcdef"}', '$.a' returning varchar(5) error on error);
 json_value

 abcde
(1 row)

This behavior comes from using COERCE_EXPLICIT_CAST when creating the
coercion expression to convert json_*() functions' argument to the
RETURNING type.

-- 
Thanks, Amit Langote




Re: First draft of PG 17 release notes

2024-05-20 Thread Amit Langote
On Thu, May 9, 2024 at 1:04 PM Bruce Momjian  wrote:
> I have committed the first draft of the PG 17 release notes;  you can
> see the results here:
>
> https://momjian.us/pgsql_docs/release-17.html
>
> It will be improved until the final release.  The item count is 188,
> which is similar to recent releases:
>
> release-10:  189
> release-11:  170
> release-12:  180
> release-13:  178
> release-14:  220
> release-15:  184
> release-16:  206
> release-17:  188
>
> I welcome feedback.  For some reason it was an easier job than usual.

Thanks Bruce for working on this as always.

Failed to notice when I read the notes before:



Add SQL/JSON constructor functions JSON(), JSON_SCALAR(), and
JSON_SERIALIZE() (Amit Langote)



Should be:



Add SQL/JSON constructor functions JSON(), JSON_SCALAR(), and
JSON_SERIALIZE() (Nikita Glukhov, Teodor Sigaev, Oleg Bartunov,
Alexander Korotkov, Andrew Dunstan, Amit Langote)



Patch attached.

-- 
Thanks, Amit Langote


sql-json-credits.patch
Description: Binary data


Re: generic plans and "initial" pruning

2024-05-20 Thread Amit Langote
On Sun, May 19, 2024 at 9:39 AM David Rowley  wrote:
> For #1, the locks taken for SELECT queries are less likely to conflict
> with other locks obtained by PostgreSQL, but at least at the moment if
> someone is getting deadlocks with a DDL type operation, they can
> change their query or DDL script so that locks are taken in the same
> order.  If we allowed executor startup to do this then if someone
> comes complaining that PG18 deadlocks when PG17 didn't we'd just have
> to tell them to live with it.  There's a comment at the bottom of
> find_inheritance_children_extended() just above the qsort() which
> explains about the deadlocking issue.

Thought to chime in on this.

A deadlock may occur with the execution-time locking proposed in the
patch if the DDL script makes assumptions about how a cached plan's
execution determines the locking order for children of multiple parent
relations. Specifically, the deadlock can happen if the script tries
to lock the child relations directly, instead of locking them through
their respective parent relations.  The patch doesn't change the order
of locking of relations mentioned in the query, because that's defined
in AcquirePlannerLocks().

--
Thanks, Amit Langote




Re: remaining sql/json patches

2024-05-20 Thread Amit Langote
Hi Thom,

On Thu, May 16, 2024 at 8:50 AM Thom Brown  wrote:
> On Mon, 8 Apr 2024 at 10:09, Amit Langote  wrote:
>>
>> On Mon, Apr 8, 2024 at 2:02 PM jian he  wrote:
>> > On Mon, Apr 8, 2024 at 11:21 AM jian he  
>> > wrote:
>> > >
>> > > On Mon, Apr 8, 2024 at 12:34 AM jian he  
>> > > wrote:
>> > > >
>> > > > On Sun, Apr 7, 2024 at 9:36 PM Amit Langote  
>> > > > wrote:
>> > > > > 0002 needs an expanded commit message but I've run out of energy 
>> > > > > today.
>> > > > >
>> > >
>> > > other than that, it looks good to me.
>> >
>> > one more tiny issue.
>> > +EXPLAIN (COSTS OFF, VERBOSE) SELECT * FROM jsonb_table_view1;
>> > +ERROR:  relation "jsonb_table_view1" does not exist
>> > +LINE 1: EXPLAIN (COSTS OFF, VERBOSE) SELECT * FROM jsonb_table_view1...
>> > +   ^
>> > maybe you want
>> > EXPLAIN (COSTS OFF, VERBOSE) SELECT * FROM jsonb_table_view7;
>> > at the end of the sqljson_jsontable.sql.
>> > I guess it will be fine, but the format json explain's out is quite big.
>> >
>> > you also need to `drop table s cascade;` at the end of the test?
>>
>> Pushed after fixing this and other issues.  Thanks a lot for your
>> careful reviews.
>>
>> I've marked the CF entry for this as committed now:
>> https://commitfest.postgresql.org/47/4377/
>>
>> Let's work on the remaining PLAN clause with a new entry in the next
>> CF, possibly in a new email thread.
>
>
> I've just taken a look at the doc changes,

Thanks for taking a look.

> and I think we need to either remove the leading "select" keyword, or 
> uppercase it in the examples.
>
> For example (on 
> https://www.postgresql.org/docs/devel/functions-json.html#SQLJSON-QUERY-FUNCTIONS):
>
> json_exists ( context_item, path_expression [ PASSING { value AS varname } [, 
> ...]] [ { TRUE | FALSE | UNKNOWN | ERROR } ON ERROR ])
>
> Returns true if the SQL/JSON path_expression applied to the context_item 
> using the PASSING values yields any items.
> The ON ERROR clause specifies the behavior if an error occurs; the default is 
> to return the boolean FALSE value. Note that if the path_expression is strict 
> and ON ERROR behavior is ERROR, an error is generated if it yields no items.
> Examples:
> select json_exists(jsonb '{"key1": [1,2,3]}', 'strict $.key1[*] ? (@ > 2)') → 
> t
> select json_exists(jsonb '{"a": [1,2,3]}', 'lax $.a[5]' ERROR ON ERROR) → f
> select json_exists(jsonb '{"a": [1,2,3]}', 'strict $.a[5]' ERROR ON ERROR) →
> ERROR:  jsonpath array subscript is out of bounds
>
> Examples are more difficult to read when keywords appear to be at the same 
> level as predicates.  Plus other examples within tables on the same page 
> don't start with "select", and further down, block examples uppercase 
> keywords.  Either way, I don't like it as it is.

I agree that the leading SELECT should be removed from these examples.
Also, the function names should be capitalized both in the syntax
description and in the examples, even though other functions appearing
on this page aren't.

> Separate from this, I think these tables don't scan well (see json_query for 
> an example of what I'm referring to).  There is no clear separation of the 
> syntax definition, the description, and the example. This is more a matter 
> for the website mailing list, but I'm expressing it here to check whether 
> others agree.

Hmm, yes, I think I forgot to put  around the syntax like
it's done for a few other functions listed on the page.

How about the attached?  Other than the above points, it removes the
 tags from the description text of each function to turn it into
a single paragraph, because the multi-paragraph style only seems to
appear in this table and it's looking a bit weird now.  Though it's
also true that the functions in this table have the longest
descriptions.

-- 
Thanks, Amit Langote


v1-0001-Doc-some-stylistic-fixes-to-SQL-JSON-query-functi.patch
Description: Binary data


Re: New committers: Melanie Plageman, Richard Guo

2024-04-26 Thread Amit Langote
On Fri, Apr 26, 2024 at 8:54 PM Jonathan S. Katz  wrote:
> The Core Team would like to extend our congratulations to Melanie
> Plageman and Richard Guo, who have accepted invitations to become our
> newest PostgreSQL committers.
>
> Please join us in wishing them much success and few reverts!

Congratulations to both!

-- 
Thanks, Amit Langote




Re: sql/json remaining issue

2024-04-25 Thread Amit Langote
On Thu, Apr 18, 2024 at 9:33 AM Amit Langote  wrote:
> On Mon, Apr 15, 2024 at 9:46 PM Amit Langote  wrote:
> > On Sat, Apr 13, 2024 at 11:12 PM jian he  
> > wrote:
> > > On Fri, Apr 12, 2024 at 5:44 PM Amit Langote  
> > > wrote:
> > > >
> > > > > elog(ERROR, "unrecognized json wrapper %d", wrapper);
> > > > > should be
> > > > > elog(ERROR, "unrecognized json wrapper %d", (int) wrapper);
> > > >
> > > > Fixed in 0003.
> > > >
> > > the fix seems not in 0003?
> > > other than that, everything looks fine.
>
> Oops, really fixed now in 0002.
>
> > I've combined these patches into one -- attached 0001.  Will push tomorrow.
>
> Decided to break the error message improvement patch into its own
> after all -- attached 0001.
>
> > Now studying the JsonBehavior DEFAULT expression issue and your patch.
>
> I found some more coercion-related expression nodes that must also be
> checked along with CoerceViaIO and CoerceToDomain.  Also, after fixing
> the code to allow them, I found that we'd need to also check
> recursively whether their argument expression is also one of the
> supported expression nodes.  Also, I decided that it's not necessary
> to include "cast" in the error message as one of the supported
> expressions.
>
> Will push all today.

Totally forgot to drop a note here that I pushed those and marked the
2 open items as resolved.

-- 
Thanks, Amit Langote




Re: sql/json remaining issue

2024-04-17 Thread Amit Langote
On Mon, Apr 15, 2024 at 9:46 PM Amit Langote  wrote:
> On Sat, Apr 13, 2024 at 11:12 PM jian he  wrote:
> > On Fri, Apr 12, 2024 at 5:44 PM Amit Langote  
> > wrote:
> > >
> > > > elog(ERROR, "unrecognized json wrapper %d", wrapper);
> > > > should be
> > > > elog(ERROR, "unrecognized json wrapper %d", (int) wrapper);
> > >
> > > Fixed in 0003.
> > >
> > the fix seems not in 0003?
> > other than that, everything looks fine.

Oops, really fixed now in 0002.

> I've combined these patches into one -- attached 0001.  Will push tomorrow.

Decided to break the error message improvement patch into its own
after all -- attached 0001.

> Now studying the JsonBehavior DEFAULT expression issue and your patch.

I found some more coercion-related expression nodes that must also be
checked along with CoerceViaIO and CoerceToDomain.  Also, after fixing
the code to allow them, I found that we'd need to also check
recursively whether their argument expression is also one of the
supported expression nodes.  Also, I decided that it's not necessary
to include "cast" in the error message as one of the supported
expressions.

Will push all today.

-- 
Thanks, Amit Langote


v5-0002-SQL-JSON-Miscellaneous-fixes-and-improvements.patch
Description: Binary data


v5-0001-SQL-JSON-Improve-some-error-messages.patch
Description: Binary data


v5-0003-SQL-JSON-Fix-issues-with-DEFAULT-.-ON-ERROR-EMPTY.patch
Description: Binary data


Re: sql/json remaining issue

2024-04-15 Thread Amit Langote
Hi,

On Sat, Apr 13, 2024 at 11:12 PM jian he  wrote:
> On Fri, Apr 12, 2024 at 5:44 PM Amit Langote  wrote:
> >
> > > elog(ERROR, "unrecognized json wrapper %d", wrapper);
> > > should be
> > > elog(ERROR, "unrecognized json wrapper %d", (int) wrapper);
> >
> > Fixed in 0003.
> >
> the fix seems not in 0003?
> other than that, everything looks fine.
>
>
> 
> SELECT * FROM JSON_TABLE (
> '{"favorites":
> {"movies":
>   [{"name": "One", "director": "John Doe"},
>{"name": "Two", "director": "Don Joe"}],
>  "books":
>   [{"name": "Mystery", "authors": [{"name": "Brown Dan"}]},
>{"name": "Wonder", "authors": [{"name": "Jun Murakami"},
> {"name":"Craig Doe"}]}]
> }}'::json, '$.favs[*]'
> COLUMNS (user_id FOR ORDINALITY,
>   NESTED '$.movies[*]'
> COLUMNS (
> movie_id FOR ORDINALITY,
> mname text PATH '$.name',
> director text),
>   NESTED '$.books[*]'
> COLUMNS (
>   book_id FOR ORDINALITY,
>   bname text PATH '$.name',
>   NESTED '$.authors[*]'
> COLUMNS (
>   author_id FOR ORDINALITY,
>   author_name text PATH '$.name';
> 
>
> I actually did run the query, it returns null.
> '$.favs[*]'
> should be
> '$.favorites[*]'

Oops, fixed.

I've combined these patches into one -- attached 0001.  Will push tomorrow.

> one more minor thing, I previously mentioned in getJsonPathVariable
> ereport(ERROR,
> (errcode(ERRCODE_UNDEFINED_OBJECT),
> errmsg("could not find jsonpath variable \"%s\"",
> pnstrdup(varName, varNameLength;
>
> do we need to remove pnstrdup?

Looking at this again, it seems like that's necessary because varName,
being a string extracted from JsonPathItem, is not necessarily
null-terminated.  There are many pndstrdup()s in jsonpath_exec.c
because of that aspect.

Now studying the JsonBehavior DEFAULT expression issue and your patch.

-- 
Thanks, Amit Langote


v4-0001-SQL-JSON-Miscellaneous-fixes-and-improvements.patch
Description: Binary data


Re: sql/json remaining issue

2024-04-12 Thread Amit Langote
On Thu, Apr 11, 2024 at 12:02 PM jian he  wrote:
> On Wed, Apr 10, 2024 at 4:39 PM Amit Langote  wrote:
> > Attached is a bit more polished version of that, which also addresses
> > the error messages in JsonPathQuery() and JsonPathValue().  I noticed
> > that there was comment I had written at one point during JSON_TABLE()
> > hacking that said that we should be doing this.
> >
> > I've also added an open item for this.
>
> `
>  | NESTED [ PATH ] json_path_specification [ AS json_path_name ]
> COLUMNS ( json_table_column [, ...] )
> NESTED [ PATH ] json_path_specification [ AS json_path_name ] COLUMNS
> ( json_table_column [, ...] )
> `
>  "json_path_specification" should be "path_expression"?

Fixed in 0002.

> your explanation about memory usage is clear to me!
>
>
> The following are minor cosmetic issues while applying v2.
> +errmsg("JSON path expression in JSON_VALUE should return singleton
> scalar item")));
> "singleton" is not intuitive to me.
> Then I looked around.
> https://www.postgresql.org/search/?u=%2Fdocs%2F16%2F=singleton
> There is only one appearance of "singleton" in the manual.

Yes, singleton is a term used a lot in the source code but let's keep
it out of error messages and docs.  So fixed.

> errhint("Use WITH WRAPPER clause to wrap SQL/JSON item sequence into 
> array.")));
> maybe
> errhint("Use WITH WRAPPER clause to wrap SQL/JSON items into array.")));
> or
> errhint("Use WITH WRAPPER clause to wrap SQL/JSON item sequences into
> array.")));

Changed to use "SQL/JSON items into array.".

> then I wonder what's the difference between
> 22038 ERRCODE_SINGLETON_SQL_JSON_ITEM_REQUIRED
> 2203F ERRCODE_SQL_JSON_SCALAR_REQUIRED
>
> i assume '{"hello":"world"}'  is a singleton, but not a scalar item?
> if so, then I think the error message within the "if (count > 1)"
> branch in JsonPathValue
> should use ERRCODE_SINGLETON_SQL_JSON_ITEM_REQUIRED
> within the "if (!IsAJsonbScalar(res))" branch should use
> ERRCODE_SQL_JSON_SCALAR_REQUIRED
> ?
> + if (column_name)
> + ereport(ERROR,
> + (errcode(ERRCODE_MORE_THAN_ONE_SQL_JSON_ITEM),
> + errmsg("JSON path expression for column \"%s\" should return
> singleton scalar item",
> + column_name)));
> + else
> + ereport(ERROR,
> + (errcode(ERRCODE_SQL_JSON_SCALAR_REQUIRED),
> + errmsg("JSON path expression in JSON_VALUE should return singleton
> scalar item")));
> the error message seems similar, but the error code is different?
> both within "if (count > 1)" and "if (!IsAJsonbScalar(res))" branch.

Using different error codes for the same error is a copy-paste mistake
on my part.  Fixed.

> in src/include/utils/jsonpath.h, comments
> /* SQL/JSON item */
> should be
> /* SQL/JSON query functions */
>
>
> elog(ERROR, "unrecognized json wrapper %d", wrapper);
> should be
> elog(ERROR, "unrecognized json wrapper %d", (int) wrapper);

Fixed in 0003.

-- 
Thanks, Amit Langote


v3-0001-JSON_TABLE-mention-column-name-in-some-error-mess.patch
Description: Binary data


v3-0002-JSON_TABLE-Use-term-path_expression-more-consiste.patch
Description: Binary data


v3-0003-SQL-JSON-Fix-some-comments-in-jsonpath.h.patch
Description: Binary data


Re: sql/json remaining issue

2024-04-10 Thread Amit Langote
On Tue, Apr 9, 2024 at 8:37 PM Amit Langote  wrote:
> On Tue, Apr 9, 2024 at 4:47 PM jian he  wrote:
> > the last query error message is:
> > `
> > ERROR:  no SQL/JSON item
> > `
> >
> > we are in ExecEvalJsonExprPath, can we output it to be:
> > `
> > ERROR:  after applying json_path "5s", no SQL/JSON item found
> > `
> > in a json_table query, we can have multiple path_expressions, like the
> > above query.
> > it's not easy to know applying which path_expression failed.
>
> Hmm, I'm not so sure about mentioning the details of the path because
> path names are optional and printing path expression itself is not a
> good idea.  Perhaps, we could mention the column name which would
> always be there, but we'd then need to add a new field column_name
> that's optionally set to JsonFuncExpr and JsonExpr, that is, when they
> are being set up for JSON_TABLE() columns.  As shown in the attached.
> With the patch you'll get:
>
> ERROR:  no SQL/JSON item found for column "b"

Attached is a bit more polished version of that, which also addresses
the error messages in JsonPathQuery() and JsonPathValue().  I noticed
that there was comment I had written at one point during JSON_TABLE()
hacking that said that we should be doing this.

I've also added an open item for this.

-- 
Thanks, Amit Langote


v2-0001-JSON_TABLE-mention-column-name-in-some-error-mess.patch
Description: Binary data


Re: sql/json remaining issue

2024-04-09 Thread Amit Langote
Hi,

On Tue, Apr 9, 2024 at 4:47 PM jian he  wrote:
>
> hi.
> `
>  | NESTED [ PATH ] json_path_specification [ AS json_path_name ]
> COLUMNS ( json_table_column [, ...] )
> NESTED [ PATH ] json_path_specification [ AS json_path_name ] COLUMNS
> ( json_table_column [, ...] )
> `
>  "json_path_specification" should be "path_expression"?
> --
> drop table s1;
> create or replace function random_text_1000() returns text
> as $$select string_agg(md5(random()::text),'') from
> generate_Series(1,1000) s $$ LANGUAGE SQL;
>
> create unlogged table s1(a int GENERATED BY DEFAULT AS IDENTITY, js jsonb);
> insert into s1(js)
> select jsonb ('{"a":{"za":[{"z1": [11,]},{"z21": [22, 234,' || g
> || ']},{"z22": [32, 204,145]}]},"c": ' || g
> || ',"id": "' || random_text_1000() || '"}')
> from generate_series(1_000_000, 1_000_000) g;
> insert into s1(js)
> select jsonb ('{"a":{"za":[{"z1": [11,]},{"z21": [22, 234,' || g
> || ']},{"z22": [32, 204,145]}]},"c": ' || g
> || ',"id": "' || random_text_1000() || '"}')
> from generate_series(235, 235 + 20,1) g;
>
> select count(*), pg_size_pretty(pg_total_relation_size('s1')) from s1;
>  count  | pg_size_pretty
> +
>  22 | 6398 MB
> --
>
> explain(analyze, costs off,buffers, timing)
> SELECT sub.*, s.a as s_a FROM s,
> (values(23)) x(x),
> generate_series(13, 13) y,
> JSON_TABLE(js, '$' AS c1 PASSING x AS x, y AS y
> COLUMNS (
> xx1 int PATH '$.c',
> NESTED PATH '$.a.za[2]' COLUMNS (NESTED PATH '$.z22[*]' as z22 COLUMNS
> (c int PATH '$')),
> NESTED PATH '$.a.za[1]' COLUMNS (d json PATH '$ ? (@.z21[*] == ($"x" -1))'),
> NESTED PATH '$.a.za[0]' COLUMNS (NESTED PATH '$.z1[*] ? (@ >= ($"x"
> -2))' as z1 COLUMNS (a int PATH '$')),
> NESTED PATH '$.a.za[1]' COLUMNS
> (NESTED PATH '$.z21[*] ? (@ >= ($"y" +121))' as z21 COLUMNS (b int
> PATH '$ ? (@ <= ($"x" + 76))' default -1000 ON EMPTY))
> )) sub;
>
> for one jsonb, it can expand to 7 rows, the above query will return
> around 1.4 million rows.
> i use the above query, and pg_log_backend_memory_contexts in another
> session to check the memory usage.
> didn't find memory over consumed issue.
>
> but I am not sure this is the right way to check the memory consumption.
> --
> begin;
> SELECT sub.*, s.a as s_a FROM s,
> (values(23)) x(x),
> generate_series(13, 13) y,
> JSON_TABLE(js, '$' AS c1 PASSING x AS x, y AS y
> COLUMNS (
> xx1 int PATH '$.c',
> NESTED PATH '$.a.za[2]' COLUMNS (NESTED PATH '$.z22[*]' as z22 COLUMNS
> (c int PATH '$')),
> NESTED PATH '$.a.za[1]' COLUMNS (d json PATH '$ ? (@.z21[*] == ($"x" -1))'),
> NESTED PATH '$.a.za[0]' COLUMNS (NESTED PATH '$.z1[*] ? (@ >= ($"x"
> -2))' as z1 COLUMNS (a int PATH '$')),
> NESTED PATH '$.a.za[1]' COLUMNS
> (NESTED PATH '$.z21[*] ? (@ >= ($"y" +121))' as z21 COLUMNS (b int
> PATH '$ ? (@ <= ($"x" + 76))' error ON EMPTY))
> )) sub;
> rollback;
>
> only the last row will fail, because of "error ON EMPTY", (1_000_000
> <= 23 + 76) is false.
> I remember the very previous patch, because of error cleanup, it took
> a lot of resources.
> does our current implementation, only the very last row fail, will it
> be easy to clean up the transaction?

I am not sure I understand your concern.  Could you please rephrase
it?  Which previous patch are you referring to and what problem did it
cause with respect to error cleanup?

Per-row memory allocated for each successful output row JSON_TABLE()
doesn't pile up, because it's allocated in a context that is reset
after evaluating each row; see tfuncLoadRows().  But again I may be
misunderstanding your concern.

> the last query error message is:
> `
> ERROR:  no SQL/JSON item
> `
>
> we are in ExecEvalJsonExprPath, can we output it to be:
> `
> ERROR:  after applying json_path "5s", no SQL/JSON item found
> `
> in a json_table query, we can have multiple path_expressions, like the
> above query.
> it's not easy to know applying which path_expression failed.

Hmm, I'm not so sure about mentioning the details of the path because
path names are optional and printing path expression itself is not a
good idea.  Perhaps, we could mention the column name which would
always be there, but we'd then need to add a new field column_name
that's optionally set to JsonFuncExpr and JsonExpr, that is, when they
are being set up for JSON_TABLE() columns.  As shown in the attached.
With the patch you'll get:

ERROR:  no SQL/JSON item found for column "b"

--
Thanks, Amit Langote


v1-0001-JSON_TABLE-mention-column-name-in-the-ON-EMPTY-er.patch
Description: Binary data


Re: remaining sql/json patches

2024-04-08 Thread Amit Langote
On Mon, Apr 8, 2024 at 2:02 PM jian he  wrote:
> On Mon, Apr 8, 2024 at 11:21 AM jian he  wrote:
> >
> > On Mon, Apr 8, 2024 at 12:34 AM jian he  wrote:
> > >
> > > On Sun, Apr 7, 2024 at 9:36 PM Amit Langote  
> > > wrote:
> > > > 0002 needs an expanded commit message but I've run out of energy today.
> > > >
> >
> > other than that, it looks good to me.
>
> one more tiny issue.
> +EXPLAIN (COSTS OFF, VERBOSE) SELECT * FROM jsonb_table_view1;
> +ERROR:  relation "jsonb_table_view1" does not exist
> +LINE 1: EXPLAIN (COSTS OFF, VERBOSE) SELECT * FROM jsonb_table_view1...
> +   ^
> maybe you want
> EXPLAIN (COSTS OFF, VERBOSE) SELECT * FROM jsonb_table_view7;
> at the end of the sqljson_jsontable.sql.
> I guess it will be fine, but the format json explain's out is quite big.
>
> you also need to `drop table s cascade;` at the end of the test?

Pushed after fixing this and other issues.  Thanks a lot for your
careful reviews.

I've marked the CF entry for this as committed now:
https://commitfest.postgresql.org/47/4377/

Let's work on the remaining PLAN clause with a new entry in the next
CF, possibly in a new email thread.

-- 
Thanks, Amit Langote




Re: generic plans and "initial" pruning

2024-04-08 Thread Amit Langote
Hi Andrey,

On Sun, Mar 31, 2024 at 2:03 PM Andrey M. Borodin  wrote:
> > On 6 Dec 2023, at 23:52, Robert Haas  wrote:
> >
> >  I hope that it's at least somewhat useful.
>
> > On 5 Jan 2024, at 15:46, vignesh C  wrote:
> >
> > There is a leak reported
>
> Hi Amit,
>
> this is a kind reminder that some feedback on your patch[0] is waiting for 
> your reply.
> Thank you for your work!

Thanks for moving this to the next CF.

My apologies (especially to Robert) for not replying on this thread
for a long time.

I plan to start working on this soon.

--
Thanks, Amit Langote




Re: remaining sql/json patches

2024-04-07 Thread Amit Langote
On Sun, Apr 7, 2024 at 10:21 PM jian he  wrote:
> On Sun, Apr 7, 2024 at 12:30 PM jian he  wrote:
> >
> > other than that, it looks good to me.
> while looking at it again.
>
> + | NESTED path_opt Sconst
> + COLUMNS '(' json_table_column_definition_list ')'
> + {
> + JsonTableColumn *n = makeNode(JsonTableColumn);
> +
> + n->coltype = JTC_NESTED;
> + n->pathspec = (JsonTablePathSpec *)
> + makeJsonTablePathSpec($3, NULL, @3, -1);
> + n->columns = $6;
> + n->location = @1;
> + $$ = (Node *) n;
> + }
> + | NESTED path_opt Sconst AS name
> + COLUMNS '(' json_table_column_definition_list ')'
> + {
> + JsonTableColumn *n = makeNode(JsonTableColumn);
> +
> + n->coltype = JTC_NESTED;
> + n->pathspec = (JsonTablePathSpec *)
> + makeJsonTablePathSpec($3, $5, @3, @5);
> + n->columns = $8;
> + n->location = @1;
> + $$ = (Node *) n;
> + }
> + ;
> +
> +path_opt:
> + PATH
> + | /* EMPTY */
>   ;
>
> for `NESTED PATH`, `PATH` is optional.
> So for the doc, many places we need to replace `NESTED PATH` to `NESTED 
> [PATH]`?

Thanks for checking.

I've addressed most of your comments in the recent days including
today's.  Thanks for the patches for adding new test cases.  That was
very helpful.

I've changed the recursive structure of JsonTablePlanNextRow().  While
it still may not be perfect, I think it's starting to look good now.

0001 is a patch to fix up get_json_expr_options() so that it now emits
WRAPPER and QUOTES such that they work correctly.

0002 needs an expanded commit message but I've run out of energy today.

-- 
Thanks, Amit Langote


v51-0001-Fix-JsonExpr-deparsing-to-emit-QUOTES-and-WRAPPE.patch
Description: Binary data


v51-0002-JSON_TABLE-Add-support-for-NESTED-columns.patch
Description: Binary data


Re: remaining sql/json patches

2024-04-06 Thread Amit Langote
Hi,

On Sat, Apr 6, 2024 at 3:55 PM jian he  wrote:
> On Sat, Apr 6, 2024 at 2:03 PM Amit Langote  wrote:
> >
> > >
> > > * problem with type "char". the view def  output is not the same as
> > > the select * from v1.
> > >
> > > create or replace view v1 as
> > > SELECT col FROM s,
> > > JSON_TABLE(jsonb '{"d": ["hello", "hello1"]}', '$' as c1
> > > COLUMNS(col "char" path '$.d' without wrapper keep quotes))sub;
> > >
> > > \sv v1
> > > CREATE OR REPLACE VIEW public.v1 AS
> > >  SELECT sub.col
> > >FROM s,
> > > JSON_TABLE(
> > > '{"d": ["hello", "hello1"]}'::jsonb, '$' AS c1
> > > COLUMNS (
> > > col "char" PATH '$."d"'
> > > )
> > > ) sub
> > > one under the hood called JSON_QUERY_OP, another called JSON_VALUE_OP.
> >
> > Hmm, I don't see a problem as long as both are equivalent or produce
> > the same result.  Though, perhaps we could make
> > get_json_expr_options() also deparse JSW_NONE explicitly into "WITHOUT
> > WRAPPER" instead of a blank.  But that's existing code, so will take
> > care of it as part of the above open item.
> >
> > > I will do extensive checking for other types later, so far, other than
> > > these two issues,
> > > get_json_table_columns is pretty solid, I've tried nested columns with
> > > nested columns, it just works.
> >
> > Thanks for checking.
> >
> After applying v50, this type also has some issues.
> CREATE OR REPLACE VIEW t1 as
> SELECT sub.* FROM JSON_TABLE(jsonb '{"d": ["hello", "hello1"]}',
> '$' AS c1 COLUMNS (
> "tsvector0" tsvector path '$.d' without wrapper omit quotes,
> "tsvector1" tsvector path '$.d' without wrapper keep quotes))sub;
> table t1;
>
> return
> tsvector0|tsvector1
> -+-
>  '"hello1"]' '["hello",' | '"hello1"]' '["hello",'
> (1 row)
>
> src5=# \sv t1
> CREATE OR REPLACE VIEW public.t1 AS
>  SELECT tsvector0,
> tsvector1
>FROM JSON_TABLE(
> '{"d": ["hello", "hello1"]}'::jsonb, '$' AS c1
> COLUMNS (
> tsvector0 tsvector PATH '$."d"' OMIT QUOTES,
> tsvector1 tsvector PATH '$."d"'
> )
> ) sub
>
> but
>
>  SELECT tsvector0,
> tsvector1
>FROM JSON_TABLE(
> '{"d": ["hello", "hello1"]}'::jsonb, '$' AS c1
> COLUMNS (
> tsvector0 tsvector PATH '$."d"' OMIT QUOTES,
> tsvector1 tsvector PATH '$."d"'
> )
> ) sub
>
> only return
> tsvector0| tsvector1
> -+---
>  '"hello1"]' '["hello",' |

Yep, we *should* fix get_json_expr_options() to emit KEEP QUOTES and
WITHOUT WRAPPER options so that transformJsonTableColumns() does the
correct thing when you execute the \sv output.  Like this:

diff --git a/src/backend/utils/adt/ruleutils.c
b/src/backend/utils/adt/ruleutils.c
index 283ca53cb5..5a6aabe100 100644
--- a/src/backend/utils/adt/ruleutils.c
+++ b/src/backend/utils/adt/ruleutils.c
@@ -8853,9 +8853,13 @@ get_json_expr_options(JsonExpr *jsexpr,
deparse_context *context,
 appendStringInfo(context->buf, " WITH CONDITIONAL WRAPPER");
 else if (jsexpr->wrapper == JSW_UNCONDITIONAL)
 appendStringInfo(context->buf, " WITH UNCONDITIONAL WRAPPER");
+else if (jsexpr->wrapper == JSW_NONE)
+appendStringInfo(context->buf, " WITHOUT WRAPPER");

 if (jsexpr->omit_quotes)
 appendStringInfo(context->buf, " OMIT QUOTES");
+else
+appendStringInfo(context->buf, " KEEP QUOTES");
 }

Will get that pushed tomorrow.  Thanks for the test case.

-- 
Thanks, Amit Langote




Re: remaining sql/json patches

2024-04-06 Thread Amit Langote
On Sat, Apr 6, 2024 at 12:31 PM jian he  wrote:
> On Fri, Apr 5, 2024 at 8:35 PM Amit Langote  wrote:
> > Here's one.  Main changes:
> >
> > * Fixed a bug in get_table_json_columns() which caused nested columns
> > to be deparsed incorrectly, something Jian reported upthread.
> > * Simplified the algorithm in JsonTablePlanNextRow()
> >
> > I'll post another revision or two maybe tomorrow, but posting what I
> > have now in case Jian wants to do more testing.
>
> i am using the upthread view validation function.
> by comparing `execute the view definition` and `select * from the_view`,
> I did find 2 issues.
>
> * problem in transformJsonBehavior, JSON_BEHAVIOR_DEFAULT branch.
> I think we can fix this problem later, since sql/json query function
> already committed?
>
> CREATE DOMAIN jsonb_test_domain AS text CHECK (value <> 'foo');
> normally, we do:
> SELECT JSON_VALUE(jsonb '{"d1": "H"}', '$.a2' returning
> jsonb_test_domain DEFAULT 'foo' ON ERROR);
>
> but parsing back view def, we do:
> SELECT JSON_VALUE(jsonb '{"d1": "H"}', '$.a2' returning
> jsonb_test_domain DEFAULT 'foo'::text::jsonb_test_domain ON ERROR);
>
> then I found the following two queries should not be error out.
> SELECT JSON_VALUE(jsonb '{"d1": "H"}', '$.a2' returning
> jsonb_test_domain DEFAULT 'foo1'::text::jsonb_test_domain ON ERROR);
> SELECT JSON_VALUE(jsonb '{"d1": "H"}', '$.a2' returning
> jsonb_test_domain DEFAULT 'foo1'::jsonb_test_domain ON ERROR);

Yeah, added an open item for this:
https://wiki.postgresql.org/wiki/PostgreSQL_17_Open_Items#Open_Issues

> 
>
> * problem with type "char". the view def  output is not the same as
> the select * from v1.
>
> create or replace view v1 as
> SELECT col FROM s,
> JSON_TABLE(jsonb '{"d": ["hello", "hello1"]}', '$' as c1
> COLUMNS(col "char" path '$.d' without wrapper keep quotes))sub;
>
> \sv v1
> CREATE OR REPLACE VIEW public.v1 AS
>  SELECT sub.col
>FROM s,
> JSON_TABLE(
> '{"d": ["hello", "hello1"]}'::jsonb, '$' AS c1
> COLUMNS (
> col "char" PATH '$."d"'
> )
> ) sub
> one under the hood called JSON_QUERY_OP, another called JSON_VALUE_OP.

Hmm, I don't see a problem as long as both are equivalent or produce
the same result.  Though, perhaps we could make
get_json_expr_options() also deparse JSW_NONE explicitly into "WITHOUT
WRAPPER" instead of a blank.  But that's existing code, so will take
care of it as part of the above open item.

> I will do extensive checking for other types later, so far, other than
> these two issues,
> get_json_table_columns is pretty solid, I've tried nested columns with
> nested columns, it just works.

Thanks for checking.

-- 
Thanks, Amit Langote




Re: remaining sql/json patches

2024-04-05 Thread Amit Langote
Hi Michael,

On Fri, Apr 5, 2024 at 3:07 PM Michael Paquier  wrote:
> On Fri, Apr 05, 2024 at 09:00:00AM +0300, Alexander Lakhin wrote:
> > Please look at an assertion failure:
> > TRAP: failed Assert("count <= tupdesc->natts"), File: "parse_relation.c", 
> > Line: 3048, PID: 1325146
> >
> > triggered by the following query:
> > SELECT * FROM JSON_TABLE('0', '$' COLUMNS (js int PATH '$')),
> >   COALESCE(row(1)) AS (a int, b int);
> >
> > Without JSON_TABLE() I get:
> > ERROR:  function return row and query-specified return row do not match
> > DETAIL:  Returned row contains 1 attribute, but query expects 2.
>
> I've added an open item on this one.  We need to keep track of all
> that.

We figured out that this is an existing bug unrelated to JSON_TABLE(),
which Alexander reported to -bugs:
https://postgr.es/m/18422-89ca86c8eac52...@postgresql.org

I have moved the item to Older Bugs:
https://wiki.postgresql.org/wiki/PostgreSQL_17_Open_Items#Live_issues

-- 
Thanks, Amit Langote




Re: remaining sql/json patches

2024-04-05 Thread Amit Langote
On Thu, Apr 4, 2024 at 9:02 PM Amit Langote  wrote:
> I'll post the rebased 0002 tomorrow after addressing your comments.

Here's one.  Main changes:

* Fixed a bug in get_table_json_columns() which caused nested columns
to be deparsed incorrectly, something Jian reported upthread.
* Simplified the algorithm in JsonTablePlanNextRow()

I'll post another revision or two maybe tomorrow, but posting what I
have now in case Jian wants to do more testing.

-- 
Thanks, Amit Langote


v50-0001-JSON_TABLE-Add-support-for-NESTED-columns.patch
Description: Binary data


Re: remaining sql/json patches

2024-04-05 Thread Amit Langote
On Fri, Apr 5, 2024 at 5:00 PM Alexander Lakhin  wrote:
> 05.04.2024 10:09, Amit Langote wrote:
> > Seems like it might be a pre-existing issue, because I can also
> > reproduce the crash with:
>
> That's strange, because I get the error (on master, 6f132ed69).
> With backtrace_functions = 'tupledesc_match', I see
> 2024-04-05 10:48:27.827 MSK client backend[2898632] regress ERROR: function 
> return row and query-specified return row do
> not match
> 2024-04-05 10:48:27.827 MSK client backend[2898632] regress DETAIL: Returned 
> row contains 1 attribute, but query expects 2.
> 2024-04-05 10:48:27.827 MSK client backend[2898632] regress BACKTRACE:
> tupledesc_match at execSRF.c:948:3
> ExecMakeTableFunctionResult at execSRF.c:427:13
> FunctionNext at nodeFunctionscan.c:94:5
> ExecScanFetch at execScan.c:131:10
> ExecScan at execScan.c:180:10
> ExecFunctionScan at nodeFunctionscan.c:272:1
> ExecProcNodeFirst at execProcnode.c:465:1
> ExecProcNode at executor.h:274:9
>   (inlined by) ExecutePlan at execMain.c:1646:10
> standard_ExecutorRun at execMain.c:363:3
> ExecutorRun at execMain.c:305:1
> PortalRunSelect at pquery.c:926:26
> PortalRun at pquery.c:775:8
> exec_simple_query at postgres.c:1282:3
> PostgresMain at postgres.c:4684:27
> BackendMain at backend_startup.c:57:2
> pgarch_die at pgarch.c:847:1
> BackendStartup at postmaster.c:3593:8
> ServerLoop at postmaster.c:1674:6
> main at main.c:184:3
>  /lib/x86_64-linux-gnu/libc.so.6(__libc_start_main+0x80) 
> [0x7f37127f0e40]
> 2024-04-05 10:48:27.827 MSK client backend[2898632] regress STATEMENT:  
> SELECT * FROM COALESCE(row(1)) AS (a int, b int);
>
> That's why I had attributed the failure to JSON_TABLE().
>
> Though SELECT * FROM generate_series(1, 1), COALESCE(row(1)) AS (a int, b 
> int);
> really triggers the assert too.
> Sorry for the noise...

No worries.  Let's start another thread so that this gets more attention.

-- 
Thanks, Amit Langote




Re: remaining sql/json patches

2024-04-05 Thread Amit Langote
Hi Alexander,

On Fri, Apr 5, 2024 at 3:00 PM Alexander Lakhin  wrote:
>
> Hello Amit,
>
> 04.04.2024 15:02, Amit Langote wrote:
> > Pushed after fixing these and a few other issues.  I didn't include
> > the testing function you proposed in your other email.  It sounds
> > useful for testing locally but will need some work before we can
> > include it in the tree.
> >
> > I'll post the rebased 0002 tomorrow after addressing your comments.
>
> Please look at an assertion failure:
> TRAP: failed Assert("count <= tupdesc->natts"), File: "parse_relation.c", 
> Line: 3048, PID: 1325146
>
> triggered by the following query:
> SELECT * FROM JSON_TABLE('0', '$' COLUMNS (js int PATH '$')),
>COALESCE(row(1)) AS (a int, b int);
>
> Without JSON_TABLE() I get:
> ERROR:  function return row and query-specified return row do not match
> DETAIL:  Returned row contains 1 attribute, but query expects 2.

Thanks for the report.

Seems like it might be a pre-existing issue, because I can also
reproduce the crash with:

SELECT * FROM COALESCE(row(1)) AS (a int, b int);
server closed the connection unexpectedly
This probably means the server terminated abnormally
before or while processing the request.
The connection to the server was lost. Attempting reset: Failed.
!>

Backtrace:

#0  __pthread_kill_implementation (threadid=281472845250592,
signo=signo@entry=6, no_tid=no_tid@entry=0) at pthread_kill.c:44
#1  0x806c4334 in __pthread_kill_internal (signo=6,
threadid=) at pthread_kill.c:78
#2  0x8067c73c in __GI_raise (sig=sig@entry=6) at
../sysdeps/posix/raise.c:26
#3  0x80669034 in __GI_abort () at abort.c:79
#4  0x00ad9d4c in ExceptionalCondition (conditionName=0xcbb368
"!(tupdesc->natts >= colcount)", errorType=0xcbb278 "FailedAssertion",
fileName=0xcbb2c8 "nodeFunctionscan.c",
lineNumber=379) at assert.c:54
#5  0x0073edec in ExecInitFunctionScan (node=0x293d4ed0,
estate=0x293d51b8, eflags=16) at nodeFunctionscan.c:379
#6  0x00724bc4 in ExecInitNode (node=0x293d4ed0,
estate=0x293d51b8, eflags=16) at execProcnode.c:248
#7  0x0071b1cc in InitPlan (queryDesc=0x292f5d78, eflags=16)
at execMain.c:1006
#8  0x00719f6c in standard_ExecutorStart
(queryDesc=0x292f5d78, eflags=16) at execMain.c:252
#9  0x00719cac in ExecutorStart (queryDesc=0x292f5d78,
eflags=0) at execMain.c:134
#10 0x00945520 in PortalStart (portal=0x29399458, params=0x0,
eflags=0, snapshot=0x0) at pquery.c:527
#11 0x0093ee50 in exec_simple_query (query_string=0x29332d38
"SELECT * FROM COALESCE(row(1)) AS (a int, b int);") at
postgres.c:1175
#12 0x00943cb8 in PostgresMain (argc=1, argv=0x2935d610,
dbname=0x2935d450 "postgres", username=0x2935d430 "amit") at
postgres.c:4297
#13 0x0087e978 in BackendRun (port=0x29356c00) at postmaster.c:4517
#14 0x0087e0bc in BackendStartup (port=0x29356c00) at postmaster.c:4200
#15 0x00879638 in ServerLoop () at postmaster.c:1725
#16 0x00878eb4 in PostmasterMain (argc=3, argv=0x292eeac0) at
postmaster.c:1398
#17 0x00791db8 in main (argc=3, argv=0x292eeac0) at main.c:228

Backtrace looks a bit different with a query similar to yours:

SELECT * FROM generate_series(1, 1), COALESCE(row(1)) AS (a int, b int);
server closed the connection unexpectedly
This probably means the server terminated abnormally
before or while processing the request.
The connection to the server was lost. Attempting reset: Failed.
!>

#0  __pthread_kill_implementation (threadid=281472845250592,
signo=signo@entry=6, no_tid=no_tid@entry=0) at pthread_kill.c:44
#1  0x806c4334 in __pthread_kill_internal (signo=6,
threadid=) at pthread_kill.c:78
#2  0x8067c73c in __GI_raise (sig=sig@entry=6) at
../sysdeps/posix/raise.c:26
#3  0x80669034 in __GI_abort () at abort.c:79
#4  0x00ad9d4c in ExceptionalCondition (conditionName=0xc903b0
"!(count <= tupdesc->natts)", errorType=0xc8f8c8 "FailedAssertion",
fileName=0xc8f918 "parse_relation.c",
lineNumber=2649) at assert.c:54
#5  0x00649664 in expandTupleDesc (tupdesc=0x293da188,
eref=0x293d7318, count=2, offset=0, rtindex=2, sublevels_up=0,
location=-1, include_dropped=true, colnames=0x0,
colvars=0xc39253c8) at parse_relation.c:2649
#6  0x00648d08 in expandRTE (rte=0x293d7390, rtindex=2,
sublevels_up=0, location=-1, include_dropped=true, colnames=0x0,
colvars=0xc39253c8) at parse_relation.c:2361
#7  0x00849bd0 in build_physical_tlist (root=0x293d5318,
rel=0x293d88e8) at plancat.c:1681
#8  0x00806ad0 in create_scan_plan (root=0x293d5318,
best_path=0x293cd888, flags=0) at createplan.c:605
#9  0x0080666c in create_plan_recurse (root=0x293d5318,
best_path=0x293cd888, flags=0) at createplan

Re: remaining sql/json patches

2024-04-04 Thread Amit Langote
On Wed, Apr 3, 2024 at 11:48 PM jian he  wrote:
> hi.
> +  
> +   json_table is an SQL/JSON function which
> +   queries JSON data
> +   and presents the results as a relational view, which can be accessed as a
> +   regular SQL table. You can only use
> json_table inside the
> +   FROM clause of a SELECT,
> +   UPDATE, DELETE, or
> MERGE
> +   statement.
> +  
>
> the only issue is that MERGE Synopsis don't have
> FROM clause.
> other than that, it's quite correct.
> see following tests demo:
>
> drop table ss;
> create table ss(a int);
> insert into ss select 1;
> delete from ss using JSON_TABLE(jsonb '1', '$' COLUMNS (a int PATH '$'
> ) ERROR ON ERROR) jt where jt.a = 1;
> insert into ss select 2;
> update ss set a = 1 from JSON_TABLE(jsonb '2', '$' COLUMNS (a int PATH
> '$')) jt where jt.a = 2;
> DROP TABLE IF EXISTS target;
> CREATE TABLE target (tid integer, balance integer) WITH
> (autovacuum_enabled=off);
> INSERT INTO target VALUES (1, 10),(2, 20),(3, 30);
> MERGE INTO target USING JSON_TABLE(jsonb '2', '$' COLUMNS (a int PATH
> '$' ) ERROR ON ERROR) source(sid)
> ON target.tid = source.sid
> WHEN MATCHED THEN UPDATE SET balance = 0
> returning *;
> --
>
> +  
> +   To split the row pattern into columns, json_table
> +   provides the COLUMNS clause that defines the
> +   schema of the created view. For each column, a separate path expression
> +   can be specified to be evaluated against the row pattern to get a
> +   SQL/JSON value that will become the value for the specified column in
> +   a given output row.
> +  
> should be "an SQL/JSON".
>
> +
> + Inserts a SQL/JSON value obtained by applying
> + path_expression against the row pattern into
> + the view's output row after coercing it to specified
> + type.
> +
> should be "an SQL/JSON".
>
> "coercing it to specified type"
> should be
> "coercing it to the specified type"?
> ---
> +
> + The value corresponds to whether evaluating the PATH
> + expression yields any SQL/JSON values.
> +
> maybe we can change to
> +
> + The value corresponds to whether applying the
> path_expression
> + expression yields any SQL/JSON values.
> +
> so it looks more consistent with the preceding paragraph.
>
> +
> + Optionally, ON ERROR can be used to specify whether
> + to throw an error or return the specified value to handle structural
> + errors, respectively.  The default is to return a boolean value
> + FALSE.
> +
> we don't need "respectively" here?
>
> + if (jt->on_error &&
> + jt->on_error->btype != JSON_BEHAVIOR_ERROR &&
> + jt->on_error->btype != JSON_BEHAVIOR_EMPTY &&
> + jt->on_error->btype != JSON_BEHAVIOR_EMPTY_ARRAY)
> + ereport(ERROR,
> + errcode(ERRCODE_SYNTAX_ERROR),
> + errmsg("invalid ON ERROR behavior"),
> + errdetail("Only EMPTY or ERROR is allowed for ON ERROR in JSON_TABLE()."),
> + parser_errposition(pstate, jt->on_error->location));
>
> errdetail("Only EMPTY or ERROR is allowed for ON ERROR in JSON_TABLE()."),
> maybe change to something like:
> `
> errdetail("Only EMPTY or ERROR is allowed for ON ERROR in the
> top-level JSON_TABLE() ").
> `
> i guess mentioning "top-level" is fine.
> since "top-level", we have 19  appearances in functions-json.html.

Thanks for checking.

Pushed after fixing these and a few other issues.  I didn't include
the testing function you proposed in your other email.  It sounds
useful for testing locally but will need some work before we can
include it in the tree.

I'll post the rebased 0002 tomorrow after addressing your comments.


--
Thanks, Amit Langote




Re: remaining sql/json patches

2024-03-29 Thread Amit Langote
Hi Alvaro,

On Fri, Mar 29, 2024 at 2:04 AM Alvaro Herrera  wrote:
> On 2024-Mar-28, Amit Langote wrote:
>
> > Here's patch 1 for the time being that implements barebones
> > JSON_TABLE(), that is, without NESTED paths/columns and PLAN clause.
> > I've tried to shape the interfaces so that those features can be added
> > in future commits without significant rewrite of the code that
> > implements barebones JSON_TABLE() functionality.  I'll know whether
> > that's really the case when I rebase the full patch over it.
>
> I think this barebones patch looks much closer to something that can be
> committed for pg17, given the current commitfest timeline.  Maybe we
> should just slip NESTED and PLAN to pg18 to focus current efforts into
> getting the basic functionality in 17.  When I looked at the JSON_TABLE
> patch last month, it appeared far too large to be reviewable in
> reasonable time.  The fact that this split now exists gives me hope that
> we can get at least the first part of it.

Thanks for chiming in.  I agree that 0001 looks more manageable.

> (A note that PLAN seems to correspond to separate features T824+T838, so
> leaving that one out would still let us claim T821 "Basic SQL/JSON query
> operators" ... however, the NESTED clause does not appear to be a
> separate SQL feature; in particular it does not appear to correspond to
> T827, though I may be reading the standard wrong.  So if we don't have
> NESTED, apparently we could not claim to support T821.)

I've posted 0002 just now, which shows that adding just NESTED but not
PLAN might be feasible.

-- 
Thanks, Amit Langote




Re: remaining sql/json patches

2024-03-26 Thread Amit Langote
On Wed, Mar 27, 2024 at 12:42 PM jian he  wrote:
> hi.
> I don't fully understand all the code in json_table patch.
> maybe we can split it into several patches,

I'm working on exactly that atm.

> like:
> * no nested json_table_column.
> * nested json_table_column, with PLAN DEFAULT
> * nested json_table_column, with PLAN ( json_table_plan )

Yes, I think it will end up something like this.  I'll try to post the
breakdown tomorrow.

-- 
Thanks, Amit Langote




Re: Memory consumed by child SpecialJoinInfo in partitionwise join planning

2024-03-25 Thread Amit Langote
On Mon, Mar 25, 2024 at 7:25 PM Richard Guo  wrote:
> On Mon, Mar 25, 2024 at 5:17 PM Amit Langote  wrote:
>> I've pushed this now.
>>
>> When updating the commit message, I realized that you had made the
>> right call to divide the the changes around not translating the dummy
>> SpecialJoinInfos into a separate patch.  So, I pushed it like that.
>
>
> Sorry I'm late for the party.  I noticed that in commit 6190d828cd the
> comment of init_dummy_sjinfo() mentions the non-existing 'rel1' and
> 'rel2', which should be addressed.

Thanks for catching that.  Oops.

>  Also, I think we can make function
> consider_new_or_clause() call init_dummy_sjinfo() to simplify the code a
> bit.

Indeed.

> Attached is a trivial patch for the fixes.

Will apply shortly.

-- 
Thanks, Amit Langote




Re: Memory consumed by child SpecialJoinInfo in partitionwise join planning

2024-03-25 Thread Amit Langote
On Fri, Mar 22, 2024 at 7:48 PM Ashutosh Bapat
 wrote:
> With the later code, semi_rhs_expr is indeed free'able. It wasn't so when I
> wrote the patches. Thanks for updating the comment. I wish we would have
> freeObject(). Nothing that can be done for this patch.

Yes.

> Attached patches
> Squashed your 0001 and 0002 into 0001. I have also reworded the commit 
> message to reflect the latest code changes.
> 0002 has some minor edits. Please feel free to include or reject when 
> committing the patch.

I've pushed this now.

When updating the commit message, I realized that you had made the
right call to divide the the changes around not translating the dummy
SpecialJoinInfos into a separate patch.  So, I pushed it like that.

Thanks for working on this.

-- 
Thanks, Amit Langote




Re: Memory consumed by child SpecialJoinInfo in partitionwise join planning

2024-03-22 Thread Amit Langote
Hi Ashutosh,

On Tue, Mar 19, 2024 at 12:47 AM Ashutosh Bapat
 wrote:
> On Mon, Mar 18, 2024 at 5:40 PM Amit Langote  wrote:
>> >>
>> >> Sorry, I should’ve mentioned that I was interested in seeing cpu times to 
>> >> compare the two approaches. Specifically, to see if the palloc / frees 
>> >> add noticeable overhead.
>> >
>> > No problem. Here you go
>> >
>> >  tables |  master  | patched  | perc_change
>> > +--+--+-
>> >   2 |   477.87 |   492.32 |   -3.02
>> >   3 |  1970.83 |  1989.52 |   -0.95
>> >   4 |  6305.01 |  6268.81 |0.57
>> >   5 | 19261.56 | 18684.86 |2.99
>> >
>> > +ve change indicates reduced planning time. It seems that the planning 
>> > time improves as the number of tables increases. But all the numbers are 
>> > well within noise levels and thus may not show any improvement or 
>> > deterioration. If you want, I can repeat the experiment.
>>
>> Thanks.  I also wanted to see cpu time difference between the two
>> approaches of SpecialJoinInfo allocation -- on stack and on heap.  So
>> comparing times with the previous version of the patch using stack
>> allocation and the new one with palloc.  I'm hoping that the new
>> approach is only worse in the noise range.
>
>
> Ah, sorry, I didn't read it carefully. Alvaro made me realise that I have 
> been gathering numbers using assert enabled builds, so they are not that 
> reliable. Here they are with non-assert enabled builds.
>
> planning time (in ms) reported by EXPLAIN.
>  tables |  master  | stack_alloc | perc_change_1 |  palloc  | perc_change_2 | 
> total_perc_change
> +--+-+---+--+---+---
>   2 |338.1 |  333.92 |  1.24 |   332.16 |  0.53 | 
>  1.76
>   3 |  1507.93 | 1475.76 |  2.13 |  1472.79 |  0.20 | 
>  2.33
>   4 |  5099.45 | 4980.35 |  2.34 |   4947.3 |  0.66 | 
>  2.98
>   5 | 15442.64 |15531.94 | -0.58 | 15393.41 |  0.89 | 
>  0.32
>
> stack_alloc = timings with SpecialJoinInfo on stack
> perc_change_1 = percentage change of above wrt master
> palloc = timings with palloc'ed SpecialJoinInfo
> perc_change_2 = percentage change of above wrt timings with stack_alloc
> total_perc_change = percentage change between master and all patches
>
> total change is within noise. Timing with palloc is better than that with 
> SpecialJoinInfo on stack but only marginally. I don't think that means palloc 
> based allocation is better or worse than stack based.
>
> You requested CPU time difference between stack based SpecialJoinInfo and 
> palloc'ed SpecialJoinInfo. Here it is
> tables | stack_alloc |  palloc  | perc_change
> +-+--+-
>   2 |0.438204 | 0.438986 |   -0.18
>   3 |1.224672 | 1.238781 |   -1.15
>   4 |3.511317 | 3.663334 |   -4.33
>   5 |9.283856 | 9.340516 |   -0.61
>
> Yes, there's a consistent degradation albeit within noise levels.
>
> The memory measurements
>  tables | master | with_patch
> ++
>   2 | 26 MB  | 26 MB
>   3 | 93 MB  | 84 MB
>   4 | 277 MB | 235 MB
>   5 | 954 MB | 724 MB
>
> The first row shows the same value because of rounding. The actual values 
> there are 27740432 and 26854704 resp.
>
> Please let me know if you need anything.

Thanks for repeating the benchmark.  So I don't see a problem with
keeping the existing palloc() way of allocating the
SpecialJoinInfos().  We're adding a few cycles by adding
free_child_join_sjinfo() (see my delta patch about the rename), but
the reduction in memory consumption due to it, which is our goal here,
far outweighs what looks like a minor CPU slowdown.

I have squashed 0002, 0003 into 0001, done some changes of my own, and
attached the delta patch as 0002.  I've listed the changes in the
commit message.  Let me know if anything seems amiss.

I'd like to commit 0001 + 0002 + any other changes based on your
comments (if any) sometime early next week.


--
Thanks, Amit Langote
From 9d8c408b3379faab6269421f67d75530f1c4cde2 Mon Sep 17 00:00:00 2001
From: Amit Langote 
Date: Fri, 22 Mar 2024 16:51:48 +0900
Subject: [PATCH v1 2/2] Some fixes

* Revert (IMO) unnecessary modifications of build_child_join_sjinfo().
For example, hunks to rename sjinfo to child_sjinfo seem unnecessary,
because it's clear from the context that they are "child" sjinfos.

* Rename free_child_sjinfo_me

Re: remaining sql/json patches

2024-03-21 Thread Amit Langote
Hi Horiguchi-san,

On Fri, Mar 22, 2024 at 9:51 AM Kyotaro Horiguchi
 wrote:
> At Wed, 20 Mar 2024 21:53:52 +0900, Amit Langote  
> wrote in
> > I'll push 0001 tomorrow.
>
> This patch (v44-0001-Add-SQL-JSON-query-functions.patch) introduced the 
> following new erro message:
>
> +errmsg("can only specify 
> constant, non-aggregate"
> +   " function, 
> or operator expression for"
> +   " DEFAULT"),
>
> I believe that our convention here is to write an error message in a
> single string literal, not split into multiple parts, for better
> grep'ability.

Thanks for the heads up.

My bad, will push a fix shortly.

-- 
Thanks, Amit Langote




Re: Memory consumed by child SpecialJoinInfo in partitionwise join planning

2024-03-18 Thread Amit Langote
On Mon, Mar 18, 2024 at 8:57 PM Ashutosh Bapat
 wrote:
> On Mon, Mar 18, 2024 at 5:05 PM Amit Langote  wrote:
>> On Mon, Mar 18, 2024 at 20:11 Ashutosh Bapat  
>> wrote:
>>> On Fri, Mar 15, 2024 at 11:45 AM Amit Langote  
>>> wrote:
>>>> Could you please post the numbers with the palloc() / pfree() version?
>>>
>>> Here are they
>>>  tables | master  | patched
>>> +-+-
>>>   2 | 29 MB   | 28 MB
>>>   3 | 102 MB  | 93 MB
>>>   4 | 307 MB  | 263 MB
>>>   5 | 1076 MB | 843 MB
>>>
>>> The numbers look slightly different from my earlier numbers. But they were 
>>> quite old. The patch used to measure memory that time is different from the 
>>> one that we committed. So there's a slight difference in the way we measure 
>>> memory as well.
>>
>>
>> Sorry, I should’ve mentioned that I was interested in seeing cpu times to 
>> compare the two approaches. Specifically, to see if the palloc / frees add 
>> noticeable overhead.
>
> No problem. Here you go
>
>  tables |  master  | patched  | perc_change
> +--+--+-
>   2 |   477.87 |   492.32 |   -3.02
>   3 |  1970.83 |  1989.52 |   -0.95
>   4 |  6305.01 |  6268.81 |0.57
>   5 | 19261.56 | 18684.86 |2.99
>
> +ve change indicates reduced planning time. It seems that the planning time 
> improves as the number of tables increases. But all the numbers are well 
> within noise levels and thus may not show any improvement or deterioration. 
> If you want, I can repeat the experiment.

Thanks.  I also wanted to see cpu time difference between the two
approaches of SpecialJoinInfo allocation -- on stack and on heap.  So
comparing times with the previous version of the patch using stack
allocation and the new one with palloc.  I'm hoping that the new
approach is only worse in the noise range.

-- 
Thanks, Amit Langote




Re: Memory consumed by child SpecialJoinInfo in partitionwise join planning

2024-03-18 Thread Amit Langote
On Mon, Mar 18, 2024 at 20:11 Ashutosh Bapat 
wrote:

> Hi Amit,
>
>
> On Fri, Mar 15, 2024 at 11:45 AM Amit Langote 
> wrote:
>
>> > >
>> > > That being said I'm a big fan of using a local variable on stack and
>> > > filling it. I'd probably go with the usual palloc/pfree, because that
>> > > makes it much easier to use - the callers would not be responsible for
>> > > allocating the SpecialJoinInfo struct. Sure, it's a little bit of
>> > > overhead, but with the AllocSet caching I doubt it's measurable.
>> >
>> > You are suggesting that instead of declaring a local variable of type
>> > SpecialJoinInfo, build_child_join_sjinfo() should palloc() and return
>> > SpecialJoinInfo which will be freed by free_child_sjinfo()
>> > (free_child_sjinfo_members in the patch). I am fine with that.
>>
>> Agree with Tomas about using makeNode() / pfree().  Having the pfree()
>> kind of makes it extra-evident that those SpecialJoinInfos are
>> transitory.
>>
>
> Attached patch-set
>
> 0001 - original patch as is
> 0002 - addresses your first set of comments
> 0003 - uses palloc and pfree to allocate and deallocate SpecialJoinInfo
> structure.
>
> I will squash both 0002 and 0003 into 0001 once you review those changes
> and are fine with those.
>

Thanks for the new patches.

> > I did put this through check-world on amd64/arm64, with valgrind,
>> > > without any issue. I also tried the scripts shared by Ashutosh in his
>> > > initial message (with some minor fixes, adding MEMORY to explain etc).
>> > >
>> > > The results with the 20240130 patches are like this:
>> > >
>> > >tablesmasterpatched
>> > >   -
>> > > 2  40.8   39.9
>> > > 3 151.7  142.6
>> > > 4 464.0  418.5
>> > > 51663.9 1419.5
>>
>> Could you please post the numbers with the palloc() / pfree() version?
>>
>>
> Here are they
>  tables | master  | patched
> +-+-
>   2 | 29 MB   | 28 MB
>   3 | 102 MB  | 93 MB
>   4 | 307 MB  | 263 MB
>   5 | 1076 MB | 843 MB
>
> The numbers look slightly different from my earlier numbers. But they were
> quite old. The patch used to measure memory that time is different from the
> one that we committed. So there's a slight difference in the way we measure
> memory as well.
>

Sorry, I should’ve mentioned that I was interested in seeing cpu times to
compare the two approaches. Specifically, to see if the palloc / frees add
noticeable overhead.


Re: remaining sql/json patches

2024-03-18 Thread Amit Langote
Himanshu,

On Mon, Mar 18, 2024 at 4:57 PM Himanshu Upadhyaya
 wrote:
> I have tested a nested case  but  why is the negative number allowed in 
> subscript(NESTED '$.phones[-1]'COLUMNS), it should error out if the number is 
> negative.
>
> ‘postgres[170683]=#’SELECT * FROM JSON_TABLE(jsonb '{
> ‘...>’ "id" : "0.234567897890",
> ‘...>’ "name" : { 
> "first":"John", "last":"Doe" 
> },
> ‘...>’ "phones" : [{"type":"home", "number":"555-3762"},
> ‘...>’ {"type":"work", "number":"555-7252", 
> "test":123}]}',
> ‘...>’'$'
> ‘...>’COLUMNS(
> ‘...>’ id numeric(2,2) PATH 'lax $.id',
> ‘...>’ last_name varCHAR(10) PATH 'lax $.name.last', 
> first_name VARCHAR(10) PATH 'lax $.name.first',
> ‘...>’  NESTED '$.phones[-1]'COLUMNS (
> ‘...>’"type" VARCHAR(10),
> ‘...>’"number" VARCHAR(10)
> ‘...>’ )
> ‘...>’  )
> ‘...>’   ) as t;
>   id  | last_name | first_name | type | number
> --+---++--+
>  0.23 | Doe   | Johnnn |  |
> (1 row)

You're not getting an error because the default mode of handling
structural errors in SQL/JSON path expressions is "lax".  If you say
"strict" in the path string, you will get an error:

SELECT * FROM JSON_TABLE(jsonb '{
 "id" : "0.234567897890",
 "name" : {
"first":"John",
"last":"Doe" },
 "phones" : [{"type":"home", "number":"555-3762"},
 {"type":"work", "number":"555-7252", "test":123}]}',
'$'
COLUMNS(
 id numeric(2,2) PATH 'lax $.id',
 last_name varCHAR(10) PATH 'lax $.name.last',
first_name VARCHAR(10) PATH 'lax $.name.first',
  NESTED 'strict $.phones[-1]'COLUMNS (
"type" VARCHAR(10),
"number" VARCHAR(10)
 )
  ) error on error
   ) as t;
ERROR:  jsonpath array subscript is out of bounds

-- 
Thanks, Amit Langote




Re: remaining sql/json patches

2024-03-17 Thread Amit Langote
---
> + Table Function Scan on "JSON_TABLE" json_table_func
> +   Output: id, "int", text
> +   Table Function Call: JSON_TABLE('null'::jsonb, '$[*]' AS 
> json_table_path_0 PASSING 3 AS a, '"foo"'::jsonb AS "b c" COLUMNS (id FOR 
> ORDINALITY, "int" integer PATH '$', text text PATH '$') PLAN 
> (json_table_path_0))
> +(3 rows)
>
> and I'm curious to see what this would output if this was to be run
> without the 0002 patch.  If I understand things correctly, the alias
> would be displayed anyway, meaning 0002 doesn't get us anything.

Patch 0002 came about because old versions of json_table patch were
changing ExplainTargetRel() incorrectly to use rte->tablefunc to get
the function type to set objectname, but rte->tablefunc is NULL
because add_rte_to_flat_rtable() zaps it.  You pointed that out in
[1].

However, we can get the TableFunc to get the function type from the
Plan node instead of the RTE, as follows:

-Assert(rte->rtekind == RTE_TABLEFUNC);
-objectname = "xmltable";
-objecttag = "Table Function Name";
+{
+TableFunc *tablefunc = ((TableFuncScan *) plan)->tablefunc;
+
+Assert(rte->rtekind == RTE_TABLEFUNC);
+switch (tablefunc->functype)
+{
+case TFT_XMLTABLE:
+objectname = "xmltable";
+break;
+case TFT_JSON_TABLE:
+objectname = "json_table";
+break;
+default:
+elog(ERROR, "invalid TableFunc type %d",
+ (int) tablefunc->functype);
+}
+objecttag = "Table Function Name";
+}

So that gets us what we need here.

Given that, 0002 does seem like an overkill and unnecessary, so I'll drop it.

> Please do add a test with EXPLAIN (FORMAT JSON) in 0003.

OK, will do.


--
Thanks, Amit Langote

[1] 
https://www.postgresql.org/message-id/202401181711.qxjxpnl3ohnw%40alvherre.pgsql




Re: Memory consumed by child SpecialJoinInfo in partitionwise join planning

2024-03-15 Thread Amit Langote
Hi Ashutosh,

On Mon, Feb 19, 2024 at 10:01 PM Ashutosh Bapat
 wrote:
> On Sun, Feb 18, 2024 at 10:55 PM Tomas Vondra
>  wrote:
> >
> > Hi,
> >
> > I took a quick look at this patch today. I certainly agree with the
> > intent to reduce the amount of memory during planning, assuming it's not
> > overly disruptive. And I think this patch is fairly localized and looks
> > sensible.
>
> Thanks for looking at the patch-set.
> >
> > That being said I'm a big fan of using a local variable on stack and
> > filling it. I'd probably go with the usual palloc/pfree, because that
> > makes it much easier to use - the callers would not be responsible for
> > allocating the SpecialJoinInfo struct. Sure, it's a little bit of
> > overhead, but with the AllocSet caching I doubt it's measurable.
>
> You are suggesting that instead of declaring a local variable of type
> SpecialJoinInfo, build_child_join_sjinfo() should palloc() and return
> SpecialJoinInfo which will be freed by free_child_sjinfo()
> (free_child_sjinfo_members in the patch). I am fine with that.

Agree with Tomas about using makeNode() / pfree().  Having the pfree()
kind of makes it extra-evident that those SpecialJoinInfos are
transitory.

> > I did put this through check-world on amd64/arm64, with valgrind,
> > without any issue. I also tried the scripts shared by Ashutosh in his
> > initial message (with some minor fixes, adding MEMORY to explain etc).
> >
> > The results with the 20240130 patches are like this:
> >
> >tablesmasterpatched
> >   -
> > 2  40.8   39.9
> > 3 151.7  142.6
> > 4 464.0  418.5
> > 51663.9 1419.5

Could you please post the numbers with the palloc() / pfree() version?

-- 
Thanks, Amit Langote




Re: remaining sql/json patches

2024-03-12 Thread Amit Langote
Hi Himanshu,

On Tue, Mar 12, 2024 at 6:42 PM Himanshu Upadhyaya
 wrote:
>
> Hi,
>
> wanted to share the below case:
>
> ‘postgres[146443]=#’SELECT JSON_EXISTS(jsonb '{"customer_name": "test", 
> "salary":1000, "department_id":1}', '$.* ? (@== $dept_id && @ == $sal)' 
> PASSING 1000 AS sal, 1 as dept_id);
>  json_exists
> -
>  f
> (1 row)
>
> isn't it supposed to return "true" as json in input is matching with both the 
> condition dept_id and salary?

I think you meant to use || in your condition, not &&, because 1000 != 1.

See:

SELECT JSON_EXISTS(jsonb '{"customer_name": "test", "salary":1000,
"department_id":1}', '$.* ? (@ == $dept_id || @ == $sal)' PASSING 1000
AS sal, 1 as dept_id);
 json_exists
-
 t
(1 row)

Or you could've written the query as:

SELECT JSON_EXISTS(jsonb '{"customer_name": "test", "salary":1000,
"department_id":1}', '$ ? (@.department_id == $dept_id && @.salary ==
$sal)' PASSING 1000 AS sal, 1 as dept_id);
 json_exists
-
 t
(1 row)

Does that make sense?

In any case, JSON_EXISTS() added by the patch here returns whatever
the jsonpath executor returns.  The latter is not touched by this
patch.  PASSING args, which this patch adds, seem to be working
correctly too.

-- 
Thanks, Amit Langote




Re: remaining sql/json patches

2024-03-07 Thread Amit Langote
On Thu, Mar 7, 2024 at 23:14 Alvaro Herrera  wrote:

> On 2024-Mar-07, Tomas Vondra wrote:
>
> > I was experimenting with the v42 patches, and I think the handling of ON
> > EMPTY / ON ERROR clauses may need some improvement.
>
> Well, the 2023 standard says things like
>
>  ::=
>   JSON_VALUE 
>   
>   [  ]
>   [  ON EMPTY ]
>   [  ON ERROR ]
>   
>
> which implies that if you specify it the other way around, it's a syntax
> error.
>
> > I'm not sure what the SQL standard says about this, but it seems other
> > databases don't agree on the order. Is there a particular reason to
> > not allow both orderings?
>
> I vaguely recall that trying to also support the other ordering leads to
> having more rules.


Yeah, I think that was it.  At one point, I removed rules supporting syntax
that wasn’t documented.

Now maybe we do want that because of compatibility
> with other DBMSs, but frankly at this stage I wouldn't bother.


+1.

>


Re: remaining sql/json patches

2024-03-07 Thread Amit Langote
On Thu, Mar 7, 2024 at 22:46 jian he  wrote:

> On Thu, Mar 7, 2024 at 8:06 PM Amit Langote 
> wrote:
> >
> >
> > Indeed.
> >
> > This boils down to the difference in the cast expression chosen to
> > convert the source value to int in the two cases.
> >
> > The case where the source value has no quotes, the chosen cast
> > expression is a FuncExpr for function numeric_int4(), which has no way
> > to suppress errors.  When the source value has quotes, the cast
> > expression is a CoerceViaIO expression, which can suppress the error.
> > The default behavior is to suppress the error and return NULL, so the
> > correct behavior is when the source value has quotes.
> >
> > I think we'll need either:
> >
> > * fix the code in 0001 to avoid getting numeric_int4() in this case,
> > and generally cast functions that don't have soft-error handling
> > support, in favor of using IO coercion.
> > * fix FuncExpr (like CoerceViaIO) to respect SQL/JSON's request to
> > suppress errors and fix downstream functions like numeric_int4() to
> > comply by handling errors softly.
> >
> > I'm inclined to go with the 1st option as we already have the
> > infrastructure in place -- input functions can all handle errors
> > softly.
>
> not sure this is the right way.
> but attached patches solved this problem.
>
> Also, can you share the previous memory-hogging bug issue
> when you are free, I want to know which part I am missing.


Take a look at the json_populate_type() call in ExecEvalJsonCoercion() or
specifically compare the new way of passing its void *cache parameter with
the earlier patches.

>


Re: remaining sql/json patches

2024-03-07 Thread Amit Langote
On Thu, Mar 7, 2024 at 8:13 PM Tomas Vondra
 wrote:
> On 3/7/24 06:18, Himanshu Upadhyaya wrote:

Thanks Himanshu for the testing.

> > On Wed, Mar 6, 2024 at 9:04 PM Tomas Vondra 
> > wrote:
> >>
> >> I'm pretty sure this is the correct & expected behavior. The second
> >> query treats the value as string (because that's what should happen for
> >> values in double quotes).
> >>
> >>  ok, Then why does the below query provide the correct conversion, even if
> > we enclose that in double quotes?
> > ‘postgres[102531]=#’SELECT * FROM JSON_TABLE(jsonb '{
> >  "id" : "1234567890",
> >  "FULL_NAME" : "JOHN DOE"}',
> > '$'
> > COLUMNS(
> >  name varchar(20) PATH 'lax $.FULL_NAME',
> >  id int PATH 'lax $.id'
> >   )
> >)
> > ;
> >name   | id
> > --+
> >  JOHN DOE | 1234567890
> > (1 row)
> >
> > and for bigger input(string) it will leave as empty as below.
> > ‘postgres[102531]=#’SELECT * FROM JSON_TABLE(jsonb '{
> >  "id" : "12345678901",
> >  "FULL_NAME" : "JOHN DOE"}',
> > '$'
> > COLUMNS(
> >  name varchar(20) PATH 'lax $.FULL_NAME',
> >  id int PATH 'lax $.id'
> >   )
> >)
> > ;
> >name   | id
> > --+
> >  JOHN DOE |
> > (1 row)
> >
> > seems it is not something to do with data enclosed in double quotes but
> > somehow related with internal casting it to integer and I think in case of
> > bigger input it is not able to cast it to integer(as defined under COLUMNS
> > as id int PATH 'lax $.id')
> >
> > ‘postgres[102531]=#’SELECT * FROM JSON_TABLE(jsonb '{
> >  "id" : "12345678901",
> >  "FULL_NAME" : "JOHN DOE"}',
> > '$'
> > COLUMNS(
> >  name varchar(20) PATH 'lax $.FULL_NAME',
> >  id int PATH 'lax $.id'
> >   )
> >)
> > ;
> >name   | id
> > --+
> >  JOHN DOE |
> > (1 row)
> > )
> >
> > if it is not able to represent it to integer because of bigger input, it
> > should error out with a similar error message instead of leaving it empty.
> >
> > Thoughts?
> >
>
> Ah, I see! Yes, that's a bit weird. Put slightly differently:
>
> test=# SELECT * FROM JSON_TABLE(jsonb '{"id" : "20"}',
> '$' COLUMNS(id int PATH '$.id'));
>  id
> 
>  20
> (1 row)
>
> Time: 0.248 ms
> test=# SELECT * FROM JSON_TABLE(jsonb '{"id" : "30"}',
> '$' COLUMNS(id int PATH '$.id'));
>  id
> 
>
> (1 row)
>
> Clearly, when converting the string literal into int value, there's some
> sort of error handling that realizes 3B overflows, and returns NULL
> instead. I'm not sure if this is intentional.

Indeed.

This boils down to the difference in the cast expression chosen to
convert the source value to int in the two cases.

The case where the source value has no quotes, the chosen cast
expression is a FuncExpr for function numeric_int4(), which has no way
to suppress errors.  When the source value has quotes, the cast
expression is a CoerceViaIO expression, which can suppress the error.
The default behavior is to suppress the error and return NULL, so the
correct behavior is when the source value has quotes.

I think we'll need either:

* fix the code in 0001 to avoid getting numeric_int4() in this case,
and generally cast functions that don't have soft-error handling
support, in favor of using IO coercion.
* fix FuncExpr (like CoerceViaIO) to respect SQL/JSON's request to
suppress errors and fix downstream functions like numeric_int4() to
comply by handling errors softly.

I'm inclined to go with the 1st option as we already have the
infrastructure in place -- input functions can all handle errors
softly.

For the latter, it uses numeric_int4() which doesn't support
soft-error handling, so throws the error.  With quotes, the


--
Thanks, Amit Langote

--
Thanks, Amit Langote




Re: remaining sql/json patches

2024-03-05 Thread Amit Langote
Hi Tomas,

On Wed, Mar 6, 2024 at 6:30 AM Tomas Vondra
 wrote:
>
> Hi,
>
> I know very little about sql/json and all the json internals, but I
> decided to do some black box testing. I built a large JSONB table
> (single column, ~7GB of data after loading). And then I did a query
> transforming the data into tabular form using JSON_TABLE.
>
> The JSON_TABLE query looks like this:
>
> SELECT jt.* FROM
>   title_jsonb t,
>   json_table(t.info, '$'
> COLUMNS (
>   "id" text path '$."id"',
>   "type" text path '$."type"',
>   "title" text path '$."title"',
>   "original_title" text path '$."original_title"',
>   "is_adult" text path '$."is_adult"',
>   "start_year" text path '$."start_year"',
>   "end_year" text path '$."end_year"',
>   "minutes" text path '$."minutes"',
>   "genres" text path '$."genres"',
>   "aliases" text path '$."aliases"',
>   "directors" text path '$."directors"',
>   "writers" text path '$."writers"',
>   "ratings" text path '$."ratings"',
>   NESTED PATH '$."aliases"[*]'
> COLUMNS (
>   "alias_title" text path '$."title"',
>   "alias_region" text path '$."region"'
> ),
>   NESTED PATH '$."directors"[*]'
> COLUMNS (
>   "director_name" text path '$."name"',
>   "director_birth_year" text path '$."birth_year"',
>   "director_death_year" text path '$."death_year"'
> ),
>   NESTED PATH '$."writers"[*]'
> COLUMNS (
>   "writer_name" text path '$."name"',
>   "writer_birth_year" text path '$."birth_year"',
>   "writer_death_year" text path '$."death_year"'
> ),
>   NESTED PATH '$."ratings"[*]'
> COLUMNS (
>   "rating_average" text path '$."average"',
>   "rating_votes" text path '$."votes"'
> )
> )
>   ) as jt;
>
> again, not particularly complex. But if I run this, it consumes multiple
> gigabytes of memory, before it gets killed by OOM killer. This happens
> even when ran using
>
>   COPY (...) TO '/dev/null'
>
> so there's nothing sent to the client. I did catch memory context info,
> where it looks like this (complete stats attached):
>
> --
> TopMemoryContext: 97696 total in 5 blocks; 13056 free (11 chunks);
>   84640 used
>   ...
>   TopPortalContext: 8192 total in 1 blocks; 7680 free (0 chunks); ...
> PortalContext: 1024 total in 1 blocks; 560 free (0 chunks); ...
>   ExecutorState: 2541764672 total in 314 blocks; 6528176 free
>  (1208 chunks); 2535236496 used
> printtup: 8192 total in 1 blocks; 7952 free (0 chunks); ...
> ...
> ...
> Grand total: 2544132336 bytes in 528 blocks; 7484504 free
>  (1340 chunks); 2536647832 used
> --
>
> I'd say 2.5GB in ExecutorState seems a bit excessive ... Seems there's
> some memory management issue? My guess is we're not releasing memory
> allocated while parsing the JSON or building JSON output.
>
> I'm not attaching the data, but I can provide that if needed - it's
> about 600MB compressed. The structure is not particularly complex, it's
> movie info from [1] combined into a JSON document (one per movie).

Thanks for the report.

Yeah, I'd like to see the data to try to drill down into what's piling
up in ExecutorState.  I want to be sure of if the 1st, query functions
patch, is not implicated in this, because I'd like to get that one out
of the way sooner than later.

-- 
Thanks, Amit Langote




Re: Running the fdw test from the terminal crashes into the core-dump

2024-02-18 Thread Amit Langote
On Mon, Feb 19, 2024 at 4:44 AM Alvaro Herrera  wrote:
> On 2024-Feb-18, Tom Lane wrote:
>
> > So I'd blame this on faulty handling of the zero-partitions case in
> > the RTEPermissionInfo refactoring.  (I didn't bisect to prove that
> > a61b1f748 is exactly where it broke, but I do see that the query
> > successfully does nothing in v15.)
>
> You're right, this is the commit that broke it.  It's unclear to me if
> Amit is available to look at it, so I'll give this a look tomorrow if he
> isn't.

I'll look at this today.

-- 
Thanks, Amit Langote




Re: A compiling warning in jsonb_populate_record_valid

2024-01-25 Thread Amit Langote
On Fri, Jan 26, 2024 at 0:15 Tom Lane  wrote:

> Amit Langote  writes:
> >  On Thu, Jan 25, 2024 at 23:57 Tom Lane  wrote:
> >> -1 please.  We should not break that abstraction for the sake
> >> of ignorable warnings on ancient compilers.
>
> > Ignoring the warning was my 1st thought too, because an old discussion I
> > found about the warning was too old (2011).  The style I adopted in my
> > “fix” is used in a few other places too, so I thought I might as well
> > go for it.
>
> Oh, well maybe I'm missing some context.  What comparable places did
> you see?


Sorry, not on my computer right now so can’t paste any code, but I was able
to find a couple of functions (using Google ;) that refer to error_occurred
directly for one reason or another:

process_integer_literal(),
xml_is_document()


Re: A compiling warning in jsonb_populate_record_valid

2024-01-25 Thread Amit Langote
 On Thu, Jan 25, 2024 at 23:57 Tom Lane  wrote:

> Amit Langote  writes:
> > On Thu, Jan 25, 2024 at 2:59 PM Richard Guo 
> wrote:
> >> I came across a warning when building master (a044e61f1b) on old GCC
> >> (4.8.5).
>
> > Will apply the attached, which does this:
>
> > -   return BoolGetDatum(!SOFT_ERROR_OCCURRED());
> > +   return BoolGetDatum(!escontext.error_occurred);
>
> -1 please.  We should not break that abstraction for the sake
> of ignorable warnings on ancient compilers.


Ignoring the warning was my 1st thought too, because an old discussion I
found about the warning was too old (2011).  The style I adopted in my
“fix” is used in a few other places too, so I thought I might as well
go for it.

Anyway, I’m fine with reverting.

>


Re: remaining sql/json patches

2024-01-25 Thread Amit Langote
On Thu, Jan 25, 2024 at 6:09 PM Amit Langote  wrote:
> On Wed, Jan 24, 2024 at 10:11 PM Amit Langote  wrote:
> > I still need to take a look at your other report regarding typmod but
> > I'm out of energy today.
>
> The attached updated patch should address one of the concerns --
> JSON_QUERY() should now work appropriately with RETURNING type with
> typmod whether or  OMIT QUOTES is specified.
>
> But I wasn't able to address the problems with RETURNING
> record_type_with_typmod, that is, the following example you shared
> upthread:
>
> create domain char3_domain_not_null as char(3) NOT NULL;
> create domain hello as text not null check (value = 'hello');
> create domain int42 as int check (value = 42);
> create type comp_domain_with_typmod AS (a char3_domain_not_null, b int42);
> select json_value(jsonb'{"rec": "(abcd,42)"}', '$.rec' returning
> comp_domain_with_typmod);
>  json_value
> 
>
> (1 row)
>
> select json_value(jsonb'{"rec": "(abcd,42)"}', '$.rec' returning
> comp_domain_with_typmod error on error);
> ERROR:  value too long for type character(3)
>
> select json_value(jsonb'{"rec": "abcd"}', '$.rec' returning
> char3_domain_not_null error on error);
>  json_value
> 
>  abc
> (1 row)
>
> The problem with returning comp_domain_with_typmod from json_value()
> seems to be that it's using a text-to-record CoerceViaIO expression
> picked from JsonExpr.item_coercions, which behaves differently than
> the expression tree that the following uses:
>
> select ('abcd', 42)::comp_domain_with_typmod;
>row
> --
>  (abc,42)
> (1 row)

Oh, it hadn't occurred to me to check what trying to coerce a "string"
containing the record literal would do:

select '(''abcd'', 42)'::comp_domain_with_typmod;
ERROR:  value too long for type character(3)
LINE 1: select '(''abcd'', 42)'::comp_domain_with_typmod;

which is the same thing as what the JSON_QUERY() and JSON_VALUE() are
running into.  So, it might be fair to think that the error is not a
limitation of the SQL/JSON patch but an underlying behavior that it
has to accept as is.

-- 
Thanks, Amit Langote
EDB: http://www.enterprisedb.com




Re: A compiling warning in jsonb_populate_record_valid

2024-01-25 Thread Amit Langote
On Thu, Jan 25, 2024 at 4:47 PM Richard Guo  wrote:
> On Thu, Jan 25, 2024 at 2:28 PM Amit Langote  wrote:
>>
>> On Thu, Jan 25, 2024 at 2:59 PM Richard Guo  wrote:
>> > I came across a warning when building master (a044e61f1b) on old GCC
>> > (4.8.5).
>> >
>> > jsonfuncs.c: In function ‘jsonb_populate_record_valid’:
>> > ../../../../src/include/nodes/miscnodes.h:53:15: warning: the comparison 
>> > will always evaluate as ‘true’ for the address of ‘escontext’ will never 
>> > be NULL [-Waddress]
>> >   ((escontext) != NULL && IsA(escontext, ErrorSaveContext) && \
>> >^
>> > jsonfuncs.c:2481:23: note: in expansion of macro ‘SOFT_ERROR_OCCURRED’
>> >   return BoolGetDatum(!SOFT_ERROR_OCCURRED());
>> >
>> > This was introduced in commit 1edb3b491b, and can be observed on several
>> > systems in the buildfarm too, such as:
>> > https://buildfarm.postgresql.org/cgi-bin/show_stage_log.pl?nm=arowana=2024-01-25%2004%3A54%3A38=build
>>
>> Thanks for the report.
>>
>> Will apply the attached, which does this:
>>
>> -   return BoolGetDatum(!SOFT_ERROR_OCCURRED());
>> +   return BoolGetDatum(!escontext.error_occurred);
>
> Looks good to me.

Thanks for checking, pushed.

-- 
Thanks, Amit Langote
EDB: http://www.enterprisedb.com




Re: A compiling warning in jsonb_populate_record_valid

2024-01-24 Thread Amit Langote
Hi,

On Thu, Jan 25, 2024 at 2:59 PM Richard Guo  wrote:
>
> I came across a warning when building master (a044e61f1b) on old GCC
> (4.8.5).
>
> jsonfuncs.c: In function ‘jsonb_populate_record_valid’:
> ../../../../src/include/nodes/miscnodes.h:53:15: warning: the comparison will 
> always evaluate as ‘true’ for the address of ‘escontext’ will never be NULL 
> [-Waddress]
>   ((escontext) != NULL && IsA(escontext, ErrorSaveContext) && \
>^
> jsonfuncs.c:2481:23: note: in expansion of macro ‘SOFT_ERROR_OCCURRED’
>   return BoolGetDatum(!SOFT_ERROR_OCCURRED());
>
> This was introduced in commit 1edb3b491b, and can be observed on several
> systems in the buildfarm too, such as:
> https://buildfarm.postgresql.org/cgi-bin/show_stage_log.pl?nm=arowana=2024-01-25%2004%3A54%3A38=build

Thanks for the report.

Will apply the attached, which does this:

-   return BoolGetDatum(!SOFT_ERROR_OCCURRED());
+   return BoolGetDatum(!escontext.error_occurred);

-- 
Thanks, Amit Langote
EDB: http://www.enterprisedb.com


0001-Silence-compiler-warning-introduced-in-1edb3b491b.patch
Description: Binary data


Re: remaining sql/json patches

2024-01-18 Thread Amit Langote
On Fri, Jan 19, 2024 at 2:11 AM Alvaro Herrera  wrote:
> On 2024-Jan-18, Alvaro Herrera wrote:
>
> > commands/explain.c (Hmm, I think this is a preexisting bug actually)
> >
> > 3893  18 : case T_TableFuncScan:
> > 3894  18 : Assert(rte->rtekind == RTE_TABLEFUNC);
> > 3895  18 : if (rte->tablefunc)
> > 3896   0 : if (rte->tablefunc->functype == 
> > TFT_XMLTABLE)
> > 3897   0 : objectname = "xmltable";
> > 3898 : else/* Must be 
> > TFT_JSON_TABLE */
> > 3899   0 : objectname = "json_table";
> > 3900 : else
> > 3901  18 : objectname = NULL;
> > 3902  18 : objecttag = "Table Function Name";
> > 3903  18 : break;
>
> Indeed -- the problem seems to be that add_rte_to_flat_rtable is
> creating a new RTE and zaps the ->tablefunc pointer for it.  So when
> EXPLAIN goes to examine the struct, there's a NULL pointer there and
> nothing is printed.

Ah yes.

> One simple fix is to change add_rte_to_flat_rtable so that it doesn't
> zero out the tablefunc pointer, but this is straight against what that
> function is trying to do, namely to remove substructure.

Yes.

>  Which means
> that we need to preserve the name somewhere else.  I added a new member
> to RangeTblEntry for this, which perhaps is a little ugly.  So here's
> the patch for that.
>
>  (I also added an alias to one XMLTABLE invocation
> under EXPLAIN, to show what it looks like when an alias is specified.
> Otherwise they're always shown as "XMLTABLE" "xmltable" which is a bit
> dumb).

Thanks for the patch.  Seems alright to me.

> Another possible way out is to decide that we don't want the
> "objectname" to be reported here.  I admit it's perhaps redundant.  In
> this case we'd just remove lines 3896-3899 shown above and let it be
> NULL.

Showing the function's name spelled out in the query (XMLTABLE /
JSON_TABLE) seems fine to me, even though maybe a bit redundant, yes.

-- 
Thanks, Amit Langote
EDB: http://www.enterprisedb.com




Re: remaining sql/json patches

2024-01-16 Thread Amit Langote
Hi,

Thought I'd share an update.

I've been going through Jian He's comments (thanks for the reviews!),
most of which affect the last JSON_TABLE() patch and in some cases the
query functions patch (0007).  It seems I'll need to spend a little
more time, especially on the JSON_TABLE() patch, as I'm finding things
to improve other than those mentioned in the comments.

As for the preliminary patches 0001-0006, I'm thinking that it would
be a good idea to get them out of the way sooner rather than waiting
till the main patches are in perfect shape.  I'd like to get them
committed by next week after a bit of polishing, so if anyone would
like to take a look, please let me know.  I'll post a new set
tomorrow.

0007, the query functions patch, also looks close to ready, though I
might need to change a few things in it as I work through the
JSON_TABLE() changes.

-- 
Thanks, Amit Langote
EDB: http://www.enterprisedb.com




Re: remaining sql/json patches

2023-12-14 Thread Amit Langote
On Sat, Dec 9, 2023 at 2:05 PM jian he  wrote:
> Hi.

Thanks for the review.

> function JsonPathExecResult comment needs to be refactored? since it
> changed a lot.

I suppose you meant executeJsonPath()'s comment.  I've added a
description of the new callback function arguments.

On Wed, Dec 13, 2023 at 6:59 PM jian he  wrote:
> Hi. small issues I found...
>
> typo:
> +-- Test mutabilily od query functions

Fixed.

>
> + default:
> + ereport(ERROR,
> + (errcode(ERRCODE_INVALID_PARAMETER_VALUE),
> + errmsg("only datetime, bool, numeric, and text types can be casted
> to jsonpath types")));
>
> transformJsonPassingArgs's function: transformJsonValueExpr will make
> the above code unreached.

It's good to have the ereport to catch errors caused by any future changes.

> also based on the `switch (typid)` cases,
> I guess best message would be
> errmsg("only datetime, bool, numeric, text, json, jsonb types can be
> casted to jsonpath types")));

I've rewritten the message to mention the unsupported type.  Maybe the
supported types can go in a DETAIL message.  I might do that later.

> + case JSON_QUERY_OP:
> + jsexpr->wrapper = func->wrapper;
> + jsexpr->omit_quotes = (func->quotes == JS_QUOTES_OMIT);
> +
> + if (!OidIsValid(jsexpr->returning->typid))
> + {
> + JsonReturning *ret = jsexpr->returning;
> +
> + ret->typid = JsonFuncExprDefaultReturnType(jsexpr);
> + ret->typmod = -1;
> + }
> + jsexpr->result_coercion = coerceJsonFuncExprOutput(pstate, jsexpr);
>
> I noticed, if (!OidIsValid(jsexpr->returning->typid)) is the true
> function JsonFuncExprDefaultReturnType may be called twice, not sure
> if it's good or not..

If avoiding the double-calling means that we've to add more conditions
in the code, I'm fine with leaving this as-is.

-- 
Thanks, Amit Langote
EDB: http://www.enterprisedb.com




Re: remaining sql/json patches

2023-12-14 Thread Amit Langote
On Sat, Dec 9, 2023 at 2:05 AM Andrew Dunstan  wrote:
> On 2023-12-08 Fr 11:37, Robert Haas wrote:
> > On Fri, Dec 8, 2023 at 1:59 AM Amit Langote  wrote:
> >> Would it be messy to replace the lookahead approach by whatever's
> >> suiable *in the future* when it becomes necessary to do so?
> > It might be. Changing grammar rules to tends to change corner-case
> > behavior if nothing else. We're best off picking the approach that we
> > think is correct long term.
>
> All this makes me wonder if Alvaro's first suggested solution (adding
> NESTED to the UNBOUNDED precedence level) wouldn't be better after all.

I've done just that in the latest v32.

-- 
Thanks, Amit Langote
EDB: http://www.enterprisedb.com




Re: remaining sql/json patches

2023-12-07 Thread Amit Langote
On Fri, Dec 8, 2023 at 2:19 AM Robert Haas  wrote:
> On Wed, Dec 6, 2023 at 10:26 AM Alvaro Herrera  
> wrote:
> > > I think I'm inclined toward adapting the LA-token fix (attached 0005),
> > > because we've done that before with SQL/JSON constructors patch.
> > > Also, if I understand the concerns that Tom mentioned at [1]
> > > correctly, maybe we'd be better off not assigning precedence to
> > > symbols as much as possible, so there's that too against the approach
> > > #1.
> >
> > Sounds ok to me, but I'm happy for this decision to be overridden by
> > others with more experience in parser code.
>
> In my experience, the lookahead solution is typically suitable when
> the keywords involved aren't used very much in other parts of the
> grammar. I think the situation that basically gets you into trouble is
> if there's some way to have a situation where NESTED shouldn't be
> changed to NESTED_LA when PATH immediately follows. For example, if
> NESTED could be used like DISTINCT in a SELECT query:
>
> SELECT DISTINCT a, b, c FROM whatever
>
> ...then that would be a strong indication in my mind that we shouldn't
> use the lookahead solution, because what if you substitute "path" for
> "a"? Now you have a mess.
>
> I haven't gone over the grammar changes in a lot of detail so I'm not
> sure how much risk there is here. It looks to me like there's some
> syntax that goes NESTED [PATH] 'literal string', and if that were the
> only use of NESTED or PATH then I think we'd be completely fine. I see
> that PATH b_expr also gets added to xmltable_column_option_el, and
> that's a little more worrying, because you don't really want to see
> keywords that are used for special lookahead rules in places where
> they can creep into general expressions, but it seems like it might
> still be OK as long as NESTED doesn't also start to get used in other
> places. If you ever create a situation where NESTED can bump up
> against PATH without wanting that to turn into NESTED_LA PATH, then I
> think it's likely that this whole approach will unravel. As long as we
> don't think that will ever happen, I think it's probably OK. If we do
> think it's going to happen, then we should probably grit our teeth and
> use precedence.

Would it be messy to replace the lookahead approach by whatever's
suiable *in the future* when it becomes necessary to do so?


--
Thanks, Amit Langote
EDB: http://www.enterprisedb.com




Re: remaining sql/json patches

2023-12-07 Thread Amit Langote
On Thu, Dec 7, 2023 at 7:39 PM jian he  wrote:
> based on:
> json_behavior_clause_opt:
> json_behavior ON EMPTY_P
> { $$ = list_make2($1, NULL); }
> | json_behavior ON ERROR_P
> { $$ = list_make2(NULL, $1); }
> | json_behavior ON EMPTY_P json_behavior ON ERROR_P
> { $$ = list_make2($1, $4); }
> | /* EMPTY */
> { $$ = list_make2(NULL, NULL); }
> ;
> so
> + if (func->behavior)
> + {
> + on_empty = linitial(func->behavior);
> + on_error = lsecond(func->behavior);
> + }

Yeah, maybe.

> `if (func->behavior)` will always be true?
> By the way, in the above "{ $$ = list_make2($1, $4); }" what does $4
> refer to? (I don't know gram.y)

$1 and $4 refer to the 1st and 4th symbols in the following:

json_behavior ON EMPTY_P json_behavior ON ERROR_P

So $1 gives the json_behavior (JsonBehavior) node for ON EMPTY and $4
gives that for ON ERROR.

-- 
Thanks, Amit Langote
EDB: http://www.enterprisedb.com




Re: remaining sql/json patches

2023-11-27 Thread Amit Langote
On Fri, Nov 24, 2023 at 9:28 PM Alvaro Herrera  wrote:
> Some quick grepping gave me this table,
>
>  YYLAST  YYNTOKENS  YYNNTS   YYNRULES   YYNSTATES  YYMAXUTOK
> REL9_1_STABLE 69680429 546   22184179666
> REL9_2_STABLE 73834432 546   22614301669
> REL9_3_STABLE 77969437 558   23224471674
> REL9_4_STABLE 79419442 576   23694591679
> REL9_5_STABLE 92495456 612   24904946693
> REL9_6_STABLE 92660459 618   25155006696
> REL_10_STABLE 99601472 653   26635323709
> REL_11_STABLE102007480 668   27285477717
> REL_12_STABLE103948482 667   27245488719
> REL_13_STABLE104224492 673   27605558729
> REL_14_STABLE108111503 676   31595980740
> REL_15_STABLE111091506 688   32066090743
> REL_16_STABLE115435519 706   32836221756
> master   117115521 707   33006255758
> master+v26   121817537 738   34156470774
>
> and the attached chart.  (v26 is with all patches applied, including the
> JSON_TABLE one whose grammar has not yet been fully tweaked.)

Thanks for the chart.

> So, while the jump from v26 is not a trivial one, it seems within
> reasonable bounds.

Agreed.

>  For example, the jump between 13 and 14 looks worse.
> (I do wonder what happened there.)

The following commit sounds like it might be related?

commit 06a7c3154f5bfad65549810cc84f0e3a77b408bf
Author: Tom Lane 
Date:   Fri Sep 18 16:46:26 2020 -0400

Allow most keywords to be used as column labels without requiring AS.

-- 
Thanks, Amit Langote
EDB: http://www.enterprisedb.com




Re: remaining sql/json patches

2023-11-24 Thread Amit Langote
On Thu, Nov 23, 2023 at 4:38 AM Andres Freund  wrote:
> On 2023-11-21 12:52:35 +0900, Amit Langote wrote:
> > version gram.o text bytes  %change  gram.c bytes  %change
> >
> > 9.6 534010 -2108984   -
> > 10  582554 9.09 2258313   7.08
> > 11  584596 0.35 2313475   2.44
> > 12  590957 1.08 2341564   1.21
> > 13  590381-0.09 2357327   0.67
> > 14  600707 1.74 2428841   3.03
> > 15  633180 5.40 2495364   2.73
> > 16  653464 3.20 2575269   3.20
> > 17-sqljson  672800 2.95 2709422   3.97
> >
> > So if we put SQL/JSON (including JSON_TABLE()) into 17, we end up with a 
> > gram.o 2.95% larger than v16, which granted is a somewhat larger bump, 
> > though also smaller than with some of recent releases.
>
> I think it's ok to increase the size if it's necessary increases - but I also
> think we've been a bit careless at times, and that that has made the parser
> slower.  There's probably also some "infrastructure" work we could do combat
> some of the growth too.
>
> I know I triggered the use of the .c bytes and text size, but it'd probably
> more sensible to look at the size of the important tables generated by bison.
> I think the most relevant defines are:
>
> #define YYLAST   117115
> #define YYNTOKENS  521
> #define YYNNTS  707
> #define YYNRULES  3300
> #define YYNSTATES  6255
> #define YYMAXUTOK   758
>
>
> I think a lot of the reason we end up with such a big "state transition" space
> is that a single addition to e.g. col_name_keyword or unreserved_keyword
> increases the state space substantially, because it adds new transitions to so
> many places. We're in quadratic territory, I think.  We might be able to do
> some lexer hackery to avoid that, but not sure.

One thing I noticed when looking at the raw parsing times across
versions is that they improved a bit around v12 and then some in v13:

9.00.60 s
9.60.61 s
10 0.61 s
11 0.63 s
12 0.55 s
13 0.54 s
15 0.57 s
16 0.59 s

I think they might be due to the following commits in v12 and v13 resp.:

commit c64d0cd5ce24a344798534f1bc5827a9199b7a6e
Author: Tom Lane 
Date:   Wed Jan 9 19:47:38 2019 -0500
Use perfect hashing, instead of binary search, for keyword lookup.
...
Discussion: https://postgr.es/m/20190103163340.ga15...@britannica.bec.de

commit 7f380c59f800f7e0fb49f45a6ff7787256851a59
Author: Tom Lane 
Date:   Mon Jan 13 15:04:31 2020 -0500
Reduce size of backend scanner's tables.
...
Discussion:
https://postgr.es/m/cacpnzcvaoa3egvwm5yzhcstx6ratalgnicpcbvocwm8h3xp...@mail.gmail.com

I haven't read the whole discussions there to see if the target(s)
included the metrics you've mentioned though, either directly or
indirectly.

-- 
Thanks, Amit Langote
EDB: http://www.enterprisedb.com




Re: remaining sql/json patches

2023-11-22 Thread Amit Langote
On Wed, Nov 22, 2023 at 3:09 PM Amit Langote  wrote:
> The last line in the chart I sent in the last email now look like this:
>
> 17-sqljson  670262 2.57 2640912   1.34
>
> meaning the gram.o text size changes by 2.57% as opposed to 2.97%
> before your fixes.

Andrew asked off-list what the percent increase is compared to 17dev
HEAD.  It's 1.27% (was 1.66% with the previous version).

-- 
Thanks, Amit Langote
EDB: http://www.enterprisedb.com




Re: remaining sql/json patches

2023-11-22 Thread Amit Langote
On Fri, Nov 17, 2023 at 6:40 PM Alvaro Herrera  wrote:
> On 2023-Nov-17, Amit Langote wrote:
>
> > On Fri, Nov 17, 2023 at 4:27 PM jian he  wrote:
>
> > > some enum declaration, ending element need an extra comma?
> >
> > Didn't know about the convention to have that comma, but I can see it
> > is present in most enum definitions.
>
> It's new.  See commit 611806cd726f.

I see, thanks.

-- 
Thanks, Amit Langote
EDB: http://www.enterprisedb.com




Re: remaining sql/json patches

2023-11-22 Thread Amit Langote
On Wed, Nov 22, 2023 at 4:37 PM Andres Freund  wrote:
> On 2023-11-22 15:09:36 +0900, Amit Langote wrote:
> > OK, I will keep polishing 0001-0003 with the intent to push it next
> > week barring objections / damning findings.
>
> I don't think the patchset is quite there yet. It's definitely getting closer
> though!  I'll try to do another review next week.

That would be great, thank you.  I'll post an update on Friday.

-- 
Thanks, Amit Langote
EDB: http://www.enterprisedb.com




Re: remaining sql/json patches

2023-11-20 Thread Amit Langote
On Nov 16, 2023, at 17:48, Amit Langote  wrote:
> On Thu, Nov 16, 2023 at 2:11 AM Andres Freund  wrote:
>> On 2023-11-15 22:00:41 +0900, Amit Langote wrote:
>>>> This causes a nontrivial increase in the size of the parser (~5% in an
>>>> optimized build here), I wonder if we can do better.
>>> 
>>> Hmm, sorry if I sound ignorant but what do you mean by the parser here?
>> 
>> gram.o, in an optimized build.
>> 
>>> I can see that the byte-size of gram.o increases by 1.66% after the
>>> above additions  (1.72% with previous versions).
>> 
>> I'm not sure anymore how I measured it, but if you just looked at the total
>> file size, that might not show the full gain, because of debug symbols
>> etc. You can use the size command to look at just the code and data size.
> 
> $ size /tmp/gram.*
>   textdata bss dec hex filename
> 661808   0   0  661808   a1930 /tmp/gram.o.unpatched
> 672800   0   0  672800   a4420 /tmp/gram.o.patched
> 
> That's still a 1.66% increase in the code size:
> 
> (672800 - 661808) / 661808 % = 1.66
> 
> As for gram.c, the increase is a bit larger:
> 
> $ ll /tmp/gram.*
> -rw-rw-r--. 1 amit amit 2605925 Nov 16 16:18 /tmp/gram.c.unpatched
> -rw-rw-r--. 1 amit amit 2709422 Nov 16 16:22 /tmp/gram.c.patched
> 
> (2709422 - 2605925) / 2605925 % = 3.97
> 
>>> I've also checked
>>> using log_parser_stats that there isn't much slowdown in the
>>> raw-parsing speed.
>> 
>> What does "isn't much slowdown" mean in numbers?
> 
> Sure, the benchmark I used measured the elapsed time (using
> log_parser_stats) of parsing a simple select statement (*) averaged
> over 1 repetitions of the same query performed with `psql -c`:
> 
> Unpatched: 0.61 seconds
> Patched: 0.61 seconds
> 
> Here's a look at the perf:
> 
> Unpatched:
>   0.59%  [.] AllocSetAlloc
>   0.51%  [.] hash_search_with_hash_value
>   0.47%  [.] base_yyparse
> 
> Patched:
>   0.63%  [.] AllocSetAlloc
>   0.52%  [.] hash_search_with_hash_value
>   0.44%  [.] base_yyparse
> 
> (*) select 1, 2, 3 from foo where a = 1
> 
> Is there a more relevant benchmark I could use?

Thought I’d share a few more numbers I collected to analyze the parser size 
increase over releases.

* gram.o text bytes is from the output of `size gram.o`.
* compiled with -O3

version gram.o text bytes  %change  gram.c bytes  %change

9.6 534010 -2108984   -
10  582554 9.09 2258313   7.08
11  584596 0.35 2313475   2.44
12  590957 1.08 2341564   1.21
13  590381-0.09 2357327   0.67
14  600707 1.74 2428841   3.03
15  633180 5.40 2495364   2.73
16  653464 3.20 2575269   3.20
17-sqljson  672800 2.95 2709422   3.97

So if we put SQL/JSON (including JSON_TABLE()) into 17, we end up with a gram.o 
2.95% larger than v16, which granted is a somewhat larger bump, though also 
smaller than with some of recent releases.


> --

Thanks, Amit Langote
EDB: http://www.enterprisedb.com

Re: remaining sql/json patches

2023-11-17 Thread Amit Langote
On Fri, Nov 17, 2023 at 4:27 PM jian he  wrote:
> hi.
> minor issues.

Thanks for checking.

> In transformJsonFuncExpr(ParseState *pstate, JsonFuncExpr *func)
> func.behavior->on_empty->location and
> func.behavior->on_error->location are correct.
> but in ExecInitJsonExpr, jsestate->jsexpr->on_empty->location is -1,
> jsestate->jsexpr->on_error->location is -1.
> Maybe we can preserve these on_empty, on_error token locations in
> transformJsonBehavior.

Sure.

> some enum declaration, ending element need an extra comma?

Didn't know about the convention to have that comma, but I can see it
is present in most enum definitions.

Changed all enums that the patch adds to conform.

> + /*
> + * ExecEvalJsonExprPath() will set this to the address of the step to
> + * use to coerce the result of JsonPath* evaluation to the RETURNING
> + * type.  Also see the description of possible step addresses this
> + * could be set to in the definition of JsonExprState.ZZ
> + */
>
> "ZZ", typo?

Indeed.

Will include the fixes in the next version.

-- 
Thanks, Amit Langote
EDB: http://www.enterprisedb.com




Re: remaining sql/json patches

2023-11-16 Thread Amit Langote
On Thu, Nov 16, 2023 at 2:11 AM Andres Freund  wrote:
> On 2023-11-15 22:00:41 +0900, Amit Langote wrote:
> > > This causes a nontrivial increase in the size of the parser (~5% in an
> > > optimized build here), I wonder if we can do better.
> >
> > Hmm, sorry if I sound ignorant but what do you mean by the parser here?
>
> gram.o, in an optimized build.
>
> > I can see that the byte-size of gram.o increases by 1.66% after the
> > above additions  (1.72% with previous versions).
>
> I'm not sure anymore how I measured it, but if you just looked at the total
> file size, that might not show the full gain, because of debug symbols
> etc. You can use the size command to look at just the code and data size.

$ size /tmp/gram.*
   textdata bss dec hex filename
 661808   0   0  661808   a1930 /tmp/gram.o.unpatched
 672800   0   0  672800   a4420 /tmp/gram.o.patched

That's still a 1.66% increase in the code size:

(672800 - 661808) / 661808 % = 1.66

As for gram.c, the increase is a bit larger:

$ ll /tmp/gram.*
-rw-rw-r--. 1 amit amit 2605925 Nov 16 16:18 /tmp/gram.c.unpatched
-rw-rw-r--. 1 amit amit 2709422 Nov 16 16:22 /tmp/gram.c.patched

(2709422 - 2605925) / 2605925 % = 3.97

> > I've also checked
> > using log_parser_stats that there isn't much slowdown in the
> > raw-parsing speed.
>
> What does "isn't much slowdown" mean in numbers?

Sure, the benchmark I used measured the elapsed time (using
log_parser_stats) of parsing a simple select statement (*) averaged
over 1 repetitions of the same query performed with `psql -c`:

Unpatched: 0.61 seconds
Patched: 0.61 seconds

Here's a look at the perf:

Unpatched:
   0.59%  [.] AllocSetAlloc
   0.51%  [.] hash_search_with_hash_value
   0.47%  [.] base_yyparse

Patched:
   0.63%  [.] AllocSetAlloc
   0.52%  [.] hash_search_with_hash_value
   0.44%  [.] base_yyparse

(*) select 1, 2, 3 from foo where a = 1

Is there a more relevant benchmark I could use?

-- 
Thanks, Amit Langote
EDB: http://www.enterprisedb.com




Re: remaining sql/json patches

2023-11-15 Thread Amit Langote
Hi Erik,

On Thu, Nov 16, 2023 at 13:52 Erik Rijkers  wrote:

> Op 11/15/23 om 14:00 schreef Amit Langote:
> > Hi,
>
> [..]
>
> > Attached updated patch.  The version of 0001 that I posted on Oct 11
> > to add the error-safe version of CoerceViaIO contained many
> > unnecessary bits that are now removed.
> >
> > --
> > Thanks, Amit Langote
> > EDB: http://www.enterprisedb.com
>
>  > [v24-0001-Add-soft-error-handling-to-some-expression-nodes.patch]
>  > [v24-0002-Add-soft-error-handling-to-populate_record_field.patch]
>  > [v24-0003-SQL-JSON-query-functions.patch]
>  > [v24-0004-JSON_TABLE.patch]
>  > [v24-0005-Claim-SQL-standard-compliance-for-SQL-JSON-featu.patch]
>
> Hi Amit,
>
> Here is a statement that seems to gobble up all memory and to totally
> lock up the machine. No ctrl-C - only a power reset gets me out of that.
> It was in one of my tests, so it used to work:
>
> select json_query(
>  jsonb '"[3,4]"'
>, '$[*]' returning bigint[] empty object on error
> );
>
> Can you have a look?


Wow, will look. Thanks.

>


Re: remaining sql/json patches

2023-11-10 Thread Amit Langote
Hi Erik,

On Sat, Nov 11, 2023 at 11:52 Erik Rijkers  wrote:

> Hi,
>
> At the moment, what is the patchset to be tested?  The latest SQL/JSON
> server I have is from September, and it's become unclear to me what
> belongs to the SQL/JSON patchset.  It seems to me cfbot erroneously
> shows green because it successfully compiles later detail-patches (i.e.,
> not the SQL/JSON set itself). Please correct me if I'm wrong and it is
> in fact possible to derive from cfbot a patchset that are the ones to
> use to build the latest SQL/JSON server.


I’ll be posting a new set that addresses Andres’ comments early next week.

>


Re: remaining sql/json patches

2023-10-26 Thread Amit Langote
Hi,

On Thu, Oct 26, 2023 at 9:20 PM Nikita Malakhov  wrote:
>
> Hi,
>
> The main goal was to correctly process invalid queries (as in examples above).
> I'm not sure this could be done in type input functions. I thought that some
> coercions could be checked before evaluating expressions for saving reasons.

I assume by "invalid" you mean queries specifying types in RETURNING
that don't support soft-error handling in their input function.
Adding a check makes sense but its implementation should include a
type cache interface to check whether a given type has error-safe
input handling, possibly as a separate patch.  IOW, the SQL/JSON patch
shouldn't really make a list of types to report as unsupported.

-- 
Thanks, Amit Langote
EDB: http://www.enterprisedb.com




Re: remaining sql/json patches

2023-10-25 Thread Amit Langote
Hi Nikita,

On Thu, Oct 26, 2023 at 2:13 AM Nikita Malakhov  wrote:
> Amit, on previous email, patch #2 - I agree that it is not the best idea to 
> introduce
> new type of logic into the parser, so this logic could be moved to the 
> executor,
> or removed at all. What do you think of these options?

Yes maybe, though I'd first like to have a good answer to why is that
logic necessary at all.  Maybe you think it's better to emit an error
in the SQL/JSON layer of code than in the type input function if it's
unsafe?

-- 
Thanks, Amit Langote
EDB: http://www.enterprisedb.com




Re: run pgindent on a regular basis / scripted manner

2023-10-23 Thread Amit Langote
On Tue, Oct 24, 2023 at 9:51 Jeff Davis  wrote:

> On Wed, 2023-10-18 at 22:34 +1300, David Rowley wrote:
> > It would be good to learn how many of the committers out of the ones
> > you listed that --enable-indent-checks would have saved from breaking
> > koel.
>
> I'd find that a useful option.


+1.  While I’ve made it part of routine to keep my local work pgindented
since breaking Joel once, an option like this would still be useful.

>


Re: remaining sql/json patches

2023-10-17 Thread Amit Langote
On Mon, Oct 16, 2023 at 5:21 PM Nikita Malakhov  wrote:
>
> Hi!
>
> With the latest set of patches we encountered failure with the following 
> query:
>
> postgres@postgres=# SELECT JSON_QUERY(jsonpath '"aaa"', '$' RETURNING text);
> server closed the connection unexpectedly
> This probably means the server terminated abnormally
> before or while processing the request.
> The connection to the server was lost. Attempting reset: Failed.
> The connection to the server was lost. Attempting reset: Failed.
> Time: 11.165 ms
>
> A colleague of mine, Anton Melnikov, proposed the following changes which 
> slightly
> alter coercion functions to process this kind of error correctly.
>
> Please check attached patch set.

Thanks for the patches.

I think I understand patch 1. It makes each of JSON_{QUERY | VALUE |
EXISTS}() use FORMAT JSON for the context item by default, which I
think is the correct behavior.

As for patch 2, maybe the executor part is fine, but I'm not so sure
about the parser part.  Could you please explain why you think the
parser must check error-safety of the target type for allowing IO
coercion for non-ERROR behaviors?

Even if we consider that that's what should be done, it doesn't seem
like a good idea for the parser to implement its own logic for
determining error-safety.  IOW, the parser should really be using some
type cache API.  I thought there might have been a flag in pg_proc
(prosafe) or pg_type (typinsafe), but apparently there isn't.

-- 
Thanks, Amit Langote
EDB: http://www.enterprisedb.com




Re: remaining sql/json patches

2023-10-17 Thread Amit Langote
Hi Anton,

On Tue, Oct 17, 2023 at 4:11 PM Anton A. Melnikov
 wrote:
> On 17.10.2023 07:02, Amit Langote wrote:
>
> > One thing jian he missed during the debugging is that
> > ExecEvalJsonExprCoersion() receives the EMPTY ARRAY value via
> > *op->resvalue/resnull, set by ExecEvalJsonExprBehavior(), because
> > that's the ON EMPTY behavior specified in the constraint.  The bug was
> > that the code in ExecEvalJsonExprCoercion() failed to set val_string
> > to that value ("[]") before passing to InputFunctionCallSafe(), so the
> > latter would assume the input is NULL.
> >
> Thank a lot for this remark!
>
> I tried to dig to the transformJsonOutput() to fix it earlier at the analyze 
> stage,
> but it looks like a rather hard way.

Indeed.  As I said, the problem was a bug in ExecEvalJsonExprCoercion().

>
> Maybe simple in accordance with you note remove the second condition from 
> this line:
> if (jb && JB_ROOT_IS_SCALAR(jb)) ?

Yeah, that's how I would fix it.

-- 
Thanks, Amit Langote
EDB: http://www.enterprisedb.com




Re: remaining sql/json patches

2023-10-16 Thread Amit Langote
On Mon, Oct 16, 2023 at 10:44 PM Anton A. Melnikov
 wrote:
> On 16.10.2023 15:49, jian he wrote:
> > add the following code after ExecEvalJsonExprCoercion if
> > (!InputFunctionCallSafe(...) works, but seems like a hack.
> >
> > if (!val_string)
> > {
> > *op->resnull = true;
> > *op->resvalue = (Datum) 0;
> > }
>
> It seems the constraint should work here:
>
> After
>
> CREATE TABLE test (
> js text,
> i int,
> x jsonb DEFAULT JSON_QUERY(jsonb '[1,2]', '$[*]' WITH WRAPPER)
> CONSTRAINT test_constraint
> CHECK (JSON_QUERY(js::jsonb, '$.a' RETURNING char(5) OMIT 
> QUOTES EMPTY ARRAY ON EMPTY) > 'a')
> );
>
> INSERT INTO test_jsonb_constraints VALUES ('[]');
>
> one expected to see an error like that:
>
> ERROR:  new row for relation "test" violates check constraint 
> "test_constraint"
> DETAIL:  Failing row contains ([], null, [1, 2]).
>
> not "INSERT 0 1"

Yes, the correct thing here is for the constraint to fail.

One thing jian he missed during the debugging is that
ExecEvalJsonExprCoersion() receives the EMPTY ARRAY value via
*op->resvalue/resnull, set by ExecEvalJsonExprBehavior(), because
that's the ON EMPTY behavior specified in the constraint.  The bug was
that the code in ExecEvalJsonExprCoercion() failed to set val_string
to that value ("[]") before passing to InputFunctionCallSafe(), so the
latter would assume the input is NULL.

-- 
Thanks, Amit Langote
EDB: http://www.enterprisedb.com




Re: remaining sql/json patches

2023-10-16 Thread Amit Langote
Hi,

On Mon, Oct 16, 2023 at 5:34 PM Nikita Malakhov  wrote:
>
> Hi,
>
> Also FYI - the following case results in segmentation fault:
>
> postgres@postgres=# CREATE TABLE test_jsonb_constraints (
> js text,
> i int,
> x jsonb DEFAULT JSON_QUERY(jsonb '[1,2]', '$[*]' WITH WRAPPER)
> CONSTRAINT test_jsonb_constraint1
> CHECK (js IS JSON)
> CONSTRAINT test_jsonb_constraint5
> CHECK (JSON_QUERY(js::jsonb, '$.mm' RETURNING char(5) OMIT 
> QUOTES EMPTY ARRAY ON EMPTY) >  'a' COLLATE "C")
> CONSTRAINT test_jsonb_constraint6
> CHECK (JSON_EXISTS(js::jsonb, 'strict $.a' RETURNING int TRUE 
> ON ERROR) < 2)
> );
> CREATE TABLE
> Time: 13.518 ms
> postgres@postgres=# INSERT INTO test_jsonb_constraints VALUES ('[]');
> server closed the connection unexpectedly
> This probably means the server terminated abnormally
> before or while processing the request.
> The connection to the server was lost. Attempting reset: Failed.
> The connection to the server was lost. Attempting reset: Failed.
> Time: 6.858 ms
> @!>
>
> We're currently looking into this case.

Thanks for the report.  I think I've figured out the problem --
ExecEvalJsonExprCoercion() mishandles the EMPTY ARRAY ON EMPTY case.

I'm reading the other 2 patches...

-- 
Thanks, Amit Langote
EDB: http://www.enterprisedb.com




Re: remaining sql/json patches

2023-10-11 Thread Amit Langote
On Wed, Oct 11, 2023 at 2:08 PM Amit Langote  wrote:
> On Sat, Oct 7, 2023 at 6:49 AM Andres Freund  wrote:
> > On 2023-09-29 13:57:46 +0900, Amit Langote wrote:
> > > Thanks.  I will push the attached 0001 shortly.
> >
> > Sorry for not looking at this earlier.
>
> Thanks for the review.  Replying here only to your comments on 0001.
>
> > Have you done benchmarking to verify that 0001 does not cause performance
> > regressions? I'd not be suprised if it did.
>
> I found that it indeed did once I benchmarked with something that
> would stress EEOP_IOCOERCE:
>
> do $$
> begin
> for i in 1..2000 loop
> i := i::text;
> end loop; end; $$ language plpgsql;
> DO
>
> Times and perf report:
>
> HEAD:
>
> Time: 1815.824 ms (00:01.816)
> Time: 1818.980 ms (00:01.819)
> Time: 1695.555 ms (00:01.696)
> Time: 1762.022 ms (00:01.762)
>
> --97.49%--exec_stmts
>   |
>   --85.97%--exec_assign_expr
> |
> |--65.56%--exec_eval_expr
> |  |
> |  |--53.71%--ExecInterpExpr
> |  |  |
> |  |  |--14.14%--textin
>
>
> Patched:
>
> Time: 1872.469 ms (00:01.872)
> Time: 1927.371 ms (00:01.927)
> Time: 1910.126 ms (00:01.910)
> Time: 1948.322 ms (00:01.948)
>
> --97.70%--exec_stmts
>   |
>   --88.13%--exec_assign_expr
> |
> |--73.27%--exec_eval_expr
> |  |
> |  |--58.29%--ExecInterpExpr
> |  |  |
> |  |
> |--25.69%--InputFunctionCallSafe
> |  |  |  |
> |  |  |
> |--14.75%--textin
>
> So, yes, putting InputFunctionCallSafe() in the common path may not
> have been such a good idea.
>
> > I'd split the soft-error path into
> > a separate opcode. For JIT it can largely be implemented using the same 
> > code,
> > eliding the check if it's the non-soft path. Or you can just put it into an
> > out-of-line function.
>
> Do you mean putting the execExprInterp.c code for the soft-error path
> (with a new opcode) into an out-of-line function?  That definitely
> makes the JIT version a tad simpler than if the error-checking is done
> in-line.
>
> So, the existing code for EEOP_IOCOERCE in both execExprInterp.c and
> llvmjit_expr.c will remain unchanged.  Also, I can write the code for
> the new opcode such that it doesn't use InputFunctionCallSafe() at
> runtime, but rather passes the ErrorSaveContext directly by putting
> that in the input function's FunctionCallInfo.context and checking
> SOFT_ERROR_OCCURRED() directly.  That will have less overhead.
>
> > I don't like adding more stuff to ExprState. This one seems particularly
> > awkward, because it might be used by more than one level in an expression
> > subtree, which means you really need to save/restore old values when
> > recursing.
>
> Hmm, I'd think that all levels will follow either soft or non-soft
> error mode, so sharing the ErrorSaveContext passed via ExprState
> doesn't look wrong to me.  IOW, there's only one value, not one for
> every level, so there doesn't appear to be any need to have the
> save/restore convention as we have for innermost_domainval et al.
>
> I can see your point that adding another 8 bytes at the end of
> ExprState might be undesirable.  Note though that ExprState.escontext
> is only accessed in the ExecInitExpr phase, but during evaluation.
>
> The alternative to not passing the ErrorSaveContext via ExprState is
> to add a new parameter to ExecInitExprRec() and to functions that call
> it.  The footprint would be much larger though.  Would you rather
> prefer that?
>
> > > @@ -1579,25 +1582,13 @@ ExecInitExprRec(Expr *node, ExprState *state,
> > >
> > >   /* lookup the result type's input function 
> > > */
> > >   scratch.d.iocoerce.finfo_in = 
> > > palloc0(sizeof(FmgrInfo));
> > > - scratch.d.iocoerce.fcinfo_data_in = 
> > > palloc0(SizeForFunctionCallInfo(3));
> > &

Re: remaining sql/json patches

2023-10-10 Thread Amit Langote
Hi Andres,

On Sat, Oct 7, 2023 at 6:49 AM Andres Freund  wrote:
> Hi,
>
> On 2023-09-29 13:57:46 +0900, Amit Langote wrote:
> > Thanks.  I will push the attached 0001 shortly.
>
> Sorry for not looking at this earlier.

Thanks for the review.  Replying here only to your comments on 0001.

> Have you done benchmarking to verify that 0001 does not cause performance
> regressions? I'd not be suprised if it did.

I found that it indeed did once I benchmarked with something that
would stress EEOP_IOCOERCE:

do $$
begin
for i in 1..2000 loop
i := i::text;
end loop; end; $$ language plpgsql;
DO

Times and perf report:

HEAD:

Time: 1815.824 ms (00:01.816)
Time: 1818.980 ms (00:01.819)
Time: 1695.555 ms (00:01.696)
Time: 1762.022 ms (00:01.762)

--97.49%--exec_stmts
  |
  --85.97%--exec_assign_expr
|
|--65.56%--exec_eval_expr
|  |
|  |--53.71%--ExecInterpExpr
|  |  |
|  |  |--14.14%--textin


Patched:

Time: 1872.469 ms (00:01.872)
Time: 1927.371 ms (00:01.927)
Time: 1910.126 ms (00:01.910)
Time: 1948.322 ms (00:01.948)

--97.70%--exec_stmts
  |
  --88.13%--exec_assign_expr
|
|--73.27%--exec_eval_expr
|  |
|  |--58.29%--ExecInterpExpr
|  |  |
|  |
|--25.69%--InputFunctionCallSafe
|  |  |  |
|  |  |
|--14.75%--textin

So, yes, putting InputFunctionCallSafe() in the common path may not
have been such a good idea.

> I'd split the soft-error path into
> a separate opcode. For JIT it can largely be implemented using the same code,
> eliding the check if it's the non-soft path. Or you can just put it into an
> out-of-line function.

Do you mean putting the execExprInterp.c code for the soft-error path
(with a new opcode) into an out-of-line function?  That definitely
makes the JIT version a tad simpler than if the error-checking is done
in-line.

So, the existing code for EEOP_IOCOERCE in both execExprInterp.c and
llvmjit_expr.c will remain unchanged.  Also, I can write the code for
the new opcode such that it doesn't use InputFunctionCallSafe() at
runtime, but rather passes the ErrorSaveContext directly by putting
that in the input function's FunctionCallInfo.context and checking
SOFT_ERROR_OCCURRED() directly.  That will have less overhead.

> I don't like adding more stuff to ExprState. This one seems particularly
> awkward, because it might be used by more than one level in an expression
> subtree, which means you really need to save/restore old values when
> recursing.

Hmm, I'd think that all levels will follow either soft or non-soft
error mode, so sharing the ErrorSaveContext passed via ExprState
doesn't look wrong to me.  IOW, there's only one value, not one for
every level, so there doesn't appear to be any need to have the
save/restore convention as we have for innermost_domainval et al.

I can see your point that adding another 8 bytes at the end of
ExprState might be undesirable.  Note though that ExprState.escontext
is only accessed in the ExecInitExpr phase, but during evaluation.

The alternative to not passing the ErrorSaveContext via ExprState is
to add a new parameter to ExecInitExprRec() and to functions that call
it.  The footprint would be much larger though.  Would you rather
prefer that?

> > @@ -1579,25 +1582,13 @@ ExecInitExprRec(Expr *node, ExprState *state,
> >
> >   /* lookup the result type's input function */
> >   scratch.d.iocoerce.finfo_in = 
> > palloc0(sizeof(FmgrInfo));
> > - scratch.d.iocoerce.fcinfo_data_in = 
> > palloc0(SizeForFunctionCallInfo(3));
> > -
> >   getTypeInputInfo(iocoerce->resulttype,
> > -  , 
> > );
> > +  , 
> > );
> >   fmgr_info(iofunc, 
> > scratch.d.iocoerce.finfo_in);
> >   fmgr_info_set_expr((Node *) node, 
> > scratch.d.iocoerce.finfo_in);
> > - 
> > InitFunc

Re: remaining sql/json patches

2023-10-06 Thread Amit Langote
On Fri, Oct 6, 2023 at 19:01 Alvaro Herrera  wrote:

> On 2023-Oct-06, Amit Langote wrote:
>
> > 2. Assignment of op->d.iocoerce.escontext needed to be changed like this:
> >
> > v_params[4] =
> l_ptr_const(op->d.iocoerce.escontext,
> > -
> > l_ptr(StructErrorSaveContext));
> > + l_ptr(StructNode));
>
> Oh, so you had to go back to using StructNode in order to get this
> fixed?  That's weird.  Is it just because InputFunctionCallSafe is
> defined to take fmNodePtr?  (I still fail to see that a pointer to
> ErrorSaveContext would differ in any material way from a pointer to
> Node).


The difference matters to LLVM’s type system, which considers Node to be a
type with 1 sub-type (struct member) and ErrorSaveContext with 4 sub-types.
It doesn’t seem to understand that both share the first member.


Another think I thought was weird is that it would only crash in LLVM5
> debug and not the other LLVM-enabled animals, but looking closer at the
> buildfarm results, I think that may have been only because you reverted
> too quickly, and phycodorus and petalura didn't actually run with
> 7fbc75b26ed8 before you reverted it.  Dragonet did make a run with it,
> but it's marked as "LLVM optimized" instead of "LLVM debug".  I suppose
> that must be making a difference.


AFAICS, only assert-enabled LLVM builds crash.

>


Re: remaining sql/json patches

2023-10-06 Thread Amit Langote
On Wed, Oct 4, 2023 at 10:26 PM Amit Langote  wrote:
> On Tue, Oct 3, 2023 at 10:11 PM Amit Langote  wrote:
> > On Mon, Oct 2, 2023 at 2:26 PM Amit Langote  wrote:
> > > On Mon, Oct 2, 2023 at 1:24 PM Amit Langote  
> > > wrote:
> > > > Pushed this 30 min ago (no email on -committers yet!) and am looking
> > > > at the following llvm crash reported by buildfarm animal pogona [1]:
> > > > This seems to me to be complaining about the following addition:
> > > >
> > > > +   {
> > > > +   Oid ioparam = op->d.iocoerce.typioparam;
> > > > +   LLVMValueRef v_params[6];
> > > > +   LLVMValueRef v_success;
> > > > +
> > > > +   v_params[0] = 
> > > > l_ptr_const(op->d.iocoerce.finfo_in,
> > > > + 
> > > > l_ptr(StructFmgrInfo));
> > > > +   v_params[1] = v_output;
> > > > +   v_params[2] = l_oid_const(lc, ioparam);
> > > > +   v_params[3] = l_int32_const(lc, -1);
> > > > +   v_params[4] = 
> > > > l_ptr_const(op->d.iocoerce.escontext,
> > > > +
> > > > l_ptr(StructErrorSaveContext));
> > > >
> > > > -   LLVMBuildStore(b, v_retval, v_resvaluep);
> > > > +   /*
> > > > +* InputFunctionCallSafe() will write directly 
> > > > into
> > > > +* *op->resvalue.
> > > > +*/
> > > > +   v_params[5] = v_resvaluep;
> > > > +
> > > > +   v_success = LLVMBuildCall(b, llvm_pg_func(mod,
> > > > "InputFunctionCallSafe"),
> > > > + v_params, 
> > > > lengthof(v_params),
> > > > + 
> > > > "funccall_iocoerce_in_safe");
> > > > +
> > > > +   /*
> > > > +* Return null if InputFunctionCallSafe() 
> > > > encountered
> > > > +* an error.
> > > > +*/
> > > > +   v_resnullp = LLVMBuildICmp(b, LLVMIntEQ, 
> > > > v_success,
> > > > +  l_sbool_const(0), 
> > > > "");
> > > > +   }
> > >
> ...I haven't yet pinpointed down
> which of the LLVM's asserts it is, nor have I been able to walk
> through LLVM source code using gdb to figure what the new code is
> doing wrong.  Maybe I'm still missing a trick or two...

I finally managed to analyze the crash by getting the correct LLVM build.

So the following bits are the culprits:

1. v_output needed to be converted from being reference to a Datum to
be reference to char * as follows before passing to
InputFunctionCallSafe():

-   v_params[1] = v_output;
+   v_params[1] = LLVMBuildIntToPtr(b, v_output,
+
l_ptr(LLVMInt8TypeInContext(lc)),
+   "");

2. Assignment of op->d.iocoerce.escontext needed to be changed like this:

v_params[4] = l_ptr_const(op->d.iocoerce.escontext,
-
l_ptr(StructErrorSaveContext));
+ l_ptr(StructNode));

3. v_success needed to be "zero-extended" to match in type with
whatever s_bool_const() produces, as follows:

+   v_success = LLVMBuildZExt(b, v_success,
TypeStorageBool, "");
v_resnullp = LLVMBuildICmp(b, LLVMIntEQ, v_success,
   l_sbool_const(0), "");

No more crashes with the above fixes.

Attached shows the delta against the patch I reverted.  I'll push the
fixed up version on Monday.

-- 
Thanks, Amit Langote
EDB: http://www.enterprisedb.com


iocoerce-llvm-fixes.diff
Description: Binary data


Re: remaining sql/json patches

2023-10-04 Thread Amit Langote
On Tue, Oct 3, 2023 at 10:11 PM Amit Langote  wrote:
> On Mon, Oct 2, 2023 at 2:26 PM Amit Langote  wrote:
> > On Mon, Oct 2, 2023 at 1:24 PM Amit Langote  wrote:
> > > Pushed this 30 min ago (no email on -committers yet!) and am looking
> > > at the following llvm crash reported by buildfarm animal pogona [1]:
> > >
> > > #4  0x7f5bceb673d5 in __assert_fail_base (fmt=0x7f5bcecdbdc8
> > > "%s%s%s:%u: %s%sAssertion `%s' failed.\\n%n",
> > > assertion=assertion@entry=0x7f5bc1336419 "(i >= FTy->getNumParams() ||
> > > FTy->getParamType(i) == Args[i]->getType()) && \\"Calling a function
> > > with a bad signature!\\"", file=file@entry=0x7f5bc1336051
> > > "/home/bf/src/llvm-project-5/llvm/lib/IR/Instructions.cpp",
> > > line=line@entry=299, function=function@entry=0x7f5bc13362af "void
> > > llvm::CallInst::init(llvm::FunctionType *, llvm::Value *,
> > > ArrayRef, ArrayRef, const
> > > llvm::Twine &)") at ./assert/assert.c:92
> > > #5  0x7f5bceb763a2 in __assert_fail (assertion=0x7f5bc1336419 "(i
> > > >= FTy->getNumParams() || FTy->getParamType(i) == Args[i]->getType())
> > > && \\"Calling a function with a bad signature!\\"",
> > > file=0x7f5bc1336051
> > > "/home/bf/src/llvm-project-5/llvm/lib/IR/Instructions.cpp", line=299,
> > > function=0x7f5bc13362af "void llvm::CallInst::init(llvm::FunctionType
> > > *, llvm::Value *, ArrayRef,
> > > ArrayRef, const llvm::Twine &)") at
> > > ./assert/assert.c:101
> > > #6  0x7f5bc110f138 in llvm::CallInst::init (this=0x557a91f3e508,
> > > FTy=0x557a91ed9ae0, Func=0x557a91f8be88, Args=..., Bundles=...,
> > > NameStr=...) at
> > > /home/bf/src/llvm-project-5/llvm/lib/IR/Instructions.cpp:297
> > > #7  0x7f5bc0fa579d in llvm::CallInst::CallInst
> > > (this=0x557a91f3e508, Ty=0x557a91ed9ae0, Func=0x557a91f8be88,
> > > Args=..., Bundles=..., NameStr=..., InsertBefore=0x0) at
> > > /home/bf/src/llvm-project-5/llvm/include/llvm/IR/Instructions.h:1934
> > > #8  0x7f5bc0fa538c in llvm::CallInst::Create (Ty=0x557a91ed9ae0,
> > > Func=0x557a91f8be88, Args=..., Bundles=..., NameStr=...,
> > > InsertBefore=0x0) at
> > > /home/bf/src/llvm-project-5/llvm/include/llvm/IR/Instructions.h:1444
> > > #9  0x7f5bc0fa51f9 in llvm::IRBuilder > > llvm::IRBuilderDefaultInserter>::CreateCall (this=0x557a91f9c6a0,
> > > FTy=0x557a91ed9ae0, Callee=0x557a91f8be88, Args=..., Name=...,
> > > FPMathTag=0x0) at
> > > /home/bf/src/llvm-project-5/llvm/include/llvm/IR/IRBuilder.h:1669
> > > #10 0x7f5bc100edda in llvm::IRBuilder > > llvm::IRBuilderDefaultInserter>::CreateCall (this=0x557a91f9c6a0,
> > > Callee=0x557a91f8be88, Args=..., Name=..., FPMathTag=0x0) at
> > > /home/bf/src/llvm-project-5/llvm/include/llvm/IR/IRBuilder.h:1663
> > > #11 0x7f5bc100714e in LLVMBuildCall (B=0x557a91f9c6a0,
> > > Fn=0x557a91f8be88, Args=0x7ffde6fa0b50, NumArgs=6, Name=0x7f5bc30b648c
> > > "funccall_iocoerce_in_safe") at
> > > /home/bf/src/llvm-project-5/llvm/lib/IR/Core.cpp:2964
> > > #12 0x7f5bc30af861 in llvm_compile_expr (state=0x557a91fbeac0) at
> > > /home/bf/bf-build/pogona/HEAD/pgsql.build/../pgsql/src/backend/jit/llvm/llvmjit_expr.c:1373
> > >
> > > This seems to me to be complaining about the following addition:
> > >
> > > +   {
> > > +   Oid ioparam = op->d.iocoerce.typioparam;
> > > +   LLVMValueRef v_params[6];
> > > +   LLVMValueRef v_success;
> > > +
> > > +   v_params[0] = l_ptr_const(op->d.iocoerce.finfo_in,
> > > + l_ptr(StructFmgrInfo));
> > > +   v_params[1] = v_output;
> > > +   v_params[2] = l_oid_const(lc, ioparam);
> > > +   v_params[3] = l_int32_const(lc, -1);
> > > +   v_params[4] = 
> > > l_ptr_const(op->d.iocoerce.escontext,
> > > +
> > > l_ptr(StructErrorSaveContext));
> > >
> > > -   LLVMBuildStore(b, v_retval, v_resvaluep);
> > > +   /*
> > > +* InputFunctionCallSafe() will write directly 
> > > into
> > > +* *op->resvalue.
> > > +   

Re: remaining sql/json patches

2023-10-03 Thread Amit Langote
On Mon, Oct 2, 2023 at 2:26 PM Amit Langote  wrote:
> On Mon, Oct 2, 2023 at 1:24 PM Amit Langote  wrote:
> > Pushed this 30 min ago (no email on -committers yet!) and am looking
> > at the following llvm crash reported by buildfarm animal pogona [1]:
> >
> > #4  0x7f5bceb673d5 in __assert_fail_base (fmt=0x7f5bcecdbdc8
> > "%s%s%s:%u: %s%sAssertion `%s' failed.\\n%n",
> > assertion=assertion@entry=0x7f5bc1336419 "(i >= FTy->getNumParams() ||
> > FTy->getParamType(i) == Args[i]->getType()) && \\"Calling a function
> > with a bad signature!\\"", file=file@entry=0x7f5bc1336051
> > "/home/bf/src/llvm-project-5/llvm/lib/IR/Instructions.cpp",
> > line=line@entry=299, function=function@entry=0x7f5bc13362af "void
> > llvm::CallInst::init(llvm::FunctionType *, llvm::Value *,
> > ArrayRef, ArrayRef, const
> > llvm::Twine &)") at ./assert/assert.c:92
> > #5  0x7f5bceb763a2 in __assert_fail (assertion=0x7f5bc1336419 "(i
> > >= FTy->getNumParams() || FTy->getParamType(i) == Args[i]->getType())
> > && \\"Calling a function with a bad signature!\\"",
> > file=0x7f5bc1336051
> > "/home/bf/src/llvm-project-5/llvm/lib/IR/Instructions.cpp", line=299,
> > function=0x7f5bc13362af "void llvm::CallInst::init(llvm::FunctionType
> > *, llvm::Value *, ArrayRef,
> > ArrayRef, const llvm::Twine &)") at
> > ./assert/assert.c:101
> > #6  0x7f5bc110f138 in llvm::CallInst::init (this=0x557a91f3e508,
> > FTy=0x557a91ed9ae0, Func=0x557a91f8be88, Args=..., Bundles=...,
> > NameStr=...) at
> > /home/bf/src/llvm-project-5/llvm/lib/IR/Instructions.cpp:297
> > #7  0x7f5bc0fa579d in llvm::CallInst::CallInst
> > (this=0x557a91f3e508, Ty=0x557a91ed9ae0, Func=0x557a91f8be88,
> > Args=..., Bundles=..., NameStr=..., InsertBefore=0x0) at
> > /home/bf/src/llvm-project-5/llvm/include/llvm/IR/Instructions.h:1934
> > #8  0x7f5bc0fa538c in llvm::CallInst::Create (Ty=0x557a91ed9ae0,
> > Func=0x557a91f8be88, Args=..., Bundles=..., NameStr=...,
> > InsertBefore=0x0) at
> > /home/bf/src/llvm-project-5/llvm/include/llvm/IR/Instructions.h:1444
> > #9  0x7f5bc0fa51f9 in llvm::IRBuilder > llvm::IRBuilderDefaultInserter>::CreateCall (this=0x557a91f9c6a0,
> > FTy=0x557a91ed9ae0, Callee=0x557a91f8be88, Args=..., Name=...,
> > FPMathTag=0x0) at
> > /home/bf/src/llvm-project-5/llvm/include/llvm/IR/IRBuilder.h:1669
> > #10 0x7f5bc100edda in llvm::IRBuilder > llvm::IRBuilderDefaultInserter>::CreateCall (this=0x557a91f9c6a0,
> > Callee=0x557a91f8be88, Args=..., Name=..., FPMathTag=0x0) at
> > /home/bf/src/llvm-project-5/llvm/include/llvm/IR/IRBuilder.h:1663
> > #11 0x7f5bc100714e in LLVMBuildCall (B=0x557a91f9c6a0,
> > Fn=0x557a91f8be88, Args=0x7ffde6fa0b50, NumArgs=6, Name=0x7f5bc30b648c
> > "funccall_iocoerce_in_safe") at
> > /home/bf/src/llvm-project-5/llvm/lib/IR/Core.cpp:2964
> > #12 0x7f5bc30af861 in llvm_compile_expr (state=0x557a91fbeac0) at
> > /home/bf/bf-build/pogona/HEAD/pgsql.build/../pgsql/src/backend/jit/llvm/llvmjit_expr.c:1373
> >
> > This seems to me to be complaining about the following addition:
> >
> > +   {
> > +   Oid ioparam = op->d.iocoerce.typioparam;
> > +   LLVMValueRef v_params[6];
> > +   LLVMValueRef v_success;
> > +
> > +   v_params[0] = l_ptr_const(op->d.iocoerce.finfo_in,
> > + l_ptr(StructFmgrInfo));
> > +   v_params[1] = v_output;
> > +   v_params[2] = l_oid_const(lc, ioparam);
> > +   v_params[3] = l_int32_const(lc, -1);
> > +   v_params[4] = l_ptr_const(op->d.iocoerce.escontext,
> > +
> > l_ptr(StructErrorSaveContext));
> >
> > -   LLVMBuildStore(b, v_retval, v_resvaluep);
> > +   /*
> > +* InputFunctionCallSafe() will write directly into
> > +* *op->resvalue.
> > +*/
> > +   v_params[5] = v_resvaluep;
> > +
> > +   v_success = LLVMBuildCall(b, llvm_pg_func(mod,
> > "InputFunctionCallSafe"),
> > + v_params, 
> > lengthof(v_params),
> > + 
> > "funccall_iocoerce_in_safe");
> > 

Re: remaining sql/json patches

2023-10-01 Thread Amit Langote
On Mon, Oct 2, 2023 at 1:24 PM Amit Langote  wrote:
> Pushed this 30 min ago (no email on -committers yet!) and am looking
> at the following llvm crash reported by buildfarm animal pogona [1]:
>
> #4  0x7f5bceb673d5 in __assert_fail_base (fmt=0x7f5bcecdbdc8
> "%s%s%s:%u: %s%sAssertion `%s' failed.\\n%n",
> assertion=assertion@entry=0x7f5bc1336419 "(i >= FTy->getNumParams() ||
> FTy->getParamType(i) == Args[i]->getType()) && \\"Calling a function
> with a bad signature!\\"", file=file@entry=0x7f5bc1336051
> "/home/bf/src/llvm-project-5/llvm/lib/IR/Instructions.cpp",
> line=line@entry=299, function=function@entry=0x7f5bc13362af "void
> llvm::CallInst::init(llvm::FunctionType *, llvm::Value *,
> ArrayRef, ArrayRef, const
> llvm::Twine &)") at ./assert/assert.c:92
> #5  0x7f5bceb763a2 in __assert_fail (assertion=0x7f5bc1336419 "(i
> >= FTy->getNumParams() || FTy->getParamType(i) == Args[i]->getType())
> && \\"Calling a function with a bad signature!\\"",
> file=0x7f5bc1336051
> "/home/bf/src/llvm-project-5/llvm/lib/IR/Instructions.cpp", line=299,
> function=0x7f5bc13362af "void llvm::CallInst::init(llvm::FunctionType
> *, llvm::Value *, ArrayRef,
> ArrayRef, const llvm::Twine &)") at
> ./assert/assert.c:101
> #6  0x7f5bc110f138 in llvm::CallInst::init (this=0x557a91f3e508,
> FTy=0x557a91ed9ae0, Func=0x557a91f8be88, Args=..., Bundles=...,
> NameStr=...) at
> /home/bf/src/llvm-project-5/llvm/lib/IR/Instructions.cpp:297
> #7  0x7f5bc0fa579d in llvm::CallInst::CallInst
> (this=0x557a91f3e508, Ty=0x557a91ed9ae0, Func=0x557a91f8be88,
> Args=..., Bundles=..., NameStr=..., InsertBefore=0x0) at
> /home/bf/src/llvm-project-5/llvm/include/llvm/IR/Instructions.h:1934
> #8  0x7f5bc0fa538c in llvm::CallInst::Create (Ty=0x557a91ed9ae0,
> Func=0x557a91f8be88, Args=..., Bundles=..., NameStr=...,
> InsertBefore=0x0) at
> /home/bf/src/llvm-project-5/llvm/include/llvm/IR/Instructions.h:1444
> #9  0x7f5bc0fa51f9 in llvm::IRBuilder llvm::IRBuilderDefaultInserter>::CreateCall (this=0x557a91f9c6a0,
> FTy=0x557a91ed9ae0, Callee=0x557a91f8be88, Args=..., Name=...,
> FPMathTag=0x0) at
> /home/bf/src/llvm-project-5/llvm/include/llvm/IR/IRBuilder.h:1669
> #10 0x7f5bc100edda in llvm::IRBuilder llvm::IRBuilderDefaultInserter>::CreateCall (this=0x557a91f9c6a0,
> Callee=0x557a91f8be88, Args=..., Name=..., FPMathTag=0x0) at
> /home/bf/src/llvm-project-5/llvm/include/llvm/IR/IRBuilder.h:1663
> #11 0x7f5bc100714e in LLVMBuildCall (B=0x557a91f9c6a0,
> Fn=0x557a91f8be88, Args=0x7ffde6fa0b50, NumArgs=6, Name=0x7f5bc30b648c
> "funccall_iocoerce_in_safe") at
> /home/bf/src/llvm-project-5/llvm/lib/IR/Core.cpp:2964
> #12 0x7f5bc30af861 in llvm_compile_expr (state=0x557a91fbeac0) at
>
/home/bf/bf-build/pogona/HEAD/pgsql.build/../pgsql/src/backend/jit/llvm/llvmjit_expr.c:1373
>
> This seems to me to be complaining about the following addition:
>
> +   {
> +   Oid ioparam = op->d.iocoerce.typioparam;
> +   LLVMValueRef v_params[6];
> +   LLVMValueRef v_success;
> +
> +   v_params[0] = l_ptr_const(op->d.iocoerce.finfo_in,
> + l_ptr(StructFmgrInfo));
> +   v_params[1] = v_output;
> +   v_params[2] = l_oid_const(lc, ioparam);
> +   v_params[3] = l_int32_const(lc, -1);
> +   v_params[4] =
l_ptr_const(op->d.iocoerce.escontext,
> +
> l_ptr(StructErrorSaveContext));
>
> -   LLVMBuildStore(b, v_retval, v_resvaluep);
> +   /*
> +* InputFunctionCallSafe() will write directly
into
> +* *op->resvalue.
> +*/
> +   v_params[5] = v_resvaluep;
> +
> +   v_success = LLVMBuildCall(b, llvm_pg_func(mod,
> "InputFunctionCallSafe"),
> + v_params,
lengthof(v_params),
> +
 "funccall_iocoerce_in_safe");
> +
> +   /*
> +* Return null if InputFunctionCallSafe()
encountered
> +* an error.
> +*/
> +   v_resnullp = LLVMBuildICmp(b, LLVMIntEQ,
v_success,
> +  l_sbool_const(0), "");
> +   }

Although most animals except pogona looked fine, I've decided to revert the
patch for now.

IIUC, LLVM is complaining that the co

Re: remaining sql/json patches

2023-10-01 Thread Amit Langote
On Fri, Sep 29, 2023 at 1:57 PM Amit Langote  wrote:
> On Thu, Sep 28, 2023 at 8:04 PM Alvaro Herrera  
> wrote:
> > On 2023-Sep-27, Amit Langote wrote:
> > > Maybe the following is better:
> > >
> > > +   /*
> > > +* For expression nodes that support soft errors.  Should be set to 
> > > NULL
> > > +* before calling ExecInitExprRec() if the caller wants errors thrown.
> > > +*/
> > >
> > > ...as in the attached.
> >
> > That's good.
> >
> > > Alvaro, do you think your concern regarding escontext not being in the
> > > right spot in the ExprState struct is addressed?  It doesn't seem very
> > > critical to me to place it in the struct's 1st cacheline, because
> > > escontext is not accessed in performance critical paths such as during
> > > expression evaluation, especially with the latest version.  (It would
> > > get accessed during evaluation with previous versions.)
> > >
> > > If so, I'd like to move ahead with committing it.
> >
> > Yeah, looks OK to me in v21.
>
> Thanks.  I will push the attached 0001 shortly.

Pushed this 30 min ago (no email on -committers yet!) and am looking
at the following llvm crash reported by buildfarm animal pogona [1]:

#0  __pthread_kill_implementation (threadid=,
signo=signo@entry=6, no_tid=no_tid@entry=0) at
./nptl/pthread_kill.c:44
44 ./nptl/pthread_kill.c: No such file or directory.
#0  __pthread_kill_implementation (threadid=,
signo=signo@entry=6, no_tid=no_tid@entry=0) at
./nptl/pthread_kill.c:44
#1  0x7f5bcebcb15f in __pthread_kill_internal (signo=6,
threadid=) at ./nptl/pthread_kill.c:78
#2  0x7f5bceb7d472 in __GI_raise (sig=sig@entry=6) at
../sysdeps/posix/raise.c:26
#3  0x7f5bceb674b2 in __GI_abort () at ./stdlib/abort.c:79
#4  0x7f5bceb673d5 in __assert_fail_base (fmt=0x7f5bcecdbdc8
"%s%s%s:%u: %s%sAssertion `%s' failed.\\n%n",
assertion=assertion@entry=0x7f5bc1336419 "(i >= FTy->getNumParams() ||
FTy->getParamType(i) == Args[i]->getType()) && \\"Calling a function
with a bad signature!\\"", file=file@entry=0x7f5bc1336051
"/home/bf/src/llvm-project-5/llvm/lib/IR/Instructions.cpp",
line=line@entry=299, function=function@entry=0x7f5bc13362af "void
llvm::CallInst::init(llvm::FunctionType *, llvm::Value *,
ArrayRef, ArrayRef, const
llvm::Twine &)") at ./assert/assert.c:92
#5  0x7f5bceb763a2 in __assert_fail (assertion=0x7f5bc1336419 "(i
>= FTy->getNumParams() || FTy->getParamType(i) == Args[i]->getType())
&& \\"Calling a function with a bad signature!\\"",
file=0x7f5bc1336051
"/home/bf/src/llvm-project-5/llvm/lib/IR/Instructions.cpp", line=299,
function=0x7f5bc13362af "void llvm::CallInst::init(llvm::FunctionType
*, llvm::Value *, ArrayRef,
ArrayRef, const llvm::Twine &)") at
./assert/assert.c:101
#6  0x7f5bc110f138 in llvm::CallInst::init (this=0x557a91f3e508,
FTy=0x557a91ed9ae0, Func=0x557a91f8be88, Args=..., Bundles=...,
NameStr=...) at
/home/bf/src/llvm-project-5/llvm/lib/IR/Instructions.cpp:297
#7  0x7f5bc0fa579d in llvm::CallInst::CallInst
(this=0x557a91f3e508, Ty=0x557a91ed9ae0, Func=0x557a91f8be88,
Args=..., Bundles=..., NameStr=..., InsertBefore=0x0) at
/home/bf/src/llvm-project-5/llvm/include/llvm/IR/Instructions.h:1934
#8  0x7f5bc0fa538c in llvm::CallInst::Create (Ty=0x557a91ed9ae0,
Func=0x557a91f8be88, Args=..., Bundles=..., NameStr=...,
InsertBefore=0x0) at
/home/bf/src/llvm-project-5/llvm/include/llvm/IR/Instructions.h:1444
#9  0x7f5bc0fa51f9 in llvm::IRBuilder::CreateCall (this=0x557a91f9c6a0,
FTy=0x557a91ed9ae0, Callee=0x557a91f8be88, Args=..., Name=...,
FPMathTag=0x0) at
/home/bf/src/llvm-project-5/llvm/include/llvm/IR/IRBuilder.h:1669
#10 0x7f5bc100edda in llvm::IRBuilder::CreateCall (this=0x557a91f9c6a0,
Callee=0x557a91f8be88, Args=..., Name=..., FPMathTag=0x0) at
/home/bf/src/llvm-project-5/llvm/include/llvm/IR/IRBuilder.h:1663
#11 0x7f5bc100714e in LLVMBuildCall (B=0x557a91f9c6a0,
Fn=0x557a91f8be88, Args=0x7ffde6fa0b50, NumArgs=6, Name=0x7f5bc30b648c
"funccall_iocoerce_in_safe") at
/home/bf/src/llvm-project-5/llvm/lib/IR/Core.cpp:2964
#12 0x7f5bc30af861 in llvm_compile_expr (state=0x557a91fbeac0) at
/home/bf/bf-build/pogona/HEAD/pgsql.build/../pgsql/src/backend/jit/llvm/llvmjit_expr.c:1373
#13 0x557a915992db in jit_compile_expr
(state=state@entry=0x557a91fbeac0) at
/home/bf/bf-build/pogona/HEAD/pgsql.build/../pgsql/src/backend/jit/jit.c:177
#14 0x557a9123071d in ExecReadyExpr
(state=state@entry=0x557a91fbeac0) at
/home/bf/bf-build/pogona/HEAD/pgsql.build/../pgsql/src/backend/executor/execExpr.c:880
#15 0x557a912340d7 in ExecBuildProjectionInfo
(targetList=0x557a91fa6b58, econtext=, slot=, parent=parent@entry=0x557a9

Re: Memory consumed by child SpecialJoinInfo in partitionwise join planning

2023-09-28 Thread Amit Langote
On Wed, Sep 27, 2023 at 8:07 PM Ashutosh Bapat
 wrote:
> On Wed, Sep 27, 2023 at 2:30 PM Amit Langote  wrote:
> > +   /*
> > +* But the list of operator OIDs and the list of expressions may be
> > +* referenced somewhere else. Do not free those.
> > +*/
> >
> > I don't see build_child_join_sjinfo() copying any list of OIDs, so I'm
> > not sure what the comment is referring to.  Also, instead of "the list
> > of expressions", it might be more useful to mention semi_rhs_expr,
> > because that's the only one being copied.
>
> list of OID is semi_operators. It's copied as is from parent
> SpecialJoinInfo. But the way it's mentioned in the comment is
> confusing. I have modified the prologue of function to provide a
> general guidance on what can be freed and what should not be. and then
> specific examples. Let me know if this is more clear.

Thanks.  I would've kept the notes about specific members inside the
function.  Also, no need to have a note for pointers that are not
deep-copied to begin with, such as semi_operators.  IOW, something
like the following would have sufficed:

@@ -1735,6 +1735,10 @@ build_child_join_sjinfo(PlannerInfo *root,
SpecialJoinInfo *parent_sjinfo,
 /*
  * free_child_sjinfo_members
  * Free memory consumed by members of a child SpecialJoinInfo.
+ *
+ * Only members that are translated copies of their counterpart in the parent
+ * SpecialJoinInfo are freed here.  However, members that could be referenced
+ * elsewhere are not freed.
  */
 static void
 free_child_sjinfo_members(SpecialJoinInfo *child_sjinfo)
@@ -1755,10 +1759,7 @@ free_child_sjinfo_members(SpecialJoinInfo *child_sjinfo)
bms_free(child_sjinfo->syn_lefthand);
bms_free(child_sjinfo->syn_righthand);

-   /*
-* But the list of operator OIDs and the list of expressions may be
-* referenced somewhere else. Do not free those.
-    */
+   /* semi_rhs_exprs may be referenced, so don't free. */
 }

-- 
Thanks, Amit Langote
EDB: http://www.enterprisedb.com




Re: Memory consumed by child SpecialJoinInfo in partitionwise join planning

2023-09-27 Thread Amit Langote
On Wed, Sep 27, 2023 at 8:07 PM Ashutosh Bapat
 wrote:
> On Wed, Sep 27, 2023 at 2:30 PM Amit Langote  wrote:
> > Just out of curiosity, is their not being present in join_info_list
> > problematic in some manner, such as missed optimization opportunities
> > for child joins?  I noticed there is a loop over join_info_list in
> > add_paths_to_join_rel(), which does get called for child joinrels.  I
> > know this a bit off-topic for the thread, but thought to ask in case
> > you've already thought about it.
>
> The function has a comment and code to take care of this at the very beginning
> /*
> * PlannerInfo doesn't contain the SpecialJoinInfos created for joins
> * between child relations, even if there is a SpecialJoinInfo node for
> * the join between the topmost parents. So, while calculating Relids set
> * representing the restriction, consider relids of topmost parent of
> * partitions.
> */
> if (joinrel->reloptkind == RELOPT_OTHER_JOINREL)
> joinrelids = joinrel->top_parent_relids;
> else
> joinrelids = joinrel->relids;

Ah, that's accounted for.  Thanks.


--
Thanks, Amit Langote
EDB: http://www.enterprisedb.com




Re: Memory consumed by child SpecialJoinInfo in partitionwise join planning

2023-09-27 Thread Amit Langote
Hi Ashutosh,

On Thu, Sep 21, 2023 at 1:20 PM Ashutosh Bapat
 wrote:
> On Thu, Sep 21, 2023 at 6:37 AM Amit Langote  wrote:
> > On Wed, Sep 20, 2023 at 10:24 PM Ashutosh Bapat
> >  wrote:
> > > On Wed, Sep 20, 2023 at 5:24 PM Amit Langote  
> > > wrote:
> > > > Just one comment on 0003:
> > > >
> > > > +   /*
> > > > +* Dummy SpecialJoinInfos do not have any translated fields and 
> > > > hence have
> > > > +* nothing to free.
> > > > +*/
> > > > +   if (child_sjinfo->jointype == JOIN_INNER)
> > > > +   return;
> > > >
> > > > Should this instead be Assert(child_sjinfo->jointype != JOIN_INNER)?
> > >
> > > try_partitionwise_join() calls free_child_sjinfo_members()
> > > unconditionally i.e. it will be called even SpecialJoinInfos of INNER
> > > join. Above condition filters those out early. In fact if we convert
> > > into an Assert, in a production build the rest of the code will free
> > > Relids which are referenced somewhere else causing dangling pointers.
> >
> > You're right.  I hadn't realized that the parent SpecialJoinInfo
> > passed to try_partitionwise_join() might itself be a dummy.  I guess I
> > should've read the whole thing before commenting.
>
> Maybe there's something to improve in terms of readability, I don't know.

Here are some comments.

Please merge 0003 into 0002.

+   /*
+* But the list of operator OIDs and the list of expressions may be
+* referenced somewhere else. Do not free those.
+*/

I don't see build_child_join_sjinfo() copying any list of OIDs, so I'm
not sure what the comment is referring to.  Also, instead of "the list
of expressions", it might be more useful to mention semi_rhs_expr,
because that's the only one being copied.

The comment for SpecialJoinInfo mentions that they are stored in
PlannerInfo.join_info_list, but the child SpecialJoinInfos used by
partitionwise joins are not, which I noticed has not been mentioned
anywhere.  Should we make a note of that in the SpecialJoinInfo's
comment?

Just out of curiosity, is their not being present in join_info_list
problematic in some manner, such as missed optimization opportunities
for child joins?  I noticed there is a loop over join_info_list in
add_paths_to_join_rel(), which does get called for child joinrels.  I
know this a bit off-topic for the thread, but thought to ask in case
you've already thought about it.

-- 
Thanks, Amit Langote
EDB: http://www.enterprisedb.com




Re: generic plans and "initial" pruning

2023-09-25 Thread Amit Langote
On Wed, Sep 6, 2023 at 11:20 PM Robert Haas  wrote:
> On Wed, Sep 6, 2023 at 5:12 AM Amit Langote  wrote:
> > Attached updated patches.  Thanks for the review.
>
> I think 0001 looks ready to commit. I'm not sure that the commit
> message needs to mention future patches here, since this code cleanup
> seems like a good idea regardless, but if you feel otherwise, fair
> enough.

OK, I will remove the mention of future patches.

> On 0002, some questions:
>
> - In ExecEndLockRows, is the call to EvalPlanQualEnd a concern? i.e.
> Does that function need any adjustment?

I think it does with the patch as it stands.  It needs to have an
early exit at the top if parentestate is NULL, which it would be if
EvalPlanQualInit() wasn't called from an ExecInit*() function.

Though, as I answer below your question as to whether there is
actually any need to interrupt all of the ExecInit*() routines,
nothing needs to change in ExecEndLockRows().

> - In ExecEndMemoize, should there be a null-test around
> MemoryContextDelete(node->tableContext) as we have in
> ExecEndRecursiveUnion, ExecEndSetOp, etc.?

Oops, you're right.  Added.

> I wonder how we feel about setting pointers to NULL after freeing the
> associated data structures. The existing code isn't consistent about
> doing that, and making it do so would be a fairly large change that
> would bloat this patch quite a bit. On the other hand, I think it's a
> good practice as a general matter, and we do do it in some ExecEnd
> functions.

I agree that it might be worthwhile to take the opportunity and make
the code more consistent in this regard.  So, I've included those
changes too in 0002.

> On 0003, I have some doubt about whether we really have all the right
> design decisions in detail here:
>
> - Why have this weird rule where sometimes we return NULL and other
> times the planstate? Is there any point to such a coding rule? Why not
> just always return the planstate?
>
> - Is there any point to all of these early exit cases? For example, in
> ExecInitBitmapAnd, why exit early if initialization fails? Why not
> just plunge ahead and if initialization failed the caller will notice
> that and when we ExecEndNode some of the child node pointers will be
> NULL but who cares? The obvious disadvantage of this approach is that
> we're doing a bunch of unnecessary initialization, but we're also
> speeding up the common case where we don't need to abort by avoiding a
> branch that will rarely be taken. I'm not quite sure what the right
> thing to do is here.
>
> - The cases where we call ExecGetRangeTableRelation or
> ExecOpenScanRelation are a bit subtler ... maybe initialization that
> we're going to do later is going to barf if the tuple descriptor of
> the relation isn't what we thought it was going to be. In that case it
> becomes important to exit early. But if that's not actually a problem,
> then we could apply the same principle here also -- don't pollute the
> code with early-exit cases, just let it do its thing and sort it out
> later. Do you know what the actual problems would be here if we didn't
> exit early in these cases?
>
> - Depending on the answers to the above points, one thing we could
> think of doing is put an early exit case into ExecInitNode itself: if
> (unlikely(!ExecPlanStillValid(whatever)) return NULL. Maybe Andres or
> someone is going to argue that that checks too often and is thus too
> expensive, but it would be a lot more maintainable than having similar
> checks strewn throughout the ExecInit* functions. Perhaps it deserves
> some thought/benchmarking. More generally, if there's anything we can
> do to centralize these checks in fewer places, I think that would be
> worth considering. The patch isn't terribly large as it stands, so I
> don't necessarily think that this is a critical issue, but I'm just
> wondering if we can do better. I'm not even sure that it would be too
> expensive to just initialize the whole plan always, and then just do
> one test at the end. That's not OK if the changed tuple descriptor (or
> something else) is going to crash or error out in a funny way or
> something before initialization is completed, but if it's just going
> to result in burning a few CPU cycles in a corner case, I don't know
> if we should really care.

I thought about this some and figured that adding the
is-CachedPlan-still-valid tests in the following places should suffice
after all:

1. In InitPlan() right after the top-level ExecInitNode() calls
2. In ExecInit*() functions of Scan nodes, right after
ExecOpenScanRelation() calls

CachedPlans can only become invalid because of concurrent changes to
the inheritance child tables referenced in the plan.  Only the
following schema modifications of child tables are possible to be
performed conc

Re: Memory consumed by child SpecialJoinInfo in partitionwise join planning

2023-09-20 Thread Amit Langote
On Wed, Sep 20, 2023 at 10:24 PM Ashutosh Bapat
 wrote:
> On Wed, Sep 20, 2023 at 5:24 PM Amit Langote  wrote:
> > Just one comment on 0003:
> >
> > +   /*
> > +* Dummy SpecialJoinInfos do not have any translated fields and hence 
> > have
> > +* nothing to free.
> > +*/
> > +   if (child_sjinfo->jointype == JOIN_INNER)
> > +   return;
> >
> > Should this instead be Assert(child_sjinfo->jointype != JOIN_INNER)?
>
> try_partitionwise_join() calls free_child_sjinfo_members()
> unconditionally i.e. it will be called even SpecialJoinInfos of INNER
> join. Above condition filters those out early. In fact if we convert
> into an Assert, in a production build the rest of the code will free
> Relids which are referenced somewhere else causing dangling pointers.

You're right.  I hadn't realized that the parent SpecialJoinInfo
passed to try_partitionwise_join() might itself be a dummy.  I guess I
should've read the whole thing before commenting.


--
Thanks, Amit Langote
EDB: http://www.enterprisedb.com




Re: Memory consumed by child SpecialJoinInfo in partitionwise join planning

2023-09-20 Thread Amit Langote
Hi Ashutosh,

On Wed, Aug 16, 2023 at 2:28 PM Ashutosh Bapat
 wrote:
> On Fri, Aug 4, 2023 at 2:11 PM Ashutosh Bapat
>  wrote:
> >
> > Attached patchset fixing those.
> > 0001 - patch to report planning memory, with to explain.out regression 
> > output fix. We may consider committing this as well.
> > 0002 - with your comment addressed above.
>
> 0003 - Added this patch for handling SpecialJoinInfos for inner joins.
> These SpecialJoinInfos are created on the fly for parent joins. They
> can be created on the fly for child joins as well without requiring
> any translations. Thus they do not require any new memory. This patch
> is intended to be merged into 0002 finally.

I read this thread and have been reading the latest patch.  At first
glance, it seems quite straightforward to me.  I agree with Richard
that pfree()'ing 4 bitmapsets may not be a lot of added overhead.  I
will study the patch a bit more.

Just one comment on 0003:

+   /*
+* Dummy SpecialJoinInfos do not have any translated fields and hence have
+* nothing to free.
+*/
+   if (child_sjinfo->jointype == JOIN_INNER)
+   return;

Should this instead be Assert(child_sjinfo->jointype != JOIN_INNER)?

--
Thanks, Amit Langote
EDB: http://www.enterprisedb.com




Re: remaining sql/json patches

2023-09-19 Thread Amit Langote
On Tue, Sep 19, 2023 at 9:00 PM Amit Langote  wrote:
> On Tue, Sep 19, 2023 at 7:37 PM jian he  wrote:
> >  ---
> > https://www.postgresql.org/docs/current/extend-type-system.html#EXTEND-TYPES-POLYMORPHIC
> > >>  When the return value of a function is declared as a polymorphic type, 
> > >> there must be at least one argument position that is also
> > >> polymorphic, and the actual data type(s) supplied for the polymorphic 
> > >> arguments determine the actual result type for that call.
> >
> > select json_query(jsonb'{"a":[{"a":[2,3]},{"a":[4,5]}]}','$.a[*].a?(@<=3)'
> > returning anyrange);
> > should fail. Now it returns NULL. Maybe we can validate it in
> > transformJsonFuncExpr?
> > ---
>
> I'm not sure whether we should make the parser complain about the
> weird types being specified in RETURNING.

Sleeping over this, maybe adding the following to
transformJsonOutput() does make sense?

+   if (get_typtype(ret->typid) == TYPTYPE_PSEUDO)
+   ereport(ERROR,
+   errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
+   errmsg("returning pseudo-types is not supported in
SQL/JSON functions"));
+

-- 
Thanks, Amit Langote
EDB: http://www.enterprisedb.com




Re: remaining sql/json patches

2023-09-19 Thread Amit Langote
On Tue, Sep 19, 2023 at 7:37 PM jian he  wrote:
> On Mon, Sep 18, 2023 at 7:51 PM Erik Rijkers  wrote:
> >
> > and FYI: None of these crashes occur when I leave off the 'WITH WRAPPER'
> > clause.
> >
> > >
> > > Erik
> > >
>
> if specify with wrapper, then default behavior is keep quotes, so
> jexpr->omit_quotes will be false, which make val_string NULL.
> in ExecEvalJsonExprCoercion: InputFunctionCallSafe, val_string is
> NULL, flinfo->fn_strict is true, it will return:  *op->resvalue =
> (Datum) 0. but at the same time  *op->resnull is still false!
>
> if not specify with wrapper, then JsonPathQuery will return NULL.
> (because after apply the path_expression, cannot multiple SQL/JSON
> items)
>
> select json_query(jsonb'{"a":[{"a":3},{"a":[4,5]}]}','$.a[*].a?(@<=3)'
>  returning int4range);
> also make server crash, because default is KEEP QUOTES, so in
> ExecEvalJsonExprCoercion jexpr->omit_quotes will be false.
> val_string will be NULL again as mentioned above.

That's right.

> another funny case:
> create domain domain_int4range int4range;
> select json_query(jsonb'{"a":[{"a":[2,3]},{"a":[4,5]}]}','$.a[*].a?(@<=3)'
>  returning domain_int4range with wrapper);
>
> should I expect it to return  [2,4)  ?

This is what you'll get with v16 that I just posted.

>  ---
> https://www.postgresql.org/docs/current/extend-type-system.html#EXTEND-TYPES-POLYMORPHIC
> >>  When the return value of a function is declared as a polymorphic type, 
> >> there must be at least one argument position that is also
> >> polymorphic, and the actual data type(s) supplied for the polymorphic 
> >> arguments determine the actual result type for that call.
>
> select json_query(jsonb'{"a":[{"a":[2,3]},{"a":[4,5]}]}','$.a[*].a?(@<=3)'
> returning anyrange);
> should fail. Now it returns NULL. Maybe we can validate it in
> transformJsonFuncExpr?
> ---

I'm not sure whether we should make the parser complain about the
weird types being specified in RETURNING.  The NULL you get in the
above example is because of the following error:

select json_query(jsonb'{"a":[{"a":[2,3]},{"a":[4,5]}]}','$.a[*].a?(@<=3)'
returning anyrange error on error);
ERROR:  JSON path expression in JSON_QUERY should return singleton
item without wrapper
HINT:  Use WITH WRAPPER clause to wrap SQL/JSON item sequence into array.

-- 
Thanks, Amit Langote
EDB: http://www.enterprisedb.com




Re: remaining sql/json patches

2023-09-18 Thread Amit Langote
Hi Erik,

On Mon, Sep 18, 2023 at 19:09 Erik Rijkers  wrote:

> Op 9/18/23 om 05:15 schreef Amit Langote:
> > On Sun, Sep 17, 2023 at 3:34 PM Erik Rijkers  wrote:
> >> Op 9/14/23 om 10:14 schreef Amit Langote:
> >>>
> >>>
> >>
> >> Hi Amit,
> >>
> >> Just now I built a v14-patched server and I found this crash:
> >>
> >> select json_query(jsonb '
> >> {
> >> "arr": [
> >>   {"arr": [2,3]}
> >> , {"arr": [4,5]}
> >> ]
> >> }'
> >> , '$.arr[*].arr ? (@ <= 3)' returning anyarray  WITH WRAPPER)
> --crash
> >> ;
> >> server closed the connection unexpectedly
> >>  This probably means the server terminated abnormally
> >>  before or while processing the request.
> >> connection to server was lost
> >
> > Thanks for the report.
> >
> > Attached updated version fixes the crash, but you get an error as is
> > to be expected:
> >
> > select json_query(jsonb '
> > {
> > "arr": [
> >   {"arr": [2,3]}
> > , {"arr": [4,5]}
> > ]
> > }'
> > , '$.arr[*].arr ? (@ <= 3)' returning anyarray  WITH WRAPPER);
> > ERROR:  cannot accept a value of type anyarray
> >
> > unlike when using int[]:
> >
> > select json_query(jsonb '
> > {
> > "arr": [
> >   {"arr": [2,3]}
> > , {"arr": [4,5]}
> > ]
> > }'
> > , '$.arr[*].arr ? (@ <= 3)' returning int[]  WITH WRAPPER);
> >   json_query
> > 
> >   {2,3}
> > (1 row)
> >
>
> Thanks, Amit. Alas, there are more: for 'anyarray' I thought I'd
> substitute 'interval', 'int4range', 'int8range', and sure enough they
> all give similar crashes. Patched with v15:
>
> psql -qX -e << SQL
> select json_query(jsonb'{"a":[{"a":[2,3]},{"a":[4,5]}]}',
>'$.a[*].a?(@<=3)'returning int[] with wrapper --ok
> );
>
> select json_query(jsonb'{"a": [{"a": [2,3]}, {"a": [4,5]}]}',
>'$.a[*].a?(@<=3)'returning interval  with wrapper --crash
> --'$.a[*].a?(@<=3)'returning int4range with wrapper --crash
> --'$.a[*].a?(@<=3)'returning int8range with wrapper --crash
> --'$.a[*].a?(@<=3)'returning numeric[] with wrapper --{2,3} =ok
> --'$.a[*].a?(@<=3)'returning anyarray  with wrapper --fixed
> --'$.a[*].a?(@<=3)'returning anyarray   --null =ok
> --'$.a[*].a?(@<=3)'returning int--null =ok
> --'$.a[*].a?(@<=3)'returning int   with wrapper --error =ok
> --'$.a[*].a?(@<=3)'returning int[] with wrapper -- {2,3} =ok
> );
> SQL
> => server closed the connection unexpectedly, etc
>
> Because those first three tries gave a crash (*all three*), I'm a bit
> worried there may be many more.
>
> I am sorry to be bothering you with these somewhat idiotic SQL
> statements but I suppose somehow it needs to be made more solid.


No, thanks for your testing.  I’ll look into these.

>


Re: dubious warning: FORMAT JSON has no effect for json and jsonb types

2023-08-21 Thread Amit Langote
On Fri, Aug 18, 2023 at 2:59 PM Peter Eisentraut  wrote:
> On 16.08.23 16:59, Merlin Moncure wrote:
> > On Wed, Aug 16, 2023 at 8:55 AM Peter Eisentraut  > <mailto:pe...@eisentraut.org>> wrote:
> >
> > This warning comes from parse_expr.c transformJsonValueExpr() and is
> > triggered for example by the following test case:
> >
> > SELECT JSON_OBJECT('foo': NULL::json FORMAT JSON);
> > WARNING:  FORMAT JSON has no effect for json and jsonb types
> >
> > But I don't see anything in the SQL standard that would require this
> > warning.  It seems pretty clear that FORMAT JSON in this case is
> > implicit and otherwise without effect.
> >
> > Also, we don't have that warning in the output case (RETURNING json
> > FORMAT JSON).
> >
> > Anyone remember why this is here?  Should we remove it?
> >
> >
> > +1 for removing, on the basis that it is not suprising, and would
> > pollute logs for most configurations.
>
> done

+1 and thanks.  May have been there as a debugging aid if anything.

-- 
Thanks, Amit Langote
EDB: http://www.enterprisedb.com




Re: remaining sql/json patches

2023-08-15 Thread Amit Langote
On Tue, Aug 15, 2023 at 5:58 PM jian he  wrote:
> Hi.
> in v11, json_query:
> +The returned data_type has the
> same semantics
> +as for constructor functions like 
> json_objectagg;
> +the default returned type is text.
> +The ON EMPTY clause specifies the behavior if the
> +path_expression yields no value at all; 
> the
> +default when ON ERROR is not specified is
> to return a
> +null value.
>
> the default returned type is jsonb?

You are correct.

> Also in above quoted second last
> line should be ON EMPTY ?

Correct too.

> Other than that, the doc looks good.

Thanks for the review.

I will post a new version after finishing working on a few other
improvements I am working on.

-- 
Thanks, Amit Langote
EDB: http://www.enterprisedb.com




Re: generic plans and "initial" pruning

2023-08-11 Thread Amit Langote
On Fri, Aug 11, 2023 at 14:31 Amit Langote  wrote:

> On Wed, Aug 9, 2023 at 1:05 AM Robert Haas  wrote:
> > On Tue, Aug 8, 2023 at 10:32 AM Amit Langote 
> wrote:
> > > But should ExecInitNode() subroutines return the partially initialized
> > > PlanState node or NULL on detecting invalidation?  If I'm
> > > understanding how you think this should be working correctly, I think
> > > you mean the former, because if it were the latter, ExecInitNode()
> > > would end up returning NULL at the top for the root and then there's
> > > nothing to pass to ExecEndNode(), so no way to clean up to begin with.
> > > In that case, I think we will need to adjust ExecEndNode() subroutines
> > > to add `if (node->ps.ps_ResultTupleSlot)` in the above code, for
> > > example.  That's something Tom had said he doesn't like very much [1].
> >
> > Yeah, I understood Tom's goal as being "don't return partially
> > initialized nodes."
> >
> > Personally, I'm not sure that's an important goal. In fact, I don't
> > even think it's a desirable one. It doesn't look difficult to audit
> > the end-node functions for cases where they'd fail if a particular
> > pointer were NULL instead of pointing to some real data, and just
> > fixing all such cases to have NULL-tests looks like purely mechanical
> > work that we are unlikely to get wrong. And at least some cases
> > wouldn't require any changes at all.
> >
> > If we don't do that, the complexity doesn't go away. It just moves
> > someplace else. Presumably what we do in that case is have
> > ExecInitNode functions undo any initialization that they've already
> > done before returning NULL. There are basically two ways to do that.
> > Option one is to add code at the point where they return early to
> > clean up anything they've already initialized, but that code is likely
> > to substantially duplicate whatever the ExecEndNode function already
> > knows how to do, and it's very easy for logic like this to get broken
> > if somebody rearranges an ExecInitNode function down the road.
>
> Yeah, I too am not a fan of making ExecInitNode() clean up partially
> initialized nodes.
>
> > Option
> > two is to rearrange the ExecInitNode functions now, to open relations
> > or recurse at the beginning, so that we discover the need to fail
> > before we initialize anything. That restricts our ability to further
> > rearrange the functions in future somewhat, but more importantly,
> > IMHO, it introduces more risk right now. Checking that the ExecEndNode
> > function will not fail if some pointers are randomly null is a lot
> > easier than checking that changing the order of operations in an
> > ExecInitNode function breaks nothing.
> >
> > I'm not here to say that we can't do one of those things. But I think
> > adding null-tests to ExecEndNode functions looks like *far* less work
> > and *way* less risk.
>
> +1
>
> > There's a second issue here, too, which is when we abort ExecInitNode
> > partway through, how do we signal that? You're rightly pointing out
> > here that if we do that by returning NULL, then we don't do it by
> > returning a pointer to the partially initialized node that we just
> > created, which means that we either need to store those partially
> > initialized nodes in a separate data structure as you propose to do in
> > 0001,
> >
> > or else we need to pick a different signalling convention. We
> > could change (a) ExecInitNode to have an additional argument, bool
> > *kaboom, or (b) we could make it return bool and return the node
> > pointer via a new additional argument, or (c) we could put a Boolean
> > flag into the estate and let the function signal failure by flipping
> > the value of the flag.
>
> The failure can already be detected by seeing that
> ExecPlanIsValid(estate) is false.  The question is what ExecInitNode()
> or any of its subroutines should return once it is.  I think the
> following convention works:
>
> Return partially initialized state from ExecInit* function where we
> detect the invalidation after calling ExecInitNode() on a child plan,
> so that ExecEndNode() can recurse to clean it up.
>
> Return NULL from ExecInit* functions where we detect the invalidation
> after opening and locking a relation but before calling ExecInitNode()
> to initialize a child plan if there's one at all.  Even if we may set
> things like ExprContext, TupleTableSlot fields, they are cleaned up
> independently of the plan tree anyway via the cleanup called with
> es_exprcontexts, es_tupleTable, respectively.  I e

Re: initial pruning in parallel append

2023-08-09 Thread Amit Langote
On Wed, Aug 9, 2023 at 9:48 PM Robert Haas  wrote:
> On Wed, Aug 9, 2023 at 6:22 AM Amit Langote  wrote:
> > > > I'm assuming it's not
> > > > too ugly if ExecInitAppend() uses IsParallelWorker() to decide whether
> > > > it should be writing to EState.es_part_prune_results or reading from
> > > > it -- the former if in the leader and the latter in a worker.
> > >
> > > I don't think that's too ugly. I mean you have to have an if statement
> > > someplace.
> >
> > Yes, that makes sense.
> >
> > It's just that I thought maybe I haven't thought hard enough about
> > options before adding a new IsParallelWorker(), because I don't find
> > too many instances of IsParallelWorker() in the generic executor code.
> > I think that's because most parallel worker-specific logic lives in
> > execParallel.c or in Exec*Worker() functions outside that file, so the
> > generic code remains parallel query agnostic as much as possible.
>
> Oh, actually, there is an issue here. IsParallelWorker() is not the
> right test. Imagine that there's a parallel query which launches some
> workers, and one of those calls a user-defined function which again
> uses parallelism, launching more workers. This may not be possible
> today, I don't really remember, but the test should be "am I a
> parallel worker with respect to this plan?" not "am I a parallel
> worker at all?".Not quite sure what the best way to code that is. If
> we could just test whether we have a ParallelWorkerContext, it would
> be easy...

I checked enough to be sure that IsParallelWorker() is reliable at the
time of ExecutorStart() / ExecInitNode() in ParallelQueryMain() in a
worker.   However, ParallelWorkerContext is not available at that
point.  Here's the relevant part of ParallelQueryMain():

/* Start up the executor */
queryDesc->plannedstmt->jitFlags = fpes->jit_flags;
ExecutorStart(queryDesc, fpes->eflags);

/* Special executor initialization steps for parallel workers */
queryDesc->planstate->state->es_query_dsa = area;
if (DsaPointerIsValid(fpes->param_exec))
{
char   *paramexec_space;

paramexec_space = dsa_get_address(area, fpes->param_exec);
RestoreParamExecParams(paramexec_space, queryDesc->estate);
}
pwcxt.toc = toc;
pwcxt.seg = seg;
ExecParallelInitializeWorker(queryDesc->planstate, );

BTW, we do also use IsParallelWorker() in ExecGetRangeTableRelation()
which also probably only runs during ExecInitNode(), same as
ExecInitPartitionPruning() that this patch adds it to.

-- 
Thanks, Amit Langote
EDB: http://www.enterprisedb.com




Re: initial pruning in parallel append

2023-08-09 Thread Amit Langote
On Tue, Aug 8, 2023 at 11:16 PM Robert Haas  wrote:
> On Tue, Aug 8, 2023 at 2:58 AM Amit Langote  wrote:
> > Or we could consider something like the patch I mentioned in my 1st
> > email.  The idea there was to pass the pruning result via a separate
> > channel, not the DSM chunk linked into the PlanState tree.  To wit, on
> > the leader side, ExecInitParallelPlan() puts the serialized
> > List-of-Bitmapset into the shm_toc with a dedicated PARALLEL_KEY,
> > alongside PlannedStmt, ParamListInfo, etc.  The List-of-Bitmpaset is
> > initialized during the leader's ExecInitNode().  On the worker side,
> > ExecParallelGetQueryDesc() reads the List-of-Bitmapset string and puts
> > the resulting node into the QueryDesc, that ParallelQueryMain() then
> > uses to do ExecutorStart() which copies the pointer to
> > EState.es_part_prune_results.  ExecInitAppend() consults
> > EState.es_part_prune_results and uses the Bitmapset from there, if
> > present, instead of performing initial pruning.
>
> This also seems reasonable.
>
> > I'm assuming it's not
> > too ugly if ExecInitAppend() uses IsParallelWorker() to decide whether
> > it should be writing to EState.es_part_prune_results or reading from
> > it -- the former if in the leader and the latter in a worker.
>
> I don't think that's too ugly. I mean you have to have an if statement
> someplace.

Yes, that makes sense.

It's just that I thought maybe I haven't thought hard enough about
options before adding a new IsParallelWorker(), because I don't find
too many instances of IsParallelWorker() in the generic executor code.
I think that's because most parallel worker-specific logic lives in
execParallel.c or in Exec*Worker() functions outside that file, so the
generic code remains parallel query agnostic as much as possible.

> > If we
> > are to go with this approach we will need to un-revert ec386948948c,
> > which moved PartitionPruneInfo nodes out of Append/MergeAppend nodes
> > to a List in PlannedStmt (copied into EState.es_part_prune_infos),
> > such that es_part_prune_results mirrors es_part_prune_infos.
>
> The comment for the revert (which was
> 5472743d9e8583638a897b47558066167cc14583) points to
> https://www.postgresql.org/message-id/4191508.1674157...@sss.pgh.pa.us
> as the reason, but it's not very clear to me why that email led to
> this being reverted. In any event, I agree that if we go with your
> idea to pass this via a separate PARALLEL_KEY, unreverting that patch
> seems to make sense, because otherwise I think we don't have a fast
> way to find the nodes that contain the state that we care about.

OK, I've attached the unreverted patch that adds
EState.es_part_prune_infos as 0001.

0002 adds EState.es_part_prune_results.   Parallel query leader stores
the bitmapset of initially valid subplans by performing initial
pruning steps contained in a given PartitionPruneInfo into that list
at the same index as the PartitionPruneInfo's index in
es_part_prune_infos.  ExecInitParallelPlan() serializes
es_part_prune_results and stores it in the DSM.  A worker initializes
es_part_prune_results in its own EState by reading the leader's value
from the DSM and for each PartitionPruneInfo in its own copy of
EState.es_part_prune_infos, gets the set of initially valid subplans
by referring to es_part_prune_results in lieu of performing initial
pruning again.

Should workers, as Tom says, instead do the pruning and cross-check
the result to give an error if it doesn't match the leader's?  The
error message can't specifically point out which, though a user would
at least know that they have functions in their database with wrong
volatility markings.

-- 
Thanks, Amit Langote
EDB: http://www.enterprisedb.com


v1-0001-Move-PartitioPruneInfo-out-of-plan-nodes-into-Pla.patch
Description: Binary data


v1-0002-Share-initial-pruning-result-between-parallel-que.patch
Description: Binary data


Re: generic plans and "initial" pruning

2023-08-08 Thread Amit Langote
On Tue, Aug 8, 2023 at 12:36 AM Robert Haas  wrote:
> On Thu, Aug 3, 2023 at 4:37 AM Amit Langote  wrote:
> > Here's a patch set where the refactoring to move the ExecutorStart()
> > calls to be closer to GetCachedPlan() (for the call sites that use a
> > CachedPlan) is extracted into a separate patch, 0002.  Its commit
> > message notes an aspect of this refactoring that I feel a bit nervous
> > about -- needing to also move the CommandCounterIncrement() call from
> > the loop in PortalRunMulti() to PortalStart() which now does
> > ExecutorStart() for the PORTAL_MULTI_QUERY case.
>
> I spent some time today reviewing 0001. Here are a few thoughts and
> notes about things that I looked at.

Thanks for taking a look at this.

> First, I wondered whether it was really adequate for ExecEndPlan() to
> just loop over estate->es_plan_nodes and call it good. Put
> differently, is it possible that we could ever have more than one
> relevant EState, say for a subplan or an EPQ execution or something,
> so that this loop wouldn't cover everything? I found nothing to make
> me think that this is a real danger.

Check.

> Second, I wondered whether the ordering of cleanup operations could be
> an issue. Right now, a node can position cleanup code before, after,
> or both before and after recursing to child nodes, whereas with this
> design change, the cleanup code will always be run before recursing to
> child nodes.

Because a node is appended to es_planstate_nodes at the end of
ExecInitNode(), child nodes get added before their parent nodes.  So
the children are cleaned up first.

> Here, I think we have problems. Both ExecGather and
> ExecEndGatherMerge intentionally clean up the children before the
> parent, so that the child shutdown happens before
> ExecParallelCleanup(). Based on the comment and commit
> acf555bc53acb589b5a2827e65d655fa8c9adee0, this appears to be
> intentional, and you can sort of see why from looking at the stuff
> that happens in ExecParallelCleanup(). If the instrumentation data
> vanishes before the child nodes have a chance to clean things up,
> maybe EXPLAIN ANALYZE won't reflect that instrumentation any more. If
> the DSA vanishes, maybe we'll crash if we try to access it. If we
> actually reach DestroyParallelContext(), we're just going to start
> killing the workers. None of that sounds like what we want.
>
> The good news, of a sort, is that I think this might be the only case
> of this sort of problem. Most nodes recurse at the end, after doing
> all the cleanup, so the behavior won't change. Moreover, even if it
> did, most cleanup operations look pretty localized -- they affect only
> the node itself, and not its children. A somewhat interesting case is
> nodes associated with subplans. Right now, because of the coding of
> ExecEndPlan, nodes associated with subplans are all cleaned up at the
> very end, after everything that's not inside of a subplan. But with
> this change, they'd get cleaned up in the order of initialization,
> which actually seems more natural, as long as it doesn't break
> anything, which I think it probably won't, since as I mention in most
> cases node cleanup looks quite localized, i.e. it doesn't care whether
> it happens before or after the cleanup of other nodes.
>
> I think something will have to be done about the parallel query stuff,
> though. I'm not sure exactly what. It is a little weird that Gather
> and Gather Merge treat starting and killing workers as a purely
> "private matter" that they can decide to handle without the executor
> overall being very much aware of it. So maybe there's a way that some
> of the cleanup logic here could be hoisted up into the general
> executor machinery, that is, first end all the nodes, and then go
> back, and end all the parallelism using, maybe, another list inside of
> the estate. However, I think that the existence of ExecShutdownNode()
> is a complication here -- we need to make sure that we don't break
> either the case where that happen before overall plan shutdown, or the
> case where it doesn't.

Given that children are closed before parent, the order of operations
in ExecEndGather[Merge] is unchanged.

> Third, a couple of minor comments on details of how you actually made
> these changes in the patch set. Personally, I would remove all of the
> "is closed separately" comments that you added. I think it's a
> violation of the general coding principle that you should make the
> code look like it's always been that way. Sure, in the immediate
> future, people might wonder why you don't need to recurse, but 5 or 10
> years from now that's just going to be clutter. Second, in the cases
> where the ExecEndNode functions end up completely empty, I would
&

Re: initial pruning in parallel append

2023-08-08 Thread Amit Langote
On Tue, Aug 8, 2023 at 12:53 AM Robert Haas  wrote:
> On Mon, Aug 7, 2023 at 10:25 AM Amit Langote  wrote:
> > Note we’re talking here about “initial” pruning that occurs during 
> > ExecInitNode(). Workers are only launched during ExecGather[Merge]() which 
> > thereafter do ExecInitNode() on their copy of the the plan tree.  So if we 
> > are to pass the pruning results for cross-checking, it will have to be from 
> > the leader to workers.
>
> That doesn't seem like a big problem because there aren't many node
> types that do pruning, right? I think we're just talking about Append
> and MergeAppend, or something like that, right?

MergeAppend can't be parallel-aware atm, so only Append.

> You just need the
> ExecWhateverEstimate function to budget some DSM space to store the
> information, which can basically just be a bitmap over the set of
> child plans, and the ExecWhateverInitializeDSM copies the information
> into that DSM space, and ExecWhateverInitializeWorker() copies the
> information from the shared space back into the local node (or maybe
> just points to it, if the representation is sufficiently compatible).
> I feel like this is an hour or two's worth of coding, unless I'm
> missing something, and WAY easier than trying to reason about what
> happens if expression evaluation isn't as stable as we'd like it to
> be.

OK, I agree that we'd better share the pruning result between the
leader and workers.

I hadn't thought about putting the pruning result into Append's DSM
(ParallelAppendState), which is what you're describing IIUC.  I looked
into it, though I'm not sure if it can be made to work given the way
things are on the worker side, or at least not without some
reshuffling of code in ParallelQueryMain().  The pruning result will
have to be available in ExecInitAppend, but because the worker reads
the DSM only after finishing the plan tree initialization, it won't.
Perhaps, we can integrate ExecParallelInitializeWorker()'s
responsibilities into ExecutorStart() / ExecInitNode() somehow?

So change the ordering of the following code in ParallelQueryMain():

/* Start up the executor */
queryDesc->plannedstmt->jitFlags = fpes->jit_flags;
ExecutorStart(queryDesc, fpes->eflags);

/* Special executor initialization steps for parallel workers */
queryDesc->planstate->state->es_query_dsa = area;
if (DsaPointerIsValid(fpes->param_exec))
{
char   *paramexec_space;

paramexec_space = dsa_get_address(area, fpes->param_exec);
RestoreParamExecParams(paramexec_space, queryDesc->estate);
}
pwcxt.toc = toc;
pwcxt.seg = seg;
ExecParallelInitializeWorker(queryDesc->planstate, );

Looking inside ExecParallelInitializeWorker():

static bool
ExecParallelInitializeWorker(PlanState *planstate, ParallelWorkerContext *pwcxt)
{
if (planstate == NULL)
return false;

switch (nodeTag(planstate))
{
case T_SeqScanState:
if (planstate->plan->parallel_aware)
ExecSeqScanInitializeWorker((SeqScanState *) planstate, pwcxt);

I guess that'd mean putting the if (planstate->plan->parallel_aware)
block seen here at the end of ExecInitSeqScan() and so on.

Or we could consider something like the patch I mentioned in my 1st
email.  The idea there was to pass the pruning result via a separate
channel, not the DSM chunk linked into the PlanState tree.  To wit, on
the leader side, ExecInitParallelPlan() puts the serialized
List-of-Bitmapset into the shm_toc with a dedicated PARALLEL_KEY,
alongside PlannedStmt, ParamListInfo, etc.  The List-of-Bitmpaset is
initialized during the leader's ExecInitNode().  On the worker side,
ExecParallelGetQueryDesc() reads the List-of-Bitmapset string and puts
the resulting node into the QueryDesc, that ParallelQueryMain() then
uses to do ExecutorStart() which copies the pointer to
EState.es_part_prune_results.  ExecInitAppend() consults
EState.es_part_prune_results and uses the Bitmapset from there, if
present, instead of performing initial pruning.  I'm assuming it's not
too ugly if ExecInitAppend() uses IsParallelWorker() to decide whether
it should be writing to EState.es_part_prune_results or reading from
it -- the former if in the leader and the latter in a worker.  If we
are to go with this approach we will need to un-revert ec386948948c,
which moved PartitionPruneInfo nodes out of Append/MergeAppend nodes
to a List in PlannedStmt (copied into EState.es_part_prune_infos),
such that es_part_prune_results mirrors es_part_prune_infos.

Thoughts?

-- 
Thanks, Amit Langote
EDB: http://www.enterprisedb.com




Re: initial pruning in parallel append

2023-08-07 Thread Amit Langote
On Mon, Aug 7, 2023 at 22:29 Tom Lane  wrote:

> Robert Haas  writes:
> > ... Second, we've generally made a
> > decision up until now that we don't want to have a hard dependency on
> > stable functions actually being stable. If they aren't, and for
> > example you're using index expressions, your queries may return wrong
> > answers, but you won't get weird internal error messages, and you
> > won't get a crash. I think the bar for this feature is the same.
>
> Yeah, I agree --- wrong answers may be acceptable in such a case, but
> crashes or unintelligible error messages aren't.  There are practical
> reasons for being tolerant here, notably that it's not that easy
> for users to get their volatility markings right.
>
> In the case at hand, I think that means that allowing workers to do
> pruning would require hardening the parallel append code against the
> situation where their pruning results vary.  Maybe, instead of passing
> the pruning results *down*, we could pass them *up* to the leader and
> then throw an error if they're different?


Note we’re talking here about “initial” pruning that occurs during
ExecInitNode(). Workers are only launched during ExecGather[Merge]() which
thereafter do ExecInitNode() on their copy of the the plan tree.  So if we
are to pass the pruning results for cross-checking, it will have to be from
the leader to workers.

> --
Thanks, Amit Langote
EDB: http://www.enterprisedb.com


Re: remaining sql/json patches

2023-08-04 Thread Amit Langote
Hi,

On Fri, Aug 4, 2023 at 19:01 Erik Rijkers  wrote:

> Op 7/21/23 om 12:33 schreef Amit Langote:
> >
> > Thanks for taking a look.
> >
>
> Hi Amit,
>
> Is there any chance to rebase the outstanding SQL/JSON patches, (esp.
> json_query)?


Yes, working on it.  Will post a WIP shortly.

> --
Thanks, Amit Langote
EDB: http://www.enterprisedb.com


Re: remaining sql/json patches

2023-07-28 Thread Amit Langote
Hello,

On Thu, Jul 27, 2023 at 6:36 PM Shinoda, Noriyoshi (HPE Services Japan
- FSIP)  wrote:
> Hi,
> Thank you for developing such a great feature. The attached patch formats the 
> documentation like any other function definition:
> - Added right parenthesis to json function calls.
> - Added  to json functions.
> - Added a space to the 'expression' part of the json_scalar function.
> - Added a space to the 'expression' part of the json_serialize function.

Thanks for checking and the patch.  Will push shortly.

> It seems that the three functions added this time do not have tuples in the 
> pg_proc catalog. Is it unnecessary?

Yes.  These are not functions that get pg_proc entries, but SQL
constructs that *look like* functions, similar to XMLEXISTS(), etc.

-- 
Thanks, Amit Langote
EDB: http://www.enterprisedb.com




Re: In Postgres 16 BETA, should the ParseNamespaceItem have the same index as it's RangeTableEntry?

2023-07-26 Thread Amit Langote
Hello,

On Fri, Jul 21, 2023 at 5:05 AM Farias de Oliveira
 wrote:
>
> Hello,
>
> Thank you for the help guys and I'm so sorry for my late response. Indeed, 
> the error relies on the ResultRelInfo. In GetResultRTEPermissionInfo() 
> function, it does a checking on the relinfo->ri_RootResultRelInfo variable. I 
> believe that it should go inside this scope:
>
>
> if (relinfo->ri_RootResultRelInfo)
> {
> /*
> * For inheritance child result relations (a partition routing target
> * of an INSERT or a child UPDATE target), this returns the root
> * parent's RTE to fetch the RTEPermissionInfo because that's the only
> * one that has one assigned.
> */
> rti = relinfo->ri_RootResultRelInfo->ri_RangeTableIndex;
> }
>
> The relinfo contains:
>
> {type = T_ResultRelInfo, ri_RangeTableIndex = 5, ri_RelationDesc = 
> 0x7f44e3308cc8, ri_NumIndices = 0, ri_IndexRelationDescs = 0x0, 
> ri_IndexRelationInfo = 0x0, ri_RowIdAttNo = 0,
>   ri_extraUpdatedCols = 0x0, ri_projectNew = 0x0, ri_newTupleSlot = 0x0, 
> ri_oldTupleSlot = 0x0, ri_projectNewInfoValid = false, ri_TrigDesc = 0x0, 
> ri_TrigFunctions = 0x0,
>   ri_TrigWhenExprs = 0x0, ri_TrigInstrument = 0x0, ri_ReturningSlot = 0x0, 
> ri_TrigOldSlot = 0x0, ri_TrigNewSlot = 0x0, ri_FdwRoutine = 0x0, ri_FdwState 
> = 0x0,
>   ri_usesFdwDirectModify = false, ri_NumSlots = 0, ri_NumSlotsInitialized = 
> 0, ri_BatchSize = 0, ri_Slots = 0x0, ri_PlanSlots = 0x0, ri_WithCheckOptions 
> = 0x0,
>   ri_WithCheckOptionExprs = 0x0, ri_ConstraintExprs = 0x0, ri_GeneratedExprsI 
> = 0x0, ri_GeneratedExprsU = 0x0, ri_NumGeneratedNeededI = 0, 
> ri_NumGeneratedNeededU = 0,
>   ri_returningList = 0x0, ri_projectReturning = 0x0, 
> ri_onConflictArbiterIndexes = 0x0, ri_onConflict = 0x0, ri_matchedMergeAction 
> = 0x0, ri_notMatchedMergeAction = 0x0,
>   ri_PartitionCheckExpr = 0x0, ri_ChildToRootMap = 0x0, 
> ri_ChildToRootMapValid = false, ri_RootToChildMap = 0x0, 
> ri_RootToChildMapValid = false, ri_RootResultRelInfo = 0x0,
>   ri_PartitionTupleSlot = 0x0, ri_CopyMultiInsertBuffer = 0x0, 
> ri_ancestorResultRels = 0x0}
>
> Since relinfo->ri_RootResultRelInfo = 0x0, the rti will have no value and 
> Postgres will interpret that the ResultRelInfo must've been created only for 
> filtering triggers and the relation is not being inserted into.
> The relinfo variable is created with the create_entity_result_rel_info() 
> function:
>
> ResultRelInfo *create_entity_result_rel_info(EState *estate, char *graph_name,
>  char *label_name)
> {
> RangeVar *rv;
> Relation label_relation;
> ResultRelInfo *resultRelInfo;
>
> ParseState *pstate = make_parsestate(NULL);
>
> resultRelInfo = palloc(sizeof(ResultRelInfo));
>
> if (strlen(label_name) == 0)
> {
> rv = makeRangeVar(graph_name, AG_DEFAULT_LABEL_VERTEX, -1);
> }
> else
> {
> rv = makeRangeVar(graph_name, label_name, -1);
> }
>
> label_relation = parserOpenTable(pstate, rv, RowExclusiveLock);
>
> // initialize the resultRelInfo
> InitResultRelInfo(resultRelInfo, label_relation,
>   list_length(estate->es_range_table), NULL,
>   estate->es_instrument);
>
> // open the parse state
> ExecOpenIndices(resultRelInfo, false);
>
> free_parsestate(pstate);
>
> return resultRelInfo;
> }
>
> In this case, how can we get the relinfo->ri_RootResultRelInfo to store the 
> appropriate data?

Your function doesn't seem to have access to the ModifyTableState
node, so setting ri_RootResultRelInfo to the correct ResultRelInfo
node does not seem doable.

As I suggested in my previous reply, please check if passing 0 (not
list_length(estate->es_range_table)) for the 3rd argument
InitResultRelInfo() fixes the problem and gives the correct result.

-- 
Thanks, Amit Langote
EDB: http://www.enterprisedb.com




Re: remaining sql/json patches

2023-07-26 Thread Amit Langote
On Fri, Jul 21, 2023 at 7:33 PM Amit Langote  wrote:
> On Fri, Jul 21, 2023 at 1:02 AM Alvaro Herrera  
> wrote:
> > On 2023-Jul-21, Amit Langote wrote:
> >
> > > I’m thinking of pushing 0001 and 0002 tomorrow barring objections.
> >
> > 0001 looks reasonable to me.  I think you asked whether to squash that
> > one with the other bugfix commit for the same code that you already
> > pushed to master; I think there's no point in committing as separate
> > patches, because the first one won't show up in the git_changelog output
> > as a single entity with the one in 16, so it'll just be additional
> > noise.
>
> OK, pushed 0001 to HEAD and b6e1157e7d + 0001 to 16.
>
> > I've looked at 0002 at various points in time and I think it looks
> > generally reasonable.  I think your removal of a couple of newlines
> > (where originally two appear in sequence) is unwarranted; that the name
> > to_json[b]_worker is ugly for exported functions (maybe "datum_to_json"
> > would be better, or you may have better ideas);
>
> Went with datum_to_json[b].  Created a separate refactoring patch for
> this, attached as 0001.
>
> Created another refactoring patch for the hunks related to renaming of
> a nonterminal in gram.y, attached as 0002.
>
> > and that the omission of
> > the stock comment in the new stanzas in FigureColnameInternal() is
> > strange.
>
> Yes, fixed.
>
> >  But I don't have anything serious.  Do add some ecpg tests ...
>
> Added.
>
> > Also, remember to pgindent and bump catversion, if you haven't already.
>
> Will do.  Wasn't sure myself whether the catversion should be bumped,
> but I suppose it must be because ruleutils.c has changed.
>
> Attaching latest patches.  Will push 0001, 0002, and 0003 on Monday to
> avoid worrying about the buildfarm on a Friday evening.

And pushed.

Will post the remaining patches after addressing jian he's comments.

-- 
Thanks, Amit Langote
EDB: http://www.enterprisedb.com




Re: [feature]COPY FROM enable FORCE_NULL/FORCE_NOT_NULL on all columns

2023-07-26 Thread Amit Langote
Hello,

On Thu, Jul 20, 2023 at 4:06 PM Zhang Mingli  wrote:
>
> Hi,
>
> On Jul 7, 2023 at 18:00 +0800, Damir Belyalov , wrote:
>
>
> V2 patch still have some errors when apply file doc/src/sgml/ref/copy.sgml, 
> rebased and fixed it in V3 path.
> Thanks a lot for review.
>
> I have updated https://commitfest.postgresql.org/43/3896/ to staus Ready for 
> Committer, thanks again.

I've looked at this patch and it looks mostly fine, though I do not
intend to commit it myself; perhaps Andrew will.

A few minor things to improve:

+  If * is specified, it will be applied in all columns.
...
+  If * is specified, it will be applied in all columns.

Please write "it will be applied in" as "the option will be applied to".

+   boolforce_notnull_all;  /* FORCE_NOT_NULL * */
...
+   boolforce_null_all; /* FORCE_NULL * */

Like in the comment for force_quote, please add a "?" after * in the
above comments.

+   if (cstate->opts.force_notnull_all)
+   MemSet(cstate->opts.force_notnull_flags, true, num_phys_attrs
* sizeof(bool));
...
+   if (cstate->opts.force_null_all)
+   MemSet(cstate->opts.force_null_flags, true, num_phys_attrs *
sizeof(bool));

While I am not especially opposed to using this 1-line variant to set
the flags array, it does mean that there are now different styles
being used for similar code, because force_quote_flags uses a for
loop:

if (cstate->opts.force_quote_all)
{
int i;

for (i = 0; i < num_phys_attrs; i++)
cstate->opts.force_quote_flags[i] = true;
}

Perhaps we could fix the inconsistency by changing the force_quote_all
code to use MemSet() too.  I'll defer whether to do that to Andrew's
judgement.


--
Thanks, Amit Langote
EDB: http://www.enterprisedb.com




Re: remaining sql/json patches

2023-07-20 Thread Amit Langote
On Thu, Jul 20, 2023 at 17:19 Amit Langote  wrote:

> On Wed, Jul 19, 2023 at 5:17 PM Amit Langote 
> wrote:
> > On Wed, Jul 19, 2023 at 12:53 AM Alvaro Herrera 
> wrote:
> > > On 2023-Jul-18, Amit Langote wrote:
> > >
> > > > Attached updated patches.  In 0002, I removed the mention of the
> > > > RETURNING clause in the JSON(), JSON_SCALAR() documentation, which I
> > > > had forgotten to do in the last version which removed its support in
> > > > code.
> > >
> > > > I think 0001 looks ready to go.  Alvaro?
> > >
> > > It looks reasonable to me.
> >
> > Thanks for taking another look.
> >
> > I will push this tomorrow.
>
> Pushed.
>
> > > > Also, I've been wondering if it isn't too late to apply the following
> > > > to v16 too, so as to make the code look similar in both branches:
> > >
> > > Hmm.
> > >
> > > > 785480c953 Pass constructName to transformJsonValueExpr()
> > >
> > > I think 785480c953 can easily be considered a bugfix on 7081ac46ace8,
> so
> > > I agree it's better to apply it to 16.
> >
> > OK.
>
> Pushed to 16.
>
> > > > b6e1157e7d Don't include CaseTestExpr in JsonValueExpr.formatted_expr
> > >
> > > I feel a bit uneasy about this one.  It seems to assume that
> > > formatted_expr is always set, but at the same time it's not obvious
> that
> > > it is.  (Maybe this aspect just needs some more commentary).
> >
> > Hmm, I agree that the comments about formatted_expr could be improved
> > further, for which I propose the attached.  Actually, staring some
> > more at this, I'm inclined to change makeJsonValueExpr() to allow
> > callers to pass it the finished 'formatted_expr' rather than set it by
> > themselves.
> >
> > >  I agree
> > > that it would be better to make both branches identical, because if
> > > there's a problem, we are better equipped to get a fix done to both.
> > >
> > > As for the removal of makeCaseTestExpr(), I agree -- of the six callers
> > > of makeNode(CastTestExpr), only two of them would be able to use the
> new
> > > function, so it doesn't look of general enough usefulness.
> >
> > OK, so you agree with back-patching this one too, though perhaps only
> > after applying something like the aforementioned patch.
>
> I looked at this some more and concluded that it's fine to think that
> all JsonValueExpr nodes leaving the parser have their formatted_expr
> set.  I've updated the commentary some more in the patch attached as
> 0001.
>
> Rebased SQL/JSON patches also attached.  I've fixed the JSON_TABLE
> syntax synopsis in the documentation as mentioned in my other email.


I’m thinking of pushing 0001 and 0002 tomorrow barring objections.

> --
Thanks, Amit Langote
EDB: http://www.enterprisedb.com


Re: remaining sql/json patches

2023-07-20 Thread Amit Langote
Hello,

On Thu, Jul 20, 2023 at 10:35 AM jian he  wrote:
> On Tue, Jul 18, 2023 at 5:11 PM Amit Langote  wrote:
> > > Op 7/17/23 om 07:00 schreef jian he:
> > > > hi.
> > > > seems there is no explanation about, json_api_common_syntax in
> > > > functions-json.html
> > > >
> > > > I can get json_query full synopsis from functions-json.html as follows:
> > > > json_query ( context_item, path_expression [ PASSING { value AS
> > > > varname } [, ...]] [ RETURNING data_type [ FORMAT JSON [ ENCODING UTF8
> > > > ] ] ] [ { WITHOUT | WITH { CONDITIONAL | [UNCONDITIONAL] } } [ ARRAY ]
> > > > WRAPPER ] [ { KEEP | OMIT } QUOTES [ ON SCALAR STRING ] ] [ { ERROR |
> > > > NULL | EMPTY { [ ARRAY ] | OBJECT } | DEFAULT expression } ON EMPTY ]
> > > > [ { ERROR | NULL | EMPTY { [ ARRAY ] | OBJECT } | DEFAULT expression }
> > > > ON ERROR ])
> > > >
> > > > seems doesn't  have a full synopsis for json_table? only partial one
> > > > by  one explanation.
> >
> > I looked through the history of the docs portion of the patch and it
> > looks like the synopsis for JSON_TABLE(...) used to be there but was
> > taken out during one of the doc reworks [1].
> >
> > I've added it back in the patch as I agree that it would help to have
> > it.   Though, I am not totally sure where I've put it is the right
> > place for it.  JSON_TABLE() is a beast that won't fit into the table
> > that JSON_QUERY() et al are in, so maybe that's how it will have to
> > be?  I have no better idea.
>
> attached screenshot render json_table syntax almost plain html. It looks fine.

Thanks for checking.

> based on syntax, then I am kind of confused with following 2 cases:
> --1
> SELECT * FROM JSON_TABLE(jsonb '1', '$'
> COLUMNS (a int PATH 'strict $.a' default 1  ON EMPTY default 2 on error)
> ERROR ON ERROR) jt;
>
> --2
> SELECT * FROM JSON_TABLE(jsonb '1', '$'
> COLUMNS (a int PATH 'strict $.a' default 1  ON EMPTY default 2 on error)) 
> jt;
>
> the first one should yield syntax error?

No.  Actually, the synopsis missed the optional ON ERROR clause that
can appear after COLUMNS(...).  Will fix it.

-- 
Thanks, Amit Langote
EDB: http://www.enterprisedb.com




Re: remaining sql/json patches

2023-07-19 Thread Amit Langote
On Wed, Jul 19, 2023 at 5:17 PM Amit Langote  wrote:
> On Wed, Jul 19, 2023 at 12:53 AM Alvaro Herrera  
> wrote:
> > On 2023-Jul-18, Amit Langote wrote:
> > > b6e1157e7d Don't include CaseTestExpr in JsonValueExpr.formatted_expr
> >
> > I feel a bit uneasy about this one.  It seems to assume that
> > formatted_expr is always set, but at the same time it's not obvious that
> > it is.  (Maybe this aspect just needs some more commentary).
>
> Hmm, I agree that the comments about formatted_expr could be improved
> further, for which I propose the attached.  Actually, staring some
> more at this, I'm inclined to change makeJsonValueExpr() to allow
> callers to pass it the finished 'formatted_expr' rather than set it by
> themselves.

Hmm, after looking some more, it may not be entirely right that
formatted_expr is always set in the code paths that call
transformJsonValueExpr().  Will look at this some more tomorrow.

-- 
Thanks, Amit Langote
EDB: http://www.enterprisedb.com




Re: remaining sql/json patches

2023-07-19 Thread Amit Langote
On Wed, Jul 19, 2023 at 12:53 AM Alvaro Herrera  wrote:
> On 2023-Jul-18, Amit Langote wrote:
>
> > Attached updated patches.  In 0002, I removed the mention of the
> > RETURNING clause in the JSON(), JSON_SCALAR() documentation, which I
> > had forgotten to do in the last version which removed its support in
> > code.
>
> > I think 0001 looks ready to go.  Alvaro?
>
> It looks reasonable to me.

Thanks for taking another look.

I will push this tomorrow.

> > Also, I've been wondering if it isn't too late to apply the following
> > to v16 too, so as to make the code look similar in both branches:
>
> Hmm.
>
> > 785480c953 Pass constructName to transformJsonValueExpr()
>
> I think 785480c953 can easily be considered a bugfix on 7081ac46ace8, so
> I agree it's better to apply it to 16.

OK.

> > b6e1157e7d Don't include CaseTestExpr in JsonValueExpr.formatted_expr
>
> I feel a bit uneasy about this one.  It seems to assume that
> formatted_expr is always set, but at the same time it's not obvious that
> it is.  (Maybe this aspect just needs some more commentary).

Hmm, I agree that the comments about formatted_expr could be improved
further, for which I propose the attached.  Actually, staring some
more at this, I'm inclined to change makeJsonValueExpr() to allow
callers to pass it the finished 'formatted_expr' rather than set it by
themselves.

>  I agree
> that it would be better to make both branches identical, because if
> there's a problem, we are better equipped to get a fix done to both.
>
> As for the removal of makeCaseTestExpr(), I agree -- of the six callers
> of makeNode(CastTestExpr), only two of them would be able to use the new
> function, so it doesn't look of general enough usefulness.

OK, so you agree with back-patching this one too, though perhaps only
after applying something like the aforementioned patch.  Just to be
sure, would the good practice in this case be to squash the fixup
patch into b6e1157e7d before back-patching?


--
Thanks, Amit Langote
EDB: http://www.enterprisedb.com


0001-Code-review-for-commit-b6e1157e7d.patch
Description: Binary data


  1   2   3   4   5   6   7   8   9   10   >