CheckMyDatabase some error messages in two lines.

2024-06-09 Thread jian he
hi.
https://www.postgresql.org/docs/devel/error-style-guide.html#ERROR-STYLE-GUIDE-FORMATTING
"Don't end a message with a newline."


Accidentally, I found that some error messages in the function
CheckMyDatabase are spread across two lines,
so I tried to consolidate each of them into one line.
diff --git a/src/backend/utils/init/postinit.c b/src/backend/utils/init/postinit.c
index 0805398e..d73343e4 100644
--- a/src/backend/utils/init/postinit.c
+++ b/src/backend/utils/init/postinit.c
@@ -408,15 +408,13 @@ CheckMyDatabase(const char *name, bool am_superuser, bool override_allow_connect
 	if (pg_perm_setlocale(LC_COLLATE, collate) == NULL)
 		ereport(FATAL,
 (errmsg("database locale is incompatible with operating system"),
- errdetail("The database was initialized with LC_COLLATE \"%s\", "
-		   " which is not recognized by setlocale().", collate),
+ errdetail("The database was initialized with LC_COLLATE \"%s\", which is not recognized by setlocale().", collate),
  errhint("Recreate the database with another locale or install the missing locale.")));
 
 	if (pg_perm_setlocale(LC_CTYPE, ctype) == NULL)
 		ereport(FATAL,
 (errmsg("database locale is incompatible with operating system"),
- errdetail("The database was initialized with LC_CTYPE \"%s\", "
-		   " which is not recognized by setlocale().", ctype),
+ errdetail("The database was initialized with LC_CTYPE \"%s\", which is not recognized by setlocale().", ctype),
  errhint("Recreate the database with another locale or install the missing locale.")));
 
 	if (strcmp(ctype, "C") == 0 ||
@@ -490,12 +488,9 @@ CheckMyDatabase(const char *name, bool am_superuser, bool override_allow_connect
 			ereport(WARNING,
 	(errmsg("database \"%s\" has a collation version mismatch",
 			name),
-	 errdetail("The database was created using collation version %s, "
-			   "but the operating system provides version %s.",
+	 errdetail("The database was created using collation version %s, but the operating system provides version %s.",
 			   collversionstr, actual_versionstr),
-	 errhint("Rebuild all objects in this database that use the default collation and run "
-			 "ALTER DATABASE %s REFRESH COLLATION VERSION, "
-			 "or build PostgreSQL with the right library version.",
+	 errhint("Rebuild all objects in this database that use the default collation and run ALTER DATABASE %s REFRESH COLLATION VERSION or build PostgreSQL with the right library version.",
			 quote_identifier(name)));
 	}
 


Re: altering a column's collation leaves an invalid foreign key

2024-06-07 Thread jian he
On Sat, Jun 8, 2024 at 4:12 AM Tom Lane  wrote:
>
> jian he  writes:
> >> * in TryReuseForeignKey, we can pass the information that our primary
> >> key old collation is nondeterministic
> >> and old collation != new collation to the foreign key constraint.
>
> I have a basic question about this: why are we allowing FKs to be
> based on nondeterministic collations at all?  ISTM that that breaks
> the assumption that there is exactly one referenced row for any
> referencing row.
>

For FKs to be nondeterministic,
I think that would require the PRIMARY KEY collation to be
nondeterministic as well.

for example:
CREATE COLLATION ignore_accent_case (provider = icu, deterministic =
false, locale = 'und-u-ks-level1');
DROP TABLE IF EXISTS fktable, pktable;
CREATE TABLE pktable (x text COLLATE ignore_accent_case PRIMARY KEY);
CREATE TABLE fktable (x text REFERENCES pktable on update cascade on
delete cascade);
INSERT INTO pktable VALUES ('A');
INSERT INTO fktable VALUES ('a');
INSERT INTO fktable VALUES ('A');
update pktable set x  = 'Å';
table fktable;



If the FK collation is nondeterministic, then the PK behaves more like an FK:
in the following example, one FK row matches two PK rows.

DROP TABLE IF EXISTS fktable, pktable;
CREATE TABLE pktable (x text COLLATE "C" PRIMARY KEY);
CREATE TABLE fktable (x text COLLATE ignore_accent_case REFERENCES
pktable on update cascade on delete cascade);
INSERT INTO pktable VALUES ('A'), ('Å');
INSERT INTO fktable VALUES ('A');

begin; delete from pktable where x = 'Å'; TABLE fktable; rollback;
begin; delete from pktable where x = 'A'; TABLE fktable; rollback;




Re: using __func__ to locate and distinguish some error messages

2024-06-07 Thread jian he
On Fri, Jun 7, 2024 at 4:28 PM Alvaro Herrera  wrote:
>
> On 2024-Jun-07, jian he wrote:
>
> > so when it actually happens, it cannot quickly locate which function
> > where the error has happened.
> > maybe under certain conditions (e.g. certain build type or certain
> > log_min_messages),
> > we can also print out the function name by using gcc __func__.
>
> That information is already in the error data, so you don't need it in
> the message text.  You can change your log_error_verbosity if you want
> it to show up in the log; in psql you can use \errverbose to have it
> shown to you after the error is thrown, or you can use
>   \pset VERBOSITY verbose
> to have it printed for every error message.  Tools other than psql would
> need their own specific ways to display those.
>

Thanks for pointing this out.
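
For reference, with verbose error display enabled (\set VERBOSITY verbose in
psql), an error report includes the raising function, roughly like this (the
source line number below is made up; fmgr_info_cxt_security is one of the
places that raises this message):

ERROR:  XX000: cache lookup failed for function 17758
LOCATION:  fmgr_info_cxt_security, fmgr.c:184

So the function name is already available without changing the message text.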




using __func__ to locate and distinguish some error messages

2024-06-07 Thread jian he
hi.

We have 450 appearances of
`cache lookup failed .*`

and 141 appearances of
`could not open file .*`

So when it actually happens, one cannot quickly locate the function
where the error happened.
Maybe under certain conditions (e.g. a certain build type or a certain
log_min_messages setting),
we could also print out the function name by using gcc's __func__.

Or we could just do something like:
if (!HeapTupleIsValid(tuple))
    elog(ERROR, "cache lookup failed for relation %u %s",
         RelationGetRelid(rel), __func__);

Given that these errors are very unlikely to happen, when one does happen,
printing out the function name seems not that invasive?




Re: altering a column's collation leaves an invalid foreign key

2024-06-07 Thread jian he
On Sat, Apr 13, 2024 at 9:13 PM jian he  wrote:
>
> > > > Here is a patch implementing this. It was a bit more fuss than I 
> > > > expected, so maybe someone has a
> > > > better way.
> I think I found a simple way.
>
> the logic is:
> * ATExecAlterColumnType changes one column once at a time.
> * one column can only have one collation. so we don't need  to store a
> list of collation oid.
> * ATExecAlterColumnType we can get the new collation (targetcollid)
> and original collation info.
> * RememberAllDependentForRebuilding  will check the column dependent,
> whether this column is referenced by a foreign key or not information
> is recorded.
> so AlteredTableInfo->changedConstraintOids have the primary key and
> foreign key oids.
> * ATRewriteCatalogs will call ATPostAlterTypeCleanup (see the comments
> in ATRewriteCatalogs)
> * for tab->changedConstraintOids (foreign key, primary key) will call
> ATPostAlterTypeParse, so
> for foreign key (con->contype == CONSTR_FOREIGN)  will call 
> TryReuseForeignKey.
> * in TryReuseForeignKey, we can pass the information that our primary
> key old collation is nondeterministic
> and old collation != new collation to the foreign key constraint.
> so foreign key can act accordingly at ATAddForeignKeyConstraint 
> (old_check_ok).
>
>
> based on the above logic, I add one bool in struct AlteredTableInfo,
> one bool in struct Constraint.
> bool in AlteredTableInfo is for storing it, later passing it to struct
> Constraint.
> we need bool in Constraint because ATAddForeignKeyConstraint needs it.

I refactored the comments.
also added some extra tests hoping to make it more bullet proof, maybe
it's redundant.
From 9a2e73231e037c7555449d8790885c1feb4f3ed0 Mon Sep 17 00:00:00 2001
From: jian he 
Date: Fri, 7 Jun 2024 14:32:55 +0800
Subject: [PATCH v4 1/1] Revalidate PK-FK ties when PK collation is changed

With deterministic collations, we check the PK-FK ties by looking at binary
equality. But nondeterministic collation can allow non-identical values
to be considered equal, e.g. 'a' and 'A' when case-insensitive.
ALTER COLUMN .. SET DATA TYPE may make
some references that used to be valid no longer valid.

For `ALTER COLUMN .. SET DATA TYPE`:
if the changed column is part of the primary key columns,
then we need to revalidate the PK-FK ties when both of the following hold:
* the new PK collation is different from the old one
* the old collation is nondeterministic.
discussion: https://postgr.es/m/78d824e0-b21e-480d-a252-e4b84bc2c24b%40illuminatedcomputing.com
---
 src/backend/commands/tablecmds.c  | 61 +++
 src/include/nodes/parsenodes.h|  3 +-
 .../regress/expected/collate.icu.utf8.out | 18 ++
 src/test/regress/sql/collate.icu.utf8.sql | 16 +
 4 files changed, 85 insertions(+), 13 deletions(-)

diff --git a/src/backend/commands/tablecmds.c b/src/backend/commands/tablecmds.c
index 7b6c69b7..77337b6f 100644
--- a/src/backend/commands/tablecmds.c
+++ b/src/backend/commands/tablecmds.c
@@ -199,6 +199,7 @@ typedef struct AlteredTableInfo
 	/* Objects to rebuild after completing ALTER TYPE operations */
 	List	   *changedConstraintOids;	/* OIDs of constraints to rebuild */
 	List	   *changedConstraintDefs;	/* string definitions of same */
+	bool		verify_new_collation; /* T if we need recheck the new collation */
 	List	   *changedIndexOids;	/* OIDs of indexes to rebuild */
 	List	   *changedIndexDefs;	/* string definitions of same */
 	char	   *replicaIdentityIndex;	/* index to reset as REPLICA IDENTITY */
@@ -572,12 +573,13 @@ static void ATPostAlterTypeCleanup(List **wqueue, AlteredTableInfo *tab,
    LOCKMODE lockmode);
 static void ATPostAlterTypeParse(Oid oldId, Oid oldRelId, Oid refRelId,
  char *cmd, List **wqueue, LOCKMODE lockmode,
- bool rewrite);
+ bool rewrite,
+ bool verify_new_collation);
 static void RebuildConstraintComment(AlteredTableInfo *tab, AlterTablePass pass,
 	 Oid objid, Relation rel, List *domname,
 	 const char *conname);
 static void TryReuseIndex(Oid oldId, IndexStmt *stmt);
-static void TryReuseForeignKey(Oid oldId, Constraint *con);
+static void TryReuseForeignKey(Oid oldId, Constraint *con, bool verify_new_collation);
 static ObjectAddress ATExecAlterColumnGenericOptions(Relation rel, const char *colName,
 	 List *options, LOCKMODE lockmode);
 static void change_owner_fix_column_acls(Oid relationOid,
@@ -9857,6 +9859,15 @@ ATAddForeignKeyConstraint(List **wqueue, AlteredTableInfo *tab, Relation rel,
 	old_pfeqop_item);
 		}
 		if (old_check_ok)
+		{
+			/*
+			 * collation_recheck is true then we need to
+			 * revalidate the foreign key and primary key value ties.
+			 * see also TryReuseForeignKey.
+			*/
+			ol

Re: cannot drop intarray extension

2024-06-06 Thread jian he
On Mon, Jun 3, 2024 at 12:14 PM jian he  wrote:
>
> hi.
>
>  setup
> drop table if exists test__int cascade;
> create extension intarray;
>
> CREATE TABLE test__int( a int[] );
> CREATE INDEX text_idx on test__int using gist (a gist__intbig_ops(siglen = 
> 1));
> drop extension intarray cascade;
> NOTICE:  drop cascades to index text_idx
> 2024-06-03 11:53:32.629 CST [41165] ERROR:  cache lookup failed for
> function 17758
> 2024-06-03 11:53:32.629 CST [41165] STATEMENT:  drop extension intarray 
> cascade;
> ERROR:  cache lookup failed for function 17758
>
> 
> backtrace info:
> index_getprocinfo
> #0  index_opclass_options (indrel=0x7faeca727b58, attnum=1,
> attoptions=94372901674408, validate=false)
> at 
> ../../Desktop/pg_src/src4/postgres/src/backend/access/index/indexam.c:1034
> #1  0x55d4e63a79cb in RelationGetIndexAttOptions
> (relation=0x7faeca727b58, copy=false)
> at 
> ../../Desktop/pg_src/src4/postgres/src/backend/utils/cache/relcache.c:5872
> #2  0x55d4e639d72d in RelationInitIndexAccessInfo 
> (relation=0x7faeca727b58)
> at 
> ../../Desktop/pg_src/src4/postgres/src/backend/utils/cache/relcache.c:1569
> #3  0x55d4e639c5ac in RelationBuildDesc (targetRelId=24582, insertIt=true)
> at 
> ../../Desktop/pg_src/src4/postgres/src/backend/utils/cache/relcache.c:1207
> #4  0x55d4e639e9ce in RelationIdGetRelation (relationId=24582)
> at 
> ../../Desktop/pg_src/src4/postgres/src/backend/utils/cache/relcache.c:2115
> #5  0x55d4e5a412fd in relation_open (relationId=24582, lockmode=8)
> at 
> ../../Desktop/pg_src/src4/postgres/src/backend/access/common/relation.c:58
> #6  0x55d4e5ae6a06 in index_open (relationId=24582, lockmode=8)
> at 
> ../../Desktop/pg_src/src4/postgres/src/backend/access/index/indexam.c:137
> #7  0x55d4e5be61b8 in index_drop (indexId=24582, concurrent=false,
> concurrent_lock_mode=false)
> at ../../Desktop/pg_src/src4/postgres/src/backend/catalog/index.c:2156
> 
> i guess it's because we first dropped the function g_intbig_options

In this context, the index "text_idx" has a normal dependency on the pg_opclass entry.
But with `drop extension intarray cascade;`,
CASCADE means that we drop the pg_opclass entry and its inner dependencies
first, and then drop the index.

Dropping the index, however (via the sub-functions
RelationGetIndexAttOptions, index_opclass_options, index_getprocinfo),
requires that the pg_opclass entry and its inner dependencies (namely
g_intbig_options, g_int_options) have not been dropped yet.


In deleteObjectsInList, under certain conditions, sorting the
to-be-deleted object list
using sort_object_addresses seems to work,
but it looks like a hack.
Maybe the proper fix would be in findDependentObjects.
From 8deefb638df270cf26e5649b1a99f218474821fa Mon Sep 17 00:00:00 2001
From: jian he 
Date: Fri, 7 Jun 2024 11:25:03 +0800
Subject: [PATCH v1 1/1] trying to resolve drop extension deletion order

Sometimes, drop extension cascade does not resolve the internal deletion order correctly.
e.g.
drop table if exists test__int cascade;
create extension intarray;
CREATE TABLE test__int( a int[] );
CREATE INDEX text_idx on test__int using gist (a gist__intbig_ops(siglen = 1));
drop extension intarray cascade;


The index "text_idx" only has a normal dependency on pg_opclass,
even though the index could be dropped separately without affecting pg_opclass.

But CASCADE means that we drop the pg_opclass entry and its inner dependencies
first, and then drop the index.

Dropping the index (via the sub-functions RelationGetIndexAttOptions, index_opclass_options, index_getprocinfo)
requires that pg_opclass and its inner dependencies have not been dropped yet.

Re-sorting the objects to be deleted in deleteObjectsInList using sort_object_addresses seems like a hack,
but it works for now.

discussion: https://www.postgresql.org/message-id/CACJufxEspPKC7oxVLci7oUddUmcAGNKJnWWSD7-B03bGtT9gDg%40mail.gmail.com
---
 src/backend/catalog/dependency.c  | 33 +++
 src/backend/catalog/pg_shdepend.c |  2 +-
 src/backend/commands/dropcmds.c   |  2 +-
 src/backend/commands/indexcmds.c  |  2 +-
 src/backend/commands/tablecmds.c  | 13 +++-
 src/include/catalog/dependency.h  |  2 +-
 6 files changed, 41 insertions(+), 13 deletions(-)

diff --git a/src/backend/catalog/dependency.c b/src/backend/catalog/dependency.c
index d4b5b2ad..d0c2454b 100644
--- a/src/backend/catalog/dependency.c
+++ b/src/backend/catalog/dependency.c
@@ -183,7 +183,7 @@ static void DeleteInitPrivs(const ObjectAddress *object);
  */
 static void
 deleteObjectsInList(ObjectAddresses *targetObjects, Relation *depRel,
-	int flags)
+	int flags, bool sort_objects)
 {
 	int			i;
 
@@ -213,6 +213,8 @@ deleteObjectsInList(ObjectAddresses

Re: ON ERROR in json_query and the like

2024-06-03 Thread jian he
On Tue, May 28, 2024 at 5:29 PM Markus Winand  wrote:
>
> Hi!
>
> I’ve noticed two “surprising” (to me) behaviors related to
> the “ON ERROR” clause of the new JSON query functions in 17beta1.
>
> 1. JSON parsing errors are not subject to ON ERROR
>Apparently, the functions expect JSONB so that a cast is implied
>when providing TEXT. However, the errors during that cast are
>not subject to the ON ERROR clause.
>
>17beta1=# SELECT JSON_QUERY('invalid', '$' NULL ON ERROR);
>ERROR:  invalid input syntax for type json
>DETAIL:  Token "invalid" is invalid.
>CONTEXT:  JSON data, line 1: invalid
>
>Oracle DB and Db2 (LUW) both return NULL in that case.
>
>I had a look on the list archive to see if that is intentional but
>frankly speaking these functions came a long way. In case it is
>intentional it might be worth adding a note to the docs.

Previous versions required the SQL/JSON query functions' context_item to be
explicitly cast to jsonb;
if it was not, they would error out.

In previous versions the following query would raise an error:
select json_value(text '"1"' , 'strict $[*]' DEFAULT 9 ON ERROR);

Now it only requires that the context_item can be cast to jsonb successfully.
I raised this issue separately at [1]

[1] 
https://www.postgresql.org/message-id/CACJufxGWJTa-b0WjNH15otih42PA7SF%2Be7LbkAb0gThs7ojT5Q%40mail.gmail.com

> 2. EMPTY [ARRAY|OBJECT] ON ERROR implies ERROR ON EMPTY
>
>17beta1=# SELECT JSON_QUERY('[]', '$[*]' EMPTY ARRAY ON ERROR) a;
> a
>
> []
>(1 row)
>
>As NULL ON EMPTY is implied, it should give the same result as
>explicitly adding NULL ON EMPTY:
>

I vaguely remember we stumbled over ON ERROR / ON EMPTY several times.
I don't have the standard, but the docs don't seem explicit enough for the
above example.

In json_query, maybe we can rephrase it like:

The ON EMPTY clause specifies the behavior if evaluating
path_expression yields no value at all. The default when neither ON EMPTY
nor ON ERROR is specified is to return a null value.

The ON ERROR clause specifies the behavior if an error occurs when
evaluating path_expression (including when the evaluation yields no value at
all and ON EMPTY is not specified), when coercing the result
value to the output type, or during the execution of the ON EMPTY behavior
(which is triggered by an empty result of the path_expression evaluation). The
default when ON ERROR is not specified is to return a null value.



>17beta1=# SELECT JSON_QUERY('[]', '$[*]' NULL ON EMPTY EMPTY ARRAY ON 
> ERROR) a;
> a
>---
>
>(1 row)
>
>Interestingly, Oracle DB gives the same (wrong) results. Db2 (LUW)
>on the other hand returns NULL for both queries.
>
>I don’t think that PostgreSQL should follow Oracle DB's suit here
>but again, in case this is intentional it should be made explicit
>in the docs.


`
SELECT JSON_QUERY('[]', '$[*]' NULL ON EMPTY EMPTY ARRAY ON ERROR) a;
`
I think these sentences address the above query.
<<<
or during the execution of ON EMPTY behavior (that is caused by empty
result of path_expression evaluation).
<<<
As you can see, in this context the "execution of ON EMPTY behavior"
works fine and successfully returns null,
so the `EMPTY ARRAY ON ERROR` part is ignored.




SQL/JSON query functions context_item doc entry and type requirement

2024-06-03 Thread jian he
hi
based on gram.y and function transformJsonValueExpr.

gram.y:
| JSON_QUERY '('
json_value_expr ',' a_expr json_passing_clause_opt
json_returning_clause_opt
json_wrapper_behavior
json_quotes_clause_opt
json_behavior_clause_opt
')'

| JSON_EXISTS '('
json_value_expr ',' a_expr json_passing_clause_opt
json_on_error_clause_opt
')'

| JSON_VALUE '('
json_value_expr ',' a_expr json_passing_clause_opt
json_returning_clause_opt
json_behavior_clause_opt
')'

json_format_clause_opt contains:
| FORMAT_LA JSON
{
$$ = (Node *) makeJsonFormat(JS_FORMAT_JSON, JS_ENC_DEFAULT, @1);
}


That means all of the context_item arguments can take the "FORMAT JSON" option.
Given that, do we need to update these functions'
synopsis/signature in the docs?

some examples:
create table a(b jsonb);
create table a1(b int4range);
select json_value(b format json, 'strict $[*]' DEFAULT 9 ON ERROR) from a;
select json_value(b format json, 'strict $[*]' DEFAULT 9 ON ERROR) from a1;
select json_value(text '"1"' format json, 'strict $[*]' DEFAULT 9 ON ERROR);


transformJsonValueExpr

/* Try to coerce to the target type. */
coerced = coerce_to_target_type(pstate, expr, exprtype,
targettype, -1,
COERCION_EXPLICIT,
COERCE_EXPLICIT_CAST,
location);

Based on the function transformJsonValueExpr and its subroutine
coerce_to_target_type,
for the SQL/JSON query functions (JSON_EXISTS, JSON_QUERY, and JSON_VALUE)
the context_item requirement is any data type that does not error out while
being explicitly cast to jsonb in coerce_to_target_type.

I played around with it, and I think these types can be used as context_item:
{char, text, bpchar, character varying} and domains over these types.
The bytea data type works too, but you need to specify "ENCODING UTF8".
e.g.
select json_value(bytea '"1"' format json ENCODING UTF8, 'strict $[*]'
DEFAULT 9 ON ERROR);
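
Similarly, a domain over one of those types also seems to be accepted (just a
sketch; the domain name is made up):
create domain mytext_dom as text;
select json_value(mytext_dom '"1"', 'strict $[*]' DEFAULT 9 ON ERROR);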


Maybe we can add a brief explanation to this paragraph to say more
about "context_item":
{
SQL/JSON functions JSON_EXISTS(), JSON_QUERY(), and JSON_VALUE()
described in Table 9.52 can be used to query JSON documents. Each of
these functions apply a path_expression (the query) to a context_item
(the document); see Section 9.16.2 for more details on what
path_expression can contain.
}




Re: cannot drop intarray extension

2024-06-02 Thread jian he
On Mon, Jun 3, 2024 at 12:14 PM jian he  wrote:
>
> hi.
>
>  setup
> drop table if exists test__int cascade;
> create extension intarray;
>
> CREATE TABLE test__int( a int[] );
> CREATE INDEX text_idx on test__int using gist (a gist__intbig_ops(siglen = 
> 1));
> drop extension intarray cascade;
> NOTICE:  drop cascades to index text_idx
> 2024-06-03 11:53:32.629 CST [41165] ERROR:  cache lookup failed for
> function 17758
> 2024-06-03 11:53:32.629 CST [41165] STATEMENT:  drop extension intarray 
> cascade;
> ERROR:  cache lookup failed for function 17758
>

> 
The extensions ltree and pg_trgm also have the same problem.

drop table if exists t2 cascade;
CREATE EXTENSION ltree;
CREATE TABLE t2 (t ltree);
create index tstidx on t2 using gist (t gist_ltree_ops(siglen=4));
drop extension ltree cascade;

drop table if exists t3 cascade;
CREATE EXTENSION pg_trgm;
CREATE TABLE t3(t text COLLATE "C");
create index trgm_idx on t3 using gist (t gist_trgm_ops(siglen=1));
drop extension pg_trgm cascade;

> 
The extension hstore works as expected, no error.

drop table if exists t1 cascade;
create extension hstore;
CREATE TABLE t1 (h hstore);
create index hidx on t1 using gist(h gist_hstore_ops(siglen=1));
drop extension hstore cascade;

This is on the master branch; I didn't test other branches.




cannot drop intarray extension

2024-06-02 Thread jian he
hi.

 setup
drop table if exists test__int cascade;
create extension intarray;

CREATE TABLE test__int( a int[] );
CREATE INDEX text_idx on test__int using gist (a gist__intbig_ops(siglen = 1));
drop extension intarray cascade;
NOTICE:  drop cascades to index text_idx
2024-06-03 11:53:32.629 CST [41165] ERROR:  cache lookup failed for
function 17758
2024-06-03 11:53:32.629 CST [41165] STATEMENT:  drop extension intarray cascade;
ERROR:  cache lookup failed for function 17758


backtrace info:
index_getprocinfo
#0  index_opclass_options (indrel=0x7faeca727b58, attnum=1,
attoptions=94372901674408, validate=false)
at 
../../Desktop/pg_src/src4/postgres/src/backend/access/index/indexam.c:1034
#1  0x55d4e63a79cb in RelationGetIndexAttOptions
(relation=0x7faeca727b58, copy=false)
at 
../../Desktop/pg_src/src4/postgres/src/backend/utils/cache/relcache.c:5872
#2  0x55d4e639d72d in RelationInitIndexAccessInfo (relation=0x7faeca727b58)
at 
../../Desktop/pg_src/src4/postgres/src/backend/utils/cache/relcache.c:1569
#3  0x55d4e639c5ac in RelationBuildDesc (targetRelId=24582, insertIt=true)
at 
../../Desktop/pg_src/src4/postgres/src/backend/utils/cache/relcache.c:1207
#4  0x55d4e639e9ce in RelationIdGetRelation (relationId=24582)
at 
../../Desktop/pg_src/src4/postgres/src/backend/utils/cache/relcache.c:2115
#5  0x55d4e5a412fd in relation_open (relationId=24582, lockmode=8)
at 
../../Desktop/pg_src/src4/postgres/src/backend/access/common/relation.c:58
#6  0x55d4e5ae6a06 in index_open (relationId=24582, lockmode=8)
at ../../Desktop/pg_src/src4/postgres/src/backend/access/index/indexam.c:137
#7  0x55d4e5be61b8 in index_drop (indexId=24582, concurrent=false,
concurrent_lock_mode=false)
at ../../Desktop/pg_src/src4/postgres/src/backend/catalog/index.c:2156

I guess it's because we first dropped the function g_intbig_options
and then later we need it.




Re: POC: GROUP BY optimization

2024-06-02 Thread jian he
On Fri, May 31, 2024 at 8:12 AM Alexander Korotkov  wrote:
>
> I've revised some grammar including the sentence you've proposed.
>

-static List *groupclause_apply_groupingset(PlannerInfo *root, List *force);
+static List *preprocess_groupclause(PlannerInfo *root, List *force);

Changing preprocess_groupclause's second argument
from "force" to "gset" would be more intuitive, I think.


`elog(ERROR, "Order of group-by clauses doesn't correspond incoming
sort order");`

I think this error message makes people wonder what "incoming sort order" is.
BTW, for "correspond", people generally write "correspond to".


I made some minor cosmetic changes, mainly changing foreach to foreach_node.
Please check the attachment.
diff --git a/src/backend/optimizer/plan/planner.c b/src/backend/optimizer/plan/planner.c
index d6a1fb8e..801d9f6d 100644
--- a/src/backend/optimizer/plan/planner.c
+++ b/src/backend/optimizer/plan/planner.c
@@ -2852,15 +2852,13 @@ preprocess_groupclause(PlannerInfo *root, List *force)
 {
 	Query	   *parse = root->parse;
 	List	   *new_groupclause = NIL;
-	ListCell   *sl;
-	ListCell   *gl;
+	ListCell	*gl;
 
 	/* For grouping sets, we need to force the ordering */
-	if (force)
+	if (force != NIL)
 	{
-		foreach(sl, force)
+		foreach_int(ref, force)
 		{
-			Index		ref = lfirst_int(sl);
 			SortGroupClause *cl = get_sortgroupref_clause(ref, parse->groupClause);
 
 			new_groupclause = lappend(new_groupclause, cl);
@@ -2879,10 +2877,8 @@ preprocess_groupclause(PlannerInfo *root, List *force)
 	 *
 	 * This code assumes that the sortClause contains no duplicate items.
 	 */
-	foreach(sl, parse->sortClause)
+	foreach_node(SortGroupClause, sc, parse->sortClause)
 	{
-		SortGroupClause *sc = lfirst_node(SortGroupClause, sl);
-
 		foreach(gl, parse->groupClause)
 		{
 			SortGroupClause *gc = lfirst_node(SortGroupClause, gl);
@@ -2908,10 +2904,8 @@ preprocess_groupclause(PlannerInfo *root, List *force)
 	 * implemented using incremental sort.  Also, give up if there are any
 	 * non-sortable GROUP BY items, since then there's no hope anyway.
 	 */
-	foreach(gl, parse->groupClause)
+	foreach_node(SortGroupClause,gc, parse->groupClause)
 	{
-		SortGroupClause *gc = lfirst_node(SortGroupClause, gl);
-
 		if (list_member_ptr(new_groupclause, gc))
 			continue;			/* it matched an ORDER BY item */
 		if (!OidIsValid(gc->sortop))	/* give up, GROUP BY can't be sorted */
diff --git a/src/include/nodes/pathnodes.h b/src/include/nodes/pathnodes.h
index af8de421..2ba297c1 100644
--- a/src/include/nodes/pathnodes.h
+++ b/src/include/nodes/pathnodes.h
@@ -426,7 +426,7 @@ struct PlannerInfo
 	 * items to be proven redundant, implying that there is only one group
 	 * containing all the query's rows.  Hence, if you want to check whether
 	 * GROUP BY was specified, test for nonempty parse->groupClause, not for
-	 * nonempty processed_groupClause.  Optimiser chooses specific order of
+	 * nonempty processed_groupClause.  Optimizer chooses specific order of
 	 * group-by clauses during the upper paths generation process, attempting
 	 * to use different strategies to minimize number of sorts or engage
 	 * incremental sort.  See preprocess_groupclause() and


Re: meson and check-tests

2024-06-01 Thread jian he
On Sun, Jun 2, 2024 at 4:48 AM Tristan Partin  wrote:
>
> On Fri May 31, 2024 at 12:02 PM CDT, Ashutosh Bapat wrote:
> > Hi Tristan,
> > Using make I can run only selected tests under src/test/regress using
> > TESTS=... make check-tests. I didn't find any way to do that with meson.
> > meson test --suite regress runs all the regression tests.
> >
> > We talked this off-list at the conference. It seems we have to somehow
> > avoid passing pg_regress --schedule argument and instead pass the list of
> > tests. Any idea how to do that?
>
> I think there are 2 solutions to this.
>
> 1. Avoid passing --schedule by default, which doesn't sound like a great
>solution.
>
> 2. Teach pg_regress to ignore the --schedule option if specific tests
>are passed instead.
>
> 3. Add a --no-schedule option to pg_regress which would override the
>previously added --schedule option.
>
> I personally prefer 2 or 3.
>
> 2: meson test -C build regress/regress --test-args my_specific_test
> 3: meson test -C build regress/regress --test-args "--no-schedule 
> my_specific_test"
>
> Does anyone have an opinion?
>

If each individual SQL file can be tested separately, will
`test: test_setup`
also be executed as a prerequisite?



in
# --
# The first group of parallel tests
# --
test: boolean char name varchar text int2 int4 int8 oid float4 float8
bit numeric txid uuid enum money rangetypes pg_lsn regproc

# --
# The second group of parallel tests
# multirangetypes depends on rangetypes
# multirangetypes shouldn't run concurrently with type_sanity
# --
test: strings md5 numerology point lseg line box path polygon circle
date time timetz timestamp timestamptz interval inet macaddr macaddr8
multirangetypes


If we could test each individual SQL file via meson, that would be great,
but based on the above comments, we need to preserve the specified order.
For example:
TEST=rangetypes, multirangetypes

would first run rangetypes and then multirangetypes.
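
For reference, the make-based selection mentioned upthread looks roughly like
this (run from a configured source tree; as far as I know the listed tests run
in the given order, and test_setup has to be listed explicitly):

cd src/test/regress
make check-tests TESTS='test_setup rangetypes multirangetypes'

Something equivalent driven through meson is what we are after.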


Can we tag/name each group in src/test/regress/parallel_schedule,
like:
group1:test: boolean char name varchar text int2 int4 int8 oid float4
float8 bit numeric txid uuid enum money rangetypes pg_lsn regproc
group2:test: strings md5 numerology point lseg line box path polygon
circle date time timetz timestamp timestamptz interval inet macaddr
macaddr8 multirangetypes

Then we could test each individual group:
TEST=group1, group2.




Re: remaining sql/json patches

2024-05-27 Thread jian he
On Mon, May 20, 2024 at 7:51 PM Amit Langote  wrote:
>
> Hi Thom,
>>
> > and I think we need to either remove the leading "select" keyword, or 
> > uppercase it in the examples.
> >
> > For example (on 
> > https://www.postgresql.org/docs/devel/functions-json.html#SQLJSON-QUERY-FUNCTIONS):
> >
> > json_exists ( context_item, path_expression [ PASSING { value AS varname } 
> > [, ...]] [ { TRUE | FALSE | UNKNOWN | ERROR } ON ERROR ])
> >
> > Returns true if the SQL/JSON path_expression applied to the context_item 
> > using the PASSING values yields any items.
> > The ON ERROR clause specifies the behavior if an error occurs; the default 
> > is to return the boolean FALSE value. Note that if the path_expression is 
> > strict and ON ERROR behavior is ERROR, an error is generated if it yields 
> > no items.
> > Examples:
> > select json_exists(jsonb '{"key1": [1,2,3]}', 'strict $.key1[*] ? (@ > 2)') 
> > → t
> > select json_exists(jsonb '{"a": [1,2,3]}', 'lax $.a[5]' ERROR ON ERROR) → f
> > select json_exists(jsonb '{"a": [1,2,3]}', 'strict $.a[5]' ERROR ON ERROR) →
> > ERROR:  jsonpath array subscript is out of bounds
> >
> > Examples are more difficult to read when keywords appear to be at the same 
> > level as predicates.  Plus other examples within tables on the same page 
> > don't start with "select", and further down, block examples uppercase 
> > keywords.  Either way, I don't like it as it is.
>
> I agree that the leading SELECT should be removed from these examples.
> Also, the function names should be capitalized both in the syntax
> description and in the examples, even though other functions appearing
> on this page aren't.
>
> > Separate from this, I think these tables don't scan well (see json_query 
> > for an example of what I'm referring to).  There is no clear separation of 
> > the syntax definition, the description, and the example. This is more a 
> > matter for the website mailing list, but I'm expressing it here to check 
> > whether others agree.
>
> Hmm, yes, I think I forgot to put <synopsis> around the syntax like
> it's done for a few other functions listed on the page.
>
> How about the attached?  Other than the above points, it removes the
> <para> tags from the description text of each function to turn it into
> a single paragraph, because the multi-paragraph style only seems to
> appear in this table and it's looking a bit weird now.  Though it's
> also true that the functions in this table have the longest
> descriptions.
>

 Note that scalar strings returned by json_value
 always have their quotes removed, equivalent to specifying
-OMIT QUOTES in json_query.
+OMIT QUOTES in JSON_QUERY.

"Note that scalar strings returned by json_value"
should be
"Note that scalar strings returned by JSON_VALUE"


Generally, the <synopsis> section doesn't need indentation?

You removed the <para> tags from the descriptions of JSON_QUERY, JSON_VALUE, and JSON_EXISTS.
JSON_EXISTS is fine, but for
JSON_QUERY and JSON_VALUE the description section is very long;
splitting it into 2 paragraphs would be better than a single paragraph.

Since we are in the top-level table section,
there will be no ambiguity about what we are referring to.
One para explaining what the function does and its return value,
and one para with the detailed explanation, should be just fine?




Re: struct RelOptInfo member relid comments

2024-05-23 Thread jian he
On Fri, May 24, 2024 at 11:14 AM Tom Lane  wrote:
>
> jian he  writes:
> > imho, the above comment is not very helpful.
> > we should say more about what kind of information relid says about a base 
> > rel?
>
> "Relid" is defined at the very top of the file:
>
> /*
>  * Relids
>  *  Set of relation identifiers (indexes into the rangetable).
>  */
> typedef Bitmapset *Relids;
>
> Repeating that everyplace the term "relid" appears would not be
> tremendously helpful.
>

`Index relid;`
is a relation identifier, a base rel's rangetable index.

Does the above description make sense?
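
To be concrete, I was thinking of something along these lines (wording is just a sketch):

	Index		relid;			/* rangetable index of this base rel (0 for a join rel) */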


BTW, I found several occurrences of "base+OJ", but I cannot find an
explanation of "OJ", and "OJ" is not in the Acronyms list.




struct RelOptInfo member relid comments

2024-05-23 Thread jian he
hi

typedef struct RelOptInfo
{

/*
* information about a base rel (not set for join rels!)
*/
Index relid;
...
}

IMHO, the above comment is not very helpful.
We should say more about what kind of information relid conveys about a base rel.

I don't know much about RelOptInfo; that's why I'm asking.




Re: POC: GROUP BY optimization

2024-05-23 Thread jian he
On Mon, May 20, 2024 at 4:54 PM jian he  wrote:
>
>
> The behavior is still the same as the master.
> meaning that below quoted queries are still using "Presorted Key: x".
>
> > > EXPLAIN (COSTS OFF) SELECT count(*) FROM t1 GROUP BY x,z,y,w;
> > > EXPLAIN (COSTS OFF) SELECT count(*) FROM t1 GROUP BY x,w,y,z;
> > > EXPLAIN (COSTS OFF) SELECT count(*) FROM t1 GROUP BY x,z,w,y;
> > > EXPLAIN (COSTS OFF) SELECT count(*) FROM t1 GROUP BY x,w,z,y;
> > > the above part will use:
> > >->  Incremental Sort
> > >   Sort Key: x, $, $, $
> > >   Presorted Key: x
> > >   ->  Index Scan using t1_x_y_idx on t1
>
>
> > 2. Could you try to find the reason?
> the following are my understanding, it may be wrong...
>

I think
my previous message only explained that when two paths have the same cost,
the planner chooses the one that was added to the pathlist first,

but it didn't explain why the Incremental Sort paths presorted by one key and
presorted by two keys
yield the same cost.

if (rel->tuples > 0)
{
/*
* Clamp to size of rel, or size of rel / 10 if multiple Vars. The
* fudge factor is because the Vars are probably correlated but we
* don't know by how much.  We should never clamp to less than the
* largest ndistinct value for any of the Vars, though, since
* there will surely be at least that many groups.
*/
double clamp = rel->tuples;

if (relvarcount > 1)
{
clamp *= 0.1;
if (clamp < relmaxndistinct)
{
clamp = relmaxndistinct;
/* for sanity in case some ndistinct is too large: */
if (clamp > rel->tuples)
clamp = rel->tuples;
}
}
if (reldistinct > clamp)
reldistinct = clamp;
...
}

I think
the above code snippet within the function estimate_num_groups
makes the Incremental Sort path not pick up the optimal presorted keys.

See the original problem thread: [1]

CREATE TABLE t1 AS
SELECT (i % 10)::numeric AS x,(i % 10)::int8 AS y,'abc' || i % 10 AS
z, i::int4 AS w
FROM generate_series(1, 100) AS i;
CREATE INDEX t1_x_y_idx ON t1 (x, y);
ANALYZE t1;
SET enable_hashagg = off;
SET enable_seqscan = off;


EXPLAIN (COSTS OFF) SELECT count(*) FROM t1 GROUP BY x,z,y,w;
will generate 2 Incremental Sort paths: one with "Presorted Key: x",
the other with "Presorted Key: x,y".
The first Incremental Sort path is added first.
The value returned by estimate_num_groups (input_groups) is the
main input for
calculating the cost of an Incremental Sort path!

But here, estimate_num_groups returns the same
value, 10, for both
"Presorted Key: x" and "Presorted Key: x,y",
so cost_incremental_sort later returns the same cost for "Presorted
Key: x" and "Presorted Key: x,y".

(Line numbers below refer to src/backend/utils/adt/selfuncs.c.)
Why does "Presorted Key: x,y" also get 10 from estimate_num_groups:
line 3667 assigns clamp 100.0, because rel->tuples is 100;
line 3671 `clamp *= 0.1;` makes clamp become 10;
lines 3680, 3681: the `if (reldistinct > clamp)` branch takes reldistinct
from 100 to 10;
line 3733 `numdistinct *= reldistinct;` makes numdistinct become 10.



If I change the total number of rows in the relation, or change the
number of distinct y values,
then the planner will use an Incremental Sort with "Presorted Key: x, y".
for example:

CREATE TABLE t10 AS
SELECT (i % 10)::numeric AS x,(i % 11)::int8 AS y,'abc' || i % 10
AS z, i::float4 AS w FROM generate_series(1, 1E2) AS i;
CREATE INDEX t10_x_y_idx ON t10 (x, y);
ANALYZE t10;

The above setup will make the following query using "Incremental Sort
Presorted Key: x,y".
EXPLAIN (COSTS OFF) SELECT count(*) FROM t10 GROUP BY x,z,y,w;
EXPLAIN (COSTS OFF) SELECT count(*) FROM t10 GROUP BY x,w,y,z;
EXPLAIN (COSTS OFF) SELECT count(*) FROM t10 GROUP BY x,z,w,y;
EXPLAIN (COSTS OFF) SELECT count(*) FROM t10 GROUP BY x,w,z,y;


Summary:
* The regression test setup at [2] can make some cases not pick up the
optimal
Incremental Sort path.

we use this test in src/test/regress/sql/aggregates.sql
-- Engage incremental sort
EXPLAIN (COSTS OFF)
SELECT count(*) FROM btg GROUP BY z, y, w, x;

but if we use:
EXPLAIN (COSTS OFF)
SELECT count(*) FROM btg GROUP BY x, z, w, y;
then the plan is not the best.


* The regression test setup makes the estimate_num_groups logic
return the same value,
so Incremental Sort presorted by one key and presorted by two keys yield the
same cost,
and cost_incremental_sort returns the same cost for both.

* 
https://git.postgresql.org/cgit/postgresql.git/tree/src/test/regress/sql/aggregates.sql#n1194
maybe we can change:

CREATE TABLE btg AS SELECT
  i % 10 AS x,
  i % 10 AS y,
  'abc' || i % 10 AS z,
  i AS w
FROM generate_series(1, 100) AS i;

to

CREATE TABLE btg AS SELECT
  i % 10 AS x,

Re: in parentesis is not usual on DOCs

2024-05-22 Thread jian he
On Wed, May 22, 2024 at 7:14 PM Peter Eisentraut  wrote:
>
> On 20.05.24 02:00, jian he wrote:
> >> removing parentheses means we need to rephrase this sentence?
> >> So I come up with the following rephrase:
> >>
> >> The context_item specifies the input data to query, the
> >> path_expression is a JSON path expression defining the query,
> >> json_path_name is an optional name for the path_expression. The
> >> optional PASSING clause can provide data values to the
> >> path_expression.
> >
> > Based on this, write a simple patch.
>
> Your patch kind of messes up the indentation of the text you are
> changing.  Please check that.

please check attached.
diff --git a/doc/src/sgml/func.sgml b/doc/src/sgml/func.sgml
index 17c44bc3..485f1822 100644
--- a/doc/src/sgml/func.sgml
+++ b/doc/src/sgml/func.sgml
@@ -18956,10 +18956,10 @@ where json_table_column is:
 
 
 
- The input data to query (context_item),
- the JSON path expression defining the query (path_expression)
- with an optional name (json_path_name), and an
- optional PASSING clause, which can provide data values
+ The context_item specifies the input data to query,
+ the path_expression is a JSON path expression defining the query,
+ json_path_name is an optional name for the path_expression.
+ The optional PASSING clause can provide data values
  to the path_expression.  The result of the input
  data evaluation using the aforementioned elements is called the
  row pattern, which is used as the source for row


doc regexp_replace replacement string \n does not explained properly

2024-05-20 Thread jian he
hi.

https://www.postgresql.org/docs/current/functions-matching.html#FUNCTIONS-POSIX-REGEXP
<<

Re: PostgreSQL 17 Beta 1 release announcement draft

2024-05-20 Thread jian he
On Mon, May 20, 2024 at 5:35 AM Jonathan S. Katz  wrote:
>
> On 5/15/24 9:45 PM, Jonathan S. Katz wrote:
> > Hi,
> >
> > Attached is a copy of the PostgreSQL 17 Beta 1 release announcement
> > draft. This contains a user-facing summary of some of the features that
> > will be available in the Beta, as well as a call to test. I've made an
> > effort to group them logically around the different workflows they affect.
> >
> > A few notes:
> >
> > * The section with the features is not 80-char delimited. I will do that
> > before the final copy
> >
> > * There is an explicit callout that we've added in the SQL/JSON features
> > that were previously reverted in PG15. I want to ensure we're
> > transparent about that, but also use it as a hook to get people testing.
> >
> > When reviewing:
> >
> > * Please check for correctness of feature descriptions, keeping in mind
> > this is targeted for a general audience
> >
> > * Please indicate if you believe there's a notable omission, or if we
> > should omit any of these callouts
> >
> > * Please indicate if a description is confusing - I'm happy to rewrite
> > to ensure it's clearer.
> >
> > Please provide feedback no later than Wed 2024-05-22 18:00 UTC. As the
> > beta release takes some extra effort, I want to ensure all changes are
> > in with time to spare before release day.
>
> Thanks for all the feedback to date. Please see the next revision.
> Again, please provide feedback no later than 2024-05-22 18:00 UTC.
>
> Thanks,
>
> Jonathan
>

The release notes (https://momjian.us/pgsql_docs/release-17.html)
say:
"Add jsonpath methods to convert JSON values to other JSON data types
(Jeevan Chalke)"


>> Additionally, PostgreSQL 17 adds more functionality to its `jsonpath` 
>> implementation, including the ability to convert JSON values to different 
>> data types.
so, I am not sure this is 100% correct.

Maybe we can rephrase it like:

>> Additionally, PostgreSQL 17 adds more functionality to its `jsonpath` 
>> implementation, including the ability to convert JSON values to other JSON 
>> data types.




Re: POC: GROUP BY optimization

2024-05-20 Thread jian he
On Thu, May 16, 2024 at 3:47 PM Andrei Lepikhov
 wrote:
>
> On 24.04.2024 13:25, jian he wrote:
> > hi.
> > I found an interesting case.
> >
> > CREATE TABLE t1 AS
> >SELECT (i % 10)::numeric AS x,(i % 10)::int8 AS y,'abc' || i % 10 AS
> > z, i::int4 AS w
> >FROM generate_series(1, 100) AS i;
> > CREATE INDEX t1_x_y_idx ON t1 (x, y);
> > ANALYZE t1;
> > SET enable_hashagg = off;
> > SET enable_seqscan = off;
> >
> > EXPLAIN (COSTS OFF) SELECT count(*) FROM t1 GROUP BY x,z,y,w;
> > EXPLAIN (COSTS OFF) SELECT count(*) FROM t1 GROUP BY x,w,y,z;
> > EXPLAIN (COSTS OFF) SELECT count(*) FROM t1 GROUP BY x,z,w,y;
> > EXPLAIN (COSTS OFF) SELECT count(*) FROM t1 GROUP BY x,w,z,y;
> > the above part will use:
> >->  Incremental Sort
> >   Sort Key: x, $, $, $
> >   Presorted Key: x
> >   ->  Index Scan using t1_x_y_idx on t1
> >
> > EXPLAIN (COSTS OFF) SELECT count(*) FROM t1 GROUP BY z,y,w,x;
> > EXPLAIN (COSTS OFF) SELECT count(*) FROM t1 GROUP BY w,y,z,x;
> > EXPLAIN (COSTS OFF) SELECT count(*) FROM t1 GROUP BY y,z,x,w;
> > EXPLAIN (COSTS OFF) SELECT count(*) FROM t1 GROUP BY y,w,x,z;
> > EXPLAIN (COSTS OFF) SELECT count(*) FROM t1 GROUP BY y,x,z,w;
> > EXPLAIN (COSTS OFF) SELECT count(*) FROM t1 GROUP BY y,x,w,z;
> >
> > these will use:
> >->  Incremental Sort
> >   Sort Key: x, y, $, $
> >   Presorted Key: x, y
> >   ->  Index Scan using t1_x_y_idx on t1
> >
> > I guess this is fine, but not optimal?
> It looks like a bug right now - in current implementation we don't
> differentiate different orders. So:
> 1. Applying all the patches from the thread which I proposed as an
> answer to T.Lane last rebuke - does behavior still the same?.

I've applied these 4 patches from this thread.
final_improvements.patch
minor_comment.patch
get_useful_group_keys_orderings.patch
group_by_asymmetry.diff

The behavior is still the same as on master,
meaning that the queries quoted below are still using "Presorted Key: x".

> > EXPLAIN (COSTS OFF) SELECT count(*) FROM t1 GROUP BY x,z,y,w;
> > EXPLAIN (COSTS OFF) SELECT count(*) FROM t1 GROUP BY x,w,y,z;
> > EXPLAIN (COSTS OFF) SELECT count(*) FROM t1 GROUP BY x,z,w,y;
> > EXPLAIN (COSTS OFF) SELECT count(*) FROM t1 GROUP BY x,w,z,y;
> > the above part will use:
> >->  Incremental Sort
> >   Sort Key: x, $, $, $
> >   Presorted Key: x
> >   ->  Index Scan using t1_x_y_idx on t1


> 2. Could you try to find the reason?
The following is my understanding; it may be wrong...

The function get_useful_group_keys_orderings may generate alternative
pathkeys and group clauses.
For these different PathKeyInfos,
the sub-functions within add_paths_to_grouping_rel
(make_ordered_path, create_agg_path) will yield the same cost.
The function add_path (within add_paths_to_grouping_rel) adds these
paths (different PathKeyInfos) in *sequential* order.

Then, later, in the function set_cheapest,
`foreach(p, parent_rel->pathlist)` iterates over these different paths
in sequential order.
So in set_cheapest, if 2 path costs are the same, the first path
residing in parent_rel->pathlist wins.

For a concrete example:
CREATE TABLE t1 AS
SELECT (i % 10)::numeric AS x,(i % 10)::int8 AS y,'abc' || i % 10 AS
z, i::float4 AS w FROM generate_series(1, 100) AS i;
CREATE INDEX t1_x_y_idx ON t1 (x, y);
ANALYZE t1;
SET enable_hashagg = off;
SET enable_seqscan = off;
EXPLAIN (COSTS OFF, verbose) SELECT count(*) FROM t1 GROUP BY x,z,y,w;

add_paths_to_grouping_rel will do the following, in sequential order:
A. generate and add the original PathKeyInfo (group by x,z,y,w)
B. generate and add an alternative PathKeyInfo (group by x,y,z,w), because
of the index path
C. can_hash is true, so generate a HashAgg path (because
enable_hashagg is set to off, this path's cost is very high)

As mentioned previously,
both A and B can use Incremental Sort; set_cheapest will choose A
instead of B,
because the path generated in step A is the **first** one that satisfies incremental sort.




Re: Underscore in positional parameters?

2024-05-19 Thread jian he
On Sun, May 19, 2024 at 10:43 PM Erik Wienhold  wrote:
>
> On 2024-05-19 07:00 +0200, Alexander Lakhin wrote:
> > I encountered anomalies that you address with this patch too.
> > And I can confirm that it fixes most cases, but there is another one:
> > SELECT $3 \bind 'foo' \g
> > ERROR:  invalid memory alloc request size 12
> >
> > Maybe you would find this worth fixing as well.
>
> Yes, that error message is not great.  In variable_paramref_hook we
> check paramno > INT_MAX/sizeof(Oid) when in fact MaxAllocSize/sizeof(Oid)
> is the more appropriate limit to avoid that unspecific alloc size error.
>
> Fixed in v4 with a separate patch because it's unrelated to the param
> number parsing.  But it fits nicely into the broader issue on the upper
> limit for param numbers.  Note that $268435455 is still the largest
> possible param number ((2^30-1)/4) and that we just return a more
> user-friendly error message for params beyond that limit.
>

hi, one minor issue:

/* Check parameter number is in range */
if (paramno <= 0 || paramno > MaxAllocSize / sizeof(Oid))
    ereport(ERROR,
            (errcode(ERRCODE_UNDEFINED_PARAMETER),
             errmsg("there is no parameter $%d", paramno),
             parser_errposition(pstate, pref->location)));

If paramno <= 0, then "there is no parameter $%d" makes sense to me.

But if paramno > 0, why not just say that we can only allow MaxAllocSize /
sizeof(Oid) parameters?
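
To sketch what I mean (not compiled; the message wording is just for illustration):

/* Check parameter number is in range */
if (paramno <= 0)
    ereport(ERROR,
            (errcode(ERRCODE_UNDEFINED_PARAMETER),
             errmsg("there is no parameter $%d", paramno),
             parser_errposition(pstate, pref->location)));
if (paramno > MaxAllocSize / sizeof(Oid))
    ereport(ERROR,
            (errcode(ERRCODE_UNDEFINED_PARAMETER),
             errmsg("parameter number %d exceeds the maximum of %zu",
                    paramno, MaxAllocSize / sizeof(Oid)),
             parser_errposition(pstate, pref->location)));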




Re: in parentesis is not usual on DOCs

2024-05-19 Thread jian he
On Thu, May 16, 2024 at 12:14 PM jian he  wrote:
>
> On Wed, May 15, 2024 at 8:34 PM Daniel Gustafsson  wrote:
> >
> > > On 15 May 2024, at 14:04, Marcos Pegoraro  wrote:
> >
> > > Why (context_item), (path_expression) and (json_path_name) are inside a 
> > > parentheses ? This is not usual when explaining any other feature.
> >
> > Agreed, that's inconsisent with how for example json_table_column is 
> > documented
> > in the next list item under COLUMNS.  Unless objected to I will remove these
> > parenthesis.
> >
>
> >>  The input data to query (context_item), the JSON path expression defining 
> >> the query (path_expression) with an optional name (json_path_name)
>
> I think the parentheses are there to explain that
> context_item refers to "The input data to query",
> path_expression refers to "the JSON path expression defining the query",
> and json_path_name refers to "an optional name".
>
>


> Removing the parentheses means we need to rephrase this sentence.
> So I came up with the following rephrasing:
>
> The context_item specifies the input data to query, the
> path_expression is a JSON path expression defining the query,
> json_path_name is an optional name for the path_expression. The
> optional PASSING clause can provide data values to the
> path_expression.

Based on this, I wrote a simple patch.
diff --git a/doc/src/sgml/func.sgml b/doc/src/sgml/func.sgml
index 17c44bc3..f9047a8e 100644
--- a/doc/src/sgml/func.sgml
+++ b/doc/src/sgml/func.sgml
@@ -18956,10 +18956,10 @@ where json_table_column is:
 
 
 
- The input data to query (context_item),
- the JSON path expression defining the query (path_expression)
- with an optional name (json_path_name), and an
- optional PASSING clause, which can provide data values
+The context_item specifies the input data to query,
+the path_expression is a JSON path expression defining the query,
+json_path_name is an optional name for the path_expression.
+The optional PASSING clause can provide data values
  to the path_expression.  The result of the input
  data evaluation using the aforementioned elements is called the
  row pattern, which is used as the source for row


Re: First draft of PG 17 release notes

2024-05-17 Thread jian he
On Thu, May 9, 2024 at 12:04 PM Bruce Momjian  wrote:
>
> I have committed the first draft of the PG 17 release notes;  you can
> see the results here:
>
> https://momjian.us/pgsql_docs/release-17.html
>
> It will be improved until the final release.  The item count is 188,
> which is similar to recent releases:
>

This thread mentioned performance.
Actually, [1] refactored some interval-aggregation-related functions,
which makes the two aggregate functions avg(interval) and sum(interval)
run faster, especially avg(interval);
see [2].
Well, I guess this is a kind of niche performance improvement, to be
mentioned separately.


These 3 items need to be removed, because of
https://git.postgresql.org/cgit/postgresql.git/commit/?id=8aee330af55d8a759b2b73f5a771d9d34a7b887f

>> Add stratnum GiST support function (Paul A. Jungwirth)

>> Allow foreign keys to reference WITHOUT OVERLAPS primary keys (Paul A. 
>> Jungwirth)
>> The keyword PERIOD is used for this purpose.

>> Allow PRIMARY KEY and UNIQUE constraints to use WITHOUT OVERLAPS for 
>> non-overlapping exclusion constraints (Paul A. Jungwirth)


[1] 
https://git.postgresql.org/cgit/postgresql.git/commit/?id=519fc1bd9e9d7b408903e44f55f83f6db30742b7
[2] 
https://www.postgresql.org/message-id/CAEZATCUJ0xjyQUL7SHKxJ5a%2BDm5pjoq-WO3NtkDLi6c76rh58w%40mail.gmail.com




remove Todo item: Allow infinite intervals just like infinite timestamps

2024-05-17 Thread jian he
hi.

https://wiki.postgresql.org/wiki/Todo
Dates and Times
Allow infinite intervals just like infinite timestamps
https://www.postgresql.org/message-id/4eb095c8.1050...@agliodbs.com

This was done in
https://git.postgresql.org/cgit/postgresql.git/commit/?id=519fc1bd9e9d7b408903e44f55f83f6db30742b7

Should we remove this item?




Re: First draft of PG 17 release notes

2024-05-16 Thread jian he
On Thu, May 9, 2024 at 12:04 PM Bruce Momjian  wrote:
>
> I have committed the first draft of the PG 17 release notes;  you can
> see the results here:
>
> https://momjian.us/pgsql_docs/release-17.html
>

>> Add jsonpath methods to convert JSON values to different data types (Jeevan 
>> Chalke)
>> The jsonpath methods are .bigint(), .boolean(), .date(), .decimal([precision 
>> [, scale]]), .integer(), .number(), .string(), .time(), .time_tz(), 
>> .timestamp(), and .timestamp_tz().

I think it's slightly incorrect.

for example:
select jsonb_path_query('"2023-08-15"', '$.date()');
I think what it does is try to validate that the JSON value "2023-08-15" can be a
date value; if so, it prints the JSON string out, else it errors out.


"convert JSON values to different data types"
implying that we are converting JSON values to another data type, such as date?




Re: in parentesis is not usual on DOCs

2024-05-15 Thread jian he
On Wed, May 15, 2024 at 8:34 PM Daniel Gustafsson  wrote:
>
> > On 15 May 2024, at 14:04, Marcos Pegoraro  wrote:
>
> > Why (context_item), (path_expression) and (json_path_name) are inside a 
> > parentheses ? This is not usual when explaining any other feature.
>
> Agreed, that's inconsisent with how for example json_table_column is 
> documented
> in the next list item under COLUMNS.  Unless objected to I will remove these
> parenthesis.
>

>>  The input data to query (context_item), the JSON path expression defining 
>> the query (path_expression) with an optional name (json_path_name)

I think the parentheses are there to explain that
context_item refers to "The input data to query",
path_expression refers to "the JSON path expression defining the query",
and json_path_name refers to "an optional name".



Removing the parentheses means we need to rephrase this sentence.
So I came up with the following rephrasing:

The context_item specifies the input data to query, the
path_expression is a JSON path expression defining the query,
json_path_name is an optional name for the path_expression. The
optional PASSING clause can provide data values to the
path_expression.




Re: First draft of PG 17 release notes

2024-05-15 Thread jian he
On Thu, May 9, 2024 at 12:04 PM Bruce Momjian  wrote:
>
> I have committed the first draft of the PG 17 release notes;  you can
> see the results here:
>
> https://momjian.us/pgsql_docs/release-17.html
>

>> Add local I/O block read/write timing statistics columns of 
>> pg_stat_statement (Nazir Bilal Yavuz)
>> The new columns are "local_blk_read_time" and "local_blk_write_time".
here, "pg_stat_statement" should be "pg_stat_statements"?


>> Add optional fourth parameter to pg_stat_statements_reset() to allow for the 
>> resetting of only min/max statistics (Andrei Zubkov)
>> This parameter defaults to "false".
here, "parameter" should be "argument"?

maybe
>> Add optional fourth boolean argument (minmax_only) to 
>> pg_stat_statements_reset() to allow for the resetting of only min/max 
>> statistics (Andrei Zubkov)
>> This argument defaults to "false".

in section: E.1.2. Migration to Version 17

>> Rename I/O block read/write timing statistics columns of pg_stat_statement 
>> (Nazir Bilal Yavuz)
>> This renames "blk_read_time" to "shared_blk_read_time", and "blk_write_time" 
>> to "shared_blk_write_time".

"pg_stat_statement" should be "pg_stat_statements"?

Also, we only mentioned that some pg_stat_statements column names changed
in "E.1.2. Migration to Version 17",
but if you look at the pg_stat_statements section of the release notes,
we added a bunch of columns, which are far more incompatible than
just column name changes.

Not sure whether we need to add these to section "E.1.2. Migration to Version 17".




Re: explain format json, unit for serialize and memory are different.

2024-05-14 Thread jian he
On Wed, May 15, 2024 at 10:13 AM David Rowley  wrote:
>
> On Wed, 15 May 2024 at 13:44, jian he  wrote:
> >"Shared Hit Blocks": 0,
> >"Shared Read Blocks": 0,
> >"Shared Dirtied Blocks": 0,
> >"Shared Written Blocks": 0,
> >"Local Hit Blocks": 0,
> >"Local Read Blocks": 0,
> >"Local Dirtied Blocks": 0,
> >"Local Written Blocks": 0,
> >"Temp Read Blocks": 0,
> >"Temp Written Blocks": 0
> >
> > these information duplicated for json key "Serialization" and json key
> > "Planning"
> > i am not sure this is intended?
>
> Looks ok to me.  Buffers used during planning are independent from the
> buffers used when outputting rows to the client.
>

Looking at serializeAnalyzeReceive,
I am not sure which part of serializeAnalyzeReceive will update pgBufferUsage.

I am looking for an example where this information under json key
"Serialization" is not zero.
So far I have tried:

create table s(a text);
insert into s select repeat('a', 1024) from generate_series(1,1024);
explain (format json, analyze, wal, buffers, memory, serialize, timing
off)  select * from s;
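In case it is useful, a sketch of an attempt that might show non-zero counters there, on the assumption that the serialization buffer usage comes from detoasting values stored out of line while the output functions run (the table and column names are made up):

-- force values out of line and uncompressed, so the seq scan itself does not detoast them
create table toasty (v text);
alter table toasty alter column v set storage external;
insert into toasty select repeat(md5(i::text), 1000) from generate_series(1, 100) i;
vacuum analyze toasty;

-- SERIALIZE runs the output functions, which must fetch the TOAST chunks;
-- those reads should then appear under the "Serialization" key
explain (format json, analyze, buffers, serialize, timing off)
  select * from toasty;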




Re: explain format json, unit for serialize and memory are different.

2024-05-14 Thread jian he
explain (format json, analyze, wal, buffers, memory, serialize) insert
into tenk1 select * from tenk1 limit 1;
  QUERY PLAN
---
 [
   {
 "Plan": {
   "Node Type": "ModifyTable",
   "Operation": "Insert",
   "Parallel Aware": false,
   "Async Capable": false,
   "Relation Name": "tenk1",
   "Alias": "tenk1",
   "Startup Cost": 0.00,
   "Total Cost": 0.04,
   "Plan Rows": 0,
   "Plan Width": 0,
   "Actual Startup Time": 0.030,
   "Actual Total Time": 0.030,
   "Actual Rows": 0,
   "Actual Loops": 1,
   "Shared Hit Blocks": 3,
   "Shared Read Blocks": 0,
   "Shared Dirtied Blocks": 0,
   "Shared Written Blocks": 0,
   "Local Hit Blocks": 0,
   "Local Read Blocks": 0,
   "Local Dirtied Blocks": 0,
   "Local Written Blocks": 0,
   "Temp Read Blocks": 0,
   "Temp Written Blocks": 0,
   "WAL Records": 1,
   "WAL FPI": 0,
   "WAL Bytes": 299,
   "Plans": [
 {
   "Node Type": "Limit",
   "Parent Relationship": "Outer",
   "Parallel Aware": false,
   "Async Capable": false,
   "Startup Cost": 0.00,
   "Total Cost": 0.04,
   "Plan Rows": 1,
   "Plan Width": 244,
   "Actual Startup Time": 0.011,
   "Actual Total Time": 0.011,
   "Actual Rows": 1,
   "Actual Loops": 1,
   "Shared Hit Blocks": 2,
   "Shared Read Blocks": 0,
   "Shared Dirtied Blocks": 0,
   "Shared Written Blocks": 0,
   "Local Hit Blocks": 0,
   "Local Read Blocks": 0,
   "Local Dirtied Blocks": 0,
   "Local Written Blocks": 0,
   "Temp Read Blocks": 0,
   "Temp Written Blocks": 0,
   "WAL Records": 0,
   "WAL FPI": 0,
   "WAL Bytes": 0,
   "Plans": [
 {
   "Node Type": "Seq Scan",
   "Parent Relationship": "Outer",
   "Parallel Aware": false,
   "Async Capable": false,
   "Relation Name": "tenk1",
   "Alias": "tenk1_1",
   "Startup Cost": 0.00,
   "Total Cost": 445.00,
   "Plan Rows": 1,
   "Plan Width": 244,
   "Actual Startup Time": 0.009,
   "Actual Total Time": 0.009,
   "Actual Rows": 1,
   "Actual Loops": 1,
   "Shared Hit Blocks": 2,
   "Shared Read Blocks": 0,
   "Shared Dirtied Blocks": 0,
   "Shared Written Blocks": 0,
   "Local Hit Blocks": 0,
   "Local Read Blocks": 0,
   "Local Dirtied Blocks": 0,
   "Local Written Blocks": 0,
   "Temp Read Blocks": 0,
   "Temp Written Blocks": 0,
   "WAL Records": 0,
   "WAL FPI": 0,
   "WAL Bytes": 0
 }
   ]
 }
   ]
 },
 "Planning": {
   "Shared Hit Blocks": 0,
   "Shared Read Blocks": 0,
   "Shared Dirtied Blocks": 0,
   "Shared Written Blocks": 0,
   "Local Hit Blocks": 0,
   "Local Read Blocks": 0,
   "Local Dirtied Blocks": 0,
   "Local Written Blocks": 0,
   "Temp Read Blocks": 0,
   "Temp Written Blocks": 0,
   "Memory Used": 68080,
   "Memory Allocated": 131072
 },
 "Planning Time": 0.659,
 "Triggers": [
 ],
 "Serialization": {
   "Time": 0.000,
   "Output Volume": 0,
   "Format": "text",
   "Shared Hit Blocks": 0,
   "Shared Read Blocks": 0,
   "Shared Dirtied Blocks": 0,
   "Shared Written Blocks": 0,
   "Local Hit Blocks": 0,
   "Local Read Blocks": 0,
   "Local Dirtied Blocks": 0,
   "Local Written Blocks": 0,
   "Temp Read Blocks": 0,
   "Temp Written Blocks": 0
 },
 "Execution Time": 0.065
   }
 ]

---
   "Shared Hit Blocks": 0,
   "Shared Read Blocks": 0,
   "Shared Dirtied Blocks": 0,
   "Shared Written Blocks": 0,
   "Local Hit Blocks": 0,
   "Local Read Blocks": 0,
   "Local Dirtied Blocks": 0,
   "Local Written Blocks": 0,
   "Temp Read Blocks": 0,
   "Temp Written Blocks": 0

This information is duplicated under the json keys "Serialization" and "Planning";
I am not sure whether this is intended?




Re: explain format json, unit for serialize and memory are different.

2024-05-14 Thread jian he
On Tue, May 14, 2024 at 6:33 PM David Rowley  wrote:
>
> On Tue, 14 May 2024 at 18:16, David Rowley  wrote:
> > I think for v17, we should consider adding a macro to explain.c to
> > calculate the KB from bytes.  There are other inconsistencies that it
> > would be good to address. We normally round up to the nearest kilobyte
> > with (bytes + 1023) / 1024, but if you look at what 06286709e did, it
> > seems to be rounding to the nearest without any apparent justification
> > as to why.  It does (metrics->bytesSent + 512) / 1024.
> >
> > show_memory_counters() could be modified to use the macro and show
> > kilobytes rather than bytes.
>
> Here's a patch for that.
>
> I checked the EXPLAIN SERIALIZE thread and didn't see any mention of
> the + 512 thing. It seems Tom added it just before committing and no
> patch ever made it to the mailing list with + 512. The final patch on
> the list is in [1].
>
> For the EXPLAIN MEMORY part, the bytes vs kB wasn't discussed.  The
> closest the thread came to that was what Abhijit mentioned in [2].



static void
show_memory_counters(ExplainState *es, const MemoryContextCounters *mem_counters)
{
    if (es->format == EXPLAIN_FORMAT_TEXT)
    {
        ExplainIndentText(es);
        appendStringInfo(es->str,
                         "Memory: used=%zukB  allocated=%zukB",
                         BYTES_TO_KILOBYTES(mem_counters->totalspace - mem_counters->freespace),
                         BYTES_TO_KILOBYTES(mem_counters->totalspace));
        appendStringInfoChar(es->str, '\n');
    }
    else
    {
        ExplainPropertyInteger("Memory Used", "bytes",
                               mem_counters->totalspace - mem_counters->freespace,
                               es);
        ExplainPropertyInteger("Memory Allocated", "bytes",
                               mem_counters->totalspace, es);
    }
}

the "else" branch, also need to apply BYTES_TO_KILOBYTES marco?
otherwise, it's inconsistent?

> I also adjusted some inconsistencies around spaces between the digits
> and kB.  In other places in EXPLAIN we write "100kB" not "100 kB".  I
> see we print times with a space ("Execution Time: 1.719 ms"), so we're
> not very consistent overall, but since the EXPLAIN MEMORY is new, it
> makes sense to change it now to be aligned to the other kB stuff in
> explain.c
>
> The patch does change a long into an int64 in show_hash_info(). I
> wondered if that part should be backpatched.  It does not seem very
> robust to me to divide a Size by 1024 and expect it to fit into a
> long. With MSVC 64 bit, sizeof(Size) == 8 and sizeof(long) == 4.  I
> understand work_mem is limited to 2GB on that platform, but it does
> not seem like a good reason to use a long.
>

I also checked the output from the functions show_incremental_sort_group_info
and show_sort_info; the "kB" usage there is consistent.




Bibliography section, some references cannot be found

2024-05-13 Thread jian he
hi.

While reading this [1]:
<< More information about partial indexes can be found in [ston89b],
[olson93], and [seshadri95].
I googled around but still cannot find a PDF or HTML link for [olson93].


In [2], I found that for
[ong90] "A Unified Framework for Version Modeling Using Production
Rules in a Database System". L. Ong and J. Goh. ERL Technical
Memorandum M90/33. University of California. Berkeley, California.
April, 1990.
the related link is
https://www2.eecs.berkeley.edu/Pubs/TechRpts/1990/1466.html


An idea: in case the external links in
(https://www.postgresql.org/docs/current/biblio.html) become dead
links, we could post these external PDF or HTML files to the mailing
list for archival purposes.

[1] https://www.postgresql.org/docs/current/indexes-partial.html
[2] https://www.postgresql.org/docs/current/biblio.html




Re: SQL:2011 application time

2024-05-13 Thread jian he
On Tue, May 14, 2024 at 7:30 AM Paul Jungwirth
 wrote:
>
> On 5/13/24 03:11, Peter Eisentraut wrote:
> > It looks like we missed some of these fundamental design questions early 
> > on, and it might be too
> > late now to fix them for PG17.
> >
> > For example, the discussion on unique constraints misses that the question 
> > of null values in unique
> > constraints itself is controversial and that there is now a way to change 
> > the behavior.  So I
> > imagine there is also a selection of possible behaviors you might want for 
> > empty ranges.
> > Intuitively, I don't think empty ranges are sensible for temporal unique 
> > constraints.  But anyway,
> > it's a bit late now to be discussing this.
> >
> > I'm also concerned that if ranges have this fundamental incompatibility 
> > with periods, then the plan
> > to eventually evolve this patch set to support standard periods will also 
> > have as-yet-unknown problems.
> >
> > Some of these issues might be design flaws in the underlying mechanisms, 
> > like range types and
> > exclusion constraints.  Like, if you're supposed to use this for scheduling 
> > but you can use empty
> > ranges to bypass exclusion constraints, how is one supposed to use this?  
> > Yes, a check constraint
> > using isempty() might be the right answer.  But I don't see this documented 
> > anywhere.
> >
> > On the technical side, adding an implicit check constraint as part of a 
> > primary key constraint is
> > quite a difficult implementation task, as I think you are discovering.  I'm 
> > just reminded about how
> > the patch for catalogued not-null constraints struggled with linking these 
> > not-null constraints to
> > primary keys correctly.  This sounds a bit similar.
> >
> > I'm afraid that these issues cannot be resolved in good time for this 
> > release, so we should revert
> > this patch set for now.
>
> I think reverting is a good idea. I'm not really happy with the CHECK 
> constraint solution either.
> I'd be happy to have some more time to rework this for v18.
>
> A couple alternatives I'd like to explore:
>
> 1. Domain constraints instead of a CHECK constraint. I think this is probably 
> worse, and I don't
> plan to spend much time on it, but I thought I'd mention it in case someone 
> else thought otherwise.
>
> 2. A slightly different overlaps operator, say &&&, where 'empty' &&& 'empty' 
> is true. But 'empty'
> with anything else could still be false (or not). That operator would prevent 
> duplicates in an
> exclusion constraint. This also means we could support more types than just 
> ranges & multiranges. I
> need to think about whether this combines badly with existing operators, but 
> if not it has a lot of
> promise. If anything it might be *less* contradictory, because it fits better 
> with 'empty' @>
> 'empty', which we say is true.
>
Thanks for the idea. I played around with it a bit and it seems doable,
but the timing is not good, so reverting is a good idea.


I also checked commit 6db4598fcb82a87a683c4572707e522504830a2b:
+
+
+/*
+ * Returns the btree number for equals, otherwise invalid.
+ */
+Datum
+gist_stratnum_btree(PG_FUNCTION_ARGS)
+{
+    StrategyNumber strat = PG_GETARG_UINT16(0);
+
+    switch (strat)
+    {
+        case RTEqualStrategyNumber:
+            PG_RETURN_UINT16(BTEqualStrategyNumber);
+        case RTLessStrategyNumber:
+            PG_RETURN_UINT16(BTLessStrategyNumber);
+        case RTLessEqualStrategyNumber:
+            PG_RETURN_UINT16(BTLessEqualStrategyNumber);
+        case RTGreaterStrategyNumber:
+            PG_RETURN_UINT16(BTGreaterStrategyNumber);
+        case RTGreaterEqualStrategyNumber:
+            PG_RETURN_UINT16(BTGreaterEqualStrategyNumber);
+        default:
+            PG_RETURN_UINT16(InvalidStrategy);
+    }
+}
The comment seems not right? It says the function returns the btree number "for equals",
but it also translates the less-than, less-than-or-equal, greater-than, and
greater-than-or-equal strategies.




explain format json, unit for serialize and memory are different.

2024-05-13 Thread jian he
hi.

explain(analyze, format json, serialize, memory, costs  off, Timing
off) select * from tenk1;
   QUERY PLAN
-
 [
   {
 "Plan": {
   "Node Type": "Seq Scan",
   "Parallel Aware": false,
   "Async Capable": false,
   "Relation Name": "tenk1",
   "Alias": "tenk1",
   "Actual Rows": 1,
   "Actual Loops": 1
 },
 "Planning": {
   "Memory Used": 23432,
   "Memory Allocated": 65536
 },
 "Planning Time": 0.290,
 "Triggers": [
 ],
 "Serialization": {
   "Output Volume": 1143,
   "Format": "text"
 },
 "Execution Time": 58.814
   }
 ]

explain(analyze, format text, serialize, memory, costs  off, Timing
off) select * from tenk1;
QUERY PLAN
---
 Seq Scan on tenk1 (actual rows=1 loops=1)
 Planning:
   Memory: used=23432 bytes  allocated=65536 bytes
 Planning Time: 0.289 ms
 Serialization: output=1143kB  format=text
 Execution Time: 58.904 ms

Under format json, "Output Volume": 1143 is in kB, which is not the same unit as
"Memory Used" and "Memory Allocated", which are in bytes.

Do we need to convert it to bytes for the non-text format options of EXPLAIN?




Re: SQL:2011 application time

2024-05-11 Thread jian he
On Mon, May 6, 2024 at 11:01 AM jian he  wrote:
>
> On Wed, May 1, 2024 at 12:39 AM Paul Jungwirth
>  wrote:
> >
> > On 4/30/24 09:24, Robert Haas wrote:
> > > Peter, could you have a look at
> > > http://postgr.es/m/47550967-260b-4180-9791-b224859fe...@illuminatedcomputing.com
> > > and express an opinion about whether each of those proposals are (a)
> > > good or bad ideas and (b) whether they need to be fixed for the
> > > current release?
> >
> > Here are the same patches but rebased. I've added a fourth which is my 
> > progress on adding the CHECK
> > constraint. I don't really consider it finished though, because it has 
> > these problems:
> >
> > - The CHECK constraint should be marked as an internal dependency of the 
> > PK, so that you can't drop
> > it, and it gets dropped when you drop the PK. I don't see a good way to tie 
> > the two together though,
> > so I'd appreciate any advice there. They are separate AlterTableCmds, so 
> > how do I get the
> > ObjectAddress of both constraints at the same time? I wanted to store the 
> > PK's ObjectAddress on the
> > Constraint node, but since ObjectAddress isn't a Node it doesn't work.
> >
>
> hi.
> I hope I understand the problem correctly.
> my understanding is that we are trying to solve a corner case:
> create table t(a int4range, b int4range, primary key(a, b WITHOUT OVERLAPS));
> insert into t values ('[1,2]','empty'), ('[1,2]','empty');
>


But we still have not addressed cases like:
create table t10(a int4range, b int4range, unique (a, b WITHOUT OVERLAPS));
insert into t10 values ('[1,2]','empty'), ('[1,2]','empty');

One table can have more than one temporal unique constraint, and adding a
check-isempty constraint for each temporal unique constraint does not seem easy.

for example:
CREATE TABLE t (
id int4range,
valid_at daterange,
parent_id int4range,
CONSTRAINT t1 unique (id, valid_at WITHOUT OVERLAPS),
CONSTRAINT t2 unique (parent_id, valid_at WITHOUT OVERLAPS),
CONSTRAINT t3 unique (valid_at, id WITHOUT OVERLAPS),
CONSTRAINT t4 unique (parent_id, id WITHOUT OVERLAPS),
CONSTRAINT t5 unique (id, parent_id WITHOUT OVERLAPS),
CONSTRAINT t6 unique (valid_at, parent_id WITHOUT OVERLAPS)
);
Adding 6 check-isempty constraints for table "t" is challenging.

So far, the challenging parts I see are:
* ALTER TABLE ... ALTER COLUMN ... TYPE does not drop the previous check-isempty
constraint and also adds a new check-isempty constraint, so overall it keeps
adding more check constraints.
* Adding more check constraints needs a way to resolve naming collisions.

Maybe we can just document that the special 'empty' range value makes
temporal unique constraints not actually "unique".

Also, we can make sure that a FOREIGN KEY can only reference primary keys,
not temporal unique constraints, so the implication of temporal unique
constraints not being "unique" stays limited. I played around with it;
we can error out on these cases in the function transformFkeyCheckAttrs.
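To make that concrete, a sketch of the kind of statement that would start to be rejected under this proposal (table and constraint names invented here):

CREATE TABLE pk_side (
    id int4range,
    valid_at daterange,
    CONSTRAINT pk_side_uq UNIQUE (id, valid_at WITHOUT OVERLAPS)  -- temporal unique, not a primary key
);

-- Under the proposal, transformFkeyCheckAttrs would raise an error here,
-- because the referenced constraint is a temporal UNIQUE constraint rather
-- than a temporal PRIMARY KEY:
CREATE TABLE fk_side (
    id int4range,
    valid_at daterange,
    FOREIGN KEY (id, PERIOD valid_at) REFERENCES pk_side (id, PERIOD valid_at)
);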




Re: First draft of PG 17 release notes

2024-05-09 Thread jian he
On Thu, May 9, 2024 at 11:12 PM Bruce Momjian  wrote:
>
> On Thu, May  9, 2024 at 07:49:55PM +0800, jian he wrote:
> > On Thu, May 9, 2024 at 6:53 PM jian he  wrote:
> > >
> > > On Thu, May 9, 2024 at 12:04 PM Bruce Momjian  wrote:
> > > > I have committed the first draft of the PG 17 release notes;  you can
> > > > see the results here:
> > > >
> > > > https://momjian.us/pgsql_docs/release-17.html
> >

E.1.3.1.5. Privileges
Add per-table GRANT permission MAINTAIN to control maintenance
operations (Nathan Bossart)

The operations are VACUUM, ANALYZE, REINDEX, REFRESH MATERIALIZE VIEW,
CLUSTER, and LOCK TABLE.

Add user-grantable role pg_maintain to control maintenance operations
(Nathan Bossart)

The operations are VACUUM, ANALYZE, REINDEX, REFRESH MATERIALIZE VIEW,
CLUSTER, and LOCK TABLE.

Allow roles with pg_monitor privileges to execute pg_current_logfile()
(Pavlo Golub, Nathan Bossart)
---
should be "REFRESH MATERIALIZED VIEW"?

also
"Allow roles with pg_monitor privileges to execute
pg_current_logfile() (Pavlo Golub, Nathan Bossart)"
"pg_monitor" is a predefined role, so technically, "with pg_monitor
privileges" is not correct?
--
Add function XMLText() to convert text to a single XML text node (Jim Jones)

XMLText()
should be
xmltext()
--
Add function to_regtypemod() to return the typemod of a string (David
Wheeler, Erik Wienhold)
I think this description does not mean the same thing as the doc[1]

[1] 
https://www.postgresql.org/docs/devel/functions-info.html#FUNCTIONS-INFO-CATALOG
--

Allow GROUP BY columns to be internally ordered to match ORDER BY
(Andrei Lepikhov, Teodor Sigaev)
This can be disabled using server variable enable_group_by_reordering.

Probably
`This can be disabled by setting the server variable
enable_group_by_reordering to false`.




Re: First draft of PG 17 release notes

2024-05-09 Thread jian he
On Thu, May 9, 2024 at 6:53 PM jian he  wrote:
>
> On Thu, May 9, 2024 at 12:04 PM Bruce Momjian  wrote:
> > I have committed the first draft of the PG 17 release notes;  you can
> > see the results here:
> >
> > https://momjian.us/pgsql_docs/release-17.html

< Add columns to pg_stats to report range histogram information (Egor
Rogov, Soumyadeep Chakraborty)
I think this applies to range types and multirange types; "range
histogram information" does not seem very clear to me.
So maybe:
< Add columns to pg_stats to report range-type histogram information
(Egor Rogov, Soumyadeep Chakraborty)



Display length and bounds histograms in pg_stats
< Add new COPY option "ON_ERROR ignore" to discard error rows (Damir
Belyalov, Atsushi Torikoshi, Alex Shulgin, Jian He, Jian He, Yugo
Nagata)
duplicate name.




Re: First draft of PG 17 release notes

2024-05-09 Thread jian he
> <<
> Allow ALTER OPERATOR to set more optimization attributes (Tommy Pavlicek)
> This is useful for extensions.
> <<

Sorry, I meant:
<<
Allow the creation of hash indexes on ltree columns (Tommy Pavlicek)
This also enables hash join and hash aggregation on ltree columns.
<<

A better description would be:
<<
Add hash support functions and hash opclass for contrib/ltree (Tommy Pavlicek)
This also enables hash join and hash aggregation on ltree columns.
<<




Re: First draft of PG 17 release notes

2024-05-09 Thread jian he
On Thu, May 9, 2024 at 12:04 PM Bruce Momjian  wrote:
> I have committed the first draft of the PG 17 release notes;  you can
> see the results here:
>
> https://momjian.us/pgsql_docs/release-17.html
>

* Add function pg_buffercache_evict() to allow shared buffer eviction
(Palak Chaturvedi, Thomas Munro)
* This is useful for testing.

Should this be put in the section
< E.1.3.11. Additional Modules
?

Then I found out official release notes don't have  attributes,
so it doesn't matter?



<<
Allow ALTER OPERATOR to set more optimization attributes (Tommy Pavlicek)
This is useful for extensions.
<<
I think the commit title "Add hash support functions and hash opclass
for contrib/ltree." from [1] is more descriptive.
I am not 100% sure of the meaning of "This is useful for extensions."



[1] 
https://git.postgresql.org/cgit/postgresql.git/commit/?id=485f0aa85995340fb62113448c992ee48dc6fff1




Re: First draft of PG 17 release notes

2024-05-09 Thread jian he
On Thu, May 9, 2024 at 12:04 PM Bruce Momjian  wrote:
>
> I have committed the first draft of the PG 17 release notes;  you can
> see the results here:
>
> https://momjian.us/pgsql_docs/release-17.html
>

another potential incompatibility issue:
ALTER TABLE DROP PRIMARY KEY

see:
https://www.postgresql.org/message-id/202404181849.6frtmajobe27%40alvherre.pgsql




Revert: Remove useless self-joins *and* -DREALLOCATE_BITMAPSETS make server crash, regress test fail.

2024-05-06 Thread jian he
hi,

SELECT table_name, column_name, is_updatable
  FROM information_schema.columns
 WHERE table_name LIKE E'r_\\_view%'
 ORDER BY table_name, ordinal_position;

At d1d286d83c0eed695910cb20d970ea9bea2e5001, this query in
src/test/regress/sql/updatable_views.sql makes the regression tests fail.
Other queries may as well, but this is the first one that triggers the
server crash.


src3=# SELECT table_name, column_name, is_updatable
  FROM information_schema.columns
 WHERE table_name LIKE E'r_\\_view%'
 ORDER BY table_name, ordinal_position;
TRAP: failed Assert("bms_is_valid_set(a)"), File:
"../../Desktop/pg_src/src3/postgres/src/backend/nodes/bitmapset.c",
Line: 515, PID: 158266
postgres: jian src3 [local] SELECT(ExceptionalCondition+0x106)[0x5579188c0b6f]
postgres: jian src3 [local] SELECT(bms_is_member+0x56)[0x5579183581c7]
postgres: jian src3 [local]
SELECT(join_clause_is_movable_to+0x72)[0x557918439711]
postgres: jian src3 [local] SELECT(+0x73e26c)[0x5579183a126c]
postgres: jian src3 [local] SELECT(create_index_paths+0x3b8)[0x55791839d0ce]
postgres: jian src3 [local] SELECT(+0x719b4d)[0x55791837cb4d]
postgres: jian src3 [local] SELECT(+0x719400)[0x55791837c400]
postgres: jian src3 [local] SELECT(+0x718e90)[0x55791837be90]
postgres: jian src3 [local] SELECT(make_one_rel+0x187)[0x55791837bac5]
postgres: jian src3 [local] SELECT(query_planner+0x4e8)[0x5579183d2dc2]
postgres: jian src3 [local] SELECT(+0x7734ad)[0x5579183d64ad]
postgres: jian src3 [local] SELECT(subquery_planner+0x14b9)[0x5579183d57e4]
postgres: jian src3 [local] SELECT(standard_planner+0x365)[0x5579183d379e]
postgres: jian src3 [local] SELECT(planner+0x81)[0x5579183d3426]
postgres: jian src3 [local] SELECT(pg_plan_query+0xbb)[0x5579186100c8]
postgres: jian src3 [local] SELECT(pg_plan_queries+0x11a)[0x5579186102de]
postgres: jian src3 [local] SELECT(+0x9ad8f1)[0x5579186108f1]
postgres: jian src3 [local] SELECT(PostgresMain+0xd4a)[0x557918618603]
postgres: jian src3 [local] SELECT(+0x9a76a8)[0x55791860a6a8]
postgres: jian src3 [local]
SELECT(postmaster_child_launch+0x14d)[0x5579184d3430]
postgres: jian src3 [local] SELECT(+0x879c28)[0x5579184dcc28]
postgres: jian src3 [local] SELECT(+0x875278)[0x5579184d8278]
postgres: jian src3 [local] SELECT(PostmasterMain+0x205f)[0x5579184d7837]


version

 PostgreSQL 17devel_debug_build on x86_64-linux, compiled by gcc-11.4.0, 64-bit


meson config:
-Dpgport=5458 \
-Dplperl=enabled \
-Dplpython=enabled \
-Dssl=openssl \
-Dldap=enabled \
-Dlibxml=enabled \
-Dlibxslt=enabled \
-Duuid=e2fs \
-Dzstd=enabled \
-Dlz4=enabled \
-Dcassert=true \
-Db_coverage=true \
-Dicu=enabled \
-Dbuildtype=debug \
-Dwerror=true \
-Dc_args='-Wunused-variable
-Wuninitialized
-Werror=maybe-uninitialized
-Wreturn-type
-DWRITE_READ_PARSE_PLAN_TREES
-DREALLOCATE_BITMAPSETS
-DCOPY_PARSE_PLAN_TREES
-DRAW_EXPRESSION_COVERAGE_TEST -fno-omit-frame-pointer' \
-Ddocs_pdf=disabled \
-Ddocs_html_style=website \
-Dllvm=disabled \
-Dtap_tests=enabled \
-Dextra_version=_debug_build


This commit, d1d286d83c0eed695910cb20d970ea9bea2e5001
("Revert: Remove useless self-joins"), makes it fail.

The preceding commit (81b2252e609cfa74550dd6804949485c094e4b85)
does not make the regression tests fail.

I also found that without the c_args flag `-DREALLOCATE_BITMAPSETS`,
the regression tests do not fail.


Later, I found out that `select 1 from information_schema.columns`
would also crash the server.

The information_schema.columns view is very complex. I took the view's
definition and omitted the unnecessary const and WHERE qual parts of it,
so the minimal reproducer is:

SELECT 1
FROM (
(pg_attribute a LEFT JOIN pg_attrdef ad ON (((a.attrelid =
ad.adrelid) AND (a.attnum = ad.adnum
JOIN (pg_class c JOIN pg_namespace nc ON (c.relnamespace = nc.oid)) ON
(a.attrelid = c.oid))
JOIN (pg_type t JOIN pg_namespace nt ON ((t.typnamespace = nt.oid)))
ON (a.atttypid = t.oid))
LEFT JOIN (pg_type bt JOIN pg_namespace nbt ON (bt.typnamespace =
nbt.oid))  ON ( t.typbasetype = bt.oid  ))
LEFT JOIN (pg_collation co JOIN pg_namespace nco ON ( co.collnamespace
= nco.oid)) ON (a.attcollation = co.oid))
LEFT JOIN (pg_depend dep JOIN pg_sequence seq ON (dep.objid =
seq.seqrelid )) ON (((dep.refobjid = c.oid) AND (dep.refobjsubid =
a.attnum;




Re: SQL:2011 application time

2024-05-05 Thread jian he
On Wed, May 1, 2024 at 12:39 AM Paul Jungwirth
 wrote:
>
> On 4/30/24 09:24, Robert Haas wrote:
> > Peter, could you have a look at
> > http://postgr.es/m/47550967-260b-4180-9791-b224859fe...@illuminatedcomputing.com
> > and express an opinion about whether each of those proposals are (a)
> > good or bad ideas and (b) whether they need to be fixed for the
> > current release?
>
> Here are the same patches but rebased. I've added a fourth which is my 
> progress on adding the CHECK
> constraint. I don't really consider it finished though, because it has these 
> problems:
>
> - The CHECK constraint should be marked as an internal dependency of the PK, 
> so that you can't drop
> it, and it gets dropped when you drop the PK. I don't see a good way to tie 
> the two together though,
> so I'd appreciate any advice there. They are separate AlterTableCmds, so how 
> do I get the
> ObjectAddress of both constraints at the same time? I wanted to store the 
> PK's ObjectAddress on the
> Constraint node, but since ObjectAddress isn't a Node it doesn't work.
>

hi.
I hope I understand the problem correctly.
my understanding is that we are trying to solve a corner case:
create table t(a int4range, b int4range, primary key(a, b WITHOUT OVERLAPS));
insert into t values ('[1,2]','empty'), ('[1,2]','empty');


I think the entry points are ATAddCheckNNConstraint and index_create.
In a chain of DDL commands you cannot be sure which one (the primary key
constraint or the check constraint) is created first; you just want to make
sure that, after both constraints are created, a dependency is added between
the primary key and the check constraint.

So you need to check, in both functions (ATAddCheckNNConstraint, index_create),
that the two constraints have indeed been created, and only after that add a
dependency linking them.

I've attached a patch trying to solve this problem. The patch is not totally
polished, but it works as expected and also has lots of comments.
From 2028de0384b81bb9b2bff53fd391b08f57aba242 Mon Sep 17 00:00:00 2001
From: jian he 
Date: Mon, 6 May 2024 10:12:26 +0800
Subject: [PATCH v4 1/1] add a special check constraint for PERIOD primary key

The last column of a PERIOD primary key cannot have an empty value,
otherwise the primary key may lose its uniqueness property.

corner case demo:
create table t(a int4range, b int4range, primary key(a, b WITHOUT OVERLAPS));
insert into t values ('[1,2]','empty'), ('[1,2]','empty');

this patch makes it fail by internally adding a check constraint:
CHECK (NOT isempty(period_column))
after that, the table `t` will look like:

 Column |   Type| Collation | Nullable | Default
+---+---+--+-
 a  | int4range |   | not null |
 b  | int4range |   | not null |
Indexes:
"t_pkey" PRIMARY KEY (a, b WITHOUT OVERLAPS)
Check constraints:
"b_not_empty" CHECK (NOT isempty(b))

To distinguish this constraint from other check constraints, we set
conperiod to true for it in the pg_constraint catalog.

We also add an internal dependency between the primary key constraint
and this check constraint, so you cannot drop the check constraint by
itself; if you drop the primary key constraint, this check constraint
is dropped automatically.

Generally we add the check constraint within the function
ATAddCheckNNConstraint and the primary key constraint within the function
index_create. In a chain of DDL commands we are not sure which one is
created first, so to make it safe we cross-check in these two functions;
after both the check constraint and the primary key constraint are created,
we add an internal dependency between them.

N.B. We also need to take special care of the case where the check constraint
is re-added, e.g. by ALTER TYPE. If ALTER TYPE alters the PERIOD column of the
primary key, altering that column recreates the index and the check constraint;
however, the former internally also adds a check constraint, so we need to take
care of merging the two check constraints.

N.B. The check constraint name is hard-wired, so if you have already created a
constraint with the same name, the PERIOD primary key cannot be created.

N.B. what about UNIQUE constraint?

N.B. It seems OK not to worry about FOREIGN KEY regarding this corner case?

Discussion: https://postgr.es/m/3775839b-3f0f-4c8a-ac03-a253222e6...@illuminatedcomputing.com
---
 doc/src/sgml/gist.sgml|   3 -
 doc/src/sgml/ref/create_table.sgml|   5 +-
 src/backend/catalog/heap.c|  10 +-
 src/backend/catalog/index.c   |  22 ++
 src/backend/commands/tablecmds.c  | 188 +
 src/backend/parser/parse_utilcmd.c|  88 +-
 src/include/commands/tablecmds.h  |   4 +
 .../regress/expected/without_overlaps.out | 251 +-
 src/test/regress/sql/without_overlaps.sql | 1

Re: Document NULL

2024-05-03 Thread jian he
On Fri, May 3, 2024 at 2:47 PM Laurenz Albe  wrote:
>
> On Thu, 2024-05-02 at 08:23 -0700, David G. Johnston wrote:
> > Version 2 attached.  Still a draft, focused on topic picking and overall 
> > structure.
>
> I'm fine with most of the material (ignoring ellipses and typos), except this:
>
> +The NOT NULL column constraint is largely syntax sugar for the 
> corresponding
> +column IS NOT NULL check constraint, though there are metadata 
> differences
> +described in create table.
>

The system does not translate a (column IS NOT NULL) check constraint
into a NOT NULL constraint, at least for domains.

for example:
create domain connotnull integer;
alter domain connotnull add not null;
\dD connotnull

drop domain connotnull cascade;
create domain connotnull integer;
alter domain connotnull add check (value is not null);
\dD




Re: [PoC] Reducing planning time when tables have many partitions

2024-05-02 Thread jian he
On Thu, May 2, 2024 at 3:57 PM Yuya Watari  wrote:
>

Hi. Sorry to bother you; maybe this is a dumb question.

I am trying to understand something under the hood. Currently I have only applied
v24-0001-Speed-up-searches-for-child-EquivalenceMembers.patch.

on v24-0001:
+/*
+ * add_eq_member - build a new EquivalenceMember and add it to an EC
+ */
+static EquivalenceMember *
+add_eq_member(EquivalenceClass *ec, Expr *expr, Relids relids,
+  JoinDomain *jdomain, Oid datatype)
+{
+ EquivalenceMember *em = make_eq_member(ec, expr, relids, jdomain,
+   NULL, datatype);
+
+ ec->ec_members = lappend(ec->ec_members, em);
+ return em;
+}
+
This part seems weird to me: the add_eq_member function was added a very
long time ago, so why do we create a new function with the same name?

Also, I didn't see the deletion of the original add_eq_member function
(https://git.postgresql.org/cgit/postgresql.git/tree/src/backend/optimizer/path/equivclass.c#n516)
in v24-0001.

Obviously, now I cannot compile it correctly. What am I missing?




EXPLAN redundant options

2024-05-02 Thread jian he
hi.

I just found out we can do:
explain (verbose, verbose off, analyze on, analyze off, analyze on)
select count(*) from tenk1;

Similar to COPY, do we want to error out on these redundant options?
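For comparison, a rough example of how COPY treats a repeated option versus how EXPLAIN behaves today (assuming a table t exists):

-- COPY rejects the repetition outright:
COPY t FROM stdin (FREEZE, FREEZE);
-- ERROR:  conflicting or redundant options

-- EXPLAIN currently accepts it, and the last occurrence appears to win:
EXPLAIN (ANALYZE ON, ANALYZE OFF) SELECT 1;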




Re: SQL:2011 application time

2024-05-01 Thread jian he
On Wed, May 1, 2024 at 12:39 AM Paul Jungwirth
 wrote:
>
> On 4/30/24 09:24, Robert Haas wrote:
> > Peter, could you have a look at
> > http://postgr.es/m/47550967-260b-4180-9791-b224859fe...@illuminatedcomputing.com
> > and express an opinion about whether each of those proposals are (a)
> > good or bad ideas and (b) whether they need to be fixed for the
> > current release?
>
> Here are the same patches but rebased. I've added a fourth which is my 
> progress on adding the CHECK
> constraint. I don't really consider it finished though, because it has these 
> problems:
>
> - The CHECK constraint should be marked as an internal dependency of the PK, 
> so that you can't drop
> it, and it gets dropped when you drop the PK. I don't see a good way to tie 
> the two together though,
> so I'd appreciate any advice there. They are separate AlterTableCmds, so how 
> do I get the
> ObjectAddress of both constraints at the same time? I wanted to store the 
> PK's ObjectAddress on the
> Constraint node, but since ObjectAddress isn't a Node it doesn't work.
>
> - The CHECK constraint should maybe be hidden when you say `\d foo`? Or maybe 
> not, but that's what
> we do with FK triggers.
>
> - When you create partitions you get a warning about the constraint already 
> existing, because it
> gets created via the PK and then also the partitioning code tries to copy it. 
> Solving the first
> issue here should solve this nicely though.
>
> Alternately we could just fix the GROUP BY functional dependency code to only 
> accept b-tree indexes.
> But I think the CHECK constraint approach is a better solution.
>

I will consider these issues later.
The following are general ideas after applying your patches.

CREATE TABLE temporal_rng1(
id int4range,
valid_at daterange,
CONSTRAINT temporal_rng1_pk unique (id, valid_at WITHOUT OVERLAPS)
);
insert into temporal_rng1(id, valid_at) values (int4range '[1,1]',
'empty'::daterange), ('[1,1]', 'empty');
table temporal_rng1;
  id   | valid_at
---+--
 [1,2) | empty
 [1,2) | empty
(2 rows)

I hope I didn't miss something: excluding the 'empty' special value, a
WITHOUT OVERLAPS constraint will be unique and is more restrictive?

If so, then adding a check constraint so that the WITHOUT OVERLAPS column
cannot contain the special value 'empty' is better than writing documentation
explaining that on some special occasions a unique constraint is not actually
unique?

in here
https://www.postgresql.org/docs/devel/ddl-constraints.html#DDL-CONSTRAINTS-UNIQUE-CONSTRAINTS
says:
<<
Unique constraints ensure that the data contained in a column, or a
group of columns, is unique among all the rows in the table.
<<
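Before getting to the patch details, a sketch of the check-constraint approach being discussed, applied to the temporal_rng1 example above (the constraint name is invented):

-- existing 'empty' rows must go first, since ADD CONSTRAINT validates current contents
DELETE FROM temporal_rng1 WHERE isempty(valid_at);

ALTER TABLE temporal_rng1
    ADD CONSTRAINT valid_at_not_empty CHECK (NOT isempty(valid_at));

-- the problematic insert is now rejected:
INSERT INTO temporal_rng1 (id, valid_at) VALUES (int4range '[1,1]', 'empty'::daterange);
-- ERROR:  new row for relation "temporal_rng1" violates check constraint "valid_at_not_empty"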

+ /*
+ * The WITHOUT OVERLAPS part (if any) must be
+ * a range or multirange type.
+ */
+ if (constraint->without_overlaps && lc == list_last_cell(constraint->keys))
+ {
+ Oid typid = InvalidOid;
+
+ if (!found && cxt->isalter)
+ {
+ /*
+ * Look up the column type on existing table.
+ * If we can't find it, let things fail in DefineIndex.
+ */
+ Relation rel = cxt->rel;
+ for (int i = 0; i < rel->rd_att->natts; i++)
+ {
+ Form_pg_attribute attr = TupleDescAttr(rel->rd_att, i);
+ const char *attname;
+
+ if (attr->attisdropped)
+ break;
+
+ attname = NameStr(attr->attname);
+ if (strcmp(attname, key) == 0)
+ {
+ typid = attr->atttypid;
+ break;
+ }
+ }
+ }
+ else
+ typid = typenameTypeId(NULL, column->typeName);
+
+ if (OidIsValid(typid) && !type_is_range(typid) && !type_is_multirange(typid))
+ ereport(ERROR,
+ (errcode(ERRCODE_DATATYPE_MISMATCH),
+ errmsg("column \"%s\" in WITHOUT OVERLAPS is not a range or
multirange type", key),
+ parser_errposition(cxt->pstate, constraint->location)));
+ }

+ if (attr->attisdropped)
+ break;
This will break out of the loop, but here you want to continue the loop?

+ if (OidIsValid(typid) && !type_is_range(typid) && !type_is_multirange(typid))
This didn't consider the case where typid is InvalidOid; maybe we can simplify it to
+ if (!type_is_range(typid) && !type_is_multirange(typid))


+ notnullcmds = lappend(notnullcmds, notemptycmd);
This seems weird. We can imitate the notnullcmds-related logic for notemptycmd
without associating it with notnullcmds at all.




Re: CREATE TABLE/ProcessUtility hook behavior change

2024-04-30 Thread jian he
On Tue, Apr 30, 2024 at 4:35 PM David Steele  wrote:
>
> On 4/30/24 12:57, jian he wrote:
> > On Tue, Apr 30, 2024 at 10:26 AM David Steele  wrote:
> >>
> >> Since bb766cde cannot be readily applied to older commits in master I'm
> >> unable to continue bisecting to find the ALTER TABLE behavioral change.
> >>
> >> This seems to leave me with manual code inspection and there have been a
> >> lot of changes in this area, so I'm hoping somebody will know why this
> >> ALTER TABLE change happened before I start digging into it.
> >

I just tested these two commits.
https://git.postgresql.org/cgit/postgresql.git/commit/?id=3da13a6257bc08b1d402c83feb2a35450f988365
https://git.postgresql.org/cgit/postgresql.git/commit/?id=b0e96f311985bceba79825214f8e43f65afa653a

I think it's related to the catalog not-null commit. It alters the table and
adds a not-null constraint internally (equivalent to
`alter table test alter id set not null`).
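Spelled out, roughly what the hook now observes for the CREATE TABLE from the report (a sketch; the exact internal command breakdown is an assumption based on the notices quoted below):

-- the one statement issued in the report...
CREATE TABLE public.test (
    id INT,
    name TEXT,
    description TEXT,
    CONSTRAINT test_pkey PRIMARY KEY (id)
);
-- ...is now seen by ProcessUtility as roughly this sequence:
--   1. CREATE TABLE public.test (...)
--   2. ALTER TABLE public.test ALTER COLUMN id SET NOT NULL;   -- added internally
--   3. CREATE UNIQUE INDEX test_pkey ON public.test (id);      -- for the primary key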




Re: CREATE TABLE/ProcessUtility hook behavior change

2024-04-29 Thread jian he
On Tue, Apr 30, 2024 at 10:26 AM David Steele  wrote:
>
> Hackers,
>
> While testing pgAudit against PG17 I noticed the following behavioral
> change:
>
> CREATE TABLE public.test
> (
> id INT,
> name TEXT,
> description TEXT,
> CONSTRAINT test_pkey PRIMARY KEY (id)
> );
> NOTICE:  AUDIT: SESSION,23,1,DDL,CREATE TABLE,TABLE,public.test,"CREATE
> TABLE public.test
> (
> id INT,
> name TEXT,
> description TEXT,
> CONSTRAINT test_pkey PRIMARY KEY (id)
> );",
> NOTICE:  AUDIT: SESSION,23,1,DDL,ALTER TABLE,TABLE,public.test,"CREATE
> TABLE public.test
> (
> id INT,
> name TEXT,
> description TEXT,
> CONSTRAINT test_pkey PRIMARY KEY (id)
> );",
> NOTICE:  AUDIT: SESSION,23,1,DDL,CREATE
> INDEX,INDEX,public.test_pkey,"CREATE TABLE public.test
> (
> id INT,
> name TEXT,
> description TEXT,
> CONSTRAINT test_pkey PRIMARY KEY (id)
> );",
>
> Prior to PG17, ProcessUtility was not being called for ALTER TABLE in
> this context. The change probably makes some sense, since the table is
> being created then altered to add the primary key, but since it is a
> behavioral change I wanted to bring it up.
>
> I attempted a bisect to identify the commit that caused this behavioral
> change but pgAudit has been broken since at least November on master and
> didn't get fixed again until bb766cde (April 8). Looking at that commit
> I'm a bit baffled by how it fixed the issue I was seeing, which was that
> an event trigger was firing in the wrong context in the pgAudit tests:
>
>   CREATE TABLE tmp (id int, data text);
> +ERROR:  not fired by event trigger manager
>
> That error comes out of pgAudit itself:
>
>  if (!CALLED_AS_EVENT_TRIGGER(fcinfo))
>  elog(ERROR, "not fired by event trigger manager");
>
> Since bb766cde cannot be readily applied to older commits in master I'm
> unable to continue bisecting to find the ALTER TABLE behavioral change.
>
> This seems to leave me with manual code inspection and there have been a
> lot of changes in this area, so I'm hoping somebody will know why this
> ALTER TABLE change happened before I start digging into it.

probably this commit:
https://git.postgresql.org/cgit/postgresql.git/commit/?id=0cd711271d42b0888d36f8eda50e1092c2fed4b3

especially:

- * The parser will create AT_AttSetNotNull subcommands for
- * columns of PRIMARY KEY indexes/constraints, but we need
- * not do anything with them here, because the columns'
- * NOT NULL marks will already have been propagated into
- * the new table definition.
+ * We see this subtype when a primary key is created
+ * internally, for example when it is replaced with a new
+ * constraint (say because one of the columns changes
+ * type); in this case we need to reinstate attnotnull,
+ * because it was removed because of the drop of the old
+ * PK. Schedule this subcommand to an upcoming AT pass.
*/
+ cmd->subtype = AT_SetAttNotNull;
+ tab->subcmds[AT_PASS_OLD_COL_ATTRS] =
+ lappend(tab->subcmds[AT_PASS_OLD_COL_ATTRS], cmd);




Re: POC: GROUP BY optimization

2024-04-28 Thread jian he
On Wed, Apr 24, 2024 at 2:25 PM jian he  wrote:
>
> hi.
> I found an interesting case.
>
> CREATE TABLE t1 AS
>   SELECT (i % 10)::numeric AS x,(i % 10)::int8 AS y,'abc' || i % 10 AS
> z, i::int4 AS w
>   FROM generate_series(1, 100) AS i;
> CREATE INDEX t1_x_y_idx ON t1 (x, y);
> ANALYZE t1;
> SET enable_hashagg = off;
> SET enable_seqscan = off;
>
> EXPLAIN (COSTS OFF) SELECT count(*) FROM t1 GROUP BY x,z,y,w;
> EXPLAIN (COSTS OFF) SELECT count(*) FROM t1 GROUP BY x,w,y,z;
> EXPLAIN (COSTS OFF) SELECT count(*) FROM t1 GROUP BY x,z,w,y;
> EXPLAIN (COSTS OFF) SELECT count(*) FROM t1 GROUP BY x,w,z,y;
> the above part will use:
>   ->  Incremental Sort
>  Sort Key: x, $, $, $
>  Presorted Key: x
>  ->  Index Scan using t1_x_y_idx on t1

We can make these cases also `Presorted Key: x, y`.

In the
`if (path->pathkeys && !pathkeys_contained_in(path->pathkeys, root->group_pathkeys))`
branch we can simply do
-   infos = lappend(infos, info);
+   infos = lcons(info, infos);

similar to what we did in plancat.c (search for lcons).

get_useful_group_keys_orderings returns a list of PathKeyInfo, and the caller
just iterates over each element, so for the caller the order of the elements
returned from get_useful_group_keys_orderings does not matter.

For an Incremental Sort path, the function make_ordered_path will return the
same cost for different numbers of presorted keys.

For example, with
EXPLAIN (COSTS OFF) SELECT count(*) FROM t1 GROUP BY x,z,y,w;
the make_ordered_path cost is the same for
`info->pathkeys: x,y,z,w`
`info->pathkeys: x,z,y,w`

If we arrange `info->pathkeys: x,y,z,w` before `info->pathkeys: x,z,y,w` in
get_useful_group_keys_orderings, then with the same cost we will choose the
first one (`info->pathkeys: x,y,z,w`), and if we use Incremental Sort we get
`Presorted Key: x, y`.




pg_input_error_info doc 2 exampled crammed together

2024-04-28 Thread jian he
hi.

select * from pg_input_error_info('420', 'integer')
select message, detail from pg_input_error_info('1234.567', 'numeric(7,4)')
I found the above two examples at [0] crammed together.


   
select * from pg_input_error_info('420', 'integer')

                   message                    | detail | hint | sql_error_code
----------------------------------------------+--------+------+----------------
 value "420" is out of range for type integer |        |      | 22003


select message, detail from pg_input_error_info('1234.567', 'numeric(7,4)')

        message         |                                       detail
------------------------+-------------------------------------------------------------------------------------
 numeric field overflow | A field with precision 7, scale 4 must round to an absolute value less than 10^3.



After checking the definitions of [1] and [2], maybe here we should use <screen>
and also add the `(1 row)` information.

Or we can simply add an empty line between
` value "420" is out of range for type integer |        |      | 22003`
and the next example.


[0] 
https://www.postgresql.org/docs/current/functions-info.html#FUNCTIONS-INFO-VALIDITY
[1] https://tdg.docbook.org/tdg/4.5/programlisting
[2] https://tdg.docbook.org/tdg/4.5/screen



-- 
 I recommend David Deutsch's <>

  Jian




add tab-complete for memory, serialize option and other minor issues.

2024-04-26 Thread jian he
hi.

I found some minor issues related to the EXPLAIN command.

Tab-completion does not work when there is no white space after the comma:
src8=# explain (analyze,b

It does work with the white space:
src8=# explain (analyze, b

So to make tab-completion work, the comma must be followed by a white space;
I am not sure why.
--
explain (serialize binary) select 1;
ERROR:  EXPLAIN option SERIALIZE requires ANALYZE

Do you think it's better to rephrase it as:
ERROR:  EXPLAIN option SERIALIZE requires ANALYZE option

since ANALYZE is also a separate SQL command?
--

 
  Specify the output format, which can be TEXT, XML, JSON, or YAML.
  Non-text output contains the same information as the text output
  format, but is easier for programs to parse.  This parameter defaults to
  TEXT.
 

Should we add <literal> markup for {TEXT, XML, JSON, YAML} in the
above paragraph?

--
I created a patch adding tab-completion for the MEMORY and SERIALIZE options.
From bec2c92ef61bb8272608a49e6837141e7e8346b3 Mon Sep 17 00:00:00 2001
From: jian he 
Date: Sat, 27 Apr 2024 10:33:16 +0800
Subject: [PATCH v1 1/1] add Tab-complete for EXPLAIN MEMORY, EXPLAIN SERIALIZE

---
 src/bin/psql/tab-complete.c | 6 --
 1 file changed, 4 insertions(+), 2 deletions(-)

diff --git a/src/bin/psql/tab-complete.c b/src/bin/psql/tab-complete.c
index 6fee3160..08641565 100644
--- a/src/bin/psql/tab-complete.c
+++ b/src/bin/psql/tab-complete.c
@@ -3856,11 +3856,13 @@ psql_completion(const char *text, int start, int end)
 		 */
 		if (ends_with(prev_wd, '(') || ends_with(prev_wd, ','))
 			COMPLETE_WITH("ANALYZE", "VERBOSE", "COSTS", "SETTINGS", "GENERIC_PLAN",
-		  "BUFFERS", "WAL", "TIMING", "SUMMARY", "FORMAT");
-		else if (TailMatches("ANALYZE|VERBOSE|COSTS|SETTINGS|GENERIC_PLAN|BUFFERS|WAL|TIMING|SUMMARY"))
+		  "BUFFERS", "WAL", "TIMING", "SUMMARY", "FORMAT", "SERIALIZE", "MEMORY");
+		else if (TailMatches("ANALYZE|VERBOSE|COSTS|SETTINGS|GENERIC_PLAN|BUFFERS|WAL|TIMING|SUMMARY|MEMORY"))
 			COMPLETE_WITH("ON", "OFF");
 		else if (TailMatches("FORMAT"))
 			COMPLETE_WITH("TEXT", "XML", "JSON", "YAML");
+		else if (TailMatches("SERIALIZE"))
+			COMPLETE_WITH("TEXT", "NONE", "BINARY");
 	}
 	else if (Matches("EXPLAIN", "ANALYZE"))
 		COMPLETE_WITH("SELECT", "INSERT INTO", "DELETE FROM", "UPDATE", "DECLARE",

base-commit: 1713e3d6cd393fcc1d4873e75c7fa1f6c7023d75
-- 
2.34.1



Re: minor error message inconsistency in make_pathkey_from_sortinfo

2024-04-25 Thread jian he
On Wed, Apr 24, 2024 at 5:47 PM Yugo NAGATA  wrote:
>
> On Wed, 24 Apr 2024 15:05:00 +0800
> jian he  wrote:
>
> > hi.
> >
> > in make_pathkey_from_sortinfo
> >
> > equality_op = get_opfamily_member(opfamily,
> >   opcintype,
> >   opcintype,
> >   BTEqualStrategyNumber);
> > if (!OidIsValid(equality_op)) /* shouldn't happen */
> > elog(ERROR, "missing operator %d(%u,%u) in opfamily %u",
> > BTEqualStrategyNumber, opcintype, opcintype, opfamily);
> >
> > the error message seems not right?
>
> This message was introduced by 278cb434110 which was aiming to
> standardize the wording for similar errors. We can find the pattern
>
>  "missing {support function | operator} %d(%u,%u) in opfamily %u"
>
> in several places.
>

The error message
` operator %d`
would translate to
` operator 3`

but the 3 here is a strategy number, not the OID of an operator in the catalog;
that is what confused me. The discussion at [1] didn't resolve that confusion.


[1] 
https://postgr.es/m/cagpqqf2r9nk8htpv0ffi+fp776ewmyguorpc9zykzkc8sfq...@mail.gmail.com




Re: POC: GROUP BY optimization

2024-04-24 Thread jian he
hi
one more question (maybe a dumb one)

drop table if exists t1;
CREATE TABLE t1 AS
  SELECT (i % 10)::numeric AS x,(i % 10)::int8 AS y,'abc' || i % 10 AS
z, i::int4 AS w
  FROM generate_series(1, 100) AS i;
CREATE INDEX t1_x_y_idx ON t1 (x, y);
ANALYZE t1;
SET enable_hashagg = off;
SET enable_seqscan = off;


EXPLAIN (COSTS OFF, verbose) SELECT count(*) FROM t1 GROUP BY z,x,y,w
order by w;


  QUERY PLAN
--
 GroupAggregate
   Output: count(*), w, z, x, y
   Group Key: t1.w, t1.x, t1.y, t1.z
   ->  Sort
 Output: w, z, x, y
 Sort Key: t1.w, t1.x, t1.y, t1.z
 ->  Index Scan using t1_x_y_idx on public.t1
   Output: w, z, x, y
(8 rows)


If the plan does
` Sort Key: t1.w, t1.x, t1.y, t1.z`

then isn't the output supposed to be:

Output: w, x, y, z

?




minor error message inconsistency in make_pathkey_from_sortinfo

2024-04-24 Thread jian he
hi.

in make_pathkey_from_sortinfo

equality_op = get_opfamily_member(opfamily,
  opcintype,
  opcintype,
  BTEqualStrategyNumber);
if (!OidIsValid(equality_op)) /* shouldn't happen */
elog(ERROR, "missing operator %d(%u,%u) in opfamily %u",
BTEqualStrategyNumber, opcintype, opcintype, opfamily);

the error message seems not right?

maybe
if (!OidIsValid(equality_op)) /* shouldn't happen */
elog(ERROR, "missing operator =(%u,%u) in opfamily %u",opcintype,
opcintype, opfamily);

or

if (!OidIsValid(equality_op)) /* shouldn't happen */
elog(ERROR, "missing equality operator for type %u in opfamily
%u",opcintype, opcintype, opfamily);




Re: POC: GROUP BY optimization

2024-04-24 Thread jian he
hi.
I found an interesting case.

CREATE TABLE t1 AS
  SELECT (i % 10)::numeric AS x,(i % 10)::int8 AS y,'abc' || i % 10 AS
z, i::int4 AS w
  FROM generate_series(1, 100) AS i;
CREATE INDEX t1_x_y_idx ON t1 (x, y);
ANALYZE t1;
SET enable_hashagg = off;
SET enable_seqscan = off;

EXPLAIN (COSTS OFF) SELECT count(*) FROM t1 GROUP BY x,z,y,w;
EXPLAIN (COSTS OFF) SELECT count(*) FROM t1 GROUP BY x,w,y,z;
EXPLAIN (COSTS OFF) SELECT count(*) FROM t1 GROUP BY x,z,w,y;
EXPLAIN (COSTS OFF) SELECT count(*) FROM t1 GROUP BY x,w,z,y;
the above part will use:
  ->  Incremental Sort
 Sort Key: x, $, $, $
 Presorted Key: x
 ->  Index Scan using t1_x_y_idx on t1

EXPLAIN (COSTS OFF) SELECT count(*) FROM t1 GROUP BY z,y,w,x;
EXPLAIN (COSTS OFF) SELECT count(*) FROM t1 GROUP BY w,y,z,x;
EXPLAIN (COSTS OFF) SELECT count(*) FROM t1 GROUP BY y,z,x,w;
EXPLAIN (COSTS OFF) SELECT count(*) FROM t1 GROUP BY y,w,x,z;
EXPLAIN (COSTS OFF) SELECT count(*) FROM t1 GROUP BY y,x,z,w;
EXPLAIN (COSTS OFF) SELECT count(*) FROM t1 GROUP BY y,x,w,z;

these will use:
  ->  Incremental Sort
 Sort Key: x, y, $, $
 Presorted Key: x, y
 ->  Index Scan using t1_x_y_idx on t1

I guess this is fine, but not optimal?




clamp_row_est avoid infinite

2024-04-22 Thread jian he
hi.

/*
 * clamp_row_est
 *      Force a row-count estimate to a sane value.
 */
double
clamp_row_est(double nrows)
{
    /*
     * Avoid infinite and NaN row estimates.  Costs derived from such values
     * are going to be useless.  Also force the estimate to be at least one
     * row, to make explain output look better and to avoid possible
     * divide-by-zero when interpolating costs.  Make it an integer, too.
     */
    if (nrows > MAXIMUM_ROWCOUNT || isnan(nrows))
        nrows = MAXIMUM_ROWCOUNT;
    else if (nrows <= 1.0)
        nrows = 1.0;
    else
        nrows = rint(nrows);

    return nrows;
}


The comment says `Avoid infinite and NaN`, but we only check for NaN explicitly.

Do we need to add an isinf(nrows) check?




Re: POC: GROUP BY optimization

2024-04-22 Thread jian he
On Fri, Apr 19, 2024 at 6:44 PM jian he  wrote:
>
> On Thu, Apr 18, 2024 at 6:58 PM Alexander Korotkov  
> wrote:
> >
> > Thank you for the fixes you've proposed.  I didn't look much into
> > details yet, but I think the main concern Tom expressed in [1] is
> > whether the feature is reasonable at all.  I think at this stage the
> > most important thing is to come up with convincing examples showing
> > how huge performance benefits it could cause.  I will return to this
> > later today and will try to provide some convincing examples.
> >

hi.
Previously, preprocess_groupclause would not process cases where no ORDER BY
clause is specified. Commit 0452b461b will reorder the GROUP BY elements even
though no ORDER BY clause is specified, if there are associated indexes
(I hope I understand it correctly).


For example (when enable_hashagg is false):
explain(verbose) select count(*) FROM btg GROUP BY y,x;
PG16 will not reorder; the clause stays as is: `GROUP BY y,x`.

After commit 0452b461b it is reordered to `GROUP BY x,y`, because there is an
index `btree (x, y)` (only one) associated with it. If you drop the index
`btree (x, y)`, it stays `GROUP BY y,x` as in PG16.


This reordering of the GROUP BY elements when no ORDER BY clause is specified
is useful for performance when work_mem is small.
I've attached some tests comparing master with REL_16_STABLE to demonstrate that.
All the attached tests run under the conditions:
work_mem='64kB', buildtype=release, max_parallel_workers_per_gather=0.

one example:
CREATE TABLE btg5 AS
SELECT i::numeric % 10 AS x, i % 10 AS y, 'abc' || i % 10 AS z, i % 10 AS w
FROM generate_series(1, 1e6) AS i;
CREATE INDEX btg5_x_y_idx ON btg5(x, y);

explain(analyze) SELECT count(*) FROM btg5 GROUP BY z, y, w, x;
in pg17,  the execution time is: 746.574 ms
in pg16,  the execution time is: 1693.483 ms

if I reorder it manually as:
`explain(analyze) SELECT count(*) FROM btg5 GROUP BY x, y, w, z;`
then in pg16, the execution time is 630.394 ms


low_mem_groupby_reorder.sql
Description: application/sql


slightly misleading Error message in guc.c

2024-04-22 Thread jian he
hi.
minor issue in guc.c.

set work_mem to '1kB';
ERROR:  1 kB is outside the valid range for parameter "work_mem" (64 .. 2147483647)

Should it be
ERROR:  1 kB is outside the valid range for parameter "work_mem" (64 kB .. 2147483647 kB)
?
The units for work_mem are { "B", "kB", "MB", "GB", and "TB" }.

Searching for `outside the valid range for parameter`, there are two
occurrences in guc.c.




Re: documentation structure

2024-04-19 Thread jian he
On Wed, Apr 17, 2024 at 7:07 PM Dagfinn Ilmari Mannsåker
 wrote:
>
>
> > It'd also be quite useful if clients could render more of the documentation
> > for functions. People are used to language servers providing full
> > documentation for functions etc...
>
> A more user-friendly version of \df+ (maybe spelled \hf, for symmetry
> with \h for commands?) would certainly be nice.
>

I think `\hf` would be useful; otherwise people first need Google to find the
function's HTML page and then Ctrl+F to locate the specific function entry.

For \hf we may need to offer a doc URL link, but currently many functions are
unlinkable in the docs, and one page can contain many sections. I guess just
linking to a nearby position in the HTML page should be fine.

We could also add URL anchors for functions, the way MySQL does
(https://dev.mysql.com/doc/refman/8.3/en/string-functions.html#function_concat),
though I am not sure that is an elegant solution.




Re: POC: GROUP BY optimization

2024-04-19 Thread jian he
On Thu, Apr 18, 2024 at 6:58 PM Alexander Korotkov  wrote:
>
> Thank you for the fixes you've proposed.  I didn't look much into
> details yet, but I think the main concern Tom expressed in [1] is
> whether the feature is reasonable at all.  I think at this stage the
> most important thing is to come up with convincing examples showing
> how huge performance benefits it could cause.  I will return to this
> later today and will try to provide some convincing examples.
>

hi.
I found a case where it improved performance.

+-- GROUP BY optimization by reorder columns
+CREATE TABLE btg AS SELECT
+ i % 100 AS x,
+ i % 100 AS y,
+ 'abc' || i % 10 AS z,
+ i AS w
+FROM generate_series(1,1) AS i;
+CREATE INDEX abc ON btg(x,y);
+ANALYZE btg;
+
I changed
+FROM generate_series(1,1) AS i;
to
+ FROM generate_series(1, 1e6) AS i;

Then I found that the performance of these two queries improved a lot.
A: explain(analyze) SELECT count(*) FROM btg GROUP BY w, x, y, z ORDER
BY y, x \watch i=0.1 c=10
B: explain(analyze) SELECT count(*) FROM btg GROUP BY w, x, z, y ORDER
BY y, x, z, w \watch i=0.1 c=10

Setting (enable_seqscan, enable_hashagg) from on to off:
query A execution time goes from 1533.013 ms to 533.430 ms
query B execution time goes from 1996.817 ms to 497.020 ms




Re: documentation structure

2024-04-18 Thread jian he
On Thu, Apr 18, 2024 at 2:37 AM Dagfinn Ilmari Mannsåker
 wrote:
>
> Andres Freund  writes:
>
> > Hi,
> >
> > On 2024-04-17 12:07:24 +0100, Dagfinn Ilmari Mannsåker wrote:
> >> Andres Freund  writes:
> >> > I think the manual work for writing signatures in sgml is not 
> >> > insignificant,
> >> > nor is the volume of sgml for them. Manually maintaining the signatures 
> >> > makes
> >> > it impractical to significantly improve the presentation - which I don't 
> >> > think
> >> > is all that great today.
> >>
> >> And it's very inconsistent.  For example, some functions use 
> >> tags for optional parameters, others use square brackets, and some use
> >> VARIADIC to indicate variadic parameters, others use
> >> ellipses (sometimes in  tags or brackets).
> >
> > That seems almost inevitably the outcome of many people having to manually
> > infer the recommended semantics, for writing something boring but 
> > nontrivial,
> > from a 30k line file.
>
> As Corey mentioned elsethread, having a markup style guide (maybe a
> comment at the top of the file?) would be nice.
>
> >> > And the lack of argument names in the pg_proc entries is occasionally 
> >> > fairly
> >> > annoying, because a \df+ doesn't provide enough information to use 
> >> > functions.
> >>
> >> I was also annoyed by this the other day (specifically wrt. the boolean
> >> arguments to pg_ls_dir),
> >
> > My bane is regexp_match et al, I have given up on remembering the argument
> > order.
>
> There's a thread elsewhere about those specifically, but I can't be
> bothered to find the link right now.
>
> >> and started whipping up a Perl script to parse func.sgml and generate
> >> missing proargnames values for pg_proc.dat, which is how I discovered the
> >> above.
> >
> > Nice.
> >
> >> The script currently has a pile of hacky regexes to cope with that,
> >> so I'd be happy to submit a doc patch to turn it into actual markup to get
> >> rid of that, if people think that's a worhtwhile use of time and won't 
> >> clash
> >> with any other plans for the documentation.
> >
> > I guess it's a bit hard to say without knowing how voluminious the changes
> > would be. If we end up rewriting the whole file the tradeoff is less clear
> > than if it's a dozen inconsistent entries.
>
> It turned out to not be that many that used [] for optional parameters,
> see the attached patch.
>

hi.
I manually checked the html output. It looks good to me.




Re: "backend process" confused with "server process"

2024-04-15 Thread jian he
On Mon, Apr 15, 2024 at 5:27 PM Daniel Gustafsson  wrote:
>
> > On 15 Apr 2024, at 11:07, Andrey M. Borodin  wrote:
> >> On 15 Apr 2024, at 14:01, jian he  wrote:
>
> >> "is not a PostgreSQL server process" is the same thing as "not a
> >> PostgreSQL backend process”?
> >
> > As far as I understand, backend is something attached to frontend.
> > There might be infrastructure processes like syslogger or stats collector, 
> > client can signal it too.
>
> I think that's a good summary, in line with the glossary in our docs where we
> define "Auxiliary process" and "Backend (process)" (but not PostgreSQL server
> process which maybe we should?).  "PostgreSQL server process" is used in copy
> to/from error messaging as well in what seems to be a user-freindly way to say
> "backend".  In the docs we mostly use "server process" to mean a process
> running in the operating system on the server.  In a few places we use
> "database server process" too when talking about backends.
>
> In the case of pg_log_backend_memory_contexts(), the function comment states
> the following:
>
> "Signal a backend or an auxiliary process to log its memory contexts"
>
> So in this case, "PostgreSQL server process" seems like the correct term.
>

in [1] (pg_stat_activity)
does "PostgreSQL server process" means all backend_type:
autovacuum launcher, autovacuum worker, logical replication launcher,
logical replication worker, parallel worker, background writer, client
backend, checkpointer, archiver, standalone backend, startup,
walreceiver, walsender and walwriter.
?

and does
`PostgreSQL backend process`
mean only `client backend`?
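
For reference, the backend_type values present on a running instance can be
listed directly (a trivial sketch; the exact set depends on what is running):

SELECT pid, backend_type FROM pg_stat_activity ORDER BY backend_type;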


I also found a minor inconsistency.
`Autovacuum (process)`
is not the same thing as
`autovacuum launcher`
?

but there is a link from `autovacuum launcher` to `Autovacuum (process)`.


[1] 
https://www.postgresql.org/docs/current/monitoring-stats.html#MONITORING-PG-STAT-ACTIVITY-VIEW
[2] https://www.postgresql.org/docs/current/glossary.html




"backend process" confused with "server process"

2024-04-15 Thread jian he
hi.

pg_log_backend_memory_contexts
we have
`
if (proc == NULL)
{
/*
* This is just a warning so a loop-through-resultset will not abort
* if one backend terminated on its own during the run.
*/
ereport(WARNING,
(errmsg("PID %d is not a PostgreSQL server process", pid)));
PG_RETURN_BOOL(false);
}
`


pg_signal_backend
`
if (proc == NULL)
{
/*
* This is just a warning so a loop-through-resultset will not abort
* if one backend terminated on its own during the run.
*/
ereport(WARNING,
(errmsg("PID %d is not a PostgreSQL backend process", pid)));

return SIGNAL_BACKEND_ERROR;
}
`

"is not a PostgreSQL server process" is the same thing as "not a
PostgreSQL backend process"?
Should we unify them?




Re: sql/json remaining issue

2024-04-14 Thread jian he
hi.
https://wiki.postgresql.org/wiki/PostgreSQL_17_Open_Items#Open_Issues
issue: Problems with deparsed SQL/JSON query function

original the bug report link:
https://postgr.es/m/cacjufxeqhqsfrg_p7emyo5zak3d767ifdl8vz_4%3dzbhpotr...@mail.gmail.com

forgive me for putting it in a new email thread.
I made the following change and added several tests for it.

--- a/src/backend/parser/parse_expr.c
+++ b/src/backend/parser/parse_expr.c
@@ -4636,10 +4636,10 @@ transformJsonBehavior(ParseState *pstate,
JsonBehavior *behavior,
  {
  expr = transformExprRecurse(pstate, behavior->expr);
  if (!IsA(expr, Const) && !IsA(expr, FuncExpr) &&
- !IsA(expr, OpExpr))
+ !IsA(expr, OpExpr) && !IsA(expr, CoerceViaIO) && !IsA(expr, CoerceToDomain))
  ereport(ERROR,
  (errcode(ERRCODE_DATATYPE_MISMATCH),
- errmsg("can only specify a constant, non-aggregate function, or
operator expression for DEFAULT"),
+ errmsg("can only specify a constant, non-aggregate function, or
operator expression or cast expression for DEFAULT"),
  parser_errposition(pstate, exprLocation(expr))));
  if (contain_var_clause(expr))
  ereport(ERROR,

these two expression nodes also look like Const:
CoerceViaIO:    '"foo1"'::jsonb::text
CoerceToDomain: 'foo'::jsonb_test_domain

we need to deal with these two, otherwise we cannot use a domain type in a
DEFAULT expression.
Also, the following should not fail:
SELECT JSON_VALUE(jsonb '{"d1": "foo"}', '$.a2' returning text DEFAULT
'"foo1"'::text::json::text ON ERROR);


We still have `if (contain_var_clause(expr))` to further check the expression,
so it should be fine?
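
For illustration, a minimal sketch of the DEFAULT expressions this change is
meant to accept, assuming the regression-test domain
CREATE DOMAIN jsonb_test_domain AS text CHECK (value <> 'foo'):

-- DEFAULT formed as a cast over a constant (a CoerceViaIO node)
SELECT JSON_VALUE(jsonb '{"d1": "H"}', '$.a2'
                  RETURNING text DEFAULT '"foo1"'::jsonb::text ON ERROR);

-- DEFAULT formed as a domain coercion (a CoerceToDomain node)
SELECT JSON_VALUE(jsonb '{"d1": "H"}', '$.a2'
                  RETURNING jsonb_test_domain DEFAULT 'foo1'::jsonb_test_domain ON ERROR);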
From 124cd4245266343daecdb4294b2013d9ebdd6b24 Mon Sep 17 00:00:00 2001
From: jian he 
Date: Mon, 15 Apr 2024 12:37:36 +0800
Subject: [PATCH v1 1/1] in transformJsonBehavior better handle default
 expression

sql/json query functions for process json can return empty or error.
We can specify DEFAULT expression for handling empty or error cases while applying
the path expression to the json or while type coercion.
the default expression can be just a plain constant,
however, a constant can be formed as a cast expression,eg (1::jsonb::text).

so allow the DEFAULT expression formed as CoerceViaIO node
or CoerceToDomain node in transformJsonBehavior
for better handling these cases.
Discussion: https://postgr.es/m/CACJufxEqhqsfrg_p7EMyo5zak3d767iFDL8vz_4%3DZBHpOtrghw@mail.gmail.com

---
 src/backend/parser/parse_expr.c   |  4 +--
 .../regress/expected/sqljson_jsontable.out| 25 ++
 .../regress/expected/sqljson_queryfuncs.out   | 26 +--
 src/test/regress/sql/sqljson_jsontable.sql| 13 ++
 src/test/regress/sql/sqljson_queryfuncs.sql   |  7 +
 5 files changed, 71 insertions(+), 4 deletions(-)

diff --git a/src/backend/parser/parse_expr.c b/src/backend/parser/parse_expr.c
index 4c98d7a0..94dbb531 100644
--- a/src/backend/parser/parse_expr.c
+++ b/src/backend/parser/parse_expr.c
@@ -4636,10 +4636,10 @@ transformJsonBehavior(ParseState *pstate, JsonBehavior *behavior,
 		{
 			expr = transformExprRecurse(pstate, behavior->expr);
 			if (!IsA(expr, Const) && !IsA(expr, FuncExpr) &&
-!IsA(expr, OpExpr))
+!IsA(expr, OpExpr) && !IsA(expr, CoerceViaIO) && !IsA(expr, CoerceToDomain))
 ereport(ERROR,
 		(errcode(ERRCODE_DATATYPE_MISMATCH),
-		 errmsg("can only specify a constant, non-aggregate function, or operator expression for DEFAULT"),
+		 errmsg("can only specify a constant, non-aggregate function, or operator expression or cast expression for DEFAULT"),
		 parser_errposition(pstate, exprLocation(expr))));
 			if (contain_var_clause(expr))
 ereport(ERROR,
diff --git a/src/test/regress/expected/sqljson_jsontable.out b/src/test/regress/expected/sqljson_jsontable.out
index a00eec8a..b45dd32a 100644
--- a/src/test/regress/expected/sqljson_jsontable.out
+++ b/src/test/regress/expected/sqljson_jsontable.out
@@ -113,6 +113,31 @@ FROM json_table_test vals
  [1, 1.23, "2", "aaa", "foo", null, false, true, {"aaa": 123}, "[1,2]", "\"str\""] | 11 | | "str"   | "str|  | | "str"   | "\"str\""| "\"str\""
 (14 rows)
 
+--check default expression
+SELECT * FROM JSON_TABLE(jsonb '{"d1": "H"}', '$'
+COLUMNS (js1 jsonb_test_domain path '$.a2' DEFAULT '"foo1"'::jsonb::text ON ERROR));
+  js1   
+
+ "foo1"
+(1 row)
+
+SELECT * FROM JSON_TABLE(jsonb '{"d1": "H"}', '$'
+COLUMNS (js1 jsonb_test_domain path '$.a2' DEFAULT 'foo'::jsonb_test_domain ON ERROR));
+ERROR:  value for domain jsonb_test_domain violates check constraint "jsonb_test_domain_check"
+SELECT * FROM JSON_TABLE(jsonb '{"d1": "H"}'

Re: documentation structure

2024-04-14 Thread jian he
On Wed, Mar 20, 2024 at 5:40 AM Andrew Dunstan  wrote:
>
>
> +many for improving the index.
>
> My own pet docs peeve is a purely editorial one: func.sgml is a 30k line 
> beast, and I think there's a good case for splitting out at least the larger 
> chunks of it.
>

I think I successfully reduced func.sgml from 311322 lines to 13167 lines.
(base-commit: 93582974315174d544592185d797a2b44696d1e5)

Writing a whole patch would be unreviewable, so here is the procedure instead.
The key gotcha is to put the contents between the opening `<sect1>` and closing
`</sect1>` tags (both inclusive)
into a new file,
and in func.sgml use an entity reference such as `&func-string;` to reference the new file.
Also update filelist.sgml.

here is how I do it:

I found that these built HTML files are the biggest ones:
doc/src/sgml/html/functions-string.html
doc/src/sgml/html/functions-matching.html
doc/src/sgml/html/functions-datetime.html
doc/src/sgml/html/functions-json.html
doc/src/sgml/html/functions-aggregate.html
doc/src/sgml/html/functions-info.html
doc/src/sgml/html/functions-admin.html

so I created these new sgml files to hold the corresponding content:
func-string.sgml
func-matching.sgml
func-datetime.sgml
func-json.sgml
func-aggregate.sgml
func-info.sgml
func-admin.sgml

based on func.sgml's structure pattern, I noted the line number at which each
relevant <sect1> section starts and at which the next one begins (these are the
line ranges used in the sed commands below).

step1:   pipe the relevant line-range contents to the new sgml files.
(example: lines 2407 to 4177 contain all the content corresponding to
functions-string.html)

sed -n '2407,4177 p' func.sgml > func-string.sgml
sed -n '5328,7756 p' func.sgml >  func-matching.sgml
sed -n '8939,11122 p' func.sgml > func-datetime.sgml
sed -n '15498,19348 p' func.sgml > func-json.sgml
sed -n '21479,22896 p' func.sgml > func-aggregate.sgml
sed -n '24257,27896 p' func.sgml > func-info.sgml
sed -n '27898,30579 p' func.sgml > func-admin.sgml

step2:
in place delete these line ranges in func.sgml
sed --in-place  "2407,4177d ; 5328,7756d ; 8939,11122d ; 15498,19348d
; 21479,22896d ; 24257,27896d ; 27898,30579d" \
func.sgml
reference: 
https://unix.stackexchange.com/questions/676210/matching-multiple-ranges-with-sed-range-expressions
   
https://www.gnu.org/software/sed/manual/sed.html#Command_002dLine-Options

step3:
put the following lines into the corresponding positions in func.sgml
(based on the above structure pattern, to quickly locate the line positions):

`
&func-string;
&func-matching;
&func-datetime;
&func-json;
&func-aggregate;
&func-info;
&func-admin;
`

step4: update filelist.sgml:
diff --git a/doc/src/sgml/filelist.sgml b/doc/src/sgml/filelist.sgml
index 3fb0709f..0b78a361 100644
--- a/doc/src/sgml/filelist.sgml
+++ b/doc/src/sgml/filelist.sgml
@@ -18,6 +18,13 @@
 
 
 
+<!ENTITY func-string    SYSTEM "func-string.sgml">
+<!ENTITY func-matching  SYSTEM "func-matching.sgml">
+<!ENTITY func-datetime  SYSTEM "func-datetime.sgml">
+<!ENTITY func-json      SYSTEM "func-json.sgml">
+<!ENTITY func-aggregate SYSTEM "func-aggregate.sgml">
+<!ENTITY func-info      SYSTEM "func-info.sgml">
+<!ENTITY func-admin     SYSTEM "func-admin.sgml">
 
 
 

 doc/src/sgml/filelist.sgml   | 7 +
 doc/src/sgml/func-admin.sgml |  2682 +
 doc/src/sgml/func-aggregate.sgml |  1418 +++
 doc/src/sgml/func-datetime.sgml  |  2184 
 doc/src/sgml/func-info.sgml  |  3640 ++
 doc/src/sgml/func-json.sgml  |  3851 ++
 doc/src/sgml/func-matching.sgml  |  2429 
 doc/src/sgml/func-string.sgml|  1771 +++
 doc/src/sgml/func.sgml   | 17979 +

we can do it one by one, but it's still worth it.




call ATPostAlterTypeParse inconsistency

2024-04-14 Thread jian he
hi.
one minor issue.

within ATPostAlterTypeCleanup, we call ATPostAlterTypeParse:
ATPostAlterTypeParse(oldId, relid, InvalidOid,
(char *) lfirst(def_item),
wqueue, lockmode, tab->rewrite);


function ATPostAlterTypeParse is:
static void
ATPostAlterTypeParse(Oid oldId, Oid oldRelId, Oid refRelId, char *cmd,
List **wqueue, LOCKMODE lockmode, bool rewrite)

but tab->rewrite is an int.
so within ATPostAlterTypeCleanup we should call it like:

ATPostAlterTypeParse(oldId, relid, InvalidOid,
(char *) lfirst(def_item),
wqueue, lockmode, tab->rewrite != 0);

or

ATPostAlterTypeParse(oldId, relid, InvalidOid,
(char *) lfirst(def_item),
wqueue, lockmode, tab->rewrite > 0);




Re: SQL:2011 application time

2024-04-14 Thread jian he
On Wed, Apr 3, 2024 at 1:30 PM Paul Jungwirth
 wrote:
>
> On 3/24/24 00:38, Peter Eisentraut wrote:> I have committed the patches
> > v33-0001-Add-temporal-FOREIGN-KEYs.patch and 
> > v33-0002-Support-multiranges-in-temporal-FKs.patch
> > (together).
>
> Hi Hackers,
>
> I found some problems with temporal primary keys and the idea of uniqueness, 
> especially around the
> indisunique column. Here are some small fixes and a proposal for a larger 
> fix, which I think we need
> but I'd like some feedback on.
>
> The first patch fixes problems with ON CONFLICT DO NOTHING/UPDATE.
>
> DO NOTHING fails because it doesn't expect a non-btree unique index. It's 
> fine to make it accept a
> temporal PRIMARY KEY/UNIQUE index though (i.e. an index with both indisunique 
> and indisexclusion).
> This is no different than an exclusion constraint. So I skip 
> BuildSpeculativeIndexInfo for WITHOUT
> OVERLAPS indexes. (Incidentally, AFAICT ii_UniqueOps is never used, only 
> ii_UniqueProcs. Right?)
>

hi.
For a unique index or primary key,
aren't ii_ExclusionOps and ii_UniqueOps enough to distinguish whether the index
supports WITHOUT OVERLAPS,
so that we don't need another ii_HasWithoutOverlaps?
(I didn't test it though.)


ON CONFLICT DO NOTHING
ON CONFLICT (id, valid_at) DO NOTHING
ON CONFLICT ON CONSTRAINT temporal_rng_pk DO NOTHING
I am confused by the tests.
Here temporal_rng has only one primary key, and ON CONFLICT only deals with it.
I thought these three would behave the same way?


DROP TABLE temporal_rng;
CREATE TABLE temporal_rng (id int4range,valid_at daterange);
ALTER TABLE temporal_rng ADD CONSTRAINT temporal_rng_pk PRIMARY KEY
(id, valid_at WITHOUT OVERLAPS);

+-- ON CONFLICT
+--
+TRUNCATE temporal_rng;
+INSERT INTO temporal_rng (id, valid_at) VALUES ('[1,2)',
daterange('2000-01-01', '2010-01-01'));
+-- with a conflict
+INSERT INTO temporal_rng (id, valid_at) VALUES ('[1,2)',
daterange('2005-01-01', '2006-01-01')) ON CONFLICT DO NOTHING;
+-- id matches but no conflict

+TRUNCATE temporal_rng;
+INSERT INTO temporal_rng (id, valid_at) VALUES ('[1,2)',
daterange('2000-01-01', '2010-01-01'));
+-- with a conflict
+INSERT INTO temporal_rng (id, valid_at) VALUES ('[1,2)',
daterange('2005-01-01', '2006-01-01')) ON CONFLICT (id, valid_at) DO
NOTHING;
+ERROR:  there is no unique or exclusion constraint matching the ON
CONFLICT specification

+TRUNCATE temporal_rng;
+INSERT INTO temporal_rng (id, valid_at) VALUES ('[1,2)',
daterange('2000-01-01', '2010-01-01'));
+-- with a conflict
+INSERT INTO temporal_rng (id, valid_at) VALUES ('[1,2)',
daterange('2005-01-01', '2006-01-01')) ON CONFLICT ON CONSTRAINT
temporal_rng_pk DO NOTHING;




Re: sql/json remaining issue

2024-04-13 Thread jian he
On Fri, Apr 12, 2024 at 5:44 PM Amit Langote  wrote:
>
> > elog(ERROR, "unrecognized json wrapper %d", wrapper);
> > should be
> > elog(ERROR, "unrecognized json wrapper %d", (int) wrapper);
>
> Fixed in 0003.
>
the fix doesn't seem to be in 0003?
Other than that, everything looks fine.



SELECT * FROM JSON_TABLE (
'{"favorites":
{"movies":
  [{"name": "One", "director": "John Doe"},
   {"name": "Two", "director": "Don Joe"}],
 "books":
  [{"name": "Mystery", "authors": [{"name": "Brown Dan"}]},
   {"name": "Wonder", "authors": [{"name": "Jun Murakami"},
{"name":"Craig Doe"}]}]
}}'::json, '$.favs[*]'
COLUMNS (user_id FOR ORDINALITY,
  NESTED '$.movies[*]'
COLUMNS (
movie_id FOR ORDINALITY,
mname text PATH '$.name',
director text),
  NESTED '$.books[*]'
COLUMNS (
  book_id FOR ORDINALITY,
  bname text PATH '$.name',
  NESTED '$.authors[*]'
COLUMNS (
  author_id FOR ORDINALITY,
  author_name text PATH '$.name'))));


I actually ran the query; it returns null.
'$.favs[*]'
should be
'$.favorites[*]'



one more minor thing I previously mentioned: in getJsonPathVariable,
ereport(ERROR,
(errcode(ERRCODE_UNDEFINED_OBJECT),
errmsg("could not find jsonpath variable \"%s\"",
pnstrdup(varName, varNameLength))));

do we need to remove pnstrdup?




Re: altering a column's collation leaves an invalid foreign key

2024-04-13 Thread jian he
On Fri, Apr 12, 2024 at 5:06 PM jian he  wrote:
>
> On Tue, Mar 26, 2024 at 1:00 PM jian he  wrote:
> >
> > On Mon, Mar 25, 2024 at 2:47 PM Paul Jungwirth
> >  wrote:
> > >
> > > On 3/23/24 10:04, Paul Jungwirth wrote:
> > > > Perhaps if the previous collation was nondeterministic we should force 
> > > > a re-check.
> > >
> > > Here is a patch implementing this. It was a bit more fuss than I 
> > > expected, so maybe someone has a
> > > better way.
I think I found a simple way.

the logic is:
* ATExecAlterColumnType changes one column at a time.
* one column can only have one collation, so we don't need to store a
list of collation OIDs.
* in ATExecAlterColumnType we can get the new collation (targetcollid)
and the original collation info.
* RememberAllDependentForRebuilding will check the column's dependents,
so whether this column is referenced by a foreign key or not
is recorded;
AlteredTableInfo->changedConstraintOids then holds the primary key and
foreign key OIDs.
* ATRewriteCatalogs will call ATPostAlterTypeCleanup (see the comments
in ATRewriteCatalogs).
* each entry in tab->changedConstraintOids (foreign key, primary key) goes through
ATPostAlterTypeParse, and
a foreign key (con->contype == CONSTR_FOREIGN) goes through TryReuseForeignKey.
* in TryReuseForeignKey, we can pass down the information that the primary
key's old collation is nondeterministic
and that old collation != new collation to the foreign key constraint,
so the foreign key can act accordingly in ATAddForeignKeyConstraint (old_check_ok).


based on the above logic, I added one bool to struct AlteredTableInfo and
one bool to struct Constraint.
The bool in AlteredTableInfo stores the flag, which is later passed to struct
Constraint;
we need the bool in Constraint because ATAddForeignKeyConstraint needs it.
From 0eb0846450cb33996d6fb35c19290d297c26fd3c Mon Sep 17 00:00:00 2001
From: jian he 
Date: Sat, 13 Apr 2024 20:28:25 +0800
Subject: [PATCH v3 1/1] ALTER COLUMN SET DATA TYPE Re-check foreign key ties
 under certain condition

With deterministic collations, we check ties by looking at binary
equality. But a nondeterministic collation can allow non-identical values
to be considered equal, e.g. 'a' and 'A' when case-insensitive.
After ALTER COLUMN .. SET DATA TYPE,
some references that used to be valid may not be anymore.

for `ALTER COLUMN .. SET DATA TYPE`:
If the changed column is part of the primary key columns,
then under the following conditions
we need to revalidate the foreign key constraint:
* the primary key is referenced by a foreign key constraint
* the new collation is not the same as the old
* the old collation is nondeterministic.

discussion: https://postgr.es/m/78d824e0-b21e-480d-a252-e4b84bc2c24b%40illuminatedcomputing.com
---
 src/backend/commands/tablecmds.c  | 62 +++
 src/include/nodes/parsenodes.h|  3 +-
 .../regress/expected/collate.icu.utf8.out |  9 +++
 src/test/regress/sql/collate.icu.utf8.sql |  8 +++
 4 files changed, 69 insertions(+), 13 deletions(-)

diff --git a/src/backend/commands/tablecmds.c b/src/backend/commands/tablecmds.c
index 000212f2..d5bd61f1 100644
--- a/src/backend/commands/tablecmds.c
+++ b/src/backend/commands/tablecmds.c
@@ -198,6 +198,7 @@ typedef struct AlteredTableInfo
 	/* Objects to rebuild after completing ALTER TYPE operations */
 	List	   *changedConstraintOids;	/* OIDs of constraints to rebuild */
 	List	   *changedConstraintDefs;	/* string definitions of same */
+	bool		verify_new_collation; /* T if we should recheck new collation for foreign key */
 	List	   *changedIndexOids;	/* OIDs of indexes to rebuild */
 	List	   *changedIndexDefs;	/* string definitions of same */
 	char	   *replicaIdentityIndex;	/* index to reset as REPLICA IDENTITY */
@@ -583,12 +584,13 @@ static void ATPostAlterTypeCleanup(List **wqueue, AlteredTableInfo *tab,
    LOCKMODE lockmode);
 static void ATPostAlterTypeParse(Oid oldId, Oid oldRelId, Oid refRelId,
  char *cmd, List **wqueue, LOCKMODE lockmode,
- bool rewrite);
+ bool rewrite,
+ bool verify_new_collation);
 static void RebuildConstraintComment(AlteredTableInfo *tab, AlterTablePass pass,
 	 Oid objid, Relation rel, List *domname,
 	 const char *conname);
 static void TryReuseIndex(Oid oldId, IndexStmt *stmt);
-static void TryReuseForeignKey(Oid oldId, Constraint *con);
+static void TryReuseForeignKey(Oid oldId, Constraint *con, bool verify_new_collation);
 static ObjectAddress ATExecAlterColumnGenericOptions(Relation rel, const char *colName,
 	 List *options, LOCKMODE lockmode);
 static void change_owner_fix_column_acls(Oid relationOid,
@@ -10256,6 +10258,15 @@ ATAddForeignKeyConstraint(List **wqueue, AlteredTableInfo *tab, Relation rel,
 	old_pfeqop_item);
 		}
 		if (old_check_ok)
+		{
+			/* also see TryReuseForeignKey.
+			 * collation_rechec

Re: Can't find not null constraint, but \d+ shows that

2024-04-12 Thread jian he
On Fri, Apr 12, 2024 at 3:52 PM Alvaro Herrera  wrote:
>
> On 2024-Apr-12, jian he wrote:
>
> > Now I am more confused...
>
> > +CREATE TABLE notnull_tbl1 (c0 int, c1 int, PRIMARY KEY (c0, c1));
> > +ALTER TABLE  notnull_tbl1 DROP c1;
>
> > same query, mysql make let "c0" be not null
>
> Yes, that was Postgres' old model.  But the way we think of it now, is
> that a column is marked attnotnull when a pg_constraint entry exists to
> support that flag, which can be a not-null constraint, or a primary key
> constraint.  In the old Postgres model, you're right that we would
> continue to have c0 as not-null, just like mysql.  In the new model,
> that flag no longer has no reason to be there, because the backing
> primary key constraint has been removed, which is why we reset it.
>
> So what I was saying in the cases with replica identity and generated
> columns, is that there's an attnotnull flag we cannot remove, because of
> either of those things, but we don't have any backing constraint for it,
> which is an inconsistency with the view of the world that I described
> above.  I would like to manufacture one not-null constraint at that
> point, or just abort the drop of the PK ... but I don't see how to do
> either of those things.
>

thanks for your explanation.
now I understand it.
I wonder whether there is any incompatibility issue, or whether we need to say
something about the new behavior when dropping a key column?

the comments look good to me.

only a minor cosmetic issue:
+ if (unconstrained_cols)
I would like to change it to
+ if (unconstrained_cols != NIL)

+ foreach(lc, unconstrained_cols)
we can change to
+ foreach_int(attnum, unconstrained_cols)
per commit
https://git.postgresql.org/cgit/postgresql.git/commit/?id=14dd0f27d7cd56ffae9ecdbe324965073d01a9ff




Re: altering a column's collation leaves an invalid foreign key

2024-04-12 Thread jian he
On Tue, Mar 26, 2024 at 1:00 PM jian he  wrote:
>
> On Mon, Mar 25, 2024 at 2:47 PM Paul Jungwirth
>  wrote:
> >
> > On 3/23/24 10:04, Paul Jungwirth wrote:
> > > Perhaps if the previous collation was nondeterministic we should force a 
> > > re-check.
> >
> > Here is a patch implementing this. It was a bit more fuss than I expected, 
> > so maybe someone has a
> > better way.
> >
>
> + /* test follows the one in ri_FetchConstraintInfo() */
> + if (ARR_NDIM(arr) != 1 ||
> + ARR_HASNULL(arr) ||
> + ARR_ELEMTYPE(arr) != INT2OID)
> + elog(ERROR, "conkey is not a 1-D smallint array");
> + attarr = (AttrNumber *) ARR_DATA_PTR(arr);
> +
> + /* stash a List of the collation Oids in our Constraint node */
> + for (i = 0; i < numkeys; i++)
> + con->old_collations = lappend_oid(con->old_collations,
> +  list_nth_oid(changedCollationOids, attarr[i] - 1));
>
> I don't understand the "ri_FetchConstraintInfo" comment.

I still don't understand this comment.

>
> +static void
> +RememberCollationForRebuilding(AttrNumber attnum, AlteredTableInfo *tab)
> +{
> + Oid typid;
> + int32 typmod;
> + Oid collid;
> + ListCell   *lc;
> +
> + /* Fill in the list with InvalidOid if this is our first visit */
> + if (tab->changedCollationOids == NIL)
> + {
> + int len = RelationGetNumberOfAttributes(tab->rel);
> + int i;
> +
> + for (i = 0; i < len; i++)
> + tab->changedCollationOids = lappend_oid(tab->changedCollationOids,
> + InvalidOid);
> + }
> +
> + get_atttypetypmodcoll(RelationGetRelid(tab->rel), attnum,
> +  , , );
> +
> + lc = list_nth_cell(tab->changedCollationOids, attnum - 1);
> + lfirst_oid(lc) = collid;
> +}
>
> do we need to check if `collid` is a valid collation?
> like:
>
> if (!OidIsValid(collid))
> {
> lc = list_nth_cell(tab->changedCollationOids, attnum - 1);
> lfirst_oid(lc) = collid;
> }
Now I think RememberCollationForRebuilding is fine; no further change needed.


in ATAddForeignKeyConstraint.
if (old_check_ok)
{
/*
* When a pfeqop changes, revalidate the constraint.  We could
* permit intra-opfamily changes, but that adds subtle complexity
* without any concrete benefit for core types.  We need not
* assess ppeqop or ffeqop, which RI_Initial_Check() does not use.
*/
old_check_ok = (pfeqop == lfirst_oid(old_pfeqop_item));
old_pfeqop_item = lnext(fkconstraint->old_conpfeqop,
old_pfeqop_item);
}
maybe we can do the logic right below it. like:

if (old_check_ok)
{
/*
* All deterministic collations use bitwise equality to resolve
* ties, but if the previous collation was nondeterministic,
* we must re-check the foreign key, because some references
* that used to be "equal" may not be anymore. If we have
* InvalidOid here, either there was no collation or the
* attribute didn't change.
*/
old_check_ok = (old_collation == InvalidOid ||
get_collation_isdeterministic(old_collation));
}
then we don't need to cram it into the other `if (old_check_ok){}` block.


do we need to add an entry in
https://wiki.postgresql.org/wiki/PostgreSQL_17_Open_Items#Open_Issues
?




Re: Can't find not null constraint, but \d+ shows that

2024-04-11 Thread jian he
On Thu, Apr 11, 2024 at 10:48 PM Alvaro Herrera  wrote:
>
>
> I'm still not ready with this -- still not convinced about the new AT
> pass.  Also, I want to add a test for the pg_dump behavior, and there's
> an XXX comment.
>
Now I am more confused...

+-- make sure attnotnull is reset correctly when a PK is dropped indirectly,
+-- or kept if there's a reason for that
+CREATE TABLE notnull_tbl1 (c0 int, c1 int, PRIMARY KEY (c0, c1));
+ALTER TABLE  notnull_tbl1 DROP c1;
+\d+ notnull_tbl1
+   Table "public.notnull_tbl1"
+ Column |  Type   | Collation | Nullable | Default | Storage | Stats
target | Description
++-+---+--+-+-+--+-
+ c0 | integer |   |  | | plain   |  |
+
+DROP TABLE notnull_tbl1;

with the same query, MySQL makes "c0" not null:
mysql https://dbfiddle.uk/_ltoU7PO

for PostgreSQL,
https://dbfiddle.uk/ZHJXEqL1
from 9.3 to 16 (click the link (https://dbfiddle.uk/ZHJXEqL1), then
click "9.3" and choose whichever version you like),
all versions make the remaining column "c0" not null.

the latest
0001-Better-handle-indirect-constraint-drops.patch makes c0 attnotnull false.

previous patches:
v2-0001-Handle-ALTER-.-DROP-NOT-NULL-when-no-pg_constrain.patch makes
c0 attnotnull true.
0001-Correctly-reset-attnotnull-when-constraints-dropped-.patch makes
c0 attnotnull false.




Re: session username in default psql prompt?

2024-04-11 Thread jian he
On Tue, Mar 26, 2024 at 7:14 AM Andrew Dunstan  wrote:
>
>
>
> On Mon, Mar 25, 2024 at 9:14 AM Jelte Fennema-Nio  wrote:
>>
>> On Mon, 25 Mar 2024 at 14:06, Robert Haas  wrote:
>> > On Mon, Mar 25, 2024 at 4:30 AM Jelte Fennema-Nio  
>> > wrote:
>> > > That problem seems easy to address by adding a newline into the
>> > > default prompt.
>> >
>> > Ugh. Please, no!
>>
>> I guess it's partially a matter of taste, but personally I'm never
>> going back to a single line prompt. It's so nice for zoomed-in demos
>> that your SQL queries don't get broken up.
>
>
>
> Very  much a matter of taste. I knew when I saw your suggestion there would 
> be some kickback. If horizontal space is at a premium vertical space is 
> doubly so, I suspect.
>

the change (session username in the default psql prompt) is quite visible;
maybe this time we can conduct a poll,
but in a way that the poll can reach more people?




Re: Can't find not null constraint, but \d+ shows that

2024-04-11 Thread jian he
On Thu, Apr 11, 2024 at 3:19 PM Tender Wang  wrote:
>
>> +DROP TABLE notnull_tbl1;
>> +-- make sure attnotnull is reset correctly when a PK is dropped indirectly
>> +CREATE TABLE notnull_tbl1 (c0 int, c1 int, PRIMARY KEY (c0, c1));
>> +ALTER TABLE  notnull_tbl1 DROP c1;
>> +\d+ notnull_tbl1
>> +   Table "public.notnull_tbl1"
>> + Column |  Type   | Collation | Nullable | Default | Storage | Stats
>> target | Description
>> ++-+---+--+-+-+--+-
>> + c0 | integer |   | not null | | plain   |  
>> |
>> +
>>
>> this is not what we expected?
>> "not null" for "c0" now should be false?
>> am I missing something?
>
> Yeah, now this is expected behavior.
> Users can  drop manually not-null of "c0" if they want, and no error reporte.
>

sorry for the noise.
these two past patches confused me:
0001-Correctly-reset-attnotnull-when-constraints-dropped-.patch
v4-0001-Fix-pg_attribute-attnotnull-not-reset-when-droppe.patch

I thought dropping one of the primary key columns (ALTER TABLE notnull_tbl2 DROP c1)
would make the other key columns lose their "not null" property.

now I've figured out that
dropping one of the primary key columns will not change the other key
columns' "not null" property, while
dropping the primary-key constraint itself will make all key
columns' "not null" property disappear.

v2-0001-Handle-ALTER-.-DROP-NOT-NULL-when-no-pg_constrain.patch
behavior looks fine to me now.


Inlining drop_orphaned_notnull into ATExecDropNotNull looks fine to me.




Re: Can't find not null constraint, but \d+ shows that

2024-04-11 Thread jian he
On Wed, Apr 10, 2024 at 2:10 PM jian he  wrote:
>
> DROP TABLE if exists notnull_tbl2;
> CREATE TABLE notnull_tbl2 (c0 int generated by default as IDENTITY, c1 int);
> ALTER TABLE notnull_tbl2 ADD CONSTRAINT Q PRIMARY KEY(c0, c1);
> ALTER TABLE notnull_tbl2 DROP CONSTRAINT notnull_tbl2_c0_not_null;
> ALTER TABLE notnull_tbl2 DROP c1;
> \d notnull_tbl2

> ALTER TABLE notnull_tbl2 DROP CONSTRAINT notnull_tbl2_c0_not_null;
per the above execution order, should this error out?

otherwise, which "not null" (attribute or constraint) anchors the "generated
by default as identity" not-null property?
"DROP c1" will drop the not-null property for both "c0" and "c1".
if "DROP CONSTRAINT notnull_tbl2_c0_not_null" does not error out, then
"ALTER TABLE notnull_tbl2 DROP c1;"
should either error out
or transform "c0" from "c0 int generated by default as identity"
to
"c0 int".


On Thu, Apr 11, 2024 at 1:23 AM Alvaro Herrera  wrote:
>
> On 2024-Apr-10, Alvaro Herrera wrote:
>
> > One thing missing here is pg_dump support.  If you just dump this table,
> > it'll end up with no constraint at all.  That's obviously bad, so I
> > propose we have pg_dump add a regular NOT NULL constraint for those, to
> > avoid perpetuating the weird situation further.
>
> Here's another crude patchset, this time including the pg_dump aspect.
>

+DROP TABLE notnull_tbl1;
+-- make sure attnotnull is reset correctly when a PK is dropped indirectly
+CREATE TABLE notnull_tbl1 (c0 int, c1 int, PRIMARY KEY (c0, c1));
+ALTER TABLE  notnull_tbl1 DROP c1;
+\d+ notnull_tbl1
+   Table "public.notnull_tbl1"
+ Column |  Type   | Collation | Nullable | Default | Storage | Stats
target | Description
++-+---+--+-+-+--+-
+ c0 | integer |   | not null | | plain   |  |
+

this is not what we expected?
"not null" for "c0" now should be false?
am I missing something?




Re: sql/json remaining issue

2024-04-10 Thread jian he
On Wed, Apr 10, 2024 at 4:39 PM Amit Langote  wrote:
>
>
> Attached is a bit more polished version of that, which also addresses
> the error messages in JsonPathQuery() and JsonPathValue().  I noticed
> that there was comment I had written at one point during JSON_TABLE()
> hacking that said that we should be doing this.
>
> I've also added an open item for this.
>

`
 | NESTED [ PATH ] json_path_specification [ AS json_path_name ]
COLUMNS ( json_table_column [, ...] )
NESTED [ PATH ] json_path_specification [ AS json_path_name ] COLUMNS
( json_table_column [, ...] )
`
 "json_path_specification" should be "path_expression"?

your explanation about memory usage is clear to me!


The following are minor cosmetic issues while applying v2.
+errmsg("JSON path expression in JSON_VALUE should return singleton
scalar item")));
"singleton" is not intuitive to me.
Then I looked around.
https://www.postgresql.org/search/?u=%2Fdocs%2F16%2F=singleton
There is only one appearance of "singleton" in the manual.
then I wonder what's the difference between
22038 ERRCODE_SINGLETON_SQL_JSON_ITEM_REQUIRED
2203F ERRCODE_SQL_JSON_SCALAR_REQUIRED

I assume '{"hello":"world"}' is a singleton, but not a scalar item?
If so, then I think the error within the "if (count > 1)"
branch in JsonPathValue
should use ERRCODE_SINGLETON_SQL_JSON_ITEM_REQUIRED,
and the one within the "if (!IsAJsonbScalar(res))" branch should use
ERRCODE_SQL_JSON_SCALAR_REQUIRED
?
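
For illustration, a minimal sketch of the distinction, assuming ERROR ON ERROR
so that the errors are raised instead of returning NULL:

-- a single (singleton) item that is not a scalar
SELECT JSON_VALUE(jsonb '{"hello": "world"}', '$' ERROR ON ERROR);

-- more than one item, so not a singleton
SELECT JSON_VALUE(jsonb '{"a": 1, "b": 2}', '$.*' ERROR ON ERROR);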


errhint("Use WITH WRAPPER clause to wrap SQL/JSON item sequence into array.")));
maybe
errhint("Use WITH WRAPPER clause to wrap SQL/JSON items into array.")));
or
errhint("Use WITH WRAPPER clause to wrap SQL/JSON item sequences into
array.")));
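
For context, a minimal sketch of when this hint is raised:

-- multiple items without a wrapper raise the error that carries this hint
SELECT JSON_QUERY(jsonb '[1, 2, 3]', '$[*]' ERROR ON ERROR);

-- wrapping the returned item sequence into an array avoids it
SELECT JSON_QUERY(jsonb '[1, 2, 3]', '$[*]' WITH WRAPPER);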


+ if (column_name)
+ ereport(ERROR,
+ (errcode(ERRCODE_MORE_THAN_ONE_SQL_JSON_ITEM),
+ errmsg("JSON path expression for column \"%s\" should return
singleton scalar item",
+ column_name)));
+ else
+ ereport(ERROR,
+ (errcode(ERRCODE_SQL_JSON_SCALAR_REQUIRED),
+ errmsg("JSON path expression in JSON_VALUE should return singleton
scalar item")));
the error messages seem similar, but the error codes are different,
in both the "if (count > 1)" and "if (!IsAJsonbScalar(res))" branches.


in src/include/utils/jsonpath.h, comments
/* SQL/JSON item */
should be
/* SQL/JSON query functions */


elog(ERROR, "unrecognized json wrapper %d", wrapper);
should be
elog(ERROR, "unrecognized json wrapper %d", (int) wrapper);




Re: Can't find not null constraint, but \d+ shows that

2024-04-10 Thread jian he
On Wed, Apr 10, 2024 at 7:01 PM Alvaro Herrera  wrote:
>
> On 2024-Apr-10, jian he wrote:
>
> > another related bug, in master.
> >
> > drop table if exists notnull_tbl1;
> > CREATE TABLE notnull_tbl1 (c0 int not null, c1 int);
> > ALTER TABLE notnull_tbl1 ADD CONSTRAINT Q PRIMARY KEY(c0, c1);
> > \d+ notnull_tbl1
> > ALTER TABLE notnull_tbl1 ALTER c0 DROP NOT NULL;
> > ALTER TABLE notnull_tbl1 ALTER c1 DROP NOT NULL;
> >
> > "ALTER TABLE notnull_tbl1 ALTER c0 DROP NOT NULL;"
> > should fail?
>
> No, this should not fail, and it is working correctly in master.  You
> can drop the not-null constraint, but the column will still be
> non-nullable, because the primary key still exists.  If you drop the
> primary key later, then the column becomes nullable.  This is by design.
>

Now I get it: the second time, it will fail.
That should be the expected behavior.

per commit:
https://git.postgresql.org/cgit/postgresql.git/commit/?id=14dd0f27d7cd56ffae9ecdbe324965073d01a9ff
In the function dropconstraint_internal, I changed "foreach" to
"foreach_int" in some places,
and made other minor cosmetic changes, within the function
dropconstraint_internal only.

Since I saw your changes in dropconstraint_internal, I posted here.
I will review your latest patch later.
From dd41bf8d1f2dea5725909150609f8e565e11efcf Mon Sep 17 00:00:00 2001
From: jian he 
Date: Wed, 10 Apr 2024 20:49:04 +0800
Subject: [PATCH v1 1/1] minor cosmetic refactor in dropconstraint_internal

mainly change `foreach` to `foreach_int` while iterating through list unconstrained_cols.
---
 src/backend/commands/tablecmds.c | 14 ++
 1 file changed, 6 insertions(+), 8 deletions(-)

diff --git a/src/backend/commands/tablecmds.c b/src/backend/commands/tablecmds.c
index 8a02c5b0..8dc623e8 100644
--- a/src/backend/commands/tablecmds.c
+++ b/src/backend/commands/tablecmds.c
@@ -12968,12 +12968,11 @@ dropconstraint_internal(Relation rel, HeapTuple constraintTup, DropBehavior beha
 	 * have had pg_attribute.attnotnull set.  See if we need to reset it, and
 	 * do so.
 	 */
-	if (unconstrained_cols)
+	if (unconstrained_cols != NIL)
 	{
 		Relation	attrel;
 		Bitmapset  *pkcols;
 		Bitmapset  *ircols;
-		ListCell   *lc;
 
 		/* Make the above deletion visible */
 		CommandCounterIncrement();
@@ -12989,9 +12988,8 @@ dropconstraint_internal(Relation rel, HeapTuple constraintTup, DropBehavior beha
 	   INDEX_ATTR_BITMAP_PRIMARY_KEY);
 		ircols = RelationGetIndexAttrBitmap(rel, INDEX_ATTR_BITMAP_IDENTITY_KEY);
 
-		foreach(lc, unconstrained_cols)
+		foreach_int(attnum, unconstrained_cols)
 		{
-			AttrNumber	attnum = lfirst_int(lc);
 			HeapTuple	atttup;
 			HeapTuple	contup;
 			Form_pg_attribute attForm;
@@ -13039,11 +13037,11 @@ dropconstraint_internal(Relation rel, HeapTuple constraintTup, DropBehavior beha
 			 * It's not valid to drop the not-null constraint for a column in
 			 * the replica identity index, either. (FULL is not affected.)
 			 */
-			if (bms_is_member(lfirst_int(lc) - FirstLowInvalidHeapAttributeNumber, ircols))
+			if (bms_is_member(attnum - FirstLowInvalidHeapAttributeNumber, ircols))
 ereport(ERROR,
 		errcode(ERRCODE_INVALID_TABLE_DEFINITION),
 		errmsg("column \"%s\" is in index used as replica identity",
-			   get_attname(RelationGetRelid(rel), lfirst_int(lc), false)));
+			   get_attname(RelationGetRelid(rel), attnum, false)));
 
 			/* Reset attnotnull */
 			if (attForm->attnotnull)
@@ -13233,10 +13231,10 @@ dropconstraint_internal(Relation rel, HeapTuple constraintTup, DropBehavior beha
 		 * Find out the list of column names to process.  Fortunately, we
 		 * already have the list of column numbers.
 		 */
-		foreach(lc, unconstrained_cols)
+		foreach_int(attnum, unconstrained_cols)
 		{
 			colnames = lappend(colnames, get_attname(RelationGetRelid(rel),
-	 lfirst_int(lc), false));
+	 attnum, false));
 		}
 
 		foreach(child, children)
-- 
2.34.1



Re: Can't find not null constraint, but \d+ shows that

2024-04-10 Thread jian he
another related bug, in master.

drop table if exists notnull_tbl1;
CREATE TABLE notnull_tbl1 (c0 int not null, c1 int);
ALTER TABLE notnull_tbl1 ADD CONSTRAINT Q PRIMARY KEY(c0, c1);
\d+ notnull_tbl1
ALTER TABLE notnull_tbl1 ALTER c0 DROP NOT NULL;
ALTER TABLE notnull_tbl1 ALTER c1 DROP NOT NULL;

"ALTER TABLE notnull_tbl1 ALTER c0 DROP NOT NULL;"
should fail?

I didn't investigate deeply enough.


Re: Can't find not null constraint, but \d+ shows that

2024-04-10 Thread jian he
On Wed, Apr 10, 2024 at 1:29 AM Alvaro Herrera  wrote:
>
> On 2024-Mar-29, Tender Wang wrote:
>
> > I think aboved case can explain what's meaning about comments in
> > dropconstraint_internal.
> > But here, in RemoveConstraintById() , we only care about primary key case,
> > so NOT NULL is better to removed from comments.
>
> Actually, I think it's better if all the resets of attnotnull occur in
> RemoveConstraintById, for both types of constraints; we would keep that
> block in dropconstraint_internal only to raise errors in the cases where
> the constraint is protecting a replica identity or a generated column.
> Something like the attached, perhaps, may need more polish.
>

DROP TABLE if exists notnull_tbl2;
CREATE TABLE notnull_tbl2 (c0 int generated by default as IDENTITY, c1 int);
ALTER TABLE notnull_tbl2 ADD CONSTRAINT Q PRIMARY KEY(c0, c1);
ALTER TABLE notnull_tbl2 DROP CONSTRAINT notnull_tbl2_c0_not_null;
ALTER TABLE notnull_tbl2 DROP c1;
\d notnull_tbl2

last "\d notnull_tbl2" command, master output is:
Table "public.notnull_tbl2"
 Column |  Type   | Collation | Nullable | Default
+-+---+--+--
 c0 | integer |   | not null | generated by default as identity



last "\d notnull_tbl2" command, applying
0001-Correctly-reset-attnotnull-when-constraints-dropped-.patch
output:
Table "public.notnull_tbl2"
 Column |  Type   | Collation | Nullable | Default
+-+---+--+--
 c0 | integer |   |  | generated by default as identity


there may also be similar issues with the replica identity.




sql/json remaining issue

2024-04-09 Thread jian he
hi.
`
 | NESTED [ PATH ] json_path_specification [ AS json_path_name ]
COLUMNS ( json_table_column [, ...] )
NESTED [ PATH ] json_path_specification [ AS json_path_name ] COLUMNS
( json_table_column [, ...] )
`
 "json_path_specification" should be "path_expression"?
--
drop table s1;
create or replace function random_text_1000() returns text
as $$select string_agg(md5(random()::text),'') from
generate_Series(1,1000) s $$ LANGUAGE SQL;

create unlogged table s1(a int GENERATED BY DEFAULT AS IDENTITY, js jsonb);
insert into s1(js)
select jsonb ('{"a":{"za":[{"z1": [11,]},{"z21": [22, 234,' || g
|| ']},{"z22": [32, 204,145]}]},"c": ' || g
|| ',"id": "' || random_text_1000() || '"}')
from generate_series(1_000_000, 1_000_000) g;
insert into s1(js)
select jsonb ('{"a":{"za":[{"z1": [11,]},{"z21": [22, 234,' || g
|| ']},{"z22": [32, 204,145]}]},"c": ' || g
|| ',"id": "' || random_text_1000() || '"}')
from generate_series(235, 235 + 20,1) g;

select count(*), pg_size_pretty(pg_total_relation_size('s1')) from s1;
 count  | pg_size_pretty
+
 22 | 6398 MB
--

explain(analyze, costs off,buffers, timing)
SELECT sub.*, s.a as s_a FROM s,
(values(23)) x(x),
generate_series(13, 13) y,
JSON_TABLE(js, '$' AS c1 PASSING x AS x, y AS y
COLUMNS (
xx1 int PATH '$.c',
NESTED PATH '$.a.za[2]' COLUMNS (NESTED PATH '$.z22[*]' as z22 COLUMNS
(c int PATH '$')),
NESTED PATH '$.a.za[1]' COLUMNS (d json PATH '$ ? (@.z21[*] == ($"x" -1))'),
NESTED PATH '$.a.za[0]' COLUMNS (NESTED PATH '$.z1[*] ? (@ >= ($"x"
-2))' as z1 COLUMNS (a int PATH '$')),
NESTED PATH '$.a.za[1]' COLUMNS
(NESTED PATH '$.z21[*] ? (@ >= ($"y" +121))' as z21 COLUMNS (b int
PATH '$ ? (@ <= ($"x" + 76))' default -1000 ON EMPTY))
)) sub;

one jsonb value can expand to 7 rows, so the above query will return
around 1.4 million rows.
I used the above query, and pg_log_backend_memory_contexts in another
session, to check the memory usage.
I didn't find any memory over-consumption issue.

but I am not sure this is the right way to check the memory consumption.
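
For reference, a sketch of the check from the second session (the PID below is
a placeholder; it can be looked up in pg_stat_activity):

-- find the backend running the JSON_TABLE query
SELECT pid, query FROM pg_stat_activity WHERE state = 'active';

-- log that backend's memory contexts to the server log (12345 is a placeholder)
SELECT pg_log_backend_memory_contexts(12345);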
--
begin;
SELECT sub.*, s.a as s_a FROM s,
(values(23)) x(x),
generate_series(13, 13) y,
JSON_TABLE(js, '$' AS c1 PASSING x AS x, y AS y
COLUMNS (
xx1 int PATH '$.c',
NESTED PATH '$.a.za[2]' COLUMNS (NESTED PATH '$.z22[*]' as z22 COLUMNS
(c int PATH '$')),
NESTED PATH '$.a.za[1]' COLUMNS (d json PATH '$ ? (@.z21[*] == ($"x" -1))'),
NESTED PATH '$.a.za[0]' COLUMNS (NESTED PATH '$.z1[*] ? (@ >= ($"x"
-2))' as z1 COLUMNS (a int PATH '$')),
NESTED PATH '$.a.za[1]' COLUMNS
(NESTED PATH '$.z21[*] ? (@ >= ($"y" +121))' as z21 COLUMNS (b int
PATH '$ ? (@ <= ($"x" + 76))' error ON EMPTY))
)) sub;
rollback;

only the last row will fail, because of "error ON EMPTY": (1_000_000
<= 23 + 76) is false.
I remember that a much earlier version of the patch took a lot of resources
because of error cleanup.
In our current implementation, where only the very last row fails, will it
be easy to clean up the transaction?


the last query error message is:
`
ERROR:  no SQL/JSON item
`

we are in ExecEvalJsonExprPath, can we output it to be:
`
ERROR:  after applying json_path "5s", no SQL/JSON item found
`
in a JSON_TABLE query, we can have multiple path_expressions, like the
above query, so
it's not easy to know which path_expression failed.




Re: remaining sql/json patches

2024-04-07 Thread jian he
On Mon, Apr 8, 2024 at 11:21 AM jian he  wrote:
>
> On Mon, Apr 8, 2024 at 12:34 AM jian he  wrote:
> >
> > On Sun, Apr 7, 2024 at 9:36 PM Amit Langote  wrote:
> > > 0002 needs an expanded commit message but I've run out of energy today.
> > >
>
> other than that, it looks good to me.

one more tiny issue.
+EXPLAIN (COSTS OFF, VERBOSE) SELECT * FROM jsonb_table_view1;
+ERROR:  relation "jsonb_table_view1" does not exist
+LINE 1: EXPLAIN (COSTS OFF, VERBOSE) SELECT * FROM jsonb_table_view1...
+   ^
maybe you want
EXPLAIN (COSTS OFF, VERBOSE) SELECT * FROM jsonb_table_view7;
at the end of the sqljson_jsontable.sql.
I guess it will be fine, but the format json explain output is quite big.

you also need to `drop table s cascade;` at the end of the test?




Re: remaining sql/json patches

2024-04-07 Thread jian he
On Mon, Apr 8, 2024 at 12:34 AM jian he  wrote:
>
> On Sun, Apr 7, 2024 at 9:36 PM Amit Langote  wrote:
> > 0002 needs an expanded commit message but I've run out of energy today.
> >
>

+/*
+ * Fetch next row from a JsonTablePlan's path evaluation result and from
+ * any child nested path(s).
+ *
+ * Returns true if the any of the paths (this or the nested) has more rows to
+ * return.
+ *
+ * By fetching the nested path(s)'s rows based on the parent row at each
+ * level, this essentially joins the rows of different levels.  If any level
+ * has no matching rows, the columns at that level will compute to NULL,
+ * making it an OUTER join.
+ */
+static bool
+JsonTablePlanScanNextRow(JsonTablePlanState *planstate)

"if the any"
should be
"if any" ?

also I think,
 + If any level
+ * has no matching rows, the columns at that level will compute to NULL,
+ * making it an OUTER join.
means
+ If any level rows do not match, the rows at that level will compute to NULL,
+ making it an OUTER join.

other than that, it looks good to me.




Re: remaining sql/json patches

2024-04-07 Thread jian he
On Sun, Apr 7, 2024 at 9:36 PM Amit Langote  wrote:
>
>
> 0002 needs an expanded commit message but I've run out of energy today.
>

some cosmetic issues in v51, 0002.

in struct JsonTablePathScan,
/* ERROR/EMPTY ON ERROR behavior */
bool errorOnError;

the comment doesn't seem right.
I think "errorOnError" means
whether "error on error" was specified when evaluating the top-level JSON path
expression?


+  | NESTED  PATH  ]
json_path_specification  AS
json_path_name  COLUMNS (
json_table_column ,
... )
 

"NESTED  PATH  ] "
the closing bracket is not needed.



+ /* Update the nested plan(s)'s row(s) using this new row. */
+ if (planstate->nested)
+ {
+ JsonTableResetNestedPlan(planstate->nested);
+ if (JsonTablePlanNextRow(planstate->nested))
+ return true;
+ }
+
  return true;
 }

this part can be simplified as:
+ if (planstate->nested)
+{
+ JsonTableResetNestedPlan(planstate->nested);
+ JsonTablePlanNextRow(planstate->nested);
+}
since in the last part, even if it returns false, the function eventually returns true.
Also, the comments seem slightly confusing?


The v51 recursive functions (JsonTablePlanNextRow, JsonTablePlanScanNextRow)
are far clearer than v50!
Thanks, I think I get it.




Re: remaining sql/json patches

2024-04-07 Thread jian he
On Sun, Apr 7, 2024 at 12:30 PM jian he  wrote:
>
> other than that, it looks good to me.
while looking at it again.

+ | NESTED path_opt Sconst
+ COLUMNS '(' json_table_column_definition_list ')'
+ {
+ JsonTableColumn *n = makeNode(JsonTableColumn);
+
+ n->coltype = JTC_NESTED;
+ n->pathspec = (JsonTablePathSpec *)
+ makeJsonTablePathSpec($3, NULL, @3, -1);
+ n->columns = $6;
+ n->location = @1;
+ $$ = (Node *) n;
+ }
+ | NESTED path_opt Sconst AS name
+ COLUMNS '(' json_table_column_definition_list ')'
+ {
+ JsonTableColumn *n = makeNode(JsonTableColumn);
+
+ n->coltype = JTC_NESTED;
+ n->pathspec = (JsonTablePathSpec *)
+ makeJsonTablePathSpec($3, $5, @3, @5);
+ n->columns = $8;
+ n->location = @1;
+ $$ = (Node *) n;
+ }
+ ;
+
+path_opt:
+ PATH
+ | /* EMPTY */
  ;

for `NESTED PATH`, `PATH` is optional.
So in the docs, in many places we need to replace `NESTED PATH` with `NESTED [ PATH ]`?




Re: remaining sql/json patches

2024-04-06 Thread jian he
hi.
about v50.
+/*
+ * JsonTableSiblingJoin -
+ * Plan to union-join rows of nested paths of the same level
+ */
+typedef struct JsonTableSiblingJoin
+{
+ JsonTablePlan plan;
+
+ JsonTablePlan *lplan;
+ JsonTablePlan *rplan;
+} JsonTableSiblingJoin;

"Plan to union-join rows of nested paths of the same level"
isn't the "same level" wording misleading?
I think it means
"Plan to union-join rows of top level columns clause is a nested path"

+ if (IsA(planstate->plan, JsonTableSiblingJoin))
+ {
+ /* Fetch new from left sibling. */
+ if (!JsonTablePlanNextRow(planstate->left))
+ {
+ /*
+ * Left sibling ran out of rows, fetch new from right sibling.
+ */
+ if (!JsonTablePlanNextRow(planstate->right))
+ {
+ /* Right sibling and thus the plan has now more rows. */
+ return false;
+ }
+ }
+ }
/* Right sibling and thus the plan has now more rows. */
I think you mean:
/* Right sibling ran out of rows and thus the plan has no more rows. */


in  section,
+  | NESTED PATH json_path_specification
 AS path_name 
+COLUMNS ( json_table_column
, ... )

maybe make it into one line.

  | NESTED PATH json_path_specification
 AS path_name  COLUMNS
( json_table_column ,
... )

since the surrounding pattern is the next line beginning with "[",
meaning that next line is optional.


+ at arbitrary nesting levels.
maybe
+ at arbitrary nested level.

in src/tools/pgindent/typedefs.list, "JsonPathSpec" is unnecessary.

other than that, it looks good to me.




Re: remaining sql/json patches

2024-04-06 Thread jian he
On Fri, Apr 5, 2024 at 8:35 PM Amit Langote  wrote:
>
> On Thu, Apr 4, 2024 at 9:02 PM Amit Langote  wrote:
> > I'll post the rebased 0002 tomorrow after addressing your comments.
>
> Here's one.  Main changes:
>
> * Fixed a bug in get_table_json_columns() which caused nested columns
> to be deparsed incorrectly, something Jian reported upthread.
> * Simplified the algorithm in JsonTablePlanNextRow()
>
> I'll post another revision or two maybe tomorrow, but posting what I
> have now in case Jian wants to do more testing.
>

+ else
+ {
+ /*
+ * Parent and thus the plan has no more rows.
+ */
+ return false;
+ }
in JsonTablePlanNextRow, the above comment seems strange to me.

+ /*
+ * Re-evaluate a nested plan's row pattern using the new parent row
+ * pattern, if present.
+ */
+ Assert(parent != NULL);
+ if (!parent->current.isnull)
+ JsonTableResetRowPattern(planstate, parent->current.value);
Is this assertion useful?
if parent is null, then parent->current.isnull will cause segmentation fault.

I tested with 3 NESTED PATHs, and it works! (I didn't fully understand
JsonTablePlanNextRow though.)
The docs need some polish work.




Re: remaining sql/json patches

2024-04-06 Thread jian he
On Sat, Apr 6, 2024 at 2:03 PM Amit Langote  wrote:
>
> >
> > * problem with type "char". the view def  output is not the same as
> > the select * from v1.
> >
> > create or replace view v1 as
> > SELECT col FROM s,
> > JSON_TABLE(jsonb '{"d": ["hello", "hello1"]}', '$' as c1
> > COLUMNS(col "char" path '$.d' without wrapper keep quotes))sub;
> >
> > \sv v1
> > CREATE OR REPLACE VIEW public.v1 AS
> >  SELECT sub.col
> >FROM s,
> > JSON_TABLE(
> > '{"d": ["hello", "hello1"]}'::jsonb, '$' AS c1
> > COLUMNS (
> > col "char" PATH '$."d"'
> > )
> > ) sub
> > one under the hood called JSON_QUERY_OP, another called JSON_VALUE_OP.
>
> Hmm, I don't see a problem as long as both are equivalent or produce
> the same result.  Though, perhaps we could make
> get_json_expr_options() also deparse JSW_NONE explicitly into "WITHOUT
> WRAPPER" instead of a blank.  But that's existing code, so will take
> care of it as part of the above open item.
>
> > I will do extensive checking for other types later, so far, other than
> > these two issues,
> > get_json_table_columns is pretty solid, I've tried nested columns with
> > nested columns, it just works.
>
> Thanks for checking.
>
After applying v50, this type also has some issues.
CREATE OR REPLACE VIEW t1 as
SELECT sub.* FROM JSON_TABLE(jsonb '{"d": ["hello", "hello1"]}',
'$' AS c1 COLUMNS (
"tsvector0" tsvector path '$.d' without wrapper omit quotes,
"tsvector1" tsvector path '$.d' without wrapper keep quotes))sub;
table t1;

return
tsvector0|tsvector1
-+-
 '"hello1"]' '["hello",' | '"hello1"]' '["hello",'
(1 row)

src5=# \sv t1
CREATE OR REPLACE VIEW public.t1 AS
 SELECT tsvector0,
tsvector1
   FROM JSON_TABLE(
'{"d": ["hello", "hello1"]}'::jsonb, '$' AS c1
COLUMNS (
tsvector0 tsvector PATH '$."d"' OMIT QUOTES,
tsvector1 tsvector PATH '$."d"'
)
) sub

but

 SELECT tsvector0,
tsvector1
   FROM JSON_TABLE(
'{"d": ["hello", "hello1"]}'::jsonb, '$' AS c1
COLUMNS (
tsvector0 tsvector PATH '$."d"' OMIT QUOTES,
tsvector1 tsvector PATH '$."d"'
)
) sub

only returns
tsvector0| tsvector1
-+---
 '"hello1"]' '["hello",' |




Re: remaining sql/json patches

2024-04-05 Thread jian he
On Fri, Apr 5, 2024 at 8:35 PM Amit Langote  wrote:
> Here's one.  Main changes:
>
> * Fixed a bug in get_table_json_columns() which caused nested columns
> to be deparsed incorrectly, something Jian reported upthread.
> * Simplified the algorithm in JsonTablePlanNextRow()
>
> I'll post another revision or two maybe tomorrow, but posting what I
> have now in case Jian wants to do more testing.

I am using the upthread view validation function.
By comparing `execute the view definition` and `select * from the_view`,
I found 2 issues.

* problem in transformJsonBehavior, JSON_BEHAVIOR_DEFAULT branch.
I think we can fix this problem later, since the SQL/JSON query functions
are already committed?

CREATE DOMAIN jsonb_test_domain AS text CHECK (value <> 'foo');
normally, we do:
SELECT JSON_VALUE(jsonb '{"d1": "H"}', '$.a2' returning
jsonb_test_domain DEFAULT 'foo' ON ERROR);

but when parsing back the view definition, we get:
SELECT JSON_VALUE(jsonb '{"d1": "H"}', '$.a2' returning
jsonb_test_domain DEFAULT 'foo'::text::jsonb_test_domain ON ERROR);

then I found the following two queries, which should not error out:
SELECT JSON_VALUE(jsonb '{"d1": "H"}', '$.a2' returning
jsonb_test_domain DEFAULT 'foo1'::text::jsonb_test_domain ON ERROR);
SELECT JSON_VALUE(jsonb '{"d1": "H"}', '$.a2' returning
jsonb_test_domain DEFAULT 'foo1'::jsonb_test_domain ON ERROR);


* problem with type "char": the view definition output is not the same as
the output of select * from v1.

create or replace view v1 as
SELECT col FROM s,
JSON_TABLE(jsonb '{"d": ["hello", "hello1"]}', '$' as c1
COLUMNS(col "char" path '$.d' without wrapper keep quotes))sub;

\sv v1
CREATE OR REPLACE VIEW public.v1 AS
 SELECT sub.col
   FROM s,
JSON_TABLE(
'{"d": ["hello", "hello1"]}'::jsonb, '$' AS c1
COLUMNS (
col "char" PATH '$."d"'
)
) sub
one under the hood calls JSON_QUERY_OP, the other calls JSON_VALUE_OP.

I will do extensive checking for other types later. So far, other than
these two issues,
get_json_table_columns is pretty solid; I've tried nested columns within
nested columns, and it just works.




Re: add function argument names to regex* functions.

2024-04-04 Thread jian he
On Wed, Apr 3, 2024 at 4:45 AM Tom Lane  wrote:
>
> jian he  writes:
> > On Thu, Jan 18, 2024 at 4:17 PM Peter Eisentraut  
> > wrote:
> >> Reading back through the discussion, I wasn't quite able to interpret
> >> the resolution regarding Oracle compatibility.  From the patch, it looks
> >> like you chose not to adopt the parameter names from Oracle.  Was that
> >> your intention?
>
> > per committee message:
> > https://git.postgresql.org/cgit/postgresql.git/commit/?id=6424337073589476303b10f6d7cc74f501b8d9d7
> > Even if the names are all the same, our function is still not the same
> > as oracle.
>
> The fact that there's minor discrepancies in the regex languages
> doesn't seem to me to have a lot of bearing on whether we should
> follow Oracle's choices of parameter names.
>
> However, if we do follow Oracle, it seems like we should do that
> consistently, which this patch doesn't.  For instance, per [1]
> Oracle calls the arguments of regex_substr
>
> source_char,
> pattern,
> position,
> occurrence,
> match_param,
> subexpr
>
> while we have
>
> string,
> pattern,
> start,
> N,
> flags,
> subexpr
>
> The patch proposes to replace "N" with "occurrence" but not touch
> the other discrepancies, which seems to me to be a pretty poor
> choice.  "occurrence" is very long and difficult to spell correctly,
> and if you're not following Oracle slavishly, exactly what is the
> argument in its favor?  I quite agree that Oracle's other choices
> aren't improvements over ours, but neither is that one.
>
> On the whole my inclination would be to stick to the names we have
> in the documentation.  There might be an argument for changing "N"
> to something lower-case so you don't have to quote it; but if we do,
> I'd go for, say, "count".
>

we have
---
The replacement string can contain \n, where n is 1 through 9, to
indicate that the source substring matching the n'th parenthesized
subexpression of the pattern should be inserted, and it can contain \&
to indicate that the substring matching the entire pattern should be
inserted.

in the regexp_replace explanation section.
changing "N" to lower-case would be misleading for regexp_replace?
so I choose "count".

By the way, I think the above paragraph is quite hard to comprehend.
The only related tests I can find in src/test/regress/sql/strings.sql are:
SELECT regexp_replace('111222', E'(\\d{3})(\\d{3})(\\d{4})',
E'(\\1) \\2-\\3');
SELECT regexp_replace('foobarrbazz', E'(.)\\1', E'X\\', 'g');
SELECT regexp_replace('foobarrbazz', E'(.)\\1', E'XY', 'g');

but these tests don't seem very friendly.
Maybe we should add some simple examples to demonstrate the above paragraph.
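
For instance, something along these lines could work (a sketch, not taken from
the existing tests):

-- \1 and \2 insert the substrings matching the first and second
-- parenthesized subexpressions
SELECT regexp_replace('John Smith', '(\w+) (\w+)', '\2, \1');
--> Smith, John

-- \& inserts the substring matching the entire pattern
SELECT regexp_replace('abc 123', '\d+', '[\&]');
--> abc [123]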
From 372a0c6cb894194b819fc380efda179bf6d1055d Mon Sep 17 00:00:00 2001
From: jian he 
Date: Thu, 4 Apr 2024 21:50:13 +0800
Subject: [PATCH v4 1/1] add regex functions argument names.

Specifically, add function argument names to the following functions:
regexp_replace,
regexp_match,
regexp_matches,
regexp_count,
regexp_instr,
regexp_like,
regexp_substr,
regexp_split_to_table,
regexp_split_to_array
So it would be easier to understand these functions in psql via \df.
Now these functions can be called in different notations.

discussion: https://postgr.es/m/CACJufxG3NFKKsh6x4fRLv8h3V-HvN4W5dA%3DzNKMxsNcDwOKang%40mail.gmail.com
---
 doc/src/sgml/func.sgml  | 50 +++
 src/include/catalog/pg_proc.dat | 71 ++---
 2 files changed, 82 insertions(+), 39 deletions(-)

diff --git a/doc/src/sgml/func.sgml b/doc/src/sgml/func.sgml
index ff690113..57ad1624 100644
--- a/doc/src/sgml/func.sgml
+++ b/doc/src/sgml/func.sgml
@@ -3332,7 +3332,7 @@ SELECT NOT(ROW(table.*) IS NOT NULL) FROM TABLE; -- detect at least one null in
 
 regexp_instr ( string text, pattern text
  [, start integer
- [, N integer
+ [, count integer
  [, endoption integer
  [, flags text
  [, subexpr integer ] ] ] ] ] )
@@ -3340,7 +3340,7 @@ SELECT NOT(ROW(table.*) IS NOT NULL) FROM TABLE; -- detect at least one null in


 Returns the position within string where
-the N'th match of the POSIX regular
+the count'th match of the POSIX regular
 expression pattern occurs, or zero if there is
 no such match; see .

@@ -3446,14 +3446,14 @@ SELECT NOT(ROW(table.*) IS NOT NULL) FROM TABLE; -- detect at least one null in

 regexp_replace ( string te

Re: remaining sql/json patches

2024-04-04 Thread jian he
On Thu, Apr 4, 2024 at 3:50 PM jian he  wrote:
>
> On Thu, Apr 4, 2024 at 2:41 PM jian he  wrote:
> >
> > On Wed, Apr 3, 2024 at 8:39 PM Amit Langote  wrote:
> > >
> > > Attached updated patches.  I have addressed your doc comments on 0001,
> > > but not 0002 yet.

hi
some doc issues about v49, 0002.
+  Each
+  NESTED PATH clause can be used to generate one or more
+  columns using the data from a nested level of the row pattern, which can be
+  specified using a COLUMNS clause.
 maybe change to

+  Each
+  NESTED PATH clause can be used to generate one or more
+  columns using the data from an upper nested level of the row
pattern, which can be
+  specified using a COLUMNS clause


+   Child
+   columns may themselves contain a NESTED PATH
+   specifification thus allowing to extract data located at arbitrary nesting
+   levels.
maybe change to
+  Child
+  columns themselves  may contain a NESTED PATH
+   specification thus allowing to extract data located at any arbitrary nesting
+   level.


+
+ 
+ 
+  The following is a modified version of the above query to show the usage
+  of NESTED PATH for populating title and director
+  columns, illustrating how they are joined to the parent columns id and
+  kind:
+
+SELECT jt.* FROM
+ my_films,
+ JSON_TABLE ( js, '$.favorites[*] ? (@.films[*].director == $filter)'
+   PASSING 'Alfred Hitchcock' AS filter
+   COLUMNS (
+id FOR ORDINALITY,
+kind text PATH '$.kind',
+NESTED PATH '$.films[*]' COLUMNS (
+  title text FORMAT JSON PATH '$.title' OMIT QUOTES,
+  director text PATH '$.director' KEEP QUOTES))) AS jt;
+ id |   kind   |  title  |  director
++--+-+
+  1 | horror   | Psycho  | "Alfred Hitchcock"
+  2 | thriller | Vertigo | "Alfred Hitchcock"
+(2 rows)
+
+ 
+ 
+  The following is the same query but without the filter in the root
+  path:
+
+SELECT jt.* FROM
+ my_films,
+ JSON_TABLE ( js, '$.favorites[*]'
+   COLUMNS (
+id FOR ORDINALITY,
+kind text PATH '$.kind',
+NESTED PATH '$.films[*]' COLUMNS (
+  title text FORMAT JSON PATH '$.title' OMIT QUOTES,
+  director text PATH '$.director' KEEP QUOTES))) AS jt;
+ id |   kind   |  title  |  director
++--+-+
+  1 | comedy   | Bananas | "Woody Allen"
+  1 | comedy   | The Dinner Game | "Francis Veber"
+  2 | horror   | Psycho  | "Alfred Hitchcock"
+  3 | thriller | Vertigo | "Alfred Hitchcock"
+  4 | drama| Yojimbo | "Akira Kurosawa"
+(5 rows)
 

just found out that the query and the query's output condensed together.
in https://www.postgresql.org/docs/current/tutorial-window.html
the query we use , the output we use .
maybe we can do it the same way,
or we could just have one or two empty new lines separate them.
we have the similar problem in v49, 0001.




Re: remaining sql/json patches

2024-04-04 Thread jian he
On Thu, Apr 4, 2024 at 2:41 PM jian he  wrote:
>
> On Wed, Apr 3, 2024 at 8:39 PM Amit Langote  wrote:
> >
> > Attached updated patches.  I have addressed your doc comments on 0001,
> > but not 0002 yet.
> >
>
about v49, 0002.

--tests setup.
drop table if exists s cascade;
create table s(js jsonb);
insert into s values
('{"a":{"za":[{"z1": [11,]},{"z21": [22, 234,2345]},{"z22": [32,
204,145]}]},"c": 3}'),
('{"a":{"za":[{"z1": [21,4222]},{"z21": [32, 134,1345]}]},"c": 10}');

after playing around, I found that the non-nested columns will be sorted first,
and the nested columns will be ordered as is.
In the query below, column "xx1" will be the first column and "xx" will be
the second column.

SELECT sub.* FROM s,(values(23)) x(x),generate_series(13, 13) y,
JSON_TABLE(js, '$' as c1 PASSING x AS x, y AS y COLUMNS(
NESTED PATH '$.a.za[2]' as n3 columns (NESTED PATH '$.z22[*]' as z22
COLUMNS (c int path  '$')),
NESTED PATH '$.a.za[1]' as n4 columns (d int[] PATH '$.z21'),
NESTED PATH '$.a.za[0]' as n1 columns (NESTED PATH '$.z1[*]' as z1
COLUMNS (a int path  '$')),
xx1 int path '$.c',
NESTED PATH '$.a.za[1]' as n2 columns (NESTED PATH '$.z21[*]' as z21
COLUMNS (b int path '$')),
xx int path '$.c'
))sub;
maybe this behavior is fine, but there is no explanation for it?

--- a/src/tools/pgindent/typedefs.list
+++ b/src/tools/pgindent/typedefs.list
@@ -1327,6 +1327,7 @@ JsonPathMutableContext
 JsonPathParseItem
 JsonPathParseResult
 JsonPathPredicateCallback
+JsonPathSpec
this change is not needed.


+ if (scan->child)
+ get_json_table_nested_columns(tf, scan->child, context, showimplicit,
+  scan->colMax >= scan->colMin);
Apart from parse_jsontable.c, we only use colMin and colMax in get_json_table_columns.
Also, in parse_jsontable.c we set them via:

+ /* Start of column range */
+ colMin = list_length(tf->colvalexprs);

+ /* End of column range */
+ colMax = list_length(tf->colvalexprs) - 1;

maybe we could use a (bool *) flag to tag whether a JsonTableColumn is nested or not
in transformJsonTableColumns.

Currently colMin and colMax seem to make parsing back json_table (nested
path only) not work.

I also added some slightly complicated tests to prove that the PASSING
clause works at every level, and also that the multi-level nesting clause
works as intended.

As mentioned in the previous mail, parsing back a nested-columns
json_table expression
does not work as we expected.

So for the last view I added (jsonb_table_view7), the view definition is WRONG!!
The good news is that the output is what we expected, and the coverage is pretty high.


v1-0001-add-test-for-nested-path-clause-json_table.for_v49_0002
Description: Binary data


Re: remaining sql/json patches

2024-04-04 Thread jian he
On Wed, Apr 3, 2024 at 8:39 PM Amit Langote  wrote:
>
> Attached updated patches.  I have addressed your doc comments on 0001,
> but not 0002 yet.
>

in v49, 0002.
+\sv jsonb_table_view1
+CREATE OR REPLACE VIEW public.jsonb_table_view1 AS
+ SELECT id,
+a1,
+b1,
+a11,
+a21,
+a22
+   FROM JSON_TABLE(
+'null'::jsonb, '$[*]' AS json_table_path_0
+PASSING
+1 + 2 AS a,
+'"foo"'::json AS "b c"
+COLUMNS (
+id FOR ORDINALITY,
+a1 integer PATH '$."a1"',
+b1 text PATH '$."b1"',
+a11 text PATH '$."a11"',
+a21 text PATH '$."a21"',
+a22 text PATH '$."a22"',
+NESTED PATH '$[1]' AS p1
+COLUMNS (
+id FOR ORDINALITY,
+a1 integer PATH '$."a1"',
+b1 text PATH '$."b1"',
+a11 text PATH '$."a11"',
+a21 text PATH '$."a21"',
+a22 text PATH '$."a22"',
+NESTED PATH '$[*]' AS "p1 1"
+COLUMNS (
+id FOR ORDINALITY,
+a1 integer PATH '$."a1"',
+b1 text PATH '$."b1"',
+a11 text PATH '$."a11"',
+a21 text PATH '$."a21"',
+a22 text PATH '$."a22"'
+)
+),
+NESTED PATH '$[2]' AS p2
+COLUMNS (
+id FOR ORDINALITY,
+a1 integer PATH '$."a1"',
+b1 text PATH '$."b1"',
+a11 text PATH '$."a11"',
+a21 text PATH '$."a21"',
+a22 text PATH '$."a22"'
+NESTED PATH '$[*]' AS "p2:1"
+COLUMNS (
+id FOR ORDINALITY,
+a1 integer PATH '$."a1"',
+b1 text PATH '$."b1"',
+a11 text PATH '$."a11"',
+a21 text PATH '$."a21"',
+a22 text PATH '$."a22"'
+),
+NESTED PATH '$[*]' AS p22
+COLUMNS (
+id FOR ORDINALITY,
+a1 integer PATH '$."a1"',
+b1 text PATH '$."b1"',
+a11 text PATH '$."a11"',
+a21 text PATH '$."a21"',
+a22 text PATH '$."a22"'
+)
+)
+)
+)

Executing this view definition (not the "create view") produces a syntax error.
That means the ruleutils.c changes in v49, 0002 are wrong.
Also, the \sv output is quite long and not easy to validate.

We need a way to validate that the view definition is equivalent to
"select * from view",
so I added a view validation function for it.

We can put it in v49, 0001.
Since the json data type doesn't have an equality operator,
I made some minor changes so that the view validation function works with
jsonb_table_view2
jsonb_table_view3
jsonb_table_view4
jsonb_table_view5
jsonb_table_view6
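
The rough idea of the validation is something like this (a simplified
sketch, not the exact function in the attachment; EXCEPT ALL still needs
types with an equality operator, hence the json tweaks mentioned above):

create or replace function validate_view_def(v regclass) returns boolean as $$
declare
    tmpname text := 'validate_' || v::oid;
    diff    bigint;
begin
    -- re-create the view from its deparsed definition under a temporary name
    execute format('create temp view %I as %s', tmpname, pg_get_viewdef(v));
    -- the original view and the re-created one should return identical rows
    execute format('select count(*) from ((table %s except all table %I)
                     union all (table %I except all table %s)) d',
                   v, tmpname, tmpname, v) into diff;
    return diff = 0;
end;
$$ language plpgsql;

-- usage: select validate_view_def('jsonb_table_view2');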


v49-0001-validate-parsing-back-json_table.no-cfbot
Description: Binary data


Re: remaining sql/json patches

2024-04-03 Thread jian he
hi.
+  
+   json_table is an SQL/JSON function which
+   queries JSON data
+   and presents the results as a relational view, which can be accessed as a
+   regular SQL table. You can only use
json_table inside the
+   FROM clause of a SELECT,
+   UPDATE, DELETE, or
MERGE
+   statement.
+  

The only issue is that the MERGE synopsis doesn't have a
FROM clause.
Other than that, it's quite correct.
See the following test demo:
see following tests demo:

drop table ss;
create table ss(a int);
insert into ss select 1;
delete from ss using JSON_TABLE(jsonb '1', '$' COLUMNS (a int PATH '$'
) ERROR ON ERROR) jt where jt.a = 1;
insert into ss select 2;
update ss set a = 1 from JSON_TABLE(jsonb '2', '$' COLUMNS (a int PATH
'$')) jt where jt.a = 2;
DROP TABLE IF EXISTS target;
CREATE TABLE target (tid integer, balance integer) WITH
(autovacuum_enabled=off);
INSERT INTO target VALUES (1, 10),(2, 20),(3, 30);
MERGE INTO target USING JSON_TABLE(jsonb '2', '$' COLUMNS (a int PATH
'$' ) ERROR ON ERROR) source(sid)
ON target.tid = source.sid
WHEN MATCHED THEN UPDATE SET balance = 0
returning *;
--

+  
+   To split the row pattern into columns, json_table
+   provides the COLUMNS clause that defines the
+   schema of the created view. For each column, a separate path expression
+   can be specified to be evaluated against the row pattern to get a
+   SQL/JSON value that will become the value for the specified column in
+   a given output row.
+  
should be "an SQL/JSON".

+
+ Inserts a SQL/JSON value obtained by applying
+ path_expression against the row pattern into
+ the view's output row after coercing it to specified
+ type.
+
should be "an SQL/JSON".

"coercing it to specified type"
should be
"coercing it to the specified type"?
---
+
+ The value corresponds to whether evaluating the PATH
+ expression yields any SQL/JSON values.
+
maybe we can change to
+
+ The value corresponds to whether applying the
path_expression
+ expression yields any SQL/JSON values.
+
so it looks more consistent with the preceding paragraph.

+
+ Optionally, ON ERROR can be used to specify whether
+ to throw an error or return the specified value to handle structural
+ errors, respectively.  The default is to return a boolean value
+ FALSE.
+
we don't need "respectively" here?

+ if (jt->on_error &&
+ jt->on_error->btype != JSON_BEHAVIOR_ERROR &&
+ jt->on_error->btype != JSON_BEHAVIOR_EMPTY &&
+ jt->on_error->btype != JSON_BEHAVIOR_EMPTY_ARRAY)
+ ereport(ERROR,
+ errcode(ERRCODE_SYNTAX_ERROR),
+ errmsg("invalid ON ERROR behavior"),
+ errdetail("Only EMPTY or ERROR is allowed for ON ERROR in JSON_TABLE()."),
+ parser_errposition(pstate, jt->on_error->location));

errdetail("Only EMPTY or ERROR is allowed for ON ERROR in JSON_TABLE()."),
maybe change to something like:
`
errdetail("Only EMPTY or ERROR is allowed for ON ERROR in the
top-level JSON_TABLE() ").
`
I guess mentioning "top-level" is fine,
since "top-level" already has 19 appearances in functions-json.html.




Re: remaining sql/json patches

2024-04-03 Thread jian he
On Wed, Apr 3, 2024 at 3:15 PM jian he  wrote:
>
> On Wed, Apr 3, 2024 at 11:30 AM jian he  wrote:
> >
> > On Tue, Apr 2, 2024 at 9:57 PM Amit Langote  wrote:
> > >
> > > Please let me know if you have further comments on 0001.  I'd like to
> > > get that in before spending more energy on 0002.
> > >

more doc issues with v48, 0001 and 0002.

 The optional json_path_name serves as an
 identifier of the provided path_expression.
 The path name must be unique and distinct from the column names.

"path name" should be
json_path_name


git diff --check
doc/src/sgml/func.sgml:19192: trailing whitespace.
+ id |   kind   |  title  | director


+  
+   JSON data stored at a nested level of the row pattern can be extracted using
+   the NESTED PATH clause.  Each
+   NESTED PATH clause can be used to generate one or more
+   columns using the data from a nested level of the row pattern, which can be
+   specified using a COLUMNS clause.  Rows constructed from
+   such columns are called child rows and are joined
+   agaist the row constructed from the columns specified in the parent
+   COLUMNS clause to get the row in the final view.  Child
+   columns may themselves contain a NESTED PATH
+   specifification thus allowing to extract data located at arbitrary nesting
+   levels.  Columns produced by NESTED PATHs at the same
+   level are considered to be siblings and are joined
+   with each other before joining to the parent row.
+  

"agaist" should be "against".
"specifification" should be "specification".
+Rows constructed from
+   such columns are called child rows and are joined
+   agaist the row constructed from the columns specified in the parent
+   COLUMNS clause to get the row in the final view.
This sentence is long and not easy to comprehend; maybe we can rephrase it
or split it into two.



+  | NESTED PATH json_path_specification
 AS path_name 
+COLUMNS ( json_table_column
, ... )
In the v48, 0002 patch, in the json_table synopsis section, putting these
two lines on one line would, I think, make it more readable.
Also, the following sgml code renders as one line in the html anyway.

  NESTED PATH
json_path_specification 
AS json_path_name

  COLUMNS (
json_table_column ,
... )


also path_name should be
json_path_name.



+
+ The NESTED PATH syntax is recursive,
+ so you can go down multiple nested levels by specifying several
+ NESTED PATH subclauses within each other.
+ It allows to unnest the hierarchy of JSON objects and arrays
+ in a single function invocation rather than chaining several
+ JSON_TABLE expressions in an SQL statement.
+
"The NESTED PATH syntax is recursive"
should be
`
The NESTED PATH syntax can be recursive,
you can go down multiple nested levels by specifying several
NESTED PATH subclauses within each other.
`




Re: remaining sql/json patches

2024-04-03 Thread jian he
On Wed, Apr 3, 2024 at 11:30 AM jian he  wrote:
>
> On Tue, Apr 2, 2024 at 9:57 PM Amit Langote  wrote:
> >
> > Please let me know if you have further comments on 0001.  I'd like to
> > get that in before spending more energy on 0002.
> >

-- a/src/backend/parser/parse_target.c
+++ b/src/backend/parser/parse_target.c
@@ -2019,6 +2019,9 @@ FigureColnameInternal(Node *node, char **name)
  case JSON_VALUE_OP:
  *name = "json_value";
  return 2;
+ case JSON_TABLE_OP:
+ *name = "json_table";
+ return 2;
  default:
  elog(ERROR, "unrecognized JsonExpr op: %d",
  (int) ((JsonFuncExpr *) node)->op);

"case JSON_TABLE_OP part", no need?
json_table output must provide column name and type?

I did some minor refactoring of transformJsonTableColumns to make the comments
align with the function's intention.
In v48-0001, in transformJsonTableColumns we can `Assert(rawc->name);`,
since a non-nested JsonTableColumn must specify a column name.
In v48-0002, we can change it to `if (rawc->coltype != JTC_NESTED)
Assert(rawc->name);`



SELECT * FROM JSON_TABLE(jsonb '1', '$' COLUMNS (a int PATH '$.a' )
ERROR ON ERROR) jt;
ERROR:  no SQL/JSON item

I thought it should just return NULL.
In this case, the table-level (not column-level) ERROR ON ERROR should not
interfere with "COLUMNS (a int PATH '$.a' )".
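A sketch of the kind of per-column handling I'd expect to keep working:

SELECT * FROM JSON_TABLE(jsonb '1', '$'
    COLUMNS (a int PATH '$.a' DEFAULT 0 ON EMPTY)) jt;
-- the missing member handled per column (yielding 0), no top-level error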

+-- Other miscellanous checks
"miscellanous" should be "miscellaneous".


overall the coverage is pretty high.
the current test output looks fine.


v48-0001-minor-refactor-transformJsonTableColumns.only_for_v48_0001
Description: Binary data


Re: remaining sql/json patches

2024-04-02 Thread jian he
On Tue, Apr 2, 2024 at 9:57 PM Amit Langote  wrote:
>
> Please let me know if you have further comments on 0001.  I'd like to
> get that in before spending more energy on 0002.
>

hi. some issues with the doc.
I think some of the occurrences of "path expression" can be replaced by
"path_expression",
though maybe not all of them.

+  
+   
+
+ context_item,
path_expression 
AS json_path_name
  PASSING {
value AS
varname } ,
...
+
+
+
+ The input data to query, the JSON path expression defining the query,
+ and an optional PASSING clause, which can provide data
+ values to the path_expression.
+ The result of the input data
+ evaluation is called the row pattern. The row
+ pattern is used as the source for row values in the constructed view.
+
+
+   

maybe
change this part "The input data to query, the JSON path expression
defining the query,"
to
`
context_item is the input data to query,
path_expression is the JSON path expression
defining the query,
`

+
+ Specifying FORMAT JSON makes it explcit that you
+ expect that the value to be a valid json object.
+
"explcit" change to "explicit", or should it be "explicitly"?
Also, FORMAT JSON can be overridden by OMIT QUOTES.
SELECT sub.* FROM JSON_TABLE('{"a":{"z1": "a"}}', '$.a' COLUMNS(xx
TEXT format json path '$.z1' omit quotes))sub;
it returns the unquoted literal 'a', which is not valid json.

create or replace FUNCTION test_format_json() returns table (thetype
text, is_ok bool) AS $$
declare
part1_sql text := $sql$SELECT sub.* FROM JSON_TABLE('{"a":{"z1":
"a"}}', '$.a'  COLUMNS(xx $sql$;
part2_sql text := $sql$ format json path '$.z1' omit quotes))sub $sql$;
run_status bool := true;
r record;
fin record;
BEGIN
for r in
select format_type(oid, -1) as aa
from pg_type where  typtype = 'b' and typarray != 0 and
typnamespace = 11 and typnotnull is false
loop
begin
-- raise notice '%',CONCAT_WS(' ', part1_sql, r.aa, part2_sql);
-- raise notice 'r.aa %', r.aa;
run_status := true;
execute CONCAT_WS(' ', part1_sql, r.aa, part2_sql) into fin;
return query select r.aa, run_status;
exception when others then
begin
run_status := false;
return query select r.aa, run_status;
end;
end;
end loop;
END;
$$ language plpgsql;
create table sss_1 as select * from test_format_json();
select * from sss_1 where is_ok is true;

Using the above query, I've figured out that FORMAT JSON can apply to the
following types:
bytea
name
text
json
bpchar
character varying
jsonb
and these type's customized domain type.

overall, the idea is that:
 Specifying FORMAT JSON makes it explicit that you
 expect the value to be a valid json object.
FORMAT JSON can be overridden by an OMIT QUOTES
specification, which can make the return value not valid
json.
FORMAT JSON can only work with certain kinds of
data types.
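For contrast, with KEEP QUOTES (or the default) the returned value stays
valid json, roughly:

SELECT sub.* FROM JSON_TABLE('{"a":{"z1": "a"}}', '$.a'
    COLUMNS(xx text format json path '$.z1' keep quotes)) sub;
-- returns "a" including the double quotes, which is still valid json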
---
+
+ Optionally, you can add ON ERROR clause to define
+ error behavior.
+
I think "error behavior" may refer to "what kind of error message it will omit"
but here, it's about what to do when an error happens.
so I guess it's misleading.

maybe we can explain it similarly to json_exists.
+
+ Optionally, you can add ON ERROR clause to define
+  the behavior if an error occurs.
+

+
+ The specified type should have a cast from the
+ boolean.
+
should be
+
+ The specified type should have a cast from the
+ boolean.
+


+
+ Inserts a SQL/JSON value into the output row.
+
maybe
+
+ Inserts a value that the data type is
type into the output row.
+

+
+ Inserts a boolean item into each output row.
+
maybe changed to:
+
+ Inserts a value that the data type is
type into the output row.
+

"name type EXISTS" branch mentioned: "The specified type should have a
cast from the boolean."
but "name type [FORMAT JSON [ENCODING UTF8]] [ PATH path_expression ]"
never mentioned the "type"parameter.
maybe add one para, something like:
"after apply path_expression, the yield value cannot be coerce to
type it will return null"
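For the EXISTS branch, a small example of the cast-from-boolean requirement
might help too (just a sketch):

SELECT * FROM JSON_TABLE(jsonb '{"a": 1}', '$'
    COLUMNS (has_a int EXISTS PATH '$.a', has_b int EXISTS PATH '$.b')) jt;
-- int has a cast from boolean, so this should yield 1 and 0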




Re: remaining sql/json patches

2024-04-02 Thread jian he
On Fri, Mar 22, 2024 at 12:08 AM Amit Langote  wrote:
>
> On Wed, Mar 20, 2024 at 9:53 PM Amit Langote  wrote:
> > I'll push 0001 tomorrow.
>
> Pushed that one.  Here's the remaining JSON_TABLE() patch.
>
I know v45 is very different from v47.
but v45 contains all the remaining features to be implemented.

I've attached 2 files.
v45-0001-propagate-passing-clause-to-every-json_ta.based_on_v45
after_apply_jsonpathvar.sql.

The first file should be applied after v45-0001-JSON_TABLE.patch;
the second file has all kinds of tests to prove that
applying JsonPathVariable to the NESTED PATH is OK.

I know that v45 is not the whole patch we are going to push for postgres17.
I just want to point out that applying the PASSING clause to the NESTED PATH
works fine with v45.
That means, I think, we can safely apply the PASSING clause to NESTED PATH for
the "PLAN DEFAULT clause", "specific PLAN clause" and "sibling
NESTED COLUMNS clauses" features.


v45-0001-propagate-passing-clause-to-every-json_ta.based_on_v45
Description: Binary data


after_apply_jsonpathvar.sql
Description: application/sql


Re: remaining sql/json patches

2024-04-02 Thread jian he
hi.

+/*
+ * Recursively transform child JSON_TABLE plan.
+ *
+ * Default plan is transformed into a cross/union join of its nested columns.
+ * Simple and outer/inner plans are transformed into a JsonTablePlan by
+ * finding and transforming corresponding nested column.
+ * Sibling plans are recursively transformed into a JsonTableSibling.
+ */
+static Node *
+transformJsonTableChildPlan(JsonTableParseContext *cxt,
+ List *columns)
This comment no longer matches what the function does;
maybe we need to update it.


/*
* Each call to fetch a new set of rows - of which there may be very many
* if XMLTABLE is being used in a lateral join - will allocate a possibly
* substantial amount of memory, so we cannot use the per-query context
* here. perTableCxt now serves the same function as "argcontext" does in
* FunctionScan - a place to store per-one-call (i.e. one result table)
* lifetime data (as opposed to per-query or per-result-tuple).
*/
MemoryContextSwitchTo(tstate->perTableCxt);

maybe we can change "XMLTABLE" to "XMLTABLE or JSON_TABLE"?



/* Transform and coerce the PASSING arguments to to jsonb. */
there should be only one "to"?

---
json_table_column clause doesn't have a passing clause.
we can only have one passing clause in json_table.
but during JsonTableInitPathScan, for each output column's associated
JsonTablePlanState
we already initialize the PASSING arguments via `planstate->args = args;`
also transformJsonTableColumn already has a passingArgs argument.
technically we can use the jsonpath variable for every output column
regardless of whether it's nested or not.

JsonTable already has the "passing" clause;
we just need to pass it to transformJsonTableColumns and its callees.
Based on that, I implemented it; it seems quite straightforward.
I also wrote several contrived, slightly complicated tests.
It seems to work just fine.

simple explanation:
previously the following SQL would fail with the error message "could
not find jsonpath variable %s";
now it works.

SELECT sub.* FROM
JSON_TABLE(jsonb '{"a":{"za":[{"z1": [11,]},{"z21": [22,
234,2345]}]},"c": 3}',
'$' PASSING 22 AS x, 234 AS y
COLUMNS(
xx int path '$.c',
NESTED PATH '$.a.za[1]' as n1 columns
(NESTED PATH '$.z21[*]' as n2
COLUMNS (z21 int path '$?(@ == $"x" || @ == $"y" )' default 0 on empty)),
NESTED PATH '$.a.za[0]' as n4 columns
(NESTED PATH '$.z1[*]' as n3
COLUMNS (z1 int path '$?(@ > $"y" + 1988)' default 0 on empty)))
)sub;


v47-0001-propagate-passing-clause-to-every-json_table_.no-cfbot
Description: Binary data


Re: Emitting JSON to file using COPY TO

2024-04-01 Thread jian he
On Sat, Mar 9, 2024 at 9:13 AM jian he  wrote:
>
> On Sat, Mar 9, 2024 at 2:03 AM Joe Conway  wrote:
> >
> > On 3/8/24 12:28, Andrey M. Borodin wrote:
> > > Hello everyone!
> > >
> > > Thanks for working on this, really nice feature!
> > >
> > >> On 9 Jan 2024, at 01:40, Joe Conway  wrote:
> > >>
> > >> Thanks -- will have a look
> > >
> > > Joe, recently folks proposed a lot of patches in this thread that seem 
> > > like diverted from original way of implementation.
> > > As an author of CF entry [0] can you please comment on which patch 
> > > version needs review?
> >
> >
> > I don't know if I agree with the proposed changes, but I have also been
> > waiting to see how the parallel discussion regarding COPY extensibility
> > shakes out.
> >
> > And there were a couple of issues found that need to be tracked down.
> >
> > Additionally I have had time/availability challenges recently.
> >
> > Overall, chances seem slim that this will make it into 17, but I have
> > not quite given up hope yet either.
>
> Hi.
> summary changes I've made in v9 patches at [0]
>
> meta: rebased. Now you need to use `git apply` or `git am`, previously
> copyto_json.007.diff, you need to use GNU patch.
>
>
> at [1], Dean Rasheed found some corner cases when the returned slot's
> tts_tupleDescriptor
> from
> `
> ExecutorRun(cstate->queryDesc, ForwardScanDirection, 0, true);
> processed = ((DR_copy *) cstate->queryDesc->dest)->processed;
> `
> cannot be used for composite_to_json.
> generally DestReceiver->rStartup is to send the TupleDesc to the DestReceiver,
> The COPY TO DestReceiver's rStartup function is copy_dest_startup,
> however copy_dest_startup is a no-op.
> That means to make the final TupleDesc of COPY TO (FORMAT JSON)
> operation bullet proof,
> we need to copy the tupDesc from CopyToState's queryDesc.
> This only applies to when the COPY TO source is a query (example:
> copy (select 1) to stdout), not a table.
> The above is my interpretation.
>

trying to simplify the explanation.
first refer to the struct DestReceiver.
For COPY TO (FORMAT JSON), we don't send the preliminary TupleDesc to the
DestReceiver
via the rStartup function pointer within struct _DestReceiver.

In `CopyOneRowTo(CopyToState cstate, TupleTableSlot *slot)`,
the slot is the final slot returned by query execution,
but we cannot use its tupdesc (slot->tts_tupleDescriptor) to do
composite_to_json,
because the final returned slot's TupleDesc may change during query execution.

so we need to copy the tupDesc from CopyToState's queryDesc.

Also rebased; now it applies cleanly.
From 17d9f3765bb74d863e229afed59af7dd923f5379 Mon Sep 17 00:00:00 2001
From: jian he 
Date: Mon, 1 Apr 2024 19:47:40 +0800
Subject: [PATCH v10 1/2] introduce json format for COPY TO operation.  json
 format is only allowed in COPY TO operation.  also cannot be used with header
 option.

 discussion: https://postgr.es/m/CALvfUkBxTYy5uWPFVwpk_7ii2zgT07t3d-yR_cy4sfrrLU%3Dkcg%40mail.gmail.com
 discussion: https://postgr.es/m/6a04628d-0d53-41d9-9e35-5a8dc302c...@joeconway.com
---
 doc/src/sgml/ref/copy.sgml |   5 ++
 src/backend/commands/copy.c|  13 +++
 src/backend/commands/copyto.c  | 125 -
 src/backend/parser/gram.y  |   8 ++
 src/backend/utils/adt/json.c   |   5 +-
 src/include/commands/copy.h|   1 +
 src/include/utils/json.h   |   2 +
 src/test/regress/expected/copy.out |  54 +
 src/test/regress/sql/copy.sql  |  38 +
 9 files changed, 208 insertions(+), 43 deletions(-)

diff --git a/doc/src/sgml/ref/copy.sgml b/doc/src/sgml/ref/copy.sgml
index 33ce7c4e..add84dbb 100644
--- a/doc/src/sgml/ref/copy.sgml
+++ b/doc/src/sgml/ref/copy.sgml
@@ -218,9 +218,14 @@ COPY { table_name [ ( text,
   csv (Comma Separated Values),
+  json (JavaScript Object Notation),
   or binary.
   The default is text.
  
+ 
+  The json option is allowed only in
+  COPY TO.
+ 
 

 
diff --git a/src/backend/commands/copy.c b/src/backend/commands/copy.c
index f75e1d70..02f16d9e 100644
--- a/src/backend/commands/copy.c
+++ b/src/backend/commands/copy.c
@@ -497,6 +497,8 @@ ProcessCopyOptions(ParseState *pstate,
  /* default format */ ;
 			else if (strcmp(fmt, "csv") == 0)
 opts_out->csv_mode = true;
+			else if (strcmp(fmt, "json") == 0)
+opts_out->json_mode = true;
 			else if (strcmp(fmt, "binary") == 0)
 opts_out->binary = true;
 			else
@@ -740,6 +742,11 @@ ProcessCopyOptions(ParseState *pstate,
 (errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
  errmsg("cannot specify HEADER in BINARY mo

Re: remaining sql/json patches

2024-04-01 Thread jian he
On Mon, Apr 1, 2024 at 8:00 AM jian he  wrote:
>
> +-- Should fail (JSON arguments are not passed to column paths)
> +SELECT *
> +FROM JSON_TABLE(
> + jsonb '[1,2,3]',
> + '$[*] ? (@ < $x)'
> + PASSING 10 AS x
> + COLUMNS (y text FORMAT JSON PATH '$ ? (@ < $x)')
> + ) jt;
> +ERROR:  could not find jsonpath variable "x"
>
> the error message does not correspond to the comments intention.
> also "y text FORMAT JSON" should be fine?

Sorry for the noise, I've figured out why.

> only the second last example really using the PASSING clause.
> should the following query work just fine in this context?
>
> create table s(js jsonb);
> insert into s select '{"a":{"za":[{"z1": [11,]},{"z21": [22,
> 234,2345]}]},"c": 3}';
> SELECT sub.* FROM s,JSON_TABLE(js, '$' passing 11 AS "b c", 1 + 2 as y
> COLUMNS (xx int path '$.c ? (@ == $y)')) sub;
>
>
> I thought the json and text data type were quite similar.
> should these following two queries return the same result?
>
> SELECT sub.* FROM s, JSON_TABLE(js, '$' COLUMNS(
> xx int path '$.c',
> nested PATH '$.a.za[1]' columns (NESTED PATH '$.z21[*]' COLUMNS (a12
> jsonb path '$'))
> ))sub;
>
> SELECT sub.* FROM s,JSON_TABLE(js, '$' COLUMNS (
> c int path '$.c',
> NESTED PATH '$.a.za[1]' columns (z json path '$')
> )) sub;
Sorry for the noise, I've figured out why.

There are 12 appearances of "NESTED PATH" in sqljson_jsontable.sql,
but we don't have a real example of a NESTED PATH nested inside another
NESTED PATH, so I added some real tests for it.
I also added some tests for the PASSING clause.
please check the attachment.
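The shape I mean is roughly this (a contrived sketch, not the exact tests
in the attachment):

SELECT sub.* FROM JSON_TABLE(jsonb '{"a": [{"b": [1, 2]}, {"b": [3]}]}', '$'
    COLUMNS (NESTED PATH '$.a[*]' COLUMNS (
        NESTED PATH '$.b[*]' COLUMNS (v int PATH '$')))) sub;
-- one NESTED PATH directly inside another; expected rows: 1, 2, 3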


/*
 * JsonTableInitPlan
 * Initialize information for evaluating a jsonpath given in
 * JsonTablePlan
 */
static void
JsonTableInitPathScan(JsonTableExecContext *cxt,
  JsonTablePlanState *planstate,
  List *args, MemoryContext mcxt)
{
JsonTablePlan *plan = (JsonTablePlan *) planstate->plan;
int i;

planstate->path = DatumGetJsonPathP(plan->path->value->constvalue);
planstate->args = args;
planstate->mcxt = AllocSetContextCreate(mcxt, "JsonTableExecContext",
ALLOCSET_DEFAULT_SIZES);

/* No row pattern evaluated yet. */
planstate->currentRow = PointerGetDatum(NULL);
planstate->currentRowIsNull = true;

for (i = plan->colMin; i <= plan->colMax; i++)
cxt->colexprplans[i] = planstate;
}

JsonTableInitPathScan's work is just to initialize/assign struct
JsonTablePlanState's members.
Maybe we should fold JsonTableInitPathScan's work into JsonTableInitPlan
and also rename JsonTableInitPlan to "JsonTableInitPlanState" or
"InitJsonTablePlanState".



JsonTableSiblingJoin *join = (JsonTableSiblingJoin *) plan;
Just renaming the variable seems unnecessary?


v47-0001-add-more-json_table-tests.no-cfbot
Description: Binary data


Re: remaining sql/json patches

2024-03-31 Thread jian he
typedef struct JsonTableExecContext
{
int magic;
JsonTablePlanState *rootplanstate;
JsonTablePlanState **colexprplans;
} JsonTableExecContext;

IMHO, this kind of naming is a bit inconsistent;
"state" and "plan" are mixed together.
Maybe:

typedef struct JsonTableExecContext
{
int magic;
JsonTablePlanState *rootplanstate;
JsonTablePlanState **colexprstates;
} JsonTableExecContext;


+ cxt->colexprplans = palloc(sizeof(JsonTablePlanState *) *
+   list_length(tf->colvalexprs));
+
  /* Initialize plan */
- cxt->rootplanstate = JsonTableInitPlan(cxt, rootplan, args,
+ cxt->rootplanstate = JsonTableInitPlan(cxt, (Node *) rootplan, NULL, args,
CurrentMemoryContext);
I think the comment "Initialize plan" is not right; here we
initialize the rootplanstate (JsonTablePlanState)
and also, for each (non-ordinality) column, we initialize its
specific JsonTablePlanState.

 static void JsonTableRescan(JsonTablePlanState *planstate);
@@ -331,6 +354,9 @@ static Datum JsonTableGetValue(TableFuncScanState
*state, int colnum,
Oid typid, int32 typmod, bool *isnull);
 static void JsonTableDestroyOpaque(TableFuncScanState *state);
 static bool JsonTablePlanNextRow(JsonTablePlanState *planstate);
+static bool JsonTablePlanPathNextRow(JsonTablePlanState *planstate);
+static void JsonTableRescan(JsonTablePlanState *planstate);

JsonTableRescan is declared twice?



