Re: remaining sql/json patches

2023-07-23 Thread jian he
hi
Based on the v10*.patch set, some questions/ideas about the docs.

> json_exists ( context_item, path_expression [ PASSING { value AS varname } [, 
> ...]] [ RETURNING data_type ] [ { TRUE | FALSE | UNKNOWN | ERROR } ON ERROR ])
> Returns true if the SQL/JSON path_expression applied to the context_item 
> using the values yields any items. The ON ERROR clause specifies what is 
> returned if an error occurs. Note that if the path_expression is strict, an 
> error is generated if it yields no items. The default value is UNKNOWN which 
> causes a NULL result.

Only SELECT JSON_EXISTS(NULL::jsonb, '$'); will produce a NULL result.
In lax mode, if the path yields no items, false is returned and no
error is raised, even with ERROR ON ERROR.
The only case where an error happens is strict mode with ERROR ON
ERROR: select json_exists(jsonb '{"a": [1,2,3]}', 'strict $.b' error on error);

so I came up with the following:
Returns true if the SQL/JSON path_expression applied to the
context_item using the values yields any items. The ON ERROR clause
specifies what is returned if an error occurs; if not specified, the
default is to return false when the path yields no items.
Note that if the path_expression is strict and ERROR ON ERROR is
specified, an error is generated if it yields no items.
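
To make the described behavior concrete, here is a small sketch (queries
assume the v10 patch set applied; the results are the ones described above):

```sql
-- a NULL context_item is the only way to get a NULL result
SELECT JSON_EXISTS(NULL::jsonb, '$');

-- lax mode: no items => false, even with ERROR ON ERROR
SELECT JSON_EXISTS(jsonb '{"a": [1,2,3]}', '$.b' ERROR ON ERROR);

-- strict mode plus ERROR ON ERROR: no items => error
SELECT JSON_EXISTS(jsonb '{"a": [1,2,3]}', 'strict $.b' ERROR ON ERROR);
```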
--
/* --first branch of json_table_column spec.

name type [ PATH json_path_specification ]
[ { WITHOUT | WITH { CONDITIONAL | [UNCONDITIONAL] } } [ ARRAY
] WRAPPER ]
[ { KEEP | OMIT } QUOTES [ ON SCALAR STRING ] ]
[ { ERROR | NULL | DEFAULT expression } ON EMPTY ]
[ { ERROR | NULL | DEFAULT expression } ON ERROR ]
*/
I am not sure what " [ ON SCALAR STRING ]" means; there is no test covering it.
I wonder how to achieve the following query with json_table:
select json_query(jsonb '"world"', '$' returning text keep quotes) ;

the following case will fail.
SELECT * FROM JSON_TABLE(jsonb '"world"', '$' COLUMNS (item text PATH
'$' keep quotes ON SCALAR STRING ));
ERROR:  cannot use OMIT QUOTES clause with scalar columns
LINE 1: ...T * FROM JSON_TABLE(jsonb '"world"', '$' COLUMNS (item text ...
 ^
Shouldn't the error be: ERROR:  cannot use KEEP QUOTES clause with scalar columns?
Also, LINE 1 should read: SELECT * FROM JSON_TABLE(jsonb '"world"', '$' COLUMNS
(item text ...

quote from json_query:
> This function must return a JSON string, so if the path expression returns 
> multiple SQL/JSON items, you must wrap the result using the
> WITH WRAPPER clause.

I think the actual behavior is: if the RETURNING clause is not
specified, the returned data type is jsonb; if multiple SQL/JSON
items are returned and WITH WRAPPER is not specified, NULL is returned.
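
A small sketch of that (assuming the v10 patch semantics; the wrapped
result is what the WITH WRAPPER clause is for):

```sql
-- multiple items without WITH WRAPPER: NULL (with the default NULL ON ERROR)
SELECT JSON_QUERY(jsonb '[1,2,3]', '$[*]');

-- multiple items wrapped into a single jsonb array
SELECT JSON_QUERY(jsonb '[1,2,3]', '$[*]' WITH WRAPPER);
```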

quote from json_query:
>  The ON ERROR and ON EMPTY clauses have similar semantics to those clauses 
> for json_value.
quote from json_table:
> These clauses have the same syntax and semantics as for json_value and 
> json_query.

It would be better if the json_value syntax description explicitly
mentioned what happens on error or on empty when these clauses are
not specified.
-
> You can have only one ordinality column per table
but the regress test shows that you can have more than one ordinality column.

similar to here
https://git.postgresql.org/cgit/postgresql.git/tree/src/test/regress/expected/sqljson.out#n804
Maybe in file src/test/regress/sql/jsonb_sqljson.sql line 349, you could
also create a table first and insert corner-case data, then split the
very wide select query (more than 26 columns) into 4 smaller queries,
to make the expected results easier to view on the web.




Re: PG 16 draft release notes ready

2023-07-23 Thread Pavel Luzanov

Please consider adding an item to the psql section:

Add psql \drg command to display role grants and remove the "Member of" 
column from \du & \dg altogether (d65ddaca)


--
Pavel Luzanov
Postgres Professional: https://postgrespro.com





Re: PG 16 draft release notes ready

2023-07-23 Thread Michael Paquier
On Tue, Jul 04, 2023 at 05:32:07PM -0400, Bruce Momjian wrote:
> On Tue, Jul  4, 2023 at 03:31:05PM +0900, Michael Paquier wrote:
>> On Thu, May 18, 2023 at 04:49:47PM -0400, Bruce Momjian wrote:
>> Sawada-san has mentioned on twitter that fdd8937 is not mentioned in
>> the release notes, and it seems to me that he is right.  This is
>> described as a bug in the commit log, but it did not get backpatched
>> because of the lack of complaints.  Also, because we've removed
>> support for anything older than Windows 10 in PG16, this change was
>> very easy to do.
> 
> I did review this and wasn't sure exactly what I would describe.  It is
> saying huge pages will now work on some versions of Windows 10 but
> didn't before?

Windows 10 has always used a forced automated rolling upgrade process,
so there are not many versions older than 1703, I suppose.  I don't
know if large pages were working before 1703 where
FILE_MAP_LARGE_PAGES has been introduced, and I have never been able
to test that.  Honestly, I don't think that we need to be picky about
the version mentioned, as per the forced upgrade process done by
Microsoft.

So, my preference would be to keep it simple and add an item like "Fix
huge pages on Windows 10 and newer versions", with a potential
subnote "The backend sets a flag named FILE_MAP_LARGE_PAGES to allow
huge pages", though it is not really mandatory to go down to this
level of internals, either.
--
Michael


signature.asc
Description: PGP signature


Re: Use COPY for populating all pgbench tables

2023-07-23 Thread Michael Paquier
On Fri, Jul 21, 2023 at 12:22:06PM -0500, Tristan Partin wrote:
> v7 looks good from my perspective. Thanks for working through this patch
> with me. Much appreciated.

Cool.  I have applied the new tests for now to move on with this
thread.
--
Michael


signature.asc
Description: PGP signature


Re: Use of additional index columns in rows filtering

2023-07-23 Thread Tomas Vondra



On 7/21/23 21:17, Peter Geoghegan wrote:
> ...
>> But I was
>> thinking more about the costing part - if you convert the clauses in
>> some way, does that affect the reliability of estimates somehow?
> 
> Obviously, it doesn't affect the selectivity at all. That seems most
> important (you kinda said the same thing yourself).
> 

Sorry, I think I meant 'cost estimates', not the selectivity estimates.
If you convert the original "simple" clauses into the more complex list,
presumably we'd cost that differently, right? I may be entirely wrong,
but my intuition is that costing these tiny clauses will be much more
difficult than costing the original clauses.

>> If the
>> conversion from AND to OR makes the list of clauses more complex, that
>> might be an issue ...
> 
> That's definitely a concern. Even still, the biggest problem by far in
> this general area is selectivity estimation. Which, in a way, can be
> made a lot easier by this general approach.
> 
> Let's say we have the tenk1 table, with the same composite index as in
> my example upthread (on "(two,four,twenty)"). Further suppose you have
> a very simple query: "select count(*) from tenk1". On master (and with
> the patch) that's going to give you an index-only scan on the
> composite index (assuming it's the only index), which is quite
> efficient despite being a full index scan -- 11 buffer hits.
> 
> This much more complicated count(*) query is where it gets interesting:
> 
> select
>   count(*),
>   two,
>   four,
>   twenty
> from
>   tenk1_dyn_saop
> where
>   two in (0, 1)
>   and four in (1, 2, 3, 4)
>   and twenty in (1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15)
> group by
>   two,
>   four,
>   twenty
> order by
>   two,
>   four,
>   twenty;
> 
> It's inherently very difficult to predict how selective this query
> will be using basic statistics. But maybe it doesn't need to matter so
> much, so often.
> 
> The patch can execute this with an index-only scan + GroupAggregate.
> What ends up happening is that the patch gets 9 buffer hits -- so
> pretty close to 11. Master can use almost the same query plan (it uses
> quals but needs a hashagg + sort). It ends up getting 245 buffer hits
> -- vastly more than what we see for the full index scan case (and
> nothing to do with the sort/an issue with a limit). That's nearly as
> many hits as you'd get with a sequential scan. (BTW, I don't need to
> coax the query planner to get this result on master.)
> 
> With the patch you can vary the predicate in whatever way, so that the
> selectivity shifts up or down. Occasionally you'll get maybe one extra
> buffer access relative to the base full index scan case, but overall
> the patch makes the worst case look very much like a full index scan
> (plus some relatively tiny CPU overhead). This is just common sense,
> in a way; selectivities are always between 0.0 and 1.0. Why shouldn't
> we be able to think about it like that?
> 

Right, I agree with this reasoning in principle.

But I'm getting a bit lost regarding what's the proposed costing
strategy. It's hard to follow threads spanning days, with various other
distractions, etc.

In principle, I think:

a) If we estimate the scan to return almost everything (or rather, if we
expect it to visit almost the whole index), it makes perfect sense to
cost it as a full index scan.

b) What should we do if we expect to read only a fraction of the index?
If we're optimistic and cost it according to the estimates, but then
end up reading most of the index, how bad could it be (compared to the
optimal plan choice)? Similarly, if we're pessimistic/defensive and cost
it as a full index scan, how many "good" cases would we reject based on
the artificially high cost estimate?

I don't have a very good idea how sensitive the cost is to selectivity
changes, which I think is crucial for making judgments.

>> I wasn't really thinking about LIMIT, and I don't think it changes the
>> overall behavior very much (sure, it's damn difficult to estimate for
>> skewed data sets, but meh).
>>
>> The case I had in mind is something like this:
>>
>> CREATE TABLE t (a int, b int, c int);
>> CREATE INDEX ON t (a);
>> INSERT INTO t SELECT i, i, i FROM generate_series(1,100) s(i);
>>
>> SELECT * FROM t WHERE bad_qual(a) AND b < 1 AND c < 1 ORDER BY a;
>>
>> where bad_qual is expensive and matches almost all rows.
> 
> You must distinguish between quals that can become required scan keys
> (or can terminate the scan), and all other quals. This is really
> important for my pending SAOP patch, but I think it might be important
> here too. I wonder if the best place to address the possibility of
> such a regression is in the index AM itself.
> 
> Let's make your example a bit more concrete: let's assume that
> bad_qual is a very expensive integer comparison, against a column that
> has only one possible value. So now your example becomes:
> 
> CREATE TABLE t (a expensive_int, b int, c int);
> CREATE INDEX ON t (a);
> INSERT INTO t SEL

Re: Making Vars outer-join aware

2023-07-23 Thread Anton A. Melnikov

On 08.06.2023 19:58, Tom Lane wrote:

I think the right thing here is not either of your patches, but
to tweak adjust_relid_set() to not fail on negative oldrelid.
I'll go make it so.


Thanks! This fully solves the problem with ChangeVarNodes() that I wrote about above.



Hmm.  That implies that you're changing plan data structures around after
setrefs.c, which doesn't seem like a great design to me --- IMO that ought
to happen in PlanCustomPath, which will still see the original varnos.


My further research showed that it is possible to immediately set the
necessary varnos during custom_scan->scan.plan.targetlist creation and leave
custom_scan->custom_scan_tlist = NIL, rather than changing them later using
ChangeVarNodes().
This resulted in a noticeable code simplification.
Thanks a lot for pointing it out!

Sincerely yours,

--
Anton A. Melnikov
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company




Re: There should be a way to use the force flag when restoring databases

2023-07-23 Thread Ahmed Ibrahim
Hi everyone,

I have been working on this. This is a proposed patch for it so we have a
force option for DROPping the database.

I'd appreciate it if anyone can review.



On Thu, Jul 20, 2023 at 9:36 PM Gurjeet Singh  wrote:

> On Thu, Jul 20, 2023 at 2:10 AM Daniel Gustafsson  wrote:
> >
> > > On 19 Jul 2023, at 19:28, Gurjeet Singh  wrote:
> > >
> > > The docs for 'pg_restore --create` say "Create the database before
> > > restoring into it. If --clean is also specified, drop and recreate the
> > > target database before connecting to it."
> > >
> > > If we provided a force option, it may then additionally say: "If the
> > > --clean and --force options are specified, DROP DATABASE ... WITH
> > > FORCE command will be used to drop the database."
> >
> > pg_restore --clean refers to dropping any pre-existing database objects
> and not
> > just databases, but --force would only apply to databases.
> >
> > I wonder if it's worth complicating pg_restore with that when running
> dropdb
> > --force before pg_restore is an option for those wanting to use WITH
> FORCE.
>
> Fair point. But the same argument could've been applied to --clean
> option, as well; why overload the meaning of --clean and make it drop
> database, when a dropdb before pg_restore was an option.
>
> IMHO, if pg_restore offers to drop database, providing an option to
> the user to do it forcibly is not that much of a stretch, and within
> reason for the user to expect it to be there, like Joan did.
>
> Best regards,
> Gurjeet
> http://Gurje.et
>
>
>
diff --git a/src/bin/pg_dump/pg_backup.h b/src/bin/pg_dump/pg_backup.h
index aba780ef4b..aeec3c8dcb 100644
--- a/src/bin/pg_dump/pg_backup.h
+++ b/src/bin/pg_dump/pg_backup.h
@@ -154,6 +154,8 @@ typedef struct _restoreOptions
 	int			enable_row_security;
 	int			sequence_data;	/* dump sequence data even in schema-only mode */
 	int			binary_upgrade;
+
+	int			force;
 } RestoreOptions;
 
 typedef struct _dumpOptions
diff --git a/src/bin/pg_dump/pg_backup_archiver.c b/src/bin/pg_dump/pg_backup_archiver.c
index 39ebcfec32..cc13be841a 100644
--- a/src/bin/pg_dump/pg_backup_archiver.c
+++ b/src/bin/pg_dump/pg_backup_archiver.c
@@ -532,6 +532,16 @@ RestoreArchive(Archive *AHX)
  */
 if (*te->dropStmt != '\0')
 {
+	if (ropt->force){
+		char	   *dropStmt = pg_strdup(te->dropStmt);
+		PQExpBuffer ftStmt = createPQExpBuffer();
+		dropStmt[strlen(dropStmt) - 2] = ' ';
+		dropStmt[strlen(dropStmt) - 1] = '\0';
+		appendPQExpBufferStr(ftStmt, dropStmt);
+		appendPQExpBufferStr(ftStmt, "WITH(FORCE);");
+		te->dropStmt = ftStmt->data;
+	}
+	
 	if (!ropt->if_exists ||
 		strncmp(te->dropStmt, "--", 2) == 0)
 	{
diff --git a/src/bin/pg_dump/pg_restore.c b/src/bin/pg_dump/pg_restore.c
index 049a100634..02347e86fb 100644
--- a/src/bin/pg_dump/pg_restore.c
+++ b/src/bin/pg_dump/pg_restore.c
@@ -74,6 +74,7 @@ main(int argc, char **argv)
 	static int	no_security_labels = 0;
 	static int	no_subscriptions = 0;
 	static int	strict_names = 0;
+	static int	force = 0;
 
 	struct option cmdopts[] = {
 		{"clean", 0, NULL, 'c'},
@@ -123,6 +124,7 @@ main(int argc, char **argv)
 		{"no-publications", no_argument, &no_publications, 1},
 		{"no-security-labels", no_argument, &no_security_labels, 1},
 		{"no-subscriptions", no_argument, &no_subscriptions, 1},
+		{"force", no_argument, &force, 1},
 
 		{NULL, 0, NULL, 0}
 	};
@@ -351,6 +353,7 @@ main(int argc, char **argv)
 	opts->no_publications = no_publications;
 	opts->no_security_labels = no_security_labels;
 	opts->no_subscriptions = no_subscriptions;
+	opts->force = force;
 
 	if (if_exists && !opts->dropSchema)
 		pg_fatal("option --if-exists requires option -c/--clean");


multiple membership grants and information_schema.applicable_roles

2023-07-23 Thread Pavel Luzanov
I found that the multiple membership grants added in v16 affect the 
information_schema.applicable_roles view.


Examples are on master, but they work for v16 too.

Set up multiple membership of alice in bob:

postgres@postgres(17.0)=# \drg alice
              List of role grants
 Role name | Member of |   Options    | Grantor
-----------+-----------+--------------+----------
 alice     | bob       | INHERIT, SET | alice
 alice     | bob       | INHERIT, SET | charlie
 alice     | bob       | ADMIN        | postgres
(3 rows)

The applicable_roles view shows duplicates:

postgres@postgres(17.0)=# SELECT * FROM 
information_schema.applicable_roles WHERE grantee = 'alice';

 grantee | role_name | is_grantable
---------+-----------+--------------
 alice   | bob       | NO
 alice   | bob       | YES
(2 rows)

View definition:

postgres@postgres(17.0)=# \sv information_schema.applicable_roles
CREATE OR REPLACE VIEW information_schema.applicable_roles AS
 SELECT a.rolname::information_schema.sql_identifier AS grantee,
    b.rolname::information_schema.sql_identifier AS role_name,
    CASE
    WHEN m.admin_option THEN 'YES'::text
    ELSE 'NO'::text
    END::information_schema.yes_or_no AS is_grantable
   FROM ( SELECT pg_auth_members.member,
    pg_auth_members.roleid,
    pg_auth_members.admin_option
   FROM pg_auth_members
    UNION
 SELECT pg_database.datdba,
    pg_authid.oid,
    false
   FROM pg_database,
    pg_authid
  WHERE pg_database.datname = current_database() AND 
pg_authid.rolname = 'pg_database_owner'::name) m

 JOIN pg_authid a ON m.member = a.oid
 JOIN pg_authid b ON m.roleid = b.oid
  WHERE pg_has_role(a.oid, 'USAGE'::text);


I think that only one row with admin option should be returned.
This can be achieved by adding group by + bool_or to the inner select 
from pg_auth_members.


BEGIN;
BEGIN
postgres@postgres(17.0)=*# CREATE OR REPLACE VIEW 
information_schema.applicable_roles AS

SELECT a.rolname::information_schema.sql_identifier AS grantee,
    b.rolname::information_schema.sql_identifier AS role_name,
    CASE
    WHEN m.admin_option THEN 'YES'::text
    ELSE 'NO'::text
    END::information_schema.yes_or_no AS is_grantable
   FROM ( SELECT pg_auth_members.member,
    pg_auth_members.roleid,
    bool_or(pg_auth_members.admin_option) AS admin_option
   FROM pg_auth_members
   GROUP BY 1, 2
    UNION
 SELECT pg_database.datdba,
    pg_authid.oid,
    false
   FROM pg_database,
    pg_authid
  WHERE pg_database.datname = current_database() AND 
pg_authid.rolname = 'pg_database_owner'::name) m

 JOIN pg_authid a ON m.member = a.oid
 JOIN pg_authid b ON m.roleid = b.oid
  WHERE pg_has_role(a.oid, 'USAGE'::text);
CREATE VIEW
postgres@postgres(17.0)=*# SELECT * FROM 
information_schema.applicable_roles WHERE grantee = 'alice';

 grantee | role_name | is_grantable
---------+-----------+--------------
 alice   | bob       | YES
(1 row)

postgres@postgres(17.0)=*# ROLLBACK;
ROLLBACK

Should we add group by + bool_or to the applicable_roles view?

--
Pavel Luzanov
Postgres Professional: https://postgrespro.com





Re: Use of additional index columns in rows filtering

2023-07-23 Thread Peter Geoghegan
On Sun, Jul 23, 2023 at 5:04 AM Tomas Vondra
 wrote:
> Sorry, I think I meant 'cost estimates', not the selectivity estimates.
> If you convert the original "simple" clauses into the more complex list,
> presumably we'd cost that differently, right? I may be entirely wrong,
> but my intuition is that costing these tiny clauses will be much more
> difficult than costing the original clauses.

I think that that's definitely true (it is more difficult), but that
there may be a bigger picture.

> Right, I agree with this reasoning in principle.
>
> But I'm getting a bit lost regarding what's the proposed costing
> strategy.

To be clear, I don't have a clue how to better estimate the
cardinality of multiple attributes from a composite index. The big and
immediate change to the SAOP costing with my patch is that
genericcostestimate/btcostestimate can safely assume that visiting
each leaf page more than once is a physical impossibility. Because it
is. It is no longer necessary to treat SAOPs similarly to a nested
loop join during costing, which is how it works today.

Now, whenever you add increasingly complicated clauses to a
multi-column SAOP query (like the ones I've shown you), it makes sense
for the cost to "saturate" at a certain point. That should be
representative of the physical reality, for both CPU costs and I/O
costs. Right now the worst case is really relevant to the average
case, since the risk of the costs just exploding at runtime is very
real.

If the only problem in this area was the repeated accesses to the same
leaf pages (accesses that happen in very close succession anyway),
then all of this would be a nice win, but not much more. It certainly
wouldn't be expected to change the way we think about stuff that isn't
directly and obviously relevant. But, it's not just the index pages.
Once you start to consider the interactions with filter/qpquals, it
gets much more interesting. Now you're talking about completely
avoiding physical I/Os for heap accesses, which has the potential to
make a dramatic difference to some types of queries, particularly in
the worst case.

> It's hard to follow threads spanning days, with various other
> distractions, etc.

I have to admit that my thinking on this very high level stuff is
rather unrefined. As much as anything else, I'm trying to invent (or
discover) a shared vocabulary for discussing these issues. I might
have gone about it clumsily, at times. I appreciate being able to
bounce this stuff off you.

> I don't have a very good idea how sensitive the cost is to selectivity
> changes, which I think is crucial for making judgments.

I'm not trying to find a way for the optimizer to make better
judgments about costing with a multi-column index. What I'm suggesting
(rather tentatively) is to find a way for it to make fewer (even no)
judgements at all.

If you can find a way of reducing the number of possible choices
without any real downside -- in particular by just not producing index
paths that cannot possibly be a win in some cases -- then you reduce
the number of bad choices. The challenge is making that kind of
approach in the optimizer actually representative of the executor in
the real world. The executor has to have robust performance under a
variety of runtime conditions for any of this to make sense.

> > It's natural to think of things like this _bt_readpage optimization as
> > something that makes existing types of plan shapes run faster. But you
> > can also think of them as things that make new and fundamentally
> > better plan shapes feasible, by making risky things much less risky.
> >
>
> That'd be an interesting optimization, but I don't think that matters
> for this patch, as it's not messing with index scan keys at all. I mean,
> it does not affect what scan keys get passed to the AM at all, or what
> scan keys are required. And it does not influence what the AM does. So
> all this seems interesting, but rather orthogonal to this patch.

Your patch is approximately the opposite of what I'm talking about, in
terms of its structure. The important commonality is that each patch
adds "superfluous" quals that can be very useful at runtime, under the
right circumstances -- which can be hard to predict. Another
similarity is that both patches inspire some of the same kind of
lingering doubts about extreme cases -- cases where (somehow) the
usual/expected cost asymmetry that usually works in our favor doesn't
apply.

My current plan is to post v1 of my patch early next week. It would be
better to discuss this on the thread that I create for that patch.

You're right that "exploiting index ordering" on the index AM side is
totally unrelated to your patch. Sorry about that.

> I was rather thinking about maybe relaxing the rules about which index
> paths we create, to include indexes that might be interesting thanks to
> index-only filters (unless already picked thanks to sorting).

That seems like it would make sense. In general I think that we
ove

Re: multiple membership grants and information_schema.applicable_roles

2023-07-23 Thread Tom Lane
Pavel Luzanov  writes:
> The applicable_roles view shows duplicates:

> postgres@postgres(17.0)=# SELECT * FROM 
> information_schema.applicable_roles WHERE grantee = 'alice';
>  grantee | role_name | is_grantable
> ---------+-----------+--------------
>  alice   | bob       | NO
>  alice   | bob       | YES
> (2 rows)

AFAICT this is also possible with the SQL standard's definition
of this view, so I don't see a bug here:

CREATE RECURSIVE VIEW APPLICABLE_ROLES ( GRANTEE, ROLE_NAME, IS_GRANTABLE ) AS
( ( SELECT GRANTEE, ROLE_NAME, IS_GRANTABLE
FROM DEFINITION_SCHEMA.ROLE_AUTHORIZATION_DESCRIPTORS
WHERE ( GRANTEE IN
( CURRENT_USER, 'PUBLIC' )
 OR
GRANTEE IN
( SELECT ROLE_NAME
  FROM ENABLED_ROLES ) ) )
  UNION
  ( SELECT RAD.GRANTEE, RAD.ROLE_NAME, RAD.IS_GRANTABLE
FROM DEFINITION_SCHEMA.ROLE_AUTHORIZATION_DESCRIPTORS RAD
  JOIN
 APPLICABLE_ROLES R
ON
   RAD.GRANTEE = R.ROLE_NAME ) );

The UNION would remove rows only when they are duplicates across all
three columns.
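
A toy illustration of that point (plain SQL, unrelated to the catalogs):
UNION only removes rows that are duplicates across every column, so two
grants differing only in IS_GRANTABLE both survive:

```sql
SELECT 'alice' AS grantee, 'bob' AS role_name, 'NO' AS is_grantable
UNION
SELECT 'alice', 'bob', 'YES';
-- two rows, because the rows differ in the third column
```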

I do see what seems like a different issue: the standard appears to expect
that indirect role grants should also be shown (via the recursive CTE),
and we are not doing that.

regards, tom lane




[BUG] Crash on pgbench initialization.

2023-07-23 Thread Anton A. Melnikov

Hello!

My colleague Victoria Shepard reported that pgbench might crash
during initialization with some values of shared_buffers and
max_worker_processes in conf.

After some research, I found this happens when LimitAdditionalPins() returns
exactly zero.
In the current master, this will happen e.g. if shared_buffers = 10MB and 
max_worker_processes = 40.
Then the command "pgbench --initialize postgres" will lead to crash.
See the backtrace attached.

There is a fix in the attached patch. Please take a look at it.

With the best regards,

--
Anton A. Melnikov
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company

#1  0x7f9169557859 in __GI_abort () at abort.c:79
save_stage = 1
act = {__sigaction_handler = {sa_handler = 0x7f91696eabf7, sa_sigaction 
= 0x7f91696eabf7}, sa_mask = {__val = {1, 140262515851294, 3, 140727328224084, 
12, 140262515851298, 2, 3487533463194566656, 7292515507211941683, 
94490218952736, 7291953849184874368, 0, 6147461878018348800, 140727328224176, 
94490235583952, 140727328225056}}, sa_flags = 938440736, sa_restorer = 
0x7ffda268ef80}
sigs = {__val = {32, 0 }}
#2  0x55f03865f3a8 in ExceptionalCondition (conditionName=0x55f038846b8c 
"nblocks > 0", fileName=0x55f038846997 "md.c", lineNumber=534) at assert.c:66
No locals.
#3  0x55f038479e41 in mdzeroextend (reln=0x55f038ed6e38, 
forknum=MAIN_FORKNUM, blocknum=0, nblocks=0, skipFsync=false) at md.c:534
v = 0x55f038d96300
curblocknum = 0
remblocks = 0
__func__ = "mdzeroextend"
#4  0x55f03847c747 in smgrzeroextend (reln=0x55f038ed6e38, 
forknum=MAIN_FORKNUM, blocknum=0, nblocks=0, skipFsync=false) at smgr.c:525
No locals.
#5  0x55f03842fc72 in ExtendBufferedRelShared (eb=..., fork=MAIN_FORKNUM, 
strategy=0x55f038e1d7a8, flags=8, extend_by=0, extend_upto=4294967295, 
buffers=0x7ffda268ba30, extended_by=0x7ffda268b8fc) at bufmgr.c:2057
first_block = 0
io_context = IOCONTEXT_BULKWRITE
io_start = {ticks = 0}
__func__ = "ExtendBufferedRelShared"
#6  0x55f03842f512 in ExtendBufferedRelCommon (eb=..., fork=MAIN_FORKNUM, 
strategy=0x55f038e1d7a8, flags=8, extend_by=17, extend_upto=4294967295, 
buffers=0x7ffda268ba30, extended_by=0x7ffda268b9dc) at bufmgr.c:1805
first_block = 22000
#7  0x55f03842de78 in ExtendBufferedRelBy (eb=..., fork=MAIN_FORKNUM, 
strategy=0x55f038e1d7a8, flags=8, extend_by=17, buffers=0x7ffda268ba30, 
extended_by=0x7ffda268b9dc) at bufmgr.c:862
No locals.
#8  0x55f037f773fa in RelationAddBlocks (relation=0x7f91655d97b8, 
bistate=0x55f038e1d778, num_pages=17, use_fsm=false, did_unlock=0x7ffda268bb8d) 
at hio.c:324
victim_buffers = {1, 0, 953770752, 22000, -1570194736, 0, 955084344, 
22000, 955072168, 22000, 0, 0, -1570194800, 32765, 944220747, 22000, 16384, 0, 
955084344, 22000, 0, 0, 953770752, 22000, -1570194752, 32765, 944228643, 22000, 
-1570194752, 0, 955084344, 22000, 0, 0, 1700632504, 0, -1570194704, 32765, 
939296560, 22000, -1570194624, 0, 1700632504, 32657, 16384, 0, 0, 0, 
-1570194672, 32765, 943901980, 22000, 8000, 0, 1700632504, 32657, -1570194624, 
32765, 943923688, 22000, -1570194512, 0, 1700632504, 32657}
first_block = 4294967295
last_block = 4294967295
extend_by_pages = 17
not_in_fsm_pages = 17
buffer = 22000
page = 0xa268ba20 
__func__ = "RelationAddBlocks"
#9  0x55f037f77d01 in RelationGetBufferForTuple (relation=0x7f91655d97b8, 
len=128, otherBuffer=0, options=6, bistate=0x55f038e1d778, 
vmbuffer=0x7ffda268bc2c, vmbuffer_other=0x0, num_pages=17) at hio.c:749
use_fsm = false
buffer = 0
page = 0x6a268bc34 
nearlyEmptyFreeSpace = 8016
pageFreeSpace = 0
saveFreeSpace = 0
targetFreeSpace = 128
targetBlock = 4294967295
otherBlock = 4294967295
unlockedTargetBuffer = 127
recheckVmPins = false
__func__ = "RelationGetBufferForTuple"
#10 0x55f037f5e5e2 in heap_multi_insert (relation=0x7f91655d97b8, 
slots=0x55f038e37b08, ntuples=1000, cid=15, options=6, bistate=0x55f038e1d778) 
at heapam.c:2193
buffer = 32657
all_visible_cleared = false
all_frozen_set = false
nthispage = -1570194336
xid = 734
heaptuples = 0x55f038ed1e98
i = 1000
ndone = 0
scratch = {data = 
"мh\242\001\000\000\000\204\352}e\221\177\000\000\340\274h\242\375\177\000\000\v|F8\360U\000\000\020\275h\242\001\000\000\000\204\352}e\221\177\000\000\020\275h\242\375\177\000\000\211\233F8\360U\000\000\020\275h\001\000\000\224\223\200\352}e\221\177\000\000`\267\241\000\000\000\000
 
\000\000\000\000\001\000\000\000\260\275h\242\375\177\000\000\242\352B8\004\000\000\000P\275h\242\375\177\000\000\342\333J8\360U\000\000\000\000\000\000\002\000\000\000\000\000\000\000\004\000\000\000\000\000\000\000p\177\000\000\330\344\336\070\360U\000\000\200\275h\242\375\177\000\000\333\334J8\360

Re: Row pattern recognition

2023-07-23 Thread Vik Fearing

On 7/22/23 08:14, Tatsuo Ishii wrote:

On 7/22/23 03:11, Tatsuo Ishii wrote:

Maybe. Suppose a window function executes row pattern matching using
price > PREV(price). The window function already receives
WindowStatePerFuncData. If we can pass the WindowStatePerFuncData to
PREV, we could let PREV do the real work (getting previous tuple).
I have not tried this yet, though.


I don't understand this logic.  Window functions work over a window
frame.


Yes.


What we are talking about here is *defining* a window
frame.


Well, we are defining a "reduced" window frame within a (full) window
frame. A "reduced" window frame is calculated each time a window
function is called.



Why?  It should only be recalculated when the current row changes and we 
need a new frame.  The reduced window frame *is* the window frame for 
all functions over that window.




How can a window function execute row pattern matching?


A window function is called for each row fed by an outer plan. It
fetches the current, previous and next rows to execute pattern matching. If
it matches, the window function moves to the next row and repeats the
process, until the pattern match fails.

Below is an example window function that executes pattern matching (I
will include this in the v3 patch). row_is_in_reduced_frame() is a
function to execute pattern matching. It returns the number of rows in
the reduced frame if the pattern match succeeds. In that case, the function
returns the last row in the reduced frame instead of the last row in
the full window frame.



I strongly disagree with this.  Window functions do not need to know how 
the frame is defined, and indeed they should not.  WinGetFuncArgInFrame 
should answer yes or no and the window function just works on that. 
Otherwise we will get extension (and possibly even core) functions that 
don't handle the frame properly.

--
Vik Fearing





Re: Row pattern recognition

2023-07-23 Thread Tatsuo Ishii
>>> What we are talking about here is *defining* a window
>>> frame.
>> Well, we are defining a "reduced" window frame within a (full) window
>> frame. A "reduced" window frame is calculated each time when a window
>> function is called.
> 
> 
> Why?  It should only be recalculated when the current row changes and
> we need a new frame.  The reduced window frame *is* the window frame
> for all functions over that window.

We already recalculate a frame each time a row is processed even
without RPR. See ExecWindowAgg.

Also RPR always requires a frame option ROWS BETWEEN CURRENT ROW,
which means the frame head is changed each time current row position
changes.

> I strongly disagree with this.  Window function do not need to know
> how the frame is defined, and indeed they should not.

We already break the rule by defining *support functions. See
windowfuncs.c.

> WinGetFuncArgInFrame should answer yes or no and the window function
> just works on that. Otherwise we will get extension (and possibly even
> core) functions that don't handle the frame properly.

Maybe I can move row_is_in_reduced_frame into WinGetFuncArgInFrame
just for convenience.

Best regards,
--
Tatsuo Ishii
SRA OSS LLC
English: http://www.sraoss.co.jp/index_en/
Japanese:http://www.sraoss.co.jp




Re: Support worker_spi to execute the function dynamically.

2023-07-23 Thread Masahiro Ikeda

On 2023-07-22 01:05, Bharath Rupireddy wrote:
On Fri, Jul 21, 2023 at 4:05 PM Masahiro Ikeda 
 wrote:

(2)

Do we need "worker_spi.total_workers = 0" and
"shared_preload_libraries = worker_spi" in dynamic.conf?

Currently, the static bg workers will not be launched because
"shared_preload_libraries = worker_spi" is removed. So
"worker_spi.total_workers = 0" is meaningless.


You're right. worker_spi.total_workers = 0 in custom.conf has no
effect without shared_preload_libraries = worker_spi. Removed that.


OK. If so, we need to remove the following comment in Makefile.


# enable our module in shared_preload_libraries for dynamic bgworkers


I also confirmed that the tap tests work with meson and make.

Regards,
--
Masahiro Ikeda
NTT DATA CORPORATION




Re: doc: improve the restriction description of using indexes on REPLICA IDENTITY FULL table.

2023-07-23 Thread Masahiko Sawada
On Sat, Jul 22, 2023 at 7:32 PM Amit Kapila  wrote:
>
> On Fri, Jul 21, 2023 at 6:55 AM Masahiko Sawada  wrote:
> >
> > I've attached the updated patch. I'll push it early next week, barring
> > any objections.
> >
>
> You have moved most of the comments related to the restriction of
> which index can be picked atop IsIndexUsableForReplicaIdentityFull().
> Now, the comments related to limitation atop
> FindUsableIndexForReplicaIdentityFull() look slightly odd as it refers
> to limitations but those limitation were not stated. The comments I am
> referring to are: "Note that the limitations of index scans for
> replica identity full only  might not be a good idea in some
> cases". Shall we move these as well atop
> IsIndexUsableForReplicaIdentityFull()?

Good point.

Looking at neighbor comments, the following comment looks slightly odd to me:

 * XXX: See IsIndexUsableForReplicaIdentityFull() to know the challenges in
 * supporting indexes other than btree and hash. For partial indexes, the
 * required changes are likely to be larger. If none of the tuples satisfy
 * the expression for the index scan, we fall-back to sequential execution,
 * which might not be a good idea in some cases.

Are the first and second sentences related actually?

I think we can move it as well to
IsIndexUsableForReplicaIdentityFull() with some adjustments. I've
attached the updated patch that incorporated your comment and included
my idea to update the comment.

Regards,

--
Masahiko Sawada
Amazon Web Services: https://aws.amazon.com


v5-0001-Remove-unnecessary-checks-for-indexes-for-REPLICA.patch
Description: Binary data


Re: pg16b2: REINDEX segv on null pointer in RemoveFromWaitQueue

2023-07-23 Thread Masahiko Sawada
On Wed, Jul 12, 2023 at 8:52 PM Justin Pryzby  wrote:
>
> On Mon, Jul 10, 2023 at 09:01:37PM -0500, Justin Pryzby wrote:
> > An instance compiled locally, without assertions, failed like this:
> >
> ...
> >
> > => REINDEX was running, with parallel workers, but deadlocked with
> > ANALYZE, and then crashed.
> >
> > It looks like parallel workers are needed to hit this issue.
> > I'm not sure if the issue is specific to extended stats - probably not.
> >
> > I reproduced the crash with manual REINDEX+ANALYZE, and with assertions 
> > (which
> > were not hit), and on a more recent commit (1124cb2cf).  The crash is hit 
> > about
> > 30% of the time when running a loop around REINDEX and then also running
> > ANALYZE.
> >
> > I hope someone has a hunch where to look; so far, I wasn't able to create a
> > minimal reproducer.
>
> I was able to reproduce this in isolation by reloading data into a test
> instance, ANALYZEing the DB to populate pg_statistic_ext_data (so it's
> over 3MB in size), and then REINDEXing the stats_ext index in a loop
> while ANALYZEing a table with extended stats.
>
> I still don't have a minimal reproducer, but on a hunch I found that
> this fails at 5764f611e but not its parent.
>
> commit 5764f611e10b126e09e37fdffbe884c44643a6ce
> Author: Andres Freund 
> Date:   Wed Jan 18 11:41:14 2023 -0800
>
> Use dlist/dclist instead of PROC_QUEUE / SHM_QUEUE for heavyweight locks
>

Good catch. I hadn't seen this email, but while investigating the
same issue reported recently[1], I reached the same commit. I've sent
my analysis and a patch to fix this issue there.
Andres, since this issue seems to be related to your commit
5764f611e, could you please look at it and at my patch?

Regards,

[1] 
https://www.postgresql.org/message-id/CAD21AoDs7vzK7NErse7xTruqY-FXmM%2B3K00SdBtMcQhiRNkoeQ%40mail.gmail.com

-- 
Masahiko Sawada
Amazon Web Services: https://aws.amazon.com




Re: POC: GROUP BY optimization

2023-07-23 Thread Andrey Lepikhov

On 20/7/2023 18:46, Tomas Vondra wrote:

On 7/20/23 08:37, Andrey Lepikhov wrote:

On 3/10/2022 21:56, Tom Lane wrote:

Revert "Optimize order of GROUP BY keys".

This reverts commit db0d67db2401eb6238ccc04c6407a4fd4f985832 and
several follow-on fixes.
...
Since we're hard up against the release deadline for v15, let's
revert these changes for now.  We can always try again later.


It may be time to restart the project. As a first step, I rebased the
patch on the current master. It wasn't trivial because of some recent
optimizations (a29eab, 1349d27 and 8d83a5d).
Now, let's repeat the review and rewrite the current patch according to
the reasons stated in the revert commit.


I think the fundamental task is to make the costing more reliable, and
the commit message 443df6e2db points out a couple challenges in this
area. Not sure how feasible it is to address enough of them ...

1) procost = 1.0 - I guess we could make this more realistic by doing
some microbenchmarks and tuning the costs for the most expensive cases.

2) estimating quicksort comparisons - This relies on ndistinct
estimates, and I'm not sure how much more reliable we can make those.
Probably not much :-( Not sure what to do about this, the only thing I
can think of is to track "reliability" of the estimates and only do the
reordering if we have high confidence in the estimates. That means we'll
miss some optimization opportunities, but it should limit the risk.

I read up on the history of this thread.
As I see it, all the problems mentioned above can be avoided by 
excluding the new cost model entirely. We can simply sort GROUP BY 
columns according to the 'ndistinct' value.
I see the reason for introducing the cost model in [1]. The main 
supporting point there is that with this patch, people could no longer 
optimize the query themselves by arranging the columns in a more 
optimal order. But now we have at least a GUC to switch off the 
behaviour introduced here. Also, some extensions, like the well-known 
pg_hint_plan, can help with automation.
So, how about committing the uncontroversial part of the feature and 
working on the cost model in a new thread?
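The "sort by ndistinct" heuristic can be illustrated with a toy sketch
(names invented; in the planner the ndistinct values would come from
statistics, not from scanning the data). Ordering the keys by
descending distinct-value count means most row comparisons during the
sort are decided by the first key:

```python
def order_group_by_columns(table, columns):
    """Toy sketch of the heuristic: reorder GROUP BY columns by
    descending number of distinct values, so that comparisons between
    rows are usually resolved by the first sort key."""
    ndistinct = {col: len({row[col] for row in table}) for col in columns}
    return sorted(columns, key=lambda col: ndistinct[col], reverse=True)

rows = [
    {"region": "eu", "id": 1}, {"region": "eu", "id": 2},
    {"region": "us", "id": 3}, {"region": "us", "id": 4},
]
# "id" has 4 distinct values, "region" only 2, so "id" sorts first.
print(order_group_by_columns(rows, ["region", "id"]))  # → ['id', 'region']
```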


[1] 
https://www.postgresql.org/message-id/6d1e0cdb-dde3-f62a-43e2-e90bbd9b0f42%402ndquadrant.com


--
regards,
Andrey Lepikhov
Postgres Professional





Re: Synchronizing slots from primary to standby

2023-07-23 Thread Bharath Rupireddy
On Fri, Jul 21, 2023 at 5:16 PM shveta malik  wrote:
>
> Thanks Bharat for letting us know. It is okay to split the patch, it
> may definitely help to understand the modules better but shall we take
> a step back and try to reevaluate the design first before moving to
> other tasks?

Agree that design comes first. FWIW, I'm attaching the v9 patch set
that I have with me. It can't be a perfect patch set unless the design
is finalized.

> I analyzed more on the issues stated in [1] for replacing LIST_SLOTS
> with SELECT query. On rethinking, it might not be a good idea to
> replace this cmd with SELECT in Launcher code-path

I think there are open fundamental design aspects, before optimizing
LIST_SLOTS, see below. I'm sure we can come back to this later.

> Secondly, I was thinking if the design proposed in the patch is the
> best one. No doubt, it is the most simplistic design and thus may
> .. Any feedback is appreciated.

Here are my thoughts about this feature:

Current design:

1. On primary, never allow walsenders associated with logical
replication slots to go ahead of physical standbys that are candidates
for future primary after failover. This enables subscribers to connect
to new primary after failover.
2. On all candidate standbys, periodically sync logical slots from
primary (creating the slots if necessary) with one slot sync worker
per logical slot.

Important considerations:

1. Does this design guarantee the row versions required by subscribers
aren't removed on candidate standbys as raised here -
https://www.postgresql.org/message-id/20220218222319.yozkbhren7vkjbi5%40alap3.anarazel.de?

It seems safe with logical decoding on standbys feature. Also, a
test-case from upthread is already in patch sets (in v9 too)
https://www.postgresql.org/message-id/CAAaqYe9FdKODa1a9n%3Dqj%2Bw3NiB9gkwvhRHhcJNginuYYRCnLrg%40mail.gmail.com.
However, we need to verify the use cases extensively.

2. All candidate standbys will start one slot sync worker per logical
slot which might not be scalable. Is having one (or a few more - not
necessarily one for each logical slot) worker for all logical slots
enough?

It seems safe to have one worker for all logical slots - it's not a
problem even if the worker takes a bit of time to get to sync a
logical slot on a candidate standby, because the standby is ensured to
retain all the WAL and row versions required to decode and send to the
logical slots.

3. Indefinite waiting of logical walsenders for candidate standbys may
not be a good idea. Is having a timeout for logical walsenders a good
idea?

A problem with timeout is that it can make logical slots unusable
after failover.

4. All candidate standbys retain WAL required by logical slots. Amount
of WAL retained may be huge if there's a replication lag with logical
replication subscribers.

This turns out to be a typical problem with replication, so there's
nothing much this feature can do to prevent WAL file accumulation
except for asking one to monitor replication lag and WAL file growth.

5. Logical subscribers replication lag will depend on all candidate
standbys replication lag. If candidate standbys are too far from
primary and logical subscribers are too close, still logical
subscribers will have replication lag. There's nothing much this
feature can do to prevent this except for calling it out in
documentation.

6. This feature might need to prevent the GUCs from deviating on
primary and the candidate standbys - there's no point in syncing a
logical slot on candidate standbys if logical walsender related to it
on primary isn't keeping itself behind all the candidate standbys. If
preventing this from happening proves to be tough, calling it out in
documentation to keep GUCs the same is a good start.

7. There are some important review comments provided upthread as far
as this design and patches are concerned -
https://www.postgresql.org/message-id/20220207204557.74mgbhowydjco4mh%40alap3.anarazel.de
and 
https://www.postgresql.org/message-id/20220207203222.22aktwxrt3fcllru%40alap3.anarazel.de.
I'm sure we can come to these once the design is clear.

Please feel free to add the list if I'm missing anything.

Thoughts?

--
Bharath Rupireddy
PostgreSQL Contributors Team
RDS Open Source Databases
Amazon Web Services: https://aws.amazon.com
From 666a73e79d9965779488db1cac6cd2d0a2c73ffb Mon Sep 17 00:00:00 2001
From: Bharath Rupireddy 
Date: Sat, 22 Jul 2023 10:17:48 +
Subject: [PATCH v9] Allow logical walsenders to wait for physical standbys

---
 doc/src/sgml/config.sgml  |  42 
 .../replication/logical/reorderbuffer.c   |   9 +
 src/backend/replication/slot.c| 216 +-
 src/backend/utils/misc/guc_tables.c   |  30 +++
 src/backend/utils/misc/postgresql.conf.sample |   4 +
 src/include/replication/slot.h|   4 +
 src/include/utils/guc_hooks.h |   4 +
 src/test/recovery/meson.build |   1 +
 src/tes

Re: postgres_fdw: wrong results with self join + enable_nestloop off

2023-07-23 Thread Richard Guo
On Fri, Jul 21, 2023 at 8:51 PM Etsuro Fujita 
wrote:

> I think we should choose the latter, so I modified your patch as
> mentioned, after re-creating it on top of my patch.  Attached is a new
> version (0002-Allow-join-pushdown-even-if-pseudoconstant-quals-v2.patch).
> I am attaching my patch as well
> (0001-Disable-join-pushdown-if-pseudoconstant-quals-v2.patch).
>
> Other changes made to your patch:
>
> * I renamed the new member of the ForeignPath struct to
> fdw_restrictinfo.  (And I named that of the CustomPath struct
> custom_restrictinfo.)


That's much better, and more consistent with other members in
ForeignPath/CustomPath.  Thanks!


> * In your patch, only for create_foreign_join_path(), the API was
> modified so that the caller provides the new member of ForeignPath,
> but I modified that for
> create_foreignscan_path()/create_foreign_upper_path() as well, for
> consistency.


LGTM.


> * In this bit I changed the last argument to NIL, which would be
> nitpicking, though.
>
> @@ -1038,7 +1038,7 @@ postgresGetForeignPaths(PlannerInfo *root,
>   add_path(baserel, (Path *) path);
>
>   /* Add paths with pathkeys */
> - add_paths_with_pathkeys_for_rel(root, baserel, NULL);
> + add_paths_with_pathkeys_for_rel(root, baserel, NULL, NULL);


Good catch!  This was my oversight.


> * I dropped this test case, because it would not be stable if the
> system clock was too slow.


Agreed.  And the test case from 0001 should be sufficient.

So the two patches both look good to me now.

Thanks
Richard


Re: pgsql: Allow tailoring of ICU locales with custom rules

2023-07-23 Thread Amit Kapila
On Fri, Mar 10, 2023 at 3:24 PM Peter Eisentraut
 wrote:
>
> On 08.03.23 21:57, Jeff Davis wrote:
>
> > * It appears rules IS NULL behaves differently from rules=''. Is that
> > desired? For instance:
> >create collation c1(provider=icu,
> >  locale='und-u-ka-shifted-ks-level1',
> >  deterministic=false);
> >create collation c2(provider=icu,
> >  locale='und-u-ka-shifted-ks-level1',
> >  rules='',
> >  deterministic=false);
> >select 'a b' collate c1 = 'ab' collate c1; -- true
> >select 'a b' collate c2 = 'ab' collate c2; -- false
>
> I'm puzzled by this.  The general behavior is, extract the rules of the
> original locale, append the custom rules, use that.  If the custom rules
> are the empty string, that should match using the original rules
> untouched.  Needs further investigation.
>
> > * Can you document the interaction between locale keywords
> > ("@colStrength=primary") and a rule like '[strength 2]'?
>
> I'll look into that.
>

This thread is listed on the PostgreSQL 16 Open Items list. This is a
gentle reminder to ask whether there is a plan to move forward on the
open points.

-- 
With Regards,
Amit Kapila.




Re: Assert failure on bms_equal(child_joinrel->relids, child_joinrelids)

2023-07-23 Thread Richard Guo
On Sat, Jul 22, 2023 at 12:02 AM Tom Lane  wrote:

> Richard Guo  writes:
> > Instead of fixing add_outer_joins_to_relids() to cope with child joins,
> > I'm wondering if we can build join relids for a child join from its
> > parent by adjust_child_relids, something like attached.
>
> That looks like a good solid solution.  Pushed with a bit of
> editorialization --- mostly, that I put the test case into
> partition_join.sql where there's already suitable test tables.


Thanks for pushing it!

Thanks
Richard


Re: Support worker_spi to execute the function dynamically.

2023-07-23 Thread Bharath Rupireddy
On Mon, Jul 24, 2023 at 6:34 AM Masahiro Ikeda  wrote:
>
> OK. If so, we need to remove the following comment in Makefile.
>
> > # enable our module in shared_preload_libraries for dynamic bgworkers

Done.

> I also confirmed that the tap tests work with meson and make.

Thanks for verifying.

I also added a note atop worker_spi.c that the module also
demonstrates how to write core (SQL) tests and extended (TAP) tests.

I'm attaching the v3 patch.

-- 
Bharath Rupireddy
PostgreSQL Contributors Team
RDS Open Source Databases
Amazon Web Services: https://aws.amazon.com


v3-0001-Add-TAP-tests-to-worker_spi-module.patch
Description: Binary data


Re: doc: improve the restriction description of using indexes on REPLICA IDENTITY FULL table.

2023-07-23 Thread Amit Kapila
On Mon, Jul 24, 2023 at 6:39 AM Masahiko Sawada  wrote:
>
> On Sat, Jul 22, 2023 at 7:32 PM Amit Kapila  wrote:
> >
> >
> > You have moved most of the comments related to the restriction of
> > which index can be picked atop IsIndexUsableForReplicaIdentityFull().
> > Now, the comments related to limitation atop
> > FindUsableIndexForReplicaIdentityFull() look slightly odd as it refers
> > to limitations but those limitation were not stated. The comments I am
> > referring to are: "Note that the limitations of index scans for
> > replica identity full only  might not be a good idea in some
> > cases". Shall we move these as well atop
> > IsIndexUsableForReplicaIdentityFull()?
>
> Good point.
>
> Looking at neighbor comments, the following comment looks slightly odd to me:
>
>  * XXX: See IsIndexUsableForReplicaIdentityFull() to know the challenges in
>  * supporting indexes other than btree and hash. For partial indexes, the
>  * required changes are likely to be larger. If none of the tuples satisfy
>  * the expression for the index scan, we fall-back to sequential execution,
>  * which might not be a good idea in some cases.
>
> Are the first and second sentences related actually?
>

Not really.

> I think we can move it as well to
> IsIndexUsableForReplicaIdentityFull() with some adjustments. I've
> attached the updated patch that incorporated your comment and included
> my idea to update the comment.
>

LGTM.

-- 
With Regards,
Amit Kapila.




Re: Synchronizing slots from primary to standby

2023-07-23 Thread Amit Kapila
On Mon, Jul 24, 2023 at 8:03 AM Bharath Rupireddy
 wrote:
>
> On Fri, Jul 21, 2023 at 5:16 PM shveta malik  wrote:
> >
> > Thanks Bharat for letting us know. It is okay to split the patch, it
> > may definitely help to understand the modules better but shall we take
> > a step back and try to reevaluate the design first before moving to
> > other tasks?
>
> Agree that design comes first. FWIW, I'm attaching the v9 patch set
> that I have with me. It can't be a perfect patch set unless the design
> is finalized.
>
> > I analyzed more on the issues stated in [1] for replacing LIST_SLOTS
> > with SELECT query. On rethinking, it might not be a good idea to
> > replace this cmd with SELECT in Launcher code-path
>
> I think there are open fundamental design aspects, before optimizing
> LIST_SLOTS, see below. I'm sure we can come back to this later.
>
> > Secondly, I was thinking if the design proposed in the patch is the
> > best one. No doubt, it is the most simplistic design and thus may
> > .. Any feedback is appreciated.
>
> Here are my thoughts about this feature:
>
> Current design:
>
> 1. On primary, never allow walsenders associated with logical
> replication slots to go ahead of physical standbys that are candidates
> for future primary after failover. This enables subscribers to connect
> to new primary after failover.
> 2. On all candidate standbys, periodically sync logical slots from
> primary (creating the slots if necessary) with one slot sync worker
> per logical slot.
>
> Important considerations:
>
> 1. Does this design guarantee the row versions required by subscribers
> aren't removed on candidate standbys as raised here -
> https://www.postgresql.org/message-id/20220218222319.yozkbhren7vkjbi5%40alap3.anarazel.de?
>
> It seems safe with logical decoding on standbys feature. Also, a
> test-case from upthread is already in patch sets (in v9 too)
> https://www.postgresql.org/message-id/CAAaqYe9FdKODa1a9n%3Dqj%2Bw3NiB9gkwvhRHhcJNginuYYRCnLrg%40mail.gmail.com.
> However, we need to verify the use cases extensively.
>

Agreed.

> 2. All candidate standbys will start one slot sync worker per logical
> slot which might not be scalable.
>

Yeah, that doesn't sound like a good idea but IIRC, the proposed patch
is using one worker per database (for all slots corresponding to a
database).

> Is having one (or a few more - not
> necessarily one for each logical slot) worker for all logical slots
> enough?
>

I guess for a large number of slots there is a possibility of a large
gap in syncing the slots, which probably means we need to retain the
corresponding WAL for a much longer time on the primary. If we can
prove that the gap won't be large enough to matter, then this would
probably be worth considering; otherwise, I think we should find a way
to scale the number of workers to avoid the large gap.

-- 
With Regards,
Amit Kapila.




Re: Use COPY for populating all pgbench tables

2023-07-23 Thread Michael Paquier
On Sun, Jul 23, 2023 at 08:21:51PM +0900, Michael Paquier wrote:
> Cool.  I have applied the new tests for now to move on with this
> thread.

I have done a few more things on this patch today, including
measurements with a local host and large scaling numbers.  One of my
hosts was taking for instance around 36.8s with scale=500 using the
INSERTs and 36.5s with the COPY when loading data (average of 4
runs) with -I dtg.

Greg's upthread point is also valid: with high latency, server-side
generation is the best-adapted option. But I don't see a reason not to
switch to COPY, as this option is hidden behind -I, just for the sake
of improving the default option set.  So, in the end, applied.
--
Michael


signature.asc
Description: PGP signature


Re: logical decoding and replication of sequences, take 2

2023-07-23 Thread Amit Kapila
On Thu, Jul 20, 2023 at 8:22 PM Tomas Vondra
 wrote:
>
> OK, I merged the changes into the patches, with some minor changes to
> the wording etc.
>

I think we can do 0001-Make-test_decoding-ddl.out-shorter-20230720
even without the rest of the patches. Isn't it a separate improvement?

I see that origin filtering (origin=none) doesn't work with this
patch. You can see this by using the following statements:
Node-1:
postgres=# create sequence s;
CREATE SEQUENCE
postgres=# create publication mypub for all sequences;
CREATE PUBLICATION

Node-2:
postgres=# create sequence s;
CREATE SEQUENCE
postgres=# create subscription mysub_sub connection '' publication
mypub with (origin=none);
NOTICE:  created replication slot "mysub_sub" on publisher
CREATE SUBSCRIPTION
postgres=# create publication mypub_sub for all sequences;
CREATE PUBLICATION

Node-1:
create subscription mysub_pub connection '...' publication mypub_sub
with (origin=none);
NOTICE:  created replication slot "mysub_pub" on publisher
CREATE SUBSCRIPTION

SELECT nextval('s') FROM generate_series(1,100);

After that, you can check on the subscriber that sequence values are
overridden with older values:
postgres=# select * from s;
 last_value | log_cnt | is_called
------------+---------+-----------
         67 |       0 | t
(1 row)

postgres=# select * from s;
 last_value | log_cnt | is_called
------------+---------+-----------
        100 |       0 | t
(1 row)

postgres=# select * from s;
 last_value | log_cnt | is_called
------------+---------+-----------
        133 |       0 | t
(1 row)

postgres=# select * from s;
 last_value | log_cnt | is_called
------------+---------+-----------
         67 |       0 | t
(1 row)

I haven't verified all the details but I think that is because we
don't set XLOG_INCLUDE_ORIGIN while logging sequence values.
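The cycling values are the classic symptom of a replication loop: with
origin=none, each node should forward only changes that originated
locally, but that filter can only work if an origin is recorded on the
WAL record. A hedged Python sketch of the rule (names invented):

```python
def should_replicate(record_origin, local_origin, origin_filter):
    """Sketch of origin=none semantics: forward only records that
    originated locally, i.e. that carry no remote origin."""
    if origin_filter == "none":
        return record_origin is None or record_origin == local_origin
    return True

# A record applied from a remote node carries that node's origin and
# must not be sent back when the subscription uses origin=none.
print(should_replicate("node2", "node1", "none"))  # → False
print(should_replicate(None, "node1", "none"))     # → True
```

If sequence records are logged without XLOG_INCLUDE_ORIGIN, the record
origin is effectively always absent, so replicated sequence changes
pass the filter and bounce back and forth between the nodes,
overriding newer values with older ones as observed above.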

-- 
With Regards,
Amit Kapila.




Re: multiple membership grants and information_schema.applicable_roles

2023-07-23 Thread Pavel Luzanov

On 23.07.2023 23:03, Tom Lane wrote:

CREATE RECURSIVE VIEW APPLICABLE_ROLES ( GRANTEE, ROLE_NAME, IS_GRANTABLE ) AS
 ( ( SELECT GRANTEE, ROLE_NAME, IS_GRANTABLE
 FROM DEFINITION_SCHEMA.ROLE_AUTHORIZATION_DESCRIPTORS
 WHERE ( GRANTEE IN
 ( CURRENT_USER, 'PUBLIC' )
  OR
 GRANTEE IN
 ( SELECT ROLE_NAME
   FROM ENABLED_ROLES ) ) )
   UNION
   ( SELECT RAD.GRANTEE, RAD.ROLE_NAME, RAD.IS_GRANTABLE
 FROM DEFINITION_SCHEMA.ROLE_AUTHORIZATION_DESCRIPTORS RAD
   JOIN
  APPLICABLE_ROLES R
 ON
RAD.GRANTEE = R.ROLE_NAME ) );

The UNION would remove rows only when they are duplicates across all
three columns.
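As a hedged illustration of that reading (a toy fixed-point
evaluation, not the view itself; names are invented), the recursion
plus three-column UNION can be sketched as:

```python
def applicable_roles(grants, enabled_roles):
    """Toy fixed-point sketch of the standard's APPLICABLE_ROLES view:
    seed with grants made to the enabled roles, then repeatedly add
    grants made to any role already reachable.  Rows are
    (grantee, role_name, is_grantable) triples; UNION semantics drops a
    row only if it duplicates all three columns."""
    result = {g for g in grants if g[0] in enabled_roles}
    changed = True
    while changed:
        changed = False
        reachable = {role for _, role, _ in result}
        for g in grants:
            if g[0] in reachable and g not in result:
                result.add(g)
                changed = True
    return result

grants = {
    ("alice", "admin", False),
    ("admin", "pg_read_all_data", False),
    ("admin", "pg_read_all_data", True),  # differs only in IS_GRANTABLE
}
print(len(applicable_roles(grants, {"alice"})))  # → 3
```

Because the UNION deduplicates on the full (GRANTEE, ROLE_NAME,
IS_GRANTABLE) triple, two grants differing only in IS_GRANTABLE both
survive, which is exactly the point at issue.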


Hm, I think there is one more thing to check in the SQL standard.
Is IS_GRANTABLE a key column of ROLE_AUTHORIZATION_DESCRIPTORS?
If not, duplicates are not possible, right?

Can't check now, since I don't have access to the SQL standard definition.


I do see what seems like a different issue: the standard appears to expect
that indirect role grants should also be shown (via the recursive CTE),
and we are not doing that.


I noticed this, but the view has stayed unchanged for a long time.
I thought it was done intentionally.

--
Pavel Luzanov
Postgres Professional: https://postgrespro.com