Re: EXEC_BACKEND vs bgworkers without BGWORKER_SHMEM_ACCESS

2021-08-09 Thread Michael Paquier
On Mon, Aug 09, 2021 at 11:07:14AM -0400, Robert Haas wrote:
> On Thu, Aug 5, 2021 at 8:02 PM Tom Lane  wrote:
>> I think doing nothing is fine.  Given the lack of complaints, we're
>> more likely to break something than fix anything useful.
> 
> +1.

FWIW, the only interesting case I have in my plugin box for a
background worker that does not attach to shared memory is a template
worker able to catch signals, to be used as a base for simple
actions.  So that's not really interesting.  Making the SHMEM flag
mandatory on HEAD while doing nothing in the back-branches sounds
good to me, so +1.
--
Michael


signature.asc
Description: PGP signature


Re: Skipping logical replication transactions on subscriber side

2021-08-09 Thread Amit Kapila
On Tue, Aug 10, 2021 at 10:37 AM Masahiko Sawada  wrote:
>
> I've attached the latest patches that incorporated all comments I got
> so far. Please review them.
>

I am not able to apply the latest patch
(v6-0001-Add-errcontext-to-errors-happening-during-applyin) on HEAD,
getting the below error:
patching file src/backend/replication/logical/worker.c
Hunk #11 succeeded at 1195 (offset 50 lines).
Hunk #12 succeeded at 1253 (offset 50 lines).
Hunk #13 succeeded at 1277 (offset 50 lines).
Hunk #14 succeeded at 1305 (offset 50 lines).
Hunk #15 succeeded at 1330 (offset 50 lines).
Hunk #16 succeeded at 1362 (offset 50 lines).
Hunk #17 succeeded at 1508 (offset 50 lines).
Hunk #18 succeeded at 1524 (offset 50 lines).
Hunk #19 succeeded at 1645 (offset 50 lines).
Hunk #20 succeeded at 1671 (offset 50 lines).
Hunk #21 succeeded at 1772 (offset 50 lines).
Hunk #22 succeeded at 1828 (offset 50 lines).
Hunk #23 succeeded at 1934 (offset 50 lines).
Hunk #24 succeeded at 1962 (offset 50 lines).
Hunk #25 succeeded at 2399 (offset 50 lines).
Hunk #26 FAILED at 2405.
Hunk #27 succeeded at 3730 (offset 54 lines).
1 out of 27 hunks FAILED -- saving rejects to file
src/backend/replication/logical/worker.c.rej


-- 
With Regards,
Amit Kapila.




Re: ECPG bug fix: DECLARE STATEMENT and DEALLOCATE, DESCRIBE

2021-08-09 Thread Michael Paquier
On Mon, Aug 09, 2021 at 10:50:29PM +0200, Michael Meskes wrote:
>> On the substance of the issue, one question that I have reading over
>> this thread is whether there's really a bug here at all. It is being
>> represented (and I have not checked whether this is accurate) that the
>> patch affects the behavior of DECLARE, PREPARE, and EXECUTE, but not
>> DESCRIBE, DEALLOCATE, DECLARE CURSOR .. FOR, or CREATE TABLE AS
>> EXECUTE. However, it also seems that it's not entirely clear what the
>> behavior ought to be in those cases, except perhaps for DESCRIBE. If
>> that's the case, maybe this doesn't really need to be an open item,
>> and maybe we don't therefore need to talk about whether it should be
>> reverted. Maybe it's just a feature that supports certain things now,
>> and in the future, after due reflection, perhaps it will support
>> more.
> 
> The way I see it we should commit the patch that makes more statements
> honor the new DECLARE STATEMENT feature. I don't think we can change
> anything for the worse by doing that. And other databases are not
> really strict about this either.

Okay.  So you mean to change DESCRIBE and DEALLOCATE to be able to
handle cached connection names, as of [1]?  For [DE]ALLOCATE, I agree
that it could be a good thing.  declare.pgc seems to rely on that
already, but the tests are incorrect as I mentioned in [2].  For
DESCRIBE, which provides data about a result set, I find the
assignment of a connection a bit strange, and even if this would
allow the use of the same statement name for multiple connections, it
seems to me that there is a risk of breaking existing applications.
There should not be that many, so perhaps that's fine anyway.
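
For reference, a minimal ECPG sketch of the statements in question, with
hypothetical connection/statement names (DECLARE STATEMENT binds a statement
name to a connection; the open question is which later statements honor that
binding):

EXEC SQL AT con1 DECLARE stmt STATEMENT;
EXEC SQL PREPARE stmt FROM :query;       /* runs on con1 via the declaration */
EXEC SQL EXECUTE stmt INTO :result;      /* likewise */
EXEC SQL DESCRIBE stmt INTO sqlda;       /* connection handling under discussion */
EXEC SQL DEALLOCATE PREPARE stmt;        /* connection handling under discussion */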

[1]: 
https://www.postgresql.org/message-id/tyapr01mb5866973462d17f2aebd8ebb8f5...@tyapr01mb5866.jpnprd01.prod.outlook.com
[2]: https://www.postgresql.org/message-id/yoqncyfxp868z...@paquier.xyz
--
Michael


signature.asc
Description: PGP signature


Re: Why does the owner of a publication need CREATE privileges on the database?

2021-08-09 Thread Amit Kapila
On Tue, Jul 27, 2021 at 11:29 PM Mark Dilger
 wrote:
>
> The documentation for ALTER PUBLICATION ... OWNER TO ... claims the new owner 
> must have CREATE privilege on the database, though superuser can change the 
> ownership in spite of this restriction.  No explanation is given for this 
> requirement.
>

I am not aware of the original thought process behind this, but the
current behavior seems reasonable: if users need to have the CREATE
privilege on the database to create a publication, the same should
hold when the publication is transferred to a new owner. Basically, at
any point in time, the owner of the publication should have the CREATE
privilege on the database which contains the publication.
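
A quick SQL illustration of that invariant, with hypothetical names (alice,
mydb, t, and mypub are not from the thread):

GRANT CREATE ON DATABASE mydb TO alice;
CREATE PUBLICATION mypub FOR TABLE t;     -- requires CREATE on mydb
ALTER PUBLICATION mypub OWNER TO alice;   -- alice must also have CREATE on mydb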

-- 
With Regards,
Amit Kapila.




Re: Added schema level support for publication.

2021-08-09 Thread Amit Kapila
On Mon, Aug 9, 2021 at 9:50 PM Mark Dilger  wrote:
>
> > On Aug 6, 2021, at 1:32 AM, vignesh C  wrote:
> >
> > the attached v19 patch
>
> With v19 applied, a schema owner can publish the contents of a table 
> regardless of ownership or permissions on that table:
>
...
...
>
> It is a bit counterintuitive that schema owners do not have administrative 
> privileges over tables within their schemas, but that's how it is.  The 
> design of this patch seems to assume otherwise.  Perhaps ALTER PUBLICATION 
> ... ADD SCHEMA should be restricted to superusers, just as FOR ALL TABLES?
>

+1. Your suggestion sounds reasonable to me.

> Alternatively, you could add ownership checks per table to mirror the 
> behavior of ALTER PUBLICATION ... ADD TABLE, but that would foreclose the 
> option of automatically updating the list of tables in the publication as new 
> tables are added to the schema, since those new tables would not necessarily 
> belong to the schema owner, and having an error thrown during CREATE TABLE 
> would be quite unfriendly.  I think until this is hammered out, it is safer 
> to require superuser privileges and then we can revisit this issue and loosen 
> the requirement in a subsequent commit.
>

I think the same argument can be made for "FOR ALL TABLES .." as well.
So, let's leave such a requirement for another patch.
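
For comparison, a sketch of the existing per-table check mentioned above, with
hypothetical names (assume schema_owner owns the publication mypub and the
schema s, but not the table s.t):

SET ROLE schema_owner;
ALTER PUBLICATION mypub ADD TABLE s.t;   -- rejected: must be owner of the table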


-- 
With Regards,
Amit Kapila.




Re: [BUG]Update Toast data failure in logical replication

2021-08-09 Thread Amit Kapila
On Fri, Jul 30, 2021 at 10:21 AM Amit Kapila  wrote:
>
> This problem seems to exist from the time Logical
> Replication was introduced, so I am adding others (who are generally
> involved in this area) to see what they think about this bug. I think
> people might not be using toasted columns for Replica Identity, which
> is why this problem hasn't been reported until now, but I feel this is
> quite a fundamental issue and we should do something about it.
>
> Let me summarize the problem for the ease of others.
>
> The logical replica can go out of sync for UPDATES when there is a
> toast column as part of REPLICA IDENTITY. In such cases, updates are
> not replicated if the key column doesn't change because we don't log
> the actual key value for the unchanged toast key. It is neither logged
> as part of old_key_tuple nor for new tuple due to which we are not
> able to find the tuple to be updated on the subscriber-side and the
> update is ignored on the subscriber-side. We log this in DEBUG1 mode
> but I don't think the user can do anything about this and the replica
> will go out-of-sync. This works when the replica identity column value
> is not toasted because then it will be part of the new tuple and we
> use that to fetch the tuple on the subscriber.
>
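
A minimal SQL sketch of the publisher-side scenario summarized above
(hypothetical names; EXTERNAL storage keeps the key uncompressed so it is
stored out of line once the tuple crosses the TOAST threshold):

CREATE TABLE t (key text PRIMARY KEY, val text);
ALTER TABLE t ALTER COLUMN key SET STORAGE EXTERNAL;
CREATE PUBLICATION pub FOR TABLE t;
INSERT INTO t VALUES (repeat('9876543210', 200), 'x');  -- key gets toasted
UPDATE t SET val = 'y';  -- key unchanged: its value is logged neither in the
                         -- old_key_tuple nor in the new tuple, so the
                         -- subscriber cannot find the row and skips the update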

It seems to me this problem has existed since we introduced
wal_level = logical in commit e55704d8b2 [1]; another possibility is
that the logical replication commit didn't account for something
needed to make it work. Andres, Robert, Petr, could you please
comment? Otherwise we might miss something here.

[1] -
commit e55704d8b2fe522fbc9435acbb5bc59033478bd5
Author: Robert Haas 
Date:   Tue Dec 10 18:33:45 2013 -0500

Add new wal_level, logical, sufficient for logical decoding.

When wal_level=logical, we'll log columns from the old tuple as
configured by the REPLICA IDENTITY facility added in commit
07cacba983ef79be4a84fcd0e0ca3b5fcb85dd65.  This makes it possible for a
properly-configured logical replication solution to correctly
follow table updates even if they change the chosen key columns, or,
with REPLICA IDENTITY FULL, even if the table has no key at all. Note
that updates which do not modify the replica identity column won't log
anything extra, making the choice of a good key (i.e. one that will
rarely be changed) important to performance when wal_level=logical is
configured.
..
Andres Freund, reviewed in various versions by myself, Heikki
Linnakangas, KONDO Mitsumasa, and many others.

-- 
With Regards,
Amit Kapila.




RE: Skipping logical replication transactions on subscriber side

2021-08-09 Thread osumi.takami...@fujitsu.com
On Wednesday, August 4, 2021 8:46 PM Masahiko Sawada  
wrote:
> I'll incorporate those comments in the next version patch.
Hi, when are you planning to make and share the updated v6 patch?


Best Regards,
Takamichi Osumi



Re: alter table set TABLE ACCESS METHOD

2021-08-09 Thread Michael Paquier
On Tue, Aug 10, 2021 at 12:24:13PM +0900, Michael Paquier wrote:
> So, on a closer look, it happens that this breaks the regression tests
> of sepgsql, as the two following commands in ddl.sql cause a rewrite:
> ALTER TABLE regtest_table_4 ALTER COLUMN y TYPE float;
> ALTER TABLE regtest_ptable_4 ALTER COLUMN y TYPE float;

rhinoceros has reported back, and these are the only two that required
an adjustment, so fixed.
--
Michael


signature.asc
Description: PGP signature


Re: Estimating HugePages Requirements?

2021-08-09 Thread Andres Freund
Hi,

On 2021-08-09 22:57:18 +, Bossart, Nathan wrote:

> @@ -1026,6 +1031,18 @@ PostmasterMain(int argc, char *argv[])
>*/
>   InitializeMaxBackends();
>  
> + if (output_shmem)
> + {
> + char output[64];
> + Size size;
> +
> + size = CreateSharedMemoryAndSemaphores(true);
> + sprintf(output, "%zu", size);
> +
> + puts(output);
> + ExitPostmaster(0);
> + }

I don't like putting this into PostmasterMain(). Either BootstrapMain()
(specifically checker mode) or GucInfoMain() seems like a better place.


> -void
> -CreateSharedMemoryAndSemaphores(void)
> +Size
> +CreateSharedMemoryAndSemaphores(bool size_only)
>  {
>   PGShmemHeader *shim = NULL;
>  
> @@ -161,6 +161,9 @@ CreateSharedMemoryAndSemaphores(void)
>   /* might as well round it off to a multiple of a typical page size */
>   size = add_size(size, 8192 - (size % 8192));
>  
> + if (size_only)
> + return size;
> +
>   elog(DEBUG3, "invoking IpcMemoryCreate(size=%zu)", size);
>  
>   /*
> @@ -288,4 +291,6 @@ CreateSharedMemoryAndSemaphores(void)
>*/
>   if (shmem_startup_hook)
>   shmem_startup_hook();
> +
> + return 0;
>  }

That seems like an ugly API to me. Why don't we split size
determination and shmem creation into two separate functions?


Greetings,

Andres Freund




Re: Estimating HugePages Requirements?

2021-08-09 Thread Andres Freund
Hi,

On 2021-08-09 18:58:53 -0500, Justin Pryzby wrote:
> Define shared_buffers as the exact size to be allocated/requested from the OS
> (regardless of whether they're huge pages or not), and have postgres compute
> everything else based on that.  So shared_buffers=2GB would end up being
> 1950MB (or so) of buffer cache.  We'd have to check that after the other
> allocations, there's still at least 128kB left for the buffer cache.  Maybe
> we'd have to bump the minimum value of shared_buffers.

I don't like that. How much "other" shared memory we're going to need
is very hard to predict, and depends to a significant degree on
extensions and on configuration options like max_locks_per_transaction
and max_connections. This way the user ends up needing to guess at
least as much as before to get to a sensible shared_buffers.
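
As a rough illustration of how much the non-buffers part can move: the
heavyweight lock table alone is sized for max_locks_per_transaction *
(max_connections + max_prepared_transactions) objects, e.g.:

SELECT current_setting('max_locks_per_transaction')::int
     * (current_setting('max_connections')::int
        + current_setting('max_prepared_transactions')::int) AS lock_table_slots;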

Greetings,

Andres Freund




Re: [bug] Logical Decoding of relation rewrite with toast does not reset toast_hash

2021-08-09 Thread Amit Kapila
On Mon, Aug 9, 2021 at 3:37 PM Drouvot, Bertrand  wrote:
>
> > BTW, I see this as an Open Item for PG-14 [1] which seems wrong to me
> > as this is a bug from previous versions. I am not sure who added it
>
> Me neither.
>
> > but do you see any reason for this to consider as an open item for
> > PG-14?
>
> No, I don't see any reasons as this is a bug from previous versions.
>

Thanks for the confirmation. I have moved this to the section: "Older
bugs affecting stable branches".

-- 
With Regards,
Amit Kapila.




Re: Small documentation improvement for ALTER SUBSCRIPTION

2021-08-09 Thread Amit Kapila
On Tue, Aug 10, 2021 at 6:31 AM Masahiko Sawada  wrote:
>
> On Mon, Aug 9, 2021 at 1:01 PM Peter Smith  wrote:
> >
> > On Mon, Aug 9, 2021 at 12:46 PM Amit Kapila  wrote:
>
> But "REFRESH PUBLICATION refresh_option" seems wrong in terms of SQL
> syntax, no?
>
> Given there could be multiple options, how about using
> "refresh_options"? That is, the sentence
> will be:
>
> Additionally, refresh_options as described
> under REFRESH PUBLICATION may be specified,
> except in the case of DROP PUBLICATION.
>

Normally (at least on this doc page), we use this tag for some defined
option or syntax element, and as refresh_options is neither, it would
look a bit awkward.
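
For reference, the options that sentence describes, shown with a hypothetical
subscription name (copy_data is a refresh option of REFRESH PUBLICATION, and
it can ride along on the publication-changing forms when refresh is enabled):

ALTER SUBSCRIPTION mysub REFRESH PUBLICATION WITH (copy_data = false);
ALTER SUBSCRIPTION mysub SET PUBLICATION mypub WITH (refresh = true, copy_data = false);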

-- 
With Regards,
Amit Kapila.




Re: alter table set TABLE ACCESS METHOD

2021-08-09 Thread Michael Paquier
On Sat, Aug 07, 2021 at 07:18:19PM +0900, Michael Paquier wrote:
> As a matter of curiosity, I have checked how it would look to handle
> the no-op case for the sub-commands other than SET TABLESPACE, and one
> would need something like the attached, with a new flag for
> AlteredTableInfo.  That does not really look good, but it triggers
> properly the object access hook when SET LOGGED/UNLOGGED/ACCESS METHOD
> are no-ops, so let's just handle the case using the version from
> upthread.  I'll do that at the beginning of next week.

So, on a closer look, it happens that this breaks the regression tests
of sepgsql, as the two following commands in ddl.sql cause a rewrite:
ALTER TABLE regtest_table_4 ALTER COLUMN y TYPE float;
ALTER TABLE regtest_ptable_4 ALTER COLUMN y TYPE float;

I have been fighting with SE/Linux for a couple of hours to try to
figure out how to run our regression tests, but gave up after running
into various segmentation faults even after following all the steps of
the documentation to set things up.  Perhaps that's because I just set
up the environment with a recent Debian, I don't know.  Instead of
running those tests, I have fallen back to my own module and ran all
the SQL files of sepgsql to find the places where rewrites happen,
which is how I spotted those two commands.

One thing I have noticed is that sepgsql-regtest.te fails to compile
because /usr/share/selinux/devel/Makefile does not understand
auth_read_passwd().  Looking at some of the SE/Linux repos, perhaps
this ought to be auth_read_shadow() instead to be able to work with a
newer version?

Anyway, as the addition of this InvokeObjectPostAlterHook() is
consistent for a rewrite caused by SET LOGGED/UNLOGGED/ACCESS METHOD,
I have applied the patch.  I'll fix rhinoceros once it reports back
with the diffs in output.
--
Michael


signature.asc
Description: PGP signature


Re: [BUG] wrong refresh when ALTER SUBSCRIPTION ADD/DROP PUBLICATION

2021-08-09 Thread Amit Kapila
On Tue, Aug 10, 2021 at 8:05 AM Masahiko Sawada  wrote:
>
> On Sat, Aug 7, 2021 at 2:36 PM Amit Kapila  wrote:
> >
> > On Fri, Aug 6, 2021 at 9:57 PM Japin Li  wrote:
> > >
> > > >
> > > > Hmm yes, it cannot cover all cases. I had somehow misunderstood that
> > > > the subscriber knows which relations are associated with which
> > > > publications. Given that the subscriber doesn’t know which relations
> > > > are associated with which publications (and the set of subscribed
> > > > relations on the subscriber becomes out of date once the publication
> > > > is updated), I think it's impossible to achieve that DROP PUBLICATION
> > > > drops relations that are associated with only the publication being
> > > > dropped.
> > > >
> > > >> Do we have better ideas to fix or shall we just
> > > >> say that we will refresh based on whatever current set of relations
> > > >> are present in publication to be dropped?
> > > >
> > > > I don't have better ideas. It seems to me that it's prudent that we
> > > > accept the approach in v3 patch and refresh all publications in DROP
> > > > PUBLICATION cases.
> > > >
> > >
> > > Maybe refreshing all publications in PUBLICATION cases is a simple way to
> > > fix the problem.
> > >
> >
> > Actually, doing it for both will make the behavior consistent, but doing it
> > just for Drop might be preferable to some users.
>
> I agree that ADD/DROP PUBLICATION is convenient for just adding/dropping
> publications instead of providing the complete publication list. But
> I'm concerned that while developing this feature most people didn't
> like the behavior of refreshing both new and existing publications. Also,
> it was pointed out that there would be an overhead from a SQL query with
> more publications when AlterSubscription_refresh() is called with all
> the existing publications.
>

True, but I think at that time we didn't know this design problem with
'drop publication'.

> If we feel this behavior is unnatural, the
> users will feel the same. I personally think there is a benefit only
> in terms of syntax.
>

Right, and I think in this particular case, the new syntax is much
easier to use for users.

> And we can work on the part of refreshing only
> the publications being added/dropped during v15 development.
>

Yeah, unless we change the design drastically we might not be able to
refresh only the dropped publications; for add it is possible. Most
people who responded on this thread seem fine with being consistent in
terms of refreshing for add/drop at this stage, but we could still go
with the other option: refresh only the newly added publications for
add, but refresh all publications for drop.

Peter E., do you have any opinion on this matter?
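
For reference, the PG 14 syntax being debated, with hypothetical names; the
refresh option controls whether AlterSubscription_refresh() runs:

ALTER SUBSCRIPTION mysub ADD PUBLICATION pub2 WITH (refresh = true);
ALTER SUBSCRIPTION mysub DROP PUBLICATION pub2 WITH (refresh = true);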

-- 
With Regards,
Amit Kapila.




Re: Autovacuum on partitioned table (autoanalyze)

2021-08-09 Thread Andres Freund
Hi,

On 2021-08-09 16:02:33 -0400, Alvaro Herrera wrote:
> On 2021-Jul-27, Andres Freund wrote:
> 
> > Isn't this going to create a *lot* of redundant sampling?  Especially if you
> > have any sort of nested partition tree. In the most absurd case a partition
> > with n parents will get sampled n times, solely due to changes to itself.
> 
> It seems to me that you're barking up the wrong tree on this point.
> This problem you describe is not something that was caused by this
> patch; ANALYZE has always worked like this.  We have discussed the idea
> of avoiding redundant sampling, but it's clear that it is not a simple
> problem, and solving it was not in scope for this patch.

I don't agree. There's a difference between this happening after a manual
ANALYZE on partition roots, and this continuously happening in production
workloads due to auto-analyzes...

Greetings,

Andres Freund




Re: [BUG] wrong refresh when ALTER SUBSCRIPTION ADD/DROP PUBLICATION

2021-08-09 Thread Masahiko Sawada
On Sat, Aug 7, 2021 at 2:36 PM Amit Kapila  wrote:
>
> On Fri, Aug 6, 2021 at 9:57 PM Japin Li  wrote:
> >
> > >
> > > Hmm yes, it cannot cover all cases. I had somehow misunderstood that
> > > the subscriber knows which relations are associated with which
> > > publications. Given that the subscriber doesn’t know which relations
> > > are associated with which publications (and the set of subscribed
> > > relations on the subscriber becomes out of date once the publication
> > > is updated), I think it's impossible to achieve that DROP PUBLICATION
> > > drops relations that are associated with only the publication being
> > > dropped.
> > >
> > >> Do we have better ideas to fix or shall we just
> > >> say that we will refresh based on whatever current set of relations
> > >> are present in publication to be dropped?
> > >
> > > I don't have better ideas. It seems to me that it's prudent that we
> > > accept the approach in v3 patch and refresh all publications in DROP
> > > PUBLICATION cases.
> > >
> >
> > Maybe refreshing all publications in PUBLICATION cases is a simple way to
> > fix the problem.
> >
>
> Actually, doing it for both will make the behavior consistent, but doing it
> just for Drop might be preferable to some users.

I agree that ADD/DROP PUBLICATION is convenient for just adding/dropping
publications instead of providing the complete publication list. But
I'm concerned that while developing this feature most people didn't
like the behavior of refreshing both new and existing publications. Also,
it was pointed out that there would be an overhead from a SQL query with
more publications when AlterSubscription_refresh() is called with all
the existing publications. If we feel this behavior is unnatural, the
users will feel the same. I personally think there is a benefit only
in terms of syntax. And we can work on the part of refreshing only
the publications being added/dropped during v15 development. But it might
be better to also consider whether there will be use cases for users to
add/remove publications in the subscription even with such downsides.

Regards,

-- 
Masahiko Sawada
EDB:  https://www.enterprisedb.com/




Re: Another regexp performance improvement: skip useless paren-captures

2021-08-09 Thread Tom Lane
Mark Dilger  writes:
> I ran a lot of tests with the patch, and the asserts have all cleared up, but 
> I don't know how to think about the user-facing differences.  If we had a 
> good reason for raising an error for these back-references, maybe that'd be 
> fine, but it seems to just be an implementation detail.

I thought about this some more, and I'm coming around to the idea that
throwing an error is the wrong thing.  As a contrary example, consider

(.)|(\1\1)

We don't throw an error for this, and neither does Perl, even though
the capturing parens can never be defined in the branch where the
backrefs are.  So it seems hard to argue that this is okay but the
other thing isn't.  Another interesting example is

(.){0}(\1){0}

I think that the correct interpretation is that this is a valid
regexp matching an empty string (i.e., zero repetitions of each
part), even though neither capture group will be defined.
That's different from

(.){0}(\1)

which can never match.

So I took another look at the code, and it doesn't seem that hard
to make it act this way.  The attached passes regression, but
I've not beat on it with random strings.

regards, tom lane

diff --git a/src/backend/regex/regcomp.c b/src/backend/regex/regcomp.c
index ae3a7b6a38..d9840171a3 100644
--- a/src/backend/regex/regcomp.c
+++ b/src/backend/regex/regcomp.c
@@ -1089,11 +1089,23 @@ parseqatom(struct vars *v,
 	/* annoying special case:  {0} or {0,0} cancels everything */
 	if (m == 0 && n == 0)
 	{
-		if (atom != NULL)
-			freesubre(v, atom);
-		if (atomtype == '(')
-			v->subs[subno] = NULL;
-		delsub(v->nfa, lp, rp);
+		/*
+		 * If we had capturing subexpression(s) within the atom, we don't want
+		 * to destroy them, because it's legal (if useless) to back-ref them
+		 * later.  Hence, just unlink the atom from lp/rp and then ignore it.
+		 */
+		if (atom != NULL && (atom->flags & CAP))
+		{
+			delsub(v->nfa, lp, atom->begin);
+			delsub(v->nfa, atom->end, rp);
+		}
+		else
+		{
+			/* Otherwise, we can clean up any subre infrastructure we made */
+			if (atom != NULL)
+				freesubre(v, atom);
+			delsub(v->nfa, lp, rp);
+		}
 		EMPTYARC(lp, rp);
 		return top;
 	}
diff --git a/src/test/modules/test_regex/expected/test_regex.out b/src/test/modules/test_regex/expected/test_regex.out
index 5a6cdf47c2..96c0e6fa59 100644
--- a/src/test/modules/test_regex/expected/test_regex.out
+++ b/src/test/modules/test_regex/expected/test_regex.out
@@ -3506,6 +3506,21 @@ select * from test_regex('((.))(\2)', 'xyy', 'oRP');
  {yy,NULL,NULL,NULL}
 (2 rows)
 
+-- expectMatch	21.39 NPQR	{((.)){0}(\2){0}}	xyz	{}	{}	{}	{}
+select * from test_regex('((.)){0}(\2){0}', 'xyz', 'NPQR');
+ test_regex 
+------------------------------------------------------------
+ {3,REG_UBACKREF,REG_UBOUNDS,REG_UNONPOSIX,REG_UEMPTYMATCH}
+ {"",NULL,NULL,NULL}
+(2 rows)
+
+-- expectNomatch	21.40 PQR	{((.)){0}(\2)}	xxx
+select * from test_regex('((.)){0}(\2)', 'xxx', 'PQR');
+ test_regex 
+------------------------------------------------
+ {3,REG_UBACKREF,REG_UBOUNDS,REG_UNONPOSIX}
+(1 row)
+
 -- doing 22 "multicharacter collating elements"
 -- # again ugh
 -- MCCEs are not implemented in Postgres, so we skip all these tests
diff --git a/src/test/modules/test_regex/sql/test_regex.sql b/src/test/modules/test_regex/sql/test_regex.sql
index 3419564203..29806e9417 100644
--- a/src/test/modules/test_regex/sql/test_regex.sql
+++ b/src/test/modules/test_regex/sql/test_regex.sql
@@ -1020,6 +1020,10 @@ select * from test_regex('((.))(\2){0}', 'xy', 'RPQ');
 select * from test_regex('((.))(\2)', 'xyy', 'RP');
 -- expectMatch	21.38 oRP	((.))(\2)	xyy	yy	{}	{}	{}
 select * from test_regex('((.))(\2)', 'xyy', 'oRP');
+-- expectMatch	21.39 NPQR	{((.)){0}(\2){0}}	xyz	{}	{}	{}	{}
+select * from test_regex('((.)){0}(\2){0}', 'xyz', 'NPQR');
+-- expectNomatch	21.40 PQR	{((.)){0}(\2)}	xxx
+select * from test_regex('((.)){0}(\2)', 'xxx', 'PQR');
 
 -- doing 22 "multicharacter collating elements"
 -- # again ugh


Re: Small documentation improvement for ALTER SUBSCRIPTION

2021-08-09 Thread Peter Smith
On Tue, Aug 10, 2021 at 11:01 AM Masahiko Sawada  wrote:
>
> On Mon, Aug 9, 2021 at 1:01 PM Peter Smith  wrote:
> >
> > On Mon, Aug 9, 2021 at 12:46 PM Amit Kapila  wrote:
> > >
> > > On Sun, Aug 8, 2021 at 10:21 AM Peter Smith  wrote:
> > > >
> > > > On Sat, Aug 7, 2021 at 4:33 PM Amit Kapila  
> > > > wrote:
> > > > >
> > > > > On Thu, Jul 8, 2021 at 6:31 PM Masahiko Sawada 
> > > > >  wrote:
> > > > > >
> > > > > > Hi all,
> > > > > >
> > > > > > When reading the doc of ALTER SUBSCRIPTION I realized that 'refresh
> > > > > > options' in the following paragraph is not tagged:
> > > > > >
> > > > > > ---
> > > > > > Additionally, refresh options as described under REFRESH PUBLICATION
> > > > > > may be specified, except in the case of DROP PUBLICATION.
> > > > > > ---
> > > > > >
> > > > > > When I read it for the first time, I got confused because we 
> > > > > > actually
> > > > > > have the 'refresh' option and this description in the paragraph of 
> > > > > > the
> > > > > > 'refresh' option. I think we can improve it by changing to
> > > > > > 'refresh_option'. Thoughts?
> > > > > >
> > > > >
> > > > > I see that one can get confused but how about changing it to
> > > > > "Additionally, refresh options as described under REFRESH
> > > > > PUBLICATION (refresh_option) may
> > > > > be specified,.."? I think keeping "refresh options" in the text would
> > > > > be good because there could be multiple such options.
> > > > >
> > > >
> > > > I feel like it would be better to reword it in some way that avoids
> > > > using parentheses because they look like part of the syntax instead of
> > > > just part of the sentence.
> > > >
> > >
> > > Fair enough, feel free to propose if you find something better or if
> > > you think the current text in the docs is good.
> > >
> >
>
> Thank you for the comments!
>
> > IMO just the same as your suggestion but without the parens would be good. 
> > e.g.
> >
> > "Additionally, refresh options as described under REFRESH
> > PUBLICATION refresh_option may be
> > specified,.."
>
> But "REFRESH PUBLICATION refresh_option" seems wrong in terms of SQL
> syntax, no?
>

Because the sentence says "... as described under ..." I thought it
was clear enough it was referring to the documentation below and not
the SQL syntax.

> Given there could be multiple options, how about using
> "refresh_options"? That is, the sentence
> will be:
>
> Additionally, refresh_options as described
> under REFRESH PUBLICATION may be specified,
> except in the case of DROP PUBLICATION.
>

+1  LGTM

--
Kind Regards,
Peter Smith.
Fujitsu Australia.




Re: Another regexp performance improvement: skip useless paren-captures

2021-08-09 Thread Mark Dilger



> On Aug 9, 2021, at 4:31 PM, Tom Lane  wrote:
> 
> This patch should work OK in HEAD and v14, but it will need
> a bit of fooling-about for older branches I think, given that
> they fill v->subs[] a little differently.

Note that I tested your patch *before* master, so the changes look backwards.

I tested this fix and it seems good to me.  Some patterns that used to work (by 
chance?) now raise an error, such as:

 select 'bpgouiwcquu' ~ '[e])*?)){0,0}?(\3))';
-ERROR:  invalid regular expression: invalid backreference number
+ ?column? 
+----------
+ f
+(1 row)

I ran a lot of tests with the patch, and the asserts have all cleared up, but I 
don't know how to think about the user facing differences.  If we had a good 
reason for raising an error for these back-references, maybe that'd be fine, 
but it seems to just be an implementation detail.

—
Mark Dilger
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company







Re: Another regexp performance improvement: skip useless paren-captures

2021-08-09 Thread Mark Dilger



> On Aug 9, 2021, at 6:17 PM, Mark Dilger  wrote:
> 
> Well, this doesn't die either:

Meaning it doesn't die in the part of the pattern qualified by {0} either.  It 
does die in the other part.  Sorry again for the confusion.

—
Mark Dilger
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company







Re: replay of CREATE TABLESPACE eats data at wal_level=minimal

2021-08-09 Thread Noah Misch
On Mon, Aug 09, 2021 at 01:08:42PM -0400, Robert Haas wrote:
> To reproduce, initialize a cluster with wal_level=minimal and
> max_wal_senders=0. Then from psql:
> 
> \! mkdir /tmp/goose
> 
> CHECKPOINT;
> CREATE TABLESPACE goose LOCATION '/tmp/goose';
> SET wal_skip_threshold=0;
> BEGIN;
> CREATE TABLE wild (a int, b text) TABLESPACE goose;
> INSERT INTO wild VALUES (1, 'chase');
> COMMIT;
> SELECT * FROM wild;
> 
> As expected, you will see one row in table 'wild'. Now perform an
> immediate shutdown. Restart the server. Table 'wild' is now empty.

Thanks for finding the problem.  It's a bad problem.

> The problem appears to be that tblspc_redo() calls
> create_tablespace_directories(), which says:
> 
> /*
>  * Our theory for replaying a CREATE is to forcibly drop the target
>  * subdirectory if present, and then recreate it. This may be more
>  * work than needed, but it is simple to implement.
>  */
> 
> Unfortunately, this theory (which dates to
> c86f467d18aa58e18fd85b560b46d8de014e6017, vintage 2010, by Bruce) is
> correct only with wal_level>minimal. At wal_level='minimal', we can
> replay the record to recreate the relfilenode, but not the records
> that would have created the contents. However, note that if the table
> is smaller than wal_skip_threshold, then we'll log full-page images of
> the contents at commit time even at wal_level='minimal' after which we
> have no problem. As far as I can see, this bug has "always" existed,
> but before c6b92041d38512a4176ed76ad06f713d2e6c01a8 (2020, Noah) you
> would have needed a different test case. Specifically, you would have
> needed to use COPY to put the row in the table, and you would have
> needed to omit setting wal_skip_threshold since it didn't exist yet.

Agreed.

> I don't presently have a specific idea about how to fix this.

Can't recovery just not delete the directory, create it if it doesn't exist,
and be happy if it does exist?  Like the attached WIP.  If we think it's
possible for a crash during mkdir to leave a directory having the wrong
permissions, adding a chmod would be in order.
diff --git a/src/backend/commands/tablespace.c b/src/backend/commands/tablespace.c
index a54239a..2928ecf 100644
--- a/src/backend/commands/tablespace.c
+++ b/src/backend/commands/tablespace.c
@@ -589,7 +589,6 @@ create_tablespace_directories(const char *location, const Oid tablespaceoid)
 {
 	char	   *linkloc;
 	char	   *location_with_version_dir;
-	struct stat st;
 
 	linkloc = psprintf("pg_tblspc/%u", tablespaceoid);
 	location_with_version_dir = psprintf("%s/%s", location,
@@ -614,39 +613,22 @@ create_tablespace_directories(const char *location, const Oid tablespaceoid)
 						location)));
 	}
 
-	if (InRecovery)
-	{
-		/*
-		 * Our theory for replaying a CREATE is to forcibly drop the target
-		 * subdirectory if present, and then recreate it. This may be more
-		 * work than needed, but it is simple to implement.
-		 */
-		if (stat(location_with_version_dir, &st) == 0 && S_ISDIR(st.st_mode))
-		{
-			if (!rmtree(location_with_version_dir, true))
-				/* If this failed, MakePGDirectory() below is going to error. */
-				ereport(WARNING,
-						(errmsg("some useless files may be left behind in old database directory \"%s\"",
-								location_with_version_dir)));
-		}
-	}
-
 	/*
 	 * The creation of the version directory prevents more than one tablespace
 	 * in a single location.
 	 */
 	if (MakePGDirectory(location_with_version_dir) < 0)
 	{
-		if (errno == EEXIST)
-			ereport(ERROR,
-					(errcode(ERRCODE_OBJECT_IN_USE),
-					 errmsg("directory \"%s\" already in use as a tablespace",
-							location_with_version_dir)));
-		else
+		if (errno != EEXIST)
 			ereport(ERROR,
 					(errcode_for_file_access(),
 					 errmsg("could not create directory \"%s\": %m",
 							location_with_version_dir)));
+		else if (!InRecovery)
+			ereport(ERROR,
+					(errcode(ERRCODE_OBJECT_IN_USE),
+					 errmsg("directory \"%s\" already in use as a tablespace",
+							location_with_version_dir)));
 	}
 
 	/*


Re: Another regexp performance improvement: skip useless paren-captures

2021-08-09 Thread Mark Dilger



> On Aug 9, 2021, at 6:11 PM, Tom Lane  wrote:
> 
> Hm.  I'm not sure that this example proves anything about Perl's handling
> of the situation, since you didn't use a backref.

Well, this doesn't die either:

if ('foo' =~ m/((??{ die; })(.)(??{ die $1; })){0}((??{ die "got here"; })|\2)/)
{
print "matched\n";
}

The point is that the regex engine never walks the part of the pattern that {0} 
qualifies.  I thought it was more clear in the prior example, because that 
example proves that the engine does get as far as capturing.  This example also 
does that, and with a backref, because it dies with "got here".
 
—
Mark Dilger
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company







Re: Another regexp performance improvement: skip useless paren-captures

2021-08-09 Thread Tom Lane
Mark Dilger  writes:
>> On Aug 9, 2021, at 4:31 PM, Tom Lane  wrote:
>> There is a potentially interesting definitional question:
>> what exactly ought this regexp do?
>>  ((.)){0}\2
>> Because the capturing paren sets are zero-quantified, they will
>> never be matched to any characters, so the backref can never
>> have any defined referent.

> Perl regular expressions are not POSIX, but if there is a principled reason 
> POSIX should differ from perl on this, we should be clear what that is:

> if ('foo' =~ m/((.)(??{ die; })){0}(..)/)
> {
> print "captured 1 $1\n" if defined $1;
> print "captured 2 $2\n" if defined $2;
> print "captured 3 $3\n" if defined $3;
> print "captured 4 $4\n" if defined $4;
> print "match = $match\n" if defined $match;
> }

Hm.  I'm not sure that this example proves anything about Perl's handling
of the situation, since you didn't use a backref.  I tried both

if ('foo' =~ m/((.)){0}\1/)

if ('foo' =~ m/((.)){0}\2/)

and while neither throws an error, they don't succeed either.
So AFAICS Perl is acting in the way I'm attributing to POSIX.
But maybe we should actually read POSIX ...

>> ... I guess Spencer did think about this to some extent -- he
>> just forgot about the possibility of nested parens.

> Ugh.  That means our code throws an error where perl does not, pretty
> well negating my point above.  If we're already throwing an error for
> this type of thing, I agree we should be consistent about it.  My
> personal preference would have been to do the same thing as perl, but it
> seems that ship has already sailed.

Removing an error case is usually an easier sell than adding one.
However, the fact that the simplest case (viz, '(.){0}\1') has always
thrown an error and nobody has complained in twenty-ish years suggests
that nobody much cares.
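
For the record, the longstanding case as it shows up from psql:

SELECT 'x' ~ '(.){0}\1';
ERROR:  invalid regular expression: invalid backreference number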

regards, tom lane




Re: Small documentation improvement for ALTER SUBSCRIPTION

2021-08-09 Thread Masahiko Sawada
On Mon, Aug 9, 2021 at 1:01 PM Peter Smith  wrote:
>
> On Mon, Aug 9, 2021 at 12:46 PM Amit Kapila  wrote:
> >
> > On Sun, Aug 8, 2021 at 10:21 AM Peter Smith  wrote:
> > >
> > > On Sat, Aug 7, 2021 at 4:33 PM Amit Kapila  
> > > wrote:
> > > >
> > > > On Thu, Jul 8, 2021 at 6:31 PM Masahiko Sawada  
> > > > wrote:
> > > > >
> > > > > Hi all,
> > > > >
> > > > > When reading the doc of ALTER SUBSCRIPTION I realized that 'refresh
> > > > > options' in the following paragraph is not tagged:
> > > > >
> > > > > ---
> > > > > Additionally, refresh options as described under REFRESH PUBLICATION
> > > > > may be specified, except in the case of DROP PUBLICATION.
> > > > > ---
> > > > >
> > > > > When I read it for the first time, I got confused because we actually
> > > > > have the 'refresh' option and this description in the paragraph of the
> > > > > 'refresh' option. I think we can improve it by changing to
> > > > > 'refresh_option'. Thoughts?
> > > > >
> > > >
> > > > I see that one can get confused but how about changing it to
> > > > "Additionally, refresh options as described under REFRESH
> > > > PUBLICATION (refresh_option) may
> > > > be specified,.."? I think keeping "refresh options" in the text would
> > > > be good because there could be multiple such options.
> > > >
> > >
> > > I feel like it would be better to reword it in some way that avoids
> > > using parentheses because they look like part of the syntax instead of
> > > just part of the sentence.
> > >
> >
> > Fair enough, feel free to propose if you find something better or if
> > you think the current text in the docs is good.
> >
>

Thank you for the comments!

> IMO just the same as your suggestion but without the parens would be good. 
> e.g.
>
> "Additionally, refresh options as described under REFRESH
> PUBLICATION refresh_option may be
> specified,.."

But "REFRESH PUBLICATION refresh_option" seems wrong in terms of SQL
syntax, no?

Given there could be multiple options, how about using
"refresh_options"? That is, the sentence
will be:

Additionally, refresh_options as described
under REFRESH PUBLICATION may be specified,
except in the case of DROP PUBLICATION.

Regards,

-- 
Masahiko Sawada
EDB:  https://www.enterprisedb.com/




Re: Another regexp performance improvement: skip useless paren-captures

2021-08-09 Thread Tom Lane
Mark Dilger  writes:
>> On Aug 9, 2021, at 12:14 PM, Tom Lane  wrote:
>> Pushed, but while re-reading it before commit I noticed that there's
>> some more fairly low-hanging fruit in regexp_replace().

> I've been reviewing and testing this (let-regexp_replace-use-NOSUB.patch) 
> since you sent it 4 hours ago, and I can't seem to break it.  There are 
> pre-existing problems in the regex code, but this doesn't seem to add any new 
> breakage.

Pushed that bit, thanks for testing!

I plan to not do anything about the (()){0} bug until after the release
window, since that will need to be back-patched.  That bug's gotta be
twenty years old, so while I kinda wish we'd found it a few days
earlier, waiting another 3 months won't make much difference.

regards, tom lane




Re: Another regexp performance improvement: skip useless paren-captures

2021-08-09 Thread Mark Dilger



> On Aug 9, 2021, at 5:14 PM, Mark Dilger  wrote:
> 
>our $match;
>if ('foo' =~ m/((.)(??{ die; })){0}(..)/)

I left in a stray variable.  A prior version of this script was assigning to 
$match where it now has die.  Sorry for any confusion.

—
Mark Dilger
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company







Re: Another regexp performance improvement: skip useless paren-captures

2021-08-09 Thread Mark Dilger



> On Aug 9, 2021, at 4:31 PM, Tom Lane  wrote:
> 
> There is a potentially interesting definitional question:
> what exactly ought this regexp do?
> 
>   ((.)){0}\2
> 
> Because the capturing paren sets are zero-quantified, they will
> never be matched to any characters, so the backref can never
> have any defined referent.

Perl regular expressions are not POSIX, but if there is a principled reason 
POSIX should differ from perl on this, we should be clear what that is:

#!/usr/bin/perl
   
use strict;
use warnings;

our $match;
if ('foo' =~ m/((.)(??{ die; })){0}(..)/)
{
print "captured 1 $1\n" if defined $1;
print "captured 2 $2\n" if defined $2;
print "captured 3 $3\n" if defined $3;
print "captured 4 $4\n" if defined $4;
print "match = $match\n" if defined $match;
}

This will print "captured 3 fo", proving that although the regular expression 
is parsed with the (..) bound to the third capture group, the first two capture 
groups never run.  If you don't believe that, change the {0} to {1} and observe 
that the script dies.

> So I think throwing an
> error is an appropriate response.  The existing code will
> throw such an error for
> 
>   ((.)){0}\1
> 
> so I guess Spencer did think about this to some extent -- he
> just forgot about the possibility of nested parens.


Ugh.  That means our code throws an error where perl does not, pretty well 
negating my point above.  If we're already throwing an error for this type of 
thing, I agree we should be consistent about it.  My personal preference would 
have been to do the same thing as perl, but it seems that ship has already 
sailed.


—
Mark Dilger
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company







Re: Autovacuum on partitioned table (autoanalyze)

2021-08-09 Thread Alvaro Herrera
Hello,

On 2021-Jul-22, Andres Freund wrote:

> 1) Somehow it seems like a violation to do stuff like
>get_partition_ancestors() in pgstat.c. It's nothing I can't live with, but
>it feels a bit off. Would likely not be too hard to address, e.g. by just
>putting some of pgstat_report_anl_ancestors in partition.c instead.

I understand the complaint about this being a modularity violation -- the
point being that pgstat.c has no business accessing system catalogs at all.
Before this function, all pgstat_report_* functions were just assembling
a message from counters accumulated somewhere and sending the bytes to
the collector, and this new function is a deviation from that.

It seems that we could improve this by having a function (maybe in
partition.c as you propose), something like

static void
report_partition_ancestors(Oid relid)
{
	List	   *ancestors = get_partition_ancestors( ... );
	Oid		   *array = palloc(sizeof(Oid) * list_length(ancestors));
	ListCell   *lc;
	int			i = 0;

	foreach(lc, ancestors)
		array[i++] = lfirst_oid(lc);

	pgstat_report_partition_ancestors(relid, array);
}

and then pgstat.c works with the given array without having to consult
system catalogs.

> 2) Why does it make sense that autovacuum sends a stats message for every
>partition in the system that had any [changes] since the last autovacuum
>cycle? On a database with a good number of objects / a short naptime we'll
>often end up sending messages for the same set of tables from separate
>workers, because they don't yet see the concurrent
>tabentry->changes_since_analyze_reported.

The traffic could be large, yeah, and I agree it seems undesirable.  If
the collector kept a record of the list of ancestors of each table, then
we wouldn't need to do this (we would have to know whether the collector
knows about a particular partition or not, though ... I have no ideas on
that.)

> 3) What is the goal of the autovac_refresh_stats() after the loop doing
>pgstat_report_anl_ancestors()? I think it'll be common that the stats
>collector hasn't even processed the incoming messages by that point, not to
>speak of actually having written out a new stats file. If it took less than
>10ms (PGSTAT_RETRY_DELAY) to get to autovac_refresh_stats(),
>backend_read_statsfile() will not wait for a new stats file to be written
>out, and we'll just re-read the state we previously did.
> 
>It's pretty expensive to re-read the stats file in some workloads, so I'm a
>bit concerned that we end up significantly increasing the amount of stats
>updates/reads, without actually gaining anything reliable?

This is done once per autovacuum run and the point is precisely to let
the next block absorb the updates that were sent.  In manual ANALYZE we
do it to inform future autovacuum runs.

Note that the PGSTAT_RETRY_DELAY limit is used by the autovac launcher
only, and this code is running in the worker; we do flush out the old
data.  Yes, it's expensive, but we're not doing it once per table, just
once per worker run.

> 4) In the shared mem stats patch I went to a fair bit of trouble to try to get
>rid of pgstat_vacuum_stat() (which scales extremely poorly to larger
>systems). For that to work pending stats can only be "staged" while holding
>a lock on a relation that prevents the relation from being concurrently
>dropped (pending stats increment a refcount for the shared stats object,
>which ensures that we don't loose track of the fact that a stats object has
>been dropped, even when stats only get submitted later).
> 
>I'm not yet clear on how to make this work for
>pgstat_report_anl_ancestors() - but I probably can find a way. But it does
>feel a bit off to issue stats stuff for tables we're not sure still exist.

I assume you refer to locking the *partition*, right?  You're not
talking about locking the ancestor mentioned in the message.  I don't
know how does the shmem-collector work, but it shouldn't be a problem
that an ancestor goes away (ALTER TABLE parent DETACH; DROP TABLE
parent); as long as you've kept a lock on the partition, it should be
fine.  Or am I misinterpreting what you mean?

> I'll go and read through the thread, but my first thought is that having a
> hashtable in do_autovacuum() that contains stats for partitioned tables would
> be a good bit more efficient than the current approach? We already have a
> hashtable for each toast table, compared to that having a hashtable for each
> partitioned table doesn't seem like it'd be a problem?

> With a small bit of extra work that could even avoid the need for the
> additional pass through pg_class. Do the partitioned table data-gathering as
> part of the "collect main tables to vacuum" pass, and then do one of

I'll have to re-read the thread to remember why I made it a separate
pass.  I think I did it that way because otherwise there was a
requirement on the pg_class scan order.  (Some earlier vers

Re: Estimating HugePages Requirements?

2021-08-09 Thread Justin Pryzby
On Thu, Jun 10, 2021 at 07:23:33PM -0500, Justin Pryzby wrote:
> On Wed, Jun 09, 2021 at 10:55:08PM -0500, Don Seiler wrote:
> > On Wed, Jun 9, 2021, 21:03 P C  wrote:
> > 
> > > I agree, it's confusing for many, and that confusion arises from the fact
> > > that you usually talk of shared_buffers in MB or GB whereas hugepages have
> > > to be configured in units of 2MB. But once they understand, they realize
> > > it's pretty simple.
> > >
> > > Don, we have experienced the same not just with postgres but also with
> > > oracle. I haven't been able to get to the root of it, but what we usually
> > > do is add another 100-200 pages and that works for us. If the SGA or
> > > shared_buffers is high, e.g. 96GB, then we add 250-500 pages. Those few
> > > hundred MBs may be wasted (because the moment you configure hugepages, the
> > > operating system considers them as used and does not use them any more),
> > > but nowadays servers have 64 or 128 GB RAM easily and wasting that 500MB
> > > to 1GB does not really hurt.
> > 
> > I don't have a problem with the math, just wanted to know if it was
> > possible to better estimate what the actual requirements would be at
> > deployment time. My fallback will probably be what you did: just pad with
> > an extra 512MB by default.
> 
> It's because the huge allocation isn't just shared_buffers, but also
> wal_buffers:
> 
> | The amount of shared memory used for WAL data that has not yet been written 
> to disk.
> | The default setting of -1 selects a size equal to 1/32nd (about 3%) of 
> shared_buffers, ...
> 
> .. and other stuff:

I wonder if this shouldn't be solved the other way around:

Define shared_buffers as the exact size to be allocated/requested from the OS
(regardless of whether they're huge pages or not), and have postgres compute
everything else based on that.  So shared_buffers=2GB would end up being 1950MB
(or so) of buffer cache.  We'd have to check that after the other allocations,
there's still at least 128kB left for the buffer cache.  Maybe we'd have to
bump the minimum value of shared_buffers.
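
A rough illustration of the proposed accounting, with made-up numbers (if the
non-buffer-cache structures came to roughly 98MB):

-- shared_buffers = 2GB would be the exact OS request; the other structures
-- come out of it, leaving the buffer cache with the remainder.
SELECT 2048 - 98 AS effective_buffer_cache_mb;  -- => 1950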

> src/backend/storage/ipc/ipci.c
>* Size of the Postgres shared-memory block is estimated via
>* moderately-accurate estimates for the big hogs, plus 100K for the
>* stuff that's too small to bother with estimating.
>*
>* We take some care during this phase to ensure that the total size
>* request doesn't overflow size_t.  If this gets through, we don't
>* need to be so careful during the actual allocation phase.
>*/
>   size = 10;
>   size = add_size(size, PGSemaphoreShmemSize(numSemas));
>   size = add_size(size, SpinlockSemaSize());
>   size = add_size(size, hash_estimate_size(SHMEM_INDEX_SIZE, sizeof(ShmemIndexEnt)));
>   size = add_size(size, dsm_estimate_size());
>   size = add_size(size, BufferShmemSize());
>   size = add_size(size, LockShmemSize());
>   size = add_size(size, PredicateLockShmemSize());
>   size = add_size(size, ProcGlobalShmemSize());
>   size = add_size(size, XLOGShmemSize());
>   size = add_size(size, CLOGShmemSize());
>   size = add_size(size, CommitTsShmemSize());
>   size = add_size(size, SUBTRANSShmemSize());
>   size = add_size(size, TwoPhaseShmemSize());
>   size = add_size(size, BackgroundWorkerShmemSize());
>   size = add_size(size, MultiXactShmemSize());
>   size = add_size(size, LWLockShmemSize());
>   size = add_size(size, ProcArrayShmemSize());
>   size = add_size(size, BackendStatusShmemSize());
>   size = add_size(size, SInvalShmemSize());
>   size = add_size(size, PMSignalShmemSize());
>   size = add_size(size, ProcSignalShmemSize());
>   size = add_size(size, CheckpointerShmemSize());
>   size = add_size(size, AutoVacuumShmemSize());
>   size = add_size(size, ReplicationSlotsShmemSize());
>   size = add_size(size, ReplicationOriginShmemSize());
>   size = add_size(size, WalSndShmemSize());
>   size = add_size(size, WalRcvShmemSize());
>   size = add_size(size, PgArchShmemSize());
>   size = add_size(size, ApplyLauncherShmemSize());
>   size = add_size(size, SnapMgrShmemSize());
>   size = add_size(size, BTreeShmemSize());
>   size = add_size(size, SyncScanShmemSize());
>   size = add_size(size, AsyncShmemSize());
> #ifdef EXEC_BACKEND
>   size = add_size(size, ShmemBackendArraySize());
> #endif
> 
>   /* freeze the addin request size and include it */
>   addin_request_allowed = false;
>   size = add_size(size, total_addin_request);
> 
> /* might as well round it off to a multiple of a typical page size */
> size = add_size(size, 8192 - (size % 8192));
> 
> BTW, I think it'd be nice if this were a NOTICE:
> | elog(DEBUG1, "mmap(%zu) with MAP_HUGETLB failed, huge pages disabled: %m", 

Re: Estimating HugePages Requirements?

2021-08-09 Thread Bossart, Nathan
On 8/9/21, 4:05 PM, "Zhihong Yu"  wrote:
> -extern void CreateSharedMemoryAndSemaphores(void);
> +extern Size CreateSharedMemoryAndSemaphores(bool size_only);
>
> Should the parameter be an enum/bitmask so that future additions would not
> change the method signature?

I don't have a strong opinion about this.  I don't feel that it's
really necessary, but if reviewers want a bitmask instead, I can
change it.

Nathan



Re: Another regexp performance improvement: skip useless paren-captures

2021-08-09 Thread Mark Dilger



> On Aug 9, 2021, at 12:14 PM, Tom Lane  wrote:
> 
> Pushed, but while re-reading it before commit I noticed that there's
> some more fairly low-hanging fruit in regexp_replace().  As I had it
> in that patch, it never used REG_NOSUB because of the possibility
> that the replacement string uses "\N".  However, we're already
> pre-checking the replacement string to see if it has backslashes
> at all, so while we're at it we can check for \N to discover if we
> actually need any subexpression match data or not.  We do need to
> refactor a little to postpone calling pg_regcomp until after we
> know that, but I think that makes replace_text_regexp's API less
> ugly not more so.
> 
> While I was at it, I changed the search-for-backslash loops to
> use memchr rather than handwritten looping.  Their use of
> pg_mblen was pretty unnecessary given we only need to find
> backslashes, and we can assume the backend encoding is ASCII-safe.
> 
> Using a bunch of random cases generated by your little perl
> script, I see maybe 10-15% speedup on test cases that don't
> use \N in the replacement string, while it's about a wash
> on cases that do.  (If I'd been using a multibyte encoding,
> maybe the memchr change would have made a difference, but
> I didn't try that.)
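
To make the distinction concrete, a couple of illustrative calls (a
replacement containing \N needs subexpression match data; one without it can
now compile with REG_NOSUB):

SELECT regexp_replace('foobar', 'o+', 'X');       -- no \N, NOSUB-eligible
SELECT regexp_replace('foobar', '(o+)', '<\1>');  -- \1 requires match data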

I've been reviewing and testing this (let-regexp_replace-use-NOSUB.patch) since 
you sent it 4 hours ago, and I can't seem to break it.  There are pre-existing 
problems in the regex code, but this doesn't seem to add any new breakage.

—
Mark Dilger
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company







Re: Another regexp performance improvement: skip useless paren-captures

2021-08-09 Thread Tom Lane
Mark Dilger  writes:
> +select regexp_split_to_array('', '(?:((?:q+))){0}(\1){0,0}?*[^]');
> +server closed the connection unexpectedly

Here's a quick draft patch for this.  Basically it moves the
responsibility for clearing v->subs[] pointers into the freesubre()
recursion, so that it will happen for contained capturing parens
not only the top level.

There is a potentially interesting definitional question:
what exactly ought this regexp do?

((.)){0}\2

Because the capturing paren sets are zero-quantified, they will
never be matched to any characters, so the backref can never
have any defined referent.  I suspect that study of the POSIX
spec would lead to the conclusion that this is a legal regexp
but it will never match anything.  Implementing that would be
tedious though, and what's more it seems very unlikely that
the user wanted any such behavior.  So I think throwing an
error is an appropriate response.  The existing code will
throw such an error for

((.)){0}\1

so I guess Spencer did think about this to some extent -- he
just forgot about the possibility of nested parens.

This patch should work OK in HEAD and v14, but it will need
a bit of fooling-about for older branches I think, given that
they fill v->subs[] a little differently.

regards, tom lane

diff --git a/src/backend/regex/regcomp.c b/src/backend/regex/regcomp.c
index ae3a7b6a38..ae1ff670f9 100644
--- a/src/backend/regex/regcomp.c
+++ b/src/backend/regex/regcomp.c
@@ -550,8 +550,6 @@ freev(struct vars *v,
 {
 	if (v->re != NULL)
 		rfree(v->re);
-	if (v->subs != v->sub10)
-		FREE(v->subs);
 	if (v->nfa != NULL)
 		freenfa(v->nfa);
 	if (v->tree != NULL)
@@ -562,6 +560,8 @@ freev(struct vars *v,
 		freecvec(v->cv);
 	if (v->cv2 != NULL)
 		freecvec(v->cv2);
+	if (v->subs != v->sub10)
+		FREE(v->subs);
 	if (v->lacons != NULL)
 		freelacons(v->lacons, v->nlacons);
 	ERR(err);	/* nop if err==0 */
@@ -1091,8 +1091,8 @@ parseqatom(struct vars *v,
 	{
 		if (atom != NULL)
 			freesubre(v, atom);
-		if (atomtype == '(')
-			v->subs[subno] = NULL;
+		/* freesubre() will clean up any v->subs[] pointers into the atom */
+		assert(atomtype != '(' || v->subs[subno] == NULL);
 		delsub(v->nfa, lp, rp);
 		EMPTYARC(lp, rp);
 		return top;
@@ -2127,9 +2127,24 @@ freesrnode(struct vars *v,		/* might be NULL */
 	if (sr == NULL)
 		return;
 
+	/*
+	 * If the node is referenced in v->subs[], clear that to avoid leaving a
+	 * dangling pointer.  This should only ever matter when parseqatom() is
+	 * throwing away a {0,0}-quantified atom that contains capturing parens.
+	 * Doing this will cause later backrefs to such parens to fail, which is
+	 * maybe not strictly POSIX but seems more helpful than allowing a backref
+	 * that can never have a defined referent.
+	 */
+	if (sr->capno > 0 && v != NULL)
+	{
+		assert(v->subs[sr->capno] == sr);
+		v->subs[sr->capno] = NULL;
+	}
+
 	if (!NULLCNFA(sr->cnfa))
 		freecnfa(&sr->cnfa);
 	sr->flags = 0;/* in particular, not INUSE */
+	sr->capno = 0;
 	sr->child = sr->sibling = NULL;
 	sr->begin = sr->end = NULL;
 
diff --git a/src/test/modules/test_regex/expected/test_regex.out b/src/test/modules/test_regex/expected/test_regex.out
index 5a6cdf47c2..850e0c8ef5 100644
--- a/src/test/modules/test_regex/expected/test_regex.out
+++ b/src/test/modules/test_regex/expected/test_regex.out
@@ -3506,6 +3506,10 @@ select * from test_regex('((.))(\2)', 'xyy', 'oRP');
  {yy,NULL,NULL,NULL}
 (2 rows)
 
+-- # throwing an error for this is arguably not per POSIX
+-- expectError	21.39 -		{((.)){0}(\2){0}}	ESUBREG
+select * from test_regex('((.)){0}(\2){0}', '', '-');
+ERROR:  invalid regular expression: invalid backreference number
 -- doing 22 "multicharacter collating elements"
 -- # again ugh
 -- MCCEs are not implemented in Postgres, so we skip all these tests
diff --git a/src/test/modules/test_regex/sql/test_regex.sql b/src/test/modules/test_regex/sql/test_regex.sql
index 3419564203..378f22b67b 100644
--- a/src/test/modules/test_regex/sql/test_regex.sql
+++ b/src/test/modules/test_regex/sql/test_regex.sql
@@ -1020,6 +1020,9 @@ select * from test_regex('((.))(\2){0}', 'xy', 'RPQ');
 select * from test_regex('((.))(\2)', 'xyy', 'RP');
 -- expectMatch	21.38 oRP	((.))(\2)	xyy	yy	{}	{}	{}
 select * from test_regex('((.))(\2)', 'xyy', 'oRP');
+-- # throwing an error for this is arguably not per POSIX
+-- expectError	21.39 -		{((.)){0}(\2){0}}	ESUBREG
+select * from test_regex('((.)){0}(\2){0}', '', '-');
 
 -- doing 22 "multicharacter collating elements"
 -- # again ugh


Re: ECPG bug fix: DECALRE STATEMENT and DEALLOCATE, DESCRIBE

2021-08-09 Thread Peter Geoghegan
On Mon, Aug 9, 2021 at 3:51 PM Bruce Momjian  wrote:
> > I think that there was a snowballing effect here. We both made
> > mistakes that compounded. I apologize for my role in this whole mess.
>
> I do think all committers need to understand the role of the RMT so they
> can respond appropriately.  Do we need to communicate this better?

I think that it makes sense to codify the practical expectations that
the community has of existing committers, at least to some degree. I
mean why wouldn't we? The resulting document (an addition to the
"committers" Postgres wiki page?) would inevitably leave certain
questions open to interpretation. That seems okay to me.

I don't feel qualified to write this myself. Just my opinion.

-- 
Peter Geoghegan




Re: Estimating HugePages Requirements?

2021-08-09 Thread Zhihong Yu
On Mon, Aug 9, 2021 at 3:57 PM Bossart, Nathan  wrote:

> On 6/9/21, 8:09 PM, "Bossart, Nathan"  wrote:
> > On 6/9/21, 3:51 PM, "Mark Dilger"  wrote:
> >>> On Jun 9, 2021, at 1:52 PM, Bossart, Nathan 
> wrote:
> >>>
> >>> I'd be happy to clean it up and submit it for
> >>> discussion in pgsql-hackers@ if there is interest.
> >>
> >> Yes, I'd like to see it.  Thanks for offering.
> >
> > Here's the general idea.  It still needs a bit of polishing, but I'm
> > hoping this is enough to spark some discussion on the approach.
>
> Here's a rebased version of the patch.
>
> Nathan
>
Hi,

-extern void CreateSharedMemoryAndSemaphores(void);
+extern Size CreateSharedMemoryAndSemaphores(bool size_only);

Should the parameter be an enum / bitmask so that future additions would
not change the method signature?
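
Something along these lines, for instance (the flag names here are made
up for illustration, not taken from the patch):

    typedef enum CreateShmemFlags
    {
        SHMEM_SIZE_ONLY = 1 << 0    /* just compute the size, create nothing */
    } CreateShmemFlags;

    extern Size CreateSharedMemoryAndSemaphores(int flags);

    /* existing callers become: */
    Size    size = CreateSharedMemoryAndSemaphores(SHMEM_SIZE_ONLY);
    (void) CreateSharedMemoryAndSemaphores(0);      /* normal startup */

A new behavior then becomes a new flag bit rather than a signature change.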

Cheers


Re: ECPG bug fix: DECALRE STATEMENT and DEALLOCATE, DESCRIBE

2021-08-09 Thread Bruce Momjian
On Mon, Aug  9, 2021 at 03:42:45PM -0700, Peter Geoghegan wrote:
> > I'd like to apologize for not answering your email the way I should
> > have. Honestly it never occurred to me. Maybe that's because I'm used
> > to other procedures, or because I never had to converse with the RMT,
> > or maybe, quite simply, because I lacked the time to think it through,
> > the original issue that kind of started this whole mess.
> 
> I think that there was a snowballing effect here. We both made
> mistakes that compounded. I apologize for my role in this whole mess.

I do think all committers need to understand the role of the RMT so they
can respond appropriately.  Do we need to communicate this better?

-- 
  Bruce Momjian  https://momjian.us
  EDB  https://enterprisedb.com

  If only the physical world exists, free will is an illusion.





Re: ECPG bug fix: DECALRE STATEMENT and DEALLOCATE, DESCRIBE

2021-08-09 Thread Peter Geoghegan
Michael,

On Mon, Aug 9, 2021 at 3:03 PM Michael Meskes  wrote:
> This explains why it felt so difficult to make myself clear. I was
> feeling exactly the same, just the other way round.

My own special brand of miscommunication was also involved. I happen
to be sensitive to the perception that I yield any authority that I
might have (as an RMT member, as a committer, whatever) in a way that
is arbitrary or unfair. And so I wrote way too much about why that
wasn't actually the case here. I now realize that that wasn't really
relevant.

> I never though that. Maybe we should have quickly talked things out.
> Email tends to be a bad medium for communication, especially when it
> goes wrong. :)

Indeed. That might well have happened if we had been set up for it already.

> I'd like to apologize for not answering your email the way I should
> have. Honestly it never occurred to me. Maybe that's because I'm used
> to other procedures, or because I never had to converse with the RMT,
> or maybe, quite simply, because I lacked the time to think it through,
> the original issue that kind of started this whole mess.

I think that there was a snowballing effect here. We both made
mistakes that compounded. I apologize for my role in this whole mess.

-- 
Peter Geoghegan




Re: ECPG bug fix: DECALRE STATEMENT and DEALLOCATE, DESCRIBE

2021-08-09 Thread Andrew Dunstan


On 8/9/21 6:15 PM, Bruce Momjian wrote:
> On Tue, Aug 10, 2021 at 12:03:24AM +0200, Michael Meskes wrote:
>> I'd like to apologize for not answering your email the way I should
>> have. Honestly it never occurred to me. Maybe that's because I'm used
>> to other procedures, or because I never had to converse with the RMT,
>> or maybe, quite simply, because I lacked the time to think it through,
>> the original issue that kind of started this whole mess.
> Agreed.  When the RMT contacts me, I assume it is something that is time-
> and release-critical, so I give them as much detail as I can, and a
> timeline for when they will get more information.  If you are not focused on
> the RMT process, it might not be clear why that is important.
>

That's what you should be doing. If nothing else comes from this
colloquy it should make all committers aware of the process. The reason
we have an RMT, as I understand it, is to prevent the situation we had
years ago when things sometimes dragged on almost interminably after
feature freeze.


cheers


andrew





Re: Another regexp performance improvement: skip useless paren-captures

2021-08-09 Thread Tom Lane
I wrote:
> Hmmm ... yeah, I see it too.  This points up something I'd wondered
> about before, which is whether the code that "cancels everything"
> after detecting {0} is really OK.  It throws away the outer subre
> *and children* without worrying about what might be inside, and
> here we see that that's not good enough --- there's still a v->subs
> pointer to the first capturing paren set, which we just deleted,
> so that the \1 later on messes up.  I'm not sure why the back
> branches are managing not to crash, but that might just be a memory
> management artifact.

... yeah, it is.  For me, this variant hits the assertion in all
branches:

regression=# select regexp_split_to_array('', '((.)){0}(\2){0}');
server closed the connection unexpectedly

So that's a pre-existing (and very long-standing) bug.  I'm not
sure if it has any serious impact in non-assert builds though.
Failure to clean out some disconnected arcs probably has no
real effect on the regex's behavior later.

regards, tom lane




Re: ECPG bug fix: DECALRE STATEMENT and DEALLOCATE, DESCRIBE

2021-08-09 Thread Michael Meskes
> > Again agreed, please keep in mind, though, that I didn't notice I
> > was
> > being chased until Peter's first email.
> 
> I was asked by the RMT to contact one of your employees, and I did,
> to
> tell you that the RMT was looking for feedback from you on an ecpg
> issue.  Not sure if that was before or after Peter's email.

I think that was before; at that point I still thought it was nothing
time-sensitive. And unfortunately it didn't register that the RMT was
involved at all.

Michael
-- 
Michael Meskes
Michael at Fam-Meskes dot De
Michael at Meskes dot (De|Com|Net|Org)
Meskes at (Debian|Postgresql) dot Org





Re: ECPG bug fix: DECALRE STATEMENT and DEALLOCATE, DESCRIBE

2021-08-09 Thread Bruce Momjian
On Tue, Aug 10, 2021 at 12:03:24AM +0200, Michael Meskes wrote:
> I'd like to apologize for not answering your email the way I should
> have. Honestly it never occurred to me. Maybe that's because I'm used
> to other procedures, or because I never had to converse with the RMT,
> or maybe, quite simply, because I lacked the time to think it through,
> the original issue that kind of started this whole mess.

Agreed.  When the RMT contacts me, I assume it is something that is time-
and release-critical, so I give them as much detail as I can, and a
timeline for when they will get more information.  If you are not focused on
the RMT process, it might not be clear why that is important.

-- 
  Bruce Momjian  https://momjian.us
  EDB  https://enterprisedb.com

  If only the physical world exists, free will is an illusion.





Re: ECPG bug fix: DECALRE STATEMENT and DEALLOCATE, DESCRIBE

2021-08-09 Thread Bruce Momjian
On Mon, Aug  9, 2021 at 06:00:00PM -0400, Bruce Momjian wrote:
> > Again agreed, please keep in mind, though, that I didn't notice I was
> > being chased until Peter's first email.
> 
> I was asked by the RMT to contact one of your employees, and I did, to
> tell you that the RMT was looking for feedback from you on an ecpg
> issue.  Not sure if that was before or after Peter's email.

The date of that request was July 28 and I was told by your employee
that you would be informed that afternoon.  If you want the employee's
name, I will provide it in a private email.

-- 
  Bruce Momjian  https://momjian.us
  EDB  https://enterprisedb.com

  If only the physical world exists, free will is an illusion.





Re: ECPG bug fix: DECALRE STATEMENT and DEALLOCATE, DESCRIBE

2021-08-09 Thread Michael Meskes
Peter,

> I think that this must be a cultural thing. I can see how somebody
> would see the third person style as overly formal or stilted. An
> interpretation like that does make sense to me. But I knew of no
> reason why you might find that style made the message offensive. It
> was never intended to denigrate.

This explains why it felt so difficult to make myself clear. I was
feeling exactly the same, just the other way round.

> I don't know you all that well, but we have talked for quite a while
> on a few occasions. I got the sense that you are a decent person from
> these conversations. I have no possible reason to denigrate or insult
> you. In general I try to be respectful, and if I ever fail it's not
> because I didn't care at all. Anybody that knows me well knows that I
> am not a mean spirited person.

I never thought that. Maybe we should have quickly talked things out.
Email tends to be a bad medium for communication, especially when it
goes wrong. :)

> If this is just an unfortunate misunderstanding, as I suspect it is,
> then I would be happy to let it go, and treat it as something to
> learn
> from.

Agreed, me too. 

I'd like to apologize for not answering your email the way I should
have. Honestly it never occurred to me. Maybe that's because I'm used
to other procedures, or because I never had to converse with the RMT,
or maybe, quite simply, because I lacked the time to think it through,
the original issue that kind of started this whole mess.

Michael

-- 
Michael Meskes
Michael at Fam-Meskes dot De
Michael at Meskes dot (De|Com|Net|Org)
Meskes at (Debian|Postgresql) dot Org





Re: ECPG bug fix: DECALRE STATEMENT and DEALLOCATE, DESCRIBE

2021-08-09 Thread Bruce Momjian
On Mon, Aug  9, 2021 at 11:48:07PM +0200, Michael Meskes wrote:
> No, of course not. And sorry for not being precise enough, I only
> objected to the prediction part, but I agree, I take the objection
> back. I guess it's as difficult for Peter to understand why this is
> offensive as it is for me to not see it as such.

OK, good.

> > Let me be practical here --- the more someone has to be chased for a
> > reply, the less confidence they have in that person.  If the RMT
> > contacts you about something, and obviously they have had to take
> > usual
> > efforts to contact you, the more it is on you to give a full report
> > and
> > a timeline of when you will address the issue.  If they had to chase
> > you
> > around, and you gave them a short answer, the less confidence they
> > have
> > in this getting resolved in a timely manner.
> 
> Again agreed, please keep in mind, though, that I didn't notice I was
> being chased until Peter's first email.

I was asked by the RMT to contact one of your employees, and I did, to
tell you that the RMT was looking for feedback from you on an ecpg
issue.  Not sure if that was before or after Peter's email.

-- 
  Bruce Momjian  https://momjian.us
  EDB  https://enterprisedb.com

  If only the physical world exists, free will is an illusion.





Re: ECPG bug fix: DECALRE STATEMENT and DEALLOCATE, DESCRIBE

2021-08-09 Thread Michael Meskes
> > > I don't want to upset anybody for any reason. I regret that my
> > > words
> > > have upset you, but I think that they were misinterpreted in a
> > > way
> > > that I couldn't possibly have predicted. The particular aspect of
> > 
> > I strongly object to that. It's pretty obvious to me that addressing
> > people in the third person is very offensive.
> 
> So, you object to him referring to you in the third person in an
> email,
> and you object to him saying it was "misinterpreted".  Are you going
> to
> object to my email too?

No, of course not. And sorry for not being precise enough, I only
objected to the prediction part, but I agree, I take the objection
back. I guess it's as difficult for Peter to understand why this is
offensive as it is for me to not see it as such.

> I think it might have been in the third person because at that point,
> Peter didn't expect a reply from you, and put you on the "TO" line
> merely as a courtesy.  He could have put out an email about reverting
> the patch without you on the email header at all, I guess --- then he
> could have referred to you without offending you.

Right, that was my only problem originally. It seemed difficult to
bring that point over.

> Let me be practical here --- the more someone has to be chased for a
> reply, the less confidence they have in that person.  If the RMT
> contacts you about something, and obviously they have had to take
> usual
> efforts to contact you, the more it is on you to give a full report
> and
> a timeline of when you will address the issue.  If they had to chase
> you
> around, and you gave them a short answer, the less confidence they
> have
> in this getting resolved in a timely manner.

Again agreed, please keep in mind, though, that I didn't notice I was
being chased until Peter's first email.

Michael

-- 
Michael Meskes
Michael at Fam-Meskes dot De
Michael at Meskes dot (De|Com|Net|Org)
Meskes at (Debian|Postgresql) dot Org





Re: Slow standby snapshot

2021-08-09 Thread Michail Nikolaev
Hello, Andres.

Thanks for the feedback again.

> The problem is that we don't want to add a lot of work
> KnownAssignedXidsAdd/Remove, because very often nobody will build a snapshot
> for that moment and building a sorted, gap-free, linear array of xids isn't
> cheap. In my experience it's more common to be bottlenecked by replay CPU
> performance than on replica snapshot building.

Yes, but my patch adds almost the smallest possible amount of work to
KnownAssignedXidsAdd/Remove - a single write to the array by index.
It differs from the first version in this thread, which was based on
linked lists.
The "next valid offset" is just an "optimistic optimization" - it means
"while searching for the next valid entry, you can safely jump ahead by
KnownAssignedXidsNext[i] slots".
But KnownAssignedXidsNext is not updated by Add/Remove.

> During recovery, where there's only one writer to the procarray / known xids,
> it might not be hard to opportunistically build a dense version of the known
> xids whenever a snapshot is built.

AFAIU the patch does exactly that.
During the first snapshot build, the offsets to the next valid entry are
set, so a dense version is created on demand.
And this version is reused (even partially, if something was removed) by
the next snapshot build.
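
To spell out the mechanics, here is a minimal sketch of the skip-scan
(simplified, hypothetical declarations; the real arrays live in
procarray.c):

    /*
     * KnownAssignedXidsNext[i] caches how far ahead the next valid entry was
     * the last time a snapshot was built.  Add/Remove never maintain it, so
     * the hint can become stale (too short), but since entries are only ever
     * invalidated in place, following it never jumps over a valid slot.
     */
    static int
    known_xids_collect(TransactionId *dst, int tail, int head)
    {
        int     n = 0;
        int     prev = -1;
        int     i = tail;

        while (i < head)
        {
            if (KnownAssignedXidsValid[i])
            {
                dst[n++] = KnownAssignedXids[i];
                if (prev >= 0)
                    KnownAssignedXidsNext[prev] = i - prev;     /* refresh hint */
                prev = i;
                i++;
            }
            else
                i += Max(1, KnownAssignedXidsNext[i]);  /* skip an invalid run */
        }
        return n;
    }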

> I'm not entirely sure what datastructure would work best. I can see something
> like a radix tree work well, or a skiplist style approach. Or a hashtable:

We could try some other structure (for example, a linked hash map),
but the additional costs (memory management, links, hash calculation)
would probably reduce performance significantly.
And it is a much harder change to make.

So, I think the "next valid offset" optimization is a good trade-off for now:
* the complexity of KnownAssignedXidsAdd/Remove is almost unaffected
* KnownAssignedXidsGetAndSetXmin drops out of the top of the CPU profile
in the typical read scenario - so, more TPS and less ProcArrayLock contention
* it complements GetSnapshotData() scalability - now on standby too
* the changes are small

Thanks,
Michail.




Re: Asynchronous and "direct" IO support for PostgreSQL.

2021-08-09 Thread Melanie Plageman
On Wed, Jul 28, 2021 at 1:37 PM Melanie Plageman
 wrote:
>
> On Tue, Feb 23, 2021 at 5:04 AM Andres Freund  wrote:
> >
> > ## AIO API overview
> >
> > The main steps to use AIO (without higher level helpers) are:
> >
> > 1) acquire an "unused" AIO: pgaio_io_get()
> >
> > 2) start some IO, this is done by functions like
> >pgaio_io_start_(read|write|fsync|flush_range)_(smgr|sb|raw|wal)
> >
> >The (read|write|fsync|flush_range) indicates the operation, whereas
> >(smgr|sb|raw|wal) determines how IO completions, errors, ... are handled.
> >
> >    (see below for more details about this design choice - it might or
> >    might not be right)
> >
> > 3) optionally: assign a backend-local completion callback to the IO
> >(pgaio_io_on_completion_local())
> >
> > 4) 2) alone does *not* cause the IO to be submitted to the kernel, but to be
> >    put on a per-backend list of pending IOs. The pending IOs can be
> >    explicitly flushed with pgaio_submit_pending(), but will also be
> >    submitted if the pending list gets to be too large, or if the current
> >    backend waits for the IO.
> >
> >    There are two main reasons not to submit the IO immediately:
> >    - If adjacent, we can merge several IOs into one "kernel level" IO during
> >      submission. Larger IOs are considerably more efficient.
> >    - Several AIO APIs allow submitting a batch of IOs in one system call.
> >
> > 5) wait for the IO: pgaio_io_wait() waits for an IO "owned" by the current
> >backend. When other backends may need to wait for an IO to finish,
> >pgaio_io_ref() can put a reference to that AIO in shared memory (e.g. a
> >BufferDesc), which can be waited for using pgaio_io_wait_ref().
> >
> > 6) Process the results of the request. If a callback was registered in 3),
> >this isn't always necessary. The results of AIO can be accessed using
> >pgaio_io_result() which returns an integer where negative numbers are
> >-errno, and positive numbers are the [partial] success conditions
> >(e.g. potentially indicating a short read).
> >
> > 7) release ownership of the io (pgaio_io_release()) or reuse the IO for
> >another operation (pgaio_io_recycle())
> >
> >
> > Most places that want to use AIO shouldn't themselves need to care about
> > managing the number of writes in flight, or the readahead distance. To help
> > with that there are two helper utilities, a "streaming read" and a
> > "streaming write".
> >
> > The "streaming read" helper uses a callback to determine which blocks to
> > prefetch - that allows doing readahead in a sequential fashion, but
> > importantly also allows asynchronously "reading ahead" non-sequential blocks.
> >
> > E.g. for vacuum, lazy_scan_heap() has a callback that uses the visibility
> > map to figure out which block needs to be read next. Similarly,
> > lazy_vacuum_heap() uses the tids in LVDeadTuples to figure out which
> > blocks are going to be needed. Here's the latter as an example:
> > https://github.com/anarazel/postgres/commit/a244baa36bfb252d451a017a273a6da1c09f15a3#diff-3198152613d9a28963266427b380e3d4fbbfabe96a221039c6b1f37bc575b965R1906
> >
>
> Attached is a patch on top of the AIO branch which does bitmapheapscan
> prefetching using the PgStreamingRead helper already used by sequential
> scan and vacuum on the AIO branch.
>
> The prefetch iterator is removed and the main iterator in the
> BitmapHeapScanState node is now used by the PgStreamingRead helper.
>
...
>
> Oh, and I haven't done testing to see how effective the prefetching is
> -- that is a larger project that I have yet to tackle.
>

I have done some testing on how effective it is now.

I've also updated the original patch to count the first page (in the
lossy/exact page counts mentioned down-thread) as well as to remove
unused prefetch fields and comments.
I've also included a second patch which adds IO wait time information to
EXPLAIN output when used like:
  EXPLAIN (buffers, analyze) SELECT ...

The same commit also introduces a temporary dev GUC
io_bitmap_prefetch_depth which I am using to experiment with the
prefetch window size.

I wanted to share some results from changing the prefetch window to
demonstrate how prefetching is working.

The short version of my results is that the prefetching works:

- with the prefetch window set to 1, the IO wait time is 1550 ms
- with the prefetch window set to 128, the IO wait time is 0.18 ms

DDL and repro details below:

On Andres' AIO branch [1] with my bitmap heapscan prefetching patch set
applied built with the following build flags:
-O2 -fno-omit-frame-pointer --with-liburing

And these non-default PostgreSQL settings:
  io_data_direct=1
  io_data_force_async=off
  io_method=io_uring
  log_min_duration_statement=0
  log_duration=on
  set track_io_timing to on;

  set max_parallel_workers_per_gather to 0;
  set enable_seqscan to off;
  set enable_indexscan to off;
  set enable_bitmapscan to on;

  set effective_io_concurren
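
To make the quoted steps 1-7 concrete, here is a rough sketch of the call
sequence (only the function names come from the branch; the argument lists
are guesses for illustration, not the real signatures):

    PgAioInProgress *io = pgaio_io_get();           /* 1: acquire an unused AIO */

    pgaio_io_start_read_smgr(io, reln, forknum, blkno, buffer); /* 2: start IO */
    pgaio_io_on_completion_local(io, my_callback);  /* 3: optional local callback */

    pgaio_submit_pending(true);                     /* 4: flush the pending list */
    pgaio_io_wait(io);                              /* 5: wait for completion */

    int res = pgaio_io_result(io);                  /* 6: negative => -errno */
    if (res < 0)
        elog(ERROR, "read failed: %s", strerror(-res));

    pgaio_io_release(io);                           /* 7: give up ownership */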

Re: Another regexp performance improvement: skip useless paren-captures

2021-08-09 Thread Tom Lane
Mark Dilger  writes:
> I can still trigger the old bug for which we thought we'd pushed a fix.  The 
> test case below crashes on master (e12694523e7e4482a052236f12d3d8b58be9a22c), 
> and also on the fixed version "Make regexp engine's backref-related 
> compilation state more bulletproof." 
> (cb76fbd7ec87e44b3c53165d68dc2747f7e26a9a).

> Can you test if it crashes for you, too?  I'm not sure I see why this one 
> fails when millions of others pass.

> The backtrace is still complaining about regc_nfa.c:1265:

> +select regexp_split_to_array('', '(?:((?:q+))){0}(\1){0,0}?*[^]');
> +server closed the connection unexpectedly

Hmmm ... yeah, I see it too.  This points up something I'd wondered
about before, which is whether the code that "cancels everything"
after detecting {0} is really OK.  It throws away the outer subre
*and children* without worrying about what might be inside, and
here we see that that's not good enough --- there's still a v->subs
pointer to the first capturing paren set, which we just deleted,
so that the \1 later on messes up.  I'm not sure why the back
branches are managing not to crash, but that might just be a memory
management artifact.

regards, tom lane




Re: ECPG bug fix: DECALRE STATEMENT and DEALLOCATE, DESCRIBE

2021-08-09 Thread Peter Geoghegan
On Mon, Aug 9, 2021 at 1:38 PM Michael Meskes  wrote:
> > I don't want to upset anybody for any reason. I regret that my words
> > have upset you, but I think that they were misinterpreted in a way
> > that I couldn't possibly have predicted. The particular aspect of
>
> I strongly object to that. It's pretty obvious to me that addressing
> people in the third person is very offensive.

I think that this must be a cultural thing. I can see how somebody
would see the third person style as overly formal or stilted. An
interpretation like that does make sense to me. But I knew of no
reason why you might find that style made the message offensive. It
was never intended to denigrate.

I don't know you all that well, but we have talked for quite a while
on a few occasions. I got the sense that you are a decent person from
these conversations. I have no possible reason to denigrate or insult
you. In general I try to be respectful, and if I ever fail it's not
because I didn't care at all. Anybody that knows me well knows that I
am not a mean spirited person.

If this is just an unfortunate misunderstanding, as I suspect it is,
then I would be happy to let it go, and treat it as something to learn
from.

-- 
Peter Geoghegan




Re: ECPG bug fix: DECALRE STATEMENT and DEALLOCATE, DESCRIBE

2021-08-09 Thread Bruce Momjian
On Mon, Aug  9, 2021 at 10:38:07PM +0200, Michael Meskes wrote:
> > Clearly we disagree about this. I don't think that there is anything
> > to be gained from discussing this any further, though. I suggest that
> > we leave it at that.
> 
> Agreed.
> 
> > I don't want to upset anybody for any reason. I regret that my words
> > have upset you, but I think that they were misinterpreted in a way
> > that I couldn't possibly have predicted. The particular aspect of
> 
> I strongly object to that. It's pretty obvious to me that addressing
> people in the third person is very offensive.

So, you object to him referring to you in the third person in an email,
and you object to him saying it was "misinterpreted".  Are you going to
object to my email too?

I think everyone can accept that you interpreted what Peter said as
offensive, but you must also give the same acceptance that someone might
not consider it offensive.  For example, I did not read it as offensive
at all.

I think it might have been in the third person because at that point,
Peter didn't expect a reply from you, and put you on the "TO" line
merely as a courtesy.  He could have put out an email about reverting
the patch without you on the email header at all, I guess --- then he
could have referred to you without offending you.

> > How could anybody on the RMT judge what was going on sensibly? There
> > was *zero* information from you (the original committer, our point of
> > contact) about an item that is in a totally unfamiliar part of the
> > code to every other committer. We were effectively forced to make
> > very
> > conservative assumptions about the deadline. I think that it's very
> > likely that this could have been avoided if only you'd engaged to
> > some
> > degree -- if you had said it was a short deadline then we'd likely
> > have taken your word for it, as the relevant subject matter expert
> > and
> > committer in charge of the item. But we were never given that choice.
> 
> The same holds the other way round, I only understood later that you
> wanted more information. Had I known that earlier, I would have gladly
> given them. 

Let me be practical here --- the more someone has to be chased for a
reply, the less confidence they have in that person.  If the RMT
contacts you about something, and obviously they have had to take usual
efforts to contact you, the more it is on you to give a full report and
a timeline of when you will address the issue.  If they had to chase you
around, and you gave them a short answer, the less confidence they have
in this getting resolved in a timely manner.

It is the RMT's responsibility to resolve things in a timely manner, and
since they have contacted you, you should be going out of your way to at
least give them confidence that it will be dealt with by you, rather
than them.  Whether the problem is them not asking for a timeline or you
not offering one, the real solution would have been to provide a
timeline to them when they contacted you, since if the RMT is contacting
you, it is a serious issue.

-- 
  Bruce Momjian  https://momjian.us
  EDB  https://enterprisedb.com

  If only the physical world exists, free will is an illusion.





Re: Another regexp performance improvement: skip useless paren-captures

2021-08-09 Thread Mark Dilger
Tom,

I can still trigger the old bug for which we thought we'd pushed a fix.  The 
test case below crashes on master (e12694523e7e4482a052236f12d3d8b58be9a22c), 
and also on the fixed version "Make regexp engine's backref-related compilation 
state more bulletproof." (cb76fbd7ec87e44b3c53165d68dc2747f7e26a9a).

Can you test if it crashes for you, too?  I'm not sure I see why this one fails 
when millions of others pass.

The backtrace is still complaining about regc_nfa.c:1265:

+select regexp_split_to_array('', '(?:((?:q+))){0}(\1){0,0}?*[^]');
+server closed the connection unexpectedly
+   This probably means the server terminated abnormally
+   before or while processing the request.
+connection to server was lost

—
Mark Dilger
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company







Re: ECPG bug fix: DECALRE STATEMENT and DEALLOCATE, DESCRIBE

2021-08-09 Thread David G. Johnston
On Mon, Aug 9, 2021 at 1:38 PM Michael Meskes  wrote:

> > I don't want to upset anybody for any reason. I regret that my words
> > have upset you, but I think that they were misinterpreted in a way
> > that I couldn't possibly have predicted. The particular aspect of
>
> I strongly object to that. It's pretty obvious to me that addressing
> people in the third person is very offensive.
>

And using the third person to avoid making things personal, and to properly
represent one's role as a representative as opposed to an individual, is
something at least two of us consider to be "professional".  If others
taking on a professional/formal tone with you is offensive, I politely
suggest you need to at least cut them some slack when you haven't informed
them of this previously.  Cultural differences happen in both directions.

> How could anybody on the RMT judge what was going on sensibly? There
> > was *zero* information from you (the original committer, our point of
> > contact) about an item that is in a totally unfamiliar part of the
> > code to every other committer. We were effectively forced to make
> > very
> > conservative assumptions about the deadline. I think that it's very
> > likely that this could have been avoided if only you'd engaged to
> > some
> > degree -- if you had said it was a short deadline then we'd likely
> > have taken your word for it, as the relevant subject matter expert
> > and
> > committer in charge of the item. But we were never given that choice.
>
> The same holds the other way round, I only understood later that you
> wanted more information. Had I known that earlier, I would have gladly
> given them.


There is clearly an expectation from the RMT, and at least myself, that:

"The RMT discussed this recently. We are concerned about this issue,
including how it has been handled so far.

If you're unable to resolve the issue (or at least commit time to
resolving the issue) then the RMT will call for the revert of the
original feature patch."

is expected to elicit a response from the committer in a timely fashion.
Really, any email sent to -hackers from the RMT about a specific commit, or
even any email sent to -hackers by a core team member, is expected to be
responded to in a timely manner.  These teams should not be getting
involved with day-to-day operations, and being responsive to them is
part of the obligation of being a committer.

In hindsight, the quoted email above should probably have been worded:

"If you're unable to resolve the issue, or communicate a timely plan for
doing so, within the next week please revert the patch."

That would make it clear that the committer should be the one performing the
revert.  Then, absent feedback or a revert, the second email, and the RMT
itself performing the revert, would be appropriate.

David J.


Re: ECPG bug fix: DECALRE STATEMENT and DEALLOCATE, DESCRIBE

2021-08-09 Thread Michael Meskes
Hi,

> FWIW, I don't think that the phrasing of Peter's email is
> disrespectful. As I read it, it simply states that the RMT has made a

As I said before, it might be a cultural difference. What I don't
understand is that a simple "Hi Michael, this is what the RMT
decided:" would have been sufficient to make this email okay. I take
offense at being addressed in the third person only.

> strongly that being a member of the RMT is a pretty thankless task,

That I agree with.

> On the substance of the issue, one question that I have reading over
> this thread is whether there's really a bug here at all. It is being
> represented (and I have not checked whether this is accurate) that
> the
> patch affects the behavior of DECLARE, PREPARE, and EXECUTE, but not
> DESCRIBE, DEALLOCATE, DECLARE CURSOR .. FOR, or CREATE TABLE AS
> EXECUTE. However, it also seems that it's not entirely clear what the
> behavior ought to be in those cases, except perhaps for DESCRIBE. If
> that's the case, maybe this doesn't really need to be an open item,
> and maybe we don't therefore need to talk about whether it should be
> reverted. Maybe it's just a feature that supports certain things now
> and in the future, after due reflection, perhaps it will support
> more.

The way I see it, we should commit the patch that makes more statements
honor the new DECLARE STATEMENT feature. I don't think we can change
anything for the worse by doing that. And other databases are not
really strict about this either.

> I would be interested in hearing your view, and that of others, on
> whether this is really a bug at all.

I think the question is more about which version of the patch we commit,
as it does increase the functionality of ecpg.

Michael
-- 
Michael Meskes
Michael at Fam-Meskes dot De
Michael at Meskes dot (De|Com|Net|Org)
Meskes at (Debian|Postgresql) dot Org





Re: ECPG bug fix: DECALRE STATEMENT and DEALLOCATE, DESCRIBE

2021-08-09 Thread Michael Meskes
> question with a question mark. Despite the fact that it is generally
> understood that "committers own their own items", and that the RMT
> exists as a final check on that.

This does not contradict my opinion, but anyway. 

> Clearly we disagree about this. I don't think that there is anything
> to be gained from discussing this any further, though. I suggest that
> we leave it at that.

Agreed.

> I don't want to upset anybody for any reason. I regret that my words
> have upset you, but I think that they were misinterpreted in a way
> that I couldn't possibly have predicted. The particular aspect of

I strongly object to that. It's pretty obvious to me that addressing
people in the third person is very offensive.

> last
> Friday's email that you took exception to was actually intended to
> convey that it was not personal. Remember, my whole ethos is to avoid
> strong RMT intervention when possible, to make it impersonal. My
> framing of things had the opposite effect to the one I'd intended,
> ironically.

Let me stress again that the third person part is the bad thing in my
opinion, not the rest of the words.
 
> How could anybody on the RMT judge what was going on sensibly? There
> was *zero* information from you (the original committer, our point of
> contact) about an item that is in a totally unfamiliar part of the
> code to every other committer. We were effectively forced to make
> very
> conservative assumptions about the deadline. I think that it's very
> likely that this could have been avoided if only you'd engaged to
> some
> degree -- if you had said it was a short deadline then we'd likely
> have taken your word for it, as the relevant subject matter expert
> and
> committer in charge of the item. But we were never given that choice.

The same holds the other way round, I only understood later that you
wanted more information. Had I known that earlier, I would have gladly
given them. 

> > Well, you did lay out what the decision would be and I fully agreed
> > with it. So again, what was there to do? Had you asked me if I
> > agreed,
> > I would told you.
> 
> If the patch being reverted was so inconsequential to you that you
> didn't even feel the need to write a brief email about it, why did
> you
> commit it in the first place? I just don't understand this at all.

I'm getting very tired of you accusing me of things I neither said nor
did. Please stop doing that, or show me the email where I said the patch
was "inconsequential". As for writing a brief email, please read all
the other emails in this thread; I've explained myself repeatedly
already.

Michael
-- 
Michael Meskes
Michael at Fam-Meskes dot De
Michael at Meskes dot (De|Com|Net|Org)
Meskes at (Debian|Postgresql) dot Org





Re: ECPG bug fix: DECALRE STATEMENT and DEALLOCATE, DESCRIBE

2021-08-09 Thread Robert Haas
On Sat, Aug 7, 2021 at 4:13 AM Michael Meskes  wrote:
> I get it that the goal is to release PostgreSQL 14 and I also get it
> that nobody is interested in the reasons for my slow reaction. I even,
> at least to an extend, understand why nobody tried reaching out to me
> directly. However, what I cannot understand at all is the tone of this
> email. Is this the new way of communication in the PostgreSQL project?
>
> Just to be more precise, I find it highly offensive that you address an
> email only to me (everyone else was on CC) and yet do not even try to
> talk to me, but instead talk about me in the third person. I find this
> very disrespectful.

Hi,

FWIW, I don't think that the phrasing of Peter's email is
disrespectful. As I read it, it simply states that the RMT has made a
decision to revert the patch unless certain assurances are given
before a certain date. I don't expect anyone will particularly like
receiving such an email, because nobody likes to be threatened with a
revert, but I don't think there is anything rude about it. Either you
are willing to commit to resolving the problem by a date that the RMT
finds acceptable, or you aren't. If you are, great. If you aren't, the
patch is going to get reverted. That sucks, but it's nothing against
you personally; it's just what happens sometimes. I also feel rather
strongly that being a member of the RMT is a pretty thankless task,
involving going through a lot of patches that you may not care about
and trying to make decisions that will benefit the project, even while
knowing that some people may not like them. We should give people who
are willing to offer such service the benefit of the doubt.

On the substance of the issue, one question that I have reading over
this thread is whether there's really a bug here at all. It is being
represented (and I have not checked whether this is accurate) that the
patch affects the behavior of DECLARE, PREPARE, and EXECUTE, but not
DESCRIBE, DEALLOCATE, DECLARE CURSOR .. FOR, or CREATE TABLE AS
EXECUTE. However, it also seems that it's not entirely clear what the
behavior ought to be in those cases, except perhaps for DESCRIBE. If
that's the case, maybe this doesn't really need to be an open item,
and maybe we don't therefore need to talk about whether it should be
reverted. Maybe it's just a feature that supports certain things now
and in the future, after due reflection, perhaps it will support more.
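
To make the distinction concrete, here is roughly what the feature looks
like in an ecpg program (a sketch based on the documented DECLARE
STATEMENT syntax; the comments reflect the behavior described above):

    EXEC SQL AT con1 DECLARE stmt STATEMENT;    /* bind "stmt" to con1 */
    EXEC SQL PREPARE stmt FROM :query;          /* honored: runs on con1 */
    EXEC SQL EXECUTE stmt;                      /* honored: runs on con1 */
    EXEC SQL DESCRIBE stmt INTO sqlda;          /* reportedly not routed to con1 */
    EXEC SQL DEALLOCATE stmt;                   /* reportedly not routed to con1 */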

I would be interested in hearing your view, and that of others, on
whether this is really a bug at all.

-- 
Robert Haas
EDB: http://www.enterprisedb.com




Re: make MaxBackends available in _PG_init

2021-08-09 Thread Greg Sabino Mullane
On Sat, Aug 7, 2021 at 2:01 PM Bossart, Nathan  wrote:

> Here is a rebased version of the patch.
>

Giving this a review.

Patch applies cleanly and `make check` works as of
e12694523e7e4482a052236f12d3d8b58be9a22c

Overall looks very nice and tucks MaxBackends safely away.
I have a few suggestions:

> + size = add_size(size, mul_size(GetMaxBackends(0), sizeof(BTOneVacInfo)));

The use of GetMaxBackends(0) looks weird - can we add another constant in
there for the "default" case? Or just have GetMaxBackends() work?


> + GMB_MAX_PREPARED_XACTS = 1 << 0, /* include max_prepared_xacts */
s/include/add in/


>  +typedef enum GMBOption
> +{
> + GMB_MAX_PREPARED_XACTS = 1 << 0, /* include max_prepared_xacts */
> + GMB_NUM_AUXILIARY_PROCS = 1 << 1 /* include NUM_AUXILIARY_PROCS */
> +} GMBOption;

Is a typedef enum really needed? As opposed to something like this style:

#define WL_LATCH_SET (1 << 0)
#define WL_SOCKET_READABLE   (1 << 1)
#define WL_SOCKET_WRITEABLE  (1 << 2)


> -  (MaxBackends + max_prepared_xacts + 1));
> +  (GetMaxBackends(GMB_MAX_PREPARED_XACTS) + 1));

This is a little confusing - there is no indication to the reader that this
is an additive function. Perhaps a little more intuitive name:

> +  (GetMaxBackends(GMB_PLUS_MAX_PREPARED_XACTS) + 1));

However, the more I went through this patch, the more the GetMaxBackends(0)
nagged at me. The vast majority of the calls are with "0". I'd argue for
just having no arguments at all, which removes a bit of code and actually
makes things like this easier to read:

Original change:
> - uint32  TotalProcs = MaxBackends + NUM_AUXILIARY_PROCS + max_prepared_xacts;
> + uint32  TotalProcs = GetMaxBackends(GMB_NUM_AUXILIARY_PROCS | GMB_MAX_PREPARED_XACTS);

Versus:
> - uint32  TotalProcs = MaxBackends + NUM_AUXILIARY_PROCS + max_prepared_xacts;
> + uint32  TotalProcs = GetMaxBackends() + NUM_AUXILIARY_PROCS + max_prepared_xacts;


> + * This must be called after modules have had the change to alter GUCs in
> + * shared_preload_libraries, and before shared memory size is determined.
s/change/chance/;


> +void
> +SetMaxBackends(int max_backends)
> +{
> + if (MaxBackendsInitialized)
> +  elog(ERROR, "MaxBackends already initialized");

Is this going to get tripped by a call from restore_backend_variables?

Cheers,
Greg


Re: Autovacuum on partitioned table (autoanalyze)

2021-08-09 Thread Alvaro Herrera
Hi

On 2021-Jul-27, Andres Freund wrote:

> Isn't this going to create a *lot* of redundant sampling?  Especially if you
> have any sort of nested partition tree. In the most absurd case a partition
> with n parents will get sampled n times, solely due to changes to itself.

It seems to me that you're barking up the wrong tree on this point.
This problem you describe is not something that was caused by this
patch; ANALYZE has always worked like this.  We have discussed the idea
of avoiding redundant sampling, but it's clear that it is not a simple
problem, and solving it was not in scope for this patch.

> Additionally, while analyzing all child partitions for a partitioned tables
> are AccessShareLock'ed at once. If a partition hierarchy has more than one
> level, it actually is likely that multiple autovacuum workers will end up
> processing the ancestors separately.  This seems like it might contribute to
> lock exhaustion issues with larger partition hierarchies?

I agree this seems a legitimate problem.
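
For concreteness, in a two-level tree like the one below each leaf has two
partitioned ancestors, so separate workers processing t and t_1 can each
end up AccessShareLock'ing (and sampling) the same leaves:

    create table t (a int, b int) partition by range (a);
    create table t_1 partition of t
        for values from (0) to (1000) partition by range (b);
    create table t_1_1 partition of t_1 for values from (0) to (500);
    create table t_1_2 partition of t_1 for values from (500) to (1000);
    -- analyzing t and analyzing t_1 both visit t_1_1 and t_1_2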

-- 
Álvaro Herrera  Valdivia, Chile  —  https://www.EnterpriseDB.com/




Re: when the startup process doesn't (logging startup delays)

2021-08-09 Thread Robert Haas
On Mon, Aug 9, 2021 at 11:20 AM Nitin Jadhav
 wrote:
> Modified reset_startup_progress_timeout() to take a second
> parameter which indicates whether it is called for initialization or
> for resetting. Without this parameter there is a problem if we call
> init_startup_progress() more than once with no call to
> ereport_startup_progress() in between, as the code related to disabling
> the timer has been removed.

I'd really like to avoid this. I don't see why it's necessary. You say
it causes a problem, but you don't explain what problem it causes.
enable_timeout_at() will disable the timer if not already done. I
think all we need to do is set scheduled_startup_progress_timeout = 0
before calling reset_startup_progress_timeout() in the "init" case and
don't do that for the non-init case. If that's not quite right, maybe
you can work out something that does work. But adding an is_init flag
to a function and having no common code between the is_init = true
case and the is_init = false case is exactly the kind of thing that I
don't want here. I want as much common code as possible.

> > This makes sense, but I think I'd like to have all the functions in
> > this patch use names_like_this() rather than NamesLikeThis().
>
> I have changed all the function names accordingly. But I would like to
> know why it should be names_like_this() as there are many functions
> with the format NamesLikeThis(). I would like to understand when to
> use what, just to incorporate it in future patches.

There is, unfortunately, no hard-and-fast rule, but we want to
maintain as much consistency with the existing style as we can. I
wasn't initially sure what would work best for this particular patch,
but after we committed to a name like ereport_startup_progress() that
to me was a strong hint in favor of using names_like_this()
throughout. It seems impossible to imagine punctuating it as
EreportStartupProgress() or something since that would be wildly
inconsistent with the existing function name, and there seems to be no
good reason why this patch can't be internally consistent.

To some degree, we tend to use names_like_this() for low-level
functions and NamesLikeThis() for higher-level functions, but that is
not a very consistent practice.

> > reset_startup_progress_timeout(TimeStampTz now)
> > {
> >   next_timeout = last_startup_progress_timeout + interval;
> >   if (next_timeout < now)
> >   {
> >  // Either the timeout was processed so late that we missed an entire 
> > cycle,
> >  // or the system clock was set backwards.
> >  next_timeout = now + interval;
> >   }
> >   enable_timeout_at(next_timeout);
> >   last_startup_progress_timeout = next_timeout;
> > }
>
> As per the above logic, I would like to discuss 2 cases.
>
> Case-1:
> First scheduled timeout is after 1 sec. But the time out occurred
> after 1.5 sec. So the log msg prints after 1.5 sec. Next timer is
> scheduled after 2 sec (scheduled_startup_progress_timeout + interval).
> The condition (next_timeout < now) gets evaluated to false and
> everything goes smooth.
>
> Case-2:
> First scheduled timeout is after 1 sec. But the timeout occurred after
> 2.5 sec. So the log msg prints after 2.5 sec. Now the next timeout is
> scheduled after 2 sec (scheduled_startup_progress_timeout + interval).
> So the condition (next_timeout < now) will fail and the next_timeout
> will get reset to 3.5 sec (2.5 + 1) and it continues. Is this
> behaviour ok or should we set the next_timeout to 3 sec. Please share
> your thoughts on this.

I can't quite follow this, because it seems like you are sometimes
viewing the interval as 1 second and sometimes as 2 seconds. Maybe you
could clarify that, and perhaps show example output?

My feeling is that the timer will almost always be slightly late, but
it will very rarely be extremely late, and it will also very rarely be
early (only if someone resets the system clock). So let's consider
those two cases separately. If the timer is a little bit late each
time, we want to avoid drift, so we want to shorten the next sleep
time by the amount that the previous interrupt was late. If the
interval is 1000ms and the interrupt fires 1ms late then we should
sleep 999ms the next time; if 2ms late, 998ms. That way, although
there will be some variation in which the messages are logged, the
drift won't accumulate over time and even after many minutes of
recovery the messages will be printed at ABOUT the same number of
milliseconds after the second every time, instead of drifting further
and further off course.

But this strategy cannot be used if the drift is larger than the
interval. If we are trying to log a message every 1000ms and the timer
doesn't fire for 14ms, we can wait only 986ms the next time. If it
doesn't fire for 140ms, we can wait only 860ms the next time. But if
the timer doesn't fire for 1400ms, we cannot wait for -400ms the next
time. So what should we do? My proposal is to just wait for the
configured interval, 1

Re: ECPG bug fix: DECALRE STATEMENT and DEALLOCATE, DESCRIBE

2021-08-09 Thread Peter Geoghegan
On Mon, Aug 9, 2021 at 11:45 AM Michael Meskes  wrote:
> If you want me to answer, how about asking a question? Or telling me
> that you'd like some feedback? I don't see how I should know that you
> expect a reply to a simple statement of facts.

I expressed concern in fairly strong terms, and received no answer for
a full week. I consider that "ignoring the RMT", but you take issue
with that characterization because my email didn't ask an explicit
question with a question mark. Despite the fact that it is generally
understood that "committers own their own items", and that the RMT
exists as a final check on that.

Clearly we disagree about this. I don't think that there is anything
to be gained from discussing this any further, though. I suggest that
we leave it at that.

> > Okay, I understand that now.
>
> And? Do you care at all?

I don't want to upset anybody for any reason. I regret that my words
have upset you, but I think that they were misinterpreted in a way
that I couldn't possibly have predicted. The particular aspect of last
Friday's email that you took exception to was actually intended to
convey that it was not personal. Remember, my whole ethos is to avoid
strong RMT intervention when possible, to make it impersonal. My
framing of things had the opposite effect to the one I'd intended,
ironically.

> Sure, I don't question that. Again, I knew about the issue, only
> misjudged it in the beginning. Your email from July 30 did show me that
> it was more urgent but still didn't create the impression that there
> was such a short deadline. In my opinion my prior email already
> explained that I was on it, but couldn't give an estimate.

How could anybody on the RMT judge what was going on sensibly? There
was *zero* information from you (the original committer, our point of
contact) about an item that is in a totally unfamiliar part of the
code to every other committer. We were effectively forced to make very
conservative assumptions about the deadline. I think that it's very
likely that this could have been avoided if only you'd engaged to some
degree -- if you had said it was a short deadline then we'd likely
have taken your word for it, as the relevant subject matter expert and
committer in charge of the item. But we were never given that choice.

> Well, you did lay out what the decision would be and I fully agreed
> with it. So again, what was there to do? Had you asked me if I agreed,
> I would told you.

If the patch being reverted was so inconsequential to you that you
didn't even feel the need to write a brief email about it, why did you
commit it in the first place? I just don't understand this at all.

-- 
Peter Geoghegan




Re: Use extended statistics to estimate (Var op Var) clauses

2021-08-09 Thread Mark Dilger


> On Jul 20, 2021, at 11:28 AM, Tomas Vondra  
> wrote:
> 
> Tomas Vondra
> EnterpriseDB: http://www.enterprisedb.com
> The Enterprise PostgreSQL Company
> <0001-Handling-Expr-op-Expr-clauses-in-extended-stats-20210720.patch>

Hi Tomas,

I tested this patch against master looking for types of clauses that uniformly 
get worse with the patch applied.  I found some.

The tests are too large to attach, but the scripts that generate them are not.  
To perform the tests:

git checkout master
perl ./gentest.pl > src/test/regress/sql/gentest.sql
cat /dev/null > src/test/regress/expected/gentest.out
echo "test: gentest" >> src/test/regress/parallel_schedule
./configure && make && make check
cp src/test/regress/results/gentest.out src/test/regress/expected/gentest.out
patch -p 1 < 0001-Handling-Expr-op-Expr-clauses-in-extended-stats-20210720.patch
make check
cat src/test/regress/regression.diffs | perl ./check.pl

This shows patterns of conditions that get worse, such as:

better:0, worse:80:  A < B and A <> A or not A < A
better:0, worse:80:  A < B and not A <= A or A <= A
better:0, worse:80:  A < B or A = A
better:0, worse:80:  A < B or A = A or not A >= A
better:0, worse:80:  A < B or A >= A
better:0, worse:80:  A < B or A >= A and not A <> A
better:0, worse:80:  A < B or not A < A
better:0, worse:80:  A < B or not A <> A
better:0, worse:80:  A < B or not A <> A or A <= A
better:0, worse:80:  A < B or not A >= A or not A < A

It seems things get worse when the conditions contain a column compared against 
itself.  I suspect that is being handled incorrectly.
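
For instance, using the tables created by gentest.pl below, the degrading
shape looks like this (data assumptions per the generator: 100 non-null
rows per table):

    explain analyze
    select * from table_1 where column_1 < column_2 or column_1 = column_1;
    -- column_1 = column_1 is true for every non-null row, so the whole
    -- WHERE clause should be estimated as nearly always true; mispricing
    -- the self-comparison drags the combined estimate off.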

#!/usr/bin/perl

use strict;
use warnings;

my ($query, $where, $before, $after, @before, @after);
my $better = 0;
my $worse = 0;
my %where = ();
my %gripe;
while (<>)
{
	if (/from check_estimated_rows\('(.*)'\)/)
	{
		$query = $1;
		if ($query =~ m/where (.*)$/)
		{
			$where = $1;
			my %columns = map { $_ => 1 } ($where =~ m/(column_\d+)/g);
			my @normal = ('A'..'Z');
			my @columns = sort keys %columns;
			for my $i (0..$#columns)
			{
my $old = $columns[$i];
my $new = $normal[$i];
$where =~ s/\b$old\b/$new/g;
			}
		}
	}
	elsif (m/^-(\s*(\d+)\s*\|\s*(\d+)\s*\|\s*(\d+))\s*$/)
	{
		($before, @before) = ($1, $2, $3, $4);
	}
	elsif (m/^\+(\s*(\d+)\s*\|\s*(\d+)\s*\|\s*(\d+))\s*$/)
	{
		($after, @after) = ($1, $2, $3, $4);

		$where{$where}->{better} ||= 0;
		$where{$where}->{worse} ||= 0;

		# Don't count the difference as meaningful unless we're more than 5 better or worse than before
		if ($after[2] > 5 + $before[2])
		{
			$worse++;
			$where{$where}->{worse}++;
		}
		elsif ($after[2] + 5 < $before[2])
		{
			$better++;
			$where{$where}->{better}++;
		}

		if (!exists $gripe{$where} && $where{$where}->{better} == 0 && $where{$where}->{worse} > 50)
		{
			print "TERRIBLE:\n";
			print "\tQUERY: $query\n";
			print "\tBEFORE: $before\n";
			print "\tAFTER:  $after\n\n";
			$gripe{$where} = 1;
		}
	}
}

foreach my $where (sort keys %where)
{
	if ($where{$where}->{better} < $where{$where}->{worse})
	{
		print("better:", $where{$where}->{better}, ", worse:", $where{$where}->{worse}, ":  $where\n");
	}
}
print("\n\nTOTAL:\n");
print("\tbetter: $better\n");
print("\tworse: $worse\n");
#!/usr/bin/perl

use strict;
use warnings;

our ($tblnum, $tblname, $colnum, $colname);

# Generate the where clauses to be used on all tables
our (%wherepattern1, %wherepattern2, %wherepattern3);
my @ops = (">", "<", ">=", "<=", "=", "<>");
my @conj = ("and", "or", "and not", "or not");
for (1..100)
{
	my $op1 = $ops[int(rand(@ops))];
	my $op2 = $ops[int(rand(@ops))];
	my $op3 = $ops[int(rand(@ops))];
	my $conj1 = $conj[int(rand(@conj))];
	my $conj2 = $conj[int(rand(@conj))];

	$wherepattern1{"\%s $op1 \%s"} = 1;
	$wherepattern2{"\%s $op1 \%s $conj1 \%s $op2 \%s"} = 1;
	$wherepattern3{"\%s $op1 \%s $conj1 \%s $op2 \%s $conj2 \%s $op3 \%s"} = 1;
}

sub next_table {
	$tblnum++;
	$tblname = "table_$tblnum";
	$colnum = 0;
	$colname = "column_$colnum";
}

sub next_column {
	$colnum++;
	$colname = "column_$colnum";
}

for my $colcnt (2..10)
{
	next_table();
	print("CREATE TABLE $tblname (\n");
	for (1..$colcnt-1)
	{
		next_column();
		print("\t$colname INTEGER,\n");
	}
	next_column();
	print("\t$colname INTEGER\n);\n");
	print("INSERT INTO $tblname (SELECT ",
		  join(", ", map { "gs-$_" } (1..$colcnt)),
		  " FROM generate_series(1,100) gs);\n");
	print("VACUUM FREEZE $tblname;\n");
	for my $colmax (2..$colcnt)
	{
		print("CREATE STATISTICS ${tblname}_stats_${colmax} ON ",
			  join(", ", map { "column_$_" } (1..$colmax)),
			  " FROM $tblname;\n");
	}
	print("ANALYZE $tblname;\n");
}

# Restart the table sequence
$tblnum = 0;

for my $colcnt (2..10)
{
	next_table();

	for (1..100)
	{
		my $a = sprintf("column_%d", 1+int(rand($colcnt)));
		my $b = sprintf("column_%d", 1+int(rand($colcnt)));
		my $c = sprintf("column_%d", 1+int(rand($colcnt)));

		foreach my $where1 (keys %wherepattern1)
		{
			m

Re: Another regexp performance improvement: skip useless paren-captures

2021-08-09 Thread Tom Lane
Mark Dilger  writes:
> The patch looks ready to commit.  I don't expect to test it any further 
> unless you have something in particular you'd like me to focus on.

Pushed, but while re-reading it before commit I noticed that there's
some more fairly low-hanging fruit in regexp_replace().  As I had it
in that patch, it never used REG_NOSUB because of the possibility
that the replacement string uses "\N".  However, we're already
pre-checking the replacement string to see if it has backslashes
at all, so while we're at it we can check for \N to discover if we
actually need any subexpression match data or not.  We do need to
refactor a little to postpone calling pg_regcomp until after we
know that, but I think that makes replace_text_regexp's API less
ugly not more so.

While I was at it, I changed the search-for-backslash loops to
use memchr rather than handwritten looping.  Their use of
pg_mblen was pretty unnecessary given we only need to find
backslashes, and we can assume the backend encoding is ASCII-safe.

Using a bunch of random cases generated by your little perl
script, I see maybe 10-15% speedup on test cases that don't
use \N in the replacement string, while it's about a wash
on cases that do.  (If I'd been using a multibyte encoding,
maybe the memchr change would have made a difference, but
I didn't try that.)

regards, tom lane

diff --git a/src/backend/utils/adt/regexp.c b/src/backend/utils/adt/regexp.c
index 268cee1cbe..3b7a607437 100644
--- a/src/backend/utils/adt/regexp.c
+++ b/src/backend/utils/adt/regexp.c
@@ -630,11 +630,10 @@ textregexreplace_noopt(PG_FUNCTION_ARGS)
 	text	   *s = PG_GETARG_TEXT_PP(0);
 	text	   *p = PG_GETARG_TEXT_PP(1);
 	text	   *r = PG_GETARG_TEXT_PP(2);
-	regex_t*re;
-
-	re = RE_compile_and_cache(p, REG_ADVANCED, PG_GET_COLLATION());
 
-	PG_RETURN_TEXT_P(replace_text_regexp(s, (void *) re, r, 0, 1));
+	PG_RETURN_TEXT_P(replace_text_regexp(s, p, r,
+		 REG_ADVANCED, PG_GET_COLLATION(),
+		 0, 1));
 }
 
 /*
@@ -648,7 +647,6 @@ textregexreplace(PG_FUNCTION_ARGS)
 	text	   *p = PG_GETARG_TEXT_PP(1);
 	text	   *r = PG_GETARG_TEXT_PP(2);
 	text	   *opt = PG_GETARG_TEXT_PP(3);
-	regex_t*re;
 	pg_re_flags flags;
 
 	/*
@@ -672,10 +670,9 @@ textregexreplace(PG_FUNCTION_ARGS)
 
 	parse_re_flags(&flags, opt);
 
-	re = RE_compile_and_cache(p, flags.cflags, PG_GET_COLLATION());
-
-	PG_RETURN_TEXT_P(replace_text_regexp(s, (void *) re, r, 0,
-		 flags.glob ? 0 : 1));
+	PG_RETURN_TEXT_P(replace_text_regexp(s, p, r,
+		 flags.cflags, PG_GET_COLLATION(),
+		 0, flags.glob ? 0 : 1));
 }
 
 /*
@@ -694,7 +691,6 @@ textregexreplace_extended(PG_FUNCTION_ARGS)
 	int			n = 1;
 	text	   *flags = PG_GETARG_TEXT_PP_IF_EXISTS(5);
 	pg_re_flags re_flags;
-	regex_t*re;
 
 	/* Collect optional parameters */
 	if (PG_NARGS() > 3)
@@ -723,11 +719,10 @@ textregexreplace_extended(PG_FUNCTION_ARGS)
 	if (PG_NARGS() <= 4)
 		n = re_flags.glob ? 0 : 1;
 
-	/* Compile the regular expression */
-	re = RE_compile_and_cache(p, re_flags.cflags, PG_GET_COLLATION());
-
 	/* Do the replacement(s) */
-	PG_RETURN_TEXT_P(replace_text_regexp(s, (void *) re, r, start - 1, n));
+	PG_RETURN_TEXT_P(replace_text_regexp(s, p, r,
+		 re_flags.cflags, PG_GET_COLLATION(),
+		 start - 1, n));
 }
 
 /* This is separate to keep the opr_sanity regression test from complaining */
diff --git a/src/backend/utils/adt/varlena.c b/src/backend/utils/adt/varlena.c
index 348b5566de..acb8741734 100644
--- a/src/backend/utils/adt/varlena.c
+++ b/src/backend/utils/adt/varlena.c
@@ -4359,34 +4359,36 @@ replace_text(PG_FUNCTION_ARGS)
 }
 
 /*
- * check_replace_text_has_escape_char
+ * check_replace_text_has_escape
  *
- * check whether replace_text contains escape char.
+ * Returns 0 if text contains no backslashes that need processing.
+ * Returns 1 if text contains backslashes, but not regexp submatch specifiers.
+ * Returns 2 if text contains regexp submatch specifiers (\1 .. \9).
  */
-static bool
-check_replace_text_has_escape_char(const text *replace_text)
+static int
+check_replace_text_has_escape(const text *replace_text)
 {
+	int			result = 0;
 	const char *p = VARDATA_ANY(replace_text);
 	const char *p_end = p + VARSIZE_ANY_EXHDR(replace_text);
 
-	if (pg_database_encoding_max_length() == 1)
-	{
-		for (; p < p_end; p++)
-		{
-			if (*p == '\\')
-return true;
-		}
-	}
-	else
+	while (p < p_end)
 	{
-		for (; p < p_end; p += pg_mblen(p))
+		/* Find next escape char, if any. */
+		p = memchr(p, '\\', p_end - p);
+		if (p == NULL)
+			break;
+		p++;
+		/* Note: a backslash at the end doesn't require extra processing. */
+		if (p < p_end)
 		{
-			if (*p == '\\')
-return true;
+			if (*p >= '1' && *p <= '9')
+return 2;		/* Found a submatch specifier, so done */
+			result = 1;			/* Found some other sequence, keep looking */
+			p++;
 		}
 	}
-
-	return false;
+	return result;
 }
 
 /*
@@ -4403,25 +4405,17 @@ appendStringInfoRegexpSub

using an end-of-recovery record in all cases

2021-08-09 Thread Robert Haas
On Thu, Jul 29, 2021 at 11:49 AM Robert Haas  wrote:
> On Wed, Jul 28, 2021 at 3:28 PM Andres Freund  wrote:
> > > Imagine if instead of
> > > all the hairy logic we have now we just replaced this whole if
> > > (IsInRecovery) stanza with this:
> > >
> > > if (InRecovery)
> > > CreateEndOfRecoveryRecord();
> > >
> > > That would be WAY easier to reason about than the rat's nest we have
> > > here today. Now, I am not sure what it would take to get there, but I
> > > think that is the direction we ought to be heading.
> >
> > What are we going to do in the single user ([1]) case in this awesome 
> > future?
> > I guess we could just not create a checkpoint until single user mode is shut
> > down / creates a checkpoint for other reasons?
>
> It probably depends on who writes this thus-far-hypothetical patch,
> but my thought is that we'd unconditionally request a checkpoint after
> writing the end-of-recovery record, same as we do now if (promoted).
> If we happened to be in single-user mode, then that checkpoint request
> would turn into performing a checkpoint on the spot in the one and
> only process we've got, but StartupXLOG() wouldn't really need to care
> what happens under the hood after it called RequestCheckpoint().

I decided to try writing a patch to use an end-of-recovery record
rather than a checkpoint record in all cases. The patch itself was
pretty simple but didn't work. There are two different reasons why it
didn't work, which I'll explain in a minute. I'm not sure whether
there are any other problems; these are the only two things that cause
problems with 'make check-world', but that's hardly a guarantee of
anything. Anyway, I thought it would be useful to report these issues
first and hopefully get some feedback.

The first problem I hit was that GetRunningTransactionData() does
Assert(TransactionIdIsNormal(CurrentRunningXacts->latestCompletedXid)).
While that does seem like a superficially reasonable thing to assert,
StartupXLOG() initializes latestCompletedXid by calling
TransactionIdRetreat on the value extracted from checkPoint.nextXid,
and the first-ever checkpoint that is written at the beginning of WAL
has a nextXid of 3, so when starting up from that checkpoint nextXid
is 2, which is not a normal XID. When we try to create the next
checkpoint, CreateCheckPoint() does LogStandbySnapshot() which calls
GetRunningTransactionData() and the assertion fails. In the current
code, we avoid this more or less accidentally because
LogStandbySnapshot() is skipped when starting from a shutdown
checkpoint. If we write an end-of-recovery record and then trigger a
checkpoint afterwards, then we don't avoid the problem. Although I'm
impishly tempted to suggest that we #define SecondNormalTransactionId
4 and then use that to create the first checkpoint instead of
FirstNormalTransactionId -- naturally with no comments explaining why
we're doing it -- I think the real problem is that the assertion is
just wrong. CurrentRunningXacts->latestCompletedXid shouldn't be
InvalidTransactionId or BootstrapTransactionId, but
FrozenTransactionId is a legal, if corner-case, value, at least as the
code stands today.

Unfortunately we can't just relax the assertion, because the
XLOG_RUNNING_XACTS record will eventually be handed to
ProcArrayApplyRecoveryInfo() for processing ... and that function
contains a matching assertion which would in turn fail. It in turn
passes the value to MaintainLatestCompletedXidRecovery() which
contains yet another matching assertion, so the restriction to normal
XIDs here looks pretty deliberate. There are no comments, though, so
the reader is left to guess why. I see one problem:
MaintainLatestCompletedXidRecovery uses FullXidRelativeTo, which
expects a normal XID. Perhaps it's best to just dodge the entire issue
by skipping LogStandbySnapshot() if latestCompletedXid happens to be
2, but that feels like a hack, because AFAICS the real problem is that
StartupXLog() doesn't agree with the rest of the code on whether 2 is
a legal case, and maybe we ought to be storing a value that doesn't
need to be computed via TransactionIdRetreat().
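
For reference, the special XIDs involved look like this (values as defined
in src/include/access/transam.h; the trailing comment is just my
illustration of the failure):

/* Special transaction IDs, from src/include/access/transam.h */
#define InvalidTransactionId        ((TransactionId) 0)
#define BootstrapTransactionId      ((TransactionId) 1)
#define FrozenTransactionId         ((TransactionId) 2)
#define FirstNormalTransactionId    ((TransactionId) 3)

#define TransactionIdIsNormal(xid)  ((xid) >= FirstNormalTransactionId)

/*
 * The first checkpoint written at the beginning of WAL has
 * nextXid == FirstNormalTransactionId (3).  Retreating one step from
 * that yields FrozenTransactionId (2), so
 * Assert(TransactionIdIsNormal(latestCompletedXid)) fails, since
 * 2 >= 3 is false.
 */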

The second problem I hit was a preexisting bug where, under
wal_level=minimal, redo of a "create tablespace" record can blow away
data of which we have no other copy. See
http://postgr.es/m/CA+TgmoaLO9ncuwvr2nN-J4VEP5XyAcy=zkihxqzbbfrxxgx...@mail.gmail.com
for details. That bug happens to make src/test/recovery's
018_wal_optimize.pl fail one of the tests, because the test starts and
stops the server repeatedly, with the result that with the attached
patch, we just keep writing end-of-recovery records and never getting
time to finish a checkpoint before the next shutdown, so every test
replays the CREATE TABLESPACE record and everything that previous
tests did. The "wal_level = minimal, SET TABLESPACE commit
subtransaction" fails because it's the only one that (1) uses the
tablespace for a new table, (2) commits, and (3) runs before a
checkpoint is ma

Re: ECPG bug fix: DECALRE STATEMENT and DEALLOCATE, DESCRIBE

2021-08-09 Thread Michael Meskes
> My email of July 30 was itself pretty strongly worded, but went
> unanswered for a full week. Not even a token response. If that
> doesn't
> count as "ignoring the RMT", then what does? How much time has to
> pass
> before radio silence begins to count as "ignoring the RMT", in your
> view of things? A month? More?

If you want me to answer, how about asking a question? Or telling me
that you'd like some feedback? I don't see how I should know that you
expect a reply to a simple statement of facts.

> Okay, I understand that now.

And? Do you care at all?

> As Andrew pointed out, there is a general expectation that committers
> take care of their own open items. That doesn't mean that they are
> obligated to personally fix bugs. Just that the final responsibility
> to make sure that the issue is resolved rests with the committer.
> This
> is one of the few hard rules that we have.

Sure, I don't question that. Again, I knew about the issue; I only
misjudged it at the beginning. Your email from July 30 did show me that
it was more urgent but still didn't create the impression that there
was such a short deadline. In my opinion my prior email already
explained that I was on it, but couldn't give an estimate.

> As I've said before, RMT-driven revert is something that I see as an
> option of last resort. The RMT charter doesn't go quite that far, but
> I'd argue that my interpretation is quite natural given how committer
> responsibility works in general. In other words, I personally believe
> that our bottom-up approach is on balance a good one, and should be
> preserved.

Fair enough, to me a revert is a revert, no matter who issues it.

> Maybe the issue is muddied by the fact that we each have different
> views of the community process, at a high level (I'm unsure). Unlike
> you, I don't believe that RMT-driven revert is "a reasonable
> procedure". I myself see the RMT's power to resolve open items in
> this
> way as a necessary evil. Something to be avoided at all costs. Why
> should people that don't know anything about ecpg make decisions
> about
> ecpg? In general they should not.

Well, you did lay out what the decision would be and I fully agreed
with it. So again, what was there to do? Had you asked me if I agreed,
I would have told you.

Michael

-- 
Michael Meskes
Michael at Fam-Meskes dot De
Michael at Meskes dot (De|Com|Net|Org)
Meskes at (Debian|Postgresql) dot Org





Re: ECPG bug fix: DECALRE STATEMENT and DEALLOCATE, DESCRIBE

2021-08-09 Thread Peter Geoghegan
On Mon, Aug 9, 2021 at 12:10 AM Michael Meskes  wrote:
> How do you know I didn't spend 15 minutes looking at the patch and the
> whole email thread? I surely did and it was more than 15 minutes, but
> not enough to give a meaningful comment. If you can do it in 15
> minutes, great for you, I cannot.

That was just an example of a token response. I don't know anything about ecpg.

> Besides, I have not ignored the RMT. I don't know why you keep
> insisting on something that is simply not true.

My email of July 30 was itself pretty strongly worded, but went
unanswered for a full week. Not even a token response. If that doesn't
count as "ignoring the RMT", then what does? How much time has to pass
before radio silence begins to count as "ignoring the RMT", in your
view of things? A month? More?

> At the risk of repeating myself, my concern is *only* the rude and
> disrespectful way of addressing me in the third person while talking to
> me directly. Again, I though I made that clear in my first email about
> the topic and every one that followed.

Okay, I understand that now.

> I was *never* asked for *any* response in *any* email, save the
> original technical discussion, which I did answer by telling people
> that I'm lacking time but will eventually get to it. Just to be
> precise, your email from July 30 told me and everyone how this will be
> handled. A reasonable procedure, albeit not one we'd like to see
> happen. But why should I answer and what? It's not that you bring this
> up as a discussion point, but as a fact.

As Andrew pointed out, there is a general expectation that committers
take care of their own open items. That doesn't mean that they are
obligated to personally fix bugs. Just that the final responsibility
to make sure that the issue is resolved rests with the committer. This
is one of the few hard rules that we have.

As I've said before, RMT-driven revert is something that I see as an
option of last resort. The RMT charter doesn't go quite that far, but
I'd argue that my interpretation is quite natural given how committer
responsibility works in general. In other words, I personally believe
that our bottom-up approach is on balance a good one, and should be
preserved.

Maybe the issue is muddied by the fact that we each have different
views of the community process, at a high level (I'm unsure). Unlike
you, I don't believe that RMT-driven revert is "a reasonable
procedure". I myself see the RMT's power to resolve open items in this
way as a necessary evil. Something to be avoided at all costs. Why
should people that don't know anything about ecpg make decisions about
ecpg? In general they should not.

-- 
Peter Geoghegan




Re: ECPG bug fix: DECALRE STATEMENT and DEALLOCATE, DESCRIBE

2021-08-09 Thread Michael Meskes


Dear Kuroda-san,

> I completely missed these mails, and 8/9 was a national holiday.
> I must apologize; very sorry for the inconvenience.

No need to; none of it is your fault.

> I summarize the thread.

Thank you so much, this is very, very helpful.

> As you might know, DECLARE STATEMENT has been added in PG14, but I
> found that the connection-control feature cannot be used for the
> DEALLOCATE and DESCRIBE statements (for more details, please see the
> patches or ask me). But we have a question - which statements should
> use the associated connection? Obviously the DEALLOCATE statement
> should follow the linked connection because the statement uses only
> one SQL identifier. In DESCRIBE or any other descriptor-related
> statements, however, I think it is non-obvious because they also
> have a descriptor name. Currently my patches cover DESCRIBE, but I'm
> wondering whether this approach is correct. I want to ask your
> opinion about the scope of DECLARE STATEMENT.

I've been reading through quite a few documents to come up with a good
reason one way or the other, but as you already pointed out yourself,
other database systems do not seem to be consistent about this usage
either.

Unfortunately I didn't find my copy of the SQL standard. But then I
kind of doubt it has much to say about this particular issue.

Anyway, I'm currently leaning towards including DESCRIBE in the list of
statements that are influenced by DECLARE STATEMENT. My reason is that
DESCRIBE can already be issued with an AT clause, regardless of whether
that makes sense or not. Or in other words, if we allow it to get a
connection name one way, why should we forbid the other way? Forbidding
it seems more confusing to me than the current situation.

The alternative would be to forbid using an AT clause with DESCRIBE,
which would definitely be a compatibility change, although, again, not
a functional one.

> Coding is not hard, hence I think we can fix this by the end of
> September if we set the policy correctly and have reviewers.

Fully agreed. That's why a short glance at the patch didn't suffice to
answer this. I didn't see any issues with the patch so far. Please send
it to me once it's finished (or is it already?) and I'll give it a run,
too.

Hopefully I caught up on all emails and didn't miss parts of the
thread.

Thanks,
Michael
-- 
Michael Meskes
Michael at Fam-Meskes dot De
Michael at Meskes dot (De|Com|Net|Org)
Meskes at (Debian|Postgresql) dot Org





Re: Worth using personality(ADDR_NO_RANDOMIZE) for EXEC_BACKEND on linux?

2021-08-09 Thread Andres Freund
Hi,

On 2021-08-09 13:54:25 -0400, Tom Lane wrote:
> Andres Freund  writes:
> > On 2021-08-09 13:43:03 -0400, Robert Haas wrote:
> >> On Mon, Aug 9, 2021 at 1:30 PM Alvaro Herrera  
> >> wrote:
> > How common is it to get a failure?  I know I've run tests under
> > EXEC_BACKEND and not seen any failures.  Not many runs though.
> 
> > I get check-world failures in about 1/2-1/3 of the runs, and a plain check
> > fails in maybe 1/4 of the cases. It's pretty annoying because it often isn't
> > trivial to distinguish whether I've broken something or whether it's
> > randomization related...
> 
> I don't have numbers, but I do know that on Linux EXEC_BACKEND builds fail
> often enough to be annoying if you don't disable ASLR.  If we can do
> something not-too-invasive about that, it'd be great.

I'm now not sure if personality(NO_RANDOMIZE) in postmaster is actually
sufficient. It does seem to drastically reduce the frequency of issues, but
there are still conceivable failures where the postmaster's layout would clash
with the non-randomized children - obviously personality(NO_RANDOMIZE) cannot
retroactively change the layout of the already running binary.

So maybe we should put it into pg_ctl?
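
To illustrate, a minimal sketch of what that could look like (Linux-only;
the function name and its placement are hypothetical, not a concrete
patch):

#include <sys/personality.h>

/* Disable ASLR for this process and everything it fork()/exec()s. */
static void
disable_aslr(void)
{
	unsigned long persona = personality(0xffffffffUL);	/* query current */

	if (persona != (unsigned long) -1)
		(void) personality(persona | ADDR_NO_RANDOMIZE);
}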

Greetings,

Andres Freund




Re: Tab completion for CREATE SCHEMAAUTHORIZATION

2021-08-09 Thread Dagfinn Ilmari Mannsåker
Hi Suraj,

Suraj Khamkar  writes:

> Hello Dagfinn,
>
> I had a look at your patch and below are my review comments.
> Please correct me if I am missing something.
>
>1. For me the patch does not apply cleanly. I have been facing the error
>of trailing whitespaces.

I do not get these errors, either with the patch file I still have
locally or by saving the attachment from my original email.  Are you
sure something in your download process hasn't converted it to Windows
line endings (\r\n), or otherwise mangled the whitespace?

>2. We can remove space in before \ and below change
>+" UNION ALL SELECT 'PUBLIC'" \
>
>Should be,
>+" UNION ALL SELECT 'PUBLIC' "\

The patch doesn't add any lines that end with quote-space-backslash.
As for the space before the quote, that is not necessary either, since the
next line starts with a space after the quote.  Either way, the updated
version of the patch doesn't add any new lines with continuation after a
string constant, so the point is moot.

>3. role_specification has CURRENT_ROLE, CURRENT_USER and SESSION_USER.
>But current changes are missing CURRENT_ROLE.

Ah, I was looking at the documentation for 13, but CURRENT_ROLE is only
allowed in this context as of 14.  Fixed.

>4. I'm not sure about this but do we need to enable tab completion for IF
>NOT EXIST?
>
>5. I think we are not handling IF NOT EXIST that's why it's not
>completing tab completion
>for AUTHORIZATION.

As you note, psql doesn't currently tab-complete IF NOT EXISTS
for any command, so that would be a subject for a separate patch.

>6. As we are here we can also enable missing tab completion for ALTER
>SCHEMA.
>After OWNER TO we should also get CURRENT_ROLE, CURRENT_USER and
>SESSION_USER.

I did an audit of all the uses of Query_for_list_of_roles, and there
turned out be several more that accept CURRENT_ROLE, CURRENT_USER and
SESSION_USER that they weren't tab-completed for.  I also renamed the
constant to Query_for_list_of_owner_roles, but I'm not 100% happy with
that name either.

>7. Similarly, as we can drop multiple schemas' simultaneously, we should
>enable tab completion for
>comma with CASCADE and RESTRICT
>postgres@53724=#DROP SCHEMA sch
>CASCADE   RESTRICT

The tab completion code for DROP is generic for all object types (see
the words_after_create array and the create_or_drop_command_generator
function), so that should be done generically, and is thus outside the
scope for this patch.

> Thanks.

Thanks for the review.  Updated patch attached, with the CURRENT/SESSION
ROLE/USER changes for other commands separated out.

- ilmari

>From 68ebc42bf142f843dd3bf5741e085ec0397c8e0b Mon Sep 17 00:00:00 2001
From: tanghy 
Date: Mon, 9 Aug 2021 18:42:12 +0100
Subject: [PATCH v3 1/2] Tab-complete CURRENT_ROLE, CURRENT_USER and
 SESSION_USER everywhere

All commands that manipulate ownership let you specify the current or
session user by the above aliases, but the tab completion code had
only got that memo in some places.
---
 src/bin/psql/tab-complete.c | 28 +---
 1 file changed, 13 insertions(+), 15 deletions(-)

diff --git a/src/bin/psql/tab-complete.c b/src/bin/psql/tab-complete.c
index 064892bade..4ddc8134a0 100644
--- a/src/bin/psql/tab-complete.c
+++ b/src/bin/psql/tab-complete.c
@@ -758,15 +758,16 @@ static const SchemaQuery Query_for_list_of_collations = {
 "   FROM pg_catalog.pg_roles "\
 "  WHERE substring(pg_catalog.quote_ident(rolname),1,%d)='%s'"
 
-#define Query_for_list_of_grant_roles \
-" SELECT pg_catalog.quote_ident(rolname) "\
-"   FROM pg_catalog.pg_roles "\
-"  WHERE substring(pg_catalog.quote_ident(rolname),1,%d)='%s'"\
-" UNION ALL SELECT 'PUBLIC'"\
+#define Query_for_list_of_owner_roles \
+Query_for_list_of_roles \
 " UNION ALL SELECT 'CURRENT_ROLE'"\
 " UNION ALL SELECT 'CURRENT_USER'"\
 " UNION ALL SELECT 'SESSION_USER'"
 
+#define Query_for_list_of_grant_roles \
+Query_for_list_of_owner_roles \
+" UNION ALL SELECT 'PUBLIC'"
+
 /* the silly-looking length condition is just to eat up the current word */
 #define Query_for_index_of_table \
 "SELECT pg_catalog.quote_ident(c2.relname) "\
@@ -1613,7 +1614,7 @@ psql_completion(const char *text, int start, int end)
 		COMPLETE_WITH("SET TABLESPACE", "OWNED BY");
 	/* ALTER TABLE,INDEX,MATERIALIZED VIEW ALL IN TABLESPACE xxx OWNED BY */
 	else if (TailMatches("ALL", "IN", "TABLESPACE", MatchAny, "OWNED", "BY"))
-		COMPLETE_WITH_QUERY(Query_for_list_of_roles);
+		COMPLETE_WITH_QUERY(Query_for_list_of_owner_roles);
 	/* ALTER TABLE,INDEX,MATERIALIZED VIEW ALL IN TABLESPACE xxx OWNED BY xxx */
 	else if (TailMatches("ALL", "IN", "TABLESPACE", MatchAny, "OWNED", "BY", MatchAny))
 		COMPLETE_WITH("SET TABLESPACE");
@@ -2257,7 +2258,7 @@ psql_completion(const char *text, int start, int end)
 		COMPLETE_WITH("USER");
 	/* complete ALTER GROUP  ADD|DROP USER with a user name */
 	else if (Matches("ALTER", "GROUP", Ma

Re: Worth using personality(ADDR_NO_RANDOMIZE) for EXEC_BACKEND on linux?

2021-08-09 Thread Tom Lane
Andres Freund  writes:
> On 2021-08-09 13:43:03 -0400, Robert Haas wrote:
>> On Mon, Aug 9, 2021 at 1:30 PM Alvaro Herrera  
>> wrote:
> How common is it to get a failure?  I know I've run tests under
> EXEC_BACKEND and not seen any failures.  Not many runs though.

> I get check-world failures in about 1/2-1/3 of the runs, and a plain check
> fails in maybe 1/4 of the cases. It's pretty annoying because it often isn't
> trivial to distinguish whether I've broken something or whether it's
> randomization related...

I don't have numbers, but I do know that on Linux EXEC_BACKEND builds fail
often enough to be annoying if you don't disable ASLR.  If we can do
something not-too-invasive about that, it'd be great.

regards, tom lane




Re: Worth using personality(ADDR_NO_RANDOMIZE) for EXEC_BACKEND on linux?

2021-08-09 Thread Andres Freund
Hi,

On 2021-08-09 13:43:03 -0400, Robert Haas wrote:
> On Mon, Aug 9, 2021 at 1:30 PM Alvaro Herrera  wrote:
> > How common is it to get a failure?  I know I've run tests under
> > EXEC_BACKEND and not seen any failures.  Not many runs though.

I get check-world failures in about 1/2-1/3 of the runs, and a plain check
fails in maybe 1/4 of the cases. It's pretty annoying because it often isn't
trivial to distinguish whether I've broken something or whether it's
randomization related...

The frequency of failures isn't stable over time, making it a bit harder to
know what's going on.


> On macOS, failures are extremely common. Sometimes I have to run
> simple tests many times to get even one success. The proposal on the
> table won't help with that problem since it's Linux-specific, but if
> there's any way to do something similar on macOS it would be a _huge_
> help.

There does seem to be a way of doing so, because I found blog posts talking
about how to get e.g. lldb to *enable* ASLR, which it disables by default...

Greetings,

Andres Freund




Re: Worth using personality(ADDR_NO_RANDOMIZE) for EXEC_BACKEND on linux?

2021-08-09 Thread Robert Haas
On Mon, Aug 9, 2021 at 1:30 PM Alvaro Herrera  wrote:
> How common is it to get a failure?  I know I've run tests under
> EXEC_BACKEND and not seen any failures.  Not many runs though.

On macOS, failures are extremely common. Sometimes I have to run
simple tests many times to get even one success. The proposal on the
table won't help with that problem since it's Linux-specific, but if
there's any way to do something similar on macOS it would be a _huge_
help.

-- 
Robert Haas
EDB: http://www.enterprisedb.com




Re: Worth using personality(ADDR_NO_RANDOMIZE) for EXEC_BACKEND on linux?

2021-08-09 Thread Alvaro Herrera
On 2021-Aug-06, Andrew Dunstan wrote:

> On 8/5/21 11:29 PM, Andres Freund wrote:

> > I was wondering if we should have postmaster do 
> > personality(ADDR_NO_RANDOMIZE)
> > for EXEC_BACKEND builds? It seems nicer to make it automatically work than
> > have people remember that they need to call "setarch --addr-no-randomize 
> > make check".

How common is it to get a failure?  I know I've run tests under
EXEC_BACKEND and not seen any failures.  Not many runs though.

> > Not that it actually matters for EXEC_BACKEND, but theoretically doing
> > personality(ADDR_NO_RANDOMIZE) in postmaster is a tad more secure than doing
> > it via setarch, as in the personality() case postmaster's layout itself is
> > still randomized...

True.  I think the security aspect is not critically important, since
hopefully nobody should be using such builds for production.


> (Thinks: do we have non-Windows buildfarm members doing EXEC_BACKEND?)

culicidae does that.

-- 
Álvaro Herrera PostgreSQL Developer  —  https://www.EnterpriseDB.com/
"Pido que me den el Nobel por razones humanitarias" (Nicanor Parra)




replay of CREATE TABLESPACE eats data at wal_level=minimal

2021-08-09 Thread Robert Haas
To reproduce, initialize a cluster with wal_level=minimal and
max_wal_senders=0. Then from psql:

\! mkdir /tmp/goose

CHECKPOINT;
CREATE TABLESPACE goose LOCATION '/tmp/goose';
SET wal_skip_threshold=0;
BEGIN;
CREATE TABLE wild (a int, b text) TABLESPACE goose;
INSERT INTO wild VALUES (1, 'chase');
COMMIT;
SELECT * FROM wild;

As expected, you will see one row in table 'wild'. Now perform an
immediate shutdown. Restart the server. Table 'wild' is now empty. The
problem appears to be that tblspc_redo() calls
create_tablespace_directories(), which says:

/*
 * Our theory for replaying a CREATE is to forcibly drop the target
 * subdirectory if present, and then recreate it. This may be more
 * work than needed, but it is simple to implement.
 */

Unfortunately, this theory (which dates to
c86f467d18aa58e18fd85b560b46d8de014e6017, vintage 2010, by Bruce) is
correct only with wal_level>minimal. At wal_level='minimal', we can
replay the record to recreate the relfilenode, but not the records
that would have created the contents. However, note that if the table
is smaller than wal_skip_threshold, then we'll log full-page images of
the contents at commit time even at wal_level='minimal' after which we
have no problem. As far as I can see, this bug has "always" existed,
but before c6b92041d38512a4176ed76ad06f713d2e6c01a8 (2020, Noah) you
would have needed a different test case. Specifically, you would have
needed to use COPY to put the row in the table, and you would have
needed to omit setting wal_skip_threshold since it didn't exist yet.
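
For context, the commit-time decision looks roughly like this (heavily
simplified from smgrDoPendingSyncs(); the variable names here are
illustrative, not the actual code):

/*
 * Relations created in this transaction under wal_level=minimal are
 * either fsync'ed or have their contents WAL-logged at commit,
 * depending on size.  Only the WAL-logged case leaves anything for
 * redo to replay after the tablespace directory is recreated.
 */
if (total_size_in_kb >= wal_skip_threshold)
	smgrimmedsync(srel, MAIN_FORKNUM);	/* rely on the on-disk copy */
else
	log_newpage_range(rel, MAIN_FORKNUM, 0, nblocks, false);	/* FPIs */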

I don't presently have a specific idea about how to fix this.

-- 
Robert Haas
EDB: http://www.enterprisedb.com




RE: ECPG bug fix: DECALRE STATEMENT and DEALLOCATE, DESCRIBE

2021-08-09 Thread kuroda.hay...@fujitsu.com
Dear Horiguchi-san,

> How does Pro*C behave in that case? If the second command ends with an
> error, I think we are free to error out the second command before
> execution. If it works... do you know what is happening at the time?

You asked how Oracle behaves when the following is precompiled and
executed, right?
> > > EXEC SQL AT conn1 DECLARE stmt STATEMENT;
> > > EXEC SQL AT conn2 EXECUTE stmt INTO ..;

While precompiling, it does not throw any errors. While executing,
the second statement will execute on conn1 without warnings.
So the added warning is a Postgres extension.

Best Regards,
Hayato Kuroda
FUJITSU LIMITED





RE: ECPG bug fix: DECALRE STATEMENT and DEALLOCATE, DESCRIBE

2021-08-09 Thread kuroda.hay...@fujitsu.com
Dear Wang,

Good reporting!

> I'm not sure, but how about modifying the ecpg.trailer:
> > statement: ecpgstart at toplevel_stmt ';' { connection = NULL; }
> > | ecpgstart toplevel_stmt ';'
> I think there are lots of 'connection = NULL;' in the source, and we
> should find a good location to reset the connection.

Thank you for giving a solution! I will consider the idea and
integrate it if it's OK.

Best Regards,
Hayato Kuroda
FUJITSU LIMITED





RE: ECPG bug fix: DECALRE STATEMENT and DEALLOCATE, DESCRIBE

2021-08-09 Thread kuroda.hay...@fujitsu.com
Dear Hackers,

I completely missed these mails, and 8/9 was a national holiday.
I must apologize; very sorry for the inconvenience.

> The RMT's first responsibility is to ensure that PostgreSQL 14 is
> released on schedule. We would prefer to avoid a revert, but we cannot
> allow this to drag on indefinitely.

Of course I will try to avoid it, but I understand you are doing your job.

Dear Meskes,

I summarize the thread.
As you might know, DECLARE STATEMENT has been added in PG14, but I
found that the connection-control feature cannot be used for the
DEALLOCATE and DESCRIBE statements (for more details, please see the
patches or ask me). But we have a question - which statements should
use the associated connection? Obviously the DEALLOCATE statement
should follow the linked connection because the statement uses only
one SQL identifier. In DESCRIBE or any other descriptor-related
statements, however, I think it is non-obvious because they also have
a descriptor name. Currently my patches cover DESCRIBE, but I'm
wondering whether this approach is correct. I want to ask your opinion
about the scope of DECLARE STATEMENT.
Coding is not hard, hence I think we can fix this by the end of
September if we set the policy correctly and have reviewers.

Best Regards,
Hayato Kuroda
FUJITSU LIMITED



Re: Added schema level support for publication.

2021-08-09 Thread Mark Dilger



> On Aug 6, 2021, at 1:32 AM, vignesh C  wrote:
> 
> the attached v19 patch

With v19 applied, a schema owner can publish the contents of a table regardless 
of ownership or permissions on that table:

+CREATE ROLE user1;
+GRANT CREATE ON DATABASE regression TO user1;
+CREATE ROLE user2;
+GRANT CREATE ON DATABASE regression TO user2;
+SET SESSION AUTHORIZATION user1;
+CREATE SCHEMA user1schema;
+GRANT CREATE, USAGE ON SCHEMA user1schema TO user2;
+RESET SESSION AUTHORIZATION;
+SET SESSION AUTHORIZATION user2;
+CREATE TABLE user1schema.user2private (junk text);
+REVOKE ALL PRIVILEGES ON user1schema.user2private FROM PUBLIC;
+REVOKE ALL PRIVILEGES ON user1schema.user2private FROM user1;
+CREATE TABLE user1schema.user2public (junk text);
+GRANT SELECT ON user1schema.user2public TO PUBLIC;
+RESET SESSION AUTHORIZATION;
+SET SESSION AUTHORIZATION user1;
+SELECT * FROM user1schema.user2private;
+ERROR:  permission denied for table user2private
+SELECT * FROM user1schema.user2public;
+ junk 
+--
+(0 rows)
+
+CREATE PUBLICATION user1pub;
+WARNING:  wal_level is insufficient to publish logical changes
+HINT:  Set wal_level to logical before creating subscriptions.
+ALTER PUBLICATION user1pub
+   ADD TABLE user1schema.user2public;
+ERROR:  must be owner of table user2public
+ALTER PUBLICATION user1pub
+   ADD TABLE user1schema.user2private, user1schema.user2public;
+ERROR:  must be owner of table user2private
+SELECT * FROM pg_catalog.pg_publication_tables
+   WHERE pubname = 'user1pub';
+ pubname | schemaname | tablename 
+-++---
+(0 rows)
+
+ALTER PUBLICATION user1pub ADD SCHEMA user1schema;
+SELECT * FROM pg_catalog.pg_publication_tables
+   WHERE pubname = 'user1pub';
+ pubname  | schemaname  |  tablename   
+--+-+--
+ user1pub | user1schema | user2private
+ user1pub | user1schema | user2public
+(2 rows)

It is a bit counterintuitive that schema owners do not have administrative 
privileges over tables within their schemas, but that's how it is.  The design 
of this patch seems to assume otherwise.  Perhaps ALTER PUBLICATION ... ADD 
SCHEMA should be restricted to superusers, just as FOR ALL TABLES?

Alternatively, you could add ownership checks per table to mirror the behavior 
of ALTER PUBLICATION ... ADD TABLE, but that would foreclose the option of 
automatically updating the list of tables in the publication as new tables are 
added to the schema, since those new tables would not necessarily belong to the 
schema owner, and having a error thrown during CREATE TABLE would be quite 
unfriendly.  I think until this is hammered out, it is safer to require 
superuser privileges and then we can revisit this issue and loosen the 
requirement in a subsequent commit.

What do you think?

—
Mark Dilger
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company







Re: [PATCH] OpenSSL: Mark underlying BIO with the appropriate type flags

2021-08-09 Thread Daniel Gustafsson
> On 6 Aug 2021, at 12:16, Itamar Gafni  wrote:

> Prior to OpenSSL version 1.1.0, the BIO methods object would be copied
> directly from the existing socket type and then its read/write functions
> would be replaced.
> With 1.1.0 and up, the object is created from scratch and then all its 
> methods are initialized to be the ones of the socket type, except read/write 
> which are custom.
> In this newer way, a new type is given to it by calling “BIO_get_new_index”, 
> but the related type flags aren’t added.

According to the documentation (I haven't tested it yet but will) I think you
are right that the type should be set with the appropriate BIO_TYPE_ flags.

For OpenSSL 1.0.1 and 1.0.2, wouldn't we need to set the .type to a randomly
chosen index combined with BIO_TYPE_DESCRIPTOR and BIO_TYPE_SOURCE_SINK as well?
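
For what it's worth, a minimal sketch of how the 1.1.0+ path could
construct the type (the method name and the error handling are
illustrative only):

#include <openssl/bio.h>

static BIO_METHOD *my_bio_methods = NULL;

static int
my_bio_methods_init(void)
{
	int			my_bio_index = BIO_get_new_index();

	if (my_bio_index == -1)
		return 0;
	/* Flag it as a descriptor-based source/sink, like the socket type. */
	my_bio_index |= (BIO_TYPE_DESCRIPTOR | BIO_TYPE_SOURCE_SINK);
	my_bio_methods = BIO_meth_new(my_bio_index, "illustrative socket");
	return my_bio_methods != NULL;
}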

--
Daniel Gustafsson   https://vmware.com/





Re: when the startup process doesn't (logging startup delays)

2021-08-09 Thread Nitin Jadhav
Modified reset_startup_progress_timeout() to take a second parameter
which indicates whether it is called for initialization or for
resetting. Without this parameter there is a problem if we call
init_startup_progress() more than once with no call to
ereport_startup_progress() in between, because the code related to
disabling the timer has been removed.

reset_startup_progress_timeout(TimestampTz now, bool is_init)
{
    if (is_init)
        next_timeout = now + interval;
    else
    {
        next_timeout = scheduled_startup_progress_timeout + interval;
        if (next_timeout < now)
        {
            /*
             * Either the timeout was processed so late that we missed an
             * entire cycle, or the system clock was set backwards.
             */
            next_timeout = now + interval;
        }
    }
    enable_timeout_at(next_timeout);
    scheduled_startup_progress_timeout = next_timeout;
}

I have incorporated this in the attached patch. Please correct me if I am wrong.

> This makes sense, but I think I'd like to have all the functions in
> this patch use names_like_this() rather than NamesLikeThis().

I have changed all the function names accordingly. But I would like to
know why it should be names_like_this(), as there are many functions
with the format NamesLikeThis(). I would like to understand when to
use which, so I can apply that in future patches.

> Hmm, yeah, that seems good, but given this change, maybe the variables
> need a little renaming. Like change last_startup_progress_timeout to
> scheduled_startup_progress_timeout, perhaps.

Right. Changed the variable name.

> I guess this one needs to return a Boolean, actually.

Yes.

> reset_startup_progress_timeout(TimeStampTz now)
> {
>   next_timeout = last_startup_progress_timeout + interval;
>   if (next_timeout < now)
>   {
>  // Either the timeout was processed so late that we missed an entire 
> cycle,
>  // or the system clock was set backwards.
>  next_timeout = now + interval;
>   }
>   enable_timeout_at(next_timeout);
>   last_startup_progress_timeout = next_timeout;
> }

As per the above logic, I would like to discuss 2 cases.

Case-1:
The first scheduled timeout is after 1 sec, but the timeout occurred
after 1.5 sec, so the log msg prints after 1.5 sec. The next timeout is
scheduled at 2 sec (scheduled_startup_progress_timeout + interval).
The condition (next_timeout < now) evaluates to false and
everything goes smoothly.

Case-2:
The first scheduled timeout is after 1 sec, but the timeout occurred
after 2.5 sec, so the log msg prints after 2.5 sec. The next timeout is
scheduled at 2 sec (scheduled_startup_progress_timeout + interval).
So the condition (next_timeout < now) will be true and next_timeout
will get reset to 3.5 sec (2.5 + 1), and it continues from there. Is this
behaviour ok, or should we set next_timeout to 3 sec? Please share
your thoughts on this.

Thanks & Regards,
Nitin Jadhav


On Thu, Aug 5, 2021 at 7:49 PM Robert Haas  wrote:
>
> On Thu, Aug 5, 2021 at 7:41 AM Nitin Jadhav
>  wrote:
> > This seems a little confusing. As we are setting
> > last_startup_progress_timeout = now and then calling
> > reset_startup_progress_timeout() which will calculate the next_time
> > based on the value of last_startup_progress_timeout initially and
> > checks whether next_timeout is less than now. It doesn't make sense to
> > me. I feel we should calculate the next_timeout based on the time that
> > it is supposed to fire next time. So we should set
> > last_startup_progress_timeout = next_timeout after enabling the timer.
> > Also I feel with the 2 functions mentioned above, we also need
> > InitStartupProgress() which sets the initial value to
> > startupProcessOpStartTime which is used to calculate the time
> > difference and display in the logs. I could see those functions like
> > below.
> >
> > InitStartupProgress(void)
> > {
> > startupProcessOpStartTime = GetCurrentTimestamp();
> > ResetStartupProgressTimeout(startupProcessOpStartTime);
> > }
>
> This makes sense, but I think I'd like to have all the functions in
> this patch use names_like_this() rather than NamesLikeThis().
>
> > reset_startup_progress_timeout(TimeStampTz now)
> > {
> >   next_timeout = last_startup_progress_timeout + interval;
> >   if (next_timeout < now)
> >   {
> >  // Either the timeout was processed so late that we missed an entire 
> > cycle,
> >  // or the system clock was set backwards.
> >  next_timeout = now + interval;
> >   }
> >   enable_timeout_at(next_timeout);
> >   last_startup_progress_timeout = next_timeout;
> > }
>
> Hmm, yeah, that seems good, but given this change, maybe the variables
> need a little renaming. Like change last_startup_progress_timeout to
> scheduled_startup_progress_timeout, perhaps.
>
> > startup_progress_timeout_has_expired()
> > {
> >if (!startup_progress_timer_expired)
> >  return;
> >   now = GetCurrentTimestamp();
> >   // compute timestamp difference based on startupProcessOpStartTime
> >   reset_startup

Re: EXEC_BACKEND vs bgworkers without BGWORKER_SHMEM_ACCESS

2021-08-09 Thread Robert Haas
On Thu, Aug 5, 2021 at 8:02 PM Tom Lane  wrote:
> I think doing nothing is fine.  Given the lack of complaints, we're
> more likely to break something than fix anything useful.

+1.

-- 
Robert Haas
EDB: http://www.enterprisedb.com




Re: Parallel scan with SubTransGetTopmostTransaction assert coredump

2021-08-09 Thread Robert Haas
On Wed, Aug 4, 2021 at 10:03 PM Greg Nancarrow  wrote:
> In setting up the snapshot for the execution state used in command
> execution, GetTransactionSnapshot() is called and (possibly a copy of)
> the returned snapshot is pushed as the ActiveSnapshot.

I mean, there are LOTS of PushActiveSnapshot() calls in the code. A
lot of those specifically say
PushActiveSnapshot(GetTransactionSnapshot()) but others are pushing
snapshots obtained from various other places. I really don't think it
can possibly be correct in general to assume that the snapshot on top
of the active snapshot stack is the same as the transaction snapshot.

> So why (current Postgres code, no patches applied) in setting up for
> parallel-worker execution (in InitializeParallelDSM) does the Postgres
> code then acquire ANOTHER TransactionSnapshot (by calling
> GetTransactionSnashot(), which could return CurrentSnapshot or a new
> snapshot) and serialize that, as well as serializing what the
> ActiveSnapshot points to, and then restore those in the workers as two
> separate snapshots? Is it a mistake? Or if intentional and correct,
> how so?

Well, I already agreed that in cases where GetTransactionSnapshot()
will acquire an altogether new snapshot, we shouldn't call it, but
beyond that I don't see why you think this is wrong. I mean, suppose
we only call GetTransactionSnapshot() at parallel worker startup when
IsolationUsesXactSnapshot(). In that case, CurrentSnapshot is a
durable, transaction-lifetime piece of backend-local state that can
affect the results of future calls to GetTransactionSnapshot(), and
therefore seems to need to be replicated to workers. Separately,
regardless of IsolationUsesXactSnapshot(), the active snapshot is
accessible via calls to GetActiveSnapshot() and therefore should also
be replicated to workers. Now, I don't know of any necessary reason
why those two things need to be the same, because again, there are
PushActiveSnapshot() calls all over the place, and they're not all
pushing the transaction snapshot. So therefore it makes sense to me
that we need to replicate those two things separately - the active
snapshot in the leader becomes the active snapshot in the workers and
the transaction snapshot in the leader becomes the transaction
snapshot in the worker.
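
To make that pairing concrete, a simplified sketch of what I mean (leader
side as in InitializeParallelDSM(), worker side as in ParallelWorkerMain();
buffer management and error handling elided, and the *snapspace variables
are placeholder buffers):

/* Leader: capture and serialize the two snapshots separately. */
Snapshot	transaction_snapshot = GetTransactionSnapshot();
Snapshot	active_snapshot = GetActiveSnapshot();

SerializeSnapshot(transaction_snapshot, tsnapspace);
SerializeSnapshot(active_snapshot, asnapspace);

/* Worker: restore each one into the corresponding backend-local state. */
RestoreTransactionSnapshot(RestoreSnapshot(tsnapspace),
						   fps->parallel_leader_pgproc);
PushActiveSnapshot(RestoreSnapshot(asnapspace));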

Now there is evidently something wrong with this line of reasoning
because the code is buggy and my proposed fix doesn't work, but I
don't know what is wrong with it. You seem to think that it's crazy
that we try to replicate the active snapshot to the active snapshot
and the transaction snapshot to the transaction snapshot, but that
did, and still does, seem really sane to me. The only part that now
seems clearly wrong to me is that the !IsolationUsesXactSnapshot() case
takes an *extra* snapshot, but since fixing that didn't fix the bug,
there's evidently more to the problem than that.

-- 
Robert Haas
EDB: http://www.enterprisedb.com




Extension updates and pg_upgrade

2021-08-09 Thread Bruce Momjian
Our minor releases coming out this week will create a script when
pg_upgrade finishes that contains ALTER EXTENSION ... UPDATE commands
for extensions that show updates in pg_available_extensions.  However, I
am unclear whether all extensions that have updates are reported in
pg_available_extensions or support ALTER EXTENSION ... UPDATE, e.g.,
PostGIS.  Do the new pg_upgrade docs need to be more vague about this?

-- 
  Bruce Momjian  https://momjian.us
  EDB  https://enterprisedb.com

  If only the physical world exists, free will is an illusion.





Re: row filtering for logical replication

2021-08-09 Thread Dilip Kumar
On Tue, Aug 3, 2021 at 4:25 PM Amit Kapila  wrote:
>
> On Tue, Jul 27, 2021 at 9:56 AM Dilip Kumar  wrote:
>
> > Yeah, that's a big problem, seems like the expression evaluation
> > machinery directly going and detoasting the externally stored data
> > using some random snapshot.  Ideally, in walsender we can never
> > attempt to detoast the data because there is no guarantee that those
> > data are preserved.  Somehow before going to the expression evaluation
> > machinery, I think we will have to deform that tuple and need to do
> > something for the externally stored data otherwise it will be very
> > difficult to control that inside the expression evaluation.
> >
>
> True, I think it would be possible after we fix the issue reported in
> another thread [1] where we will log the key values as part of
> old_tuple_key for toast tuples even if they are not changed. We can
> have a restriction that in the WHERE clause that user can specify only
> Key columns for Updates similar to Deletes. Then, we have the data
> required for filter columns basically if the toasted key values are
> changed, then they will be anyway part of the old and new tuple and if
> they are not changed then they will be part of the old tuple.

Right.

> I have
> not checked the implementation part of it but theoretically, it seems
> possible.

Yeah, it would be possible because at least after fixing [1] we
would have the required column data.  The only thing I am worried
about is that while applying the filter on the new tuple, the toasted
unchanged key data will not be part of the new tuple.  So we cannot
directly call the expression evaluation machinery; basically, we somehow
need to deform the new tuple and then replace the data from the old
tuple before passing it to expression evaluation.  Anyway, this is an
implementation detail, so we can look into it while implementing.
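
Something along these lines is what I have in mind - a rough sketch only
(the helper name is made up, and it assumes the old tuple carries the
unchanged key values once [1] is fixed):

/*
 * Before running the row filter on the new tuple, replace any unchanged,
 * externally stored key attributes with their values from the old tuple,
 * since on-disk toast pointers cannot be detoasted in the walsender.
 */
static void
fill_unchanged_key_attrs(TupleDesc desc,
						 Datum *new_values, bool *new_isnull,
						 Datum *old_values, bool *old_isnull)
{
	for (int i = 0; i < desc->natts; i++)
	{
		Form_pg_attribute att = TupleDescAttr(desc, i);

		if (att->attlen == -1 && !new_isnull[i] &&
			VARATT_IS_EXTERNAL_ONDISK(DatumGetPointer(new_values[i])))
		{
			new_values[i] = old_values[i];
			new_isnull[i] = old_isnull[i];
		}
	}
}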

> If my understanding is correct then it becomes necessary to
> solve the other bug [1] to solve this part of the problem for this
> patch.

Right.

> The other possibility is to disallow columns (datatypes) that
> can lead to toasted data (at least for Updates) which doesn't sound
> like a good idea to me.

Yeah, that will be a big limitation, then we won't be able to allow
expression on any varlena types.

> Do you have any other ideas for this problem?

As of now, I have no better idea to suggest.

[1] - 
https://www.postgresql.org/message-id/OS0PR01MB611342D0A92D4F4BF26C0F47FB229%40OS0PR01MB6113.jpnprd01.prod.outlook.com



-- 
Regards,
Dilip Kumar
EnterpriseDB: http://www.enterprisedb.com




Re: Advanced Questions about PostgreSQL

2021-08-09 Thread Andrew Dunstan


On 8/9/21 1:33 AM, David Rowley wrote:
> On Mon, 9 Aug 2021 at 17:14, A Z  wrote:
>>
>> 2) How may I get PostgreSQL to output the create table statement(s) for one 
>> or more tables inside one database, without issuing instructions via the 
>> command line, but only inside a database login, as a query or pl/sql?



Watch this space.


cheers


andrew

--
Andrew Dunstan
EDB: https://www.enterprisedb.com





Re: [bug] Logical Decoding of relation rewrite with toast does not reset toast_hash

2021-08-09 Thread Amit Kapila
On Mon, Aug 9, 2021 at 3:37 PM Drouvot, Bertrand  wrote:
>
> Hi Amit,
>
> On 8/9/21 10:37 AM, Amit Kapila wrote:
> > On Fri, Jul 9, 2021 at 12:22 PM Drouvot, Bertrand  
> > wrote:
> >> Please find enclosed a patch proposal to:
> >>
> >> * Avoid the failed assertion on current master and generate the error 
> >> message instead (should the code reach that stage).
> >> * Reset the toast_hash in case of relation rewrite with toast (so that the 
> >> logical decoding in the above repro is working).
> >>
> > I think instead of resetting toast_hash for this case why don't we set
> > 'relrewrite' for toast tables as well during rewrite? If we do that
> > then we will simply skip assembling toast chunks for the toast table.
>
> Thanks for looking at it!
>
> I do agree, that would be even better than the current patch approach:
> I'll work on it.
>
> > In make_new_heap(), we are calling NewHeapCreateToastTable() to create
> > toast table where we can pass additional information (probably
> > 'toastid'), if required to set 'relrewrite'. Additionally, let's add a
> > test case if possible for this.
> +1 for the test case; it will be added in the next version of the patch.
>

Thanks, please see, if you can prepare patches for the back-branches as well.

-- 
With Regards,
Amit Kapila.




Re: Column Filtering in Logical Replication

2021-08-09 Thread Amit Kapila
On Mon, Aug 9, 2021 at 3:59 PM Amit Kapila  wrote:
>
> On Mon, Aug 9, 2021 at 1:36 AM Rahila Syed  wrote:
> >
> >> Having said that, I'm not sure I agree with this design decision; what I
> >> think this is doing is hiding from the user the fact that they are
> >> publishing columns that they don't want to publish.  I think as a user I
> >> would rather get an error in that case:
> >
> >
> >>   ERROR:  invalid column list in published set
> >>   DETAIL:  The set of published commands does not include all the replica 
> >> identity columns.
> >
> >
> >> or something like that.  Avoid possible nasty surprises of security-
> >> leaking nature.
> >
> >
> > Ok, Thank you for your opinion. I agree that giving an explicit error in 
> > this case will be safer.
> >
>
> +1 for an explicit error in this case.
>
> Can you please explain why you have the restriction for including
> replica identity columns and do we want to put a similar restriction
> for the primary key? As far as I understand, if we allow default
> values on subscribers for replica identity, then probably updates,
> deletes won't work as they need to use replica identity (or PK) to
> search the required tuple. If so, shouldn't we add this restriction
> only when a publication has been defined for one of these (Update,
> Delete) actions?
>
> Another point is what if someone drops the column used in one of the
> publications? Do we want to drop the entire relation from publication
> or just remove the column filter or something else?
>
> Do we want to consider that the columns specified in the filter must
> not have NOT NULL constraint? Because, otherwise, the subscriber will
> error out inserting such rows?
>

I noticed that other databases provide this feature [1] and they allow
users to specify "Columns that are included in Filter" or specify "All
columns to be included in filter except for a subset of columns". I am
not sure whether we want to provide both ways in the first version, but at
least we should consider it as a future extensibility requirement and
try to choose the syntax accordingly.

[1] - 
https://docs.oracle.com/en/cloud/paas/goldengate-cloud/gwuad/selecting-columns.html#GUID-9A851C8B-48F7-43DF-8D98-D086BE069E20

-- 
With Regards,
Amit Kapila.




RE: [BUG] wrong refresh when ALTER SUBSCRIPTION ADD/DROP PUBLICATION

2021-08-09 Thread houzj.f...@fujitsu.com
On Monday, August 9, 2021 11:10 AM Amit Kapila  wrote:
> 
> On Sat, Aug 7, 2021 at 6:53 PM houzj.f...@fujitsu.com 
> wrote:
> >
> > On Sat, Aug 7, 2021 1:36 PM Amit Kapila  wrote:
> > > On Fri, Aug 6, 2021 at 9:57 PM Japin Li  wrote:
> > >
> > > Do you mean to say that do it for both Add and Drop or just for Drop?
> > > Actually, doing it both will make the behavior consistent but doing
> > > it just for Drop might be preferable by some users. I think it is
> > > better to be consistent here but I am fine if you and others feel it
> > > is better to refresh all publications only in Drop case.
> >
> > Personally, I also think it will be better to make the behavior consistent.
> > Attached is a new version of the patch that makes both ADD and DROP behave
> > the same as SET PUBLICATION, which refreshes all the publications.
> >
> 
> There is a bug reported on pgsql-bugs [1] which seems to be happening due to
> the same reason. Can you please test that the other bug is also fixed with 
> your
> patch?
> 
> [1] - https://www.postgresql.org/message-id/17132-6a702189086c6243%40postgresql.org

Thanks for the reminder. I have checked and tested the reported bug.
The reported bug is that when a publication is dropped and then re-added
on the subscriber side, the table in the publication isn't synced. After
applying the patch here, the table is synced as expected when re-added
(the same behavior as SET PUBLICATION).

Best regards,
Hou zj


Re: Column Filtering in Logical Replication

2021-08-09 Thread Amit Kapila
On Mon, Aug 9, 2021 at 1:36 AM Rahila Syed  wrote:
>
>> Having said that, I'm not sure I agree with this design decision; what I
>> think this is doing is hiding from the user the fact that they are
>> publishing columns that they don't want to publish.  I think as a user I
>> would rather get an error in that case:
>
>
>>   ERROR:  invalid column list in published set
>>   DETAIL:  The set of published commands does not include all the replica 
>> identity columns.
>
>
>> or something like that.  Avoid possible nasty surprises of security-
>> leaking nature.
>
>
> Ok, Thank you for your opinion. I agree that giving an explicit error in this 
> case will be safer.
>

+1 for an explicit error in this case.

Can you please explain why you have the restriction for including
replica identity columns and do we want to put a similar restriction
for the primary key? As far as I understand, if we allow default
values on subscribers for replica identity, then probably updates,
deletes won't work as they need to use replica identity (or PK) to
search the required tuple. If so, shouldn't we add this restriction
only when a publication has been defined for one of these (Update,
Delete) actions?

Another point is what if someone drops the column used in one of the
publications? Do we want to drop the entire relation from publication
or just remove the column filter or something else?

Do we want to consider that the columns specified in the filter must
not have NOT NULL constraint? Because, otherwise, the subscriber will
error out inserting such rows?

Minor comments:

  pq_sendbyte(out, flags);
-
  /* attribute name */
  pq_sendstring(out, NameStr(att->attname));

@@ -953,6 +1000,7 @@ logicalrep_write_attrs(StringInfo out, Relation rel)

  /* attribute mode */
  pq_sendint32(out, att->atttypmod);
+
  }

  bms_free(idattrs);
diff --git a/src/backend/replication/logical/relation.c
b/src/backend/replication/logical/relation.c
index c37e2a7e29..d7a7b00841 100644
--- a/src/backend/replication/logical/relation.c
+++ b/src/backend/replication/logical/relation.c
@@ -354,7 +354,6 @@ logicalrep_rel_open(LogicalRepRelId remoteid,
LOCKMODE lockmode)

  attnum = logicalrep_rel_att_by_name(remoterel,
  NameStr(attr->attname));
-
  entry->attrmap->attnums[i] = attnum;

There are quite a few places in the patch that contain spurious line
additions or removals.

-- 
With Regards,
Amit Kapila.




Re: [bug] Logical Decoding of relation rewrite with toast does not reset toast_hash

2021-08-09 Thread Dilip Kumar
On Mon, Aug 9, 2021 at 2:07 PM Amit Kapila  wrote:
>
> On Fri, Jul 9, 2021 at 12:22 PM Drouvot, Bertrand  wrote:
> >
> > Please find enclosed a patch proposal to:
> >
> > * Avoid the failed assertion on current master and generate the error 
> > message instead (should the code reach that stage).
> > * Reset the toast_hash in case of relation rewrite with toast (so that the 
> > logical decoding in the above repro is working).
> >
>
> I think instead of resetting toast_hash for this case why don't we set
> 'relrewrite' for toast tables as well during rewrite? If we do that
> then we will simply skip assembling toast chunks for the toast table.
> In make_new_heap(), we are calling NewHeapCreateToastTable() to create
> toast table where we can pass additional information (probably
> 'toastid'), if required to set 'relrewrite'. Additionally, let's add a
> test case if possible for this.

I agree with Amit that setting relrewrite for the toast relation as
well is better, since we can then simply avoid processing the toast
tuples in such cases.

-- 
Regards,
Dilip Kumar
EnterpriseDB: http://www.enterprisedb.com




Re: Tab completion for CREATE SCHEMA AUTHORIZATION

2021-08-09 Thread Suraj Khamkar
Hello Dagfinn,

I had a look at your patch and below are my review comments.
Please correct me if I am missing something.

   1. For me, the patch does not apply cleanly; I get trailing-whitespace
   errors:
   surajkhamkar@localhost:postgres$ git apply
   v2-0001-Add-tab-completion-for-CREATE-SCHEMA.patch
   v2-0001-Add-tab-completion-for-CREATE-SCHEMA.patch:25: trailing
   whitespace.
   #define Query_for_list_of_schema_roles \
   v2-0001-Add-tab-completion-for-CREATE-SCHEMA.patch:26: trailing
   whitespace.
   Query_for_list_of_roles \
   v2-0001-Add-tab-completion-for-CREATE-SCHEMA.patch:30: trailing
   whitespace.
   #define Query_for_list_of_grant_roles \
   v2-0001-Add-tab-completion-for-CREATE-SCHEMA.patch:31: trailing
   whitespace.
   Query_for_list_of_schema_roles \
   v2-0001-Add-tab-completion-for-CREATE-SCHEMA.patch:32: trailing
   whitespace.
   " UNION ALL SELECT 'PUBLIC'"\
   error: patch failed: src/bin/psql/tab-complete.c:758
   error: src/bin/psql/tab-complete.c: patch does not apply

   2. We can remove the space before the backslash; this change
   +" UNION ALL SELECT 'PUBLIC'" \

   should be:
   +" UNION ALL SELECT 'PUBLIC' "\

   3. role_specification has CURRENT_ROLE, CURRENT_USER and SESSION_USER,
   but the current changes are missing CURRENT_ROLE (see the sketch after
   this list).
   postgres@53724=#CREATE SCHEMA AUTHORIZATION
   CURRENT_USER   pg_execute_server_program  pg_read_all_data

   pg_read_all_stats  pg_signal_backend  pg_write_all_data

   SESSION_USER   pg_database_owner  pg_monitor

   pg_read_all_settings   pg_read_server_files
   pg_stat_scan_tables
   pg_write_server_files  surajkhamkar

   4. I'm not sure about this, but do we need to enable tab completion for
   IF NOT EXISTS?

   5. I think we are not handling IF NOT EXISTS; that's why tab completion
   for AUTHORIZATION does not trigger after it.

   6. While we are here, we can also enable the missing tab completion for
   ALTER SCHEMA: after OWNER TO we should also get CURRENT_ROLE,
   CURRENT_USER and SESSION_USER.
   postgres@53724=#ALTER SCHEMA sch owner to
   pg_database_owner  pg_monitor
   pg_read_all_settings
   pg_read_server_files   pg_stat_scan_tables
pg_write_server_files
   pg_execute_server_program  pg_read_all_data   pg_read_all_stats

   pg_signal_backend  pg_write_all_data  surajkhamkar

   7. Similarly, as we can drop multiple schemas simultaneously, we should
   enable tab completion for a comma alongside CASCADE and RESTRICT:
   postgres@53724=#DROP SCHEMA sch
   CASCADE   RESTRICT
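
Regarding items 3 and 6, a rough patch-style sketch of the missing
completions, following the existing conventions in tab-complete.c (the
exact placement and rule shapes are assumptions, not code from the
patch under review):

/* In psql_completion(), alongside the other CREATE/ALTER SCHEMA rules. */
else if (Matches("CREATE", "SCHEMA", "AUTHORIZATION") ||
         Matches("ALTER", "SCHEMA", MatchAny, "OWNER", "TO"))
    COMPLETE_WITH_QUERY(Query_for_list_of_roles
                        " UNION ALL SELECT 'CURRENT_ROLE'"
                        " UNION ALL SELECT 'CURRENT_USER'"
                        " UNION ALL SELECT 'SESSION_USER'");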


Thanks.

On Sun, Aug 8, 2021 at 2:39 AM Dagfinn Ilmari Mannsåker 
wrote:

> ilm...@ilmari.org (Dagfinn Ilmari Mannsåker) writes:
>
> > ilm...@ilmari.org (Dagfinn Ilmari Mannsåker) writes:
> >
> >> Hi Hackers,
> >>
> >> I just noticed there's no tab completion for CREATE SCHEMA
> >> AUTHORIZATION, nor for anything after CREATE SCHEMA .
> >>
> >> Please find attached a patch that adds this.
> >
> > Added to the 2021-09 commit fest:
> https://commitfest.postgresql.org/34/3252/
>
> Here's an updated version that also reduces the duplication between the
> various role list queries.
>
> - ilmari
>
>


Re: [bug] Logical Decoding of relation rewrite with toast does not reset toast_hash

2021-08-09 Thread Amit Kapila
On Fri, Jul 9, 2021 at 12:22 PM Drouvot, Bertrand  wrote:
>
> Please find enclosed a patch proposal to:
>
> * Avoid the failed assertion on current master and generate the error message 
> instead (should the code reach that stage).
> * Reset the toast_hash in case of relation rewrite with toast (so that the 
> logical decoding in the above repro is working).
>

I think instead of resetting toast_hash for this case why don't we set
'relrewrite' for toast tables as well during rewrite? If we do that
then we will simply skip assembling toast chunks for the toast table.
In make_new_heap(), we are calling NewHeapCreateToastTable() to create
toast table where we can pass additional information (probably
'toastid'), if required to set 'relrewrite'. Additionally, let's add a
test case if possible for this.
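
A rough sketch of what such a test case could look like, assuming
wal_level = logical and the test_decoding module (the repro posted
upthread may differ in detail):

CREATE TABLE tbl (id int, data text);
-- Force out-of-line TOAST storage so the datum really produces chunks.
ALTER TABLE tbl ALTER COLUMN data SET STORAGE EXTERNAL;
SELECT pg_create_logical_replication_slot('regression_slot', 'test_decoding');

BEGIN;
INSERT INTO tbl VALUES (1, repeat('x', 4000));  -- toasted datum
ALTER TABLE tbl ALTER COLUMN id TYPE bigint;    -- rewrite in the same xact
COMMIT;

-- Without a fix, decoding this transaction trips over the toast
-- chunks written for the transient heap's toast table.
SELECT data FROM pg_logical_slot_get_changes('regression_slot', NULL, NULL);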

BTW, I see this listed as an Open Item for PG-14 [1], which seems
wrong to me as this is a bug from previous versions. I am not sure who
added it, but do you see any reason to consider this an open item for
PG-14?

[1] - https://wiki.postgresql.org/wiki/PostgreSQL_14_Open_Items
-- 
With Regards,
Amit Kapila.




Re: Reduce the number of special cases to build contrib modules on windows

2021-08-09 Thread David Rowley
On Fri, 30 Jul 2021 at 15:05, David Rowley  wrote:
> Does anyone have any thoughts on where we should draw the line on
> parsing Makefiles?  I'm a little worried that I'm adding parsing just
> for exactly how the Makefiles are today and that it could easily be
> broken if something is adjusted later. I'm not all that skilled with
> make, so I'm struggling to judge this for myself.

After thinking about this some more: since we're never going to make
these Perl scripts do everything that make can do, if the parsing
that's being added seems reasonable and works for what we have today,
I don't think there is much reason not to just go with it.

The v10 patch I attached previously output almost identical *.vcxproj
files.  The only change was in libpq_pipeline.vcxproj where the order
of the AdditionalDependencies changed to have ws2_32.lib first rather
than somewhere in the middle.

I've now pushed the final patch in this series.

Thank you to everyone who looked at one or more of these patches.

David




Re: ECPG bug fix: DECALRE STATEMENT and DEALLOCATE, DESCRIBE

2021-08-09 Thread Michael Meskes
> I think that it's crystal clear what I meant in the email of July 30.
> I meant: it's not okay that you're simply ignoring the RMT. You
> hadn't
> even made a token effort at that point. For example you didn't give
> the proposed fix a cursory 15 minute review, just so we had some very
> rough idea of where things stand. You still haven't.

How do you know I didn't spend 15 minutes looking at the patch and the
whole email thread? I surely did and it was more than 15 minutes, but
not enough to give a meaningful comment. If you can do it in 15
minutes, great for you, I cannot.

The meaning of your email of July 30 was crystal clear, yes. It means
you'd revert the patch if I didn't resolve the issue. This is literally
what it says. If you meant to say "It's not okay that you're simply
ignoring the RMT. You hadn't even made a token effort at that point."
it might have been helpful if you said that, instead of having me guess
if there was a hidden meaning in your email.

Besides, I have not ignored the RMT. I don't know why you keep
insisting on something that is simply not true.

> My understanding of what you're taking issue with (perhaps a flawed
> understanding) is that you think that you have been treated unfairly
> or arbitrarily in general. That's why I brought up the email of July
> 30 yesterday. So my point was: no, you haven't been treated unfairly.

Yes, this is a flawed understanding. I'm sorry you came to that
understanding; I thought my emails were pretty clear as to what I was
objecting to.

> If you only take issue with the specific tone and tenor of my email
> from Friday (the email that specified a deadline), and not the
> content
> itself, then maybe the timeline and the wider context are not so
> important.
> 
> I am still unsure about whether your concern is limited to the tone
> of
> the email from Friday, or if you also take exception to the content
> of
> that email (and the wider context).

At the risk of repeating myself, my concern is *only* the rude and
disrespectful way of addressing me in the third person while talking to
me directly. Again, I thought I made that clear in my first email about
the topic and every one that followed.

> Perhaps the tone of my email from Friday was unhelpful. Even still, I
> am surprised that you seem to think that it was totally outrageous --
> especially given the context. It was the first email that you

The context never makes derogatory communication okay, at least not
in my opinion.

> responded to *at all* on this thread, with the exception of your
> original terse response. I am not well practised in communicating
> with
> a committer that just doesn't engage with the RMT at all. This is all
> new to me. I admit that I found it awkward to write the email for my
> own reasons.

I was *never* asked for *any* response in *any* email, save the
original technical discussion, which I did answer by telling people
that I'm lacking time but will eventually get to it. Just to be
precise, your email from July 30 told me and everyone how this will be
handled. A reasonable procedure, albeit not one we'd like to see
happen. But why should I answer, and with what? You brought this up
not as a discussion point, but as a fact.

> I brought up flexibility to point out that this could have been
> avoided. If you needed more time, why didn't you simply ask for it?

The first conversation that brought up the time issue was your email
that started this thread. There you set a deadline which I understand
and accept. But then I never said a word against it, so the question
remains: why accuse me of something I never did?

> Again, the scope of what you're complaining about was (and still is)
> unclear to me.

I'm sorry, but I have no idea how to explain it more clearly. I'm not
asking for any favors or special treatment; I just ask to be treated
like a person.

Michael
-- 
Michael Meskes
Michael at Fam-Meskes dot De
Michael at Meskes dot (De|Com|Net|Org)
Meskes at (Debian|Postgresql) dot Org





Re: logical replication empty transactions

2021-08-09 Thread Peter Smith
On Sat, Aug 7, 2021 at 12:01 AM Ajin Cherian  wrote:
>
> On Mon, Aug 2, 2021 at 7:20 PM Amit Kapila  wrote:
> >
> > On Fri, Jul 23, 2021 at 3:39 PM Ajin Cherian  wrote:
> > >
> >
> > Let's first split the patch for prepared and non-prepared cases as
> > that will help to focus on each of them separately.
>
> As a first shot, I have split the patch into prepared and non-prepared cases,

I have reviewed the v12* split patch set.

Apply / build / test was all OK

Below are my code review comments (mostly cosmetic).

//

Comments for v12-0001
=

1. Patch comment

=>

This comment as-is might have been OK before the 2PC code was
committed, but now that 2PC is part of HEAD, perhaps it needs to be
expanded to say that this patch ONLY fixes empty transactions for the
non-"streaming" and non-"two_phase" cases, and that the other kinds
will be tackled separately.

--

2. src/backend/replication/pgoutput/pgoutput.c - PGOutputTxnData comment

+/*
+ * Maintain a per-transaction level variable to track whether the
+ * transaction has sent BEGIN or BEGIN PREPARE. BEGIN or BEGIN PREPARE
+ * is only sent when the first change in a transaction is processed.
+ * This makes it possible to skip transactions that are empty.
+ */

=>

Maybe this is true for the combined v12-0001/v12-0002 case, but for
the v12-0001 patch alone I think it is not right to imply that skipping
of BEGIN PREPARE is possible, because IIUC that isn't implemented in
*this* patch.
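
For context, the mechanism these comments refer to looks roughly like
this, reconstructed from the hunks quoted in this review (a sketch,
not the complete patch):

#include "postgres.h"

#include "replication/logical.h"
#include "replication/reorderbuffer.h"

typedef struct PGOutputTxnData
{
    bool        sent_begin_txn;    /* BEGIN sent for this transaction yet? */
} PGOutputTxnData;

/*
 * BEGIN callback: only allocate per-transaction state; the actual
 * BEGIN message is emitted lazily by the first change callback.
 */
static void
pgoutput_begin_txn(LogicalDecodingContext *ctx, ReorderBufferTXN *txn)
{
    PGOutputTxnData *txndata = MemoryContextAllocZero(ctx->context,
                                                      sizeof(PGOutputTxnData));

    txn->output_plugin_private = txndata;
}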

--

3. src/backend/replication/pgoutput/pgoutput.c - pgoutput_begin_txn whitespace

+ PGOutputTxnData*txndata = MemoryContextAllocZero(ctx->context,
+ sizeof(PGOutputTxnData));

=>

Misaligned indentation?

--

4. src/backend/replication/pgoutput/pgoutput.c - pgoutput_change brackets

+ /*
+ * Output BEGIN if we haven't yet, unless streaming.
+ */
+ if (!in_streaming && !txndata->sent_begin_txn)
+ {
+ pgoutput_begin(ctx, txn);
+ }

=>

The brackets are not needed for the if with a single statement.

--

5. src/backend/replication/pgoutput/pgoutput.c - pgoutput_truncate
brackets/comment

+ /*
+ * output BEGIN if we haven't yet,
+ * while streaming no need to send BEGIN / BEGIN PREPARE.
+ */
+ if (!in_streaming && !txndata->sent_begin_txn)
+ {
+ pgoutput_begin(ctx, txn);
+ }

5a. =>

Same as review comment 4. The brackets are not needed for the if with
a single statement.

5b. =>

Notice this code is the same as cited in review comment 4. So probably
the code comment should be consistent/same also?

--

6. src/backend/replication/pgoutput/pgoutput.c - pgoutput_message brackets

+ Assert(txndata);
+ if (!txndata->sent_begin_txn)
+ {
+ pgoutput_begin(ctx, txn);
+ }

=>

The brackets are not needed for the if with a single statement.

--

7. typedefs.list

=> The structure PGOutputTxnData was added in v12-0001, so the
typedefs.list probably should also be updated.

//

Comments for v12-0002
=

8. Patch comment

This patch addresses the above problem by postponing the BEGIN / BEGIN
PREPARE messages until the first change is encountered.
If (when processing a COMMIT / PREPARE message) we find there had been
no other change for that transaction, then do not send the COMMIT /
PREPARE message. This means that pgoutput will skip BEGIN / COMMIT
or BEGIN PREPARE / PREPARE  messages for transactions that are empty.
pgoutput will also skip COMMIT PREPARED and ROLLBACK PREPARED messages
for transactions that are empty.

8a. =>

I'm not sure this comment is 100% correct for this specific patch. The
whole BEGIN/COMMIT case was already handled by the v12-0001 patch,
right? So really this comment should only mention BEGIN PREPARE and
COMMIT PREPARED, I think.

8b. =>

I think there should also be some mention that this patch is not
handling the "streaming" case of empty transactions at all.
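
Continuing the sketch from comment 2 above, the commit-side skip that
this patch comment describes amounts to something like the following
(illustrative only; details may differ from the actual patch):

static void
pgoutput_commit_txn(LogicalDecodingContext *ctx, ReorderBufferTXN *txn,
                    XLogRecPtr commit_lsn)
{
    PGOutputTxnData *txndata = (PGOutputTxnData *) txn->output_plugin_private;
    bool        sent_begin_txn = txndata->sent_begin_txn;

    pfree(txndata);
    txn->output_plugin_private = NULL;

    /* The transaction was empty: no BEGIN was sent, so skip COMMIT too. */
    if (!sent_begin_txn)
        return;

    OutputPluginUpdateProgress(ctx);
    OutputPluginPrepareWrite(ctx, true);
    logicalrep_write_commit(ctx->out, txn, commit_lsn);
    OutputPluginWrite(ctx, true);
}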

--

9. src/backend/replication/logical/proto.c - protocol version

@@ -248,8 +250,10 @@ logicalrep_write_commit_prepared(StringInfo out,
ReorderBufferTXN *txn,
  pq_sendbyte(out, flags);

  /* send fields */
+ pq_sendint64(out, prepare_end_lsn);
  pq_sendint64(out, commit_lsn);
  pq_sendint64(out, txn->end_lsn);
+ pq_sendint64(out, prepare_time);
  pq_sendint64(out, txn->xact_time.commit_time);
  pq_sendint32(out, txn->xid);

=>

I agree with the previous feedback comment from Amit: probably there
are protocol version implications here, because the message format has
been changed in logicalrep_write_commit_prepared and
logicalrep_read_commit_prepared.

e.g. Does this code need to be cognisant of the protocol version and
behave differently accordingly?
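
One way to address that, sketched under the assumption that a new
protocol version constant is introduced (the constant's name and value
are hypothetical, not from the patch):

#define LOGICALREP_PROTO_EMPTY_PREPARE_VERSION_NUM 4    /* hypothetical */

static void
logicalrep_write_commit_prepared_fields(StringInfo out, ReorderBufferTXN *txn,
                                        XLogRecPtr prepare_end_lsn,
                                        XLogRecPtr commit_lsn,
                                        TimestampTz prepare_time,
                                        int proto_version)
{
    /* Only send the new fields to subscribers that negotiated them. */
    if (proto_version >= LOGICALREP_PROTO_EMPTY_PREPARE_VERSION_NUM)
        pq_sendint64(out, prepare_end_lsn);
    pq_sendint64(out, commit_lsn);
    pq_sendint64(out, txn->end_lsn);
    if (proto_version >= LOGICALREP_PROTO_EMPTY_PREPARE_VERSION_NUM)
        pq_sendint64(out, prepare_time);
    pq_sendint64(out, txn->xact_time.commit_time);
    pq_sendint32(out, txn->xid);
}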

--

10. src/backend/replication/pgoutput/pgoutput.c -
pgoutput_begin_prepare flag moved?

+ Assert(txndata);
  OutputPluginPrepareWrite(ctx, !send_replication_origin);
  logicalrep_write_begin_prepare(ctx->out, txn);
+ txndata->sent_begin_txn = true;

  send_repl_origin(ctx, txn->origin_id, txn->origin_lsn,
  send_replication_origin);

  O