two_phase commit parameter used in subscription for a publication which is on < 15.

2021-09-27 Thread tushar
Hi,

Please refer to this scenario
CASE 1- HEAD (publication)-> HEAD (Subscription)
CASE 2 - PG 14 (Publication) ->HEAD (Subscription)

Test-case -
Publication  = create table t(n int); create publication p for table t;
Subscription = create table t(n int);
create subscription  s connection 'dbname=postgres host=localhost '
PUBLICATION p WITH (two_phase=1);

Result-
CASE 1-
postgres=# select two_phase from pg_replication_slots where slot_name='s';
 two_phase
---
 t
(1 row)


CASE 2 -
postgres=# select two_phase from pg_replication_slots where slot_name='s';
 two_phase
---
 f
(1 row)

So are we silently ignoring this parameter because it is not supported on v14?
And if so, why not throw a message like
ERROR:  unrecognized subscription parameter: "two_phase"
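In the meantime, one can check by hand whether the publisher can honor the
option by comparing its version number. A minimal sketch, run on the
publisher (the 150000 cutoff assumes two_phase ships in v15):

    -- true only if the publisher is new enough to honor two_phase
    SELECT current_setting('server_version_num')::int >= 150000
        AS publisher_supports_two_phase;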

--
regards,tushar
EnterpriseDB  https://www.enterprisedb.com/
The Enterprise PostgreSQL Company


Re: extensible options syntax for replication parser?

2021-09-27 Thread tushar

On 9/25/21 12:08 AM, Robert Haas wrote:

two_phase is new in v15, something you could also find out by checking
the documentation. Now if the patch changes the way two_phase
interacts with older versions, that's a bug in the patch and we should
fix it. But if the same issue exists without the patch then I'm not
sure why you are raising it here rather than on the thread where that
feature was developed.


Right, the issue is reproducible on HEAD as well. I should have checked
that; sorry about it.


--
regards,tushar
EnterpriseDB  https://www.enterprisedb.com/
The Enterprise PostgreSQL Company





Re: extensible options syntax for replication parser?

2021-09-27 Thread tushar

On 9/27/21 9:29 AM, Ajin Cherian wrote:

And in case you do see a problem, I request you create a separate
thread for this. I didn't want to derail this patch.


It would be great if we threw an error rather than silently ignoring
this parameter. I have opened a separate thread for this:


https://www.postgresql.org/message-id/CAC6VRoY3SAFeO7kZ0EOVC6mX%3D1ZyTocaecTDTh209W20KCC_aQ%40mail.gmail.com

--
regards,tushar
EnterpriseDB  https://www.enterprisedb.com/
The Enterprise PostgreSQL Company





RE: Improve logging when using Huge Pages

2021-09-27 Thread Shinoda, Noriyoshi (PN Japan FSIP)
Hi, all.
Thank you for your comment.

> Probably I understood your point. But isn't it more confusing to users?
As you say, I think the last patch was rather confusing, so I have simplified it.
The attached patch just emits a log message, as you advised, when huge pages
are acquired. It would be possible to emit the log only when huge_pages=try;
would that be better than always outputting it?
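For context, the setting itself is visible from SQL, but whether huge pages
were actually obtained is not, which is the motivation for logging it:

    SHOW huge_pages;   -- reports "try"/"on"/"off", not whether the allocation actually used huge pages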

Regards,
Noriyoshi Shinoda

-Original Message-
From: Kyotaro Horiguchi [mailto:horikyota@gmail.com] 
Sent: Monday, September 27, 2021 11:40 AM
To: pry...@telsasoft.com
Cc: masao.fu...@oss.nttdata.com; Shinoda, Noriyoshi (PN Japan FSIP) 
; pgsql-hack...@postgresql.org; rjuju...@gmail.com; 
t...@sss.pgh.pa.us
Subject: Re: Improve logging when using Huge Pages

At Tue, 21 Sep 2021 19:23:22 -0500, Justin Pryzby  wrote 
in 
> On Wed, Sep 22, 2021 at 02:03:11AM +0900, Fujii Masao wrote:
> > Another idea is to output "Anonymous shared memory was allocated 
> > with  huge pages" when it's successfully allocated with huge pages, 
> > and to output  "Anonymous shared memory was allocated without huge pages"
> >  when it's successfully allocated without huge pages. I'm not sure 
> > if users  may think even this message is noisy, though.
> 
> +1

+1. Positive phrase looks better.

> Maybe it could show the page size instead of "with"/without:
> "Shared memory allocated with 4k/1MB/1GB pages."

+1.

regards.

--
Kyotaro Horiguchi
NTT Open Source Software Center


huge_pages_log_v6.diff
Description: huge_pages_log_v6.diff


Re: Corruption during WAL replay

2021-09-27 Thread Kyotaro Horiguchi
Thanks for looking at this, Robert and Tom.

At Fri, 24 Sep 2021 16:22:28 -0400, Tom Lane  wrote in 
> Robert Haas  writes:
> > On Fri, Sep 24, 2021 at 3:42 PM Tom Lane  wrote:
> >> I think the basic idea is about right, but I'm not happy with the
> >> three-way delayChkpt business; that seems too cute by three-quarters.
> 
> > Nobody, but the version of the patch that I was looking at uses a
> > separate bit for each one:
> 
> > +/* symbols for PGPROC.delayChkpt */
> > +#define DELAY_CHKPT_START (1<<0)
> > +#define DELAY_CHKPT_COMPLETE (1<<1)
> 
> Hm, that's not in the patch version that the CF app claims to be
> latest [1].  It does this:
> 
> +/* type for PGPROC.delayChkpt */
> +typedef enum DelayChkptType
> +{
> + DELAY_CHKPT_NONE = 0,
> + DELAY_CHKPT_START,
> + DELAY_CHKPT_COMPLETE
> +} DelayChkptType;
> 
> which seems like a distinct disimprovement over what you're quoting.
 
Yeah, that is because the latest patch is not attached as *.patch/diff
but as *.txt.  I didn't name it *.patch in order to avoid adding a noise
patch to that thread, although it was too late.  Unfortunately, that seems
to have led to another problem.

Tom's concern is right.  Actually, both events can happen simultaneously,
but the latest *.patch.txt handles that case the way Robert described.

One advantage of having the two flags in one bitmap integer is that it
slightly simplifies the logic in GetVirtualXIDsDelayingChkpt and
HaveVirtualXIDsDelayingChkpt. On the other hand, it very slightly
complicates how the flags are set and reset.

GetVirtualXIDsDelayingChkpt:
+   if ((proc->delayChkpt & type) != 0)

vs

+   if (delayStart)
+   delayflag = proc->delayChkptStart;
+   else
+   delayflag = proc->delayChkptEnd;
+   if (delayflag != 0)

regards.

-- 
Kyotaro Horiguchi
NTT Open Source Software Center




Re: Corruption during WAL replay

2021-09-27 Thread Kyotaro Horiguchi
Thank you for the comments! (Sorry for the late response.)

At Tue, 10 Aug 2021 14:14:05 -0400, Robert Haas  wrote 
in 
> On Thu, Mar 4, 2021 at 10:01 PM Kyotaro Horiguchi
>  wrote:
> > The patch assumed that CHKPT_START/COMPLETE barrier are exclusively
> > used each other, but MarkBufferDirtyHint which delays checkpoint start
> > is called in RelationTruncate while delaying checkpoint completion.
> > That is not a strange nor harmful behavior.  I changed delayChkpt to a
> > bitmap integer from an enum so that both barrier are separately
> > triggered.
> >
> > I'm not sure this is the way to go here, though.  This fixes the issue
> > of a crash during RelationTruncate, but the issue of smgrtruncate
> > failure during RelationTruncate still remains (unless we treat that
> > failure as PANIC?).
> 
> I like this patch. As I understand it, we're currently cheating by
> allowing checkpoints to complete without necessarily flushing all of
> the pages that were dirty at the time we fixed the redo pointer out to
> disk. We think this is OK because we know that those pages are going
> to get truncated away, but it's not really OK because when the system
> starts up, it has to replay WAL starting from the checkpoint's redo
> pointer, but the state of the page is not the same as it was at the
> time when the redo pointer was the end of WAL, so redo fails. In the
> case described in
> http://postgr.es/m/byapr06mb63739b2692dc6dbb3c5f186cab...@byapr06mb6373.namprd06.prod.outlook.com
> modifications are made to the page before the redo pointer is fixed
> and those changes never make it to disk, but the truncation also never
> makes it to the disk either. With this patch, that can't happen,
> because no checkpoint can intervene between when we (1) decide we're
> not going to bother writing those dirty pages and (2) actually
> truncate them away. So either the pages will get written as part of
> the checkpoint, or else they'll be gone before the checkpoint
> completes. In the latter case, I suppose redo that would have modified
> those pages will just be skipped, thus dodging the problem.

I think your understanding is right.

> In RelationTruncate, I suggest that we ought to clear the
> delay-checkpoint flag before rather than after calling
> FreeSpaceMapVacuumRange. Since the free space map is not fully
> WAL-logged, anything we're doing there should be non-critical. Also, I

Agreed and fixed.

> think it might be better if MarkBufferDirtyHint stays closer to the
> existing coding and just uses a Boolean and an if-test to decide
> whether to clear the bit, instead of inventing a new mechanism. I
> don't really see anything wrong with the new mechanism, but I think
> it's better to keep the patch minimal.

Yeah, that was kind of silly. Fixed.

> As you say, this doesn't fix the problem that truncation might fail.
> But as Andres and Sawada-san said, the solution to that is to get rid
> of the comments saying that it's OK for truncation to fail and make it
> a PANIC. However, I don't think that change needs to be part of this
> patch. Even if we do that, we still need to do this. And even if we do
> this, we still need to do that.

OK. In addition to the above, I rewrote the comment in RelationTruncate.

+ * Delay the concurrent checkpoint's completion until this truncation
+ * successfully completes, so that we don't establish a redo-point between
+ * buffer deletion and file-truncate. Otherwise we can leave inconsistent
+ * file content against the WAL records after the REDO position and future
+ * recovery fails.

However, a problem for me right now is that I cannot reproduce the
issue.

To avoid further confusion, the attached file is named *.patch.

regards.

-- 
Kyotaro Horiguchi
NTT Open Source Software Center
diff --git a/src/backend/access/transam/multixact.c b/src/backend/access/transam/multixact.c
index e6c70ed0bc..17357179e3 100644
--- a/src/backend/access/transam/multixact.c
+++ b/src/backend/access/transam/multixact.c
@@ -3075,8 +3075,8 @@ TruncateMultiXact(MultiXactId newOldestMulti, Oid newOldestMultiDB)
 	 * crash/basebackup, even though the state of the data directory would
 	 * require it.
 	 */
-	Assert(!MyProc->delayChkpt);
-	MyProc->delayChkpt = true;
+	Assert((MyProc->delayChkpt & DELAY_CHKPT_START) == 0);
+	MyProc->delayChkpt |= DELAY_CHKPT_START;
 
 	/* WAL log truncation */
 	WriteMTruncateXlogRec(newOldestMultiDB,
@@ -3102,7 +3102,7 @@ TruncateMultiXact(MultiXactId newOldestMulti, Oid newOldestMultiDB)
 	/* Then offsets */
 	PerformOffsetsTruncation(oldestMulti, newOldestMulti);
 
-	MyProc->delayChkpt = false;
+	MyProc->delayChkpt &= ~DELAY_CHKPT_START;
 
 	END_CRIT_SECTION();
 	LWLockRelease(MultiXactTruncationLock);
diff --git a/src/backend/access/transam/twophase.c b/src/backend/access/transam/twophase.c
index 2156de187c..b7dc84d6e3 100644
--- a/src/backend/access/transam/twophase.c
+++ b/src/backend/access/transam/twophase.c
@@ -463,7 +463,7 @@ MarkAsPrep

Re: Timeout failure in 019_replslot_limit.pl

2021-09-27 Thread Amit Kapila
On Mon, Sep 27, 2021 at 12:13 PM Michael Paquier  wrote:
>
> On Mon, Sep 27, 2021 at 11:53:07AM +0530, Amit Kapila wrote:
> > So, it seems on your machine it has passed  the following condition in
> > secure_write:
> > if (n < 0 && !port->noblock && (errno == EWOULDBLOCK || errno == EAGAIN))
>
> Yep.
>
> > If so, this indicates write failure which seems odd to me and probably
> > something machine-specific or maybe some different settings in your
> > build or machine. BTW, if SSL or GSS is enabled that might have caused
> > it in some way. I think the best way is to debug the secure_write
> > during this occurrence.
>
> Yeah, but we don't use any of them in the context of this test, so
> this is something on a simple send(), no?  Hmm.  That would not be the
> first issue we see with macos these days with interrupted syscalls...
> And actually in this stack I can see that errno gets set to EINTR.
>

If errno is EINTR, then how would the code pass the above if-check, given
that it requires (errno == EWOULDBLOCK || errno == EAGAIN)?


-- 
With Regards,
Amit Kapila.




Re: Added schema level support for publication.

2021-09-27 Thread Greg Nancarrow
On Mon, Sep 27, 2021 at 2:32 PM vignesh C  wrote:
>
> Attached v33 patch has the preprocess_pubobj_list review comment fix
> suggested by Alvaro at [1].

A minor point I noticed in the v33-0002 patch: in the code added to
the listSchemas() function of src/bin/psql/describe.c, shouldn't it
"return false" (not true) if PSQLexec() fails?
Also, since the PQExpBufferData buf is re-used in the added code, its
handling is a little inconsistent with similar existing code.
See below for a suggested update.

Regards,
Greg Nancarrow
Fujitsu Australia


diff --git a/src/bin/psql/describe.c b/src/bin/psql/describe.c
index 953e1f52cf..1d28809050 100644
--- a/src/bin/psql/describe.c
+++ b/src/bin/psql/describe.c
@@ -5077,9 +5077,11 @@ listSchemas(const char *pattern, bool verbose,
bool showSystem)
 appendPQExpBufferStr(&buf, "ORDER BY 1;");

 res = PSQLexec(buf.data);
-termPQExpBuffer(&buf);
 if (!res)
+{
+termPQExpBuffer(&buf);
 return false;
+}

 myopt.nullPrint = NULL;
 myopt.title = _("List of schemas");
@@ -5100,7 +5102,10 @@ listSchemas(const char *pattern, bool verbose,
bool showSystem)
   pattern);
 result = PSQLexec(buf.data);
 if (!result)
-return true;
+{
+termPQExpBuffer(&buf);
+return false;
+}
 else
 pub_schema_tuples = PQntuples(result);

@@ -5132,6 +5137,7 @@ listSchemas(const char *pattern, bool verbose,
bool showSystem)

 printQuery(res, &myopt, pset.queryFout, false, pset.logfile);

+termPQExpBuffer(&buf);
 PQclear(res);

 /* Free the memory allocated for the footer */




Re: Empty string in lexeme for tsvector

2021-09-27 Thread Jean-Christophe Arnu
On Sun, Sep 26, 2021 at 22:41, Jean-Christophe Arnu wrote:

>
>
> On Sun, Sep 26, 2021 at 15:55, Artur Zakirov wrote:
>
>> Nice catch! The patch looks good to me.
>> Can you also add a more general test case:
>>
>> =# SELECT $$'' '1' '2'$$::tsvector;
>> ERROR:  syntax error in tsvector: "'' '1' '2'"
>> LINE 1: SELECT $$'' '1' '2'$$::tsvector;
>>
>>
> Thank you, Artur, for spotting this test.
> It is now included in this patch.
>
>
>
Two more things:

  * I updated the documentation for the array_to_tsvector(), ts_delete() and
setweight() functions (so here's a new patch);
  * I should mention François Ferry from Logilab, who first reported the
backup/restore problem that led to this patch.

I think this should be OK now that the documentation is up to date.
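To make the intended behaviour concrete, here is a sketch of what the
patched functions now reject (the error text matches the errmsg added in
the attached patch):

    SELECT array_to_tsvector(ARRAY['fat', 'cat', '']);
    -- ERROR:  lexeme array may not contain empty strings

    SELECT ts_delete('fat:2,4 cat:3 rat:5A'::tsvector, ARRAY['fat', '']);
    -- ERROR:  lexeme array may not contain empty strings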

Kind regards.
-- 
Jean-Christophe Arnu
diff --git a/doc/src/sgml/func.sgml b/doc/src/sgml/func.sgml
index 78812b2dbe..01f0b870ca 100644
--- a/doc/src/sgml/func.sgml
+++ b/doc/src/sgml/func.sgml
@@ -12896,7 +12896,9 @@ CREATE TYPE rainbow AS ENUM ('red', 'orange', 'yellow', 'green', 'blue', 'purple


 Converts an array of lexemes to a tsvector.
-The given strings are used as-is without further processing.
+The given strings are used as-is. Some checks are performed
+on array elements. Empty strings and NULL values
+will cause an error to be raised.


 array_to_tsvector('{fat,cat,rat}'::text[])
@@ -13079,6 +13081,9 @@ CREATE TYPE rainbow AS ENUM ('red', 'orange', 'yellow', 'green', 'blue', 'purple
 Assigns the specified weight to elements
 of the vector that are listed
 in lexemes.
+Some checks are performed on lexemes.
+Empty strings and NULL values
+will cause an error to be raised.


 setweight('fat:2,4 cat:3 rat:5,6B'::tsvector, 'A', '{cat,rat}')
@@ -13256,6 +13261,9 @@ CREATE TYPE rainbow AS ENUM ('red', 'orange', 'yellow', 'green', 'blue', 'purple
 Removes any occurrences of the lexemes
 in lexemes
 from the vector.
+Some checks are performed on lexemes.
+Empty strings and NULL values
+will cause an error to be raised.


 ts_delete('fat:2,4 cat:3 rat:5A'::tsvector, ARRAY['fat','rat'])
diff --git a/src/backend/utils/adt/tsvector_op.c b/src/backend/utils/adt/tsvector_op.c
index 9236ebcc8f..00f80ffcbc 100644
--- a/src/backend/utils/adt/tsvector_op.c
+++ b/src/backend/utils/adt/tsvector_op.c
@@ -329,6 +329,12 @@ tsvector_setweight_by_filter(PG_FUNCTION_ARGS)
 
 		lex = VARDATA(dlexemes[i]);
 		lex_len = VARSIZE(dlexemes[i]) - VARHDRSZ;
+
+		if (lex_len == 0)
+			ereport(ERROR,
+	(errcode(ERRCODE_ZERO_LENGTH_CHARACTER_STRING),
+	 errmsg("lexeme array may not contain empty strings")));
+
 		lex_pos = tsvector_bsearch(tsout, lex, lex_len);
 
 		if (lex_pos >= 0 && (j = POSDATALEN(tsout, entry + lex_pos)) != 0)
@@ -609,6 +615,12 @@ tsvector_delete_arr(PG_FUNCTION_ARGS)
 
 		lex = VARDATA(dlexemes[i]);
 		lex_len = VARSIZE(dlexemes[i]) - VARHDRSZ;
+
+		if (lex_len == 0)
+			ereport(ERROR,
+	(errcode(ERRCODE_ZERO_LENGTH_CHARACTER_STRING),
+	 errmsg("lexeme array may not contain empty strings")));
+
 		lex_pos = tsvector_bsearch(tsin, lex, lex_len);
 
 		if (lex_pos >= 0)
@@ -761,13 +773,18 @@ array_to_tsvector(PG_FUNCTION_ARGS)
 
 	deconstruct_array(v, TEXTOID, -1, false, TYPALIGN_INT, &dlexemes, &nulls, &nitems);
 
-	/* Reject nulls (maybe we should just ignore them, instead?) */
+	/* Reject nulls and zero length strings (maybe we should just ignore them, instead?) */
 	for (i = 0; i < nitems; i++)
 	{
 		if (nulls[i])
 			ereport(ERROR,
 	(errcode(ERRCODE_NULL_VALUE_NOT_ALLOWED),
 	 errmsg("lexeme array may not contain nulls")));
+
+		if (VARSIZE(dlexemes[i]) - VARHDRSZ == 0)
+			ereport(ERROR,
+	(errcode(ERRCODE_ZERO_LENGTH_CHARACTER_STRING),
+	 errmsg("lexeme array may not contain empty strings")));
 	}
 
 	/* Sort and de-dup, because this is required for a valid tsvector. */
diff --git a/src/test/regress/expected/tstypes.out b/src/test/regress/expected/tstypes.out
index 2601e312df..f8bf9c6051 100644
--- a/src/test/regress/expected/tstypes.out
+++ b/src/test/regress/expected/tstypes.out
@@ -85,6 +85,10 @@ SELECT 'a:3A b:2a'::tsvector || 'ba:1234 a:1B';
  'a':3A,4B 'b':2A 'ba':1237
 (1 row)
 
+SELECT $$'' '1' '2'$$::tsvector;
+ERROR:  syntax error in tsvector: "'' '1' '2'"
+LINE 1: SELECT $$'' '1' '2'$$::tsvector;
+   ^
 --Base tsquery test
 SELECT '1'::tsquery;
  tsquery 
@@ -1260,6 +1264,8 @@ SELECT ts_delete('base hidden rebel spaceship strike'::tsvector, ARRAY['spaceshi
 
 SELECT ts_delete('base hidden rebel spaceship strike'::tsvector, ARRAY['spaceship','leya','rebel', NULL]);
 ERROR:  lexeme array may not contain nulls
+SELECT ts_delete('base hidden rebel spaceship strike'::tsvector, ARRAY['spaceship','leya','rebel', '']);
+ERROR:  lexeme array may not contain empty strings
 SELECT unnest('base:

Re: Column Filtering in Logical Replication

2021-09-27 Thread Amit Kapila
On Sat, Sep 25, 2021 at 1:15 PM vignesh C  wrote:
>
> On Fri, Sep 24, 2021 at 6:55 PM Alvaro Herrera  
> wrote:
> >
> > On 2021-Sep-23, Amit Kapila wrote:
> >
> > > Alvaro, do you have any thoughts on these proposed grammar changes?
> >
> > Yeah, I think pubobj_name remains a problem in that you don't know its
> > return type -- could be a String or a RangeVar, and the user of that
> > production can't distinguish.  So you're still (unnecessarily, IMV)
> > stashing an object of undetermined type into ->object.
> >
> > I think you should get rid of both pubobj_name and pubobj_expr and do
> > something like this:
> >
> > /* FOR TABLE and FOR ALL TABLES IN SCHEMA specifications */
> > PublicationObjSpec: TABLE ColId
> >         {
> >             $$ = makeNode(PublicationObjSpec);
> >             $$->pubobjtype = PUBLICATIONOBJ_TABLE;
> >             $$->rangevar = makeRangeVarFromQualifiedName($1, NULL, @1, yyscanner);
> >         }
> >     | TABLE ColId indirection
> >         {
> >             $$ = makeNode(PublicationObjSpec);
> >             $$->pubobjtype = PUBLICATIONOBJ_TABLE;
> >             $$->rangevar = makeRangeVarFromQualifiedName($1, $2, @1, yyscanner);
> >         }
> >     | ALL TABLES IN_P SCHEMA ColId
> >         {
> >             $$ = makeNode(PublicationObjSpec);
> >             $$->pubobjtype = PUBLICATIONOBJ_REL_IN_SCHEMA;
> >             $$->name = $4;
> >         }
> >     | ALL TABLES IN_P SCHEMA CURRENT_SCHEMA /* XXX should this be "IN_P CURRENT_SCHEMA"? */
> >         {
> >             $$ = makeNode(PublicationObjSpec);
> >             $$->pubobjtype = PUBLICATIONOBJ_CURRSCHEMA;
> >             $$->name = $4;
> >         }
> >     | ColId
> >         {
> >             $$ = makeNode(PublicationObjSpec);
> >             $$->name = $1;
> >             $$->pubobjtype = PUBLICATIONOBJ_CONTINUATION;
> >         }
> >     | ColId indirection
> >         {
> >             $$ = makeNode(PublicationObjSpec);
> >             $$->rangevar = makeRangeVarFromQualifiedName($1, $2, @1, yyscanner);
> >             $$->pubobjtype = PUBLICATIONOBJ_CONTINUATION;
> >         }
> >     | CURRENT_SCHEMA
> >         {
> >             $$ = makeNode(PublicationObjSpec);
> >             $$->pubobjtype = PUBLICATIONOBJ_CURRSCHEMA;
> >         }
> > ;
>
> Apart from the issue that Hou San pointed out, I found one issue with the
> introduction of PUBLICATIONOBJ_CURRSCHEMA: I was not able to tell
> whether it is a table or a schema in the following cases:
> CREATE PUBLICATION pub1 FOR ALL TABLES IN SCHEMA CURRENT_SCHEMA;
> CREATE PUBLICATION pub1 FOR ALL TABLES IN SCHEMA sch1, CURRENT_SCHEMA;
> CREATE PUBLICATION pub1 FOR table t1, CURRENT_SCHEMA;
> The differentiation is needed in order to know whether to add a schema or a table.
>

I am not sure what makes you say that we can't distinguish the above
cases when there is already a separate rule for CURRENT_SCHEMA. I
think you can distinguish them by tracking the previous objects, as we
are already doing in the patch. But one thing that is not clear to me
is the reason to introduce a new type PUBLICATIONOBJ_CURRSCHEMA when
we use PUBLICATIONOBJ_REL_IN_SCHEMA and PUBLICATIONOBJ_CONTINUATION to
distinguish all cases of CURRENT_SCHEMA. Alvaro might have something
in mind for this which is not apparent, and that might have caused
confusion for you as well?

-- 
With Regards,
Amit Kapila.




Re: Added schema level support for publication.

2021-09-27 Thread Greg Nancarrow
On Mon, Sep 27, 2021 at 2:32 PM vignesh C  wrote:
>
> Attached v33 patch has the preprocess_pubobj_list review comment fix
> suggested by Alvaro at [1].

In the v33-0003 patch, there's a couple of error-case tests that have
comments copied from success-case tests:

+-- should be able to add table to schema publication
...
+-- should be able to drop table from schema publication
...

These should be changed to something similar to that used for other
error-case tests, like:

+-- fail - can't add a table of the same schema to the schema publication
+-- fail - can't drop a table from the schema publication which isn't
in the publication

Also, for the following error:

ERROR:  cannot add ... to publication
DETAIL:  Table's schema "" is already part of the publication
or part of the specified schema list.

there needs to be a test case to test the "... or part of the
specified schema list" case.
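As a sketch only (using the FOR ALL TABLES IN SCHEMA syntax this patch set
introduces, with hypothetical object names), the missing case is roughly:

    CREATE PUBLICATION pub_err FOR ALL TABLES IN SCHEMA sch1, TABLE sch1.t1;
    -- expected to fail with the DETAIL quoted above, because sch1 is part
    -- of the specified schema list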

Regards,
Greg Nancarrow
Fujitsu Australia




Re: when the startup process doesn't (logging startup delays)

2021-09-27 Thread Nitin Jadhav
> I really don't know what to say about this. You say that the time is
> measured in milliseconds, and then immediately turn around and say
> "For example, if you set it to 10s". Now we do expect that most people
> will set it to intervals that are measured in seconds rather than
> milliseconds, but saying that setting it to a value measured in
> seconds is an example of setting it in milliseconds is not logical.

Based on the statement "I suggest making the GUC GUC_UNIT_MS rather
than GUC_UNIT_S, but expressing the default in postgresql.conf.sample
as 10s rather than 1ms", I have used 10s rather than 1ms as the default
value in postgresql.conf.sample. So I used the same value in the example
in config.sgml as well. If it is really getting confusing, I will change
it to 100ms in config.sgml.

> It also looks pretty silly to say that if you set the value to 10s,
> something will happen every 10s. What else would anyone expect to
> happen? You really need to give some thought to how to explain the
> behavior in a way that is clear and logical but not overly wordy.

Added a few lines about that. "For example, if you set it to 1000ms,
then it tries to emit a log every 1000ms. If the log message is not
available at every 100th millisecond, then there is a possibility of
delay in emitting the log. If the delay is more than a cycle or if the
system clock gets set backwards then the next attempt is done based on
the last logging time, otherwise the delay gets adjusted in the next
attempt."

Please correct the explanation if it does not meet your expectations.

> Also, please note that you've got spaces around the literals, which
> does not match the surrounding style and does not render properly in
> HTML.

Fixed.

> It is also not logical to define 0 as meaning that "all messages for
> such operations are logged". What does that even mean? It makes sense
> for something like log_autovacuum_min_duration, because there we are
> talking about logging one message per operation, and we could log
> messages for all operations or just some of them. Here we are talking
> about the time between one message and the next, so talking about "all
> messages" does not really seem to make a lot of sense. Experimentally,
> what 0 actually does is cause the system to spam log lines in a tight
> loop, which cannot be what anyone wants. I think you should make 0
> mean disabled, and a positive value mean log at that interval, and
> disallow -1 altogether.

Made changes so that 0 means disabled and a value greater than 0 means the
interval in milliseconds, and removed -1.

Please find the patch attached.



On Thu, Sep 23, 2021 at 9:44 PM Robert Haas  wrote:
>
> On Wed, Sep 22, 2021 at 10:28 AM Nitin Jadhav
>  wrote:
> > Thanks Justin for the detailed explanation.  Done the necessary changes.
>
> Not really. The documentation here does not make a ton of sense:
>
> + Sets the time interval between each progress update of the 
> operations
> + performed by the startup process. This produces the log messages for
> + those operations which take longer than the specified
> duration. The unit
> + used to specify the value is milliseconds. For example, if
> you set it to
> +  10s , then every  10s
> , a log is
> + emitted indicating which operation is ongoing, and the
> elapsed time from
> + the beginning of the operation. If the value is set to
>  0 ,
> + then all messages for such operations are logged. 
> -1 
> + disables the feature. The default value is  10s 
>
> I really don't know what to say about this. You say that the time is
> measured in milliseconds, and then immediately turn around and say
> "For example, if you set it to 10s". Now we do expect that most people
> will set it to intervals that are measured in seconds rather than
> milliseconds, but saying that setting it to a value measured in
> seconds is an example of setting it in milliseconds is not logical. It
> also looks pretty silly to say that if you set the value to 10s,
> something will happen every 10s. What else would anyone expect to
> happen? You really need to give some thought to how to explain the
> behavior in a way that is clear and logical but not overly wordy.
> Also, please note that you've got spaces around the literals, which
> does not match the surrounding style and does not render properly in
> HTML.
>
> It is also not logical to define 0 as meaning that "all messages for
> such operations are logged". What does that even mean? It makes sense
> for something like log_autovacuum_min_duration, because there we are
> talking about logging one message per operation, and we could log
> messages for all operations or just some of them. Here we are talking
> about the time between one message and the next, so talking about "all
> messages" does not really seem to make a lot of sense. Experimentally,
> what 0 actually does is cause the system to spam log lines in a tight
> loop, which cannot be what 

Re: Column Filtering in Logical Replication

2021-09-27 Thread vignesh C
On Mon, Sep 27, 2021 at 4:41 PM Amit Kapila  wrote:
>
> On Sat, Sep 25, 2021 at 1:15 PM vignesh C  wrote:
> >
> > On Fri, Sep 24, 2021 at 6:55 PM Alvaro Herrera  
> > wrote:
> > >
> > > On 2021-Sep-23, Amit Kapila wrote:
> > >
> > > > Alvaro, do you have any thoughts on these proposed grammar changes?
> > >
> > > Yeah, I think pubobj_name remains a problem in that you don't know its
> > > return type -- could be a String or a RangeVar, and the user of that
> > > production can't distinguish.  So you're still (unnecessarily, IMV)
> > > stashing an object of undetermined type into ->object.
> > >
> > > I think you should get rid of both pubobj_name and pubobj_expr and do
> > > something like this:
> > >
> > > /* FOR TABLE and FOR ALL TABLES IN SCHEMA specifications */
> > > PublicationObjSpec: TABLE ColId
> > >         {
> > >             $$ = makeNode(PublicationObjSpec);
> > >             $$->pubobjtype = PUBLICATIONOBJ_TABLE;
> > >             $$->rangevar = makeRangeVarFromQualifiedName($1, NULL, @1, yyscanner);
> > >         }
> > >     | TABLE ColId indirection
> > >         {
> > >             $$ = makeNode(PublicationObjSpec);
> > >             $$->pubobjtype = PUBLICATIONOBJ_TABLE;
> > >             $$->rangevar = makeRangeVarFromQualifiedName($1, $2, @1, yyscanner);
> > >         }
> > >     | ALL TABLES IN_P SCHEMA ColId
> > >         {
> > >             $$ = makeNode(PublicationObjSpec);
> > >             $$->pubobjtype = PUBLICATIONOBJ_REL_IN_SCHEMA;
> > >             $$->name = $4;
> > >         }
> > >     | ALL TABLES IN_P SCHEMA CURRENT_SCHEMA /* XXX should this be "IN_P CURRENT_SCHEMA"? */
> > >         {
> > >             $$ = makeNode(PublicationObjSpec);
> > >             $$->pubobjtype = PUBLICATIONOBJ_CURRSCHEMA;
> > >             $$->name = $4;
> > >         }
> > >     | ColId
> > >         {
> > >             $$ = makeNode(PublicationObjSpec);
> > >             $$->name = $1;
> > >             $$->pubobjtype = PUBLICATIONOBJ_CONTINUATION;
> > >         }
> > >     | ColId indirection
> > >         {
> > >             $$ = makeNode(PublicationObjSpec);
> > >             $$->rangevar = makeRangeVarFromQualifiedName($1, $2, @1, yyscanner);
> > >             $$->pubobjtype = PUBLICATIONOBJ_CONTINUATION;
> > >         }
> > >     | CURRENT_SCHEMA
> > >         {
> > >             $$ = makeNode(PublicationObjSpec);
> > >             $$->pubobjtype = PUBLICATIONOBJ_CURRSCHEMA;
> > >         }
> > > ;
> >
> > Apart from the issue that Hou San pointed out, I found one issue with the
> > introduction of PUBLICATIONOBJ_CURRSCHEMA: I was not able to tell
> > whether it is a table or a schema in the following cases:
> > CREATE PUBLICATION pub1 FOR ALL TABLES IN SCHEMA CURRENT_SCHEMA;
> > CREATE PUBLICATION pub1 FOR ALL TABLES IN SCHEMA sch1, CURRENT_SCHEMA;
> > CREATE PUBLICATION pub1 FOR table t1, CURRENT_SCHEMA;
> > The differentiation is needed in order to know whether to add a schema or a
> > table.
> >
>
> I am not sure what makes you say that we can't distinguish the above
> cases when there is already a separate rule for CURRENT_SCHEMA? I
> think you can distinguish by tracking the previous objects as we are
> already doing in the patch. But one thing that is not clear to me is
> the reason to introduce a new type PUBLICATIONOBJ_CURRSCHEMA when
> we use PUBLICATIONOBJ_REL_IN_SCHEMA and PUBLICATIONOBJ_CONTINUATION to
> distinguish all cases of CURRENT_SCHEMA. Alvaro might have something
> in mind for this which is not apparent and that might have caused
> confusion to you as well?

It is diff

Re: can we add some file(msvc) to gitignore

2021-09-27 Thread Daniel Gustafsson
> On 27 Sep 2021, at 02:34, Michael Paquier  wrote:

> I don't think that we should remove the entries that track
> files we could expect based on the state of the build code, though,
> like config.pl or buildenv.pl in src/tools/msvc/ as committing those
> could silently break builds.

Agreed, those clearly belong in the .gitignore.  The ones I was looking at were
*.vcproj and *.vcxproj in the root .gitignore, but I don't know the MSVC build
well enough to know if those make sense or not.

--
Daniel Gustafsson   https://vmware.com/





Re: Table AM and DROP TABLE [ Was: Table AM and DDLs]

2021-09-27 Thread Aleksander Alekseev
Hi hackers,

> As a matter of fact, I think the patch I suggested is the right approach:
> let me elaborate on why.
> [...]
> It is straightforward to replace it by implementing the Table AM methods
> above, but we are missing a callback on dropping the table. If we have that,
> we can record the table-to-be-dropped in a similar manner to how the heap AM
> does it and register a transaction callback using RegisterXactCallback.

Since no one objected in 5 months, I assume Mats made a good point. At least,
personally, I can't argue.

The patch looks good to me except for the fact that comments seem to be
inaccurate in light of the discussion. The corrected patch is attached.
I'm going to mark it as "Ready for Committer" unless anyone objects.

-- 
Best regards,
Aleksander Alekseev


v2-0001-Add-callback-to-reset-filenode-to-table-access-metho.patch
Description: Binary data


Re: Table AM and DROP TABLE [ Was: Table AM and DDLs]

2021-09-27 Thread Alexander Kuzmenkov

On 05.04.2021 22:57, Mats Kindahl wrote:
Now, suppose that we want to replace the storage layer with a 
different one. It is straightforward to replace it by implementing the 
Table AM methods above, but we are missing a callback on dropping the 
table. If we have that, we can record the table-to-be-dropped in a 
similar manner to how the heap AM does it and register a transaction 
callback using RegisterXactCallback.


This explanation makes sense, and the suggested patch makes it easier to 
replace the storage layer with a different one.


Some other places might become problematic if we're trying to implement 
fully memory-based tables. For example, the heap_create_with_catalog -> 
GetNewRelFilenode -> access() call that directly checks the existence of 
a file bypassing the smgr layer. But I think that adding a symmetric 
callback to the tableam layer can be a good start for further experiments.


Some nitpicks:

+   /*
+* This callback needs to remove all associations with the relation 
`rel`
+* since the relation is being dropped.
+*
+* See also table_relation_reset_filenode().
+*/

"Remove all associations" sounds vague, maybe something like "schedule 
the relation files to be deleted at transaction commit"?



+   void(*relation_reset_filenode) (Relation rel);

This line uses spaces instead of tabs.


For reference, there is a recent patch that makes the smgr layer 
itself pluggable: 
https://www.postgresql.org/message-id/flat/1dc080496f58ce5375778baed0c0fbcc%40postgrespro.ru#502a1278ad8fce6ae85c08b4806c2289



--
Alexander Kuzmenkov
https://www.timescale.com/


Re: [PATCH] Cross-reference comments on signal handling logic

2021-09-27 Thread Daniel Gustafsson
> On 1 Mar 2021, at 19:22, Mark Dilger  wrote:

>> On Jan 17, 2021, at 11:51 PM, Craig Ringer  
>> wrote:
>> 
>> 
> 
> In src/backend/postmaster/interrupt.c:
> 
> + * These handlers are NOT used by normal user backends as they do not support
> 
> vs.
> 
> + * Most backends use this handler.
> 
> These two comments seem to contradict.  If interrupt.c contains handlers that 
> normal user backends do not use, then how can it be that most backends use 
> one of the handlers in the file?

I'm closing this as Returned with Feedback, as there has been no response to
the review comment during two commitfests.  Please reopen it in a future
commitfest if you would still like to pursue this patch.

--
Daniel Gustafsson   https://vmware.com/





Re: Column Filtering in Logical Replication

2021-09-27 Thread Alvaro Herrera
On 2021-Sep-27, Amit Kapila wrote:

> I am not sure what makes you say that we can't distinguish the above
> cases when there is already a separate rule for CURRENT_SCHEMA? I
> think you can distinguish by tracking the previous objects as we are
> already doing in the patch. But one thing that is not clear to me is
> the reason to introduce a new type PUBLICATIONOBJ_CURRSCHEMA when
> we use PUBLICATIONOBJ_REL_IN_SCHEMA and PUBLICATIONOBJ_CONTINUATION to
> distinguish all cases of CURRENT_SCHEMA. Alvaro might have something
> in mind for this which is not apparent and that might have caused
> confusion to you as well?

My issue is what happens if you have a schema that is named
CURRENT_SCHEMA.  In the normal case where you do ALL TABLES IN SCHEMA
"CURRENT_SCHEMA" you would end up with a String containing
"CURRENT_SCHEMA", so how do you distinguish that from ALL TABLES IN
SCHEMA CURRENT_SCHEMA, which does not refer to the schema named
"CURRENT_SCHEMA" but in Vignesh's proposal also uses a String containing
"CURRENT_SCHEMA"?

Now you could say "but who would be stupid enough to do that??!", but it
seems easier to dodge the problem entirely.  AFAICS our grammar never
uses String "CURRENT_SCHEMA" to represent CURRENT_SCHEMA, but rather
some special enum value.
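To make the ambiguity concrete (a sketch using the syntax proposed in this
patch set):

    CREATE SCHEMA "CURRENT_SCHEMA";

    -- the schema literally named CURRENT_SCHEMA
    CREATE PUBLICATION p1 FOR ALL TABLES IN SCHEMA "CURRENT_SCHEMA";

    -- whatever the session's current schema happens to be
    CREATE PUBLICATION p2 FOR ALL TABLES IN SCHEMA CURRENT_SCHEMA;

If both forms end up as a String containing "CURRENT_SCHEMA", the parse
tree cannot tell these two statements apart.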

-- 
Álvaro Herrera   39°49'30"S 73°17'W  —  https://www.EnterpriseDB.com/




Re: Column Filtering in Logical Replication

2021-09-27 Thread Rahila Syed
Hi,


>
> I don't think there is one.  I think the latest is what I posted in
> https://postgr.es/m/202109061751.3qz5xpugwx6w@alvherre.pgsql (At least I
> don't see any reply from Rahila with attachments after that), but that
> wasn't addressing a bunch of review comments that had been made; and I
> suspect that Amit K has already committed a few conflicting patches
> after that.
>

Yes, the v5 version of the patch attached by Alvaro is the latest one.
IIUC, the review comments that are yet to be addressed, apart from the
ongoing grammar discussion, are as follows:

1. Behaviour on dropping a column from the table that is part of a column
filter. In the latest patch, the entire table is dropped from the
publication when a column that is part of the column filter is dropped.
However, there is a preference for another approach: drop just the column
from the filter on DROP COLUMN ... CASCADE (and continue to filter the
other columns), and raise an error for DROP ... RESTRICT.

2. Instead of a WITH RECURSIVE query to find the topmost parent of the
partition in fetch_remote_table_info(), use pg_partition_tree and
pg_partition_root.

3. Report of a memory leak in get_rel_sync_entry().

4. Missing documentation.

5. Latest comments (last two messages) by Peter Smith.

Thank you,
Rahila Syed


Re: PostgreSQL High Precision Mathematics Extension.

2021-09-27 Thread Alvaro Herrera
On 2021-Sep-27, A Z wrote:

> I have been trying to find active interest in a free for all use
> PostgreSQL extension, complete and available, on the public internet,
> that will support the following:

You have posted this question ten times already to the PostgreSQL
mailing lists.  I think it's time for you to stop -- people are starting
to get annoyed.

https://www.postgresql.org/message-id/PSXP216MB0085760D0FCA442A1D4974769AF99%40PSXP216MB0085.KORP216.PROD.OUTLOOK.COM
https://www.postgresql.org/message-id/PSXP216MB0085E2A574FAE5EE16FE1BE09AFF9%40PSXP216MB0085.KORP216.PROD.OUTLOOK.COM
https://www.postgresql.org/message-id/PSXP216MB008519B96A025725439F41719AC09%40PSXP216MB0085.KORP216.PROD.OUTLOOK.COM
https://www.postgresql.org/message-id/PSXP216MB0085B1C0B3E10A1CF3BCD1A09ACD9%40PSXP216MB0085.KORP216.PROD.OUTLOOK.COM
https://www.postgresql.org/message-id/PSXP216MB00856A5C2B402E6D646D24609ACD9%40PSXP216MB0085.KORP216.PROD.OUTLOOK.COM
https://www.postgresql.org/message-id/PSXP216MB0085098A2D76E3C5DD4F8AE99ACF9%40PSXP216MB0085.KORP216.PROD.OUTLOOK.COM
https://www.postgresql.org/message-id/PSXP216MB0085F21467C36F05AB9427879ADF9%40PSXP216MB0085.KORP216.PROD.OUTLOOK.COM
https://www.postgresql.org/message-id/PSXP216MB008545A90DD2886F4BE21E489AA09%40PSXP216MB0085.KORP216.PROD.OUTLOOK.COM
https://www.postgresql.org/message-id/PSXP216MB0085ADE313F9F48A134057F39AA19%40PSXP216MB0085.KORP216.PROD.OUTLOOK.COM
https://www.postgresql.org/message-id/PSXP216MB0085D05D015DE0C46A11BE1F9AA79%40PSXP216MB0085.KORP216.PROD.OUTLOOK.COM

Thanks

-- 
Álvaro Herrera




Re: when the startup process doesn't (logging startup delays)

2021-09-27 Thread Justin Pryzby
On Mon, Sep 27, 2021 at 04:57:20PM +0530, Nitin Jadhav wrote:
> > It is also not logical to define 0 as meaning that "all messages for
> > such operations are logged". What does that even mean? It makes sense
> > for something like log_autovacuum_min_duration, because there we are
> > talking about logging one message per operation, and we could log
> > messages for all operations or just some of them. Here we are talking
> > about the time between one message and the next, so talking about "all
> > messages" does not really seem to make a lot of sense. Experimentally,
> > what 0 actually does is cause the system to spam log lines in a tight
> > loop, which cannot be what anyone wants. I think you should make 0
> > mean disabled, and a positive value mean log at that interval, and
> > disallow -1 altogether.
> 
> Made changes which indicate 0 mean disabled, > 0 mean interval in
> millisecond and removed -1.
> 
> Please find the patch attached.

I think you misunderstood - Robert was saying that interval=0 doesn't work, not
suggesting that you write more documentation about it.

Also, I agree with Robert that the documentation is too verbose.  I don't think
you need to talk about what happens if the clock goes backwards (It just needs
to behave conveniently).

Look at the other _duration statements for what they say about units.
"If this value is specified without units, it is taken as milliseconds."
https://www.postgresql.org/docs/14/runtime-config-logging.html
 log_autovacuum_min_duration
 log_min_duration_statement

>>It also looks pretty silly to say that if you set the value to 10s, something
>>will happen every 10s. What else would anyone expect to happen?

@Robert: that's consistent with existing documentation, even though it might
seem obvious and silly to us.

| For example, if you set this to 250ms then all automatic vacuums and analyzes 
that run 250ms or longer will be logged
| For example, if you set it to 250ms then all SQL statements that run 250ms or 
longer will be logged

-- 
Justin




Re: row filtering for logical replication

2021-09-27 Thread Tomas Vondra

Hi,

I see no one responded to this important part of my review so far:

On 9/23/21 2:33 PM, Tomas Vondra wrote:

3) create_subscription.sgml

     WHERE clauses, rows must satisfy all expressions
     to be copied. If the subscriber is a

I'm rather skeptical about the principle that all expressions have to 
match - I'd have expected exactly the opposite behavior, actually.


I see a subscription as "a union of all publications". Imagine for 
example you have a data set for all customers, and you create publications
for different parts of the world, like


   CREATE PUBLICATION customers_france
  FOR TABLE customers WHERE (country = 'France');

   CREATE PUBLICATION customers_germany
  FOR TABLE customers WHERE (country = 'Germany');

   CREATE PUBLICATION customers_usa
  FOR TABLE customers WHERE (country = 'USA');

and now you want to subscribe to multiple publications, because you want 
to replicate data for multiple countries (e.g. you want EU countries). 
But if you do


   CREATE SUBSCRIPTION customers_eu
  PUBLICATION customers_france, customers_germany;

then you won't get anything, because each customer belongs to just a 
single country. Yes, I could create multiple individual subscriptions, 
one for each country, but that's inefficient and may have a different 
set of issues (e.g. keeping them in sync when a customer moves between 
countries).
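To spell out the difference as plain queries against the publisher's table
(just a sketch of the semantics, not of how the replication code evaluates
the expressions):

   -- current behavior: all expressions must match
   SELECT * FROM customers
    WHERE (country = 'France') AND (country = 'Germany');  -- always empty

   -- what I'd expect: at least one expression matches
   SELECT * FROM customers
    WHERE (country = 'France') OR (country = 'Germany');   -- the EU rows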


I might have missed something, but I haven't found any explanation why 
the requirement to satisfy all expressions is the right choice.


IMHO this should be 'satisfies at least one expression' i.e. we should 
connect the expressions by OR, not AND.


Am I the only one finding the current behavior strange? What's the 
reasoning supporting the current approach?



regards

--
Tomas Vondra
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: Support for NSS as a libpq TLS backend

2021-09-27 Thread Daniel Gustafsson
> On 21 Sep 2021, at 02:06, Jacob Champion  wrote:
> 
> On Mon, 2021-07-26 at 15:26 +0200, Daniel Gustafsson wrote:
>>> On 19 Jul 2021, at 21:33, Jacob Champion  wrote:
>>> ..client connections will crash if
>>> hostaddr is provided rather than host, because SSL_SetURL can't handle
>>> a NULL argument. I'm running with 0002 to fix it for the moment, but
>>> I'm not sure yet if it does the right thing for IP addresses, which the
>>> OpenSSL side has a special case for.
>> 
>> AFAICT the idea is to handle it in the cert auth callback, so I've added some
>> PoC code to check for sslsni there and updated the TODO comment to reflect
>> that.
> 
> I dug a bit deeper into the SNI stuff:
> 
>> +server_hostname = SSL_RevealURL(conn->pr_fd);
>> +if (!server_hostname || server_hostname[0] == '\0')
>> +{
>> +/* If SNI is enabled we must have a hostname set */
>> +if (conn->sslsni && conn->sslsni[0])
>> +status = SECFailure;
> 
> conn->sslsni can be explicitly set to "0" to disable it, so this should
> probably be changed to a check for "1",

Agreed.

> but I'm not sure that would be
> correct either. If the user has the default sslsni="1" and supplies an
> IP address for the host parameter, I don't think we should fail the
> connection.

Maybe not, but doing so is at least in line with how the OpenSSL support will
handle the same config AFAICT. Or am I missing something?

>> +if (host && host[0] &&
>> +!(strspn(host, "0123456789.") == strlen(host) ||
>> +  strchr(host, ':')))
>> +SSL_SetURL(conn->pr_fd, host);
> 
> It looks like NSS may already have some code that prevents SNI from
> being sent for IP addresses, so that part of the guard might not be
> necessary. (And potentially counterproductive, because it looks like
> NSS can perform verification against the certificate's SANs if you pass
> an IP address to SSL_SetURL().)

Skimming the NSS code, I wasn't able to find the countermeasures; can you
provide a reference to where I should look?

Feel free to post a new version of the NSS patch with these changes if you want.

> Speaking of IP addresses in SANs, it doesn't look like our OpenSSL
> backend can handle those. That's a separate conversation, but I might
> take a look at a patch for next commitfest.

Please do.

--
Daniel Gustafsson   https://vmware.com/





Re: row filtering for logical replication

2021-09-27 Thread Euler Taveira
On Mon, Sep 27, 2021, at 10:34 AM, Tomas Vondra wrote:
> Hi,
> 
> I see no one responded to this important part of my review so far:
I'm still preparing a new patch and a summary.

> Am I the only one finding the current behavior strange? What's the 
> reasoning supporting the current approach?
I think it is an oversight from my side. It used to work the way you mentioned
but I changed it. I'll include this change in the next patch.


--
Euler Taveira
EDB   https://www.enterprisedb.com/


Re: POC: Cleaning up orphaned files using undo logs

2021-09-27 Thread Antonin Houska
Amit Kapila  wrote:

> On Fri, Sep 24, 2021 at 4:44 PM Antonin Houska  wrote:
> >
> > Amit Kapila  wrote:
> >
> > > On Mon, Sep 20, 2021 at 10:24 AM Antonin Houska  wrote:
> > > >
> > > > Amit Kapila  wrote:
> > > >
> > > > > On Fri, Sep 17, 2021 at 9:50 PM Dmitry Dolgov <9erthali...@gmail.com> 
> > > > > wrote:
> > > > > >
> > > > > > > On Tue, Sep 14, 2021 at 10:51:42AM +0200, Antonin Houska wrote:
> > > > > >
> > > > > > * What happened with the idea of abandoning discard worker for the 
> > > > > > sake
> > > > > >   of simplicity? From what I see limiting everything to foreground 
> > > > > > undo
> > > > > >   could reduce the core of the patch series to the first four 
> > > > > > patches
> > > > > >   (forgetting about test and docs, but I guess it would be enough at
> > > > > >   least for the design review), which is already less overwhelming.
> >
> > > > What we can miss, at least for the cleanup of the orphaned files, is 
> > > > the *undo
> > > > worker*. In this patch series the cleanup is handled by the startup 
> > > > process.
> > > >
> > >
> > > Okay, I think various people at different point of times has suggested
> > > that idea. I think one thing we might need to consider is what to do
> > > in case of a FATAL error? In case of FATAL error, it won't be
> > > advisable to execute undo immediately, so would we upgrade the error
> > > to PANIC in such cases. I remember vaguely that for clean up of
> > > orphaned files that can happen rarely and someone has suggested
> > > upgrading the error to PANIC in such a case but I don't remember the
> > > exact details.
> >
> > Do you mean FATAL error during normal operation?
> >
> 
> Yes.
> 
> > As far as I understand, even
> > zheap does not rely on immediate UNDO execution (otherwise it'd never
> > introduce the undo worker), so FATAL only means that the undo needs to be
> > applied later so it can be discarded.
> >
> 
> Yeah, zheap either applies undo later via background worker or next
> time before dml operation if there is a need.
> 
> > As for the orphaned files cleanup feature with no undo worker, we might need
> > PANIC to ensure that the undo is applied during restart and that it can be
> > discarded, otherwise the unapplied undo log would stay there until the next
> > (regular) restart and it would block discarding. However upgrading FATAL to
> > PANIC just because the current transaction created a table seems quite
> > rude.
> >
> 
> True, I guess but we can once see in what all scenarios it can
> generate FATAL during that operation.

By "that operation" you mean "CREATE TABLE"?

It's not about FATAL during CREATE TABLE, rather it's about FATAL anytime
during a transaction. Whichever operation caused the FATAL error, we'd need to
upgrade it to PANIC as long as the transaction has some undo.

Although the postgres core probably does not raise FATAL errors too often (OOM
conditions seem to be the typical cause), I'm still not enthusiastic about the
idea that the undo feature turns such errors into PANIC.

I wonder what the reason to avoid undoing the transaction on FATAL is. If it's
the possibly long duration of the undo execution, deletion of orphaned files
(relations or whole databases) via undo shouldn't make things worse, because
currently FATAL also triggers this sort of cleanup immediately; it's just
implemented in a different way.

-- 
Antonin Houska
Web: https://www.cybertec-postgresql.com




Re: when the startup process doesn't (logging startup delays)

2021-09-27 Thread Robert Haas
On Mon, Sep 27, 2021 at 7:26 AM Nitin Jadhav
 wrote:
> > I really don't know what to say about this. You say that the time is
> > measured in milliseconds, and then immediately turn around and say
> > "For example, if you set it to 10s". Now we do expect that most people
> > will set it to intervals that are measured in seconds rather than
> > milliseconds, but saying that setting it to a value measured in
> > seconds is an example of setting it in milliseconds is not logical.
>
> Based on the statement "I suggest making the GUC GUC_UNIT_MS rather
> than GUC_UNIT_S, but expressing the default in postgresl.conf.sample
> as 10s rather than 1ms", I have used the default value in the
> postgresl.conf.sample as 10s rather than 1ms. So I just used the
> same value in the example too in config.sgml. If it is really getting
> confusing, I will change it to 100ms in config.sgml.

That's really not what I'm complaining about. I think if we're going
to give an example at all, 10s is a better example than 100ms,
because 10s is a value that people are more likely to find useful. But
I'm not sure that it's necessary to mention a specific value, and if
it is, I think it needs to be phrased in a less confusing way.

> Made changes which indicate 0 mean disabled, > 0 mean interval in
> millisecond and removed -1.

Well, I see that -1 is now disallowed, and that's good as far as it
goes, but 0 still does not actually disable the feature. I don't
understand why you posted the previous version of the patch without
testing that it works, and I understand even less why you are posting
another version without fixing the bug that I pointed out to you in
the last version.

-- 
Robert Haas
EDB: http://www.enterprisedb.com




Re: when the startup process doesn't (logging startup delays)

2021-09-27 Thread Robert Haas
On Mon, Sep 27, 2021 at 9:32 AM Justin Pryzby  wrote:
> >>It also looks pretty silly to say that if you set the value to 10s, 
> >>something
> >>will happen every 10s. What else would anyone expect to happen?
>
> @Robert: that's consistent with existing documentation, even though it might
> seem obvious and silly to us.
>
> | For example, if you set this to 250ms then all automatic vacuums and 
> analyzes that run 250ms or longer will be logged
> | For example, if you set it to 250ms then all SQL statements that run 250ms 
> or longer will be logged

Fair enough, but I still don't like it much. I tried my hand at
rewriting this and came up with the attached:

+ Sets the amount of time after which the startup process will log
+ a message about a long-running operation that is still in progress,
+ as well as the interval between further progress messages for that
+ operation. This setting is applied separately to each operation.
+ For example, if syncing the data directory takes 25 seconds and
+ thereafter resetting unlogged relations takes 8 seconds, and if this
+ setting has the default value of 10 seconds, then a message will be
+ logged for syncing the data directory after it has been in progress
+ for 10 seconds and again after it has been in progress for 20 seconds,
+ but nothing will be logged for resetting unlogged relations.
+ A setting of 0 disables the feature.

I prefer this to Nitin's version because I think it could be unclear
to someone that the value applies separately to each operation,
whereas I don't think we need to document that we can't guarantee that
the messages will be perfectly on time - that's true of every kind of
scheduled event in pretty much every computer system - or what happens
if the system clock goes backwards. Those are things we should try to
get right, as well as we can anyway, but we don't need to tell the
user that we got them right.
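For completeness, a usage sketch to go with that wording; the GUC name here
is illustrative only, since the text above doesn't spell it out:

    -- takes effect the next time the startup process runs (e.g. crash recovery)
    ALTER SYSTEM SET log_startup_progress_interval = '10s';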

-- 
Robert Haas
EDB: http://www.enterprisedb.com




Re: Support for NSS as a libpq TLS backend

2021-09-27 Thread Jacob Champion
On Mon, 2021-09-27 at 15:44 +0200, Daniel Gustafsson wrote:
> > On 21 Sep 2021, at 02:06, Jacob Champion  wrote:
> > but I'm not sure that would be
> > correct either. If the user has the default sslsni="1" and supplies an
> > IP address for the host parameter, I don't think we should fail the
> > connection.
> 
> Maybe not, but doing so is at least in line with how the OpenSSL support will
> handle the same config AFAICT. Or am I missing something?

With OpenSSL, I don't see a connection failure when using sslsni=1 with
IP addresses. (verify-full can't work, but that's a separate problem.)

> > > + if (host && host[0] &&
> > > + !(strspn(host, "0123456789.") == strlen(host) ||
> > > +   strchr(host, ':')))
> > > + SSL_SetURL(conn->pr_fd, host);
> > 
> > It looks like NSS may already have some code that prevents SNI from
> > being sent for IP addresses, so that part of the guard might not be
> > necessary. (And potentially counterproductive, because it looks like
> > NSS can perform verification against the certificate's SANs if you pass
> > an IP address to SSL_SetURL().)
> 
> Skimming the NSS code, I wasn't able to find the countermeasures; can you
> provide a reference to where I should look?

I see the check in ssl_ShouldSendSNIExtension(), in ssl3exthandle.c.

> Feel free to post a new version of the NSS patch with these changes if you 
> want.

Will do!

Thanks,
--Jacob


Re: when the startup process doesn't (logging startup delays)

2021-09-27 Thread Alvaro Herrera
> +/*
> + * Decides whether to log the startup progress or not based on whether the
> + * timer is expired or not. Returns FALSE if the timer is not expired, 
> otherwise
> + * calculates the elapsed time and sets the respective out parameters secs 
> and
> + * usecs. Enables the timer for the next log message and returns TRUE.
> + */
> +bool
> +startup_progress_timeout_has_expired(long *secs, int *usecs)

I think this comment can be worded better.  It says it "decides", but it
doesn't actually decide on any action to take -- it just reports whether
the timer expired or not, to allow its caller to make the decision.  In
such situations we just say something like "Report whether startup
progress has caused a timeout, return true and rearm the timer if it
did, or just return false otherwise"; and we don't indicate what the
value is going to be used *for*.  Then the caller can use the boolean
return value to make a decision, such as whether something is going to
be logged.  This function can be oblivious to details such as this:

> + /* If the timeout has not occurred, then no need to log the details. */
> + if (!startup_progress_timer_expired)
> + return false;

here we can just say "No timeout has occurred" and make no inference
about what's going to happen or not happen.

Also, for functions that do things like this we typically use English
sentence structure with the function name starting with the verb --
perhaps has_startup_progress_timeout_expired().  Sometimes we are lax
about this if we have some sort of poor-man's modularisation by using a
common prefix for several functions doing related things, which perhaps
could be "startup_progress_*" in your case, but your other functions are
already not doing that (such as begin_startup_progress_phase) so it's
not clear why you would not use the most natural name for this one.
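
To make that concrete, one possible shape for the comment and name, purely as
an illustration (the final wording is of course up to the patch author):

    /*
     * has_startup_progress_timeout_expired
     *      Report whether the startup-progress interval has elapsed.  If it
     *      has, set *secs/*usecs to the elapsed time, re-arm the timer for
     *      the next interval, and return true; otherwise return false.
     */
    bool
    has_startup_progress_timeout_expired(long *secs, int *usecs);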

> + ereport_startup_progress("syncing data directory (syncfs), elapsed 
> time: %ld.%02d s, current path: %s",
> +  path);

Please make sure to add ereport_startup_progress() as a translation
trigger in src/backend/nls.mk.

-- 
Álvaro Herrera   39°49'30"S 73°17'W  —  https://www.EnterpriseDB.com/




Re: Gather performance analysis

2021-09-27 Thread Robert Haas
On Mon, Sep 27, 2021 at 1:22 AM Dilip Kumar  wrote:
> I have done testing with different batch sizes, 16k (which is the same
> as 1/4 of the queue size with 64k queue size) , 8k, 4k, 2k.
>
> In the attached sheet I have done a comparison of
> 1. head vs patch (1/4 queue size) = execution time reduced to 37% to
> 90% this is the same as the old sheet.
> 2. patch (1/4 queue size) vs patch(8k batch) =  both are same, but 8k
> batch size is slow in some cases.
> 3. patch (1/4 queue size) vs patch(4k batch) = both are same, but 4k
> batch size is slow in some cases (even slower than 8k batch size).
> 4. patch (1/4 queue size) vs patch(2k batch) = 2k batch size is
> significantly slow.

Generally these results seem to show that a larger batch size is
better than a smaller one, but we know that's not true everywhere and
under all circumstances, because some of Tomas's numbers are worse
than the unpatched cases. And I think we don't really understand the
reason for those results. Now it could be that there's no really good
reason for those results, and it's just something weird or not very
generally interesting.

On the other hand, while it's easy to see that batching can be a win
if it avoids contention, it also seems easy to imagine that it can be
a loss. By postponing the update to shared memory, we are essentially
gambling. If nobody would have read the updated value anyway, we win,
because we avoided doing work that wasn't really needed by
consolidating multiple updates of shared memory down to one. However,
imagine the scenario where someone reads a value that is small enough
that they have to block, because they think there's no more data
available. If there really is more data available and we just didn't
update shared memory, then we lose.

Here, the wins are going to be much smaller than the losses. Cache
line contention isn't cheap, but it's a lot cheaper than actually
having a process go to sleep and having to wake it up again. So
essentially the patch is betting that the winning scenario is much
more common than the losing scenario - the occasional big losses when
the reader sleeps unnecessarily will be more than counterbalanced by
the small wins every time we skip an update to shared memory without
causing that to happen.
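
To make that trade-off concrete, here is a deliberately simplified sketch of
the batching idea (not the actual shm_mq code; the type and names are invented
for illustration): the writer publishes its position to shared memory only
once per batch, and the gap between local_write and shared_write is exactly
the data a reader could wrongly conclude is not there yet.

    #include <stdatomic.h>
    #include <stddef.h>
    #include <string.h>

    typedef struct
    {
        char           *buffer;
        size_t          size;
        size_t          local_write;     /* writer-private position */
        size_t          last_published;  /* last position made visible */
        _Atomic size_t  shared_write;    /* what the reader sees */
    } DemoQueue;

    #define BATCH_BYTES(q) ((q)->size / 4)   /* e.g. 1/4 of the queue size */

    static void
    demo_queue_write(DemoQueue *q, const char *data, size_t len)
    {
        /* Simplified: ignores wraparound and full-queue handling. */
        memcpy(q->buffer + (q->local_write % q->size), data, len);
        q->local_write += len;

        /* Publish to shared memory (and wake the reader) once per batch. */
        if (q->local_write - q->last_published >= BATCH_BYTES(q))
        {
            atomic_store(&q->shared_write, q->local_write);
            q->last_published = q->local_write;
            /* ... signal the reader here if it might be waiting ... */
        }
    }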

And most of the time, that's probably a good bet. But, if you do
somehow hit the losing case repeatedly, then you could see a
significant regression. And that might explain Tomas's results.
Perhaps for some reason they just happen to hit that case over and
over again. If that's true, it would be useful to know why it happens
in that case and not others, because then maybe we could avoid the
problem somehow. However, I'm not sure how to figure that out, and I'm
not even entirely sure it's important to figure it out.

-- 
Robert Haas
EDB: http://www.enterprisedb.com




Re: two_phase commit parameter used in subscription for a publication which is on < 15.

2021-09-27 Thread Euler Taveira
On Mon, Sep 27, 2021, at 4:09 AM, tushar wrote:
> so are we silently ignoring this parameter as it is not supported on v14 ?
Yes. Because two_phase is a supported parameter for v15 (your current
subscriber).  The issue is that this parameter is not forwarded to the publisher
because its version (v14) does not support it. Since we do not have a
connection before parse_subscription_options(), the publisher server version is
unknown. Hence, it does not know whether that specific parameter is supported on
the publisher. I'm not sure it is worth parsing the options again after a
replication connection is available just to check those parameters that don't
work on all supported server versions.

IMO we can provide messages during the connection (see
libpqrcv_startstreaming()) instead of while executing CREATE/ALTER
SUBSCRIPTION. Something like:

 if (options->proto.logical.twophase &&
     PQserverVersion(conn->streamConn) >= 15)
     appendStringInfoString(&cmd, ", two_phase 'on'");
 else if (options->proto.logical.twophase)
     ereport(DEBUG1,
             (errmsg_internal("parameter \"two_phase\" is not supported on the publisher")));

It is a DEBUG message because it can be annoying when the subscriber cannot
connect to the publisher.

The output plugin also raises an error if the subscriber sends the two_phase
parameter. See pgoutput_startup(). The subscriber could probably send all
parameters and the output plugin would be responsible for reporting an error. I
think the author decided not to do it because it is not a user-friendly
approach.


--
Euler Taveira
EDB   https://www.enterprisedb.com/


Re: Delegating superuser tasks to new security roles (Was: Granting control of SUSET gucs to non-superusers)

2021-09-27 Thread Mark Dilger

> On Sep 21, 2021, at 12:58 PM, Andrew Dunstan  wrote:
> 
> This patch set is failing to apply for me - it fails on patch 2.

Thanks for looking!  I have pulled together a new patch set which applies 
cleanly against current master.

> I haven't dug terribly deeply into it yet, but I notice that there is a
> very large increase in test volume, which appears to account for much of
> the 44635 lines of the patch set. I think we're probably going to want
> to reduce that. We've had complaints in the past from prominent hackers
> about adding too much volume to the regression tests.

The v8 patch set is much smaller, with the reduction being in the size of 
regression tests covering which roles can perform SET, RESET, ALTER SYSTEM SET, 
and ALTER SYSTEM RESET and on which GUCs.  The v7 patch set did exhaustive 
testing on this, which is why it was so big.  The v8 set does just a sampling 
of GUCs per role.  The total number of lines for the patch set drops from 44635 
to 13026, with only 1960 lines total between the .sql and .out tests of GUC 
privileges.

> I do like the basic thrust of reducing the power of CREATEROLE. There's
> an old legal maxim I learned in my distant youth that says "nemo dat
> quod non habet" - Nobody can give something they don't own. This seems
> to be in that spirit, and I approve :-)

Great!  I'm glad to hear the approach has some support.


Other changes in v8:

Add a new test for subscriptions owned by non-superusers to verify that the 
tablesync and apply workers replicating their subscription won't replicate into 
schemas and tables that the subscription owner lacks privilege to touch.  The 
logic to prevent that existed in the v7 patch, but I overlooked adding tests 
for it.  Fixed.

Allow non-superusers to create event triggers.  The logic to handle event 
triggers owned by non-superusers, and to condition those triggers' firing on 
the (trigger-owner, role-performing-event) pair, already existed and is 
unchanged.  The v7 patch set didn't go quite so far as allowing non-superusers 
to create event triggers, but that undercuts much of the benefit of the changes 
for no obvious purpose.


Not changed in v8, but worth discussing:

Non-superusers are still prohibited from creating subscriptions, despite 
improvements to the security around that circumstance.  Improvements to the 
security model around event triggers do not have to also include permission 
for non-superusers to create event triggers, but v8 does both.  These could be 
viewed as inconsistent choices, but I struck the balance this way because a role 
creating event triggers does not entail it doing anything that it can't 
already do, whereas allowing arbitrary users to create subscriptions would 
entail an ordinary user causing external network connections to be initiated.  
We likely need to create another privileged role and require a non-superuser to 
be part of that role before they can create subscriptions.  That seems, 
however, like something to do as a follow-on patch, since tightening up the 
security on subscriptions as done in this patch doesn't depend on that.




v8.tar.bz2
Description: BZip2 compressed data


—
Mark Dilger
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company





Re: pg_stat_bgwriter.buffers_backend is pretty meaningless (and more?)

2021-09-27 Thread Melanie Plageman
On Fri, Sep 24, 2021 at 5:58 PM Melanie Plageman
 wrote:
>
> On Thu, Sep 23, 2021 at 5:05 PM Melanie Plageman
>  wrote:
> The only remaining TODOs are described in the commit message.
> most critical one is that the reset message doesn't work.

v10 is attached with updated comments and some limited code refactoring
From 566a5e6194acf6e670f9c1836d4296cf43b7ffcc Mon Sep 17 00:00:00 2001
From: Melanie Plageman 
Date: Thu, 2 Sep 2021 11:47:41 -0400
Subject: [PATCH v10 3/3] Remove superfluous bgwriter stats

Remove stats from pg_stat_bgwriter which are now more clearly expressed
in pg_stat_buffers.

TODO:
- make pg_stat_checkpointer view and move relevant stats into it
- add additional stats to pg_stat_bgwriter
---
 doc/src/sgml/monitoring.sgml  | 47 ---
 src/backend/catalog/system_views.sql  |  6 +---
 src/backend/postmaster/checkpointer.c | 26 ---
 src/backend/postmaster/pgstat.c   |  5 ---
 src/backend/storage/buffer/bufmgr.c   |  6 
 src/backend/utils/adt/pgstatfuncs.c   | 30 -
 src/include/catalog/pg_proc.dat   | 22 -
 src/include/pgstat.h  | 10 --
 src/test/regress/expected/rules.out   |  5 ---
 9 files changed, 1 insertion(+), 156 deletions(-)

diff --git a/doc/src/sgml/monitoring.sgml b/doc/src/sgml/monitoring.sgml
index 60627c692a..08772652ac 100644
--- a/doc/src/sgml/monitoring.sgml
+++ b/doc/src/sgml/monitoring.sgml
@@ -3416,24 +3416,6 @@ SELECT pid, wait_event_type, wait_event FROM pg_stat_activity WHERE wait_event i
   
  
 
- 
-  
-   buffers_checkpoint bigint
-  
-  
-   Number of buffers written during checkpoints
-  
- 
-
- 
-  
-   buffers_clean bigint
-  
-  
-   Number of buffers written by the background writer
-  
- 
-
  
   
maxwritten_clean bigint
@@ -3444,35 +3426,6 @@ SELECT pid, wait_event_type, wait_event FROM pg_stat_activity WHERE wait_event i
   
  
 
- 
-  
-   buffers_backend bigint
-  
-  
-   Number of buffers written directly by a backend
-  
- 
-
- 
-  
-   buffers_backend_fsync bigint
-  
-  
-   Number of times a backend had to execute its own
-   fsync call (normally the background writer handles those
-   even when the backend does its own write)
-  
- 
-
- 
-  
-   buffers_alloc bigint
-  
-  
-   Number of buffers allocated
-  
- 
-
  
   
stats_reset timestamp with time zone
diff --git a/src/backend/catalog/system_views.sql b/src/backend/catalog/system_views.sql
index 30280d520b..c45c261f4b 100644
--- a/src/backend/catalog/system_views.sql
+++ b/src/backend/catalog/system_views.sql
@@ -1058,18 +1058,14 @@ CREATE VIEW pg_stat_archiver AS
 s.stats_reset
 FROM pg_stat_get_archiver() s;
 
+-- TODO: make separate pg_stat_checkpointer view
 CREATE VIEW pg_stat_bgwriter AS
 SELECT
 pg_stat_get_bgwriter_timed_checkpoints() AS checkpoints_timed,
 pg_stat_get_bgwriter_requested_checkpoints() AS checkpoints_req,
 pg_stat_get_checkpoint_write_time() AS checkpoint_write_time,
 pg_stat_get_checkpoint_sync_time() AS checkpoint_sync_time,
-pg_stat_get_bgwriter_buf_written_checkpoints() AS buffers_checkpoint,
-pg_stat_get_bgwriter_buf_written_clean() AS buffers_clean,
 pg_stat_get_bgwriter_maxwritten_clean() AS maxwritten_clean,
-pg_stat_get_buf_written_backend() AS buffers_backend,
-pg_stat_get_buf_fsync_backend() AS buffers_backend_fsync,
-pg_stat_get_buf_alloc() AS buffers_alloc,
 pg_stat_get_bgwriter_stat_reset_time() AS stats_reset;
 
 CREATE VIEW pg_stat_buffers AS
diff --git a/src/backend/postmaster/checkpointer.c b/src/backend/postmaster/checkpointer.c
index c0c4122fd5..829f52cc8f 100644
--- a/src/backend/postmaster/checkpointer.c
+++ b/src/backend/postmaster/checkpointer.c
@@ -90,17 +90,9 @@
  * requesting backends since the last checkpoint start.  The flags are
  * chosen so that OR'ing is the correct way to combine multiple requests.
  *
- * num_backend_writes is used to count the number of buffer writes performed
- * by user backend processes.  This counter should be wide enough that it
- * can't overflow during a single processing cycle.  num_backend_fsync
- * counts the subset of those writes that also had to do their own fsync,
- * because the checkpointer failed to absorb their request.
- *
  * The requests array holds fsync requests sent by backends and not yet
  * absorbed by the checkpointer.
  *
- * Unlike the checkpoint fields, num_backend_writes, num_backend_fsync, and
- * the requests fields are protected by CheckpointerCommLock.
  *--
  */
 typedef struct
@@ -124,9 +116,6 @@ typedef struct
 	ConditionVariable start_cv; /* signaled when ckpt_started advances */
 	ConditionVariable done_cv;	/* signaled when ckpt_done advances */
 
-	uint32

Re: [BUG] failed assertion in EnsurePortalSnapshotExists()

2021-09-27 Thread Tom Lane
"Drouvot, Bertrand"  writes:
> I recently observed a failed assertion in EnsurePortalSnapshotExists().

Hmm, interesting.  If I take out the "update bdt2" step, so that the
exception clause is just COMMIT, then I get something different:

ERROR:  portal snapshots (1) did not account for all active snapshots (0)
CONTEXT:  PL/pgSQL function inline_code_block line 8 at COMMIT

I think perhaps plpgsql's exception handling needs a bit of adjustment,
but not sure what yet.

regards, tom lane




Re: Fixing WAL instability in various TAP tests

2021-09-27 Thread Mark Dilger


> On Sep 25, 2021, at 11:04 AM, Mark Dilger  
> wrote:
> 
> I took Tom's response to be, "yeah, go ahead", and am mostly waiting for the 
> weekend to be over to see if anybody else has anything to say about it.

Here is a patch set, one patch per test.  The third patch enables its test in 
the Makefile, which is commented as having been disabled due to the test being 
unstable in the build farm.  Re-enabling the test might be wrong, since the 
instability might not have been due to WAL being recycled early.  I didn't find 
enough history about why that test was disabled, so trying it in the build farm 
again is the best I can suggest.  Maybe somebody else on this list knows more?




v1-0001-Harden-test-againt-premature-discarding-of-WAL.patch
Description: Binary data


v1-0002-Harden-test-againt-premature-discarding-of-WAL.patch
Description: Binary data


v1-0003-Harden-and-enable-bloom-TAP-test.patch
Description: Binary data


—
Mark Dilger
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company





Re: Fixing WAL instability in various TAP tests

2021-09-27 Thread Tom Lane
Mark Dilger  writes:
> Here is a patch set, one patch per test.  The third patch enables its test in 
> the Makefile, which is commented as having been disabled due to the test 
> being unstable in the build farm.  Re-enabling the test might be wrong, since 
> the instability might not have been due to WAL being recycled early.  I 
> didn't find enough history about why that test was disabled, so trying it in 
> the build farm again is the best I can suggest.  Maybe somebody else on this 
> list knows more?

Digging in the archives where the commit points, I find

https://www.postgresql.org/message-id/flat/20181126025125.GH1776%40paquier.xyz

which says there was an unexpected result on my animal longfin.
I tried the same thing (i.e., re-enable bloom's TAP test) on my laptop
just now, and it passed fine.  The laptop is not exactly the same
as longfin was in 2018, but it ought to be close enough.  Not sure
what to make of that --- maybe the failure is only intermittent,
or else we fixed the underlying issue since then.

I'm a little inclined to re-enable the test without your other
changes, just to see what happens.

regards, tom lane




Re: Fixing WAL instability in various TAP tests

2021-09-27 Thread Mark Dilger



> On Sep 27, 2021, at 1:19 PM, Tom Lane  wrote:
> 
> I'm a little inclined to re-enable the test without your other
> changes, just to see what happens.

That sounds like a good idea.  Even if it passes at first, I'd prefer to leave 
it for a week or more to have a better sense of how stable it is.  Applying my 
patches too soon would just add more variables to a not-so-well understood 
situation.

—
Mark Dilger
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company







Re: storing an explicit nonce

2021-09-27 Thread Bruce Momjian
On Sun, Sep  5, 2021 at 10:51:42PM +0800, Sasasu wrote:
> Hi, community,
> 
> It looks like we are still considering AES-CBC, AES-XTS, and AES-GCM(-SIV).
> I want to say something that we don't think about.
> 
> For AES-CBC, the IV should be not predictable. I think LSN or HASH(LSN,
> block number or something) is predictable. There are many CVE related to
> AES-CBC with a predictable IV.

The LSN would change every time the page is modified, so while the LSN
could be predicted, it would not be reused.  However, there is currently
no work being done on page-level encryption of Postgres.

-- 
  Bruce Momjian  https://momjian.us
  EDB  https://enterprisedb.com

  If only the physical world exists, free will is an illusion.





Re: how to distinguish between using the server as a standby or for executing a targeted recovery in PG 11?

2021-09-27 Thread Euler Taveira
On Fri, Sep 24, 2021, at 1:46 PM, Bharath Rupireddy wrote:
> I'm trying to set up a postgres server with version 11 in targeted
> recovery mode (for the first time after my journey started with
> postgres) and I came across the explanation at [1] in PG 12 and newer
> versions that we have a clear differentiation as to what is the
> "standby" mode or "targeted recovery" mode. How do we differentiate
> these two modes in PG 11? Can anyone please help me with it?
It seems you have to rely on parsing recovery.conf. However, someone can modify
it after starting Postgres. In this case, you have to use a debugger such as

gdb /path/to/postgres -p $(pgrep -f 'postgres: startup recovering') -quiet -batch -ex 'p StandbyMode' -ex 'quit'

Unfortunately, there is no simple way such as checking if a .signal file exists.
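
For what it's worth, a rough illustration of the recovery.conf approach, under
the assumption that the file has not been edited since the server started
(PG 11 recovery.conf syntax):

    # Illustration only, not an official check.
    if grep -Eiq "^[[:space:]]*standby_mode[[:space:]]*=[[:space:]]*'?(on|true|yes|1)" \
            "$PGDATA/recovery.conf"; then
        echo "standby mode"
    else
        echo "targeted recovery (or recovery.conf absent)"
    fi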


--
Euler Taveira
EDB   https://www.enterprisedb.com/


Re: typos (and more)

2021-09-27 Thread Alvaro Herrera
On 2021-Sep-27, Michael Paquier wrote:

> The use
> of "statistic", "statistics" and "statistics object" in 0008 and 0012
> is indeed inconsistent.  The latter term is the most used, but it
> sounds a bit weird to me even if it refers to the DDL object
> manipulated.  Simply using "statistics" would be tempting.

Initially we just used "statistic" as a noun, which IIRC was already
grammatically wrong (but I didn't know that and I think Tomas didn't
either); later at some point when discussing how to use that noun in
plural we realized this and argued that merely using "statistics" was
even more wrong.  It was then that we started using the term "statistics
object" with plural "statistics objects".  Going back to using just
"statistics" is unlikely to have become correct; I think Justin's
patches 0008 and 0012 are correct.

-- 
Álvaro Herrera   39°49'30"S 73°17'W  —  https://www.EnterpriseDB.com/
"La libertad es como el dinero; el que no la sabe emplear la pierde" (Alvarez)




Couldn't we mark enum_in() as immutable?

2021-09-27 Thread Tom Lane
Currently enum_in() is marked as stable, on the reasonable grounds
that it depends on system catalog contents.  However, after the
discussion at [1] I'm wondering why it wouldn't be perfectly safe,
and useful, to mark it as immutable.

Here's my reasoning: "immutable" promises that the function will
always give the same results for the same inputs.  However, one of
the inputs is the type OID for the desired enum type.  It's certainly
possible that the enum type could be dropped later, and then its type
OID could be recycled, so that at some future epoch enum_in() might
give a different result for the "same" type OID.  But I cannot think
of any situation where a stored output value of enum_in() would
outlive the dropping of the enum type.  It certainly couldn't happen
for a table column of that type, nor would it be safe for a stored
rule to outlive a type it mentions.  So it seems like the results of
enum_in() are immutable for as long as anybody could care about them,
and that's good enough.
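
As a concrete illustration of that dependency argument (this is just ordinary
behaviour today, shown only to make the reasoning tangible):

    CREATE TYPE mood AS ENUM ('sad', 'ok', 'happy');
    CREATE TABLE person (current_mood mood);
    DROP TYPE mood;            -- fails: table "person" depends on type "mood"
    DROP TYPE mood CASCADE;    -- would drop column person.current_mood too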

Moreover, if it's *not* good enough, then our existing practice of
folding enum literals to OID constants on-sight must be unsafe too.

So it seems like this would be okay, and if we did it it'd eliminate
some annoying corner cases for query optimization, as seen in the
referenced thread.

I think that a similar argument could be made about enum_out, although
maybe I'm missing some interesting case there.

Thoughts?

regards, tom lane

[1] 
https://www.postgresql.org/message-id/9a51090e-e075-4f2f-e3a6-55ed4359a357%40kimmet.dk




Re: Fixing WAL instability in various TAP tests

2021-09-27 Thread Tom Lane
Mark Dilger  writes:
>> On Sep 27, 2021, at 1:19 PM, Tom Lane  wrote:
>> I'm a little inclined to re-enable the test without your other
>> changes, just to see what happens.

> That sounds like a good idea.  Even if it passes at first, I'd prefer to 
> leave it for a week or more to have a better sense of how stable it is.  
> Applying my patches too soon would just add more variables to a not-so-well 
> understood situation.

Done.  I shall retire to a safe viewing distance and observe the
buildfarm.

regards, tom lane




[PATCH] Print error when libpq-refs-stamp fails

2021-09-27 Thread Rachel Heaton
Hello,

While developing I got this error and it was difficult to figure out what
was going on.

Thanks to Jacob, I was able to learn the context of the failure, so we
created this small patch.


The text of the error message, of course, is up for debate, but hopefully
this will make it more clear to others.


Thank you,
Rachel Heaton


0001-Print-error-when-libpq-refs-stamp-fails.patch
Description: Binary data


Re: Support for NSS as a libpq TLS backend

2021-09-27 Thread Rachel Heaton
On Mon, Sep 20, 2021 at 2:38 AM Daniel Gustafsson  wrote:
>
> Rebased on top of HEAD with off-list comment fixes by Kevin Burke.
>

Hello Daniel,

I've been playing with your patch on Mac (OS 11.6 Big Sur) and have
run into a couple of issues so far.

1. I get 7 warnings while running make (truncated):
cryptohash_nss.c:101:21: warning: implicit conversion from enumeration
type 'SECOidTag' to different enumeration type 'HASH_HashType'
[-Wenum-conversion]
ctx->hash_type = SEC_OID_SHA1;
   ~ ^~~~
...
cryptohash_nss.c:134:34: warning: implicit conversion from enumeration
type 'HASH_HashType' to different enumeration type 'SECOidTag'
[-Wenum-conversion]
hash = SECOID_FindOIDByTag(ctx->hash_type);
   ~~~ ~^
7 warnings generated.

2. libpq-refs-stamp fails -- it appears an exit is being injected into
libpq on Mac

Notes about my environment:
I've installed nss via homebrew (at version 3.70) and linked it.

Cheers,
Rachel




Re: Fixing WAL instability in various TAP tests

2021-09-27 Thread Michael Paquier
On Mon, Sep 27, 2021 at 04:19:27PM -0400, Tom Lane wrote:
> I tried the same thing (i.e., re-enable bloom's TAP test) on my laptop
> just now, and it passed fine.  The laptop is not exactly the same
> as longfin was in 2018, but it ought to be close enough.  Not sure
> what to make of that --- maybe the failure is only intermittent,
> or else we fixed the underlying issue since then.

Honestly, I have no idea what change in the backend matters here.  And
it is not like bloom has changed in any significant way since d3c09b9.
--
Michael


signature.asc
Description: PGP signature


Re: typos (and more)

2021-09-27 Thread Michael Paquier
On Mon, Sep 27, 2021 at 06:04:02PM -0300, Alvaro Herrera wrote:
> Initially we just used "statistic" as a noun, which IIRC was already
> grammatically wrong (but I didn't know that and I think Tomas didn't
> either); later at some point when discussing how to use that noun in
> plural we realized this and argued that merely using "statistics" was
> even more wrong.  It was then that we started using the term "statistics
> object" with plural "statistics objects".  Going back to using just
> "statistics" is unlikely to have become correct; I think Justin's
> patches 0008 and 0012 are correct.

Thanks for confirming.

 if (list_length(pstate->p_rtable) != 1)
 ereport(ERROR,
 (errcode(ERRCODE_INVALID_COLUMN_REFERENCE),
- errmsg("statistics expressions can refer only to the table 
being indexed")));
+ errmsg("statistics expressions can refer only to the table 
being referenced")));
This part should be backpatched?  The code claims that this should
be dead code, so an elog() would be more appropriate, and the same can be
said about transformRuleStmt() and transformIndexStmt(), no?  That
would mean fewer messages to translate. 
--
Michael


signature.asc
Description: PGP signature


Re: typos (and more)

2021-09-27 Thread Justin Pryzby
On Mon, Sep 27, 2021 at 06:04:02PM -0300, Alvaro Herrera wrote:
> On 2021-Sep-27, Michael Paquier wrote:
> 
> > The use
> > of "statistic", "statistics" and "statistics object" in 0008 and 0012
> > is indeed inconsistent.  The latter term is the most used, but it
> > sounds a bit weird to me even if it refers to the DDL object
> > manipulated.  Simply using "statistics" would be tempting.
> 
> Initially we just used "statistic" as a noun, which IIRC was already
> grammatically wrong (but I didn't know that and I think Tomas didn't
> either); later at some point when discussing how to use that noun in
> plural we realized this and argued that merely using "statistics" was
> even more wrong.  It was then that we started using the term "statistics
> object" with plural "statistics objects".  Going back to using just
> "statistics" is unlikely to have become correct; I think Justin's
> patches 0008 and 0012 are correct.

Attached is an updated patch fixing more of the same.

-- 
Justin
>From 45c16e1cb6b3ddf007806f5942f3acf64a75edf7 Mon Sep 17 00:00:00 2001
From: Justin Pryzby 
Date: Sun, 12 Sep 2021 23:45:05 -0500
Subject: [PATCH] Refer to plural statistics OBJECTS

See also: f04c9a61468904b6815b2bc73a48878817766e0e
---
 src/backend/commands/statscmds.c| 14 +++---
 src/backend/statistics/extended_stats.c | 12 ++--
 src/backend/utils/adt/selfuncs.c| 12 ++--
 3 files changed, 19 insertions(+), 19 deletions(-)

diff --git a/src/backend/commands/statscmds.c b/src/backend/commands/statscmds.c
index 9938b65083..13987d9dcb 100644
--- a/src/backend/commands/statscmds.c
+++ b/src/backend/commands/statscmds.c
@@ -431,7 +431,7 @@ CreateStatistics(CreateStatsStmt *stmt)
 	 * similar to why we don't bother with extracting columns from
 	 * expressions. It's either expensive or very easy to defeat for
 	 * determined user, and there's no risk if we allow such statistics (the
-	 * statistics is useless, but harmless).
+	 * statistic is useless, but harmless).
 	 */
 	foreach(cell, stxexprs)
 	{
@@ -560,7 +560,7 @@ CreateStatistics(CreateStatsStmt *stmt)
 	}
 
 	/*
-	 * If there are no dependencies on a column, give the statistics an auto
+	 * If there are no dependencies on a column, give the statistics object an auto
 	 * dependency on the whole table.  In most cases, this will be redundant,
 	 * but it might not be if the statistics expressions contain no Vars
 	 * (which might seem strange but possible). This is consistent with what
@@ -649,7 +649,7 @@ AlterStatistics(AlterStatsStmt *stmt)
 	stxoid = get_statistics_object_oid(stmt->defnames, stmt->missing_ok);
 
 	/*
-	 * If we got here and the OID is not valid, it means the statistics does
+	 * If we got here and the OID is not valid, it means the statistics object does
 	 * not exist, but the command specified IF EXISTS. So report this as a
 	 * simple NOTICE and we're done.
 	 */
@@ -768,7 +768,7 @@ RemoveStatisticsById(Oid statsOid)
 }
 
 /*
- * Select a nonconflicting name for a new statistics.
+ * Select a nonconflicting name for a new statistics object.
  *
  * name1, name2, and label are used the same way as for makeObjectName(),
  * except that the label can't be NULL; digits will be appended to the label
@@ -815,7 +815,7 @@ ChooseExtendedStatisticName(const char *name1, const char *name2,
 }
 
 /*
- * Generate "name2" for a new statistics given the list of column names for it
+ * Generate "name2" for a new statistics object given the list of column names for it
  * This will be passed to ChooseExtendedStatisticName along with the parent
  * table name and a suitable label.
  *
@@ -869,8 +869,8 @@ ChooseExtendedStatisticNameAddition(List *exprs)
 }
 
 /*
- * StatisticsGetRelation: given a statistics's relation OID, get the OID of
- * the relation it is an statistics on.  Uses the system cache.
+ * StatisticsGetRelation: given a statistics object's OID, get the OID of
+ * the relation it is defined on.  Uses the system cache.
  */
 Oid
 StatisticsGetRelation(Oid statId, bool missing_ok)
diff --git a/src/backend/statistics/extended_stats.c b/src/backend/statistics/extended_stats.c
index 4c35223457..97f4ba3590 100644
--- a/src/backend/statistics/extended_stats.c
+++ b/src/backend/statistics/extended_stats.c
@@ -182,7 +182,7 @@ BuildRelationExtStatistics(Relation onerel, double totalrows,
 			continue;
 		}
 
-		/* compute statistics target for this statistics */
+		/* compute statistics target for this statistics object */
 		stattarget = statext_compute_stattarget(stat->stattarget,
 bms_num_members(stat->columns),
 stats);
@@ -195,7 +195,7 @@ BuildRelationExtStatistics(Relation onerel, double totalrows,
 		if (stattarget == 0)
 			continue;
 
-		/* evaluate expressions (if the statistics has any) */
+		/* evaluate expressions (if the statistics object has any) */
 		data = make_build_data(onerel, stat, numrows, rows, stats, stattarget);
 
 		/* compute statistic of each requested type */
@@ -257,7 +257

Re: stat() vs ERROR_DELETE_PENDING, round N + 1

2021-09-27 Thread Thomas Munro
On Thu, Sep 23, 2021 at 9:13 PM Juan José Santamaría Flecha
 wrote:
> On Thu, Sep 23, 2021 at 4:58 AM Thomas Munro  wrote:
>> One small detail I'd like to draw attention to is this bit, which
>> differs slightly from the [non-working] previous attempts by mapping
>> to two different errors:
>>
>> + * If there's no O_CREAT flag, then we'll pretend the file is
>> + * invisible.  With O_CREAT, we have no choice but to report that
>> + * there's a file in the way (which wouldn't happen on Unix).
>>
>> ...
>>
>> +		if (fileFlags & O_CREAT)
>> +			err = ERROR_FILE_EXISTS;
>> +		else
>> +			err = ERROR_FILE_NOT_FOUND;
>
>
> When GetTempFileName() finds a duplicated file name and the file is pending 
> for deletion, it fails with an "ERROR_ACCESS_DENIED" error code. That may 
> describe the situation better than "ERROR_FILE_EXISTS".

Thanks for looking.  Why do you think that's better?  I assume that's
just the usual NT->Win32 error conversion at work.

The only case I can think of so far in our tree where you'd notice
this change of errno for the O_CREAT case is relfilenode creation[1],
and there it's just a case of printing a different message.  Trying to
create a relfilenode that exists already in delete-pending state will
fail, but with this change we'll log the %m string for EEXIST instead
of EACCES (what you see today) or ENOENT (which seems nonsensical, "I
can't create your file because it doesn't exist", and what you'd get
with this patch if I didn't have the special case for O_CREAT).  So I
think that's pretty arguably an improvement.

As for how likely you are to reach that case... hmm, I don't know what
access() returns for a file in delete-pending state.  The docs say
"The function returns -1 if the named file does not exist or does not
have the given mode", so perhaps it returns 0 for such a case, in
which case we'll consider it a collision and keep searching for
another free relfilenode.  If that's the case, it's probably really
really unlikely you'll reach the case described in the previous
paragraph, so it probably doesn't matter much.

Do we have any other code paths where this finer point could cause
problems?  Looking around at code that handles EEXIST, most of it is
directory creation (unaffected by this patch), and then
src/port/mkdtemp.c for which this change is appropriate (it implements
POSIX mkdtemp(), which shouldn't report EACCES to its caller if
something it tries bumps into a delete-pending file, it should see
EEXIST and try a new name, which I think it will do with this patch,
through its call to open(O_CREAT | O_EXCL)).

[1] 
https://www.postgresql.org/message-id/flat/CA%2BhUKGJz_pZTF9mckn6XgSv69%2BjGwdgLkxZ6b3NWGLBCVjqUZA%40mail.gmail.com
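
For reference, the retry-on-EEXIST pattern relied on above looks roughly like
this (illustration only; choose_new_candidate_name() is a made-up helper):

    #include <errno.h>
    #include <fcntl.h>
    #include <stddef.h>

    extern void choose_new_candidate_name(char *path, size_t pathlen);

    static int
    create_unique_file(char *path, size_t pathlen)
    {
        for (;;)
        {
            int     fd;

            choose_new_candidate_name(path, pathlen);
            fd = open(path, O_CREAT | O_EXCL | O_RDWR, 0600);
            if (fd >= 0)
                return fd;          /* success */
            if (errno != EEXIST)
                return -1;          /* hard failure (what EACCES means today) */
            /* EEXIST, including delete-pending with the patch: retry */
        }
    }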




Re: [PATCH] Add `verify-system` sslmode to use system CA pool for server cert

2021-09-27 Thread Bruce Momjian
On Tue, Sep  7, 2021 at 12:58:44PM -0400, Tom Lane wrote:
> Yeah, that would mostly fix the usability concern.  I guess what it
> comes down to is whether you think that public or private certs are
> likely to be the majority use-case in the long run.  The shortage of
> previous requests for this feature says that right now, just about
> everyone is using self-signed or private-CA certs for Postgres
> servers.  So it would likely be a long time, if ever, before public-CA
> certs become the majority use-case.
> 
> On the other hand, even if I'm using a private CA, there's a lot
> to be said for adding its root cert to system-level trust stores
> rather than copying it into individual users' home directories.
> So I still feel like there's a pretty good case for allowing use
> of the system store to happen by default.  (As I said, I'd always
> thought that was *already* what would happen.)

I don't think public CAs are a good idea for complex setups, since
they open the ability for an external party to create certificates that
your server trusts, e.g., for certificate authentication.  I
can see public certs being useful for default installs where the client
_only_ wants to verify the server is valid.

-- 
  Bruce Momjian  https://momjian.us
  EDB  https://enterprisedb.com

  If only the physical world exists, free will is an illusion.





RE: Failed transaction statistics to measure the logical replication progress

2021-09-27 Thread osumi.takami...@fujitsu.com
On Monday, September 27, 2021 1:42 PM Amit Kapila  
wrote:
> On Wed, Sep 22, 2021 at 10:10 AM osumi.takami...@fujitsu.com
>  wrote:
> >
> > Just conducted some cosmetic changes
> > and rebased my patch, using v14 patch-set in [1].
> >
> 
> IIUC, this proposal will allow new xact stats for subscriptions via
> pg_stat_subscription. One thing that is not clear to me in this patch is that 
> why
> you choose a different way to store these stats than the existing stats in 
> that
> view? AFAICS, the other existing stats are stored in-memory in
> LogicalRepWorker whereas these new stats are stored/fetched via stats
> collector means these will persist. Isn't it better to be consistent here? I 
> am not
> sure which is a more appropriate way to store these stats and I would like to
> hear your and other's thoughts on that matter but it appears a bit awkward to
> me that some of the stats in the same view are persistent and others are
> in-memory.
Yeah, the existing stats values of pg_stat_subscription are in-memory.
I thought the xact stats should survive a restart, so that all accumulated
transaction values for one subscription can be summarized and shown to the
user. But your point is reasonable: mixing the two types can be awkward
and inconsistent.

Then, if we proceed in this direction, would the right place to implement
those stats be the LogicalRepWorker struct instead?

What also confuses me is that, in the other thread about the feature to
skip an xid, I wondered whether Sawada-san has already started to take
these xact stats into account (probably in his patch-set), because the
stats values this thread takes care of are listed there.
If that's true, that thread and this one are getting really close.
So, IIUC, we have to discuss this point as well.

Best Regards,
Takamichi Osumi



Re: typos (and more)

2021-09-27 Thread Michael Paquier
On Mon, Sep 27, 2021 at 07:50:02PM -0500, Justin Pryzby wrote:
> Attached is an updated patch fixing more of the same.

Does this include everything you have spotted, as well as everything
from the previous patches 0008 and 0012 posted?
--
Michael


signature.asc
Description: PGP signature


Re: Identify missing publications from publisher while create/alter subscription.

2021-09-27 Thread Jaime Casanova
On Thu, Aug 26, 2021 at 07:49:49PM +0530, vignesh C wrote:
> On Thu, Jul 15, 2021 at 5:57 PM vignesh C  wrote:
> >
> > On Tue, Jul 6, 2021 at 8:09 PM vignesh C  wrote:
> > >
> > > On Wed, Jun 30, 2021 at 8:23 PM vignesh C  wrote:
> > > >
> > > > On Sun, Jun 6, 2021 at 11:55 AM vignesh C  wrote:
> > > > >
> > > > > On Fri, May 7, 2021 at 6:44 PM vignesh C  wrote:
> > > > > >
> > > > > > Thanks for the comments, the attached patch has the fix for the 
> > > > > > same.
> > > > >
> 
> The patch was not applying on the head, attached patch which is rebased on 
> HEAD.
> 

Hi,

I'm testing this patch now. It doesn't apply cleanly, but the conflict is in
the documentation part, so while a rebase would be good it doesn't prevent me
from testing.

A couple of questions:

+check_publications(WalReceiverConn *wrconn, List *publications,
+  bool validate_publication)
[...]
+connect_and_check_pubs(Subscription *sub, List *publications,
+  bool validate_publication)

I wonder why validate_publication is passed as an argument just to
return if it's false; why not just test it before calling those
functions? Maybe it is just a matter of style.

+get_publications_str(List *publications, StringInfo dest, bool quote_literal)

what's the purpose of the quote_literal argument?

-- 
Jaime Casanova
Director de Servicios Profesionales
SystemGuards - Consultores de PostgreSQL




Re: Fixing WAL instability in various TAP tests

2021-09-27 Thread Tom Lane
Michael Paquier  writes:
> On Mon, Sep 27, 2021 at 04:19:27PM -0400, Tom Lane wrote:
>> I tried the same thing (i.e., re-enable bloom's TAP test) on my laptop
>> just now, and it passed fine.  The laptop is not exactly the same
>> as longfin was in 2018, but it ought to be close enough.  Not sure
>> what to make of that --- maybe the failure is only intermittent,
>> or else we fixed the underlying issue since then.

> Honestly, I have no idea what change in the backend matters here.  And
> it is not like bloom has changed in any significant way since d3c09b9.

I went so far as to check out 03faa4a8dd on longfin's host, and I find
that I cannot reproduce the failure shown at

https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=longfin&dt=2018-11-25+23%3A59%3A03

So that's the same hardware, and identical PG source tree, and different
results.  This seems to leave only two theories standing:

1.  It was a since-fixed macOS bug.  (Unlikely, especially if we also saw
it on other platforms.)

2.  The failure manifested only in the buildfarm, not under manual "make
check".  This is somewhat more plausible, especially since subsequent
buildfarm script changes might then explain why it went away.  But I have
no idea what the "subsequent script changes" might've been.

regards, tom lane




Re: typos (and more)

2021-09-27 Thread Justin Pryzby
On Tue, Sep 28, 2021 at 11:15:39AM +0900, Michael Paquier wrote:
> On Mon, Sep 27, 2021 at 07:50:02PM -0500, Justin Pryzby wrote:
> > Attached is an updated patch fixing more of the same.
> 
> Does this include everything you have spotted, as well as everything
> from the previous patches 0008 and 0012 posted?

That's an "expanded" version of 0008.

It doesn't include 0012, which is primarily about fixing incorrect references
to "index expressions" that should refer to stats expressions.  Naturally 0012
also uses the phrase "statistics objects", and fixes one nearby reference
that's not itself about indexes, which could arguably be in 0008 instead.

-- 
Justin




Re: [PATCH] psql: \dn+ to show size of each schema (and \dA+ for AMs)

2021-09-27 Thread Justin Pryzby
On Fri, Sep 17, 2021 at 12:05:04PM +0200, Pavel Stehule wrote:
> I can live with the proposed patch, and I understand why  ++ was
> introduced. But I am still not sure it is really user friendly. I prefer to
> extend \dA and \dn with some columns (\dA has only two columns and \dn has
> two columns too), and then we don't need special ++ variants for sizes.
> Using three levels of detail looks not too practical (more when the basic
> reports \dA and \dn) are really very simple).

You're suggesting to include the ACL+description in \dn, and the
handler+description in \dA.

Another option is to add pg_schema_size() and pg_am_size() without shortcuts in
psql.  That would avoid showing a potentially huge ACL when all one wants is
the schema size, and would serve my purposes well enough to write
| SELECT pg_namespace_size(oid), nspname FROM pg_namespace ORDER BY 1 DESC 
LIMIT 9;

-- 
Justin




Re: two_phase commit parameter used in subscription for a publication which is on < 15.

2021-09-27 Thread Amit Kapila
On Mon, Sep 27, 2021 at 11:40 PM Euler Taveira  wrote:
>
> On Mon, Sep 27, 2021, at 4:09 AM, tushar wrote:
>
> so are we silently ignoring this parameter as it is not supported on v14 ?
>
> Yes. Because two_phase is a supported parameter for v15 (your current
> subscriber).  The issue is that this parameter are not forwarded to publisher
> because its version (v14) does not support it. Since we do not have a
> connection before parse_subscription_options(), publisher server version is
> unknown. Hence, it does not know if that specific parameter is supported on
> publisher. I'm not sure it is worth parsing the options again after a
> replication connection is available just to check those parameters that don't
> work on all supported server versions.
>
> IMO we can provide messages during the connection (see
> libpqrcv_startstreaming()) instead of while executing CREATE/ALTER
> SUBSCRIPTION. Something like:
>
>  if (options->proto.logical.twophase &&
>  PQserverVersion(conn->streamConn) >= 15)
>  appendStringInfoString(&cmd, ", two_phase 'on'");
>  else if (options->proto.logical.twophase)
>  ereport(DEBUG1,
>  (errmsg_internal("parameter \"two_phase\" is not 
> supported on the publisher")));
>
> It is a DEBUG message because it can be annoying when the subscriber cannot
> connect to the publisher.
>
> The output plugin also raises an error if the subscriber sends the two_phase
> parameter. See pgoutput_startup(). The subscriber could probably send all
> parameters and the output plugin would be responsible to report an error. I
> think the author decided to not do it because it is not an user-friendly
> approach.
>

True, and the same behavior was already there for the 'binary' and
'streaming' options. Shall we document this instead of the DEBUG message,
or perhaps along with the DEBUG message?

-- 
With Regards,
Amit Kapila.




Re: pgcrypto support for bcrypt $2b$ hashes

2021-09-27 Thread Daniel Fone
Hi Daniel,

Thanks for the feedback.

> On 26/09/2021, at 12:09 AM, Daniel Gustafsson  wrote:
> 
> But 2b and 2a hashes aren't equal, although very similar.  2a should have the
> many-buggy to one-correct collision safety and 2b hashes shouldn't.  The fact
> that your hashes work isn't conclusive evidence.

I was afraid this might be a bit naive. Re-reading the crypt_blowfish release 
notes, it’s principally the changes introducing $2y$ into 1.2 that we need, 
with support for OpenBSD $2b$ introduced in 1.3. Do I understand this correctly?
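
For readers unfamiliar with the prefixes involved, a small illustration of what
pgcrypto produces today (the question in this thread is about verifying hashes
that carry the newer prefixes):

    -- Illustration only.
    SELECT gen_salt('bf', 8);                   -- e.g. '$2a$08$...'
    SELECT crypt('secret', gen_salt('bf', 8));  -- stores a $2a$ hash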

> Upgrading our crypt_blowfish.c to the upstream 1.3 version would be the 
> correct
> fix IMO, but since we have a few local modifications it's not a drop-in.  I
> don't think it would be too hairy, but one needs to be very careful when
> dealing with crypto.

My C experience is limited, but I can make an initial attempt if the effort 
would be worthwhile. Is this realistically a patch that a newcomer to the 
codebase should attempt?

> Actually it is, in table F.16 in the below documentation page we refer to our
> supported level as "Blowfish-based, variant 2a”.

Sorry I wasn’t clear. My point was that the docs only mention $2a$, and $2x$ 
isn’t mentioned even though pgcrypto supports it. As part of the upgrade to 
1.3, perhaps the docs can be updated to mention variants x, y, and b as well.

Thanks,

Daniel






Re: Column Filtering in Logical Replication

2021-09-27 Thread Amit Kapila
On Mon, Sep 27, 2021 at 5:53 PM Alvaro Herrera  wrote:
>
> On 2021-Sep-27, Amit Kapila wrote:
>
> > I am not sure what makes you say that we can't distinguish the above
> > cases when there is already a separate rule for CURRENT_SCHEMA? I
> > think you can distinguish by tracking the previous objects as we are
> > already doing in the patch. But one thing that is not clear to me is
> > is the reason to introduce a new type PUBLICATIONOBJ_CURRSCHEMA when
> > we use PUBLICATIONOBJ_REL_IN_SCHEMA and PUBLICATIONOBJ_CONTINUATION to
> > distinguish all cases of CURRENT_SCHEMA. Alvaro might have something
> > in mind for this which is not apparent and that might have caused
> > confusion to you as well?
>
> My issue is what happens if you have a schema that is named
> CURRENT_SCHEMA.  In the normal case where you do ALL TABLES IN SCHEMA
> "CURRENT_SCHEMA" you would end up with a String containing
> "CURRENT_SCHEMA", so how do you distinguish that from ALL TABLES IN
> SCHEMA CURRENT_SCHEMA, which does not refer to the schema named
> "CURRENT_SCHEMA" but in Vignesh's proposal also uses a String containing
> "CURRENT_SCHEMA"?
>
> Now you could say "but who would be stupid enough to do that??!",
>

But it is not allowed to create a schema or table with the name
CURRENT_SCHEMA, so I am not sure we need to do anything special for it.
However, if we want to handle it as a separate enum then the handling
would be something like:

| ALL TABLES IN_P SCHEMA CURRENT_SCHEMA
        {
            $$ = makeNode(PublicationObjSpec);
            $$->pubobjtype = PUBLICATIONOBJ_CURRSCHEMA;
        }
...
...
| CURRENT_SCHEMA
        {
            $$ = makeNode(PublicationObjSpec);
            $$->pubobjtype = PUBLICATIONOBJ_CONTINUATION;
        }
;

Now, during post-processing, PUBLICATIONOBJ_CONTINUATION will be
distinguished as CURRENT_SCHEMA because both rangeVar and name will be
NULL. Do you have other ideas to deal with it? Vignesh has already
pointed out in his email [1] why we can't keep pubobjtype as
PUBLICATIONOBJ_CURRSCHEMA in the second case, so I used
PUBLICATIONOBJ_CONTINUATION.

[1] - 
https://www.postgresql.org/message-id/CALDaNm06shp%2BALwC2s-dV-S4k2o6bcmXnXGX4ETkoXxKHQfjfA%40mail.gmail.com

-- 
With Regards,
Amit Kapila.




Re: Add jsonlog log_destination for JSON server logs

2021-09-27 Thread Michael Paquier
On Fri, Sep 17, 2021 at 04:36:57PM -0400, Sehrope Sarkuni wrote:
> It was helpful to split them out while working on the patch but I see your
> point upon re-reading through the result.
> 
> Attached version (renamed to 002) adds only three static functions for
> checking rotation size, performing the actual rotation, and closing. Also
> one other to handle generating the logfile names with a suffix to handle
> the pfree-ing (wrapping logfile_open(...)).
> 
> The rest of the changes happen in place using the new structures.

I have looked at that in detail, and found that the introduction of
FileLogDestination makes the code harder to follow, and that the
introduction of the file extension, the destination name and the
expected target destination LOG_DESTINATION_* had a limited impact
because they are used in few places.  The last two useful pieces are
the FILE* handle and the last file name for current_logfiles.

Attached are updated patches.  The logic of 0001 to refactor the fd
fetch/save logic when forking the syslogger in EXEC_BACKEND builds is
unchanged.  I have tweaked the patch with more comments and different
routine names though.  Patch 0002 refactors the main point that
introduced FileLogDestination by refactoring the per-destination file
rotation, not forgetting the fact that the file last name and handle
for stderr can never be cleaned up even if LOG_DESTINATION_STDERR is
disabled.  Grepping for LOG_DESTINATION_CSVLOG in the code tree, I'd
be fine living with this level of abstraction, as the per-destination
changes are grouped with each other so they are hard to miss.

0001 is in rather committable shape, and I have made the code
consistent with HEAD.  However, I think that its handling of
_get_osfhandle() is clunky for 64-bit compilations: long is 32 bits on
WIN32 but intptr_t is platform-dependent (it could be 32 or 64 bits), so
atoi() would overflow if the handle is larger than INT_MAX on 64-bit
builds:
https://docs.microsoft.com/en-us/cpp/c-runtime-library/standard-types
This problem deserves a different thread.

It would be good for 0002 if an extra pair of eyes looks at it.  While
on it, I have renamed the existing last_file_name to
last_sys_file_name in 0002 to make the naming more consistent with
syslogFile.  It is independent of 0001, so it could be done first as
well.
--
Michael
From 57bcaf49bd10ec3e6afb14f3ec384cb88a65efe6 Mon Sep 17 00:00:00 2001
From: Michael Paquier 
Date: Tue, 28 Sep 2021 12:28:21 +0900
Subject: [PATCH v7 1/2] Refactor fd handling when forking syslogger in
 EXEC_BACKEND build

---
 src/backend/postmaster/syslogger.c | 122 +++--
 1 file changed, 62 insertions(+), 60 deletions(-)

diff --git a/src/backend/postmaster/syslogger.c b/src/backend/postmaster/syslogger.c
index bca3883572..ddb47e3cb0 100644
--- a/src/backend/postmaster/syslogger.c
+++ b/src/backend/postmaster/syslogger.c
@@ -130,6 +130,8 @@ static volatile sig_atomic_t rotation_requested = false;
 
 /* Local subroutines */
 #ifdef EXEC_BACKEND
+static long syslogger_fdget(FILE *file);
+static FILE *syslogger_fdopen(int fd);
 static pid_t syslogger_forkexec(void);
 static void syslogger_parseArgs(int argc, char *argv[]);
 #endif
@@ -733,6 +735,60 @@ SysLogger_Start(void)
 
 #ifdef EXEC_BACKEND
 
+/*
+ * syslogger_fdget() -
+ *
+ * Utility wrapper to grab the file descriptor of an opened error output
+ * file.  Used when building the command to fork the logging collector.
+ */
+static long
+syslogger_fdget(FILE *file)
+{
+#ifndef WIN32
+	if (file != NULL)
+		return (long) fileno(file);
+	else
+		return -1;
+#else			/* WIN32 */
+	if (file != NULL)
+		return (long) _get_osfhandle(_fileno(file));
+	else
+		return 0;
+#endif			/* WIN32 */
+}
+
+/*
+ * syslogger_fdopen() -
+ *
+ * Utility wrapper to re-open an error output file, using the given file
+ * descriptor.  Used when parsing arguments in a forked logging collector.
+ */
+static FILE *
+syslogger_fdopen(int fd)
+{
+	FILE	   *file = NULL;
+
+#ifndef WIN32
+	if (fd != -1)
+	{
+		file = fdopen(fd, "a");
+		setvbuf(file, NULL, PG_IOLBF, 0);
+	}
+#else			/* WIN32 */
+	if (fd != 0)
+	{
+		fd = _open_osfhandle(fd, _O_APPEND | _O_TEXT);
+		if (fd > 0)
+		{
+			file = fdopen(fd, "a");
+			setvbuf(file, NULL, PG_IOLBF, 0);
+		}
+	}
+#endif			/* WIN32 */
+
+	return file;
+}
+
 /*
  * syslogger_forkexec() -
  *
@@ -751,34 +807,11 @@ syslogger_forkexec(void)
 	av[ac++] = NULL;			/* filled in by postmaster_forkexec */
 
 	/* static variables (those not passed by write_backend_variables) */
-#ifndef WIN32
-	if (syslogFile != NULL)
-		snprintf(filenobuf, sizeof(filenobuf), "%d",
- fileno(syslogFile));
-	else
-		strcpy(filenobuf, "-1");
-#else			/* WIN32 */
-	if (syslogFile != NULL)
-		snprintf(filenobuf, sizeof(filenobuf), "%ld",
- (long) _get_osfhandle(_fileno(syslogFile)));
-	else
-		strcpy(filenobuf, "0");
-#endif			/* WIN32 */
+	snprintf(filenobuf, sizeof(filenobuf), "%ld",
+			 syslogger_fdget(syslogFile));
 	a

Incorrect fd handling in syslogger.c for Win64 under EXEC_BACKEND

2021-09-27 Thread Michael Paquier
Hi all,

While reviewing a patch that refactors syslogger.c, I noticed that we use the
following code to pass down a HANDLE to a forked syslogger, in
syslogger_forkexec():
if (syslogFile != NULL)
snprintf(filenobuf, sizeof(filenobuf), "%ld",
 (long) _get_osfhandle(_fileno(syslogFile)));

Then, in the kicked syslogger, the parsing is done as follows in
syslogger_parseArgs() for WIN32, with a simple atoi():
fd = atoi(*argv++);

_get_osfhandle() returns intptr_t whose size is system-dependent, as
it would be 32b for Win32 and 64b for Win64:
https://docs.microsoft.com/en-us/cpp/c-runtime-library/reference/get-osfhandle
https://docs.microsoft.com/en-us/cpp/c-runtime-library/standard-types

As long is 4 bytes on Windows, we would run into overflows here if the
handle is out of the normal 32b range.  So the logic as coded is fine
for Win32, but it could be wrong under Win64.

Am I missing something obvious?  One thing that we could do here is
to do the parsing with pg_lltoa() while printing the argument with
INT64_FORMAT, no?
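
As a sketch of that direction (not a proposed patch; INT64_FORMAT and int64
are the usual PostgreSQL macros, and using strtoll() for the parse side is my
assumption here):

    snprintf(filenobuf, sizeof(filenobuf), INT64_FORMAT,
             (int64) _get_osfhandle(_fileno(syslogFile)));

    /* ... and in syslogger_parseArgs(), instead of atoi(): */
    fd = _open_osfhandle((intptr_t) strtoll(*argv++, NULL, 10),
                         _O_APPEND | _O_TEXT);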

Thoughts?
--
Michael


signature.asc
Description: PGP signature


Re: Column Filtering in Logical Replication

2021-09-27 Thread Amit Kapila
On Mon, Sep 27, 2021 at 6:41 PM Rahila Syed  wrote:
>
>>
>>
>> I don't think there is one.  I think the latest is what I posted in
>> https://postgr.es/m/202109061751.3qz5xpugwx6w@alvherre.pgsql (At least I
>> don't see any reply from Rahila with attachments after that), but that
>> wasn't addressing a bunch of review comments that had been made; and I
>> suspect that Amit K has already committed a few conflicting patches
>> after that.
>>
> Yes, the v5 version of the patch attached by Alvaro is the latest one.
> IIUC, the review comments that are yet to be addressed apart from the ongoing 
> grammar
> discussion, are as follows:
>
> 1. Behaviour on dropping a column from the table, that is a part of column 
> filter.
> In the latest patch, the entire table is dropped from publication on dropping 
> a column
> that is a part of the column filter. However, there is preference for another 
> approach
> to drop just the column from the filter on DROP column CASCADE(continue to 
> filter
> the other columns), and an error for DROP RESTRICT.
>

I am not sure if we can do this, as I pointed out in one of the
previous emails [1]. Additionally, I think you might want to take some
action if the replica identity is changed, as requested in the same
email [1].

[1] - 
https://www.postgresql.org/message-id/CAA4eK1KCGF43pfLv8%2BmixcTMs%3DNkd6YdWL53LhiT1DvnuTg01g%40mail.gmail.com

-- 
With Regards,
Amit Kapila.




Re: row filtering for logical replication

2021-09-27 Thread Amit Kapila
On Mon, Sep 27, 2021 at 7:19 PM Euler Taveira  wrote:
>
> On Mon, Sep 27, 2021, at 10:34 AM, Tomas Vondra wrote:
>
> Hi,
>
> I see no one responded to this important part of my review so far:
>
> I'm still preparing a new patch and a summary.
>
> Am I the only one finding the current behavior strange? What's the
> reasoning supporting the current approach?
>
> I think it is an oversight from my side. It used to work the way you mentioned
> but I changed it. I'll include this change in the next patch.
>

+1.

-- 
With Regards,
Amit Kapila.




Re: [PATCH] psql: \dn+ to show size of each schema (and \dA+ for AMs)

2021-09-27 Thread Pavel Stehule
On Tue, Sep 28, 2021 at 4:46 AM Justin Pryzby 
wrote:

> On Fri, Sep 17, 2021 at 12:05:04PM +0200, Pavel Stehule wrote:
> > I can live with the proposed patch, and I understand why  ++ was
> > introduced. But I am still not sure it is really user friendly. I prefer
> to
> > extend \dA and \dn with some columns (\dA has only two columns and \dn
> has
> > two columns too), and then we don't need special ++ variants for sizes.
> > Using three levels of detail looks not too practical (more when the basic
> > reports \dA and \dn) are really very simple).
>
> You're suggesting to include the ACL+description in \dn and
> handler+description
> and \dA.
>

yes


> Another option is to add pg_schema_size() and pg_am_size() without
> shortcuts in
> psql.  That would avoid showing a potentially huge ACL when all one wants
> is
> the schema size, and would serve my purposes well enough to write
> | SELECT pg_namespace_size(oid), nspname FROM pg_namespace ORDER BY 1 DESC
> LIMIT 9;
>

It can work too.

I think the long ACL is an issue of the customer's design, but it can happen.
The same problem exists in \dt+, though, and I don't see an objection against
that design.

Maybe I am too subjective, because I have been using pspg for 4 years, and
wide reports are not a problem for me. When the size is at the end, it is easy
to see it in pspg.

I would like to see the size in the \dn+ report, and I would like to be able
to use pg_namespace_size separately too. Both can be very practical.

I think \dt+ and \l+ work very well now, and I am not too happy to break
them (even partially), although the proposed change is very minimalistic.

But your example "SELECT pg_namespace_size(oid), nspname FROM pg_namespace
ORDER BY 1 DESC LIMIT 9" leads me to a second idea (which just enhances the
previous one). It could be nice to have prepared views on the server side
that are more or less equivalent to the psql reports, so that anybody can
simply write their own custom reports.

something like

SELECT schema, tablename, owner, pg_size_pretty(size) FROM
pg_description.tables ORDER BY size DESC LIMIT 10
SELECT schema, owner, pg_size_pretty(size) FROM pg_description.schemas
ORDER BY size DESC LIMIT 10

In the future, this could simplify the psql code, and it would allow pretty
nice customization in any client for a lot of purposes.
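
As a rough illustration of that idea (not part of the patch, and using only
existing functions; the proposal would add pg_namespace_size()/pg_am_size()
or server-side views wrapping something like this):

SELECT n.nspname AS schema,
       pg_size_pretty(sum(pg_total_relation_size(c.oid))) AS size
FROM pg_class c
     JOIN pg_namespace n ON n.oid = c.relnamespace
WHERE c.relkind IN ('r', 'm')   -- indexes and TOAST are already counted
                                -- by pg_total_relation_size()
GROUP BY n.nspname
ORDER BY sum(pg_total_relation_size(c.oid)) DESC
LIMIT 10;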

Regards

Pavel





> --
> Justin
>


Re: POC: Cleaning up orphaned files using undo logs

2021-09-27 Thread Amit Kapila
On Mon, Sep 27, 2021 at 7:43 PM Antonin Houska  wrote:
>
> Amit Kapila  wrote:
>
> Although the postgres core probably does not raise FATAL errors too often (OOM
> conditions seem to be the typical cause), I'm still not enthusiastic about the
> idea that the undo feature turns such errors into PANIC.
>
> I wonder what the reason for avoiding undoing a transaction on FATAL is. If
> it's about the possibly long duration of the undo execution, deletion of
> orphaned files (relations or whole databases) via undo shouldn't make things
> worse, because currently FATAL also triggers this sort of cleanup
> immediately; it's just implemented in different ways.
>

During FATAL, we don't want to perform more operations that could make
the situation worse. Say we are already short of memory (OOM); undo
execution trying to allocate further memory won't do any good. Depending
on the implementation, undo execution might sometimes need to perform WAL
writes or data writes, which we don't want to do during FATAL error
processing.

-- 
With Regards,
Amit Kapila.




Re: Failed transaction statistics to measure the logical replication progress

2021-09-27 Thread Amit Kapila
On Tue, Sep 28, 2021 at 7:25 AM osumi.takami...@fujitsu.com
 wrote:
>
> On Monday, September 27, 2021 1:42 PM Amit Kapila  
> wrote:
> > On Wed, Sep 22, 2021 at 10:10 AM osumi.takami...@fujitsu.com
> >  wrote:
> > >
> > > Just conducted some cosmetic changes
> > > and rebased my patch, using v14 patch-set in [1].
> > >
> >
> > IIUC, this proposal will allow new xact stats for subscriptions via
> > pg_stat_subscription. One thing that is not clear to me in this patch is 
> > that why
> > you choose a different way to store these stats than the existing stats in 
> > that
> > view? AFAICS, the other existing stats are stored in-memory in
> > LogicalRepWorker whereas these new stats are stored/fetched via stats
> > collector means these will persist. Isn't it better to be consistent here? 
> > I am not
> > sure which is a more appropriate way to store these stats and I would like 
> > to
> > hear your and other's thoughts on that matter but it appears a bit awkward 
> > to
> > me that some of the stats in the same view are persistent and others are
> > in-memory.
> Yeah, the existing stats values of pg_stat_subscription are in-memory.
> I thought the xact stats should survive a restart, to summarize and show
> all cumulative transaction values for one subscription to the user. But
> your point is reasonable; mixing the two types can be awkward and lacks
> consistency.
>
> Then, if we proceed in this direction, the place to implement those stats
> would be the LogicalRepWorker struct instead?
>

Or, we can make existing stats persistent and then add these stats on
top of it. Sawada-San, do you have any thoughts on this matter?
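
For reference, the existing per-subscription values under discussion are the
in-memory ones currently visible in pg_stat_subscription, for example:

SELECT subname, received_lsn, latest_end_lsn, last_msg_receipt_time
FROM pg_stat_subscription;

These are reset when the apply worker (or the server) restarts, which is the
behavior being contrasted here with persistent, stats-collector-based
counters.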

> One thing that confuses me is that, in another thread about the feature to
> skip an xid, I wondered whether Sawada-san has already started to take
> those xact stats into account (probably in his patch set), because the
> stats values this thread takes care of are listed in that thread.
>

I don't think the Skip Xid patch is going to take care of these
additional stats but Sawada-San can confirm.

-- 
With Regards,
Amit Kapila.




Re: Incorrect fd handling in syslogger.c for Win64 under EXEC_BACKEND

2021-09-27 Thread Michael Paquier
On Tue, Sep 28, 2021 at 12:41:40PM +0900, Michael Paquier wrote:
> Am I missing something obvious?  One thing that we could do here is
> to do the parsing with pg_lltoa() while printing the argument with
> INT64_FORMAT, no?

I wrote that a bit too quickly.  After looking at it, what we could
use to parse the handle pointer is scanint8() instead, even if that's
a bit ugly.  I also found the code a bit confused about "fd", which
could be manipulated as either an int or an intptr_t, so something like
the attached should improve the situation.

Opinions welcome.
--
Michael
diff --git a/src/backend/postmaster/syslogger.c b/src/backend/postmaster/syslogger.c
index bca3883572..9e897e5b17 100644
--- a/src/backend/postmaster/syslogger.c
+++ b/src/backend/postmaster/syslogger.c
@@ -50,6 +50,7 @@
 #include "storage/pg_shmem.h"
 #include "tcop/tcopprot.h"
 #include "utils/guc.h"
+#include "utils/int8.h"
 #include "utils/ps_status.h"
 #include "utils/timestamp.h"
 
@@ -759,8 +760,8 @@ syslogger_forkexec(void)
 		strcpy(filenobuf, "-1");
 #else			/* WIN32 */
 	if (syslogFile != NULL)
-		snprintf(filenobuf, sizeof(filenobuf), "%ld",
- (long) _get_osfhandle(_fileno(syslogFile)));
+		snprintf(filenobuf, sizeof(filenobuf), INT64_FORMAT,
+ (int64) _get_osfhandle(_fileno(syslogFile)));
 	else
 		strcpy(filenobuf, "0");
 #endif			/* WIN32 */
@@ -774,8 +775,8 @@ syslogger_forkexec(void)
 		strcpy(csvfilenobuf, "-1");
 #else			/* WIN32 */
 	if (csvlogFile != NULL)
-		snprintf(csvfilenobuf, sizeof(csvfilenobuf), "%ld",
- (long) _get_osfhandle(_fileno(csvlogFile)));
+		snprintf(csvfilenobuf, sizeof(csvfilenobuf), INT64_FORMAT,
+ (int64) _get_osfhandle(_fileno(csvlogFile)));
 	else
 		strcpy(csvfilenobuf, "0");
 #endif			/* WIN32 */
@@ -795,7 +796,11 @@ syslogger_forkexec(void)
 static void
 syslogger_parseArgs(int argc, char *argv[])
 {
+#ifndef WIN32
 	int			fd;
+#else
+	intptr_t	handle;
+#endif
 
 	Assert(argc == 5);
 	argv += 3;
@@ -821,20 +826,24 @@ syslogger_parseArgs(int argc, char *argv[])
 		setvbuf(csvlogFile, NULL, PG_IOLBF, 0);
 	}
 #else			/* WIN32 */
-	fd = atoi(*argv++);
-	if (fd != 0)
+	(void) scanint8(*argv++, false, (int64 *) &handle);
+	if (handle != 0)
 	{
-		fd = _open_osfhandle(fd, _O_APPEND | _O_TEXT);
+		int		fd;
+
+		fd = _open_osfhandle(handle, _O_APPEND | _O_TEXT);
 		if (fd > 0)
 		{
 			syslogFile = fdopen(fd, "a");
 			setvbuf(syslogFile, NULL, PG_IOLBF, 0);
 		}
 	}
-	fd = atoi(*argv++);
-	if (fd != 0)
+	(void) scanint8(*argv++, false, (int64 *) &handle);
+	if (handle != 0)
 	{
-		fd = _open_osfhandle(fd, _O_APPEND | _O_TEXT);
+		int		fd;
+
+		fd = _open_osfhandle(handle, _O_APPEND | _O_TEXT);
 		if (fd > 0)
 		{
 			csvlogFile = fdopen(fd, "a");




Re: Failed transaction statistics to measure the logical replication progress

2021-09-27 Thread Masahiko Sawada
On Tue, Sep 28, 2021 at 1:54 PM Amit Kapila  wrote:
>
> On Tue, Sep 28, 2021 at 7:25 AM osumi.takami...@fujitsu.com
>  wrote:
> >
> > On Monday, September 27, 2021 1:42 PM Amit Kapila  
> > wrote:
> > > On Wed, Sep 22, 2021 at 10:10 AM osumi.takami...@fujitsu.com
> > >  wrote:
> > > >
> > > > Just conducted some cosmetic changes
> > > > and rebased my patch, using v14 patch-set in [1].
> > > >
> > >
> > > IIUC, this proposal will allow new xact stats for subscriptions via
> > > pg_stat_subscription. One thing that is not clear to me in this patch is 
> > > that why
> > > you choose a different way to store these stats than the existing stats 
> > > in that
> > > view? AFAICS, the other existing stats are stored in-memory in
> > > LogicalRepWorker whereas these new stats are stored/fetched via stats
> > > collector means these will persist. Isn't it better to be consistent 
> > > here? I am not
> > > sure which is a more appropriate way to store these stats and I would 
> > > like to
> > > hear your and other's thoughts on that matter but it appears a bit 
> > > awkward to
> > > me that some of the stats in the same view are persistent and others are
> > > in-memory.
> > Yeah, the existing stats values of pg_stat_subscription are in-memory.
> > I thought the xact stats should survive a restart, to summarize and show
> > all cumulative transaction values for one subscription to the user. But
> > your point is reasonable; mixing the two types can be awkward and lacks
> > consistency.

I remember that we discussed a similar thing when discussing the
pg_stat_replication_slots view [1]. The slot statistics such as
spill_txns, spill_count, and spill_bytes were originally shown in
pg_stat_replication, mixing cumulative counters and dynamic counters.
But we concluded that we should separate these cumulative values from the
pg_stat_replication view and introduce a new pg_stat_replication_slots
view.
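
For context, those cumulative counters are what the separate view exposes
today, queried with something like (column names as in the PostgreSQL 14
view):

SELECT slot_name, spill_txns, spill_count,
       pg_size_pretty(spill_bytes) AS spilled
FROM pg_stat_replication_slots;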

> >
> > Then, if we proceed in this direction, the place to implement those stats
> > would be the LogicalRepWorker struct instead?
> >
>
> Or, we can make existing stats persistent and then add these stats on
> top of it. Sawada-San, do you have any thoughts on this matter?

I think that making the existing stats, including received_lsn and
last_msg_receipt_time, persistent by using the stats collector could cause
a massive number of reporting messages. We could report these messages at a
certain interval to reduce the amount of traffic, but then we would end up
seeing old stats in the view. Another idea could be to have a separate
view, say pg_stat_subscription_xact, but I'm not sure it's a better
idea.
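
Just to make that idea concrete, such a separate view might expose only the
cumulative transaction counters, along these lines (all column names below
are purely hypothetical and not taken from any patch):

SELECT subname, commit_count, abort_count, error_count
FROM pg_stat_subscription_xact;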

>
> > One thing that confuses me is that, in another thread about the feature
> > to skip an xid, I wondered whether Sawada-san has already started to take
> > those xact stats into account (probably in his patch set), because the
> > stats values this thread takes care of are listed in that thread.
> >
>
> I don't think the Skip Xid patch is going to take care of these
> additional stats but Sawada-San can confirm.

Yes, I don't take care of these additional stats discussed on this
thread. The reason I recently mentioned these statistics on the
skip_xid thread is that, since I thought this patch would be built
on top of the subscription error reporting patch that I'm proposing, I
discussed the extensibility of my patch.

Regards,

[1] 
https://www.postgresql.org/message-id/CABUevEwayASgA2GHAQx%3DVtCp6OQh5PVfem6JDQ97gFHka%3D6n1w%40mail.gmail.com

--
Masahiko Sawada
EDB:  https://www.enterprisedb.com/




Re: Gather performance analysis

2021-09-27 Thread Dilip Kumar
On Mon, Sep 27, 2021 at 10:52 PM Robert Haas  wrote:

>
> And most of the time, that's probably a good bet. But, if you do
> somehow hit the losing case repeatedly, then you could see a
> significant regression. And that might explain Tomas's results.
> Perhaps for some reason they just happen to hit that case over and
> over again. If that's true, it would be useful to know why it happens
> in that case and not others, because then maybe we could avoid the
> problem somehow. However, I'm not sure how to figure that out, and I'm
> not even entirely sure it's important to figure it out.
>

Yeah, if it is losing in some cases then it is definitely good to know
the reason. I was just looking into the performance numbers shared by
Tomas in query-results, and I can see the worst case is with 1000 rows,
10 columns, 4 threads and a 64k queue size. Basically, the execution time
with head is ~804 ms whereas with the patch it is ~1277 ms. But then I
tried to see the pattern across the different queue sizes, so the numbers
look like this (execution time in ms):

             16k        64k       256k      1024k
Head     1232.779     804.24   1134.723    901.257
Patch    1371.493   1277.705    862.598    783.481

What I have noticed is that in most of the cases, on head as well as with
the patch, increasing the queue size makes it faster. But with head, for
this particular combination of rows, columns and threads, the execution
time is suddenly very low for the 64k queue size, then increases again
with the 256k queue size, and then follows the usual pattern. This
particular dip in the execution time on head looks a bit suspicious to me;
I am not sure how we can justify this sudden big dip relative to the
overall pattern.
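
For anyone trying to reproduce this, a rough sketch of the kind of test
being discussed might look like the following. The exact dataset, row count
and settings used by Tomas are not shown in this thread, and the
16k/64k/256k/1024k variants presumably come from rebuilding with different
values of PARALLEL_TUPLE_QUEUE_SIZE, which is a compile-time constant:

CREATE TABLE gather_test AS
SELECT i AS c1, i AS c2, i AS c3, i AS c4, i AS c5,
       i AS c6, i AS c7, i AS c8, i AS c9, i AS c10
FROM generate_series(1, 1000000) AS s(i);   -- row count is a placeholder

SET max_parallel_workers_per_gather = 4;
SET parallel_setup_cost = 0;
SET parallel_tuple_cost = 0;
SET min_parallel_table_scan_size = 0;

EXPLAIN (ANALYZE) SELECT * FROM gather_test;  -- all rows flow through Gather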


-- 
Regards,
Dilip Kumar
EnterpriseDB: http://www.enterprisedb.com