Re: [PATCH] postgres_fdw connection caching - cause remote sessions linger till the local session exit

2021-01-04 Thread Bharath Rupireddy
On Sat, Jan 2, 2021 at 11:19 AM Bharath Rupireddy
 wrote:
> I'm sorry for the mess. I missed adding the new files into the v6-0001
> patch. Please ignore the v6 patch set and consider the v7 patch set for
> further review. Note that 0002 and 0003 patches have no difference
> from v5 patch set.

It seems the cf bot was failing on the v7 patches. On Linux, it failed
while building the documentation in the 0001 patch; I corrected that.  On
FreeBSD, it failed in one of the test cases I added because the test was
unstable; I have corrected that as well.

Attaching v8 patch set. Hopefully, cf bot will be happy with v8.

Please consider the v8 patch set for further review.

With Regards,
Bharath Rupireddy.
EnterpriseDB: http://www.enterprisedb.com
From f784809cd81ed18ecbd1b3c7e87818a00b671fa0 Mon Sep 17 00:00:00 2001
From: Bharath Rupireddy 
Date: Tue, 5 Jan 2021 12:36:16 +0530
Subject: [PATCH v8 1/3] postgres_fdw function to discard cached connections

This patch introduces a new function, postgres_fdw_disconnect(), which,
when called with a foreign server name, discards the cached connections
associated with that server. When called without any argument, it
discards all existing cached connections.

This patch also adds another function, postgres_fdw_get_connections(),
to list all cached connections along with their corresponding foreign
server names.
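
For illustration, a rough usage sketch of the two functions as described
above (the server name 'loopback' and the exact result columns are only
assumptions here; the patch itself defines the signatures and output):

    -- list the cached connections and their foreign server names
    SELECT * FROM postgres_fdw_get_connections();
    -- discard the cached connections for one server
    SELECT postgres_fdw_disconnect('loopback');
    -- discard all cached connections
    SELECT postgres_fdw_disconnect();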
---
 contrib/postgres_fdw/Makefile |   2 +-
 contrib/postgres_fdw/connection.c | 313 -
 .../postgres_fdw/expected/postgres_fdw.out| 424 +-
 ...dw--1.0.sql => postgres_fdw--1.0--1.1.sql} |   6 +-
 contrib/postgres_fdw/postgres_fdw--1.1.sql|  33 ++
 contrib/postgres_fdw/postgres_fdw.control |   2 +-
 contrib/postgres_fdw/sql/postgres_fdw.sql | 171 +++
 doc/src/sgml/postgres-fdw.sgml|  54 +++
 8 files changed, 982 insertions(+), 23 deletions(-)
 rename contrib/postgres_fdw/{postgres_fdw--1.0.sql => postgres_fdw--1.0--1.1.sql} (60%)
 create mode 100644 contrib/postgres_fdw/postgres_fdw--1.1.sql

diff --git a/contrib/postgres_fdw/Makefile b/contrib/postgres_fdw/Makefile
index ee8a80a392..52d3dac0bd 100644
--- a/contrib/postgres_fdw/Makefile
+++ b/contrib/postgres_fdw/Makefile
@@ -14,7 +14,7 @@ PG_CPPFLAGS = -I$(libpq_srcdir)
 SHLIB_LINK_INTERNAL = $(libpq)
 
 EXTENSION = postgres_fdw
-DATA = postgres_fdw--1.0.sql
+DATA = postgres_fdw--1.1.sql postgres_fdw--1.0--1.1.sql
 
 REGRESS = postgres_fdw
 
diff --git a/contrib/postgres_fdw/connection.c b/contrib/postgres_fdw/connection.c
index 266f66cc62..64e1d71514 100644
--- a/contrib/postgres_fdw/connection.c
+++ b/contrib/postgres_fdw/connection.c
@@ -22,6 +22,7 @@
 #include "postgres_fdw.h"
 #include "storage/fd.h"
 #include "storage/latch.h"
+#include "utils/builtins.h"
 #include "utils/datetime.h"
 #include "utils/hsearch.h"
 #include "utils/inval.h"
@@ -57,6 +58,8 @@ typedef struct ConnCacheEntry
 	bool		have_error;		/* have any subxacts aborted in this xact? */
 	bool		changing_xact_state;	/* xact state change in process */
 	bool		invalidated;	/* true if reconnect is pending */
+	/* Server oid to get the associated foreign server name. */
+	Oid			serverid;
 	uint32		server_hashvalue;	/* hash value of foreign server OID */
 	uint32		mapping_hashvalue;	/* hash value of user mapping OID */
 } ConnCacheEntry;
@@ -73,6 +76,12 @@ static unsigned int prep_stmt_number = 0;
 /* tracks whether any work is needed in callback functions */
 static bool xact_got_connection = false;
 
+/*
+ * SQL functions
+ */
+PG_FUNCTION_INFO_V1(postgres_fdw_get_connections);
+PG_FUNCTION_INFO_V1(postgres_fdw_disconnect);
+
 /* prototypes of private functions */
 static void make_new_connection(ConnCacheEntry *entry, UserMapping *user);
 static PGconn *connect_pg_server(ForeignServer *server, UserMapping *user);
@@ -94,6 +103,8 @@ static bool pgfdw_exec_cleanup_query(PGconn *conn, const char *query,
 static bool pgfdw_get_cleanup_result(PGconn *conn, TimestampTz endtime,
 	 PGresult **result);
 static bool UserMappingPasswordRequired(UserMapping *user);
+static bool disconnect_cached_connections(uint32 hashvalue, bool all,
+		  bool *is_in_use);
 
 /*
  * Get a PGconn which can be used to execute queries on the remote PostgreSQL
@@ -273,6 +284,7 @@ make_new_connection(ConnCacheEntry *entry, UserMapping *user)
 	entry->have_error = false;
 	entry->changing_xact_state = false;
 	entry->invalidated = false;
+	entry->serverid = server->serverid;
 	entry->server_hashvalue =
 		GetSysCacheHashValue1(FOREIGNSERVEROID,
 			  ObjectIdGetDatum(server->serverid));
@@ -789,6 +801,13 @@ pgfdw_xact_callback(XactEvent event, void *arg)
 	if (!xact_got_connection)
 		return;
 
+	/*
+	 * Quick exit if the cache has been destroyed in
+	 * disconnect_cached_connections.
+	 */
+	if (!ConnectionHash)
+		return;
+
 	/*
 	 * Scan all connection cache entries to find open remote transactions, and
 	 * close them.
@@ -985,6 +1004,13 @@ pgfdw_subxact_callback(SubXactEvent event, SubTransactionId mySubid,
 	if 

Re: Parallel Inserts in CREATE TABLE AS

2021-01-04 Thread Luc Vlaming

On 04-01-2021 14:53, Bharath Rupireddy wrote:

On Mon, Jan 4, 2021 at 5:44 PM Luc Vlaming  wrote:

On 04-01-2021 12:16, Hou, Zhijie wrote:


wrt the v18-0002 patch:

It looks like this introduces a state machine that goes like:
- starts at CTAS_PARALLEL_INS_UNDEF
- possibly moves to CTAS_PARALLEL_INS_SELECT
- CTAS_PARALLEL_INS_TUP_COST_CAN_IGN can be added
- if both were added at some stage, we can go to
CTAS_PARALLEL_INS_TUP_COST_IGNORED and ignore the costs

What I'm wondering is why you opted to put the logic around
generate_useful_gather_paths and in cost_gather, when to me it seems more
logical to put it in create_gather_path? I'm probably missing something
there?


IMO, the reason is that we want to make sure we only ignore the cost when
Gather is the top node.
And generate_useful_gather_paths, as called from
apply_scanjoin_target_to_paths, seems to be the right place, since that is
the only call site that can create a top-node Gather.
So we set the flag in apply_scanjoin_target_to_paths around
generate_useful_gather_paths to identify the top node.


Right. We wanted to ignore parallel tuple cost for only the upper Gather path.


I was actually wondering if we need the state machine. The reason is that,
AFAICS, the code could be placed in create_gather_path, where you can
also check whether it is a top Gather node, whether the dest receiver is
the right type, etc. To me that seems like a nicer solution, as it puts all
the logic that decides whether or not a parallel CTAS is valid in a single
place instead of distributing it over various places.


IMO, we can't determine in create_gather_path whether we are going to
generate the top Gather path. To decide whether the top Gather path is
being generated, I think it's not enough to check
root->query_level == 1; we also need to rely on where
generate_useful_gather_paths gets called from. For instance, for
query_level 1, generate_useful_gather_paths gets called from 2 places
in apply_scanjoin_target_to_paths. Likewise, create_gather_path also
gets called from many places. IMO, the current way, i.e. setting the flag
in apply_scanjoin_target_to_paths and ignoring the cost based on that in
cost_gather, seems safe.

I may be wrong. Thoughts?

With Regards,
Bharath Rupireddy.
EnterpriseDB: http://www.enterprisedb.com



So the way I understand it the requirements are:
- it needs to be the top-most Gather
- it should not do anything with the rows after the Gather node, as this
would make the parallel inserts conceptually invalid (see the plan sketch
below).
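
In plan-shape terms, the valid case is the one from the regression test
quoted elsewhere in this thread, i.e. Gather at the very top:

    Gather
      ->  Create parallel_write
            ->  Parallel Seq Scan on tenk1

whereas any node sitting between the plan root and the Gather (a Sort, a
Limit, ...) would disqualify the parallel insert. (The valid shape is taken
from that test output; the disqualifying case is just an illustration of
the requirement above.)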


Right now we're trying to judge what might be added on top that could
change the rows, by inspecting all parts of the root object that would
cause anything to be added, and adding a little state machine to track the
state of that knowledge. To me this has the downside that the list in
HAS_PARENT_PATH_GENERATING_CLAUSE has to be exhaustive, and we need to
make sure it stays up-to-date, which could result in regressions if not
tracked carefully.


Personally I would therefore go for a design which is safe in the sense 
that regressions are not as easily introduced. IMHO that could be done 
by inspecting the planned query afterwards, and then judging whether or 
not the parallel inserts are actually the right thing to do.


Another way to create more safety against regressions would be to add an
assert upon execution of the query that, if we do parallel inserts, only a
subset of allowed nodes exists above the Gather node.


Some (not extremely fact-checked) approaches as food for thought:
1. Plan the query as normal, and then afterwards look at the resulting
plan to see if there are only nodes that are ok between the Gather node
and the top node, which afaics would only be things like Append nodes.

Which would mean two things:
- at the end of subquery_planner, before the final_rel is fetched, we add
another pass like the grouping_planner, called e.g.
parallel_modify_planner or so, which traverses the query plan and checks
if the inserts would indeed be executed in parallel, and if so sets the
cost of the Gather to 0.
- we always keep around the best gathered partial path, or the partial
path itself.


2. Generate both Gather paths: one with zero cost for the inserts and
one with costs. The one with zero costs would however be kept separately
and added as the prime candidate for the final rel. Then we can check in
the subquery_planner whether the final candidate is different and choose.


Kind regards,
Luc




Re: Moving other hex functions to /common

2021-01-04 Thread Thomas Munro
On Tue, Jan 5, 2021 at 4:47 PM Bruce Momjian  wrote:
> ... let's see how it likes this version.

cfbot ideally processes a new patch fairly quickly but I didn't think
of ".diff.gz" when writing the regexp to recognise patch files.  I
just tweaked the relevant regexp and it's building your patch now.
Sorry about that.  :-)




Re: Parallel Inserts in CREATE TABLE AS

2021-01-04 Thread Luc Vlaming

On 05-01-2021 04:59, Bharath Rupireddy wrote:

On Mon, Jan 4, 2021 at 7:02 PM Bharath Rupireddy
 wrote:



+   if (IS_PARALLEL_CTAS_DEST(gstate->dest) &&
+   ((DR_intorel *) gstate->dest)->into->rel &&
+   ((DR_intorel *) gstate->dest)->into->rel->relname)
Why would rel and relname not be there? If no rows have been inserted?
Because it seems from the intorel_startup function that they would be
set as soon as startup was done, which I assume (wrongly?) is always done?


Actually, that into clause rel variable is always set in gram.y for
CTAS, CREATE MATERIALIZED VIEW and SELECT INTO (because the qualified_name 
non-terminal is not optional). My bad. I just added it as a sanity check. 
Actually, it's not required.

create_as_target:
 qualified_name opt_column_list table_access_method_clause
 OptWith OnCommitOption OptTableSpace
 {
 $$ = makeNode(IntoClause);
 $$->rel = $1;
create_mv_target:
 qualified_name opt_column_list table_access_method_clause 
opt_reloptions OptTableSpace
 {
 $$ = makeNode(IntoClause);
 $$->rel = $1;
into_clause:
 INTO OptTempTableName
 {
 $$ = makeNode(IntoClause);
$$->rel = $2;

I will change the below code:
+if (GetParallelInsertCmdType(gstate->dest) ==
+PARALLEL_INSERT_CMD_CREATE_TABLE_AS &&
+((DR_intorel *) gstate->dest)->into &&
+((DR_intorel *) gstate->dest)->into->rel &&
+((DR_intorel *) gstate->dest)->into->rel->relname)
+{

to:
+if (GetParallelInsertCmdType(gstate->dest) ==
+PARALLEL_INSERT_CMD_CREATE_TABLE_AS)
+{

I will update this in the next version of the patch set.


Attaching the v20 patch set, which has the above change in the 0001 patch;
note that the 0002 to 0004 patches have no changes from v19. Please
consider the v20 patch set for further review.


With Regards,
Bharath Rupireddy.
EnterpriseDB: http://www.enterprisedb.com



Hi,

Reviewing further v20-0001:

I would still opt for moving the code for the parallel worker into a 
separate function, and then setting rStartup of the dest receiver to 
that function in ExecParallelGetInsReceiver, as it's completely 
independent code. Just a matter of style I guess.


Maybe I'm not completely following why but afaics we want parallel 
inserts in various scenarios, not just CTAS? I'm asking because code like

+   if (fpes->ins_cmd_type == PARALLEL_INSERT_CMD_CREATE_TABLE_AS)
+       pg_atomic_add_fetch_u64(&fpes->processed, queryDesc->estate->es_processed);
seems very specific to CTAS. For now that seems fine but I suppose that 
would be generalized soon after? Basically I would have expected the if 
to compare against PARALLEL_INSERT_CMD_UNDEF.


Apart from these small things v20-0001 looks (very) good to me.

v20-0002:
will reply on the specific mail-thread about the state machine

v20-0003 and v20-0004:
looks good to me.

Kind regards,
Luc




Re: pg_rewind restore_command issue in PG12

2021-01-04 Thread Michael Paquier
On Mon, Jan 04, 2021 at 04:12:34PM +0300, Amine Tengilimoglu wrote:
> When I read the pg_rewind PG12 doc, it says:
> 
> "... but if the target cluster ran for a long time after the divergence,
> the old WAL files might no longer be present. In that case, they can be
> manually copied from the WAL archive to the pg_wal directory, or fetched
> on startup by configuring primary_conninfo or restore_command."
> 
>   So I thought we could use restore_command. But when I try to use it, I
> see it doesn't work either.

I agree with your point that the docs of 9.6~12 are confusing here.
It makes no sense to mention restore_command or primary_conninfo to
fetch WAL segments for the target to allow pg_rewind to find the point
of divergence because the target is already offline when we look at
that.  Mentioning restore_command/primary_conninfo for recovery
purposes could make sense in the context in the follow-up paragraph
though, where the target gets restarted, after the rewind.  But the
uses are different.

The docs of 13~ got that right when -c has been introduced by
rewording this sentence as "or run pg_rewind with the -c option to
automatically retrieve them from the WAL archive".  So let's get rid
of ", or fetched on startup by configuring primary_conninfo or
restore_command." ("or fetched on startup by configuring
recovery.conf" in some older branches).  This confusion has been
introduced by 878bd9a, down to 9.6.

Heikki, what do you think?
--
Michael




Re: Parallel Inserts in CREATE TABLE AS

2021-01-04 Thread Luc Vlaming

On 04-01-2021 14:32, Bharath Rupireddy wrote:
On Mon, Jan 4, 2021 at 4:22 PM Luc Vlaming > wrote:

 > Sorry it took so long to get back to reviewing this.

Thanks for the comments.

 > wrt v18-0001patch:
 >
 > +               /*
 > +                * If the worker is for parallel insert in CTAS, then 
use the proper

 > +                * dest receiver.
 > +                */
 > +               intoclause = (IntoClause *) stringToNode(intoclausestr);
 > +               receiver = CreateIntoRelDestReceiver(intoclause);
 > +               ((DR_intorel *)receiver)->is_parallel_worker = true;
 > +               ((DR_intorel *)receiver)->object_id = fpes->objectid;
 > I would move this into a function called e.g.
 > GetCTASParallelWorkerReceiver so that the details wrt CTAS can be put in
 > createas.c.
 > I would then also split up intorel_startup into intorel_leader_startup
 > and intorel_worker_startup, and in GetCTASParallelWorkerReceiver set
 > self->pub.rStartup to intorel_worker_startup.

My intention was to not add any new APIs to the dest receiver. I simply 
made the changes in intorel_startup, in which, for workers, it just does 
the minimal work and exits. In the leader, most of the table creation and 
sanity checks are kept untouched. Please have a look at the 
v19 patch posted upthread [1].




Looks much better, really nicely abstracted away in the v20 patch.


 > +       volatile pg_atomic_uint64       *processed;
 > why is it volatile?

The intention is to always read from the actual memory location. I took it 
from the way pg_atomic_fetch_add_u64_impl, 
pg_atomic_compare_exchange_u64_impl, pg_atomic_init_u64_impl and their 
u32 counterparts pass the parameter as volatile pg_atomic_uint64 *ptr.




Okay, I had not seen this syntax before for atomics with the volatile 
keyword, but it's apparently how the atomics abstraction works in PostgreSQL.



 > +                       if (isctas)
 > +                       {
 > +                               intoclause = ((DR_intorel *) 
node->dest)->into;
 > +                               objectid = ((DR_intorel *) 
node->dest)->object_id;

 > +                       }
 > Given that you extract them each once and then pass them directly into
 > the parallel-worker, can't you instead pass in the destreceiver and
 > leave that logic to ExecInitParallelPlan?

That's changed entirely in the v19 patch set posted upthread [1]. Please 
have a look. I didn't pass the dest receiver: to keep the API generic, I 
passed the parallel insert command type and a void * pointer to the 
insertion command, because the information we pass to workers depends on 
the insertion command (for instance, the workers need the into clause and 
object id for CTAS, and the object id for Refresh Materialized View).


 >
 > +                                       if 
(IS_PARALLEL_CTAS_DEST(gstate->dest) &&
 > +                                               ((DR_intorel *) 
gstate->dest)->into->rel &&
 > +                                               ((DR_intorel *) 
gstate->dest)->into->rel->relname)

 > why would rel and relname not be there? if no rows have been inserted?
 > because it seems from the intorel_startup function that that would be
 > set as soon as startup was done, which i assume (wrongly?) is always 
done?


Actually, that into clause rel variable is always set in 
gram.y for CTAS, CREATE MATERIALIZED VIEW and SELECT INTO (because the 
qualified_name non-terminal is not optional). My bad. I just added it as 
a sanity check. Actually, it's not required.


create_as_target:
     qualified_name opt_column_list table_access_method_clause
             OptWith OnCommitOption OptTableSpace
                 {
                     $$ = makeNode(IntoClause);
                     $$->rel = $1;
create_mv_target:
     qualified_name opt_column_list table_access_method_clause
             opt_reloptions OptTableSpace
                 {
                     $$ = makeNode(IntoClause);
                     $$->rel = $1;
into_clause:
     INTO OptTempTableName
         {
             $$ = makeNode(IntoClause);
             $$->rel = $2;

I will change the below code:
+                    if (GetParallelInsertCmdType(gstate->dest) ==
+                        PARALLEL_INSERT_CMD_CREATE_TABLE_AS &&
+                        ((DR_intorel *) gstate->dest)->into &&
+                        ((DR_intorel *) gstate->dest)->into->rel &&
+                        ((DR_intorel *) gstate->dest)->into->rel->relname)
+                    {

to:
+                    if (GetParallelInsertCmdType(gstate->dest) ==
+                        PARALLEL_INSERT_CMD_CREATE_TABLE_AS)
+                    {

I will update this in the next version of the patch set.



Thanks

 >
 > +        * In case if no workers were launched, allow the leader to 
insert entire

 > +        * tuples.
 > what does "entire tuples" mean? should it maybe be "all tuples"?

Yeah, noticed that while 

Re: Track replica origin progress for Rollback Prepared

2021-01-04 Thread Michael Paquier
On Tue, Jan 05, 2021 at 09:35:21AM +0530, Amit Kapila wrote:
> As noted in [1], without this the subscriber might again ask for
> rollback prepared lsn after restart.
> 
> Attached patch addresses this problem.

Is it possible to add some tests in test_decoding?
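
As a very rough sketch (not taken from the patch, and the object names are
only illustrative), the scenario such a test would need to cover is a
prepared transaction replayed under a replication origin and then rolled
back, with the origin progress checked afterwards, e.g.:

    SELECT pg_replication_origin_create('regress_origin');
    SELECT pg_replication_origin_session_setup('regress_origin');
    SELECT pg_replication_origin_xact_setup('0/1000000', now());
    BEGIN;
    INSERT INTO tbl VALUES (1);
    PREPARE TRANSACTION 'regress_gid';
    ROLLBACK PREPARED 'regress_gid';
    SELECT pg_replication_origin_progress('regress_origin', true);

with the progress check repeated after a restart to exercise the recovery
path the patch touches.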

/* dump transaction origin information only for abort prepared */
if ((replorigin_session_origin != InvalidRepOriginId) &&
-   TransactionIdIsValid(twophase_xid) &&
-   XLogLogicalInfoActive())
+   TransactionIdIsValid(twophase_xid))
It seems to me that you may want to document as a comment the reason
why this gets sent even if (wal_level < logical).
--
Michael




Re: Moving other hex functions to /common

2021-01-04 Thread Michael Paquier
On Mon, Jan 04, 2021 at 10:47:39PM -0500, Bruce Momjian wrote:
> I can see the value of passing the destination length to the hex
> functions, and I think you have to pass the src length to pg_hex_encode
> since the input can be binary.  I assume the pg_hex_decode doesn't need
> the source length because it is a null-terminated C string, right?

I don't think that we should assume that this is always the case,
actually.  That would be an assumption easy to miss for callers of
those routines. 

> However, pg_base64_decode(const char *src, size_t len, char *dst) does
> pass in the src length, so I can see we should just make it the same.  I
> also agree it is unclear if 'len' is the src or dst len, so your patch
> fixes that with the names.  Also, is there a reason you use int64
> instead of the size_t used by bytea?

For the argument type, sure you could just use size_t, using an int
just looked more consistent to me to match with the style of
base64.c.

> I don't think removing the two error type reporting from src/common, and
> then having to add it back into adt/encode.c makes sense.  There are
> two, soon three, that want those two error reports, so I am thinking we
> are best to just leave ecpg alone since it is just a library that
> doesn't want more than one error reporting.  I don't think we will have
> many more library call cases.

Hmm.  Even for libpq?  I have grown allergic to the addition of more
"ifdef FRONTEND" and elog()/pg_log() calls into src/common/.

> I think your patch has a few mistakes.  First, I think you removed the
> hex_decode files from the file system, rather than removing it via git,
> so the diff didn't apply in the cfbot.

Oh, indeed, thanks.  I missed that the patch should have used
/dev/null for the removed files.

> Second, I don't think ecpg ever used libpgcommon before.

Yep.  That would be the first time.  I don't see anything that would
prevent its use, though I may be missing something.

> For non-Windows, libpgcommon got referenced
> anyway in the build, so non-Windows compiles worked, but in Windows,
> that was not referenced, meaning the cfbot failed on ecpglib.  I have
> modified the attached patch with both of these fixed --- let's see how
> it likes this version.

Indeed.  Likely I am to blame for not having my Windows machine at
hand these days.  I'll have this environment available only next week
:)

FWIW, I think that this refactoring has more value iff we are able to
remove all the duplicate hex implementations we have in the tree,
while being able to cover the case you are looking for: frontend tools
able to do logging.  Merging ECPG and the backend requires switching
to a logic where we return more than one error code, so we could just
use an enum for the result à-la-PQping, like in libpq.
--
Michael




Re: Deadlock between backend and recovery may not be detected

2021-01-04 Thread Fujii Masao




On 2020/12/25 13:16, Kyotaro Horiguchi wrote:

At Wed, 23 Dec 2020 21:42:47 +0900, Fujii Masao  
wrote in

you. Attached
is the updated version of the patch. What about this version?


The patch contains a hunk in the following structure.

+   if (got_standby_lock_timeout)
+   goto cleanup;
+
+   if (got_standby_deadlock_timeout)
+   {
...
+   }
+
+cleanup:

It is equivalent to

+   if (!got_standby_lock_timeout && got_standby_deadlock_timeout)
+   {
...
+   }

Is there any reason for the goto?


Yes. That's because we have the following code using goto.

+   /* Quick exit if there's no work to be done */
+   if (!VirtualTransactionIdIsValid(*backends))
+   goto cleanup;


Regarding the back-patch, I was thinking of back-patching this to all the
supported branches. But I found that it's not easy to do that for v9.5
because v9.5 doesn't have some infrastructure code that this bug fix
patch depends on. So at least commit 37c54863cf, as the infrastructure,
would also need to be back-patched to v9.5. And ISTM that some related
commits, f868a8143a and 8f0de712c3, need to be back-patched as well.
There might be some other changes to back-patch, too. Unfortunately they
cannot be applied to v9.5 cleanly, and additional changes are necessary.

This situation makes me inclined to skip the back-patch to v9.5.
Because the next minor version release is the final one for v9.5, if we
unexpectedly introduce a bug into v9.5 by the back-patch, there is no
chance to fix that. OTOH, of course, if we don't do the back-patch, there is
no chance to fix the deadlock detection bug, since the final minor version
for v9.5 won't include that bug fix. But I'm afraid that the back-patch
to v9.5 may bring more risk than gain.

Thoughts?

Regards,  


--
Fujii Masao
Advanced Computing Technology Center
Research and Development Headquarters
NTT DATA CORPORATION




Re: [HACKERS] logical decoding of two-phase transactions

2021-01-04 Thread Amit Kapila
On Tue, Dec 29, 2020 at 3:15 PM Ajin Cherian  wrote:
>
> Hi Sawada-san,
>
> I think Amit has a plan to commit this patch-set in phases. I will
> leave it to him to decide because I think he has a plan.
> I took time to refactor the test_decoding isolation test for
> consistent snapshot so that it uses just 3 sessions rather than 4.
> Posting an updated patch-0009
>

I have reviewed this test case patch and have the below comments:

1.
+step "s1checkpoint" { CHECKPOINT; }
...
+step "s2alter" { ALTER TABLE do_write ADD COLUMN addedbys2 int; }

I don't see the need for the above steps and we should be able to
generate the required scenario without these as well. Is there any
reason to keep those?

2.
"s3c""s1insert"

space is missing between these two.

3.
+# Force building of a consistent snapshot between a PREPARE and
COMMIT PREPARED.
+# Ensure that the whole transaction is decoded fresh at the time of
COMMIT PREPARED.
+permutation "s2b" "s2txid" "s1init" "s3b" "s3txid" "s2alter" "s2c"
"s2b" "s2insert" "s2prepare" "s3c""s1insert" "s1checkpoint" "s1start"
"s2commit" "s1start"

I think we can update the above comments to indicate how and which
important steps help us to realize the required scenario. See
subxact_without_top.spec for reference.

4.
+step "s2c" { COMMIT; }
...
+step "s2prepare" { PREPARE TRANSACTION 'test1'; }
+step "s2commit" { COMMIT PREPARED 'test1'; }

s2c and s2commit seem to be confusing names as both sound like doing
the same thing. How about changing s2commit to s2cp and s2prepare to
s2p?

-- 
With Regards,
Amit Kapila.




Re: [PATCH] LWLock self-deadlock detection

2021-01-04 Thread Craig Ringer
On Wed, 30 Dec 2020 at 10:11, Andres Freund  wrote:

> Hi,
>
> On 2020-11-27 20:22:41 +0200, Heikki Linnakangas wrote:
> > On 26/11/2020 04:50, Tom Lane wrote:
> > > Craig Ringer  writes:
> > > > On Wed, Nov 25, 2020 at 9:23 PM Ashutosh Bapat <
> ashutosh.bapat@gmail.com>
> > > > wrote:
> > > > > I'd prefer to make the lock self deadlock check run for production
> > > > > builds, not just cassert builds.
> > >
> > > I'd like to register a strong objection to spending any cycles
> whatsoever
> > > on this.  If you're using LWLocks in a way that could deadlock, you're
> > > doing it wrong.  The entire point of that mechanism is to be Light
> Weight
> > > and not have all the overhead that the heavyweight lock mechanism has.
> > > Building in deadlock checks and such is exactly the wrong direction.
> > > If you think you need that, you should be using a heavyweight lock.
> > >
> > > Maybe there's some case for a cassert-only check of this sort, but
> running
> > > it in production is right out.
> > >
> > > I'd also opine that checking for self-deadlock, but not any more
> general
> > > deadlock, seems pretty pointless.  Careless coding is far more likely
> > > to create opportunities for the latter.  (Thus, I have little use for
> > > the existing assertions of this sort, either.)
> >
> > I've made the mistake of forgetting to release an LWLock many times,
> leading
> > to self-deadlock. And I just reviewed a patch that did that this week
> [1].
> > You usually find that mistake very quickly when you start testing
> though, I
> > don't think I've seen it happen in production.
>
> I think something roughly along Craig's original patch, basically adding
> assert checks against holding a lock already, makes sense. Compared to
> the other costs of running an assert build this isn't a huge cost, and
> it's helpful.
>
> I entirely concur that doing this outside of assertion builds is a
> seriously bad idea.
>

Yeah, given it only targets developer error, that's sensible.


RE: [PATCH] Feature improvement for CLOSE, FETCH, MOVE tab completion

2021-01-04 Thread Shinya11.Kato
Thank you for your review!
I fixed some code and attached a new patch.

>When I applied the patch, I got the following whitespace warnings:
>
>$ git apply ~/patches/fix_tab_complete_close_fetch_move.patch
>/home/masahiko/patches/fix_tab_complete_close_fetch_move.patch:40:
>indent with spaces.
>COMPLETE_WITH_QUERY(Query_for_list_of_cursors
>/home/masahiko/patches/fix_tab_complete_close_fetch_move.patch:41:
>indent with spaces.
>" UNION SELECT 'ABSOLUTE'"
>/home/masahiko/patches/fix_tab_complete_close_fetch_move.patch:42:
>indent with spaces.
>" UNION SELECT 'BACKWARD'"
>/home/masahiko/patches/fix_tab_complete_close_fetch_move.patch:43:
>indent with spaces.
>" UNION SELECT 'FORWARD'"
>/home/masahiko/patches/fix_tab_complete_close_fetch_move.patch:44:
>indent with spaces.
>" UNION SELECT 'RELATIVE'"
>warning: squelched 19 whitespace errors
>warning: 24 lines add whitespace errors.
>
>I recommend you checking whitespaces or running pgindent.

Thank you for your advice and I have corrected it.

>Here are some comments:
>
>/*
>-* Complete FETCH with one of ABSOLUTE, BACKWARD, FORWARD,
>RELATIVE, ALL,
>-* NEXT, PRIOR, FIRST, LAST
>+* Complete FETCH with a list of cursors and one of ABSOLUTE,
>BACKWARD, FORWARD, RELATIVE, ALL,
>+* NEXT, PRIOR, FIRST, LAST, FROM, IN
> */
>
>Maybe I think the commend should say:
>
>+* Complete FETCH with one of ABSOLUTE, BACKWARD, FORWARD,
>RELATIVE, ALL,
>+* NEXT, PRIOR, FIRST, LAST, FROM, IN, and a list of cursors
>
>Other updates of the comment seem to have the same issue.
>
>Or I think we can omit the details from the comment since it seems redundant
>information. We can read it from the arguments of the following
>COMPLETE_WITH_QUERY().

It certainly seems redundant, so I deleted them.

>---
>-   /*
>-* Complete FETCH  with "FROM" or "IN". These are equivalent,
>-* but we may as well tab-complete both: perhaps some users prefer one
>-* variant or the other.
>-*/
>+   /* Complete FETCH  with a list of cursors and "FROM" or
>+ "IN" */
>
>Why did you remove the second sentence in the above comment?

I had misunderstood the meaning and deleted it.
I deleted it along with the above, but would you prefer it to be kept?

>---
>The patch improves tab completion for CLOSE, FETCH, and MOVE but is there
>any reason why you didn't do that for DECLARE? I think DECLARE also can be
>improved and it's a good timing for that.

I wanted to improve tab completion for DECLARE, but I couldn't find anything to 
improve.
Please let me know if there is any code that can be improved.
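
For reference, the statements whose completion this patch touches are the
standard cursor commands; a minimal session (the cursor name c1 is just an
example) looks like this:

    BEGIN;
    DECLARE c1 CURSOR FOR SELECT * FROM pg_class;
    FETCH FORWARD 5 FROM c1;
    MOVE ABSOLUTE 10 IN c1;
    CLOSE c1;
    COMMIT;

so with the patch, CLOSE, FETCH and MOVE can also offer the open cursor
names (here c1) as completions.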

Regards,
Shinya Kato


fix_tab_complete_close_fetch_move_v2.patch
Description: fix_tab_complete_close_fetch_move_v2.patch


Re: Reducing WaitEventSet syscall churn

2021-01-04 Thread Thomas Munro
On Fri, Jul 31, 2020 at 9:42 AM Thomas Munro  wrote:
> On Thu, Jul 30, 2020 at 5:50 PM Thomas Munro  wrote:
> > I pushed those three patches, but will wait for more discussion on the rest.
>
> And here's a rebase, to make cfbot happy.

And again.

To restate the two goals of the remaining patches:

1.  Use FeBeWaitSet for walsender, instead of setting up and tearing
down temporary WaitEventSets all the time.
2.  Use the standard WL_EXIT_ON_PM_DEATH flag for FeBeWaitSet, instead
of the special case ereport(FATAL, ... "terminating connection due to
unexpected postmaster exit" ...).

For point 2, the question I am raising is: why should users get a
special FATAL message in some places and not others, for PM death?
However, if people are attached to that behaviour, we could still
achieve goal 1 without goal 2 by handling PM death explicitly in
walsender.c and I'd be happy to post an alternative patch set like
that.
From 72fa3b6858b3ab121fd60533ed7d6a4dcdbc4a5c Mon Sep 17 00:00:00 2001
From: Thomas Munro 
Date: Mon, 24 Feb 2020 23:51:09 +1300
Subject: [PATCH v7 1/3] Use WL_EXIT_ON_PM_DEATH in FeBeWaitSet.

Previously, we'd give either a FATAL message or a silent exit() when
we detected postmaster death, depending on which wait point we were at.
Make the exit more uniform, by using the standard exit facility.  This
will allow a later patch to use FeBeWaitSet in a new location without
having to add more explicit handling for postmaster death, in line with
our policy of reducing such clutter.

Reviewed-by: Kyotaro Horiguchi 
Discussion: https://postgr.es/m/CA%2BhUKGJAC4Oqao%3DqforhNey20J8CiG2R%3DoBPqvfR0vOJrFysGw%40mail.gmail.com
---
 src/backend/libpq/be-secure.c | 28 
 src/backend/libpq/pqcomm.c|  2 +-
 2 files changed, 1 insertion(+), 29 deletions(-)

diff --git a/src/backend/libpq/be-secure.c b/src/backend/libpq/be-secure.c
index 4cf139a223..746fce8288 100644
--- a/src/backend/libpq/be-secure.c
+++ b/src/backend/libpq/be-secure.c
@@ -184,28 +184,6 @@ retry:
 		WaitEventSetWait(FeBeWaitSet, -1 /* no timeout */ , , 1,
 		 WAIT_EVENT_CLIENT_READ);
 
-		/*
-		 * If the postmaster has died, it's not safe to continue running,
-		 * because it is the postmaster's job to kill us if some other backend
-		 * exits uncleanly.  Moreover, we won't run very well in this state;
-		 * helper processes like walwriter and the bgwriter will exit, so
-		 * performance may be poor.  Finally, if we don't exit, pg_ctl will be
-		 * unable to restart the postmaster without manual intervention, so no
-		 * new connections can be accepted.  Exiting clears the deck for a
-		 * postmaster restart.
-		 *
-		 * (Note that we only make this check when we would otherwise sleep on
-		 * our latch.  We might still continue running for a while if the
-		 * postmaster is killed in mid-query, or even through multiple queries
-		 * if we never have to wait for read.  We don't want to burn too many
-		 * cycles checking for this very rare condition, and this should cause
-		 * us to exit quickly in most cases.)
-		 */
-		if (event.events & WL_POSTMASTER_DEATH)
-			ereport(FATAL,
-	(errcode(ERRCODE_ADMIN_SHUTDOWN),
-	 errmsg("terminating connection due to unexpected postmaster exit")));
-
 		/* Handle interrupt. */
 		if (event.events & WL_LATCH_SET)
 		{
@@ -296,12 +274,6 @@ retry:
 		WaitEventSetWait(FeBeWaitSet, -1 /* no timeout */ , , 1,
 		 WAIT_EVENT_CLIENT_WRITE);
 
-		/* See comments in secure_read. */
-		if (event.events & WL_POSTMASTER_DEATH)
-			ereport(FATAL,
-	(errcode(ERRCODE_ADMIN_SHUTDOWN),
-	 errmsg("terminating connection due to unexpected postmaster exit")));
-
 		/* Handle interrupt. */
 		if (event.events & WL_LATCH_SET)
 		{
diff --git a/src/backend/libpq/pqcomm.c b/src/backend/libpq/pqcomm.c
index 1e6b6db540..289d9bd2da 100644
--- a/src/backend/libpq/pqcomm.c
+++ b/src/backend/libpq/pqcomm.c
@@ -222,7 +222,7 @@ pq_init(void)
 	AddWaitEventToSet(FeBeWaitSet, WL_SOCKET_WRITEABLE, MyProcPort->sock,
 	  NULL, NULL);
 	AddWaitEventToSet(FeBeWaitSet, WL_LATCH_SET, -1, MyLatch, NULL);
-	AddWaitEventToSet(FeBeWaitSet, WL_POSTMASTER_DEATH, -1, NULL, NULL);
+	AddWaitEventToSet(FeBeWaitSet, WL_EXIT_ON_PM_DEATH, -1, NULL, NULL);
 }
 
 /* 
-- 
2.20.1

From ec7348f4fc794eed6cfd6bb3897086ef29c213ae Mon Sep 17 00:00:00 2001
From: Thomas Munro 
Date: Mon, 24 Feb 2020 23:48:29 +1300
Subject: [PATCH v7 2/3] Introduce symbolic names for FeBeWaitSet positions.

Previously we used hard coded 0 and 1 to refer to the socket and latch.

Reviewed-by: Kyotaro Horiguchi 
Discussion: https://postgr.es/m/CA%2BhUKGJAC4Oqao%3DqforhNey20J8CiG2R%3DoBPqvfR0vOJrFysGw%40mail.gmail.com
---
 src/backend/libpq/be-secure.c |  4 ++--
 src/backend/libpq/pqcomm.c| 18 +++---
 src/backend/utils/init/miscinit.c |  4 ++--
 src/include/libpq/libpq.h |  3 +++
 4 files changed, 22 insertions(+), 7 deletions(-)

diff --git a/src/backend/libpq/be-secure.c 

Re: Parallel Inserts in CREATE TABLE AS

2021-01-04 Thread vignesh C
On Mon, Jan 4, 2021 at 3:07 PM Bharath Rupireddy
 wrote:
>
> On Wed, Dec 30, 2020 at 5:28 PM vignesh C  wrote:
> > Few comments:
> > -   /*
> > -* To allow parallel inserts, we need to ensure that they are safe 
> > to be
> > -* performed in workers. We have the infrastructure to allow 
> > parallel
> > -* inserts in general except for the cases where inserts generate a 
> > new
> > -* CommandId (eg. inserts into a table having a foreign key column).
> > -*/
> > -   if (IsParallelWorker())
> > -   ereport(ERROR,
> > -   (errcode(ERRCODE_INVALID_TRANSACTION_STATE),
> > -errmsg("cannot insert tuples in a
> > parallel worker")));
> >
> > Is it possible to add a check if it is a CTAS insert here as we do not
> > support insert in parallel workers from others as of now.
>
> Currently, there's no global variable in which we can selectively skip
> this in case of parallel insertion in CTAS. How about having a
> variable in any of the worker global contexts, set that when parallel
> insertion is chosen for CTAS and use that in heap_prepare_insert() to
> skip the above error? Eventually, we can remove this restriction
> entirely in case we fully allow parallelism for INSERT INTO SELECT,
> CTAS, and COPY.
>
> Thoughts?

Yes, I felt that the leader can store the command as CTAS and the
leader/worker can use it to check and throw an error. A similar
change can be used for the other parallel insert patches and, once all the
patches are committed, we can remove it eventually.

>
> > +   Oid objectid;   /* workers to
> > open relation/table.  */
> > +   /* Number of tuples inserted by all the workers. */
> > +   pg_atomic_uint64processed;
> >
> > We can just mention relation instead of relation/table.
>
> I will modify it in the next patch set.
>
> > +select explain_pictas(
> > +'create table parallel_write as select length(stringu1) from tenk1;');
> > +  explain_pictas
> > +--
> > + Gather (actual rows=N loops=N)
> > +   Workers Planned: 4
> > +   Workers Launched: N
> > + ->  Create parallel_write
> > +   ->  Parallel Seq Scan on tenk1 (actual rows=N loops=N)
> > +(5 rows)
> > +
> > +select count(*) from parallel_write;
> >
> > Can we include selection of cmin, xmin for one of the test to verify
> > that it uses the same transaction id  in the parallel workers
> > something like:
> > select distinct(cmin,xmin) from parallel_write;
>
> This is not possible since cmin and xmin are dynamic, we can not use
> them in test cases. I think it's not necessary to check whether the
> leader and workers are in the same txn or not, since we are not
> creating a new txn. All the txn state from the leader is serialized in
> SerializeTransactionState and restored in
> StartParallelWorkerTransaction.
>

I had seen in your patch that you serialize and use the same
transaction, but it will be good if you can have at least one test
case to validate that the leader and worker both use the same
transaction. To solve the problem that you are facing where cmin and
xmin are dynamic, you can check the distinct count by using something
like below:
SELECT COUNT(*) FROM (SELECT DISTINCT cmin,xmin FROM  t1) as dt;
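
Applied to the table from the test quoted above, such a check could look
like this (the expectation of a single distinct pair is exactly the
property being discussed, i.e. that the leader and workers share one
transaction and command id):

    select count(*) from (select distinct cmin, xmin from parallel_write) as dt;
    -- expected result: 1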

Regards,
Vignesh
EnterpriseDB: http://www.enterprisedb.com




Re: Added missing copy related data structures to typedefs.list

2021-01-04 Thread vignesh C
On Wed, Dec 30, 2020 at 7:10 PM Amit Kapila  wrote:
>
> On Sat, Dec 26, 2020 at 9:16 PM vignesh C  wrote:
> >
> > On Thu, Dec 17, 2020 at 4:28 AM Bruce Momjian  wrote:
> > >
> > > On Mon, Dec  7, 2020 at 01:56:50PM +0530, vignesh C wrote:
> > > > Hi,
> > > >
> > > > Added missing copy related data structures to typedefs.list, these
> > > > data structures were added while copy files were split during the
> > > > recent commit. I found this while running pgindent for parallel copy
> > > > patches.
> > > > The Attached patch has the changes for the same.
> > > > Thoughts?
> > >
> > > Uh, we usually only update the typedefs file before we run pgindent on
> > > the master branch.
> > >
> >
> > Ok, Thanks for the clarification. I was not sure as in few of the
> > enhancements it was included as part of the patches.
> >
>
> Yeah, I do that while committing patches that require changes in
> typedefs. It is not a norm and I am not sure how much value it adds to
> do it separately for the missing ones unless you are making changes in
> the same file they are used and you are facing unrelated diffs due to
> those missing ones.

I found this while I was running pgindent for the parallel copy patches. I
was not sure if this change was left out intentionally or by mistake.
I'm fine with it being committed separately or together at a later point.
It is not a major problem for my patch since I know the change; I will
make the required adjustment when I make changes on top of it, if it does
not get committed. But I felt we could commit this since it is a
recent change.

Regards,
Vignesh
EnterpriseDB: http://www.enterprisedb.com




Re: list of extended statistics on psql

2021-01-04 Thread Tatsuro Yamada

Hi,


I rebased the patch set on the master (7e5e1bba03), and the regression
test is good. Therefore, I changed the status of the patch to "needs review". 


Happy New Year!

I rebased my patches on HEAD.
Please find attached files. :-D

Thanks,
Tatsuro Yamada

From 9f36a9df0c2803f5554b951e37ba5969c2744e3b Mon Sep 17 00:00:00 2001
From: Tatsuro Yamada 
Date: Tue, 5 Jan 2021 12:34:16 +0900
Subject: [PATCH 2/2] Add regression test for \dX on psql

---
 src/test/regress/expected/stats_ext.out | 94 +
 src/test/regress/sql/stats_ext.sql  | 31 +++
 2 files changed, 125 insertions(+)

diff --git a/src/test/regress/expected/stats_ext.out 
b/src/test/regress/expected/stats_ext.out
index 7bfeaf85f0..8c8a0afcf6 100644
--- a/src/test/regress/expected/stats_ext.out
+++ b/src/test/regress/expected/stats_ext.out
@@ -1725,6 +1725,100 @@ INSERT INTO tststats.priv_test_tbl
 CREATE STATISTICS tststats.priv_test_stats (mcv) ON a, b
   FROM tststats.priv_test_tbl;
 ANALYZE tststats.priv_test_tbl;
+-- Check printing info about extended statistics by \dX
+create table stts_t1 (a int, b int);
+create statistics stts_1 (ndistinct) on a, b from stts_t1;
+create statistics stts_2 (ndistinct, dependencies) on a, b from stts_t1;
+create statistics stts_3 (ndistinct, dependencies, mcv) on a, b from stts_t1;
+create table stts_t2 (a int, b int, c int);
+create statistics stts_4 on b, c from stts_t2;
+create table stts_t3 (col1 int, col2 int, col3 int);
+create statistics stts_hoge on col1, col2, col3 from stts_t3;
+create schema stts_s1;
+create schema stts_s2;
+create statistics stts_s1.stts_foo on col1, col2 from stts_t3;
+create statistics stts_s2.stts_yama (dependencies, mcv) on col1, col3 from 
stts_t3;
+insert into stts_t1 select i,i from generate_series(1,100) i;
+analyze stts_t1;
+\dX
+  List of extended statistics
+  Schema  |  Name  |  Definition  | 
Ndistinct | Dependencies |   MCV   
+--++--+---+--+-
+ public   | func_deps_stat | a, b, c FROM functional_dependencies |
   | built| 
+ public   | mcv_lists_arrays_stats | a, b, c FROM mcv_lists_arrays|
   |  | built
+ public   | mcv_lists_bool_stats   | a, b, c FROM mcv_lists_bool  |
   |  | built
+ public   | mcv_lists_stats| a, b, d FROM mcv_lists   |
   |  | built
+ public   | stts_1 | a, b FROM stts_t1| 
built |  | 
+ public   | stts_2 | a, b FROM stts_t1| 
built | built| 
+ public   | stts_3 | a, b FROM stts_t1| 
built | built| built
+ public   | stts_4 | b, c FROM stts_t2| 
defined   | defined  | defined
+ public   | stts_hoge  | col1, col2, col3 FROM stts_t3| 
defined   | defined  | defined
+ stts_s1  | stts_foo   | col1, col2 FROM stts_t3  | 
defined   | defined  | defined
+ stts_s2  | stts_yama  | col1, col3 FROM stts_t3  |
   | defined  | defined
+ tststats | priv_test_stats| a, b FROM tststats.priv_test_tbl |
   |  | built
+(12 rows)
+
+\dX stts_?
+   List of extended statistics
+ Schema |  Name  |Definition | Ndistinct | Dependencies |   MCV   
+++---+---+--+-
+ public | stts_1 | a, b FROM stts_t1 | built |  | 
+ public | stts_2 | a, b FROM stts_t1 | built | built| 
+ public | stts_3 | a, b FROM stts_t1 | built | built| built
+ public | stts_4 | b, c FROM stts_t2 | defined   | defined  | defined
+(4 rows)
+
+\dX *stts_hoge
+   List of extended statistics
+ Schema |   Name|  Definition   | Ndistinct | Dependencies 
|   MCV   
++---+---+---+--+-
+ public | stts_hoge | col1, col2, col3 FROM stts_t3 | defined   | defined  
| defined
+(1 row)
+
+\dX+
+   List of 
extended statistics
+  Schema  |  Name  |  Definition  | 
Ndistinct | Dependencies |   MCV   | Ndistinct_size | Dependencies_size |  
MCV_size  
+--++--+---+--+-++---+
+ public   | func_deps_stat | a, b, c FROM functional_dependencies |
   | built| || 106 bytes | 
+ public   | mcv_lists_arrays_stats | a, b, c FROM mcv_lists_arrays   

Re: macOS SIP, next try

2021-01-04 Thread Mark Dilger



> On Dec 2, 2020, at 6:28 AM, Peter Eisentraut 
>  wrote:
> 
> Previous discussions on macOS SIP:
> 
> https://www.postgresql.org/message-id/flat/561E73AB.8060800%40gmx.net
> https://www.postgresql.org/message-id/flat/CA%2BTgmoYGi5oR8Rvb2-SY1_WEjd76H5gCqSukxFQ66qR7MewWAQ%40mail.gmail.com
> https://www.postgresql.org/message-id/flat/6a4d6124-41f0-756b-0811-c5c5def7ef4b%402ndquadrant.com

See also 
https://www.postgresql.org/message-id/flat/18012hGLG6HJ9pQDkHAMYuwQKg%40sparkpost.com

—
Mark Dilger
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company







Track replica origin progress for Rollback Prepared

2021-01-04 Thread Amit Kapila
While reviewing the logical decoding of 2PC xacts work, I noticed that we
need $SUBJECT [1]. Commit 1eb6d6527a [2] allowed tracking replica
origin replay progress for 2PC, but it was not complete. It fails to
properly track the progress for rollback prepared; in particular, it missed
updating the code for recovery. Additionally, we need to allow
tracking it on subscriber nodes where
wal_level might not be logical.

As noted in [1], without this the subscriber might again ask for
rollback prepared lsn after restart.

Attached patch addresses this problem.

Thoughts?


[1] - 
https://www.postgresql.org/message-id/CAA4eK1L3p4z%2B9wtK77MbdpkagR4GS2Y3r1Je7ZEvLQVF9GArfg%40mail.gmail.com
[2] -
commit 1eb6d6527aae264b3e0b9c95aa70bb7a594ad1cf
Author: Simon Riggs 
Date:   Wed Mar 28 17:42:50 2018 +0100

Store 2PC GID in commit/abort WAL recs for logical decoding

Store GID of 2PC in commit/abort WAL records when wal_level = logical.
This allows logical decoding to send the SAME gid to subscribers
across restarts of logical replication.

Track relica origin replay progress for 2PC.

-- 
With Regards,
Amit Kapila.


v1-0001-Track-replication-origin-progress-for-rollbacks.patch
Description: Binary data


Cirrus CI (Windows help wanted)

2021-01-04 Thread Thomas Munro
Hi,

My new favourite CI is Cirrus CI, because it has 4 operating systems,
generous enough quotas to handle 250+ branches in a single account,
and public build/test log URLs.  I flipped cfbot.cputube.org (mostly)
over to that and it seems to work well so far -- fingers crossed.
I've also been using it for my own development branches that involve
some systems hacking-heavy work that uses different kernel interfaces
on all 4 of those OSes.

There's one thing I'm stuck on, though: Windows.  If anyone wants to
help figure out how to get PostgreSQL to build on Cirrus's Windows,
I'd be very interested.  To play with this stuff, you need a public
Github repo, and you need to add Cirrus CI from "Marketplace" (it's
free for public/open source), and then you add a .cirrus.yml file such
as https://github.com/macdice/cfbot/blob/master/cirrus/.cirrus.yml to
the top level of a PostgreSQL branch.  When you push, you should see
build results on the Github web UI.

For a similar example that works on Windows on another CI, see
https://github.com/macdice/cfbot/blob/master/appveyor/appveyor.yml
(note that it also references a couple of other files; it'd be nice to
be able to do that stuff without the need for separate files, possibly
by using Power Shell).  That's what cfbot is using for Windows for
now, which works really well, but it'd be nice to have more
options/choices.  For another example of Windows builds working on
another CI, see the Github Actions patch I posted earlier when I was
considering that for cfbot[1].  I think what's different is that those
other CIs have images with MSVC on them, but Cirrus wants you to
figure out how to install the right toolchain yourself (and then, as a
next step after it's actually working, it also provides a way to
define what you want in a way that captures the resulting image using
Docker voodoo, so that you get fast startup times).  Or something.

[1] 
https://www.postgresql.org/message-id/flat/CA%2BhUKG%2By_SHVQcU3CPokmJxuHp1niebCjq4XzZizf8SR9ZdQRQ%40mail.gmail.com




Re: macOS SIP, next try

2021-01-04 Thread Tom Lane
Peter Eisentraut  writes:
> The solution here is to use install_name_tool to edit the shared library 
> path in the binaries (programs and libraries) after installation into 
> the temporary prefix.

Ah, this indeed seems like a promising direction.

> The solution is, I think, correct in principle, but the implementation 
> is hackish, since it hardcodes the names and versions of the interesting 
> shared libraries in an unprincipled way.  More refinement is needed 
> here.  Ideas welcome.

Yeah, it's pretty hacky.  I tried the patch as given, and "make check"
failed with

error: 
/Applications/Xcode.app/Contents/Developer/Toolchains/XcodeDefault.xctoolchain/usr/bin/install_name_tool:
 can't open file: /Users/tgl/pgsql/tmp_install//Users/tgl/testversion/lib/*.so 
(No such file or directory)
make: *** [temp-install] Error 1

evidently because there are no files named *.so in the "lib" directory
in my build.  I removed '$(abs_top_builddir)/tmp_install/$(libdir)'/*.so
from the "call temp_install_postprocess" invocation, and then it got
through "make temp-install", and most of the core regression tests worked.
But a couple of them failed with

+ERROR:  could not load library 
"/Users/tgl/pgsql/tmp_install/Users/tgl/testversion/lib/postgresql/libpqwalreceiver.so":
 
dlopen(/Users/tgl/pgsql/tmp_install/Users/tgl/testversion/lib/postgresql/libpqwalreceiver.so,
 10): Library not loaded: /Users/tgl/testversion/lib/libpq.5.dylib
+  Referenced from: 
/Users/tgl/pgsql/tmp_install/Users/tgl/testversion/lib/postgresql/libpqwalreceiver.so
+  Reason: image not found

What this looks like to me is some confusion about whether "pgsql"
or "postgresql" has been injected into the install path or not.
(This build is using /Users/tgl/testversion as the install --prefix;
I surmise that your path had "pgsql" or "postgres" in it already.)

One plausible way to fix this is to use "find" on the install tree
to locate files to modify, instead of hard-wiring knowledge into
the "call temp_install_postprocess" invocation.  A possibly better
way is to arrange to invoke install_name_tool once per installed
executable or shlib, at the time we do "make install" for it.
Not sure how hard the latter approach is given our Makefiles.

Another suggestion, after reading "man install_name_tool", is
that maybe you could avoid having to know the names of the specific
libraries if you didn't use per-library '-change' switches, but
instead installed the test-installation rpath with the -add_rpath
switch.  This would be closer to the spirit of the existing test
process anyway.

Aside from removing that annoying requirement, this method would
bound the amount of extra header space needed, so instead of
"-headerpad_max_install_names" we could use "-headerpad N" with
some not-hugely-generous N.  That ought to damp down the executable
bloat to the point where we'd not have to worry too much about it.

(Pokes around ... it appears that on my old dinosaur prairiedog,
install_name_tool exists and has the -change switch, but not
anything for rpath.  So we'd better investigate how old -add_rpath
is.  Maybe there's another way to set rpath that works further back?
At least the -headerpad switches exist there.)

I haven't experimented with any of these ideas, so maybe they won't
work for some reason.  But I think we ought to push forward looking
for a solution, because it's getting harder and harder to disable
SIP and still have a useful macOS installation :-(

regards, tom lane




Re: Moving other hex functions to /common

2021-01-04 Thread Bruce Momjian
On Sat, Jan  2, 2021 at 02:25:33PM +0900, Michael Paquier wrote:
> > Let me get my patch building on the cfbot and then I will address each
> > of these.  I am trying to do one stage at a time since I am still
> > learning the process.  Thanks.
> 
> No problem.  On my end, this stuff has been itching me for a couple of
> days and I could not recall why..  Until I remembered that the design
> of the hex APIs in your patch is weak with overflow handling because
> we don't pass down to the function the size of the destination buffer.
> We have finished with a similar set of issues on the past with SCRAM
> and base64, with has led to CVE-2019-10164 and the improvements done
> in cfc40d3.  So I think that we need to improve things in a safer way.
> Mapping with the design for base64, I have finished with the attached
> patch, and the following set:
> +extern int64 pg_hex_decode(const char *src, int64 len, char *dst, int64 
> dstlen);
> +extern int64 pg_hex_encode(const char *src, int64 len, char *dst, int64 
> dstlen);
> +extern int64 pg_hex_enc_len(int64 srclen);
> +extern int64 pg_hex_dec_len(int64 srclen);

I can see the value of passing the destination length to the hex
functions, and I think you have to pass the src length to pg_hex_encode
since the input can be binary.  I assume the pg_hex_decode doesn't need
the source length because it is a null-terminated C string, right? 
However, pg_base64_decode(const char *src, size_t len, char *dst) does
pass in the src length, so I can see we should just make it the same.  I
also agree it is unclear if 'len' is the src or dst len, so your patch
fixes that with the names.  Also, is there a reason you use int64
instead of the size_t used by bytea?

> This ensures that the result never overflows, which explains the
> introduction of an error code for the encoding part, and does not 
> use elog()/pg_log() so as external libraries can use them.  ECPG uses
> long variables in a couple of places, explaining why it feels safer to
> use int64.  int should give enough room to any caller of those APIs,
> but there is no drawback in using a 64-bit API either, and I don't
> think it is worth the risk to break ECPG either for long handling,
> even if I don't believe either that folks are going to work on strings
> larger than 2GB.

Might as well just have hex match what bytea and esc does.

> Things get trickier for the bytea input/output because we want more
> generic error messages depending for invalid values or an incorrect
> number of digits, which is why I have left the copies in encode.c.
> This design could be easily extended with more types of error codes,
> though I am not sure if that's worth bothering.

I don't think removing the two error type reporting from src/common, and
then having to add it back into adt/encode.c makes sense.  There are
two, soon three, that want those two error reports, so I am thinking we
are best to just leave ecpg alone since it is just a library that
doesn't want more than one error reporting.  I don't think we will have
many more library call cases.

> Even with that, this leads to much more sanity for hex buffer
> manipulation in backup manifests (I don't think that using  
> PG_SHAXXX_DIGEST_STRING_LENGTH is a good idea either, I'd like to get
> rid of it in the long-term) and ECPG, so that's clearly a gain.
> 
> I don't have a Windows host at hand, though I think that it should
> work there correctly.  What do you think about the ideas in the
> attached patch?

I think your patch has a few mistakes.  First, I think you removed the
hex_decode files from the file system rather than removing them via git,
so the diff didn't apply in the cfbot.  Second, I don't think ecpg ever
used libpgcommon before.  For non-Windows builds, libpgcommon got
referenced anyway, so non-Windows compiles worked, but on Windows it was
not referenced, meaning the cfbot failed on ecpglib.  I have modified the
attached patch with both of these fixed --- let's see how it likes this
version.

-- 
  Bruce Momjian  https://momjian.us
  EnterpriseDB https://enterprisedb.com

  The usefulness of a cup is in its emptiness, Bruce Lee



hex2.diff.gz
Description: application/gzip


Re: pg_waldump/heapdesc.c and struct field names

2021-01-04 Thread Peter Geoghegan
On Mon, Jan 4, 2021 at 2:06 PM Peter Geoghegan  wrote:
> Right. Self-consistency matters, as does consistency with the source
> code itself.

Pushed a commit that standardizes on the name latestRemovedXid within
rmgr desc output routines just now.

Thanks
-- 
Peter Geoghegan




Re: Proposed patch for key management

2021-01-04 Thread Neil Chen
On Tue, Jan 5, 2021 at 10:18 AM Bruce Momjian  wrote:

> On Fri, Jan  1, 2021 at 06:26:36PM +, Alastair Turner wrote:
>
> There is all sorts of flexibility being proposed:
>
> *  scope of keys
> *  encryption method
> *  encryption mode
> *  internal/external
>
> Some of this is related to wrapping the data keys, and some of it gets
> to the encryption of cluster files.  I just don't see how anything with
> the level of flexibility being asked for could ever be reliable or
> simple to administer, and I don't see the security value of that
> flexibility.  I feel at a certain point, we just need to choose the best
> options and move forward.

+1.

> I thought this had all been debated, but people continue to show up and
> ask for it to be debated again.  At one level, these discussions are
> valuable, but at a certain point you end up re-litigating things so much
> that you get nothing else done.  I know some people who have given up
> working on this feature for this reason.
>
> I am not asking we stop discussing it, but I do ask we address the
> negatives of suggestions, especially when the negatives have already
> been clearly stated.

Well, in fact, because the discussion about TDE/KMS spans several very long
threads, not everyone can read them completely first, so inevitably there
will be topics that have already been discussed.

At least for the initial version, I think we should pay more attention to
whether the currently provided functionality is acceptable, instead of
always focusing on what it lacks and how it could be more flexible.

For the basic KMS functionality, we need to ensure stability and safety.
Using a more flexible API to support different encryption algorithms does
look more powerful, but it does not add much value for stability and
security.  Regardless of whether we want to do that eventually, I think it
can at least be left out of the discussion for the first version.  We need
not debate whether it is safer to use more keys, or whether to introduce
HSM management; for the initial version we should only evaluate whether
this design can resist attacks under our threat models.

Of course, submitting a feature means accepting the opinions of many
people, but for a large and complex feature, I think dividing the work
into stages to limit the scope of the discussion will help us avoid
getting into endless disputes.

-- 
There is no royal road to learning.
HighGo Software Co.


Re: Fix typo in comment

2021-01-04 Thread Amit Kapila
On Tue, Jan 5, 2021 at 5:36 AM Peter Smith  wrote:
>
> PSA a trivial patch to correct a comment typo.
>

LGTM. Pushed!

-- 
With Regards,
Amit Kapila.




Re: fix typo in ReorderBufferProcessTXN

2021-01-04 Thread Amit Kapila
On Tue, Jan 5, 2021 at 7:45 AM Amit Kapila  wrote:
>
> On Tue, Jan 5, 2021 at 7:13 AM Masahiko Sawada  wrote:
> >
> > On Tue, Jan 5, 2021 at 9:54 AM Hou, Zhijie  
> > wrote:
> >
> > > in contrast to normal the normal catalog during decoding == >> in 
> > > contrast to the normal catalog during decoding
> >
> > Looks good change to me.
> >
>
> +1. I will push this in some time.
>

Pushed.

-- 
With Regards,
Amit Kapila.




Re: Single transaction in the tablesync worker?

2021-01-04 Thread Ajin Cherian
On Wed, Dec 30, 2020 at 5:08 PM Peter Smith  wrote:

>
> PSA my v9 WIP patch for the Solution1 which addresses some recent
> review comments, and other minor changes.

I did some tests using the test suite prepared by Erik Rijkers in [1]
during the initial design of tablesync.

Back then, they had seen some errors while doing multiple commits in
initial tablesync. So I've rerun the test script on the v9 patch
applied on HEAD and found no errors.
The script runs pgbench, creates a pub/sub on a standby server, and
all of the pgbench tables are replicated to the standby. The contents
of the tables are compared at
the end of each run to make sure they are identical.
I have run it for around 12 hours, and it worked without any errors.
Attaching the script I used.

regards,
Ajin Cherian
Fujitsu Australia

[1]- 
https://www.postgresql.org/message-id/93d02794068482f96d31b002e0eb248d%40xs4all.nl
#!/bin/bash

# assume both instances are running, on port 6972 (master) and 6973 (replica)

port1=6972
port2=6973

logfile1=$HOME/dataoss/logfile
logfile2=$HOME/dataoss2/logfile

# clear logs
echo > $logfile1
echo > $logfile2

logfile_ts=$( date +"%Y%m%d_%H_%M_%S" )

scale=1;  if [[ ! "$1" == "" ]]; then scale=$1;  fi
clients=1;if [[ ! "$2" == "" ]]; then clients=$2;fi
#INIT_WAIT=0; if [[ ! "$3" == "" ]]; then INIT_WAIT=$3;  fi
duration=60;  if [[ ! "$3" == "" ]]; then duration=$3;   fi
date_str=$logfile_ts; if [[ ! "$4" == "" ]]; then date_str=$4;   fi
CLEAN_ONLY='';if [[ ! "$5" == "" ]]; then CLEAN_ONLY=$5; fi

  threads=8
# threads=1

 master_version=$( psql postgres -qtAXc "show server_version" -p $port1 )
replica_version=$( psql postgres -qtAXc "show server_version" -p $port2 )
 master_commit_hash=$( psql postgres -qtAXc "show server_version" -p $port1 | cut -d _ -f 6 )
replica_commit_hash=$( psql postgres -qtAXc "show server_version" -p $port2 | cut -d _ -f 6 )
  master_start_time=$( psql postgres -qtAXc "select pg_postmaster_start_time()" -p $port1 )
 replica_start_time=$( psql postgres -qtAXc "select pg_postmaster_start_time()" -p $port2 )
   master_patch_md5=$(echo "select md5(comments) from information_schema.sql_features where feature_name = 'patch md5'"|psql postgres -qtAXp $port1)
  replica_patch_md5=$(echo "select md5(comments) from information_schema.sql_features where feature_name = 'patch md5'"|psql postgres -qtAXp $port2)
 master_s_c=$( psql postgres -qtAXc "show synchronous_commit" -p $port1 )
replica_s_c=$( psql postgres -qtAXc "show synchronous_commit" -p $port2 )
  master_assert=$( psql postgres -qtAXc "show debug_assertions" -p $port1 )
 replica_assert=$( psql postgres -qtAXc "show debug_assertions" -p $port2 )

echo""
echo"-- scale $scale   clients $clients   duration $duration   CLEAN_ONLY=$CLEAN_ONLY"
echo""
echo"-- hostname: "$( hostname -s )
echo"-- timestamp: $date_str"
#echo -n "-- "; ps -ef f | grep 6972 | grep -Evw 'grep|xterm|screen|SCREEN' | less -iS
#echo -n "-- "; ps -ef f | grep 6973 | grep -Evw 'grep|xterm|screen|SCREEN' | less -iS
echo"-- master_start_time $master_start_time replica_start_time $replica_start_time"
if [[ "$master_patch_md5" == "$replica_patch_md5" ]];
then
echo "-- master  patch-md5 [$master_patch_md5]"
echo "-- replica patch-md5 [$replica_patch_md5] (ok)"
else
echo"-- master  patch-md5 [$master_patch_md5] - replica patch-md5 NOT the same, bailing out"
echo"-- replica patch-md5 [$replica_patch_md5] - replica patch-md5 NOT the same, bailing out"
exit -1
fi
echo "-- synchronous_commit, master  [$master_s_c]  replica [$replica_s_c]"
echo "-- master_assert  [$master_assert]  replica_assert [$replica_assert]"
echo "-- self md5 "$( md5sum $0 )

unset PGSERVICEFILE PGSERVICE   # PGPORT PGDATA PGHOST
export PGDATABASE=postgres

 pgdata_master=$HOME/dataoss
pgdata_replica=$HOME/dataoss2

function cb()
{
  #  display the 4 pgbench tables' accumulated content as md5s
  #  a,b,t,h stand for:  pgbench_accounts, -branches, -tellers, -history
  md5_total_6972='-1'
  md5_total_6973='-2'
  num_tables=$( echo "select count(*) from pg_class where relkind = 'r' and relname ~ '^pgbench_'" | psql postgres -qtAXp 6972 )
  if [[ $num_tables -ne 4 ]] 
  then
 echo "pgbench tables not 4 - exit" >> $outf
 exit
  fi
  for port in $port1 $port2
  do
md5_a=$(echo "select * from pgbench_accounts order by aid"|psql postgres -qtAXp$port|md5sum|cut -b 1-9)
md5_b=$(echo "select * from pgbench_branches order by bid"|psql postgres -qtAXp$port|md5sum|cut -b 1-9)
md5_t=$(echo "select * from pgbench_tellers  order by tid"|psql postgres -qtAXp$port|md5sum|cut -b 1-9)
md5_h=$(echo "select * from pgbench_history  order by hid"|psql postgres -qtAXp$port|md5sum|cut -b 1-9)
cnt_a=$(echo "select 

Re: Adding new commitfest entry?

2021-01-04 Thread Peter Smith
On Tue, Jan 5, 2021 at 12:46 PM Masahiko Sawada  wrote:
>
> On Tue, Jan 5, 2021 at 9:45 AM Tom Lane  wrote:
> >
> > Peter Smith  writes:
> > > On Tue, Jan 5, 2021 at 11:36 AM Tom Lane  wrote:
> > >> Hm, there should be a "New patch" button just below the "Commitfest
> > >> 2021-03" title.
> >
> > > Not for me. I see only 2 buttons - "Search/Filter" and "Shortcuts".
> >
> > You sure you're logged in according to the status in the upper right?
> > Try clearing your browser cache, too.
> >
> > If still no joy, pgsql-www is probably where to complain.
> >
>
> Sorry, I had forgotten to switch Commitfest 2021-03 as "open”. I’ve
> switched it now. Please check if you can add the patch to the
> Commitfest 2021-03.

OK now. Thanks!


Kind Regards,
Peter Smith.
Fujitsu Australia.




Re: Context diffs

2021-01-04 Thread Noah Misch
On Tue, Jan 05, 2021 at 11:21:07AM +1300, Thomas Munro wrote:
> On Tue, Jan 5, 2021 at 8:07 AM Bruce Momjian  wrote:
> > *  "git apply" and "git am" can't process context diffs (they throw an
> >error once a context-like section of the diff is hit; simple
> >adding/removing lines in a block works)
> >
> > *  the commit-fest doesn't recognize context diff attachments as
> > patches:
> >
> > https://commitfest.postgresql.org/31/2912/
> >
> > *  cfbot can't process file renames/add/delete from context diffs
> 
> For the record, cfbot just uses plain old GNU patch, because that
> seems to accept nearly everything that anyone posts here (after a step
> that tries to unpack tarballs etc).  Several people have suggested I
> change it to use git apply instead (IIRC it works better for patches
> containing binary files such as cryptographic keys?)

It does work better for binary files, though there's little benefit in storing
binary cryptographic keys as opposed to ASCII ones.  Unfortunately for the
cfbot, "git apply" forces the equivalent of "patch -F0", so it rejects patches
needlessly.  If you do change the cfbot, I recommend having it start with "git
apply -3" (able to succeed when plain "git apply" fails), then fallback to
"patch" when "git apply -3" fails.




Re: fix typo in ReorderBufferProcessTXN

2021-01-04 Thread Amit Kapila
On Tue, Jan 5, 2021 at 7:13 AM Masahiko Sawada  wrote:
>
> On Tue, Jan 5, 2021 at 9:54 AM Hou, Zhijie  wrote:
>
> > in contrast to normal the normal catalog during decoding == >> in contrast 
> > to the normal catalog during decoding
>
> Looks good change to me.
>

+1. I will push this in some time.

-- 
With Regards,
Amit Kapila.




RE: Disable WAL logging to speed up data loading

2021-01-04 Thread osumi.takami...@fujitsu.com
Hi, Sawada-San


On Monday, December 28, 2020 7:12 PM Masahiko Sawada  
wrote:
> On Mon, Dec 28, 2020 at 4:29 PM osumi.takami...@fujitsu.com
>  wrote:
> > On Monday, December 28, 2020 2:29 PM Masahiko Sawada
>  wrote:
> > > On Thu, Dec 3, 2020 at 12:14 PM osumi.takami...@fujitsu.com
> > >  wrote:
> > > >
> > > > I've made a new patch v05 that took in comments to filter out WALs
> > > > more strictly and addressed some minor fixes that were discussed
> > > > within past few days.
> > > > Also, I changed the documentations, considering those modifications.
> > >
> > > From a backup management tool perspective, how can they detect that
> > > wal_level has been changed to ‘none' (and back to the normal)? IIUC
> > > once we changed wal_level to none, old backups that are taken before
> > > setting to ‘none’ can be used only for restoring the database to the
> > > point before the LSN where setting 'wal_level = none'. The users can
> > > neither restore the database to any points in the term of 'wal_level
> > > = none' nor use an old backup to restore the database to the point
> > > after LSN where setting 'wal_level = none’. I think we might need to
> > > provide a way to detect the changes other than reading
> XLOG_PARAMETER_CHANGE.
> > In the past, we discussed the aspect of backup management tool in [1]
> > and concluded that this should be another patch separated from this
> > thread because to compare the wal_level changes between snapshots
> > applies to wal_level = minimal, too. Please have a look at the "second idea"
> > in the e-mail in the [1] and responses to it.
> >
> 
> Thank you for telling me about the discussion!
> 
> The discussion already started on another thread? I think it might be better 
> to
> consider it in parallel, if not started yet. We can verify beforehand that the
> proposed solutions will really work well together with backup management
> tools. And from the user perspective, I wonder if they would like to see this
> feature in the same release where wal_level = none is introduced. Otherwise,
> the fact that some backup management tools won’t be able to work together
> with wal_level = none will be a big restriction for users. That patch would
> probably not be a blocker of this patch but will facilitate the discussion.
I don't think the new thread has been created yet.

By the way, I checked the documentation and user manuals of backup management
tools such as pgBackRest, EDB's BART and pg_probackup.
These tools require wal_level higher than minimal
because they use physical online backups or WAL archiving, and thus
they don't need to work together with wal_level=minimal or none.

> the fact that some backup management tools won’t be able to work together
> with wal_level = none will be a big restriction for users.
Accordingly, it turns out that introducing wal_level=none does not impose any
restriction on users from the perspective of backup management tools.
Any comments?

Best Regards,
Takamichi Osumi


Re: Adding new commitfest entry?

2021-01-04 Thread Masahiko Sawada
On Tue, Jan 5, 2021 at 9:45 AM Tom Lane  wrote:
>
> Peter Smith  writes:
> > On Tue, Jan 5, 2021 at 11:36 AM Tom Lane  wrote:
> >> Hm, there should be a "New patch" button just below the "Commitfest
> >> 2021-03" title.
>
> > Not for me. I see only 2 buttons - "Search/Filter" and "Shortcuts".
>
> You sure you're logged in according to the status in the upper right?
> Try clearing your browser cache, too.
>
> If still no joy, pgsql-www is probably where to complain.
>

Sorry, I had forgotten to switch Commitfest 2021-03 as "open”. I’ve
switched it now. Please check if you can add the patch to the
Commitfest 2021-03.

Regards,

-- 
Masahiko Sawada
EnterpriseDB:  https://www.enterprisedb.com/




Re: fix typo in ReorderBufferProcessTXN

2021-01-04 Thread Masahiko Sawada
On Tue, Jan 5, 2021 at 9:54 AM Hou, Zhijie  wrote:
>
> Hi
>
>
>
> I found a possible typo in the comment of  ReorderBufferProcessTXN().
>
>
>
> * relmapper has no "historic" view, in contrast to normal
>
> * the normal catalog during decoding. Thus repeated
>
>
>
> Is there an extra ‘normal’ in the comment ?

Yeah, I think so.

> in contrast to normal the normal catalog during decoding == >> in contrast to 
> the normal catalog during decoding

Looks good change to me.

Regards,

-- 
Masahiko Sawada
EnterpriseDB:  https://www.enterprisedb.com/




Re: New IndexAM API controlling index vacuum strategies

2021-01-04 Thread Masahiko Sawada
On Tue, Dec 29, 2020 at 3:25 PM Masahiko Sawada  wrote:
>
> On Tue, Dec 29, 2020 at 7:06 AM Peter Geoghegan  wrote:
> >
> > On Sun, Dec 27, 2020 at 11:41 PM Peter Geoghegan  wrote:
> > > I experimented with this today, and I think that it is a good way to
> > > do it. I like the idea of choose_vacuum_strategy() understanding that
> > > heap pages that are subject to many non-HOT updates have a "natural
> > > extra capacity for LP_DEAD items" that it must care about directly (at
> > > least with non-default heap fill factor settings). My early testing
> > > shows that it will often take a surprisingly long time for the most
> > > heavily updated heap page to have more than about 100 LP_DEAD items.
> >
> > Attached is a rough patch showing what I did here. It was applied on
> > top of my bottom-up index deletion patch series and your
> > poc_vacuumstrategy.patch patch. This patch was written as a quick and
> > dirty way of simulating what I thought would work best for bottom-up
> > index deletion for one specific benchmark/test, which was
> > non-hot-update heavy. This consists of a variant pgbench with several
> > indexes on pgbench_accounts (almost the same as most other bottom-up
> > deletion benchmarks I've been running). Only one index is "logically
> > modified" by the updates, but of course we still physically modify all
> > indexes on every update. I set fill factor to 90 for this benchmark,
> > which is an important factor for how your VACUUM patch works during
> > the benchmark.
> >
> > This rough supplementary patch includes VACUUM logic that assumes (but
> > doesn't check) that the table has heap fill factor set to 90 -- see my
> > changes to choose_vacuum_strategy(). This benchmark is really about
> > stability over time more than performance (though performance is also
> > improved significantly). I wanted to keep both the table/heap and the
> > logically unmodified indexes (i.e. 3 out of 4 indexes on
> > pgbench_accounts) exactly the same size *forever*.
> >
> > Does this make sense?
>
> Thank you for sharing the patch. That makes sense.
>
> +if (!vacuum_heap)
> +{
> +if (maxdeadpage > 130 ||
> +/* Also check if maintenance_work_mem space is running out */
> +vacrelstats->dead_tuples->num_tuples >
> +vacrelstats->dead_tuples->max_tuples / 2)
> +vacuum_heap = true;
> +}
>
> The second test checking if maintenane_work_mem space is running out
> also makes sense to me. Perhaps another idea would be to compare the
> number of collected garbage tuple to the total number of heap tuples
> so that we do lazy_vacuum_heap() only when we’re likely to reclaim a
> certain amount of garbage in the table.
>
> >
> > Anyway, with a 15k TPS limit on a pgbench scale 3000 DB, I see that
> > pg_stat_database shows an almost ~28% reduction in blks_read after an
> > overnight run for the patch series (it was 508,820,699 for the
> > patches, 705,282,975 for the master branch). I think that the VACUUM
> > component is responsible for some of that reduction. There were 11
> > VACUUMs for the patch, 7 of which did not call lazy_vacuum_heap()
> > (these 7 VACUUM operations all only did a btbulkdelete() call for the
> > one problematic index on the table, named "abalance_ruin", which my
> > supplementary patch has hard-coded knowledge of).
>
> That's a very good result in terms of skipping lazy_vacuum_heap(). How
> much the table and indexes bloated? Also, I'm curious about that which
> tests in choose_vacuum_strategy() turned vacuum_heap on: 130 test or
> test if maintenance_work_mem space is running out? And what was the
> impact on clearing all-visible bits?
>

I merged these patches and polished them.

In the 0002 patch, we calculate how many LP_DEAD items can be
accumulated in the space left on a single heap page by fillfactor. I
increased MaxHeapTuplesPerPage so that we can accumulate LP_DEAD items
on a heap page, because otherwise accumulating LP_DEAD items
unnecessarily constrains the number of heap tuples on a single page,
especially when tuples are small, as I mentioned before. Previously, we
constrained the number of line pointers to avoid excessive
line-pointer bloat and to avoid having to enlarge the work array.
However, once the amvacuumstrategy stuff entered the picture,
accumulating line pointers has value. Also, we might want to store the
value returned by amvacuumstrategy so that the index AM can refer to it
during index deletion.
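
As a back-of-the-envelope illustration of that calculation (my sketch, not
the patch's code): with a non-default fillfactor, the reserved free space can
hold a fairly large number of LP_DEAD items, since each one only needs a
4-byte line pointer.

#include "postgres.h"

#include "storage/itemid.h"

/*
 * Sketch only: roughly how many LP_DEAD line pointers fit into the space
 * a heap page keeps free because of its fillfactor setting.
 */
static int
lp_dead_items_in_fillfactor_space(int fillfactor)
{
	Size		space_left = BLCKSZ * (100 - fillfactor) / 100;

	return (int) (space_left / sizeof(ItemIdData));
}

/* e.g. fillfactor = 90 with 8kB pages leaves ~819 bytes, i.e. ~200 items */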

The 0003 patch has btree indexes skip bulk-deletion if the index hasn't
grown since the last bulk-deletion. I store the number of blocks
in the meta page, but didn't implement meta-page upgrading.

I've attached the draft version patches. Note that the documentation
update is still lacking.

Regards,

--
Masahiko Sawada
EnterpriseDB:  https://www.enterprisedb.com/


0001-Introduce-IndexAM-API-for-choosing-index-vacuum-stra.patch
Description: Binary data



Re: [PATCH] Simple progress reporting for COPY command

2021-01-04 Thread Josef Šimánek
On Tue, Jan 5, 2021 at 0:46, Tomas Vondra
 wrote:
>
> Hi,
>
> I did take a quick look today, and I have a couple minor comments:

Hi! Thanks for your time.

> 1) The catalog sgml docs seem to mention bytes_processed twice (one of
> which should be bytes_total), and line_processed (should be "lines_").

Fixed in attached patch.

>
> 2) I'm not quite sure about not including any info about the command.
> For example pg_stat_progress_create_index includes info about the
> command, although it's not very detailed. Not sure how useful would it
> be to show just COPY FROM / COPY TO, without more details.
>
> It's probably possible to extract this from pg_stat_activity, but that
> may be rather cumbersome (parsing arbitrary SQL and all that). Also,
> what if the COPY is called from a function etc.?

Any idea where to discuss this? My use case is really simple: I would
like to be able to see the progress of a COPY command by pid. There is a
lot of COPY info inside the CopyToStateData and CopyFromStateData
structs. The only limitation I see is that progress-reporting columns
support only int64 values. I'm not sure it is easily possible to expose,
for example, the filename this way.
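
To illustrate that limitation (a sketch of mine, not the patch itself): the
backend progress facility stores a fixed set of int64 slots per backend, so
numeric counters fit, but a string such as the source filename does not.
The slot number below is purely illustrative.

#include "postgres.h"

#include "pgstat.h"

#define COPY_PROGRESS_LINES_PROCESSED	1	/* illustrative slot number */

static void
copy_report_lines(uint64 lines_processed)
{
	/* pgstat_progress_update_param() accepts only int64 values */
	pgstat_progress_update_param(COPY_PROGRESS_LINES_PROCESSED,
								 (int64) lines_processed);
}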

>
> 3) I wonder if we should have something like lines_estimated. For COPY
> TO it's pretty simple to calculate it as
>
>  bytes_total / (bytes_processed / lines_processed)
>
> but what about
>
>  COPY (query) TO file
>
> In that case we don't know the total amount of bytes / rows, so we can't
> calculate any estimate. So maybe we could peek into the query plan? But
> I agree this is something we can add later.

If I remember correctly, one of the original ideas was to be able to pass
a custom bytes_total (or lines_total) via COPY options, to be stored in
the copy progress. I can add this in a follow-up patch if still welcome.

For estimates I would prefer to add an additional column so as not to mix
those two together (or at least a boolean estimated = true/false and
reuse the bytes_total column). If query estimates are welcome, I can take
a look at how to reach the query plan and expose those numbers, when a
query is used, in estimated_lines and potentially estimated_bytes
columns. It would probably be a little tricky to calculate
estimated_bytes for some column types.

Also, currently only COPY FROM file supports bytes_total (it is 0 in
all other scenarios). I think it should be possible for PostgreSQL to
know the actual number of lines a query returns for some kinds of
queries, but I have no idea where to look for this. If that's possible
to get, one of the next steps would be to introduce an additional
column, lines_total.
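
For reference, the estimate Tomas suggests could be computed roughly like
this (a sketch, not part of the patch); it returns 0 when no estimate is
possible, e.g. for COPY (query) TO where bytes_total is unknown:

#include <stdint.h>

/* Sketch only: extrapolate total lines from the average bytes per line so far. */
static int64_t
estimate_total_lines(int64_t bytes_total, int64_t bytes_processed,
					 int64_t lines_processed)
{
	double		bytes_per_line;

	if (bytes_total <= 0 || bytes_processed <= 0 || lines_processed <= 0)
		return 0;				/* nothing read yet, or total size unknown */

	bytes_per_line = (double) bytes_processed / (double) lines_processed;
	return (int64_t) ((double) bytes_total / bytes_per_line);
}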

>
> 4) This comment is a bit confusing, as it mixes "total" and "processed".
> I'd just say "number of bytes processed so far" instead.
>
Fixed in attached patch.
>
> Other than that, it seems fine. I'm sure we could add more features, but
> it seems like a good start - I plan to get this committed once I get a
> patch fixing the docs issues.

The patch is attached; it should be applied on top of the previous
patch. The overall patch (both patches merged together) can be
found at
https://patch-diff.githubusercontent.com/raw/simi/postgres/pull/6.patch.

> regards
>
> --
> Tomas Vondra
> EnterpriseDB: http://www.enterprisedb.com
> The Enterprise PostgreSQL Company
From 6d2ee68b227c05ce4d1eb95a4c4a9c4f7dd6fbfa Mon Sep 17 00:00:00 2001
From: =?UTF-8?q?Josef=20=C5=A0im=C3=A1nek?= 
Date: Tue, 5 Jan 2021 02:07:03 +0100
Subject: [PATCH] Fix docs and comment.

---
 doc/src/sgml/monitoring.sgml | 4 ++--
 src/include/commands/copyfrom_internal.h | 2 +-
 2 files changed, 3 insertions(+), 3 deletions(-)

diff --git a/doc/src/sgml/monitoring.sgml b/doc/src/sgml/monitoring.sgml
index 51d261defd94..875133303e19 100644
--- a/doc/src/sgml/monitoring.sgml
+++ b/doc/src/sgml/monitoring.sgml
@@ -6477,7 +6477,7 @@ SELECT pg_stat_get_backend_pid(s.backendid) AS pid,
 
  
   
-   bytes_processed bigint
+   bytes_total bigint
   
   
Size of source file for COPY FROM command in bytes. It is set to 0 if not available.
@@ -6486,7 +6486,7 @@ SELECT pg_stat_get_backend_pid(s.backendid) AS pid,
 
  
   
-   line_processed bigint
+   lines_processed bigint
   
   
Number of lines already processed by COPY command.
diff --git a/src/include/commands/copyfrom_internal.h b/src/include/commands/copyfrom_internal.h
index ae76be295a9b..80fac1e58a12 100644
--- a/src/include/commands/copyfrom_internal.h
+++ b/src/include/commands/copyfrom_internal.h
@@ -154,7 +154,7 @@ typedef struct CopyFromStateData
 	char	   *raw_buf;
 	int			raw_buf_index;	/* next byte to process */
 	int			raw_buf_len;	/* total # of bytes stored */
-	uint64  bytes_processed;/* total # of bytes processed, used for progress reporting */
+	uint64  bytes_processed;/* number of bytes processed so far */
 	/* Shorthand for number of unconsumed bytes available in raw_buf */
 #define RAW_BUF_BYTES(cstate) 

Re: standby recovery fails (tablespace related) (tentative patch and discussion)

2021-01-04 Thread Kyotaro Horiguchi
At Wed, 8 Jul 2020 12:56:44 +, Paul Guo  wrote in 
> On 2020/01/15 19:18, Paul Guo wrote:
> I further fixed the last test failure (due to a small bug in the test, not in 
> code). Attached are the new patch series. Let's see the CI pipeline result.
> 
> Thanks for updating the patches!
> 
> I started reading the 0003 patch.
> 
> The approach that the 0003 patch uses is not the perfect solution.
> If the standby crashes after tblspc_redo() removes the directory and before
> its subsequent COMMIT record is replayed, PANIC error would occur since
> there can be some unresolved missing directory entries when we reach the
> consistent state. The problem would very rarely happen, though...
> Just idea; calling XLogFlush() to update the minimum recovery point just
> before tblspc_redo() performs destroy_tablespace_directories() may be
> safe and helpful for the problem?

It seems to me that what the current patch does is too complex.  What
we need to do here is to remember every invalid operation and then forget
it when the prerequisite object is dropped.

When a tablespace is dropped before consistency is established, we
don't need to care what has been performed inside that tablespace.  From
this perspective, it is enough to remember the tablespace ids when we
fail to do something inside one due to the absence of the tablespace, and
then forget them when the tablespace is removed.  We could remember
individual database ids to show them in error messages, but I'm not sure
that's useful.  The reason log_invalid_page records block numbers is to
allow the machinery to handle partial table truncations, but that is not
the case here, since dropping a tablespace cannot leave behind only some
of the databases it contains.

As a result, we won't see unresolved invalid operations in a
dropped tablespace.

Am I missing something?
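
For concreteness, the bookkeeping described above might look roughly like
the sketch below (mine, not code from any posted patch; it assumes that
remembering only the tablespace OID is sufficient, and the function names
are illustrative):

#include "postgres.h"

#include "nodes/pg_list.h"

static List *missing_tablespaces = NIL;

/* remember: redo failed because a tablespace directory was absent */
static void
remember_missing_tablespace(Oid tablespace_id)
{
	missing_tablespaces = list_append_unique_oid(missing_tablespaces,
												 tablespace_id);
}

/* forget: the DROP TABLESPACE record for it has been replayed */
static void
forget_missing_tablespace(Oid tablespace_id)
{
	missing_tablespaces = list_delete_oid(missing_tablespaces, tablespace_id);
}

/* at consistency: anything still remembered is a real problem */
static void
check_missing_tablespaces(void)
{
	if (missing_tablespaces != NIL)
		elog(PANIC, "WAL replay referenced tablespaces that were never dropped");
}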


dbase_redo:
+  if (!(stat(parent_path, &st) == 0 && S_ISDIR(st.st_mode)))
+  {
+XLogRecordMissingDir(xlrec->tablespace_id, InvalidOid, parent_path);

This means "record the belonging table space directory if it is not
found OR it is not a directory". The former can be valid but the
latter is unconditionally can not (I don't think we bother considering
symlinks there).

+/*
+ * Source directory may be missing.  E.g. the template database used
+ * for creating this database may have been dropped, due to reasons
+ * noted above.  Moving a database from one tablespace may also be a
+ * partner in the crime.
+ */
+if (!(stat(src_path, &st) == 0 && S_ISDIR(st.st_mode)))
+{
+  XLogLogMissingDir(xlrec->src_tablespace_id, xlrec->src_db_id, src_path);

This is part of the *creation* of the target directory.  A missing
source directory cannot be valid merely because the source directory is
dropped later in the WAL stream; we can only allow it if the *target*
tablespace is dropped afterwards.  As a result, as I mentioned above,
we don't need to record anything about the database directory.

By the way, the name XLogLogMiss... is somewhat confusing.  How about
XLogReportMissingDir (named after report_invalid_page)?

regards.

-- 
Kyotaro Horiguchi
NTT Open Source Software Center




Re: language cleanups in code and docs

2021-01-04 Thread Thomas Munro
On Tue, Jan 5, 2021 at 1:44 PM Dagfinn Ilmari Mannsåker
 wrote:
> Thanks!  Just after sending that, I thought to grep for "white\W*list"
> as well, and found a few more occurrences that were trivially reworded,
> per the attached patch.

Pushed.




fix typo in ReorderBufferProcessTXN

2021-01-04 Thread Hou, Zhijie
Hi

I found a possible typo in the comment of  ReorderBufferProcessTXN().

* relmapper has no "historic" view, in contrast to normal
* the normal catalog during decoding. Thus repeated

Is there an extra ‘normal’ in the comment ?

in contrast to normal the normal catalog during decoding == >> in contrast to 
the normal catalog during decoding

Best regards,
houzj




0001-fix-typo.patch
Description: 0001-fix-typo.patch


Re: PoC/WIP: Extended statistics on expressions

2021-01-04 Thread Tomas Vondra

On 1/4/21 4:34 PM, Dean Rasheed wrote:


... 


Some other comments:

* I'm not sure I understand the need for 0001. Wasn't there an earlier
version of this patch that just did it by re-populating the type
array, but which still had it as an array rather than turning it into
a list? Making it a list falsifies some of the comments and
function/variable name choices in that file.



That's a bit down to Justin - I think it's fine to use the older version
repopulating the type array, but that question is somewhat unrelated to
this patch.



* There's a comment typo in catalog/Makefile -- "are are reputedly
other...", should be "there are reputedly other...".

* Looking at the pg_stats_ext view, I think perhaps expressions stats
should be omitted entirely from that view, since it doesn't show any
useful information about them. So it could remove "e" from the "kinds"
array, and exclude rows whose only kind is "e", since such rows have
no interesting data in them. Essentially, the new view
pg_stats_ext_exprs makes having any expression stats in pg_stats_ext
redundant. Hiding this data in pg_stats_ext would also be consistent
with making the "expressions" stats kind hidden from the user.



Hmmm, not sure. I'm not sure removing 'e' from the array is a good idea.
On the one hand it's an internal detail; on the other hand, most of that
view is internal detail too. Excluding rows with only 'e' seems
reasonable, though. I need to think about this.



* In gram.y, it wasn't quite obvious why you converted the column list
for CREATE STATISTICS from an expr_list to a stats_params list. I
figured it out, and it makes sense, but I think it could use a
comment, perhaps something along the lines of the one for index_elem,
e.g.:

/*
  * Statistics attributes can be either simple column references, or arbitrary
  * expressions in parens.  For compatibility with index attributes permitted
  * in CREATE INDEX, we allow an expression that's just a function call to be
  * written without parens.
  */



OH, right. I'd have trouble figuring this out myself, even though I wrote
that code only one or two months ago.



* In parse_func.c and parse_agg.c, there are a few new error strings
that use the abbreviation "stats expressions", whereas most other
errors refer to "statistics expressions". For consistency, I think
they should all be the latter.



OK, will fix.


* In generateClonedExtStatsStmt(), I think the "expressions" stats
kind needs to be explicitly excluded, otherwise CREATE TABLE (LIKE
...) fails if the source table has expression stats.



Yeah, will fix. I guess this also means we're missing some tests.


* CreateStatistics() uses ShareUpdateExclusiveLock, but in
tcop/utility.c the relation is opened with a ShareLock. ISTM that the
latter lock mode should be made to match CreateStatistics().



Not sure, will check.


* Why does the new code in tcop/utility.c not use
RangeVarGetRelidExtended together with RangeVarCallbackOwnsRelation?
That seems preferable to doing the ACL check in CreateStatistics().
For one thing, as it stands, it allows the lock to be taken even if
the user doesn't own the table. Is it intentional that the current
code allows extended stats to be created on system catalogs? That
would be one thing that using RangeVarCallbackOwnsRelation would
change, but I can't see a reason to allow it.



I think I copied the code from somewhere - probably expression indexes,
or something like that. That's not proof that it's the right/better way
to do this, though.



* In src/bin/psql/describe.c, I think the \d output should also
exclude the "expressions" stats kind and just list the other kinds (or
have no kinds list at all, if there are no other kinds), to make it
consistent with the CREATE STATISTICS syntax.



Not sure I understand. Why would this make it consistent with CREATE 
STATISTICS? Can you elaborate?



* The pg_dump output for a stats object whose only kind is
"expressions" is broken -- it includes a spurious "()" for the kinds
list.



Will fix. Again, this suggests there are TAP tests missing.


That's it for now. I'll look at the optimiser changes next, and try to
post more comments later this week.



Thanks!


regards

--
Tomas Vondra
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: Adding new commitfest entry?

2021-01-04 Thread Tom Lane
Peter Smith  writes:
> On Tue, Jan 5, 2021 at 11:36 AM Tom Lane  wrote:
>> Hm, there should be a "New patch" button just below the "Commitfest
>> 2021-03" title.

> Not for me. I see only 2 buttons - "Search/Filter" and "Shortcuts".

You sure you're logged in according to the status in the upper right?
Try clearing your browser cache, too.

If still no joy, pgsql-www is probably where to complain.

regards, tom lane




Re: language cleanups in code and docs

2021-01-04 Thread Dagfinn Ilmari Mannsåker
Thomas Munro  writes:

> On Tue, Jan 5, 2021 at 1:12 PM Dagfinn Ilmari Mannsåker
>  wrote:
>> Magnus Hagander  writes:
>> > In looking at this I realize we also have exactly one thing referred to as
>> > "blacklist" in our codebase, which is the "enum blacklist" (and then a
>> > small internal variable in pgindent).
>>
>> Here's a patch that renames the @whitelist and %blacklist variables in
>> pgindent to @additional and %excluded, and adjusts the comments to
>> match.
>
> Pushed.  Thanks!

Thanks!  Just after sending that, I thought to grep for "white\W*list"
as well, and found a few more occurrences that were trivially reworded,
per the attached patch.

- ilmari
-- 
"A disappointingly low fraction of the human race is,
 at any given time, on fire." - Stig Sandbeck Mathisen

>From 43e9c60bac7b1702e5be2362a439f67adc8a5e06 Mon Sep 17 00:00:00 2001
From: =?UTF-8?q?Dagfinn=20Ilmari=20Manns=C3=A5ker?= 
Date: Tue, 5 Jan 2021 00:20:49 +
Subject: [PATCH] Replace remaining uses of "whitelist"

Instead describe the action that the list effects, or just use "list"
where the meaning is obvious from context.
---
 contrib/postgres_fdw/postgres_fdw.h| 2 +-
 contrib/postgres_fdw/shippable.c   | 4 ++--
 src/backend/access/hash/hashvalidate.c | 2 +-
 src/backend/utils/adt/lockfuncs.c  | 2 +-
 src/tools/pginclude/README | 4 ++--
 5 files changed, 7 insertions(+), 7 deletions(-)

diff --git a/contrib/postgres_fdw/postgres_fdw.h b/contrib/postgres_fdw/postgres_fdw.h
index 277a30f500..19ea27a1bc 100644
--- a/contrib/postgres_fdw/postgres_fdw.h
+++ b/contrib/postgres_fdw/postgres_fdw.h
@@ -77,7 +77,7 @@ typedef struct PgFdwRelationInfo
 	bool		use_remote_estimate;
 	Cost		fdw_startup_cost;
 	Cost		fdw_tuple_cost;
-	List	   *shippable_extensions;	/* OIDs of whitelisted extensions */
+	List	   *shippable_extensions;	/* OIDs of shippable extensions */
 
 	/* Cached catalog information. */
 	ForeignTable *table;
diff --git a/contrib/postgres_fdw/shippable.c b/contrib/postgres_fdw/shippable.c
index c43e7e5ec5..b27f82e015 100644
--- a/contrib/postgres_fdw/shippable.c
+++ b/contrib/postgres_fdw/shippable.c
@@ -7,7 +7,7 @@
  * data types are shippable to a remote server for execution --- that is,
  * do they exist and have the same behavior remotely as they do locally?
  * Built-in objects are generally considered shippable.  Other objects can
- * be shipped if they are white-listed by the user.
+ * be shipped if they are declared as such by the user.
  *
  * Note: there are additional filter rules that prevent shipping mutable
  * functions or functions using nonportable collations.  Those considerations
@@ -110,7 +110,7 @@ InitializeShippableCache(void)
  *
  * Right now "shippability" is exclusively a function of whether the object
  * belongs to an extension declared by the user.  In the future we could
- * additionally have a whitelist of functions/operators declared one at a time.
+ * additionally have a list of functions/operators declared one at a time.
  */
 static bool
 lookup_shippable(Oid objectId, Oid classId, PgFdwRelationInfo *fpinfo)
diff --git a/src/backend/access/hash/hashvalidate.c b/src/backend/access/hash/hashvalidate.c
index 8462540017..1e343df0af 100644
--- a/src/backend/access/hash/hashvalidate.c
+++ b/src/backend/access/hash/hashvalidate.c
@@ -312,7 +312,7 @@ check_hash_func_signature(Oid funcid, int16 amprocnum, Oid argtype)
 		 * that are different from but physically compatible with the opclass
 		 * datatype.  In some of these cases, even a "binary coercible" check
 		 * fails because there's no relevant cast.  For the moment, fix it by
-		 * having a whitelist of allowed cases.  Test the specific function
+		 * having a list of allowed cases.  Test the specific function
 		 * identity, not just its input type, because hashvarlena() takes
 		 * INTERNAL and allowing any such function seems too scary.
 		 */
diff --git a/src/backend/utils/adt/lockfuncs.c b/src/backend/utils/adt/lockfuncs.c
index 9f2c4946c9..0db8be6c91 100644
--- a/src/backend/utils/adt/lockfuncs.c
+++ b/src/backend/utils/adt/lockfuncs.c
@@ -644,7 +644,7 @@ pg_isolation_test_session_is_blocked(PG_FUNCTION_ARGS)
 	/*
 	 * Check if blocked_pid is waiting for a safe snapshot.  We could in
 	 * theory check the resulting array of blocker PIDs against the
-	 * interesting PIDs whitelist, but since there is no danger of autovacuum
+	 * interesting PIDs list, but since there is no danger of autovacuum
 	 * blocking GetSafeSnapshot there seems to be no point in expending cycles
 	 * on allocating a buffer and searching for overlap; so it's presently
 	 * sufficient for the isolation tester's purposes to use a single element
diff --git a/src/tools/pginclude/README b/src/tools/pginclude/README
index a067c7f472..49eb4b6907 100644
--- a/src/tools/pginclude/README
+++ b/src/tools/pginclude/README
@@ -64,7 +64,7 @@ with no prerequisite headers other than postgres.h (or postgres_fe.h
 or c.h, as appropriate).
 
 A small number 

Re: Adding new commitfest entry?

2021-01-04 Thread Peter Smith
On Tue, Jan 5, 2021 at 11:36 AM Tom Lane  wrote:

> Hm, there should be a "New patch" button just below the "Commitfest
> 2021-03" title.

Not for me. I see only 2 buttons - "Search/Filter" and "Shortcuts".

---
Kind Regards,
Peter Smith.
Fujitsu Australia.




Re: Adding new commitfest entry?

2021-01-04 Thread Tom Lane
Peter Smith  writes:
> I wanted to add a new commitfest entry but although I am logged in I
> do not see any "Add" button either in the current or future CF.
> https://commitfest.postgresql.org/31/
> https://commitfest.postgresql.org/32/

Hm, there should be a "New patch" button just below the "Commitfest
2021-03" title.

regards, tom lane




Adding new commitfest entry?

2021-01-04 Thread Peter Smith
Hi.

I wanted to add a new commitfest entry but although I am logged in I
do not see any "Add" button either in the current or future CF.

https://commitfest.postgresql.org/31/
https://commitfest.postgresql.org/32/

How to add?


Kind Regards,
Peter Smith.
Fujitsu Australia.




Re: language cleanups in code and docs

2021-01-04 Thread Thomas Munro
On Tue, Jan 5, 2021 at 1:12 PM Dagfinn Ilmari Mannsåker
 wrote:
> Magnus Hagander  writes:
> > In looking at this I realize we also have exactly one thing referred to as
> > "blacklist" in our codebase, which is the "enum blacklist" (and then a
> > small internal variable in pgindent).
>
> Here's a patch that renames the @whitelist and %blacklist variables in
> pgindent to @additional and %excluded, and adjusts the comments to
> match.

Pushed.  Thanks!




Re: language cleanups in code and docs

2021-01-04 Thread Dagfinn Ilmari Mannsåker
Magnus Hagander  writes:

> In looking at this I realize we also have exactly one thing referred to as
> "blacklist" in our codebase, which is the "enum blacklist" (and then a
> small internal variable in pgindent).

Here's a patch that renames the @whitelist and %blacklist variables in
pgindent to @additional and %excluded, and adjusts the comments to
match.

- ilmari
-- 
"The surreality of the universe tends towards a maximum" -- Skud's Law
"Never formulate a law or axiom that you're not prepared to live with
 the consequences of."  -- Skud's Meta-Law

>From 6525826bdf87ce02bd0a1648f94c7122290907f6 Mon Sep 17 00:00:00 2001
From: =?UTF-8?q?Dagfinn=20Ilmari=20Manns=C3=A5ker?= 
Date: Tue, 5 Jan 2021 00:10:07 +
Subject: [PATCH] Rename whitelist/blacklist in pgindent to additional/excluded

---
 src/tools/pgindent/pgindent | 16 
 1 file changed, 8 insertions(+), 8 deletions(-)

diff --git a/src/tools/pgindent/pgindent b/src/tools/pgindent/pgindent
index 4124d27dea..feb02067c5 100755
--- a/src/tools/pgindent/pgindent
+++ b/src/tools/pgindent/pgindent
@@ -54,12 +54,12 @@ $excludes ||= "$code_base/src/tools/pgindent/exclude_file_patterns"
 # some names we want to treat like typedefs, e.g. "bool" (which is a macro
 # according to ), and may include some names we don't want
 # treated as typedefs, although various headers that some builds include
-# might make them so.  For the moment we just hardwire a whitelist of names
-# to add and a blacklist of names to remove; eventually this may need to be
+# might make them so.  For the moment we just hardwire a list of names
+# to add and a list of names to exclude; eventually this may need to be
 # easier to configure.  Note that the typedefs need trailing newlines.
-my @whitelist = ("bool\n");
+my @additional = ("bool\n");
 
-my %blacklist = map { +"$_\n" => 1 } qw(
+my %excluded = map { +"$_\n" => 1 } qw(
   ANY FD_SET U abs allocfunc boolean date digit ilist interval iterator other
   pointer printfunc reference string timestamp type wrap
 );
@@ -134,11 +134,11 @@ sub load_typedefs
 		}
 	}
 
-	# add whitelisted entries
-	push(@typedefs, @whitelist);
+	# add additional entries
+	push(@typedefs, @additional);
 
-	# remove blacklisted entries
-	@typedefs = grep { !$blacklist{$_} } @typedefs;
+	# remove excluded entries
+	@typedefs = grep { !$excluded{$_} } @typedefs;
 
 	# write filtered typedefs
 	my $filter_typedefs_fh = new File::Temp(TEMPLATE => "pgtypedefX");
-- 
2.29.2



Fix typo in comment

2021-01-04 Thread Peter Smith
PSA a trivial patch to correct a comment typo.


Kind Regards,
Peter Smith
Fujitsu Australia.


fix_typo.patch
Description: Binary data


Re: [PATCH] Simple progress reporting for COPY command

2021-01-04 Thread Tomas Vondra

Hi,

I did take a quick look today, and I have a couple minor comments:

1) The catalog sgml docs seem to mention bytes_processed twice (one of
which should be bytes_total), and line_processed (should be "lines_").



2) I'm not quite sure about not including any info about the command. 
For example pg_stat_progress_create_index includes info about the 
command, although it's not very detailed. Not sure how useful would it 
be to show just COPY FROM / COPY TO, without more details.


It's probably possible to extract this from pg_stat_activity, but that 
may be rather cumbersome (parsing arbitrary SQL and all that). Also, 
what if the COPY is called from a function etc.?



3) I wonder if we should have something like lines_estimated. For COPY 
TO it's pretty simple to calculate it as


bytes_total / (bytes_processed / lines_processed)

but what about

COPY (query) TO file

In that case we don't know the total amount of bytes / rows, so we can't 
calculate any estimate. So maybe we could peek into the query plan? But 
I agree this is something we can add later.



4) This comment is a bit confusing, as it mixes "total" and "processed". 
I'd just say "number of bytes processed so far" instead.




Other than that, it seems fine. I'm sure we could add more features, but 
it seems like a good start - I plan to get this committed once I get a 
patch fixing the docs issues.


regards

--
Tomas Vondra
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: language cleanups in code and docs

2021-01-04 Thread Thomas Munro
On Wed, Nov 4, 2020 at 4:10 AM Magnus Hagander  wrote:
> On Wed, Oct 21, 2020 at 11:23 PM Thomas Munro  wrote:
> > Hmm, can we find a more descriptive name for this mechanism?  What
> > about calling it the "uncommitted enum table"?  See attached.
>
> Thanks for picking this one up again!
>
> Agreed, that's a much better choice.
>
> The term itself is a bit of a mouthful, but it does describe what it
> is in a much more clear way than what the old term did anyway.
>
> Maybe consider just calling it "uncomitted enums", which would as a
> bonus have it not end up talking about uncommittedenumtablespace which
> gets hits on searches for tablespace.. Though I'm not sure that's
> important.
>
> I'm +1 to the change with or without that adjustment.

Cool.  I went with your suggestion.

> As for the code, I note that:
> +   /* Set up the enum table if not already done in this transaction */
>
> forgets to say it's the *uncommitted* enum table -- which is the
> important part of it, I believe.

Fixed.

> And
> + * Test if the given enum value is in the table of blocked enums.
>
> should probably talk about uncommitted rather than blocked?

Fixed.

And pushed.




Re: Handing off SLRU fsyncs to the checkpointer

2021-01-04 Thread Thomas Munro
On Mon, Jan 4, 2021 at 3:35 AM Tomas Vondra
 wrote:
> Seems this commit left behind a couple unnecessary prototypes in a bunch
> of header files. In particular, it removed these functions
>
> - ShutdownCLOG();
> - ShutdownCommitTs();
> - ShutdownSUBTRANS();
> - ShutdownMultiXact();

Thanks.  Fixed.




Re: Context diffs

2021-01-04 Thread Thomas Munro
On Tue, Jan 5, 2021 at 8:07 AM Bruce Momjian  wrote:
> *  "git apply" and "git am" can't process context diffs (they throw an
>error once a context-like section of the diff is hit; simple
>adding/removing lines in a block works)
>
> *  the commit-fest doesn't recognize context diff attachments as
> patches:
>
> https://commitfest.postgresql.org/31/2912/
>
> *  cfbot can't process file renames/add/delete from context diffs

For the record, cfbot just uses plain old GNU patch, because that
seems to accept nearly everything that anyone posts here (after a step
that tries to unpack tarballs etc).  Several people have suggested I
change it to use git apply instead (IIRC it works better for patches
containing binary files such as cryptographic keys?), but then it
wouldn't accept ye olde context diffs.




Re: pg_waldump/heapdesc.c and struct field names

2021-01-04 Thread Peter Geoghegan
On Sun, Jan 3, 2021 at 8:58 PM Masahiko Sawada  wrote:
> On Mon, Jan 4, 2021 at 12:55 PM Peter Geoghegan  wrote:
> +1 for changing heapdesc.c on master. It's not only readable but also
> consistent with other *desc showing the field named latestRemovedXid.
> For instance, nbtdesc.c has:
>
> case XLOG_BTREE_REUSE_PAGE:
> {
> xl_btree_reuse_page *xlrec = (xl_btree_reuse_page *) rec;
>
> appendStringInfo(buf, "rel %u/%u/%u; latestRemovedXid %u",
>  xlrec->node.spcNode, xlrec->node.dbNode,
>  xlrec->node.relNode, 
> xlrec->latestRemovedXid);
> break;
> }

Right. Self-consistency matters, as does consistency with the source
code itself.

-- 
Peter Geoghegan




Re: pg_waldump/heapdesc.c and struct field names

2021-01-04 Thread Peter Geoghegan
On Mon, Jan 4, 2021 at 1:12 PM Andres Freund  wrote:
> I personally mildly prefer remxid - anything that is understandable but
> shortens the line length is good for my uses of waldump.

I want to use latestRemovedXid here because it is quite recognizable
to me as a symbol name. It appears as a symbol name 84 times, across
many files in several distinct areas. Whereas remxid means nothing to
me unless I go look it up, which is not ideal.

-- 
Peter Geoghegan




Re: pg_waldump/heapdesc.c and struct field names

2021-01-04 Thread Andres Freund
Hi,

On 2021-01-03 19:54:38 -0800, Peter Geoghegan wrote:
> I notice that heapdesc.c outputs the field latestRemovedXid as
> "remxid". But why? What sense is there in changing the name for output
> by tools like pg_waldump, which are intrinsically internals focussed?

I personally mildly prefer remxid - anything that is understandable but
shortens the line length is good for my uses of waldump.

Greetings,

Andres Freund




Re: Context diffs

2021-01-04 Thread Peter Geoghegan
On Mon, Jan 4, 2021 at 11:07 AM Bruce Momjian  wrote:
> *  "git apply" and "git am" can't process context diffs (they throw an
>error once a context-like section of the diff is hit; simple
>adding/removing lines in a block works)

I think that this is the big issue for me. I really find it convenient
to use "git format-patch" to produce patches for complicated long
lived projects -- having commit messages and explicit ordering of
multiple patches is really helpful. I've come to prefer unified,
perhaps as a result of using "git format-patch" so much.

-- 
Peter Geoghegan




Re: Context diffs

2021-01-04 Thread Tom Lane
Bruce Momjian  writes:
> Our developer FAQ mentions context diffs in several places for their
> improved readability:
>   https://wiki.postgresql.org/wiki/Developer_FAQ

I think that's kind of out of date; AFAIK the majority of hackers
think -u format is more readable.  (For myself, I've gotten better
at reading -u format, but I still find it illegible for complex
changes.  Still, I try to always post patches in -u format to
defer to the majority.)

It is true that some of the tools we use don't work right with
anything except "git diff" output or close equivalents.

regards, tom lane




Context diffs

2021-01-04 Thread Bruce Momjian
Our developer FAQ mentions context diffs in several places for their
improved readability:

https://wiki.postgresql.org/wiki/Developer_FAQ

I also know git can produce context diffs as _output_.  However, there
are several negatives to context vs unified diffs in our workflow:

*  "git apply" and "git am" can't process context diffs (they throw an
   error once a context-like section of the diff is hit; simple
   adding/removing lines in a block works)

*  the commit-fest doesn't recognize context diff attachments as
patches:

https://commitfest.postgresql.org/31/2912/

*  cfbot can't process file renames/add/delete from context diffs

I am not 100% sure on the above items, but that is what I have seen in
my testing.  Should we post only unified diffs in certain cases, even
with the readability issues?  (I know some people prefer the readability
of unified diffs.)  Should we document this?

-- 
  Bruce Momjian  https://momjian.us
  EnterpriseDB https://enterprisedb.com

  The usefulness of a cup is in its emptiness, Bruce Lee





Re: Proposed patch for key management

2021-01-04 Thread Bruce Momjian
On Fri, Jan  1, 2021 at 06:26:36PM +, Alastair Turner wrote:
> After the long intro, my question - If using a standard format,
> managed by a library, for the internal keystore does not result in a
> smaller or simpler patch, is there enough other value for this to be
> worth considering? For security features, standards and building on
> well exercised code bases do really have value, but is it enough...

I know Stephen Frost looked at PKCS12 but didn't see much value in it
since the features provided by PKCS12 didn't seem to add much for
Postgres, and added complexity.  Using GCM for key wrapping, heap/index
files, and WAL seemed simple.

FYI, the current patch has an initdb option --copy-encryption-key, which
copies the key from an old cluster, for use by pg_upgrade.  Technically
you could create a directory called pg_cryptokey/live, put wrapped files
0/1 in there, and use that when initializing a new cluster.  That would
allow you to generate the data keys using your own random source.  I am
not sure how useful that would be, but we could document this ability.

There is all sorts of flexibility being proposed:

*  scope of keys
*  encryption method
*  encryption mode
*  internal/external

Some of this is related to wrapping the data keys, and some of it gets
to the encryption of cluster files.  I just don't see how anything with
the level of flexibility being asked for could ever be reliable or
simple to administer, and I don't see the security value of that
flexibility.  I feel at a certain point, we just need to choose the best
options and move forward.

I thought this had all been debated, but people continue to show up and
ask for it to be debated again.  At one level, these discussions are
valuable, but at a certain point you end up re-litigating things so much
that you get nothing else done.  I know some people who have given up working on
this feature for this reason.

I am not asking we stop discussing it, but I do ask we address the
negatives of suggestions, especially when the negatives have already
been clearly stated.

-- 
  Bruce Momjian  https://momjian.us
  EnterpriseDB https://enterprisedb.com

  The usefulness of a cup is in its emptiness, Bruce Lee





data_checksums enabled by default (was: Move --data-checksums to common options in initdb --help)

2021-01-04 Thread Michael Banck
Heya,

(changing the subject as we're moving the goalposts)

On Saturday, 2021-01-02 at 10:47 -0500, Stephen Frost wrote:
> * Michael Paquier (mich...@paquier.xyz) wrote:
> > On Fri, Jan 01, 2021 at 08:34:34PM +0100, Michael Banck wrote:
> > > I think enough people use data checksums these days that it warrants to
> > > be moved into the "normal part", like in the attached.
> > 
> > +1.  Let's see first what others think about this change.
> 
> I agree with this, but I'd also like to propose, again, as has been
> discussed a few times, making it the default too.

One thing my colleagues have complained about is the seemingly excessive
amount of WAL generated when checksums are enabled (compared to the
default, where data_checksums and wal_log_hints are both off), due to
additional FPIs.

So I ran some quick benchmarks based on pgbench -i (i.e. just
initializing the data, not actually running queries), looking at how much
WAL is produced during a VACUUM with a forced CHECKPOINT beforehand.

This creates a new instance, turns archiving on and then first does the
data-load with scale-factor 100 in pgbench (initialization steps "dtg"),
followed by a CHECKPOINT and then the VACUUM/PK generation steps
(initialization steps "vp"), followed by a final CHECKPOINT. It looks
like this, where $CHECKSUM is either empty or '-k':

pg_ctl -D data1 stop; rm -rf data1/ data1_archive/*;
initdb $CHECKSUM -D data1; cp postgresql.conf data1;
pg_ctl -D data1 -l data1_logfile start;
pgbench -s 100 -i -p 65432 -I dtg; echo CHECKPOINT | psql -p 65432;
pgbench -s 100 -i -p 65432 -I vp; echo CHECKPOINT | psql -p 65432;
du -sh data1/pg_wal data1/base data1_archive/

All runs were repeated twice. These are the $PGDATA/{pg_wal,base} sizes
and the archive, as well as the timing for the second pgbench
initialization step:

data_checksums=off, wal_compression=off

1,1G    data1/pg_wal
1,5G    data1/base
1,3G    data1_archive/

done in 10.24 s (vacuum 3.31 s, primary keys 6.92 s).
done in 8.81 s (vacuum 2.72 s, primary keys 6.09 s).
done in 8.35 s (vacuum 2.32 s, primary keys 6.03 s).

data_checksums=on, wal_compression=off

1,5G    data1/pg_wal
1,5G    data1/base
2,5G    data1_archive/

done in 67.42 s (vacuum 54.57 s, primary keys 12.85 s).
done in 65.03 s (vacuum 53.25 s, primary keys 11.78 s).
done in 77.57 s (vacuum 62.64 s, primary keys 14.94 s).

So data_checksums (and/or wal_log_hints; I omitted those numbers as
they are basically identical to the data_checksums=on case) makes
vacuum run 20x longer and primary-key creation 2x longer, and also
increases the generated WAL by 40% in pg_wal and roughly doubles the
WAL in the archive.

I then re-ran the tests with wal_compression=on in order to see how much
that helps:

data_checksums=off, wal_compression=on

1,1G    data1/pg_wal
1,5G    data1/base
1,2G    data1_archive/

done in 26.60 s (vacuum 3.30 s, primary keys 23.30 s).
done in 19.54 s (vacuum 3.11 s, primary keys 16.43 s).
done in 19.50 s (vacuum 3.46 s, primary keys 16.04 s).

data_checksums=on, wal_compression=on

1,1G    data1/pg_wal
1,5G    data1/base
1,3G    data1_archive/

done in 60.24 s (vacuum 42.52 s, primary keys 17.72 s).
done in 62.07 s (vacuum 45.64 s, primary keys 16.43 s).
done in 56.20 s (vacuum 40.96 s, primary keys 15.24 s).

This looks much better from the WAL size perspective; there's now almost
no additional WAL. However, that is because pgbench doesn't exercise TOAST, so
in a real-world example it might still be quite a bit larger. Also, the VACUUM
runtime is still 15x longer.

So maybe we should switch on wal_compression if we enable data checksums
by default.
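
For reference, wal_compression does not require an initdb-time decision the way
data checksums currently do; on an existing cluster it can simply be switched on
at runtime (this assumes superuser access):

ALTER SYSTEM SET wal_compression = on;
SELECT pg_reload_conf();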


Michael

-- 
Michael Banck
Projektleiter / Senior Berater
Tel.: +49 2166 9901-171
Fax:  +49 2166 9901-100
Email: michael.ba...@credativ.de

credativ GmbH, HRB Mönchengladbach 12080
USt-ID-Nummer: DE204566209
Trompeterallee 108, 41189 Mönchengladbach
Geschäftsführung: Dr. Michael Meskes, Jörg Folz, Sascha Heuer

Unser Umgang mit personenbezogenen Daten unterliegt
folgenden Bestimmungen: https://www.credativ.de/datenschutz





Re: Bug in numeric_power() if exponent is INT_MIN

2021-01-04 Thread Tom Lane
Dean Rasheed  writes:
> The issue is in this line from power_var_int():
> sig_digits += (int) log(Abs(exp)) + 8;
> because "exp" is a signed int, so Abs(exp) leaves INT_MIN unchanged.
> The most straightforward fix is to use fabs() instead, so that "exp"
> is cast to double *before* the absolute value is taken, as in the
> attached patch.

Nice catch, and the fix seems appropriate.

regards, tom lane




Re: [HACKERS] [PATCH] Generic type subscripting

2021-01-04 Thread Pavel Stehule
On Mon, 4 Jan 2021 at 14:58, Dmitry Dolgov <9erthali...@gmail.com>
wrote:

> > On Sun, Jan 03, 2021 at 08:41:17PM +0100, Pavel Stehule wrote:
> >
> > probably something is still wrong
> >
> > create table foo(a jsonb);
> > update foo set a['a'] = '10';
> > update foo set a['b']['c'][1] = '10';
> > update foo set a['b']['c'][10] = '10'
>
> Thanks for noticing. Indeed, there was a subtle change of meaning for
> 'done' flag in setPath, which I haven't covered. Could you try this
> version?
>

sure

postgres=# insert into foo values('{}');
INSERT 0 1
postgres=# update foo set a['c']['c'][10] = '10';
UPDATE 1
postgres=# select * from foo;
┌──────────────────────────────────────────────────────────────────────────────────┐
│                                         a                                          │
╞══════════════════════════════════════════════════════════════════════════════════╡
│ {"c": {"c": [null, null, null, null, null, null, null, null, null, null, 10]}}    │
└──────────────────────────────────────────────────────────────────────────────────┘
(1 row)

postgres=# update foo set a['c'][10][10] = '10';
WARNING:  problem in alloc set ExprContext: req size > alloc size for chunk
0x151b688 in block 0x151aa90
WARNING:  problem in alloc set ExprContext: bogus aset link in block
0x151aa90, chunk 0x151b688
WARNING:  problem in alloc set ExprContext: bad size 0 for chunk 0x151b8a0
in block 0x151aa90
WARNING:  problem in alloc set ExprContext: bad single-chunk 0x151b8b8 in
block 0x151aa90
WARNING:  problem in alloc set ExprContext: bogus aset link in block
0x151aa90, chunk 0x151b8b8
WARNING:  problem in alloc set ExprContext: found inconsistent memory block
0x151aa90
UPDATE 1


Re: Proposed patch for key management

2021-01-04 Thread Bruce Momjian
On Sat, Jan  2, 2021 at 12:47:19PM +, Alastair Turner wrote:
> If keys can have arbitrary scope, then the pg backend won't know what
> to ask for. So the API becomes even simpler with no specific request
> on stdin and all the relevant keys on stdout. I generally like this
> approach as well, and it will be the only option for some
> integrations. On the other hand, there is an advantage to having the
> key retrieval piece of key management in-process - the keys are not
> being passed around in plain.
> 
> There is also a further validation task - probably beyond the scope of
> the key management patch and into the encryption patch[es] territory -
> checking that the keys supplied are the same keys in use for the data
> currently on disk. It feels to me like this should be done at startup,
> rather than as each file is accessed, which could make startup quite
> slow if there are a lot of keys with narrow scope.

We do that already on startup by using GCM to validate the  KEK when
encrypting each DEK.

-- 
  Bruce Momjian  https://momjian.us
  EnterpriseDB https://enterprisedb.com

  The usefulness of a cup is in its emptiness, Bruce Lee





Re: Proposed patch for key management

2021-01-04 Thread Bruce Momjian
On Sat, Jan  2, 2021 at 10:50:15AM +0100, Fabien COELHO wrote:
> > No, this isn't anything like the Kerberos model and there's no case
> > where the PG account won't have access to the DEK (data encryption key)
> > during normal operation (except with the possibility of off-loading to a
> > hardware crypto device which, while interesting, is definitely something
> > that's of very limited utility and which could be added on as a
> > capability later anyway.  This is also well understood by those who are
> > interested in this capability and it's perfectly acceptable that PG will
> > have, in memory, the DEK.
> 
> I'm sorry that I'm not very clear. My ranting is having a KEK "master key"
> in pg memory. I think I'm fine at this point with having DEKs available on
> the host to code/decode files. What I meant about kerberos is that the setup
> I was describing was making the database dependent on a remote host, which
> is somehow what kerberos does.

We bzero the master key once we are done with it in the server.  Why is
having the master key in memory for a short time a problem, while having
the data keys always in memory is acceptable?  Aren't the data keys more
valuable, and hence isn't the fact that the master key is in memory for a
short time no additional risk?

> > The keys which are stored on disk with the proposed patch are encrypted
> > using a KEK which is external to PG- that's all part of the existing
> > patch and design.
> 
> Yep. My point is that the patch hardwires a cryptographic design with many
> assumptions, whereas I think it should design an API compatible with a
> larger class of designs, and possibly implement one with KEK and so on, I'm
> fine with that.

You are not addressing my complexity/fragility reasons for having a
single method.  As stated above, this feature has limited usefulness,
and if it breaks or you lose the KEK, your database server and backups
might be gone.

> > so it's unclear to me what the argument here is.
> 
> The argument is that professional cryptographers do wrong designs, and I do
> not see why you would do differently. So instead of hardwiring choices that
> you think are THE good ones, ISTM that pg should design a reasonably flexible
> API, and also provide an implementation, obviously.

See above.

> > Further, we haven't even gotten to actual cluster encryption yet- this
> > is all just the key management side of things,
> 
> With hardwired choices about 1 KEK and 2 DEK that are debatable, see below.

Then let's debate them, since this patch isn't moving forward as long as
people are asking questions about the implementation.

> > > I'm not sure of the design of the key rotation stuff as I recall it from 
> > > the
> > > patches I looked at, but the API design looks like the more pressing 
> > > issue.
> > > First, the mere existence of an "master key" is a cryptographic choice 
> > > which
> > > is debatable, because it creates issues if one must be able to change it,
> > > hence the whole "key rotation" stuff. ISTM that what core needs to know is
> > > that one particular key is changed by a new one and the underlying file is
> > > rewritten. It should not need to know anything about master keys and key
> > > derivation at all. It should need to know that the user wants to change 
> > > file
> > > keys, though.
> > 
> > No, it's not debatable that a KEK is needed, I disagree entirely.
> 
> ISTM that there is a cryptographic design which suits your needs and you seem
> to have decided that it is the only possible way to do it. It seems that you know.
> As a researcher, I know that I do not know:-) As a software engineer, I'm
> trying to suggest enabling more with the patch, including not hardwiring a
> three-key management scheme.

Well, your open API goal might work, but I am not comfortable
implementing that for reasons already explained, so what do we do?

> I'm advocating that you pg not decide for others what is best for them, but
> rather allow multiple choices and implementations through an API. Note that
> I'm not claiming that the 1 KEK + 2 DEK + AES… design is bad. I'm claiming
> that it is just one possible design, quite peculiar though, and that pg
> should not hardwire it, and it does not need to anyway.
> 
> The documentation included in the key management patch states: "Key zero is
> the key used to encrypt database heap and index files which are stored in
> the file system, plus temporary files created during database operation."
> Read as is, it suggests that the same key is used to encrypt many files.
> From a cryptographic point of view this looks like a bad idea. The mode used
> is unclear. If this is a stream cypher generated in counter mode, it could
> be a very bad idea. Hopefully this is not what is intended, but the included
> documentation is frankly misleading in that case. IMHO there should be one
> key per file. Also, the CTR mode should be avoided if possible because it
> has quite special properties, unless these properties are absolutely
> 

Re: poc - possibility to write window function in PL languages

2021-01-04 Thread Zhihong Yu
Hi, Pavel:
Thanks for the update.

I don't have other comment.

Cheers

On Mon, Jan 4, 2021 at 3:15 AM Pavel Stehule 
wrote:

> Hi
>
> On Fri, 1 Jan 2021 at 18:57, Zhihong Yu  wrote:
>
>> Hi, Pavel:
>> Happy New Year.
>>
>> +   command with clause WINDOW. The specific feature of
>> +   this functions is a possibility to two special storages with
>>
>> this functions -> this function
>>
>> possibility to two special storages: there is no verb.
>>
>> 'store with stored one value': store is repeated.
>>
>> + * Portions Copyright (c) 1996-2020, PostgreSQL Global Development Group
>>
>> It would be better to change 2020 to 2021 in the new files.
>>
>
> fixed
>
>>
>> For some functions, such as windowobject_get_func_arg_frame, it would be
>> better to add comment explaining their purposes.
>>
>
> It is commented before. These functions just call WinAPI functions
>
> /*
>  * High level access function. These functions are wrappers for windows API
>  * for PL languages based on usage WindowObjectProxy.
>  */
>
>
>
>> For estimate_partition_context_size():
>> +errmsg("size of value is greather than limit (1024
>> bytes)")));
>>
>> Please include the value of typlen in the message. There is similar error
>> message in the else block where value of size should be included.
>>
>> +   return *realsize;
>> +   }
>> +   else
>>
>> The 'else' is not needed since the if block ends with return.
>>
>
> yes, but it is there for better readability (symmetry)
>
>>
>> +   size += size / 3;
>>
>> Please add a comment for the choice of constant 3.
>>
>> +   /* by default we allocate 30 bytes */
>> +   *realsize = 0;
>>
>> The value 30 may not be accurate - from the caller:
>>
>> +   if (PG_ARGISNULL(2))
>> +   minsize = VARLENA_MINSIZE;
>> +   else
>> +   minsize = PG_GETARG_INT32(2);
>>
>> VARLENA_MINSIZE is 32.
>>
>> Cheers
>>
>> On Fri, Jan 1, 2021 at 3:29 AM Pavel Stehule 
>> wrote:
>>
>>> Hi
>>>
>>> rebase
>>>
>>> Regards
>>>
>>> Pavel
>>>
>>
> I am sending updated patch
>
> Thank you for comments
>
> Regards
>
> Pavel
>


Bug in numeric_power() if exponent is INT_MIN

2021-01-04 Thread Dean Rasheed
(Amusingly I only found this after discovering that Windows Calculator
has a similar bug which causes it to crash if you try to raise a
number to the power INT_MIN.)

On my machine, numeric_power() loses all precision if the exponent is
INT_MIN, though the actual failure mode might well be
platform-dependent:

SELECT n, 1.0123^n AS pow
  FROM (VALUES (-2147483647), (-2147483648), (-2147483649)) AS v(n);
      n      |        pow
-------------+--------------------
 -2147483647 | 0.7678656557347558
 -2147483648 | 1.
 -2147483649 | 0.7678656555458609
(3 rows)

The issue is in this line from power_var_int():

sig_digits += (int) log(Abs(exp)) + 8;

because "exp" is a signed int, so Abs(exp) leaves INT_MIN unchanged.
The most straightforward fix is to use fabs() instead, so that "exp"
is cast to double *before* the absolute value is taken, as in the
attached patch.

This code was added in 7d9a4737c2, which first appeared in PG 9.6, so
barring objections, I'll push and back-patch this fix that far.

Regards,
Dean
diff --git a/src/backend/utils/adt/numeric.c b/src/backend/utils/adt/numeric.c
new file mode 100644
index 68d8791..7cf5656
--- a/src/backend/utils/adt/numeric.c
+++ b/src/backend/utils/adt/numeric.c
@@ -10290,7 +10290,7 @@ power_var_int(const NumericVar *base, in
 	 * to around log10(abs(exp)) digits, so work with this many extra digits
 	 * of precision (plus a few more for good measure).
 	 */
-	sig_digits += (int) log(Abs(exp)) + 8;
+	sig_digits += (int) log(fabs(exp)) + 8;
 
 	/*
 	 * Now we can proceed with the multiplications.
diff --git a/src/test/regress/expected/numeric.out b/src/test/regress/expected/numeric.out
new file mode 100644
index 56e7799..30a5642
--- a/src/test/regress/expected/numeric.out
+++ b/src/test/regress/expected/numeric.out
@@ -2321,6 +2321,12 @@ select 0.12 ^ (-20);
  2608405330458882702.5529619561355838
 (1 row)
 
+select 1.0123 ^ (-2147483648);
+  ?column?  
+
+ 0.7678656556403084
+(1 row)
+
 -- cases that used to error out
 select 0.12 ^ (-25);
  ?column?  
diff --git a/src/test/regress/sql/numeric.sql b/src/test/regress/sql/numeric.sql
new file mode 100644
index f19793a..db812c8
--- a/src/test/regress/sql/numeric.sql
+++ b/src/test/regress/sql/numeric.sql
@@ -1089,6 +1089,7 @@ select 3.789 ^ 21;
 select 3.789 ^ 35;
 select 1.2 ^ 345;
 select 0.12 ^ (-20);
+select 1.0123 ^ (-2147483648);
 
 -- cases that used to error out
 select 0.12 ^ (-25);


[PATCH]: Allow errors in parameter values to be reported during the BIND phase itself..

2021-01-04 Thread Justin Pryzby
For example:

$ python3.5 -c "import pg; db=pg.DB(); q = db.query(\"SET 
log_parameter_max_length_on_error=-1;\"); db.prepare('p', 'SELECT 
\$1::smallint'); db.query_prepared('p',6);"
2021-01-03 02:21:04.547 CST [20157] ERROR:  value "6" is out of range for 
type smallint
2021-01-03 02:21:04.547 CST [20157] CONTEXT:  unnamed portal with parameters: 
$1 = '6'
2021-01-03 02:21:04.547 CST [20157] STATEMENT:  SELECT $1::smallint

When there are many bind params, this can be useful to determine which is out
of range.  Think 900 int/smallint columns, or less-wide tables being inserted
multiple rows at a time with VALUES(),(),()...

Of course, this isn't as good as showing the column name, so I might pursue
Tom's suggestion for that at some point.

-- 
Justin
>From 40dd798d8518b4d113dbf70c37687c87bf734659 Mon Sep 17 00:00:00 2001
From: Justin Pryzby 
Date: Fri, 1 Jan 2021 19:38:57 -0600
Subject: [PATCH] Allow errors in parameter values to be reported during the
 BIND phase itself..

For example:
$ python3.5 -c "import pg,time; db=pg.DB('dbname=postgres host=/var/run/postgresql port=5432 host=/tmp'); q = db.query(\"SET log_parameter_max_length_on_error=-1;\"); db.prepare('p', 'SELECT \$1::smallint'); db.query_prepared('p',6);"
2021-01-01 19:40:24.777 CST [5330] ERROR:  value "6" is out of range for type smallint
2021-01-01 19:40:24.777 CST [5330] CONTEXT:  unnamed portal with parameters: $1 = '6'

This can be useful to determine which is out of range when there are many bind
params.

See also:
commit ba79cb5dc
commit 0b34e7d30
https://www.postgresql.org/message-id/flat/canfkh5k-6nnt-4csv1vpb80nq2bzczhfvr5o4vznybsx0wz...@mail.gmail.com
  Mention column name in error messages
---
 src/backend/tcop/postgres.c | 166 +++-
 1 file changed, 106 insertions(+), 60 deletions(-)

diff --git a/src/backend/tcop/postgres.c b/src/backend/tcop/postgres.c
index 317d1aa573..30fb86d97f 100644
--- a/src/backend/tcop/postgres.c
+++ b/src/backend/tcop/postgres.c
@@ -1765,9 +1765,109 @@ exec_bind_message(StringInfo input_message)
 	if (numParams > 0)
 	{
 		char	  **knownTextValues = NULL; /* allocate on first use */
+		int32		*plengths;
+		char	  **pstrings;
+
+		plengths = palloc0(numParams * sizeof(*plengths));
+
+		/* Indexed by paramno, but not used for nulls */
+		pstrings = palloc0(numParams * sizeof(*pstrings));
 
 		params = makeParamList(numParams);
 
+		/*
+		 * Do two loops to allow error reports during BIND;
+		 * On the first pass, just save the offset and length of each param;
+		 * On the second pass, convert the params to the require type.
+		 * If this fails, the params have already been saved for error callack.
+		 */
+
+		for (int paramno = 0; paramno < numParams; paramno++)
+		{
+			int32		plength;
+			int16		pformat;
+			char	   *pstring;
+			MemoryContext oldcxt;
+
+			plength = plengths[paramno] = pq_getmsgint(input_message, 4);
+
+			/* Minimal amount needed for error callback */
+			params->params[paramno].ptype = psrc->param_types[paramno];;
+			params->params[paramno].isnull = (plength == -1);
+
+			if (params->params[paramno].isnull)
+continue;
+
+			pstrings[paramno] = unconstify(char *, pq_getmsgbytes(input_message, plength));
+
+			if (log_parameter_max_length_on_error == 0)
+continue;
+
+			if (numPFormats > 1)
+pformat = pformats[paramno];
+			else if (numPFormats > 0)
+pformat = pformats[0];
+			else
+pformat = 0;	/* default = text */
+
+			if (pformat != 0)	/* text mode */
+continue;
+
+			/*
+			 * If we might need to log parameters later, save a copy of
+			 * the converted string in MessageContext;
+			 * XXX: only supported in text mode ??
+			 */
+			pstring = pg_client_to_server(pstrings[paramno], plength);
+			oldcxt = MemoryContextSwitchTo(MessageContext);
+
+			if (knownTextValues == NULL)
+knownTextValues =
+	palloc0(numParams * sizeof(char *));
+
+			if (log_parameter_max_length_on_error < 0)
+knownTextValues[paramno] = pstrdup(pstring);
+			else
+			{
+/*
+ * We can trim the saved string, knowing that we
+ * won't print all of it.  But we must copy at
+ * least two more full characters than
+ * BuildParamLogString wants to use; otherwise it
+ * might fail to include the trailing ellipsis.
+ */
+knownTextValues[paramno] =
+	pnstrdup(pstring,
+			 log_parameter_max_length_on_error
+			 + 2 * MAX_MULTIBYTE_CHAR_LEN);
+			}
+
+			MemoryContextSwitchTo(oldcxt);
+			if (pstring != pstrings[paramno])
+pfree(pstring);
+		}
+
+		/*
+		 * Once all parameters have been received, prepare for printing them
+		 * in errors, if configured to do so.  (This is saved in the portal,
+		 * so that they'll appear when the query is executed later.)
+		 */
+		if (log_parameter_max_length_on_error != 0)
+		{
+			params->paramValuesStr =
+BuildParamLogString(params,
+	knownTextValues,
+	log_parameter_max_length_on_error);
+		}
+
+		/* Set the error 

Re: set_config() documentation clarification

2021-01-04 Thread David G. Johnston
On Mon, Jan 4, 2021 at 8:26 AM Joel Jacobson  wrote:

> In the documentation at
> https://www.postgresql.org/docs/current/functions-admin.html
> this behaviour is not mentioned anywhere as far as I can see:
>
>
https://www.postgresql.org/docs/current/runtime-config-custom.html

I do think "Customized Options" is an accurate description.  One needs to
consider their usage of such variables as "modular" even though there is no
library code involved.  i.e., the description is possibly overly specific
but the variables created in this manner are simply namespaced but
otherwise usable in any manner you deem fit.

David J.


Re: Deleting older versions in unique indexes to avoid page splits

2021-01-04 Thread Victor Yegorov
On Thu, 31 Dec 2020 at 03:55, Peter Geoghegan  wrote:

> Attached is v12, which fixed bitrot against the master branch. This
> version has significant comment and documentation revisions. It is
> functionally equivalent to v11, though.
>
> I intend to commit the patch in the next couple of weeks. While it
> certainly would be nice to get a more thorough review, I don't feel
> that it is strictly necessary. The patch provides very significant
> benefits with certain workloads that have traditionally been
> considered an Achilles' heel for Postgres. Even zheap doesn't provide
> a solution to these problems. The only thing that I can think of that
> might reasonably be considered in competition with this design is
> WARM, which hasn't been under active development since 2017 (I assume
> that it has been abandoned by those involved).
>

I've looked through the v12 patch.

I like the new outline:

- _bt_delete_or_dedup_one_page() is the main entry for the new code
- first we try _bt_simpledel_pass(), which does improved cleanup of LP_DEAD entries
- then (if necessary) _bt_bottomupdel_pass() for bottom-up deletion
- finally, we perform _bt_dedup_pass() to deduplicate

We split the leaf page only if all the actions above failed to provide
enough space.

Some comments on the code.

v12-0001


1. For the following comment

+* Only do this for key columns.  A change to a non-key column within an
+* INCLUDE index should not be considered because that's just payload to
+* the index (they're not unlike table TIDs to the index AM).

The last part of it (in the parenthesis) is difficult to grasp due to
the double negation (not unlike). I think it's better to rephrase it.

2. After reading the patch, I also think that the fact that
index_unchanged_by_update()
and index_unchanged_by_update_var_walker() return different bool states
(i.e. when the latter returns true, the former returns false) is a bit
misleading.

Although the logic as it is looks fine, maybe
index_unchanged_by_update_var_walker()
can be renamed to avoid this confusion, to something like
index_expression_changed_walker()?

v12-0002


1. Thanks for the comments, they're well made and do help to read the code.

2. I'm not sure the bottomup_delete_items parameter is very helpful. In
order to disable bottom-up deletion, a DBA somehow needs to measure its
impact on a particular index. Currently I do not see how to achieve this
(a rough way to at least watch an index over time is sketched after this
list). Not sure if this is overly important, though, as you have a similar
parameter for the deduplication.

3. It feels like indexUnchanged would be better as indexChanged, with its
usage negated in the code, as !indexChanged reads more naturally than
!indexUnchanged, at least to me.
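
As a rough sketch of what I mean by watching an index over time (re point 2;
there is no dedicated counter for bottom-up deletions, and the relation name
below is just a placeholder), one can at least track leaf density and raw size:

CREATE EXTENSION IF NOT EXISTS pgstattuple;
SELECT avg_leaf_density, leaf_fragmentation
  FROM pgstatindex('orders_customer_idx');
SELECT pg_size_pretty(pg_relation_size('orders_customer_idx'));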

This is all I have. I agree, that this code is pretty close to being
committed.

Now for the tests.

First, I ran a 2-hour-long case with the same setup as I used in my e-mail
from 15 November. I found no difference between patch and master
whatsoever, which makes me think that current master is quite good at
keeping bloat under control (not sure if this is an effect of commit
4228817449 or of deduplication).

I created another setup (see attached testcases). Basically, I emulated
queue operations (INSERT at the end and DELETE



-- 
Victor Yegorov


Re: Safety/validity of resetting permissions by updating system tables

2021-01-04 Thread Isaac Morland
On Mon, 4 Jan 2021 at 10:12, Andrew Dunstan  wrote:

>
> On 1/1/21 11:44 AM, Tom Lane wrote:
> > Isaac Morland  writes:
> >> Is it safe and valid to reset to default permissions by doing
> >> UPDATE pg_namespace/pg_class/pg_type/pg_proc
> >> SET nspacl/relacl/typacl/proacl = NULL WHERE ... to accomplish this?
> > Not terribly; the main objection is you'd fail to update pg_shdepend.
>
> And apart from that I'm generally resistant to anything that requires
> direct manipulation of the catalog. One of many reasons is that there is
> no guarantee that it will have the same shape in the next release. I
> normally encourage people strongly to look for other solutions.
>

So am I. That's why I asked before proceeding.

As far as I can tell, it is not possible to fully reset permissions using
GRANT/REVOKE, even when querying the system tables to figure out which
permissions exist; the closest one can get is to set explicit (non-NULL)
ACLs that have the same effect as the default (NULL) ACLs, and doing so
requires duplicating the logic used within the system to determine the
permissions that apply to an object with a blank (NULL) ACL.
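
For illustration with a function (the name is a placeholder): the built-in
default is EXECUTE granted to PUBLIC plus the owner's rights, so the best
GRANT/REVOKE can do is recreate that by hand, leaving an explicit rather than
NULL ACL:

REVOKE ALL ON FUNCTION myfunc() FROM PUBLIC;
GRANT EXECUTE ON FUNCTION myfunc() TO PUBLIC;  -- mimics the built-in default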


Re: PoC/WIP: Extended statistics on expressions

2021-01-04 Thread Justin Pryzby
On Mon, Jan 04, 2021 at 03:34:08PM +, Dean Rasheed wrote:
> * I'm not sure I understand the need for 0001. Wasn't there an earlier
> version of this patch that just did it by re-populating the type
> array, but which still had it as an array rather than turning it into
> a list? Making it a list falsifies some of the comments and
> function/variable name choices in that file.

This part is from me.

I can review the names if it's desired, but it'd be fine to fall back to the
earlier patch.  I thought a pglist was cleaner, but it's not needed.

-- 
Justin




Re: set_config() documentation clarification

2021-01-04 Thread Chapman Flack
On 01/04/21 10:25, Joel Jacobson wrote:
> I just learned from a pg friend that it's possible to set your own made-up
> setting_name using set_config(),
> if the setting_name contains at least one dot ".".

It works that way so you can set a config variable needed by an extension,
if necessary before the extension is loaded, so PostgreSQL doesn't know yet
what variable names with that prefix are or aren't valid. Once the extension
that defines the prefix (before the first dot) is loaded, it will handle
any variables with that prefix that it recognizes, and errors will be
reported for any others.
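
A minimal sketch of that pattern from SQL (the prefix and names here are made
up for illustration, not tied to any real extension):

SELECT set_config('myext.batch_size', '100', false);   -- define under an unclaimed prefix
SELECT current_setting('myext.batch_size');             -- returns '100'
SELECT current_setting('myext.not_set', true);           -- missing_ok form: returns NULL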

I'm not sure how much of that behavior needs to be documented for
set_config() itself; it gets a little deep into the weeds of extension
development. Is it documented there? In that case, maybe the briefest
of mentions at set_config() would be appropriate, such as "names starting
with a prefix and a dot can be treated specially, as described at [link]".

Regards,
-Chap




Re: PoC/WIP: Extended statistics on expressions

2021-01-04 Thread Dean Rasheed
On Fri, 11 Dec 2020 at 20:17, Tomas Vondra
 wrote:
>
> OK. Attached is an updated version, reworking it this way.

Cool. I think this is an exciting development, so I hope it makes it
into the next release.

I have started looking at it. So far I have only looked at the
catalog, parser and client changes, but I thought it's worth posting
my comments so far.

> I tried tweaking the grammar to differentiate these two syntax variants,
> but that led to shift/reduce conflicts with the existing ones. I tried
> fixing that, but I ended up doing that in CreateStatistics().

Yeah, that makes sense. I wasn't expecting the grammar to change.

> The other thing is that we probably can't tie this to just MCV, because
> functional dependencies need the per-expression stats too. So I simply
> build expression stats whenever there's at least one expression.

Makes sense.

> I also decided to keep the "expressions" statistics kind - it's not
> allowed to specify it in CREATE STATISTICS, but it's useful internally
> as it allows deciding whether to build the stats in a single place.
> Otherwise we'd need to do that every time we build the statistics, etc.

Yes, I thought that would be the easiest way to do it. Essentially the
"expressions" stats kind is an internal implementation detail, hidden
from the user, because it's built automatically when required, so you
don't need to (and can't) explicitly ask for it. This new behaviour
seems much more logical to me.

> I added a brief explanation to the sgml docs, not sure if that's good
> enough - maybe it needs more details.

Yes, I think that could use a little tidying up, but I haven't looked
too closely yet.


Some other comments:

* I'm not sure I understand the need for 0001. Wasn't there an earlier
version of this patch that just did it by re-populating the type
array, but which still had it as an array rather than turning it into
a list? Making it a list falsifies some of the comments and
function/variable name choices in that file.

* There's a comment typo in catalog/Makefile -- "are are reputedly
other...", should be "there are reputedly other...".

* Looking at the pg_stats_ext view, I think perhaps expressions stats
should be omitted entirely from that view, since it doesn't show any
useful information about them. So it could remove "e" from the "kinds"
array, and exclude rows whose only kind is "e", since such rows have
no interesting data in them. Essentially, the new view
pg_stats_ext_exprs makes having any expression stats in pg_stats_ext
redundant. Hiding this data in pg_stats_ext would also be consistent
with making the "expressions" stats kind hidden from the user.

* In gram.y, it wasn't quite obvious why you converted the column list
for CREATE STATISTICS from an expr_list to a stats_params list. I
figured it out, and it makes sense, but I think it could use a
comment, perhaps something along the lines of the one for index_elem,
e.g.:

/*
 * Statistics attributes can be either simple column references, or arbitrary
 * expressions in parens.  For compatibility with index attributes permitted
 * in CREATE INDEX, we allow an expression that's just a function call to be
 * written without parens.
 */

* In parse_func.c and parse_agg.c, there are a few new error strings
that use the abbreviation "stats expressions", whereas most other
errors refer to "statistics expressions". For consistency, I think
they should all be the latter.

* In generateClonedExtStatsStmt(), I think the "expressions" stats
kind needs to be explicitly excluded, otherwise CREATE TABLE (LIKE
...) fails if the source table has expression stats.

* CreateStatistics() uses ShareUpdateExclusiveLock, but in
tcop/utility.c the relation is opened with a ShareLock. ISTM that the
latter lock mode should be made to match CreateStatistics().

* Why does the new code in tcop/utility.c not use
RangeVarGetRelidExtended together with RangeVarCallbackOwnsRelation?
That seems preferable to doing the ACL check in CreateStatistics().
For one thing, as it stands, it allows the lock to be taken even if
the user doesn't own the table. Is it intentional that the current
code allows extended stats to be created on system catalogs? That
would be one thing that using RangeVarCallbackOwnsRelation would
change, but I can't see a reason to allow it.

* In src/bin/psql/describe.c, I think the \d output should also
exclude the "expressions" stats kind and just list the other kinds (or
have no kinds list at all, if there are no other kinds), to make it
consistent with the CREATE STATISTICS syntax.

* The pg_dump output for a stats object whose only kind is
"expressions" is broken -- it includes a spurious "()" for the kinds
list.

That's it for now. I'll look at the optimiser changes next, and try to
post more comments later this week.

Regards,
Dean




set_config() documentation clarification

2021-01-04 Thread Joel Jacobson
I just learned from a pg friend that it's possible to set your own made-up
setting_name using set_config(),
if the setting_name contains at least one dot ".".

I didn't know about this and had always thought it wasn't possible,
because early in my career I ran into the error message
you get when there is no dot in the name:

SELECT set_config('foobar','test',true);
ERROR:  unrecognized configuration parameter "foobar"

If instead having a dot in the name, it works:

SELECT set_config('foo.bar','test',true);
set_config

test
(1 row)


In the documentation at 
https://www.postgresql.org/docs/current/functions-admin.html
this behaviour is not mentioned anywhere as far as I can see:

"set_config ( setting_name text, new_value text, is_local boolean ) → text

Sets the parameter setting_name to new_value, and returns that value. If 
is_local is true, the new value will only apply for the current transaction. If 
you want the new value to apply for the current session, use false instead. 
This function corresponds to the SQL command SET."

Perhaps a paragraph where this is clarified should be added?

Kind regards,

Joel


Re: Safety/validity of resetting permissions by updating system tables

2021-01-04 Thread Andrew Dunstan


On 1/1/21 11:44 AM, Tom Lane wrote:
> Isaac Morland  writes:
>> Is it safe and valid to reset to default permissions by doing
>> UPDATE pg_namespace/pg_class/pg_type/pg_proc
>> SET nspacl/relacl/typacl/proacl = NULL WHERE ... to accomplish this?
> Not terribly; the main objection is you'd fail to update pg_shdepend.



And apart from that I'm generally resistant to anything that requires
direct manipulation of the catalog. One of many reasons is that there is
no guarantee that it will have the same shape in the next release. I
normally encourage people strongly to look for other solutions.


cheers


andrew


-- 

Andrew Dunstan
EDB: https://www.enterprisedb.com





Re: VACUUM (DISABLE_PAGE_SKIPPING on)

2021-01-04 Thread Masahiko Sawada
On Tue, Dec 1, 2020 at 10:45 AM Masahiko Sawada  wrote:
>
> On Fri, Nov 20, 2020 at 8:47 PM Simon Riggs  wrote:
> >
> > On Fri, 20 Nov 2020 at 10:15, Simon Riggs  wrote:
> > >
> > > On Fri, 20 Nov 2020 at 01:40, Masahiko Sawada  
> > > wrote:
> > > >
> > > > On Thu, Nov 19, 2020 at 8:02 PM Simon Riggs  
> > > > wrote:
> > > > >
> > > > > On Wed, 18 Nov 2020 at 17:59, Robert Haas  
> > > > > wrote:
> > > > > >
> > > > > > On Wed, Nov 18, 2020 at 12:54 PM Simon Riggs 
> > > > > >  wrote:
> > > > > > > Patches attached.
> > > > > > > 1. vacuum_anti_wraparound.v2.patch
> > > > > > > 2. vacuumdb_anti_wrap.v1.patch - depends upon (1)
> > > > > >
> > > > > > I don't like the use of ANTI_WRAPAROUND as a name for this new 
> > > > > > option.
> > > > > > Wouldn't it make more sense to call it AGGRESSIVE? Or maybe 
> > > > > > something
> > > > > > else, but I dislike anti-wraparound.
> > > > >
> > > > > -1 for using the term AGGRESSIVE, which seems likely to offend people.
> > > > > I'm sure a more descriptive term exists.
> > > >
> > > > Since we use the term aggressive scan in the docs, I personally don't
> > > > feel unnatural about that. But since this option also disables index
> > > > cleanup when not enabled explicitly, I’m concerned a bit if user might
> > > > get confused. I came up with some names like FREEZE_FAST and
> > > > FREEZE_MINIMAL but I'm not sure these are better.
> > >
> > > FREEZE_FAST seems good.
> > >
> > > > BTW if this option also disables index cleanup for faster freezing,
> > > > why don't we disable heap truncation as well?
> > >
> > > Good idea
> >
> > Patch attached, using the name "FAST_FREEZE" instead.
> >
>
> Thank you for updating the patch.
>
> Here are some comments on the patch.
>
> 
> -   if (params->options & VACOPT_DISABLE_PAGE_SKIPPING)
> +   if (params->options & VACOPT_DISABLE_PAGE_SKIPPING ||
> +   params->options & VACOPT_FAST_FREEZE)
>
> I think we need to update the following comment that is above this
> change as well:
>
> /*
>  * We request an aggressive scan if the table's frozen Xid is now older
>  * than or equal to the requested Xid full-table scan limit; or if the
>  * table's minimum MultiXactId is older than or equal to the requested
>  * mxid full-table scan limit; or if DISABLE_PAGE_SKIPPING was specified.
>  */
>
> This mentions only DISABLE_PAGE_SKIPPING now. Or the second idea is to
> set both params.freeze_table_age and params.multixact_freeze_table_age
> to 0 at ExecVacuum() instead of getting aggressive turned on here.
> Considering the consistency between FREEZE and FREEZE_FAST, we might
> want to take the second option.
>
> ---
> +   if (fast_freeze &&
> +   params.index_cleanup == VACOPT_TERNARY_DEFAULT)
> +   params.index_cleanup = VACOPT_TERNARY_DISABLED;
> +
> +   if (fast_freeze &&
> +   params.truncate == VACOPT_TERNARY_DEFAULT)
> +   params.truncate = VACOPT_TERNARY_DISABLED;
> +
> +   if (fast_freeze && freeze)
> +   ereport(ERROR,
> +   (errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
> +errmsg("cannot specify both FREEZE and FAST_FREEZE
> options on VACUUM")));
> +
>
> I guess that you disallow enabling both FREEZE and FAST_FREEZE because
> it's contradictory, which makes sense to me. But it seems to me that
> enabling FAST_FREEZE, INDEX_CLEANUP, and TRUNCATE is also
> contradictory because it will no longer be “fast”. The purpose of this
> option is to advance relfrozenxid as fast as possible by disabling index
> cleanup, heap truncation etc. Is there any use case where a user wants
> to enable these options (FAST_FREEZE, INDEX_CLEANUP, and TRUNCATE) at
> the same time? If not, probably it’s better to either disallow it or
> have FAST_FREEZE overwrites these two settings even if the user
> specifies them explicitly.
>

I sent some review comments a month ago but the patch was marked as
"needs review", which was incorrect. So I think "waiting on author" is
a more appropriate state for this patch. I'm switching the patch to
"waiting on author".

Regards,

-- 
Masahiko Sawada
EnterpriseDB:  https://www.enterprisedb.com/




Re: libpq debug log

2021-01-04 Thread Masahiko Sawada
Iwata-san,

On Mon, Dec 21, 2020 at 5:20 PM k.jami...@fujitsu.com
 wrote:
>
> On Tuesday, December 15, 2020 8:12 PM, Iwata-san wrote:
> > > There are some things still to do:
> > I worked on some to do.
>
> Hi Iwata-san,
>
> Thank you for updating the patch.
> I would recommend to register this patch in the upcoming commitfest
> to help us keep track of it. I will follow the thread to provide more
> reviews too.
>
> > > 1. Is the handling of protocol 2 correct?
> > Protocol 3.0 is implemented in PostgreSQL v7.4 or later. Therefore, most
> > servers and clients today want to connect using 3.0.
> > Also, wishful thinking, I think Protocol 2.0 will no longer be supported. 
> > So I
> > didn't develop it aggressively.
> > Another reason is that I'm concerned implementing it would make the code
> > less maintainable.
>
> Though I have also not tested it with a 2.0 server yet, I would have to
> download version 7.3 to test that, wouldn't I? Silly question: do we still
> want to have this feature for 2.0?
> I understand that protocol 2.0 is still supported, but it is only used for
> PostgreSQL versions 7.3 and earlier, which is not updated by fixes anymore
> since we only backpatch up to previous 5 versions. However I am not sure if
> it's a good idea, but how about if we just support this feature for protocol 
> 3.
> There are similar chunks of code in fe-exec.c, PQsendPrepare(), 
> PQsendDescribe(),
> etc. that already do something similar, so I just thought in hindsight if we 
> can
> do the same.
> if (PG_PROTOCOL_MAJOR(conn->pversion) >= 3)
> {
> 
> }
> else
> {
> /* Protocol 2.0 isn't supported */
>  printfPQExpBuffer(&conn->errorMessage,
>libpq_gettext("function requires at least protocol 
> version 3.0\n"));
>  return 0;
> }
>
> But if it's necessary to provide this improved trace output for 2.0 servers, 
> then ignore what
> I suggested above, and although difficult I think we should test for protocol 
> 2.0 in older server.
>
> Some minor code comments I noticed:
> 1.
> +   LOG_FIRST_BYTE, /* logging the first byte 
> identifing the
> +* protocol 
> message type */
>
> "identifing" should be "identifying". And closing */ should be on 3rd line.
>
> 2.
> +   case LOG_CONTENTS:
> +   /* Suppresses printing terminating zero byte 
> */
>
> --> Suppress printing of terminating zero byte
>

The patch got some review comments a couple of weeks ago but was
still marked as "needs review", which was incorrect. Also, cfbot[1]
is unhappy with this patch. So I'm switching the patch to "waiting on
author".

Regards,

[1] http://commitfest.cputube.org/


--
Masahiko Sawada
EnterpriseDB:  https://www.enterprisedb.com/




Re: WIP: System Versioned Temporal Table

2021-01-04 Thread Masahiko Sawada
Hi Surafel,

On Tue, Dec 22, 2020 at 3:01 AM Surafel Temesgen  wrote:
>
> Hi Ryan,
>
> On Fri, Dec 18, 2020 at 10:28 PM Ryan Lambert  wrote:
>>
>> On Thu, Nov 19, 2020 at 11:04 AM Surafel Temesgen  
>> wrote:
>>
>> The docs have two instances of "EndtTime" that should be "EndTime".
>
>
> Since my first language is not English, I'm glad you found only this error
> in the doc. I will send a rebased patch soon
>

The updated patch has not been submitted yet. Are you planning to submit
it? Please also note that the v7 patch cannot be applied to the current
HEAD. I'm switching the patch to Waiting on Author.

Regards,

-- 
Masahiko Sawada
EnterpriseDB:  https://www.enterprisedb.com/




Re: [HACKERS] [PATCH] Generic type subscripting

2021-01-04 Thread Dmitry Dolgov
> On Sun, Jan 03, 2021 at 08:41:17PM +0100, Pavel Stehule wrote:
>
> probably something is still wrong
>
> create table foo(a jsonb);
> update foo set a['a'] = '10';
> update foo set a['b']['c'][1] = '10';
> update foo set a['b']['c'][10] = '10'

Thanks for noticing. Indeed, there was a subtle change of meaning for
'done' flag in setPath, which I haven't covered. Could you try this
version?
>From c9143a620497dac5615c4de1d9349684e9af95b5 Mon Sep 17 00:00:00 2001
From: Dmitrii Dolgov <9erthali...@gmail.com>
Date: Fri, 18 Dec 2020 17:19:51 +0100
Subject: [PATCH v42 1/2] Subscripting for jsonb

Subscripting implementation for jsonb. It does not support slices, does
not have a limit on the number of subscripts, and for assignment expects
the replacement value to be of jsonb type. There is also one functional
difference from jsonb_set in assignment via subscripting: when the
original jsonb container is NULL, subscripting replaces it with an empty
jsonb and proceeds with the assignment.

For the sake of code reuse, some parts of the jsonb functionality were
rearranged to allow using the same functions for jsonb_set and the
subscripting assignment operation.

The original idea belongs to Oleg Bartunov.

Reviewed-by: Tom Lane, Arthur Zakirov, Pavel Stehule
---
 doc/src/sgml/json.sgml  |  48 
 src/backend/utils/adt/Makefile  |   1 +
 src/backend/utils/adt/jsonb_util.c  |  76 -
 src/backend/utils/adt/jsonbsubs.c   | 413 
 src/backend/utils/adt/jsonfuncs.c   | 180 ++--
 src/include/catalog/pg_proc.dat |   4 +
 src/include/catalog/pg_type.dat |   3 +-
 src/include/utils/jsonb.h   |   6 +-
 src/test/regress/expected/jsonb.out | 272 +-
 src/test/regress/sql/jsonb.sql  |  84 +-
 10 files changed, 982 insertions(+), 105 deletions(-)
 create mode 100644 src/backend/utils/adt/jsonbsubs.c

diff --git a/doc/src/sgml/json.sgml b/doc/src/sgml/json.sgml
index 5b9a5557a4..100d1a60f4 100644
--- a/doc/src/sgml/json.sgml
+++ b/doc/src/sgml/json.sgml
@@ -602,6 +602,54 @@ SELECT jdoc-'guid', jdoc-'name' FROM api WHERE jdoc @ '{"tags": ["qu
   
  
 
+ 
+  jsonb Subscripting
+  
+   jsonb data type supports array-style subscripting expressions
+   to extract or update particular elements. It's possible to use multiple
+   subscripting expressions to extract nested values. In this case, a chain of
+   subscripting expressions follows the same rules as the
+   path argument in jsonb_set function,
+   e.g. in the case of arrays it is a 0-based operation, and negative integers
+   that appear in path count from the end of JSON arrays.
+   The result of subscripting expressions is always jsonb data type. An
+   example of subscripting syntax:
+
+-- Extract value by key
+SELECT ('{"a": 1}'::jsonb)['a'];
+
+-- Extract nested value by key path
+SELECT ('{"a": {"b": {"c": 1}}}'::jsonb)['a']['b']['c'];
+
+-- Extract element by index
+SELECT ('[1, "2", null]'::jsonb)[1];
+
+-- Update value by key, note the single quotes - the assigned value
+-- needs to be of jsonb type as well
+UPDATE table_name SET jsonb_field['key'] = '1';
+
+-- Select records using where clause with subscripting. Since the result of
+-- subscripting is jsonb and we basically want to compare two jsonb objects, we
+-- need to put the value in double quotes to be able to convert it to jsonb.
+SELECT * FROM table_name WHERE jsonb_field['key'] = '"value"';
+
+
+  Subscripting for jsonb does not support slice expressions,
+  even if it contains an array.
+
+  If the source jsonb is NULL, assignment
+  via subscripting will proceed as if it were an empty JSON object:
+
+-- If jsonb_field here is NULL, the result is {"a": 1}
+UPDATE table_name SET jsonb_field['a'] = '1';
+
+-- If jsonb_field here is NULL, the result is [1]
+UPDATE table_name SET jsonb_field[0] = '1';
+
+
+  
+ 
+
  
   Transforms
 
diff --git a/src/backend/utils/adt/Makefile b/src/backend/utils/adt/Makefile
index 82732146d3..279ff15ade 100644
--- a/src/backend/utils/adt/Makefile
+++ b/src/backend/utils/adt/Makefile
@@ -50,6 +50,7 @@ OBJS = \
 	jsonb_op.o \
 	jsonb_util.o \
 	jsonfuncs.o \
+	jsonbsubs.o \
 	jsonpath.o \
 	jsonpath_exec.o \
 	jsonpath_gram.o \
diff --git a/src/backend/utils/adt/jsonb_util.c b/src/backend/utils/adt/jsonb_util.c
index 4eeffa1424..41a1c1f9bb 100644
--- a/src/backend/utils/adt/jsonb_util.c
+++ b/src/backend/utils/adt/jsonb_util.c
@@ -68,18 +68,29 @@ static JsonbValue *pushJsonbValueScalar(JsonbParseState **pstate,
 		JsonbIteratorToken seq,
 		JsonbValue *scalarVal);
 
+JsonbValue *
+JsonbToJsonbValue(Jsonb *jsonb)
+{
+	JsonbValue *val = (JsonbValue *) palloc(sizeof(JsonbValue));
+
+	val->type = jbvBinary;
+	val->val.binary.data = &jsonb->root;
+	val->val.binary.len = VARSIZE(jsonb) - VARHDRSZ;
+
+	return val;
+}
+
 /*
  * Turn an in-memory JsonbValue into a Jsonb for on-disk storage.
  *
- * There isn't a JsonbToJsonbValue(), because generally we find it more
- * convenient to directly iterate through the Jsonb 

Re: Parallel Inserts in CREATE TABLE AS

2021-01-04 Thread Bharath Rupireddy
On Mon, Jan 4, 2021 at 5:44 PM Luc Vlaming  wrote:
> On 04-01-2021 12:16, Hou, Zhijie wrote:
> >> 
> >> wrt v18-0002patch:
> >>
> >> It looks like this introduces a state machine that goes like:
> >> - starts at CTAS_PARALLEL_INS_UNDEF
> >> - possibly moves to CTAS_PARALLEL_INS_SELECT
> >> - CTAS_PARALLEL_INS_TUP_COST_CAN_IGN can be added
> >> - if both were added at some stage, we can go to
> >> CTAS_PARALLEL_INS_TUP_COST_IGNORED and ignore the costs
> >>
> >> what i'm wondering is why you opted to put logic around
> >> generate_useful_gather_paths and in cost_gather when to me it seems more
> >> logical to put it in create_gather_path? i'm probably missing something
> >> there?
> >
> > IMO, The reason is we want to make sure we only ignore the cost when Gather 
> > is the top node.
> > And it seems the generate_useful_gather_paths called in 
> > apply_scanjoin_target_to_paths is the right place which can only create top 
> > node Gather.
> > So we change the flag in apply_scanjoin_target_to_paths around 
> > generate_useful_gather_paths to identify the top node.

Right. We wanted to ignore parallel tuple cost for only the upper Gather path.

> I was wondering actually if we need the state machine. Reason is that as
> AFAICS the code could be placed in create_gather_path, where you can
> also check if it is a top gather node, whether the dest receiver is the
> right type, etc? To me that seems like a nicer solution as its makes
> that all logic that decides whether or not a parallel CTAS is valid is
> in a single place instead of distributed over various places.

IMO, we can't determine in create_gather_path whether we are going to
generate the top Gather path. To decide whether the top Gather path is
being generated, I think it's not enough to check that
root->query_level == 1; we also need to rely on where
generate_useful_gather_paths gets called from. For instance, for
query_level 1, generate_useful_gather_paths gets called from 2 places
in apply_scanjoin_target_to_paths. Likewise, create_gather_path also
gets called from many places. IMO, the current way, i.e. setting the flag
in apply_scanjoin_target_to_paths and ignoring the cost based on that in
cost_gather, seems safe.

I may be wrong. Thoughts?

With Regards,
Bharath Rupireddy.
EnterpriseDB: http://www.enterprisedb.com




Re: row filtering for logical replication

2021-01-04 Thread Önder Kalacı
Hi Masahiko,


>
> You sent in your patch to pgsql-hackers on Dec 17, but you did not
> post it to the next CommitFest[1] (I found the old entry of this
> patch[2] but it's marked as "Returned with feedback"). If this was
> intentional, then you need to take no action.  However, if you want
> your patch to be reviewed as part of the upcoming CommitFest, then you
> need to add it yourself before 2021-01-01 AoE[3]. Thanks for your
> contributions.
>
>
Thanks for letting me know of this, I added this patch to the next commit
fest before 2021-01-01 AoE[3].

I'm also attaching the updated commits so that the tests pass on the CI.

Thanks,
Onder KALACI
Software Engineer at Microsoft &
Developing the Citus database extension for PostgreSQL


0001-Subject-PATCH-1-8-Remove-unused-atttypmod-column-fro.patch
Description: Binary data


0004-Subject-PATCH-4-8-Rename-a-WHERE-node.patch
Description: Binary data


0002-Subject-PATCH-2-8-Store-number-of-tuples-in-WalRcvEx.patch
Description: Binary data


0003-Subject-PATCH-3-8-Refactor-function-create_estate_fo.patch
Description: Binary data


0005-Subject-PATCH-5-8-Row-filtering-for-logical-replicat.patch
Description: Binary data


0006-Subject-PATCH-6-8-Print-publication-WHERE-condition-.patch
Description: Binary data


0007-Publication-where-condition-support-for-pg_dump.patch
Description: Binary data


commands_to_perf_test.sql
Description: Binary data


[Bug Fix] Logical replication on partition table is very slow and CPU is 99%

2021-01-04 Thread Mark Zhao
Logical replication on partition table is very slow and CPU is 99%.




To reproduce this problem,




pg1 for publish

pgbench -i -s 1000

create table pgbench_accounts_copy(aid integer, bid integer, abalance integer, 
filler character(84)) partition by range (aid);

create table pgbench_accounts_copy_a0 partition of pgbench_accounts_copy for 
values from (0) to (1000);

create table pgbench_accounts_copy_a1 partition of pgbench_accounts_copy for 
values from (1000) to (2000);

create table pgbench_accounts_copy_a2 partition of pgbench_accounts_copy for 
values from (2000) to (3000);

create table pgbench_accounts_copy_a3 partition of pgbench_accounts_copy for 
values from (3000) to (4000);

create table pgbench_accounts_copy_a4 partition of pgbench_accounts_copy for 
values from (4000) to (5000);

create table pgbench_accounts_copy_a5 partition of pgbench_accounts_copy for 
values from (5000) to (6000);

create table pgbench_accounts_copy_a6 partition of pgbench_accounts_copy for 
values from (6000) to (7000);

create table pgbench_accounts_copy_a7 partition of pgbench_accounts_copy for 
values from (7000) to (8000);

create table pgbench_accounts_copy_a8 partition of pgbench_accounts_copy for 
values from (8000) to (9000);

create table pgbench_accounts_copy_a9 partition of pgbench_accounts_copy for 
values from (9000) to (1);

create publication my_publication for table pgbench_accounts_copy with 
(publish_via_partition_root = true);




pg2 for subscribe

create table pgbench_accounts_copy(aid integer, bid integer, abalance integer, 
filler character(84)) partition by range (aid);

create table pgbench_accounts_copy_a0 partition of pgbench_accounts_copy for 
values from (0) to (1000);

create table pgbench_accounts_copy_a1 partition of pgbench_accounts_copy for 
values from (1000) to (2000);

create table pgbench_accounts_copy_a2 partition of pgbench_accounts_copy for 
values from (2000) to (3000);

create table pgbench_accounts_copy_a3 partition of pgbench_accounts_copy for 
values from (3000) to (4000);

create table pgbench_accounts_copy_a4 partition of pgbench_accounts_copy for 
values from (4000) to (5000);

create table pgbench_accounts_copy_a5 partition of pgbench_accounts_copy for 
values from (5000) to (6000);

create table pgbench_accounts_copy_a6 partition of pgbench_accounts_copy for 
values from (6000) to (7000);

create table pgbench_accounts_copy_a7 partition of pgbench_accounts_copy for 
values from (7000) to (8000);

create table pgbench_accounts_copy_a8 partition of pgbench_accounts_copy for 
values from (8000) to (9000);

create table pgbench_accounts_copy_a9 partition of pgbench_accounts_copy for 
values from (9000) to (1);

create subscription sub1 CONNECTION 'host= port= user=postgres 
dbname=postgres' publication my_publication;




On pg1

insert into pgbench_accounts_copy select * from pgbench_accounts where aid0 
and aid<1;




The replication will be very slow; it has lasted for 5 days in my environment and
is still going. What's more, the CPU is at 99% but is actually doing nothing.




In fact, there is a hidden bug in replication on partitioned tables. When we
publish partition tables via the root table, a reference to the root table is
added for the current owner, but the corresponding decrement of the reference
is missed. When the reference count gets large enough, it takes a lot of time
and CPU to re-hash and resolve hash collisions.




This patch adds the missed decrement to resolve the problem.




Previous discussion is here:
https://www.postgresql.org/message-id/flat/CA+HiwqH=Y85vRK3mOdjEkqFK+E=ST=eQiHdpj43L=_ejmoo...@mail.gmail.com
And I believe patch 83fd453 introduced this problem.




Thanks,

Mark Zhao

0001-add-missed-RelationClose-after-RelationIdGetRelation.patch
Description: Binary data


Re: Parallel Inserts in CREATE TABLE AS

2021-01-04 Thread Bharath Rupireddy
On Mon, Jan 4, 2021 at 4:22 PM Luc Vlaming  wrote:
> Sorry it took so long to get back to reviewing this.

Thanks for the comments.

> wrt v18-0001patch:
>
> +   /*
> +* If the worker is for parallel insert in CTAS, then use
the proper
> +* dest receiver.
> +*/
> +   intoclause = (IntoClause *) stringToNode(intoclausestr);
> +   receiver = CreateIntoRelDestReceiver(intoclause);
> +   ((DR_intorel *)receiver)->is_parallel_worker = true;
> +   ((DR_intorel *)receiver)->object_id = fpes->objectid;
> I would move this into a function called e.g.
> GetCTASParallelWorkerReceiver so that the details wrt CTAS can be put in
> createas.c.
> I would then also split up intorel_startup into intorel_leader_startup
> and intorel_worker_startup, and in GetCTASParallelWorkerReceiver set
> self->pub.rStartup to intorel_worker_startup.

My intention was to not add any new APIs to the dest receiver. I simply
made the changes in intorel_startup, in which, for workers, it just does the
minimal work and exits. In the leader, most of the table creation and
sanity checks are kept untouched. Please have a look at the v19
patch posted upthread [1].

> +   volatile pg_atomic_uint64   *processed;
> why is it volatile?

The intention is to always read from the actual memory location. I took it
from the way pg_atomic_fetch_add_u64_impl,
pg_atomic_compare_exchange_u64_impl, pg_atomic_init_u64_impl and their u32
counterparts pass the parameter as volatile pg_atomic_uint64 *ptr.

> +   if (isctas)
> +   {
> +   intoclause = ((DR_intorel *)
node->dest)->into;
> +   objectid = ((DR_intorel *)
node->dest)->object_id;
> +   }
> Given that you extract them each once and then pass them directly into
> the parallel-worker, can't you instead pass in the destreceiver and
> leave that logic to ExecInitParallelPlan?

That's changed entirely in the v19 patch set posted upthread [1]. Please
have a look. I didn't pass the dest receiver; to keep the API generic, I
passed the parallel insert command type and a void * pointer to the
insertion command, because the information we pass to workers depends on the
insertion command (for instance, for CTAS the workers need the INTO clause
and the object id, and for Refresh Mat View just the object id).

>
> +   if
(IS_PARALLEL_CTAS_DEST(gstate->dest) &&
> +   ((DR_intorel *)
gstate->dest)->into->rel &&
> +   ((DR_intorel *)
gstate->dest)->into->rel->relname)
> why would rel and relname not be there? if no rows have been inserted?
> because it seems from the intorel_startup function that that would be
> set as soon as startup was done, which i assume (wrongly?) is always done?

Actually, that into clause rel variable is always being set in the gram.y
for CTAS, Create Materialized View and SELECT INTO (because qualified_name
non-terminal is not optional). My bad. I just added it as a sanity check.
Actually, it's not required.

create_as_target:
qualified_name opt_column_list table_access_method_clause
OptWith OnCommitOption OptTableSpace
{
$$ = makeNode(IntoClause);
$$->rel = $1;
create_mv_target:
qualified_name opt_column_list table_access_method_clause
opt_reloptions OptTableSpace
{
$$ = makeNode(IntoClause);
$$->rel = $1;
into_clause:
INTO OptTempTableName
{
$$ = makeNode(IntoClause);
   $$->rel = $2;

I will change the below code:
+if (GetParallelInsertCmdType(gstate->dest) ==
+PARALLEL_INSERT_CMD_CREATE_TABLE_AS &&
+((DR_intorel *) gstate->dest)->into &&
+((DR_intorel *) gstate->dest)->into->rel &&
+((DR_intorel *) gstate->dest)->into->rel->relname)
+{

to:
+if (GetParallelInsertCmdType(gstate->dest) ==
+PARALLEL_INSERT_CMD_CREATE_TABLE_AS)
+{

I will update this in the next version of the patch set.

>
> +* In case if no workers were launched, allow the leader to
insert entire
> +* tuples.
> what does "entire tuples" mean? should it maybe be "all tuples"?

Yeah, noticed that while working on the v19 patch set. Please have a look
at the v19 patch posted upthread [1].

> 
> wrt v18-0003patch:
>
> not sure if it is needed, but i was wondering if we would want more
> tests with multiple gather nodes existing? caused e.g. by using CTE's,
> valid subquery's (like the one test you have, but without the group
> by/having)?


Re: Test harness for regex code (to allow importing Tcl's test suite)

2021-01-04 Thread Joel Jacobson
On Mon, Jan 4, 2021, at 04:49, Tom Lane wrote:
>Over the holiday break I've been fooling with some regex performance
>improvements.

Cool! I've also been fooling with regex performance over the years myself, not 
in the PostgreSQL code, but in general.

More specifically, by first DFA-minimizing the regex
and then generating LLVM IR directly from the resulting graph.

Perhaps some of the ideas could be interesting to look at.

Here is a live demo:  https://compiler.org/reason-re-nfa/src/index.html

One idea that I came up with myself is the "merge_linear" step, where, when
possible, multiple characters are read in the same operation.
I'm not sure if other regex JIT engines do this, but it makes quite a difference
for large regexes where you have long strings.
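
For readers unfamiliar with the trick: a linear chain of single-character
transitions can be collapsed into one wide comparison. A hand-written C
equivalent of what the generated code does might look like this (an
illustrative sketch only, assuming <string.h>):

    /* the chain s0 -'f'-> s1 -'o'-> s2 -'o'-> s3, collapsed into one step */
    if (len - pos >= 3 && memcmp(str + pos, "foo", 3) == 0)
    {
        pos += 3;
        state = S3;         /* state reached after the whole chain */
    }
    else
        state = S_DEAD;     /* any mismatch kills the match attempt */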

Note: There is no support for capture groups, back-references, etc., but | + *
() [] [^] work.

/Joel

Re: pg_rewind restore_command issue in PG12

2021-01-04 Thread Amine Tengilimoglu
   When I read the pg_rewind PG12 doc, it says:

"... but if the target cluster ran for a long time after the divergence,
the old WAL files might no longer be present. In that case, they can be
manually copied from the WAL archive to the pg_wal directory, or fetched
on startup by configuring primary_conninfo or restore_command."

  So I thought we could use restore_command. But when I try to use it, I
see it doesn't work either.

Thanks.



Heikki Linnakangas , 4 Oca 2021 Pzt, 15:42 tarihinde şunu
yazdı:

> On 03/01/2021 20:13, Amine Tengilimoglu wrote:
> >   In a situation where pg_rewind gets an error due to a missing
> > WAL file, I have set restore_command so that the needed WAL files can be
> > read from the archive (I don't want to manually copy the WAL files), but I
> > see it doesn't work. What am I missing?  Is restore_command not really
> > working with pg_rewind in PG12? Or how should I trigger pg_rewind to
> > use restore_command?
>
> Using restore_command is a new feature in pg_rewind in PostgreSQL 13. It
> doesn't work on earlier versions.
>
> - Heikki
>


Re: pg_rewind restore_command issue in PG12

2021-01-04 Thread Heikki Linnakangas

On 03/01/2021 20:13, Amine Tengilimoglu wrote:
      In a situation where pg_rewind gets an error due to a missing
WAL file, I have set restore_command so that the needed WAL files can be read
from the archive (I don't want to manually copy the WAL files), but I
see it doesn't work. What am I missing?  Is restore_command not really
working with pg_rewind in PG12? Or how should I trigger pg_rewind to
use restore_command?


Using restore_command is a new feature in pg_rewind in PostgreSQL 13. It 
doesn't work on earlier versions.


- Heikki




Re: Parallel Inserts in CREATE TABLE AS

2021-01-04 Thread Luc Vlaming

On 04-01-2021 12:16, Hou, Zhijie wrote:

Hi



wrt v18-0002patch:

It looks like this introduces a state machine that goes like:
- starts at CTAS_PARALLEL_INS_UNDEF
- possibly moves to CTAS_PARALLEL_INS_SELECT
- CTAS_PARALLEL_INS_TUP_COST_CAN_IGN can be added
- if both were added at some stage, we can go to
CTAS_PARALLEL_INS_TUP_COST_IGNORED and ignore the costs

what i'm wondering is why you opted to put logic around
generate_useful_gather_paths and in cost_gather when to me it seems more
logical to put it in create_gather_path? i'm probably missing something
there?


IMO, the reason is we want to make sure we only ignore the cost when Gather is
the top node.
And it seems generate_useful_gather_paths, as called in
apply_scanjoin_target_to_paths, is the right place, since that is the only
call site that can create a top-node Gather.
So we set the flag in apply_scanjoin_target_to_paths around
generate_useful_gather_paths to identify the top node.


Best regards,
houzj




Hi,

I was actually wondering if we need the state machine. The reason is that,
AFAICS, the code could be placed in create_gather_path, where you can
also check if it is a top Gather node, whether the dest receiver is the
right type, etc. To me that seems like a nicer solution, as it puts all the
logic that decides whether or not a parallel CTAS is valid in a single place
instead of distributing it over various places.


Kind regards,
Luc




Re: Deleting older versions in unique indexes to avoid page splits

2021-01-04 Thread Heikki Linnakangas

On 02/12/2020 00:18, Peter Geoghegan wrote:

On Tue, Dec 1, 2020 at 1:50 AM Heikki Linnakangas  wrote:

On 30/11/2020 21:50, Peter Geoghegan wrote:

+} TM_IndexDelete;



+} TM_IndexStatus;



Is it really significantly faster to have two arrays? If you had just
one array, each element would be only 12 bytes long. That's not much
smaller than TM_IndexDeletes, which is 8 bytes.


Yeah, but the swap operations really matter here. At one point I found
that to be the case, anyway. That might no longer be true, though. It
might be that the code became less performance critical because other
parts of the design improved, resulting in the code getting called
less frequently. But if that is true then it has a lot to do with the
power-of-two bucketing that you go on to ask about -- that helped
performance a lot in certain specific cases (as I go into below).

I will add a TODO item for myself, to look into this again. You may
well be right.


+ /* First sort caller's array by TID */
+ heap_tid_shellsort(delstate->deltids, delstate->ndeltids);



Instead of sorting the array by TID, wouldn't it be enough to sort by
just the block numbers?


I don't understand. Yeah, I guess that we could do our initial sort of
the deltids array (the heap_tid_shellsort() call) just using
BlockNumber (not TID). But OTOH there might be some small locality
benefit to doing a proper TID sort at the level of each heap page. And
even if there isn't any such benefit, does it really matter?


You said above that heap_tid_shellsort() is very performance critical,
and that's why you use the two-arrays approach. If it's so performance
critical that swapping 8-byte vs 12-byte array elements makes a
difference, I would guess that comparing full TIDs vs just the block
numbers would also make a difference.


If you really want to shave cycles, you could also store BlockNumber and
OffsetNumber in the TM_IndexDelete array, instead of ItemPointerData.
What's the difference, you might ask? ItemPointerData stores the block
number as two 16-bit integers that need to be reassembled into a 32-bit
integer in the ItemPointerGetBlockNumber() macro.


My argument here is two-pronged: If this is performance critical, you 
could do these additional optimizations. If it's not, then you don't 
need the two-arrays trick; one array would be simpler.


I'm not sure how performance critical this really is, or how many 
instructions each of these optimizations might shave off, so of course I 
might be wrong and the tradeoff you have in the patch now really is the 
best. However, my intuition would be to use a single array, because 
that's simpler, and do all the other optimizations instead: store the 
tid in the struct as separate BlockNumber and OffsetNumber fields, and 
sort only by the block number. Those optimizations are very deep in the 
low level functions and won't add much to the mental effort of 
understanding the algorithm at a high level.
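
To be concrete, the element layout and block-only comparator I have in mind
would look roughly like this (a sketch with names of my own choosing, not a
proposal for the exact definitions; BlockNumber and OffsetNumber come from
storage/block.h and storage/off.h):

    /* one 12-byte element, with the TID already split apart */
    typedef struct
    {
        BlockNumber  blkno;     /* plain 32-bit block number, no reassembly */
        OffsetNumber offnum;
        int16        id;        /* back-link into the status array */
    } IndexDeleteEntry;

    /* qsort()-style comparator that looks only at the block number */
    static int
    index_delete_cmp(const void *a, const void *b)
    {
        BlockNumber ba = ((const IndexDeleteEntry *) a)->blkno;
        BlockNumber bb = ((const IndexDeleteEntry *) b)->blkno;

        if (ba < bb)
            return -1;
        if (ba > bb)
            return 1;
        return 0;
    }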



* While in general the presence of promising tuples (the hint that index
+  * AMs provide) is the best information that we have to go on, it is based
+  * on simple heuristics about duplicates in indexes that are understood to
+  * have specific flaws.  We should not let the most promising heap pages
+  * win or lose on the basis of _relatively_ small differences in the total
+  * number of promising tuples.  Small differences between the most
+  * promising few heap pages are effectively ignored by applying this
+  * power-of-two bucketing scheme.
+  *


What are the "specific flaws"?


I just meant the obvious: the index AM doesn't actually know for sure
that there are any old versions on the leaf page that it will
ultimately be able to delete. This uncertainty needs to be managed,
including inside heapam.c. Feel free to suggest a better way of
expressing that sentiment.


Hmm, maybe: "... is the best information that we have to go on, it is 
just a guess based on simple heuristics about duplicates in indexes".



I understand that this is all based on rough heuristics, but is there
any advantage to rounding/bucketing the numbers like this? Even if small
differences in the total number of promising tuple is essentially noise
that can be safely ignored, is there any harm in letting those small
differences guide the choice? Wouldn't it be the essentially the same as
picking at random, or are those small differences somehow biased?


Excellent question! It actually helps enormously, especially with low
cardinality data that makes good use of the deduplication optimization
(where it is especially important to keep the costs and the benefits
in balance). This has been validated by benchmarking.
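
To spell out what the power-of-two bucketing does, in simplified form (an
illustration, not the exact code in the patch): pages whose promising-tuple
counts land in the same power-of-two bucket are treated as ties, so small
differences between the most promising pages cannot decide the outcome.

    /* counts 0-1 -> bucket 0, 2-3 -> 1, 4-7 -> 2, 8-15 -> 3, ... */
    static int
    promising_bucket(int npromising)
    {
        int         bucket = 0;

        while (npromising > 1)
        {
            npromising >>= 1;
            bucket++;
        }
        return bucket;
    }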

This design naturally allows the heapam.c code to take advantage of
both temporal and spatial locality. For example, let's say that you
have several indexes all on the same table, which get lots of non-HOT
UPDATEs, which are skewed. Naturally, the heap TIDs will match across
each 

Re: [PATCH] Feature improvement for CLOSE, FETCH, MOVE tab completion

2021-01-04 Thread Masahiko Sawada
On Wed, Dec 9, 2020 at 12:59 PM  wrote:
>
> Hi!
>
>
>
> I created a patch for improving CLOSE, FETCH, MOVE tab completion.
>
> Specifically, I add CLOSE, FETCH, MOVE tab completion for completing a 
> predefined cursors.
>

Thank you for the patch!

When I applied the patch, I got the following whitespace warnings:

$ git apply ~/patches/fix_tab_complete_close_fetch_move.patch
/home/masahiko/patches/fix_tab_complete_close_fetch_move.patch:40:
indent with spaces.
COMPLETE_WITH_QUERY(Query_for_list_of_cursors
/home/masahiko/patches/fix_tab_complete_close_fetch_move.patch:41:
indent with spaces.
" UNION SELECT 'ABSOLUTE'"
/home/masahiko/patches/fix_tab_complete_close_fetch_move.patch:42:
indent with spaces.
" UNION SELECT 'BACKWARD'"
/home/masahiko/patches/fix_tab_complete_close_fetch_move.patch:43:
indent with spaces.
" UNION SELECT 'FORWARD'"
/home/masahiko/patches/fix_tab_complete_close_fetch_move.patch:44:
indent with spaces.
" UNION SELECT 'RELATIVE'"
warning: squelched 19 whitespace errors
warning: 24 lines add whitespace errors.

I recommend you checking whitespaces or running pgindent.

Here are some comments:

/*
-* Complete FETCH with one of ABSOLUTE, BACKWARD, FORWARD, RELATIVE, ALL,
-* NEXT, PRIOR, FIRST, LAST
+* Complete FETCH with a list of cursors and one of ABSOLUTE,
BACKWARD, FORWARD, RELATIVE, ALL,
+* NEXT, PRIOR, FIRST, LAST, FROM, IN
 */

I think the comment should say:

+* Complete FETCH with one of ABSOLUTE, BACKWARD, FORWARD, RELATIVE, ALL,
+* NEXT, PRIOR, FIRST, LAST, FROM, IN, and a list of cursors

Other updates of the comment seem to have the same issue.

Or we could omit the details from the comment since they seem redundant; we
can read them from the arguments of the following COMPLETE_WITH_QUERY().

---
-   /*
-* Complete FETCH  with "FROM" or "IN". These are equivalent,
-* but we may as well tab-complete both: perhaps some users prefer one
-* variant or the other.
-*/
+   /* Complete FETCH  with a list of cursors and "FROM" or "IN" */

Why did you remove the second sentence in the above comment?

---
The patch improves tab completion for CLOSE, FETCH, and MOVE, but is
there any reason why you didn't do that for DECLARE? I think DECLARE
can also be improved, and this is good timing for that.

Regards,

-- 
Masahiko Sawada
EnterpriseDB:  https://www.enterprisedb.com/




Re: Single transaction in the tablesync worker?

2021-01-04 Thread Amit Kapila
On Mon, Jan 4, 2021 at 2:38 PM Amit Kapila  wrote:
>
> Few other comments:
> =
>

Few more comments on v9:
==
1.
+ /* Drop the tablesync slot. */
+ {
+ char *syncslotname = ReplicationSlotNameForTablesync(subid, relid);
+
+ /*
+ * If the subscription slotname is NONE/NULL and the connection to publisher is
+ * broken, but the DropSubscription should still be allowed to complete.
+ * But without a connection it is not possible to drop any tablesync slots.
+ */
+ if (!wrconn)
+ {
+ /* FIXME - OK to just log a warning? */
+ elog(WARNING, "!!>> DropSubscription: no connection. Cannot drop
tablesync slot \"%s\".",
+   syncslotname);
+ }

Why is this not an ERROR? We don't want to keep the table slots
lingering after DropSubscription. If there is any tablesync slot that
needs to be dropped and the publisher is not available then we should
raise an error.

2.
+ /*
+ * Tablesync resource cleanup (slots and origins).
+ *
+ * Any READY-state relations would already have dealt with clean-ups.
+ */
+ {

There is no need to start a separate block '{' here.

3.
+#define SUBREL_STATE_COPYDONE 'C' /* tablesync copy phase is completed */

You can mention in the comments that sublsn will be NULL for this
state, as is mentioned for other similar states. Also, can we think of
using a lower-case letter for this? All the other states are lower-case
except this one, which makes it look a bit odd. We could use
'f' or 'e' and describe it as 'copy finished' or 'copy end'. I am fine
if you have any better ideas.

4.
LogicalRepSyncTableStart()
{
..
..
+copy_table_done:
+
+ /* Setup replication origin tracking. */
+ {
+ char originname[NAMEDATALEN];
+ RepOriginId originid;
+
+ snprintf(originname, sizeof(originname), "pg_%u_%u",
MySubscription->oid, MyLogicalRepWorker->relid);
+ originid = replorigin_by_name(originname, true);
+ if (!OidIsValid(originid))
+ {
+ /*
+ * Origin tracking does not exist. Create it now, and advance to LSN
got from walrcv_create_slot.
+ */
+ elog(LOG, "!!>> LogicalRepSyncTableStart: 1 replorigin_create
\"%s\".", originname);
+ originid = replorigin_create(originname);
+ elog(LOG, "!!>> LogicalRepSyncTableStart: 1 replorigin_session_setup
\"%s\".", originname);
+ replorigin_session_setup(originid);
+ replorigin_session_origin = originid;
+ elog(LOG, "!!>> LogicalRepSyncTableStart: 1 replorigin_advance
\"%s\".", originname);
+ replorigin_advance(originid, *origin_startpos, InvalidXLogRecPtr,
+true /* go backward */ , true /* WAL log */ );
+ }
+ else
+ {
+ /*
+ * Origin tracking already exists.
+ */
+ elog(LOG, "!!>> LogicalRepSyncTableStart: 2 replorigin_session_setup
\"%s\".", originname);
+ replorigin_session_setup(originid);
+ replorigin_session_origin = originid;
+ elog(LOG, "!!>> LogicalRepSyncTableStart: 2
replorigin_session_get_progress \"%s\".", originname);
+ *origin_startpos = replorigin_session_get_progress(false);
+ }
..
..
}

I am not sure this code is correct because, the very first time
the copy is done, we don't expect a replication origin to exist,
whereas this code will silently use an already existing replication
origin, which can lead to a wrong start position for the slot. In such
a case it should error out. I guess we should create the replication
origin before setting the state to copydone. I feel we should even have
a test case for this, as it is not difficult to have a pre-existing
replication origin.

5. Is it possible to write a testcase where we fail (say due to pk
violation or some other error) after the initial copy is done, then
remove the conflicting row and allow a copy to be completed? If we
already have any such test then it is fine.

6.
+/*
+ * Drop the replication slot at the publisher node
+ * using the replication connection.
+ */

This comment looks a bit odd. The first line appears to be too short.
We have a limit of 80 chars, but this is much shorter than that.

7.
@@ -905,7 +905,7 @@ replorigin_advance(RepOriginId node,
  LWLockAcquire(_state->lock, LW_EXCLUSIVE);

  /* Make sure it's not used by somebody else */
- if (replication_state->acquired_by != 0)
+ if (replication_state->acquired_by != 0 &&
replication_state->acquired_by != MyProcPid)
  {

I think you won't need this change if you do replorigin_advance before
replorigin_session_setup in your patch.

8.
- * that ensures we won't loose knowledge about that after a crash if the
+ * that ensures we won't lose knowledge about that after a crash if the

It is better to submit this as a separate patch.


-- 
With Regards,
Amit Kapila.




RE: Parallel Inserts in CREATE TABLE AS

2021-01-04 Thread Hou, Zhijie
Hi

> 
> wrt v18-0002patch:
> 
> It looks like this introduces a state machine that goes like:
> - starts at CTAS_PARALLEL_INS_UNDEF
> - possibly moves to CTAS_PARALLEL_INS_SELECT
> - CTAS_PARALLEL_INS_TUP_COST_CAN_IGN can be added
> - if both were added at some stage, we can go to
> CTAS_PARALLEL_INS_TUP_COST_IGNORED and ignore the costs
> 
> what i'm wondering is why you opted to put logic around
> generate_useful_gather_paths and in cost_gather when to me it seems more
> logical to put it in create_gather_path? i'm probably missing something
> there?

IMO, the reason is we want to make sure we only ignore the cost when Gather is
the top node.
And it seems generate_useful_gather_paths, as called in
apply_scanjoin_target_to_paths, is the right place, since that is the only
call site that can create a top-node Gather.
So we set the flag in apply_scanjoin_target_to_paths around
generate_useful_gather_paths to identify the top node.
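
Roughly, the shape is as follows (a simplified sketch; the helper and
variable names here are mine, only the CTAS_PARALLEL_INS_* states come from
the patch):

    /* in apply_scanjoin_target_to_paths(), the only place that can create a
     * top-level Gather for the query */
    if (ctas_dest_receiver_present)     /* assumption: set earlier for CTAS */
        ctas_parallel_ins_state |= CTAS_PARALLEL_INS_TUP_COST_CAN_IGN;

    generate_useful_gather_paths(root, rel, false);

    /* leaving the top level again: stop ignoring the parallel tuple cost */
    ctas_parallel_ins_state = CTAS_PARALLEL_INS_UNDEF;

cost_gather() then checks the state and, once both conditions have been seen,
charges zero parallel tuple cost for transferring rows to the leader.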


Best regards,
houzj




Re: poc - possibility to write window function in PL languages

2021-01-04 Thread Pavel Stehule
Hi

pá 1. 1. 2021 v 18:57 odesílatel Zhihong Yu  napsal:

> Hi, Pavel:
> Happy New Year.
>
> +   command with clause WINDOW. The specific feature of
> +   this functions is a possibility to two special storages with
>
> this functions -> this function
>
> possibility to two special storages: there is no verb.
>
> 'store with stored one value': store is repeated.
>
> + * Portions Copyright (c) 1996-2020, PostgreSQL Global Development Group
>
> It would be better to change 2020 to 2021 in the new files.
>

fixed

>
> For some functions, such as windowobject_get_func_arg_frame, it would be
> better to add comment explaining their purposes.
>

It is commented before. These functions just call WinAPI functions

/*
 * High level access function. These functions are wrappers for windows API
 * for PL languages based on usage WindowObjectProxy.
 */



> For estimate_partition_context_size():
> +errmsg("size of value is greather than limit (1024
> bytes)")));
>
> Please include the value of typlen in the message. There is similar error
> message in the else block where value of size should be included.
>
> +   return *realsize;
> +   }
> +   else
>
> The 'else' is not needed since the if block ends with return.
>

yes, but it is there for better readability (symmetry)

>
> +   size += size / 3;
>
> Please add a comment for the choice of constant 3.
>
> +   /* by default we allocate 30 bytes */
> +   *realsize = 0;
>
> The value 30 may not be accurate - from the caller:
>
> +   if (PG_ARGISNULL(2))
> +   minsize = VARLENA_MINSIZE;
> +   else
> +   minsize = PG_GETARG_INT32(2);
>
> VARLENA_MINSIZE is 32.
>
> Cheers
>
> On Fri, Jan 1, 2021 at 3:29 AM Pavel Stehule 
> wrote:
>
>> Hi
>>
>> rebase
>>
>> Regards
>>
>> Pavel
>>
>
I am sending updated patch

Thank you for comments

Regards

Pavel


plpgsql-window-functions-20210104.patch.gz
Description: application/gzip


Re: {CREATE INDEX, REINDEX} CONCURRENTLY improvements

2021-01-04 Thread Masahiko Sawada
On Tue, Dec 8, 2020 at 5:18 PM Hamid Akhtar  wrote:
>
> The following review has been posted through the commitfest application:
> make installcheck-world:  tested, passed
> Implements feature:   not tested
> Spec compliant:   not tested
> Documentation:not tested
>
> The patch looks good to me. With regards to the code comments, I had pretty 
> similar concerns as already raised by Dmitry; and those have already been 
> answered by you. So this patch is good to go from my side once you have 
> updated the comments per your last email.

For the 0001 patch, since ReindexIndexInfo is used only within
ReindexRelationConcurrently() I think it’s a function-local structure
type. So we can declare it within the function. What do you think?

Regards,


--
Masahiko Sawada
EnterpriseDB:  https://www.enterprisedb.com/




Re: Parallel Inserts in CREATE TABLE AS

2021-01-04 Thread Luc Vlaming

On 30-12-2020 04:55, Bharath Rupireddy wrote:

On Wed, Dec 30, 2020 at 5:22 AM Zhihong Yu  wrote:

w.r.t. v17-0004-Enable-CTAS-Parallel-Inserts-For-Append.patch

+ * Push the dest receiver to Gather node when it is either at the top of the
+ * plan or under top Append node unless it does not have any projections to do.

I think the 'unless' should be 'if'. As can be seen from the body of the method:

+   if (!ps->ps_ProjInfo)
+   {
+   GatherState *gstate = (GatherState *) ps;
+
+   parallel = true;


Thanks. Modified it in the 0004 patch. Attaching v18 patch set. Note
that no change in 0001 to 0003 patches from v17.

Please consider v18 patch set for further review.

With Regards,
Bharath Rupireddy.
EnterpriseDB: http://www.enterprisedb.com



Hi,

Sorry it took so long to get back to reviewing this.

wrt v18-0001patch:

+   /*
+* If the worker is for parallel insert in CTAS, then use the 
proper
+* dest receiver.
+*/
+   intoclause = (IntoClause *) stringToNode(intoclausestr);
+   receiver = CreateIntoRelDestReceiver(intoclause);
+   ((DR_intorel *)receiver)->is_parallel_worker = true;
+   ((DR_intorel *)receiver)->object_id = fpes->objectid;
I would move this into a function called e.g. 
GetCTASParallelWorkerReceiver so that the details wrt CTAS can be put in 
createas.c.

I would then also split up intorel_startup into intorel_leader_startup
and intorel_worker_startup, and in GetCTASParallelWorkerReceiver set 
self->pub.rStartup to intorel_worker_startup.



+   volatile pg_atomic_uint64   *processed;
why is it volatile?


+   if (isctas)
+   {
+   intoclause = ((DR_intorel *) node->dest)->into;
+   objectid = ((DR_intorel *) 
node->dest)->object_id;
+   }
Given that you extract them each once and then pass them directly into 
the parallel-worker, can't you instead pass in the destreceiver and 
leave that logic to ExecInitParallelPlan?



+   if (IS_PARALLEL_CTAS_DEST(gstate->dest) 
&&
+   ((DR_intorel *) 
gstate->dest)->into->rel &&
+   ((DR_intorel *) 
gstate->dest)->into->rel->relname)
why would rel and relname not be there? if no rows have been inserted? 
because it seems from the intorel_startup function that that would be 
set as soon as startup was done, which i assume (wrongly?) is always done?



+* In case if no workers were launched, allow the leader to insert 
entire
+* tuples.
what does "entire tuples" mean? should it maybe be "all tuples"?



wrt v18-0002patch:

It looks like this introduces a state machine that goes like:
- starts at CTAS_PARALLEL_INS_UNDEF
- possibly moves to CTAS_PARALLEL_INS_SELECT
- CTAS_PARALLEL_INS_TUP_COST_CAN_IGN can be added
- if both were added at some stage, we can go to 
CTAS_PARALLEL_INS_TUP_COST_IGNORED and ignore the costs


what i'm wondering is why you opted to put logic around 
generate_useful_gather_paths and in cost_gather when to me it seems more 
logical to put it in create_gather_path? i'm probably missing something 
there?




wrt v18-0003patch:

not sure if it is needed, but i was wondering if we would want more 
tests with multiple gather nodes existing? caused e.g. by using CTE's, 
valid subquery's (like the one test you have, but without the group 
by/having)?



Kind regards,
Luc




Re: zstd compression for pg_dump

2021-01-04 Thread Daniil Zakhlystov
Hi!

> On Jan 4, 2021, at 11:04 AM, Andrey Borodin  wrote:
> 
> Daniil, is levels definition compatible with libpq compression patch?
> +typedef struct Compress {
> + CompressionAlgorithmalg;
> + int level;
> + /* Is a nondefault level set ?  This is useful since different 
> compression
> +  * methods have different "default" levels.  For now we assume the 
> levels
> +  * are all integer, though.
> + */
> + boollevel_set;
> +} Compress;

Similarly to this patch, it is also possible to define the compression level at
the initialization stage in the libpq compression patch.

The difference is that in the libpq compression patch the default compression
level is always 1, independently of the chosen compression algorithm.

> On Jan 4, 2021, at 11:04 AM, Andrey Borodin  wrote:
> 
> Libpq compression encountered some problems with memory consumption which 
> required some extra config efforts.


> On Jan 4, 2021, at 12:06 PM, Justin Pryzby  wrote:
> 
> RAM use is not significantly different from zlib, except that zstd --long adds
> more memory.

Regarding ZSTD memory usage:

Recently I’ve run a couple of tests of libpq compression with different
ZLIB/ZSTD compression levels, which showed that compressing/decompressing with
ZSTD at high compression levels
requires allocating more virtual (Committed_AS) memory, which may be exploited
by malicious clients:

https://www.postgresql.org/message-id/62527092-16BD-479F-B503-FA527AF3B0C2%40yandex-team.ru

We can avoid high memory usage by limiting the max window size to 8MB. This
should effectively disable support for compression levels above 19:
https://www.postgresql.org/message-id/6A45DFAA-1682-4EF2-B835-C5F46615EC49%40yandex-team.ru

So maybe it is worthwhile to use similar restrictions in this patch.
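
For reference, that kind of cap can be expressed with the ordinary zstd
context parameters; a sketch of what I mean (the exact values a patch would
pick may of course differ):

    #include <zstd.h>

    ZSTD_CCtx *cctx = ZSTD_createCCtx();
    ZSTD_DCtx *dctx = ZSTD_createDCtx();

    /* cap the compressor's window at 2^23 = 8MB */
    ZSTD_CCtx_setParameter(cctx, ZSTD_c_windowLog, 23);

    /* refuse to decompress frames that would need a larger window */
    ZSTD_DCtx_setParameter(dctx, ZSTD_d_windowLogMax, 23);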

—
Daniil Zakhlystov



Re: A failure of standby to follow timeline switch

2021-01-04 Thread Fujii Masao




On 2021/01/04 12:06, Kyotaro Horiguchi wrote:

At Sat, 26 Dec 2020 02:15:06 +0900, Fujii Masao  
wrote in



On 2020/12/25 12:03, Kyotaro Horiguchi wrote:

Thank you for looking this.
At Thu, 24 Dec 2020 15:33:04 +0900, Fujii Masao
 wrote in

When I applied two patches in the master branch and
ran "make check-world", I got the following error.

== creating database "contrib_regression" ==
# Looks like you planned 37 tests but ran 36.
# Looks like your test exited with 255 just after 36.
t/001_stream_rep.pl ..
Dubious, test returned 255 (wstat 65280, 0xff00)
Failed 1/37 subtests
...
Test Summary Report
---
t/001_stream_rep.pl(Wstat: 65280 Tests: 36 Failed: 0)
Non-zero exit status: 255
Parse errors: Bad plan.  You planned 37 tests but ran 36.
Files=21, Tests=239, 302 wallclock secs ( 0.10 usr 0.05 sys + 41.69
cusr 39.84 csys = 81.68 CPU)
Result: FAIL
make[2]: *** [check] Error 1
make[1]: *** [check-recovery-recurse] Error 2
make[1]: *** Waiting for unfinished jobs
t/070_dropuser.pl . ok

Mmm. I retried that and saw it succeed (with 0002 applied).
If I modified "use Test::More tests => 37" to 38 in the perl file, I
got a similar result.


What happens if you run make check-world with -j 4? When I ran that,
the test failed. But with -j 1, the test finished with success. I'm
not sure
why this happened, though..


Maybe this is it.

+   usleep(100_000);

If the script doesn't find the expected log line, it reaches the
usleep and barks that "Undefined subroutine ::usleep called...". I
thought I tested that path but perhaps I overlooked the error. "use
Time::HiRes" is needed.


Yes.



The attached is the fixed version.


Thanks for updating the patches!


In the first patch, the test added to 001_stream_rep.pl involves two
copied functions related to server-log investigation from
019_repslot_limit.pl.


So you're planning to define them commonly in TestLib.pm or elsewhere?

+$node_primary_2->init(allows_streaming => 1);
+$node_primary_2->enable_archiving; # needed to make .paritial segment

Isn't it better to use has_archiving flag in init() instead of doing
enable_archiving, like other tests do?

0002 looks good to me.

Regards,

--
Fujii Masao
Advanced Computing Technology Center
Research and Development Headquarters
NTT DATA CORPORATION




Re: Parallel Inserts in CREATE TABLE AS

2021-01-04 Thread Bharath Rupireddy
On Wed, Dec 30, 2020 at 5:28 PM vignesh C  wrote:
> Few comments:
> -   /*
> -* To allow parallel inserts, we need to ensure that they are safe to 
> be
> -* performed in workers. We have the infrastructure to allow parallel
> -* inserts in general except for the cases where inserts generate a 
> new
> -* CommandId (eg. inserts into a table having a foreign key column).
> -*/
> -   if (IsParallelWorker())
> -   ereport(ERROR,
> -   (errcode(ERRCODE_INVALID_TRANSACTION_STATE),
> -errmsg("cannot insert tuples in a
> parallel worker")));
>
> Is it possible to add a check if it is a CTAS insert here as we do not
> support insert in parallel workers from others as of now.

Currently, there's no global variable with which we can selectively skip
this check in the case of parallel insertion for CTAS. How about having a
variable in one of the worker global contexts, setting it when parallel
insertion is chosen for CTAS, and using it in heap_prepare_insert() to
skip the above error? Eventually, we can remove this restriction
entirely once we fully allow parallelism for INSERT INTO SELECT,
CTAS, and COPY.

Thoughts?
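
A rough sketch of what I mean (the flag name is purely illustrative, not
something that exists today):

    /* worker-global flag, set only when the worker was launched for
     * parallel CTAS insertion (hypothetical name) */
    extern bool am_parallel_ctas_insert_worker;

    /* in heap_prepare_insert() */
    if (IsParallelWorker() && !am_parallel_ctas_insert_worker)
        ereport(ERROR,
                (errcode(ERRCODE_INVALID_TRANSACTION_STATE),
                 errmsg("cannot insert tuples in a parallel worker")));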

> +   Oid objectid;   /* workers to
> open relation/table.  */
> +   /* Number of tuples inserted by all the workers. */
> +   pg_atomic_uint64processed;
>
> We can just mention relation instead of relation/table.

I will modify it in the next patch set.

> +select explain_pictas(
> +'create table parallel_write as select length(stringu1) from tenk1;');
> +  explain_pictas
> +--
> + Gather (actual rows=N loops=N)
> +   Workers Planned: 4
> +   Workers Launched: N
> + ->  Create parallel_write
> +   ->  Parallel Seq Scan on tenk1 (actual rows=N loops=N)
> +(5 rows)
> +
> +select count(*) from parallel_write;
>
> Can we include selection of cmin, xmin for one of the test to verify
> that it uses the same transaction id  in the parallel workers
> something like:
> select distinct(cmin,xmin) from parallel_write;

This is not possible since cmin and xmin are dynamic; we cannot use
them in test cases. I also think it's not necessary to check whether the
leader and workers are in the same txn, since we are not
creating a new txn: all the txn state from the leader is serialized in
SerializeTransactionState and restored in
StartParallelWorkerTransaction.

With Regards,
Bharath Rupireddy.
EnterpriseDB: http://www.enterprisedb.com




Re: Parallel Inserts in CREATE TABLE AS

2021-01-04 Thread Bharath Rupireddy
On Wed, Dec 30, 2020 at 5:26 PM vignesh C  wrote:
>
> On Wed, Dec 30, 2020 at 10:47 AM Bharath Rupireddy
>  wrote:
> >
> > On Wed, Dec 30, 2020 at 10:32 AM Dilip Kumar  wrote:
> > > I have completed reviewing 0001, I don't have more comments, just one
> > > question.  Soon I will review the remaining patches.
> >
> > Thanks.
> >
> > > +/* If parallel inserts are to be allowed, set a few extra 
> > > information. */
> > > +if (myState->is_parallel)
> > > +{
> > > +myState->object_id = intoRelationAddr.objectId;
> > > +
> > > +/*
> > > + * We don't need to skip contacting FSM while inserting tuples 
> > > for
> > > + * parallel mode, while extending the relations, workers instead 
> > > of
> > > + * blocking on a page while another worker is inserting, can 
> > > check the
> > > + * FSM for another page that can accommodate the tuples. This 
> > > results
> > > + * in major benefit for parallel inserts.
> > > + */
> > > +myState->ti_options = 0;
> > >
> > > Is there any performance data for this or just theoretical analysis?
> >
> > I have seen that we don't get much performance with the skip fsm
> > option, though I don't have the data to back it up. I'm planning to
> > run performance tests after the patches 0001, 0002 and 0003 get
> > reviewed. I will capture the data at that time. Hope that's fine.
> >
>
> When you run the performance tests, you can try to capture and publish
> relation size & the number of pages that are getting created for base
> table and the CTAS table, you can use something like SELECT relpages
> FROM pg_class WHERE relname = 'tablename &  SELECT
> pg_total_relation_size('tablename'). Just to make sure that there is
> no significant difference between the base table and CTAS table.

I can do that. I'm sure the number of pages will be equal or a little
higher, since I observed that for parallel copy.

With Regards,
Bharath Rupireddy.
EnterpriseDB: http://www.enterprisedb.com




Re: Single transaction in the tablesync worker?

2021-01-04 Thread Peter Smith
On Mon, Jan 4, 2021 at 8:06 PM Amit Kapila  wrote:
> Few other comments:
> =
> 1.
> + elog(LOG, "!!>> DropSubscription: dropping the tablesync slot
> \"%s\".", syncslotname);
> + ReplicationSlotDropAtPubNode(wrconn, syncslotname);
> + elog(LOG, "!!>> DropSubscription: dropped the tablesync slot
> \"%s\".", syncslotname);
>
> ...
> ...
>
> + elog(LOG, "!!>> finish_sync_worker: dropping the tablesync slot
> \"%s\".", syncslotname);
> + ReplicationSlotDropAtPubNode(wrconn, syncslotname);
> + elog(LOG, "!!>> finish_sync_worker: dropped the tablesync slot
> \"%s\".", syncslotname);
>
> Remove these and other elogs added to aid debugging or testing. If you
> need these for development purposes then move these to separate patch.

Fixed in latest patch (v10).

>
> 2. Remove WIP from the commit message and patch name.
>
> --

Fixed in latest patch (v10)

---
v10 = 
https://www.postgresql.org/message-id/CAHut%2BPuzPmFzk3p4oL9H3nkiY6utFryV9c5dW6kRhCe_RY%3DgnA%40mail.gmail.com

Kind Regards,
Peter Smith.
Fujitsu Australia




Re: Single transaction in the tablesync worker?

2021-01-04 Thread Peter Smith
Hi Amit.

PSA my v10 patch for the Solution1.

v10 is essentially the same as v9, except now all the temporary "!!>>"
logging has been isolated to a separate (optional) patch 0002.



Features:

* tablesync slot is now permanent instead of temporary. The tablesync
slot name is no longer tied to the Subscription slot name.

* the tablesync slot cleanup (drop) code is added for DropSubscription
and for finish_sync_worker functions

* the tablesync worker now allows multiple tx instead of a single tx

* a new state (SUBREL_STATE_COPYDONE) is persisted after a successful
copy_table in LogicalRepSyncTableStart.

* if a re-launched tablesync finds the state is SUBREL_STATE_COPYDONE
then it will bypass the initial copy_table phase.

* tablesync sets up replication origin tracking in
LogicalRepSyncTableStart (similar as done for the apply worker). The
origin is advanced when first created.

* tablesync replication origin tracking is cleaned up during
DropSubscription and/or process_syncing_tables_for_apply.

* the DropSubscription cleanup code was enhanced (v7+) to take care of
crashed sync workers.

* minor updates to PG docs

TODO / Known Issues:

* address review comments

---

Kind Regards,
Peter Smith.
Fujitsu Australia


v10-0002-Tablesync-extra-logging.patch
Description: Binary data


v10-0001-Tablesync-Solution1.patch
Description: Binary data


Re: Single transaction in the tablesync worker?

2021-01-04 Thread Amit Kapila
On Wed, Dec 30, 2020 at 11:51 AM Peter Smith  wrote:
>
> On Wed, Dec 23, 2020 at 8:43 PM Amit Kapila  wrote:
> >
> > 1.
> > + * Rarely, the DropSubscription may be issued when a tablesync still
> > + * is in SYNCDONE but not yet in READY state. If this happens then
> > + * the drop slot could fail because it is already dropped.
> > + * In this case suppress and drop slot error.
> > + *
> > + * FIXME - Is there a better way than this?
> > + */
> > + if (rstate->state != SUBREL_STATE_SYNCDONE)
> > + PG_RE_THROW();
> >
> > So, does this situation happens when we try to drop subscription after
> > the state is changed to syncdone but not syncready. If so, then can't
> > we write a function GetSubscriptionNotDoneRelations similar to
> > GetSubscriptionNotReadyRelations where we get a list of relations that
> > are not in done stage. I think this should be safe because once we are
> > here we shouldn't be allowed to start a new worker and old workers are
> > already stopped by this function.
>
> Yes, but I don't see how adding such a function is an improvement over
> the existing code:
>

The advantage is that we don't need to use try..catch to deal with
such conditions, which I don't think is a good way to handle such
cases. Also, even with try...catch we can still leak the
slots, because the patch drops the slot after changing the state to
syncdone, and if there is any error while dropping the slot, it simply
skips it. So, it is possible that the rel state is syncdone but the
slot still exists; we then get an error due to some different reason,
silently skip it again, and allow the subscription to
be dropped.

I think what we should do instead is drop the slot before we change
the rel state to syncdone. Also, if the apply worker fails to drop the
slot, it should try to drop it again after restart. In
DropSubscription, we can then check whether the rel state is not SYNCDONE or
READY, and drop the corresponding slots.
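
In pseudo-code, the ordering I'm suggesting is roughly this (a sketch only;
the LSN argument is whatever the worker has at that point):

    /* tablesync worker, once the copy/catchup is complete */
    ReplicationSlotDropAtPubNode(wrconn, syncslotname);     /* drop first ... */
    UpdateSubscriptionRelState(MySubscription->oid,
                               MyLogicalRepWorker->relid,
                               SUBREL_STATE_SYNCDONE,
                               current_lsn);                /* ... then mark done */

    /* DropSubscription(): only relations that never reached SYNCDONE/READY
     * can still own a tablesync slot, so drop just those. */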

> e.g.1. GetSubscriptionNotDoneRelations will include the READY state
> (which we don't want) just like GetSubscriptionNotReadyRelations
> includes the SYNCDONE state.
> e.g.2. Or, something like GetSubscriptionNotDoneAndNotReadyRelations
> would be an unnecessary overkill replacement for the current simple
> "if".
>

Or we can probably rename the function to
GetSubscriptionRelationsNotInStates and pass it an array of states
that we don't want.

> AFAIK the code is OK as-is.
>

As described above, there are still race conditions where we can leak
slots, and this also doesn't look clean.

Few other comments:
=
1.
+ elog(LOG, "!!>> DropSubscription: dropping the tablesync slot
\"%s\".", syncslotname);
+ ReplicationSlotDropAtPubNode(wrconn, syncslotname);
+ elog(LOG, "!!>> DropSubscription: dropped the tablesync slot
\"%s\".", syncslotname);

...
...

+ elog(LOG, "!!>> finish_sync_worker: dropping the tablesync slot
\"%s\".", syncslotname);
+ ReplicationSlotDropAtPubNode(wrconn, syncslotname);
+ elog(LOG, "!!>> finish_sync_worker: dropped the tablesync slot
\"%s\".", syncslotname);

Remove these and other elogs added to aid debugging or testing. If you
need these for development purposes then move these to separate patch.

2. Remove WIP from the commit message and patch name.

-- 
With Regards,
Amit Kapila.




Re: [HACKERS] logical decoding of two-phase transactions

2021-01-04 Thread Amit Kapila
On Thu, Dec 31, 2020 at 12:31 PM Ajin Cherian  wrote:
>
> On Thu, Dec 31, 2020 at 4:16 PM Amit Kapila  wrote:
>
> > 3. Merged the doc changes patch after some changes mostly cosmetic.
> Some minor comments here:
>
> v35-0001 - logicaldecoding.sgml
>
> In the example section:
> Change "The following example shows SQL interface can be used to
> decode prepared  transactions."
> to "The following example shows the SQL interface that can be used to
> decode prepared transactions."
>
> Then in "Two-phase commit support for Logical Decoding" page:
> Change "To support streaming of two-phase commands, an output plugin
> needs to provide the additional callbacks."
> to "To support streaming of two-phase commands, an output plugin needs
> to provide additional callbacks."
>
> Other than that, I have no more comments.
>

Thanks, I have pushed the 0001* patch after making the above and a few
other cosmetic modifications.


-- 
With Regards,
Amit Kapila.



