Re: [Patch] pg_rewind: options to use restore_command from recovery.conf or command line

2019-03-07 Thread Alexey Kondratov

On 07.03.2019 10:26, David Steele wrote:

On 3/6/19 5:38 PM, Andrey Borodin wrote:


The new patch is much smaller (less than 400 lines) and works as 
advertised.

There's a typo "retreive" there.


Ouch, I corrected this in three different places. Not my word, definitely. 
Thanks!




These lines look a little suspicious:
    char        postgres_exec_path[MAXPGPATH],
                postgres_cmd[MAXPGPATH],
                cmd_output[MAX_RESTORE_COMMAND];
Is there supposed to be any difference between MAXPGPATH and 
MAX_RESTORE_COMMAND?




Yes, there was supposed to be, but after your message I double-checked 
everything and figured out that we use MAXPGPATH for the final 
restore_command build (with all aliases replaced). Thus, there is no 
need for a separate constant, and I have replaced it with MAXPGPATH.




This patch appears to need attention from the author so I have marked 
it Waiting on Author.




I hope I have addressed all the issues in the new patch version, which 
is attached. Also, I have added a more detailed explanation of the new 
functionality to the multi-line commit message.



Regards,

--
Alexey Kondratov

Postgres Professional https://www.postgrespro.com
Russian Postgres Company

From 9770cab4909a3cd98c2db2b8a9fa4af1fedd4614 Mon Sep 17 00:00:00 2001
From: Alexey Kondratov 
Date: Tue, 19 Feb 2019 19:14:53 +0300
Subject: [PATCH v5] pg_rewind: options to use restore_command from command
 line or cluster config

Previously, when pg_rewind could not find the required WAL files in the
target data directory, the rewind process would fail. One had to
manually figure out which of the required WAL files had already been
moved to archival storage and copy them back.

This patch adds the possibility to specify restore_command via a
command line option, or to use the one specified inside
postgresql.conf. The specified restore_command will be used for
automatic retrieval of missing WAL files from archival storage.
---
 doc/src/sgml/ref/pg_rewind.sgml   |  30 -
 src/bin/pg_rewind/parsexlog.c | 161 +-
 src/bin/pg_rewind/pg_rewind.c |  96 ++-
 src/bin/pg_rewind/pg_rewind.h |   7 +-
 src/bin/pg_rewind/t/001_basic.pl  |   4 +-
 src/bin/pg_rewind/t/002_databases.pl  |   4 +-
 src/bin/pg_rewind/t/003_extrafiles.pl |   4 +-
 src/bin/pg_rewind/t/RewindTest.pm |  84 +-
 8 files changed, 370 insertions(+), 20 deletions(-)

diff --git a/doc/src/sgml/ref/pg_rewind.sgml b/doc/src/sgml/ref/pg_rewind.sgml
index 53a64ee29e..90e3f22f97 100644
--- a/doc/src/sgml/ref/pg_rewind.sgml
+++ b/doc/src/sgml/ref/pg_rewind.sgml
@@ -67,8 +67,10 @@ PostgreSQL documentation
ancestor. In the typical failover scenario where the target cluster was
shut down soon after the divergence, this is not a problem, but if the
target cluster ran for a long time after the divergence, the old WAL
-   files might no longer be present. In that case, they can be manually
-   copied from the WAL archive to the pg_wal directory, or
+   files might no longer be present. In that case, they can be automatically
+   copied by pg_rewind from the WAL archive to the 
+   pg_wal directory if either -r or
+   -R option is specified, or
fetched on startup by configuring  or
.  The use of
pg_rewind is not limited to failover, e.g.  a standby
@@ -200,6 +202,30 @@ PostgreSQL documentation
   
  
 
+ 
+  -r
+  --use-postgresql-conf
+  
+   
+Use restore_command in the postgresql.conf to
+retrieve missing in the target pg_wal directory
+WAL files from the WAL archive.
+   
+  
+ 
+
+ 
+  -R restore_command
+  --restore-command=restore_command
+  
+   
+Specifies the restore_command to use for retrieval of the missing
+in the target pg_wal directory WAL files from
+the WAL archive.
+   
+  
+ 
+
  
   --debug
   
diff --git a/src/bin/pg_rewind/parsexlog.c b/src/bin/pg_rewind/parsexlog.c
index e19c265cbb..6be6dab7e0 100644
--- a/src/bin/pg_rewind/parsexlog.c
+++ b/src/bin/pg_rewind/parsexlog.c
@@ -12,6 +12,7 @@
 #include "postgres_fe.h"
 
 #include 
+#include 
 
 #include "pg_rewind.h"
 #include "filemap.h"
@@ -45,6 +46,7 @@ static char xlogfpath[MAXPGPATH];
 typedef struct XLogPageReadPrivate
 {
 	const char *datadir;
+	const char *restoreCommand;
 	int			tliIndex;
 } XLogPageReadPrivate;
 
@@ -53,6 +55,9 @@ static int SimpleXLogPageRead(XLogReaderState *xlogreader,
    int reqLen, XLogRecPtr targetRecPtr, char *readBuf,
    TimeLineID *pageTLI);
 
+static int RestoreArchivedWAL(const char *path, const char *xlogfname,
+   off_t expectedSize, const char *restoreCommand);
+
 /*
  * Read WAL from the datadir/pg_wal, starting from 'startpoint' on timeline
  * index 'tliIndex' in target timeline history, until 'endpoint'. Make note of
@@ -60,7 +65,7 @@ static int SimpleX

Re: [Patch] pg_rewind: options to use restore_command from recovery.conf or command line

2019-03-27 Thread Alexey Kondratov

On 26.03.2019 11:19, Michael Paquier wrote:

+ * This is a simplified and adapted to frontend version of
+ * RestoreArchivedFile function from transam/xlogarchive.c
+ */
+static int
+RestoreArchivedWAL(const char *path, const char *xlogfname,
I don't think that we should have duplicates for that, so I would
recommend refactoring the code so as a unique code path is taken by
both, especially since the user can fetch the command from
postgresql.conf.


This comment has been here since the beginning of my work on this patch, 
and by now it is rather misleading.


Even if we do not take into account obvious differences like error 
reporting, different log levels based on many conditions, cleanup 
options, and the check for standby mode, restore_command execution 
during backend recovery and during pg_rewind has one very important 
difference. If it fails in the backend, then, as stated in the comment 
'Remember, we rollforward UNTIL the restore fails so failure here is 
just part of the process' -- it is OK. In contrast, if pg_rewind fails 
to recover some required WAL segment, that definitely means the end of 
the entire process, since we will fail to find the last common 
checkpoint or to extract the page map.


The only part we can share is constructing the restore_command with 
alias replacement. However, even there the logic is slightly different, 
since we do not need the %r alias for pg_rewind. The only use case of %r 
in restore_command I know of is pg_standby, which does not seem relevant 
to pg_rewind. I have tried to move this part into common code, but it 
becomes full of conditions and less concise.
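
For reference, here is a minimal sketch of the alias substitution being 
discussed; build_restore_command() is a hypothetical helper written for 
illustration, not the actual patch code, and %r is deliberately left 
unhandled:

#include <string.h>

#define MAXPGPATH 1024

/*
 * Expand %f to the WAL file name and %p to the destination path,
 * bailing out if the resulting command would not fit into MAXPGPATH.
 */
static int
build_restore_command(char *dest, const char *restore_command,
                      const char *xlogfname, const char *xlogpath)
{
    char       *d = dest;
    const char *c;

    for (c = restore_command; *c; c++)
    {
        if (c[0] == '%' && (c[1] == 'f' || c[1] == 'p'))
        {
            const char *repl = (c[1] == 'f') ? xlogfname : xlogpath;

            if ((d - dest) + strlen(repl) + 1 > MAXPGPATH)
                return -1;      /* resulting command too long */
            memcpy(d, repl, strlen(repl));
            d += strlen(repl);
            c++;                /* skip the alias character */
        }
        else
        {
            if ((d - dest) + 2 > MAXPGPATH)
                return -1;
            *d++ = *c;
        }
    }
    *d = '\0';
    return 0;
}

This is exactly the kind of code that would grow extra branches (%r 
handling, backend-style error reporting) if it were shared with 
xlogarchive.c.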


Please correct me if I am wrong, but it seems that there are enough 
differences to keep this function separate, doesn't it?



Why two options?  Wouldn't actually be enough use-postgresql-conf to
do the job?  Note that "postgres" should always be installed if
pg_rewind is present because it is a backend-side utility, so while I
don't like adding a dependency to other binaries in one binary, having
an option to pass out a command directly via the command line of
pg_rewind stresses me more.


I am not familiar enough with the DBA scenarios where the -R option may 
be useful, but I have been asked for it a few times. I can only 
speculate that, for example, someone may want to run the freshly rewound 
cluster as a master, not a replica, so its config may differ from the 
replica's, where restore_command is surely intended to be set. Thus, it 
is easier to leave the master's config in place and just specify 
restore_command as a command line argument.



Don't we need to worry about signals interrupting the restore command?
It seems to me that some refactoring from the stuff in xlogarchive.c
would be in order.


Thank you for pointing me to this place again. Previously, I thought 
that we did not need to care about it: if restore_command failed for any 
reason, the rewind failed too, so we would stop and exit at the upper 
levels. However, if it failed due to a signal, some of the subsequent 
messages could be misleading, e.g. if the user manually interrupted it 
for some reason. So I added a similar check here as well.
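
To illustrate, a sketch of such a check using POSIX wait macros around 
system(); run_restore_command() is an illustrative name, and the real 
patch reports errors through pg_rewind's own logging:

#include <stdio.h>
#include <stdlib.h>
#include <sys/wait.h>

/*
 * Run the already-built restore command, distinguishing death by a
 * signal (e.g. a manual Ctrl+C) from an ordinary non-zero exit.
 */
static int
run_restore_command(const char *cmd)
{
    int         rc = system(cmd);

    if (rc == 0)
        return 0;

    if (rc != -1 && WIFSIGNALED(rc))
        fprintf(stderr, "restore command was terminated by signal %d\n",
                WTERMSIG(rc));
    else
        fprintf(stderr, "restore command failed with exit code %d\n",
                rc == -1 ? rc : WEXITSTATUS(rc));
    return -1;
}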


Updated version of patch is attached.


--
Alexey Kondratov

Postgres Professional https://www.postgrespro.com
Russian Postgres Company

From 9e00f7a7696a88f350e1e328a9758ab85631c813 Mon Sep 17 00:00:00 2001
From: Alexey Kondratov 
Date: Tue, 19 Feb 2019 19:14:53 +0300
Subject: [PATCH v6] pg_rewind: options to use restore_command from command
 line or cluster config

Previously, when pg_rewind could not find the required WAL files in the
target data directory, the rewind process would fail. One had to
manually figure out which of the required WAL files had already been
moved to archival storage and copy them back.

This patch adds the possibility to specify restore_command via a
command line option, or to use the one specified inside
postgresql.conf. The specified restore_command will be used for
automatic retrieval of missing WAL files from archival storage.
---
 doc/src/sgml/ref/pg_rewind.sgml   |  30 -
 src/bin/pg_rewind/parsexlog.c | 167 +-
 src/bin/pg_rewind/pg_rewind.c |  96 ++-
 src/bin/pg_rewind/pg_rewind.h |   7 +-
 src/bin/pg_rewind/t/001_basic.pl  |   4 +-
 src/bin/pg_rewind/t/002_databases.pl  |   4 +-
 src/bin/pg_rewind/t/003_extrafiles.pl |   4 +-
 src/bin/pg_rewind/t/RewindTest.pm |  84 -
 8 files changed, 376 insertions(+), 20 deletions(-)

diff --git a/doc/src/sgml/ref/pg_rewind.sgml b/doc/src/sgml/ref/pg_rewind.sgml
index 53a64ee29e..90e3f22f97 100644
--- a/doc/src/sgml/ref/pg_rewind.sgml
+++ b/doc/src/sgml/ref/pg_rewind.sgml
@@ -67,8 +67,10 @@ PostgreSQL documentation
ancestor. In the typical failover scenario where the target cluster was
shut down soon after the divergence, this is not a problem, but if the
target cluster ran for a long time after the divergence, the old WAL
-   f

Re: PATCH: logical_work_mem and logical streaming of large in-progress transactions

2019-02-04 Thread Alexey Kondratov

Hi Tomas,

On 14.01.2019 21:23, Tomas Vondra wrote:

Attached is an updated patch series, merging fixes and changes to TAP
tests proposed by Alexey. I've merged the fixes into the appropriate
patches, and I've kept the TAP changes / new tests as separate patches
towards the end of the series.


I had problems applying this patch along with the 2PC streaming one to 
the current master, but everything applied well on 97c39498e5. 
Regression tests pass. What I personally do not like in the current TAP 
test set is that you have added "WITH (streaming=on)" to all tests, 
including the old non-streaming ones. It makes it unclear which 
mechanism is tested there: streaming, but those transactions probably do 
not hit the memory limit, so it depends on the default server 
parameters; or non-streaming, but then what is the need for 
(streaming=on)? I would prefer to add (streaming=on) only to the new 
tests, where it is clearly necessary.



I'm a bit unhappy with two aspects of the current patch series:

1) We now track schema changes in two ways - using the pre-existing
schema_sent flag in RelationSyncEntry, and the (newly added) flag in
ReorderBuffer. While those options are used for regular vs. streamed
transactions, fundamentally it's the same thing and so having two
competing ways seems like a bad idea. Not sure what's the best way to
resolve this, though.


Yes, sure; when I found problems with streaming of extensive DDL, I 
added the new flag in the simplest way, and it worked. Now, the old 
schema_sent flag is per-relation, while the new one - is_schema_sent - 
is per top-level transaction. If I get it correctly, the former seems to 
be more thrifty, since a new schema is sent only if we are streaming a 
change for a relation whose schema is outdated. In contrast, in the 
latter case we will send a new schema even if there are no new changes 
belonging to that relation.


I guess it would be better to stick to the old behavior. I will try to 
investigate how to use it in the streaming mode as well.



2) We've removed quite a few asserts, particularly ensuring sanity of
cmin/cmax values. To some extent that's expected, because by allowing
decoding of in-progress transactions relaxes some of those rules. But
I'd be much happier if some of those asserts could be reinstated, even
if only in a weaker form.



Asserts have been removed from two places: (1) 
HeapTupleSatisfiesHistoricMVCC, which seems inevitable, since we are 
touching the essence of the MVCC visibility rules when trying to decode 
an in-progress transaction; and (2) ReorderBufferBuildTupleCidHash, 
which is probably not directly related to the topic of the ongoing 
patch, since Arseny Sher recently faced the same issue with simple 
repetitive DDL decoding [1].


There are not many of them, but I agree that replacing them with some 
softer asserts would be better than just removing them, especially for 
point (1).
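
As a purely illustrative example (stub types, not the actual 
reorderbuffer.c code), a softer assert could tolerate unresolved values 
and only check consistency once both sides are known:

#include <assert.h>

typedef unsigned int CommandId;

#define InvalidCommandId ((CommandId) 0xFFFFFFFF)

/*
 * Weaker form of a cmin/cmax sanity check: only compare the values
 * when both are resolved, since unresolved ones are expected while
 * decoding an in-progress transaction.
 */
static void
check_tuplecid(CommandId stored_cmax, CommandId incoming_cmax)
{
    if (stored_cmax != InvalidCommandId && incoming_cmax != InvalidCommandId)
        assert(stored_cmax == incoming_cmax);
}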



[1] https://www.postgresql.org/message-id/flat/874l9p8hyw.fsf%40ars-thinkpad


Regards

--
Alexey Kondratov

Postgres Professional https://www.postgrespro.com
Russian Postgres Company




Re: Too rigorous assert in reorderbuffer.c

2019-02-04 Thread Alexey Kondratov

Hi,

On 31.01.2019 9:21, Arseny Sher wrote:

My colleague Alexander Lakhin has noticed an assertion failure in
reorderbuffer.c:1330. Here is a simple snippet reproducing it:

SELECT 'init' FROM pg_create_logical_replication_slot('regression_slot', 
'test_decoding');

create table t(k int);
begin;
savepoint a;
alter table t alter column k type text;
rollback to savepoint a;
alter table t alter column k type bigint;
commit;

SELECT data FROM pg_logical_slot_get_changes('regression_slot', NULL, NULL, 
'include-xids', '0', 'skip-empty-xacts', '1');


I just want to add that I accidentally discovered the same issue while 
testing Tomas's large transaction streaming patch [1], and had to remove 
this assert to get things working. I thought it was somehow related to 
the streaming mode and did not test the same query alone.



[1] 
https://www.postgresql.org/message-id/76fc440e-91c3-afe2-b78a-987205b3c758%402ndquadrant.com



Regards

--
Alexey Kondratov

Postgres Professional https://www.postgrespro.com
Russian Postgres Company




Re: [Patch] pg_rewind: options to use restore_command from recovery.conf or command line

2019-02-08 Thread Alexey Kondratov

On 21.01.2019 23:50, a.kondra...@postgrespro.ru wrote:

Thank you for the review! I have updated the patch according to your
comments and remarks. Please, find new version attached.


During self-review of the code and tests, I discovered some problems 
with the build on Windows. A new version of the patch is attached; it 
fixes this issue and includes some minor code revisions.



Regards

--
Alexey Kondratov

Postgres Professional https://www.postgrespro.com
Russian Postgres Company

From 99c6d94f37a797400d41545a271ff111b92e9361 Mon Sep 17 00:00:00 2001
From: Alexey Kondratov 
Date: Fri, 21 Dec 2018 14:00:30 +0300
Subject: [PATCH] pg_rewind: options to use restore_command from
 postgresql.conf or command line.

---
 doc/src/sgml/ref/pg_rewind.sgml   |  30 +-
 src/backend/Makefile  |   4 +-
 src/backend/commands/extension.c  |   1 +
 src/backend/utils/misc/.gitignore |   1 -
 src/backend/utils/misc/Makefile   |   8 -
 src/backend/utils/misc/guc.c  | 434 +--
 src/bin/pg_rewind/Makefile|   2 +-
 src/bin/pg_rewind/parsexlog.c | 166 +-
 src/bin/pg_rewind/pg_rewind.c | 100 +++-
 src/bin/pg_rewind/pg_rewind.h |  10 +-
 src/bin/pg_rewind/t/001_basic.pl  |   4 +-
 src/bin/pg_rewind/t/002_databases.pl  |   4 +-
 src/bin/pg_rewind/t/003_extrafiles.pl |   4 +-
 src/bin/pg_rewind/t/RewindTest.pm |  93 +++-
 src/common/.gitignore |   1 +
 src/common/Makefile   |   9 +-
 src/{backend/utils/misc => common}/guc-file.l | 518 --
 src/include/common/guc-file.h |  50 ++
 src/include/utils/guc.h   |  39 +-
 src/tools/msvc/Mkvcbuild.pm   |   7 +-
 src/tools/msvc/clean.bat  |   2 +-
 21 files changed, 973 insertions(+), 514 deletions(-)
 delete mode 100644 src/backend/utils/misc/.gitignore
 rename src/{backend/utils/misc => common}/guc-file.l (60%)
 create mode 100644 src/include/common/guc-file.h

diff --git a/doc/src/sgml/ref/pg_rewind.sgml b/doc/src/sgml/ref/pg_rewind.sgml
index 53a64ee29e..0c2441afa7 100644
--- a/doc/src/sgml/ref/pg_rewind.sgml
+++ b/doc/src/sgml/ref/pg_rewind.sgml
@@ -67,8 +67,10 @@ PostgreSQL documentation
ancestor. In the typical failover scenario where the target cluster was
shut down soon after the divergence, this is not a problem, but if the
target cluster ran for a long time after the divergence, the old WAL
-   files might no longer be present. In that case, they can be manually
-   copied from the WAL archive to the pg_wal directory, or
+   files might no longer be present. In that case, they can be automatically
+   copied by pg_rewind from the WAL archive to the 
+   pg_wal directory if either -r or
+   -R option is specified, or
fetched on startup by configuring  or
.  The use of
pg_rewind is not limited to failover, e.g.  a standby
@@ -200,6 +202,30 @@ PostgreSQL documentation
   
  
 
+ 
+  -r
+  --use-postgresql-conf
+  
+   
+Use restore_command in the postgresql.conf to
+retreive missing in the target pg_wal directory
+WAL files from the WAL archive.
+   
+  
+ 
+
+ 
+  -R restore_command
+  --restore-command=restore_command
+  
+   
+Specifies the restore_command to use for retrieval of the missing
+in the target pg_wal directory WAL files from
+the WAL archive.
+   
+  
+ 
+
  
   --debug
   
diff --git a/src/backend/Makefile b/src/backend/Makefile
index 478a96db9b..721cb57e89 100644
--- a/src/backend/Makefile
+++ b/src/backend/Makefile
@@ -186,7 +186,7 @@ distprep:
 	$(MAKE) -C replication	repl_gram.c repl_scanner.c syncrep_gram.c syncrep_scanner.c
 	$(MAKE) -C storage/lmgr	lwlocknames.h lwlocknames.c
 	$(MAKE) -C utils	distprep
-	$(MAKE) -C utils/misc	guc-file.c
+	$(MAKE) -C common	guc-file.c
 	$(MAKE) -C utils/sort	qsort_tuple.c
 
 
@@ -307,7 +307,7 @@ maintainer-clean: distclean
 	  replication/syncrep_scanner.c \
 	  storage/lmgr/lwlocknames.c \
 	  storage/lmgr/lwlocknames.h \
-	  utils/misc/guc-file.c \
+	  common/guc-file.c \
 	  utils/sort/qsort_tuple.c
 
 
diff --git a/src/backend/commands/extension.c b/src/backend/commands/extension.c
index daf3f51636..195eb8a821 100644
--- a/src/backend/commands/extension.c
+++ b/src/backend/commands/extension.c
@@ -50,6 +50,7 @@
 #include "commands/defrem.h"
 #include "commands/extension.h"
 #include "commands/schemacmds.h"
+#include "common/guc-file.h"
 #include "funcapi.h"
 #include "mb/pg_wchar.h"
 #include "miscadmin.h"
diff --git a/src/backend/utils/misc/.gitignore b/src/backend/utils/misc/.gitignore
deleted file mode 10064

Re: [Patch] pg_rewind: options to use restore_command from recovery.conf or command line

2019-02-11 Thread Alexey Kondratov

Hi!

On 09.02.2019 14:31, Andrey Borodin wrote:

Here's a typo in postgreslq.conf
+   fprintf(stderr, _("%s: option -r/--use-postgresql-conf is specified, but postgreslq.conf is absent in the target directory\n"),


Fixed, thanks. I am not attaching a new version of the patch for just 
one typo; maybe there will be some more remarks from others.



Besides this, I think you can switch patch to "Ready for committer".

check-world is passing on macbook, docs are here, feature is implemented and 
tested.


OK, cfbot [1] does not complain about anything on Linux or Windows 
either, so I am setting it to "Ready for committer" for the next 
commitfest.



[1] http://cfbot.cputube.org/alexey-kondratov.html


Regards

--
Alexey Kondratov

Postgres Professional https://www.postgrespro.com
Russian Postgres Company




Re: Logical replication and restore from pg_basebackup

2019-02-12 Thread Alexey Kondratov

Hi Dmitry,

On 11.02.2019 17:39, Dmitry Vasiliev wrote:
What is the scope of logical replication if I cannot make recovery 
from pg_basebackup?



You can, but there are some things to keep in mind:

1) I could be wrong, but using pgbench in such a test seems to be a bad 
idea, since it drops and creates tables from scratch when -i is passed. 
However, if I recall correctly, pub/sub slots use relation OIDs, so I 
expect that you would get only the initial sync data on the replica and 
the last pgbench results on the master.


2) Next, the 'srsubstate' check works only for the initial sync. After 
that you should poll the master's replication slot LSN for 
'pg_current_wal_lsn() <= replay_lsn' (a minimal polling sketch follows 
below).
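
A minimal polling sketch in C using libpq, under the assumption that the 
subscription's walsender shows up in pg_stat_replication on the master:

#include <stdio.h>
#include <string.h>
#include <unistd.h>
#include <libpq-fe.h>

/*
 * Block until the replica has replayed everything the master has
 * written so far, by polling pg_stat_replication once per second.
 */
static void
wait_for_catchup(PGconn *master)
{
    for (;;)
    {
        PGresult   *res = PQexec(master,
                                 "SELECT pg_current_wal_lsn() <= replay_lsn "
                                 "FROM pg_stat_replication");

        if (PQresultStatus(res) == PGRES_TUPLES_OK &&
            PQntuples(res) == 1 &&
            strcmp(PQgetvalue(res, 0, 0), "t") == 0)
        {
            PQclear(res);
            return;             /* replica has caught up */
        }
        PQclear(res);
        sleep(1);
    }
}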


Please find attached a slightly modified version of your test (also in a 
gist [1]), which works just fine. You should replace %username% with 
your current username, since I did not run it as the postgres user.


[1] https://gist.github.com/ololobus/a8a11f11eb67dfa1b6a95bff5e8f0096


Regards

--
Alexey Kondratov

Postgres Professional https://www.postgrespro.com
Russian Postgres Company



logical-replication-test.sh
Description: application/shellscript


Re: [Patch] pg_rewind: options to use restore_command from recovery.conf or command line

2019-02-18 Thread Alexey Kondratov

Hi Andres,

Thank you for your feedback.

On 16.02.2019 6:41, Andres Freund wrote:

It sounds like a seriously bad idea to use a different parser for
pg_rewind.  Why don't you just use postgres for it? As in
/path/to/postgres -D /path/to/datadir/ -C shared_buffers
?



Initially, when I started working on this patch, recovery options were 
not part of the GUCs, so it was not possible. Now recovery.conf has been 
merged into postgresql.conf, and postgres -C only reads the config 
files, initializes GUCs, prints the required parameter, and shuts down. 
Thus, it seems like an acceptable solution to me, though I am still a 
little afraid to start up a server that is meant to be shut down during 
the rewind process, even for such a short period of time.
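
A rough sketch of that approach, assuming POSIX popen() and a postgres 
binary reachable via PATH; get_restore_command() is an illustrative 
name, not the patch's actual function:

#include <stdio.h>
#include <string.h>

#define MAXPGPATH 1024

/*
 * Ask "postgres -C" for the restore_command value; it reads the config
 * files, prints the single requested GUC, and exits immediately.
 */
static int
get_restore_command(const char *datadir, char *result, int size)
{
    char        cmd[MAXPGPATH];
    FILE       *pipe;

    snprintf(cmd, sizeof(cmd),
             "postgres -D \"%s\" -C restore_command", datadir);

    if ((pipe = popen(cmd, "r")) == NULL)
        return -1;
    if (fgets(result, size, pipe) == NULL)
    {
        pclose(pipe);
        return -1;
    }
    pclose(pipe);

    /* Strip the trailing newline printed by postgres -C. */
    result[strcspn(result, "\n")] = '\0';
    return 0;
}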


The thing I am most concerned about is that pg_rewind has always been a 
standalone utility, so you were able to simply rewind one data directory 
relative to another without any need for other postgres binaries. If we 
rely on postgres -C, this would be tricky in some cases:


- the end user always has to care about the availability of the 
postmaster binaries;
- even so, the appropriate postgres executable may be absent from the 
ENV/PATH;
- the locations of pg_rewind and postgres may be arbitrary depending on 
the distribution, which may be custom as well.


I cannot propose a reliable way of detecting the path to the postgres 
executable without directly asking users to provide it via PATH, a 
command line option, etc. If someone can suggest anything, it would be 
possible to make the patch simpler in some way, but I have always wanted 
to keep pg_rewind standalone and as simple as possible for end users.


Anyway, currently I do not use a different parser for pg_rewind. A few 
versions back I made guc-file.l common to frontend and backend, so 
technically speaking it is the same parser the postmaster uses; only a 
small amount of sophisticated error reporting is wrapped with #ifdef.



But if we go for that, that part of the patch *NEEDS* to be split
into a separate commit/patch. It's too hard to see functional
changes otherwise.


Yes, sure; please find attached a new version of the patch set 
consisting of two separate patches. The first makes guc-file.l common 
between frontend and backend, and the second adds the new options to 
pg_rewind.



+		if (restore_ok)
+		{
+			xlogreadfd = open(xlogfpath, O_RDONLY | PG_BINARY, 0);
+
+			if (xlogreadfd < 0)
+			{
+				printf(_("could not open restored from archive file \"%s\": %s\n"),
+					   xlogfpath, strerror(errno));
+				return -1;
+			}
+			else
+				pg_log(PG_DEBUG, "using restored from archive version of file \"%s\"\n",
+					   xlogfpath);
+		}
+		else
+		{
+			printf(_("could not restore file \"%s\" from archive: %s\n"),
+				   xlogfname, strerror(errno));
+			return -1;
+		}
	}
}
I suggest moving this to a separate function.


OK, I have slightly refactored and simplified this part. All checks of 
the recovered file have been moved into RestoreArchivedWAL. I hope it 
looks better now.



Isn't this entirely broken? restore_command could be set in a different
file no?


Maybe I got it wrong, but I do not think so. Since recovery options are 
now part of the GUCs, restore_command can only be set inside 
postgresql.conf, or in any files/subdirectories included from there, to 
take effect, can't it? The parser will walk postgresql.conf with all 
includes recursively and should eventually find it, if it was set.




Regards

--
Alexey Kondratov

Postgres Professional https://www.postgrespro.com
Russian Postgres Company

From c012e1e1149d04abc39bb4099fe1e18a4cd2ca2d Mon Sep 17 00:00:00 2001
From: Alexey Kondratov 
Date: Mon, 18 Feb 2019 12:23:37 +0300
Subject: [PATCH v3 2/2] Options to use restore_command with pg_rewind

---
 doc/src/sgml/ref/pg_rewind.sgml   |  30 -
 src/bin/pg_rewind/Makefile|   2 +-
 src/bin/pg_rewind/parsexlog.c | 163 +-
 src/bin/pg_rewind/pg_rewind.c | 100 +++-
 src/bin/pg_rewind/pg_rewind.h |  10 +-
 src/bin/pg_rewind/t/001_basic.pl  |   4 +-
 src/bin/pg_rewind/t/002_databases.pl  |   4 +-
 src/bin/pg_rewind/t/003_extrafiles.pl |   4 +-
 src/bin/pg_rewind/t/RewindTest.pm |  93 ++-
 9 files changed, 388 insertions(+), 22 deletions(-)

diff --git a/doc/src/sgml/ref/pg_rewind.sgml b/doc/src/sgml/ref/pg_rewind.sgml
index 53a6

Re: [Patch] pg_rewind: options to use restore_command from recovery.conf or command line

2019-02-18 Thread Alexey Kondratov

On 18.02.2019 19:49, Alvaro Herrera wrote:



On 16.02.2019 6:41, Andres Freund wrote:

It sounds like a seriously bad idea to use a different parser for
pg_rewind.  Why don't you just use postgres for it? As in
/path/to/postgres -D /path/to/datadir/ -C shared_buffers
?

Eh, this is what I suggested in this thread four months ago, though I
didn't remember at the time that aaa6e1def292 had already introduced -C
in 2011.  It's definitely the way to go ... all this messing about with
the parser is insane.



Yes, but four months ago recovery options were not part of the GUCs.

OK, if you and Andres are both firmly against the solution with the 
parser, then I will work out the one with postgres -C and come back 
before the next commitfest. I found that something similar is already 
used in pg_ctl, and there is a mechanism for finding valid executables 
in exec.c, so it does not seem to be a big deal at first sight.


Thanks for replies!


Regards

--
Alexey Kondratov

Postgres Professional https://www.postgrespro.com
Russian Postgres Company




Re: [Patch] pg_rewind: options to use restore_command from recovery.conf or command line

2019-02-20 Thread Alexey Kondratov

Hi,



I will work out the one with postgres -C and come back before the next 
commitfest. I found that something similar is already used in pg_ctl, 
and there is a mechanism for finding valid executables in exec.c, so it 
does not seem to be a big deal at first sight.




I have reworked the patch; please find the new version attached. It is 
three times smaller than the previous one and now touches pg_rewind's 
code only. The tests are also slightly refactored to remove duplicated 
code. postgres -C is executed for restore_command retrieval (if -r is 
passed), as suggested. Otherwise everything works as before.


Andres, Alvaro, does it make sense now?



Regards

--
Alexey Kondratov

Postgres Professional https://www.postgrespro.com
Russian Postgres Company

From 4c8f5c228e089e7e72835ae5c409a5bc8425ab15 Mon Sep 17 00:00:00 2001
From: Alexey Kondratov 
Date: Tue, 19 Feb 2019 19:14:53 +0300
Subject: [PATCH v4] pg_rewind: options to use restore_command from command
 line or cluster config

---
 doc/src/sgml/ref/pg_rewind.sgml   |  30 -
 src/bin/pg_rewind/parsexlog.c | 161 +-
 src/bin/pg_rewind/pg_rewind.c |  98 +++-
 src/bin/pg_rewind/pg_rewind.h |   7 +-
 src/bin/pg_rewind/t/001_basic.pl  |   4 +-
 src/bin/pg_rewind/t/002_databases.pl  |   4 +-
 src/bin/pg_rewind/t/003_extrafiles.pl |   4 +-
 src/bin/pg_rewind/t/RewindTest.pm |  84 +-
 8 files changed, 372 insertions(+), 20 deletions(-)

diff --git a/doc/src/sgml/ref/pg_rewind.sgml b/doc/src/sgml/ref/pg_rewind.sgml
index 53a64ee29e..0c2441afa7 100644
--- a/doc/src/sgml/ref/pg_rewind.sgml
+++ b/doc/src/sgml/ref/pg_rewind.sgml
@@ -67,8 +67,10 @@ PostgreSQL documentation
ancestor. In the typical failover scenario where the target cluster was
shut down soon after the divergence, this is not a problem, but if the
target cluster ran for a long time after the divergence, the old WAL
-   files might no longer be present. In that case, they can be manually
-   copied from the WAL archive to the pg_wal directory, or
+   files might no longer be present. In that case, they can be automatically
+   copied by pg_rewind from the WAL archive to the 
+   pg_wal directory if either -r or
+   -R option is specified, or
fetched on startup by configuring  or
.  The use of
pg_rewind is not limited to failover, e.g.  a standby
@@ -200,6 +202,30 @@ PostgreSQL documentation
   
  
 
+ 
+  -r
+  --use-postgresql-conf
+  
+   
+Use restore_command in the postgresql.conf to
+retreive missing in the target pg_wal directory
+WAL files from the WAL archive.
+   
+  
+ 
+
+ 
+  -R restore_command
+  --restore-command=restore_command
+  
+   
+Specifies the restore_command to use for retrieval of the missing
+in the target pg_wal directory WAL files from
+the WAL archive.
+   
+  
+ 
+
  
   --debug
   
diff --git a/src/bin/pg_rewind/parsexlog.c b/src/bin/pg_rewind/parsexlog.c
index e19c265cbb..5978ec9b99 100644
--- a/src/bin/pg_rewind/parsexlog.c
+++ b/src/bin/pg_rewind/parsexlog.c
@@ -12,6 +12,7 @@
 #include "postgres_fe.h"
 
 #include 
+#include 
 
 #include "pg_rewind.h"
 #include "filemap.h"
@@ -45,6 +46,7 @@ static char xlogfpath[MAXPGPATH];
 typedef struct XLogPageReadPrivate
 {
 	const char *datadir;
+	const char *restoreCommand;
 	int			tliIndex;
 } XLogPageReadPrivate;
 
@@ -53,6 +55,9 @@ static int SimpleXLogPageRead(XLogReaderState *xlogreader,
    int reqLen, XLogRecPtr targetRecPtr, char *readBuf,
    TimeLineID *pageTLI);
 
+static int RestoreArchivedWAL(const char *path, const char *xlogfname,
+   off_t expectedSize, const char *restoreCommand);
+
 /*
  * Read WAL from the datadir/pg_wal, starting from 'startpoint' on timeline
  * index 'tliIndex' in target timeline history, until 'endpoint'. Make note of
@@ -60,7 +65,7 @@ static int SimpleXLogPageRead(XLogReaderState *xlogreader,
  */
 void
 extractPageMap(const char *datadir, XLogRecPtr startpoint, int tliIndex,
-			   XLogRecPtr endpoint)
+			   XLogRecPtr endpoint, const char *restore_command)
 {
 	XLogRecord *record;
 	XLogReaderState *xlogreader;
@@ -69,6 +74,7 @@ extractPageMap(const char *datadir, XLogRecPtr startpoint, int tliIndex,
 
 	private.datadir = datadir;
 	private.tliIndex = tliIndex;
+	private.restoreCommand = restore_command;
 	xlogreader = XLogReaderAllocate(WalSegSz, &SimpleXLogPageRead,
 	&private);
 	if (xlogreader == NULL)
@@ -156,7 +162,7 @@ readOneRecord(const char *datadir, XLogRecPtr ptr, int tliIndex)
 void
 findLastCheckpoint(const char *datadir, XLogRecPtr forkptr, int tliIndex,
    XLogRecPtr *lastchkptrec, TimeLineID *lastchkpttli,
-   XLogRecPtr *lastchkptredo)
+   XLogRecPtr *lastchkptredo, const char *restoreCommand)
 {

Re: 2019-03 CF Summary / Review - Tranche #2

2019-02-20 Thread Alexey Kondratov

On 16.02.2019 8:45, Andres Freund wrote:

- pg_rewind: options to use restore_command from recovery.conf or
   command line

   WOA: Was previously marked as RFC, but I don't see how it is. Possibly
   can be finished, but does require a good bit more work.


I just sent a new version of the patch to the thread [1], which removes 
all the unnecessary complexity. I am willing to address any new issues 
during the 2019-03 CF.


[1] 
https://www.postgresql.org/message-id/c9cfabce-8fb6-493f-68ec-e0a72d957bf4%40postgrespro.ru



Thanks

--
Alexey Kondratov





Probably misleading comments or lack of tests in autoHeld portals management

2019-02-26 Thread Alexey Kondratov

Hi hackers,

I am trying to figure out the current cursor/portal management and life 
cycle in Postgres. There are two if conditions for autoHeld portals:


- 'if (portal->autoHeld)' inside AtAbort_Portals at portalmem.c:802;
- '|| portal->autoHeld' inside AtCleanup_Portals at portalmem.c:871.

Their removal does not seem to affect anything; make check-world passes. 
I have tried configure --with-perl/--with-python, which should exercise 
autoHeld portals, but nothing changed.


To me this seems expected, since the autoHeld flag is always set along 
with createSubid = InvalidSubTransactionId inside HoldPinnedPortals, so 
the single check 'createSubid == InvalidSubTransactionId' should be 
enough. However, the comments are rather misleading:


(1) portal.h:126 confirms my guess 'If the portal is held over from a 
previous transaction, both subxids are InvalidSubTransactionId';


(2) while portalmem.c:797 states 'This is similar to the case of a 
cursor from a previous transaction, but it could also be that the cursor 
was auto-held in this transaction, so it wants to live on'.


I have tried, but could not construct an example of a valid query for 
the case described in (2), and it is definitely absent from the 
regression tests.


Am I missing something?

I added Peter to CC, since he is the committer of 056a5a3, where 
autoHeld was introduced; maybe it will be easier for him to recall the 
context. Anyway, sorry for the disturbance if this question is actually 
trivial.



Regards

--
Alexey Kondratov

Postgres Professional https://www.postgrespro.com
Russian Postgres Company

diff --git a/src/backend/utils/mmgr/portalmem.c b/src/backend/utils/mmgr/portalmem.c
index a92b4541bd..841d88df76 100644
--- a/src/backend/utils/mmgr/portalmem.c
+++ b/src/backend/utils/mmgr/portalmem.c
@@ -798,8 +798,6 @@ AtAbort_Portals(void)
 		 * cursor from a previous transaction, but it could also be that the
 		 * cursor was auto-held in this transaction, so it wants to live on.
 		 */
-		if (portal->autoHeld)
-			continue;
 
 		/*
 		 * If it was created in the current transaction, we can't do normal
@@ -868,7 +866,7 @@ AtCleanup_Portals(void)
 		 * Do nothing to cursors held over from a previous transaction or
 		 * auto-held ones.
 		 */
-		if (portal->createSubid == InvalidSubTransactionId || portal->autoHeld)
+		if (portal->createSubid == InvalidSubTransactionId)
 		{
 			Assert(portal->status != PORTAL_ACTIVE);
 			Assert(portal->resowner == NULL);


Re: PATCH: logical_work_mem and logical streaming of large in-progress transactions

2018-12-18 Thread Alexey Kondratov

On 18.12.2018 1:28, Tomas Vondra wrote:

4) There was a problem with marking top-level transaction as having
catalog changes if one of its subtransactions has. It was causing a
problem with DDL statements just after subtransaction start (savepoint),
so data from new columns is not replicated.

5) Similar issue with schema send. You send schema only once per each
sub/transaction (IIRC), while we have to update schema on each catalog
change: invalidation execution, snapshot rebuild, adding new tuple cids.
So I ended up with adding is_schema_send flag to ReorderBufferTXN, since
it is easy to set it inside RB and read in the output plugin. Probably,
we have to choose a better place for this flag.


Hmm. Can you share an example how to trigger these issues?


The test cases inside 014_stream_tough_ddl.pl and the old ones (with the 
streaming=true option added) should reproduce all these issues. In 
general, it happens in a transaction like:


INSERT
SAVEPOINT
ALTER TABLE ... ADD COLUMN
INSERT

then the second INSERT may see an old version of the catalog.


Interesting. Any idea where does the extra overhead in this particular
case come from? It's hard to deduce that from the single flame graph,
when I don't have anything to compare it with (i.e. the flame graph for
the "normal" case).


I guess the bottleneck is in disk operations. You can check the 
logical_repl_worker_new_perf.svg flame graph: disk reads (~9%) and 
writes (~26%) take around 35% of CPU time in total. For comparison, 
please see the attached flame graph for the following transaction:


INSERT INTO large_text
SELECT (SELECT string_agg('x', ',')
FROM generate_series(1, 2000)) FROM generate_series(1, 100);

Execution Time: 44519.816 ms
Time: 98333,642 ms (01:38,334)

where disk I/O is only ~7-8% in total. So we get very roughly the same 
~4-5x performance drop here. JFYI, I am using a machine with an SSD for 
these tests.


Therefore, you could probably write changes on the receiver in bigger 
chunks rather than each change separately (see the sketch below).
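
A generic illustration of that batching idea, with hypothetical names 
and buffer size (this is not code from the patch):

#include <string.h>
#include <unistd.h>

#define APPLY_BUF_SIZE (64 * 1024)

static char apply_buf[APPLY_BUF_SIZE];
static size_t apply_buf_used = 0;

/* Flush everything accumulated so far with a single write(). */
static void
flush_changes(int fd)
{
    if (apply_buf_used > 0)
    {
        write(fd, apply_buf, apply_buf_used);   /* error handling omitted */
        apply_buf_used = 0;
    }
}

/* Accumulate one change; oversized ones are written out directly. */
static void
append_change(int fd, const char *change, size_t len)
{
    if (len >= APPLY_BUF_SIZE)
    {
        flush_changes(fd);
        write(fd, change, len);
        return;
    }
    if (apply_buf_used + len > APPLY_BUF_SIZE)
        flush_changes(fd);
    memcpy(apply_buf + apply_buf_used, change, len);
    apply_buf_used += len;
}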



So I'm not particularly worried, but I'll look into that. I'd be much
more worried if there was measurable overhead in cases when there's no
streaming happening (either because it's disabled or the memory limit
was not hit).


What I have also just found is that if a table row is large enough to be 
TOASTed, e.g.:


INSERT INTO large_text
SELECT (SELECT string_agg('x', ',')
FROM generate_series(1, 100)) FROM generate_series(1, 1000);

then the logical_work_mem limit is not hit, and we neither stream this 
transaction nor spill it to disk, even though it is still large. In 
contrast, the transaction above (with 100 smaller rows), being 
comparable in size, is streamed. I am not sure that it is easy to add 
proper accounting of TOAST-able columns, but it is worth it.


--
Alexey Kondratov

Postgres Professional https://www.postgrespro.com
Russian Postgres Company



Re: PATCH: logical_work_mem and logical streaming of large in-progress transactions

2018-12-19 Thread Alexey Kondratov

Hi Tomas,


I'm a bit confused by the changes to TAP tests. Per the patch summary,
some .pl files get renamed (not sure why), a new one is added, etc.


I added a new TAP test case, put the streaming=true option inside the 
old stream_* ones, and incremented the streaming test numbers (+2) 
because of the collision between 009_matviews.pl / 009_stream_simple.pl 
and 010_truncate.pl / 010_stream_subxact.pl. At least in the previous 
version of the patch they were under the same numbers. Nothing special, 
but for simplicity, please find my new TAP test attached separately.



  So
I've instead enabled streaming subscriptions in all tests, which with
this patch produces two failures:

Test Summary Report
---
t/004_sync.pl(Wstat: 7424 Tests: 1 Failed: 0)
   Non-zero exit status: 29
   Parse errors: Bad plan.  You planned 7 tests but ran 1.
t/011_stream_ddl.pl  (Wstat: 256 Tests: 2 Failed: 1)
   Failed test:  2
   Non-zero exit status: 1

So yeah, there's more stuff to fix. But I can't directly apply your
fixes because the updated patches are somewhat different.


The fixes should apply cleanly to the previous version of your patch. 
Also, I am not sure that it is a good idea to simply enable streaming 
subscriptions in all tests (e.g. the pre-streaming-patch t/004_sync.pl), 
since then they do not exercise the non-streaming code.



Interesting. Any idea where does the extra overhead in this particular
case come from? It's hard to deduce that from the single flame graph,
when I don't have anything to compare it with (i.e. the flame graph for
the "normal" case).

I guess that bottleneck is in disk operations. You can check
logical_repl_worker_new_perf.svg flame graph: disk reads (~9%) and
writes (~26%) take around 35% of CPU time in summary. To compare,
please, see attached flame graph for the following transaction:

INSERT INTO large_text
SELECT (SELECT string_agg('x', ',')
FROM generate_series(1, 2000)) FROM generate_series(1, 100);

Execution Time: 44519.816 ms
Time: 98333,642 ms (01:38,334)

where disk IO is only ~7-8% in total. So we get very roughly the same
~x4-5 performance drop here. JFYI, I am using a machine with SSD for tests.

Therefore, probably you may write changes on receiver in bigger chunks,
not each change separately.


Possibly, I/O is certainly a possible culprit, although we should be
using buffered I/O and there certainly are not any fsyncs here. So I'm
not sure why would it be cheaper to do the writes in batches.

BTW does this mean you see the overhead on the apply side? Or are you
running this on a single machine, and it's difficult to decide?


I run this on a single machine, but the walsender and the worker are 
each utilizing almost 100% of a CPU all the time, and on the apply side 
I/O syscalls take about 1/3 of the CPU time. I am still not sure, but 
for me this result somehow links the performance drop to problems on the 
receiver side.


Writing in batches was just a hypothesis, and to validate it I performed 
a test with a large transaction consisting of a smaller number of wide 
rows. This test does not exhibit any significant performance drop, while 
it was streamed too, so the hypothesis seems to be valid. Anyway, I do 
not have other reasonable ideas besides that right now.



Regards

--
Alexey Kondratov

Postgres Professional https://www.postgrespro.com
Russian Postgres Company



0xx_stream_tough_ddl.pl
Description: Perl program


Allow CLUSTER, VACUUM FULL and REINDEX to change tablespace on the fly

2018-12-24 Thread Alexey Kondratov

Hi Hackers,

I would like to propose a change which allows CLUSTER, VACUUM FULL and 
REINDEX to modify the relation tablespace on the fly. All these commands 
rebuild relation filenodes from scratch, so it seems natural to allow 
specifying a new location for them. It may be helpful when a server runs 
out of disk space: you can attach a new partition and perform e.g. 
VACUUM FULL, which will free some space and move the data to a new 
location at the same time. Otherwise, you cannot complete VACUUM FULL 
until you have up to 2x the relation's disk space on a single partition.


Please find attached a patch which extends CLUSTER, VACUUM FULL and 
REINDEX with additional options:


REINDEX [ ( VERBOSE ) ] { INDEX | TABLE } name [ SET TABLESPACE 
new_tablespace ]


CLUSTER [VERBOSE] table_name [ USING index_name ] [ SET TABLESPACE 
new_tablespace ]

CLUSTER [VERBOSE] [ SET TABLESPACE new_tablespace ]

VACUUM ( FULL [, ...] ) [ SET TABLESPACE new_tablespace ] [ 
table_and_columns [, ...] ]
VACUUM FULL [ FREEZE ] [ VERBOSE ] [ ANALYZE ] [ SET TABLESPACE 
new_tablespace ] [ table_and_columns [, ...] ]


Thereby I have a few questions:

1) What do you think about this concept in general?

2) Is SET TABLESPACE an appropriate syntax for this functionality? I 
also thought about a plain TABLESPACE keyword, but it seems misleading, 
and about a WITH (options) clause like in CREATE SUBSCRIPTION ... WITH 
(options). I preferred SET TABLESPACE, since the same syntax is 
currently used in ALTER to change the tablespace, but maybe someone will 
have a better idea.


3) I was not able to update the grammar for VACUUM FULL to accept SET 
TABLESPACE after table_and_columns and completely get rid of 
shift/reduce conflicts. I guess this happens because table_and_columns 
is optional and may be of variable length, but I have no idea how to 
deal with it. Any thoughts?



Regards

--
Alexey Kondratov

Postgres Professional https://www.postgrespro.com
Russian Postgres Company

From 0d971ce85f62baca7f6f713fa75a1bc20e09b3a2 Mon Sep 17 00:00:00 2001
From: Alexey Kondratov 
Date: Fri, 21 Dec 2018 14:54:10 +0300
Subject: [PATCH] Allow CLUSTER, VACUUM FULL and REINDEX to change tablespace.

---
 doc/src/sgml/ref/cluster.sgml |  13 ++-
 doc/src/sgml/ref/reindex.sgml |  10 ++
 doc/src/sgml/ref/vacuum.sgml  |  12 ++
 src/backend/catalog/index.c   | 128 ++
 src/backend/commands/cluster.c|  26 +++--
 src/backend/commands/indexcmds.c  |  23 +++-
 src/backend/commands/tablecmds.c  |  59 +-
 src/backend/commands/vacuum.c |  39 ++-
 src/backend/parser/gram.y |  62 +--
 src/backend/tcop/utility.c|  16 ++-
 src/include/catalog/index.h   |   4 +-
 src/include/commands/cluster.h|   2 +-
 src/include/commands/defrem.h |   6 +-
 src/include/commands/tablecmds.h  |   2 +
 src/include/commands/vacuum.h |   2 +
 src/include/nodes/parsenodes.h|   3 +
 src/test/regress/input/tablespace.source  |  43 
 src/test/regress/output/tablespace.source |  57 ++
 18 files changed, 424 insertions(+), 83 deletions(-)

diff --git a/doc/src/sgml/ref/cluster.sgml b/doc/src/sgml/ref/cluster.sgml
index 4da60d8d56..6e61587809 100644
--- a/doc/src/sgml/ref/cluster.sgml
+++ b/doc/src/sgml/ref/cluster.sgml
@@ -21,8 +21,8 @@ PostgreSQL documentation
 
  
 
-CLUSTER [VERBOSE] table_name [ USING index_name ]
-CLUSTER [VERBOSE]
+CLUSTER [VERBOSE] table_name [ USING index_name ] [ SET TABLESPACE new_tablespace ]
+CLUSTER [VERBOSE] [ SET TABLESPACE new_tablespace ]
 
  
 
@@ -99,6 +99,15 @@ CLUSTER [VERBOSE]
 

 
+   
+new_tablespace
+
+ 
+  The name of the specific tablespace to store clustered relations.
+ 
+
+   
+

 VERBOSE
 
diff --git a/doc/src/sgml/ref/reindex.sgml b/doc/src/sgml/ref/reindex.sgml
index 47cef987d4..661820c1e2 100644
--- a/doc/src/sgml/ref/reindex.sgml
+++ b/doc/src/sgml/ref/reindex.sgml
@@ -22,6 +22,7 @@ PostgreSQL documentation
  
 
 REINDEX [ ( VERBOSE ) ] { INDEX | TABLE | SCHEMA | DATABASE | SYSTEM } name
+REINDEX [ ( VERBOSE ) ] { INDEX | TABLE } name [ SET TABLESPACE new_tablespace ]
 
  
 
@@ -151,6 +152,15 @@ REINDEX [ ( VERBOSE ) ] { INDEX | TABLE | SCHEMA | DATABASE | SYSTEM } 

 
+   
+new_tablespace
+
+ 
+  The name of the specific tablespace to store rebuilt indexes.
+ 
+
+   
+

 VERBOSE
 
diff --git a/doc/src/sgml/ref/vacuum.sgml b/doc/src/sgml/ref/vacuum.sgml
index fd911f5776..b4e3c59e1f 100644
--- a/doc/src/sgml/ref/vacuum.sgml
+++ b/doc/src/sgml/ref/vacuum.sgml
@@ -23,6 +23,8 @@ PostgreSQL documentation
 
 VACUUM [ ( option [, ...] ) ] [ table_and_columns [, ...] ]
 VACUUM [ FULL ] [ FREEZE ] [ VERBOSE ] [ ANALYZE ] [ table_and_columns [, ...] ]
+VACUUM ( FULL [, ...] ) [ SET TABLESPACE new_tablespace ] [ table_and_colu

Re: [Patch] pg_rewind: options to use restore_command from recovery.conf or command line

2018-12-24 Thread Alexey Kondratov

Hi Dmitry,

On 30.11.2018 19:04, Dmitry Dolgov wrote:

Just to confirm, patch still can be applied without conflicts, and pass all the
tests. Also I like the original motivation for the feature, sounds pretty
useful. For now I'm moving it to the next CF.


Thanks. I have also slightly updated the patch to handle the recent 
merge of recovery.conf into the GUCs and postgresql.conf [1]; the new 
patch is attached.





- Reusing the GUC parser is something I would avoid as well.  Not worth
the complexity.

Yes, I don't like it either. I will try to make guc-file.l frontend safe.

Any success with that?


I looked into it and found that guc-file.c is currently built as part of 
guc.c, so it seems even more complicated to unbind guc-file.c from the 
backend. Thus, I have a plan for how to proceed with the patch:


1) Add guc-file.h and build guc-file.c separately from guc.c

2) Put guc-file.l / guc-file.h into common/*

3) Isolate all backend-specific calls in guc-file.l with #ifdef FRONTEND 
(see the sketch below)

Though I am not sure that this work is worthwhile compared to the extra 
redundancy of simply adding a frontend-safe copy of the guc-file.l 
lexer. If someone has any thoughts, I would be glad to receive comments.
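
A sketch of what the step-3 isolation could look like; GUC_ERROR is a 
hypothetical shim, and the real guc-file.l error paths are considerably 
more involved:

#ifndef FRONTEND
/* In the backend, route parser errors through ereport(). */
#define GUC_ERROR(msg) ereport(ERROR, (errmsg("%s", (msg))))
#else
/* In frontend programs such as pg_rewind, print and bail out. */
#include <stdio.h>
#include <stdlib.h>
#define GUC_ERROR(msg) \
    do { fprintf(stderr, "guc parser: %s\n", (msg)); exit(1); } while (0)
#endif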



[1] 
https://git.postgresql.org/gitweb/?p=postgresql.git;a=commitdiff;h=2dedf4d9a899b36d1a8ed29be5efbd1b31a8fe85



Regards

--
Alexey Kondratov

Postgres Professional https://www.postgrespro.com
Russian Postgres Company

From 521f62872d4e95cd02ddb535b8320256ff5e90cc Mon Sep 17 00:00:00 2001
From: Alexey Kondratov 
Date: Fri, 21 Dec 2018 14:00:30 +0300
Subject: [PATCH] pg_rewind: options to use restore_command from
 postgresql.conf or command line.

---
 src/bin/pg_rewind/Makefile|   5 +-
 src/bin/pg_rewind/RewindTest.pm   |  46 +-
 src/bin/pg_rewind/guc-file-fe.h   |  40 ++
 src/bin/pg_rewind/guc-file-fe.l   | 776 ++
 src/bin/pg_rewind/parsexlog.c | 182 +-
 src/bin/pg_rewind/pg_rewind.c |  91 ++-
 src/bin/pg_rewind/pg_rewind.h |  10 +-
 src/bin/pg_rewind/t/001_basic.pl  |   3 +-
 src/bin/pg_rewind/t/002_databases.pl  |   3 +-
 src/bin/pg_rewind/t/003_extrafiles.pl |   3 +-
 src/tools/msvc/Mkvcbuild.pm   |   1 +
 11 files changed, 1141 insertions(+), 19 deletions(-)
 create mode 100644 src/bin/pg_rewind/guc-file-fe.h
 create mode 100644 src/bin/pg_rewind/guc-file-fe.l

diff --git a/src/bin/pg_rewind/Makefile b/src/bin/pg_rewind/Makefile
index 2bcfcc61af..a0f5f97544 100644
--- a/src/bin/pg_rewind/Makefile
+++ b/src/bin/pg_rewind/Makefile
@@ -15,11 +15,12 @@ subdir = src/bin/pg_rewind
 top_builddir = ../../..
 include $(top_builddir)/src/Makefile.global
 
-override CPPFLAGS := -I$(libpq_srcdir) -DFRONTEND $(CPPFLAGS)
+override CPPFLAGS := -I. -I$(srcdir) -I$(libpq_srcdir) -DFRONTEND $(CPPFLAGS)
 LDFLAGS_INTERNAL += $(libpq_pgport)
 
 OBJS	= pg_rewind.o parsexlog.o xlogreader.o datapagemap.o timeline.o \
 	fetch.o file_ops.o copy_fetch.o libpq_fetch.o filemap.o logging.o \
+	guc-file-fe.o \
 	$(WIN32RES)
 
 EXTRA_CLEAN = xlogreader.c
@@ -32,6 +33,8 @@ pg_rewind: $(OBJS) | submake-libpq submake-libpgport
 xlogreader.c: % : $(top_srcdir)/src/backend/access/transam/%
 	rm -f $@ && $(LN_S) $< .
 
+distprep: guc-file-fe.c
+
 install: all installdirs
 	$(INSTALL_PROGRAM) pg_rewind$(X) '$(DESTDIR)$(bindir)/pg_rewind$(X)'
 
diff --git a/src/bin/pg_rewind/RewindTest.pm b/src/bin/pg_rewind/RewindTest.pm
index 3d07da5d94..b43c18a8c3 100644
--- a/src/bin/pg_rewind/RewindTest.pm
+++ b/src/bin/pg_rewind/RewindTest.pm
@@ -39,7 +39,9 @@ use Carp;
 use Config;
 use Exporter 'import';
 use File::Copy;
-use File::Path qw(rmtree);
+use File::Glob ':bsd_glob';
+use File::Path qw(remove_tree make_path);
+use File::Spec::Functions 'catpath';
 use IPC::Run qw(run);
 use PostgresNode;
 use TestLib;
@@ -250,6 +252,48 @@ sub run_pg_rewind
 			],
 			'pg_rewind remote');
 	}
+	elsif ($test_mode eq "archive")
+	{
+
+		# Do rewind using a local pgdata as source and
+		# specified directory with target WALs archive.
+		my $wals_archive_dir = catpath(${TestLib::tmp_check}, 'master_wals_archive');
+		my $test_master_datadir = $node_master->data_dir;
+		my @wal_files = bsd_glob catpath($test_master_datadir, 'pg_wal', '000*');
+		my $restore_command;
+
+		remove_tree($wals_archive_dir);
+		make_path($wals_archive_dir) or die;
+
+		# Move all old master WAL files to the archive.
+		# Old master should be stopped at this point.
+		foreach my $wal_file (@wal_files)
+		{
+			move($wal_file, "$wals_archive_dir/") or die;
+		}
+
+		if ($windows_os)
+		{
+			$restore_command = "copy $wals_archive_dir\\\%f \%p";
+		}
+		else
+		{
+			$restore_command = "cp $wals_archive_dir/\%f \%p";
+		}
+
+		# Stop the new master and be ready to perform the rewind.
+		$node_standby->stop;
+		command_ok(
+			[
+'pg_rewind',
+	

Re: [Patch] pg_rewind: options to use restore_command from recovery.conf or command line

2018-12-26 Thread Alexey Kondratov

Greetings,



- Reusing the GUC parser is something I would avoid as well.  Not 
worth

the complexity.
Yes, I don't like it either. I will try to make guc-file.l frontend 
safe.

Any success with that?


I looked into it and found that guc-file.c is currently built as part 
of guc.c, so it seems even more complicated to unbind guc-file.c from 
the backend. Thus, I have a plan for how to proceed with the patch:


1) Add guc-file.h and build guc-file.c separately from guc.c

2) Put guc-file.l / guc-file.h into common/*

3) Isolate all backend-specific calls in guc-file.l with #ifdef FRONTEND

Though I am not sure that this work is worthwhile compared to the extra 
redundancy of simply adding a frontend-safe copy of the guc-file.l 
lexer. If someone has any thoughts, I would be glad to receive comments.




I have finally worked it out. Now there is a common version of 
guc-file.l, and guc-file.c is built separately from guc.c. I had to use 
a limited number of #ifndef FRONTEND blocks, mostly to replace ereport 
calls. Also, ProcessConfigFile and ProcessConfigFileInternal have been 
moved into guc.c explicitly, as they are backend-specific. To me this 
solution looks much more concise and neat.


Please find the new version of the patch attached. The TAP tests have 
been updated as well to handle restore_command specified both on the 
command line and in postgresql.conf.



Regards

--
Alexey Kondratov

Postgres Professional https://www.postgrespro.com
Russian Postgres Company

From 8a6c9f89f45c9568d95e05b0586d1cc54905e6de Mon Sep 17 00:00:00 2001
From: Alexey Kondratov 
Date: Fri, 21 Dec 2018 14:00:30 +0300
Subject: [PATCH] pg_rewind: options to use restore_command from
 postgresql.conf or command line.

---
 src/backend/Makefile  |   4 +-
 src/backend/commands/extension.c  |   1 +
 src/backend/utils/misc/Makefile   |   8 -
 src/backend/utils/misc/guc.c  | 434 +--
 src/bin/pg_rewind/Makefile|   2 +-
 src/bin/pg_rewind/RewindTest.pm   |  96 +++-
 src/bin/pg_rewind/parsexlog.c | 182 ++-
 src/bin/pg_rewind/pg_rewind.c |  91 +++-
 src/bin/pg_rewind/pg_rewind.h |  12 +-
 src/bin/pg_rewind/t/001_basic.pl  |   4 +-
 src/bin/pg_rewind/t/002_databases.pl  |   4 +-
 src/bin/pg_rewind/t/003_extrafiles.pl |   4 +-
 src/common/Makefile   |   7 +-
 src/{backend/utils/misc => common}/guc-file.l | 514 --
 src/include/common/guc-file.h |  50 ++
 src/include/utils/guc.h   |  39 +-
 src/tools/msvc/Mkvcbuild.pm   |   2 +-
 src/tools/msvc/clean.bat  |   2 +-
 18 files changed, 952 insertions(+), 504 deletions(-)
 rename src/{backend/utils/misc => common}/guc-file.l (60%)
 create mode 100644 src/include/common/guc-file.h

diff --git a/src/backend/Makefile b/src/backend/Makefile
index 25eb043941..ddbe2f3fce 100644
--- a/src/backend/Makefile
+++ b/src/backend/Makefile
@@ -186,7 +186,7 @@ distprep:
 	$(MAKE) -C replication	repl_gram.c repl_scanner.c syncrep_gram.c syncrep_scanner.c
 	$(MAKE) -C storage/lmgr	lwlocknames.h lwlocknames.c
 	$(MAKE) -C utils	distprep
-	$(MAKE) -C utils/misc	guc-file.c
+	$(MAKE) -C common	guc-file.c
 	$(MAKE) -C utils/sort	qsort_tuple.c
 
 
@@ -307,7 +307,7 @@ maintainer-clean: distclean
 	  replication/syncrep_scanner.c \
 	  storage/lmgr/lwlocknames.c \
 	  storage/lmgr/lwlocknames.h \
-	  utils/misc/guc-file.c \
+	  common/guc-file.c \
 	  utils/sort/qsort_tuple.c
 
 
diff --git a/src/backend/commands/extension.c b/src/backend/commands/extension.c
index 31dcfe7b11..ec0367d068 100644
--- a/src/backend/commands/extension.c
+++ b/src/backend/commands/extension.c
@@ -47,6 +47,7 @@
 #include "commands/defrem.h"
 #include "commands/extension.h"
 #include "commands/schemacmds.h"
+#include "common/guc-file.h"
 #include "funcapi.h"
 #include "mb/pg_wchar.h"
 #include "miscadmin.h"
diff --git a/src/backend/utils/misc/Makefile b/src/backend/utils/misc/Makefile
index a53fcdf188..2e6a879c46 100644
--- a/src/backend/utils/misc/Makefile
+++ b/src/backend/utils/misc/Makefile
@@ -25,11 +25,3 @@ override CPPFLAGS += -DPG_KRB_SRVTAB='"$(krb_srvtab)"'
 endif
 
 include $(top_srcdir)/src/backend/common.mk
-
-# guc-file is compiled as part of guc
-guc.o: guc-file.c
-
-# Note: guc-file.c is not deleted by 'make clean',
-# since we want to ship it in distribution tarballs.
-clean:
-	@rm -f lex.yy.c
diff --git a/src/backend/utils/misc/guc.c b/src/backend/utils/misc/guc.c
index 6fe1939881..a866503186 100644
--- a/src/backend/utils/misc/guc.c
+++ b/src/backend/utils/misc/guc.c
@@ -41,6 +41,7 @@
 #include "commands/vacuum.h"
 #include "commands/variable.h"
 #include "commands/trigg

Re: Allow CLUSTER, VACUUM FULL and REINDEX to change tablespace on the fly

2018-12-27 Thread Alexey Kondratov

Hi,

Thank you all for replies.


ALTER TABLE already has a lot of logic that is oriented towards being
able to do multiple things at the same time.  If we added CLUSTER,
VACUUM FULL, and REINDEX to that set, then you could, say, change a
data type, cluster, and change tablespaces all in a single SQL
command.

That's a great observation.


Indeed, I thought that ALTER TABLE executes all actions sequentially one 
by one, e.g. in the case of

ALTER TABLE test_int CLUSTER ON test_int_idx, SET TABLESPACE test_tblspc;

it executes CLUSTER and THEN executes SET TABLESPACE. However, if I get 
it right, ALTER TABLE is rather smart, and in such a case it follows 
these steps:


1) It only saves the new tablespace OID during the preparation phase 
(phase 1), without doing actual work;

2) It only executes mark_index_clustered during phase 2, again without 
actual work done;

3) It finally rewrites the relation during phase 3, where CLUSTER and 
SET TABLESPACE are effectively performed.



That would be cool, but probably a lot of work.  :-(

But is it?  ALTER TABLE is already doing one kind of table rewrite
during phase 3, and CLUSTER is just a different kind of table rewrite
(which happens to REINDEX), and VACUUM FULL is just a special case of
CLUSTER.  Maybe what we need is an ALTER TABLE variant that executes
CLUSTER's table rewrite during phase 3 instead of its ad-hoc table
rewrite.


According to the ALTER TABLE example above, this already exists for CLUSTER.


As for REINDEX, I think it's valuable to move tablespace together with
the reindexing.  You can already do it with the CREATE INDEX
CONCURRENTLY recipe we recommend, of course; but REINDEX CONCURRENTLY is
not going to provide that, and it seems worth doing.


Maybe I am missing something, but according to the docs REINDEX 
CONCURRENTLY does not exist yet; DROP then CREATE CONCURRENTLY is 
suggested instead. Thus, we would have to add REINDEX CONCURRENTLY first, 
but that is a matter for a different patch, I guess.



Even for plain REINDEX that seems useful.
--
Michael


To summarize:

1) Alvaro and Michael agreed that REINDEX with a tablespace move may be 
useful. This is done in the patch attached to my initial email. Adding 
REINDEX to ALTER TABLE as a new action seems quite questionable to me and 
not completely semantically correct; ALTER already looks bulky.


2) If I am correct, 'ALTER TABLE ... CLUSTER ON ..., SET TABLESPACE ...' 
does exactly what I wanted to add to CLUSTER in my patch. So probably no 
work is necessary here.


3) VACUUM FULL. It seems that we can add a special case 'ALTER TABLE ... 
VACUUM FULL, SET TABLESPACE ...', which would follow roughly the same 
path as CLUSTER ON, but without any specific index. The relation would 
be rewritten in the new tablespace during phase 3.


What do you think?


Regards

--
Alexey Kondratov

Postgres Professional https://www.postgrespro.com
Russian Postgres Company




[Patch] pg_rewind: options to use restore_command from recovery.conf or command line

2018-10-19 Thread Alexey Kondratov

Hi hackers,

Currently Postgres has options for continuous WAL archiving, which is 
quite often used along with a master-replica setup. Now suppose the worst 
has happened and it's time to get your old master back and synchronize it 
with the new master (ex-replica) using pg_rewind. However, the required 
WAL files may already have been archived, in which case pg_rewind will 
fail. You can copy these files back manually, but it is difficult to 
figure out which ones you need. Either way, this complicates building a 
failover system with automatic failure recovery.


I expect it would be a good idea to allow pg_rewind to look for 
restore_command in the recovery.conf of the target data directory, or to 
pass it as a command line argument. pg_rewind could then use it to fetch 
missing WAL files from the archive. I have had a few talks with DBAs and 
came to the conclusion that this is a highly requested feature.


I have prepared a proof of concept patch (please find it attached) that 
does exactly what I described above. I played with it a little and it 
seems to be working; the tests were updated accordingly to verify the 
archive retrieval functionality too.


The patch is relatively simple except for one part: if we want to parse 
recovery.conf (with all possible includes, etc.) and get restore_command, 
then we should use the guc-file.l parser, which is heavily tied to the 
backend, e.g. in its error reporting. So I copied it and made a 
frontend-safe version, guc-file-fe.l. Personally, I don't think that's a 
good idea, but nothing else came to mind. It is also possible to leave 
only one option -- passing restore_command as a command line argument.
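
Just to illustrate the scale of the problem: the value pg_rewind needs
could be pulled out with something as small as the sketch below. The
function and buffer names are mine, and it assumes a naive one-line
"restore_command = '...'" format with no includes, comments after values,
or escaped quotes -- exactly the cases guc-file.l exists to handle
properly:

/*
 * Naive sketch only: pull restore_command out of recovery.conf.
 */
#include <stdio.h>
#include <string.h>

static int
read_restore_command(const char *conf_path, char *dst, size_t dstlen)
{
	FILE	   *f = fopen(conf_path, "r");
	char		line[1024];
	int			found = 0;

	if (f == NULL)
		return 0;

	while (fgets(line, sizeof(line), f) != NULL)
	{
		char	   *p = line + strspn(line, " \t");

		if (strncmp(p, "restore_command", strlen("restore_command")) != 0)
			continue;
		if ((p = strchr(p, '=')) == NULL)
			continue;
		p++;
		p += strspn(p, " \t'");				/* skip spaces and opening quote */
		strncpy(dst, p, dstlen - 1);
		dst[dstlen - 1] = '\0';
		dst[strcspn(dst, "'\n")] = '\0';	/* cut at closing quote/newline */
		found = 1;							/* keep scanning: last one wins */
	}

	fclose(f);
	return found;
}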


What do you think?


--

Alexey Kondratov

Postgres Professional: https://www.postgrespro.com

Russian Postgres Company

diff --combined src/bin/pg_rewind/Makefile
index a22fef1352,2bcfcc61af..00
--- a/src/bin/pg_rewind/Makefile
+++ b/src/bin/pg_rewind/Makefile
@@@ -20,7 -20,6 +20,7 @@@ LDFLAGS_INTERNAL += $(libpq_pgport
  
  OBJS	= pg_rewind.o parsexlog.o xlogreader.o datapagemap.o timeline.o \
  	fetch.o file_ops.o copy_fetch.o libpq_fetch.o filemap.o logging.o \
 +	guc-file-fe.o \
  	$(WIN32RES)
  
  EXTRA_CLEAN = xlogreader.c
diff --combined src/bin/pg_rewind/RewindTest.pm
index 8dc39dbc05,1dce56d035..00
--- a/src/bin/pg_rewind/RewindTest.pm
+++ b/src/bin/pg_rewind/RewindTest.pm
@@@ -40,7 -40,6 +40,7 @@@ use Config
  use Exporter 'import';
  use File::Copy;
  use File::Path qw(rmtree);
 +use File::Glob;
  use IPC::Run qw(run);
  use PostgresNode;
  use TestLib;
@@@ -249,41 -248,6 +249,41 @@@ sub run_pg_rewin
  "--no-sync"
  			],
  			'pg_rewind remote');
 +	}
 +	elsif ($test_mode eq "archive")
 +	{
 +
 +		# Do rewind using a local pgdata as source and
 +		# specified directory with target WALs archive.
 +		my $wals_archive_dir = "${TestLib::tmp_check}/master_wals_archive";
 +		my $test_master_datadir = $node_master->data_dir;
 +		my @wal_files = glob "$test_master_datadir/pg_wal/000*";
 +		my $restore_command;
 +
 +		rmtree($wals_archive_dir);
 +		mkdir($wals_archive_dir) or die;
 +
 +		# Move all old master WAL files to the archive.
 +		# Old master should be stopped at this point.
 +		foreach my $wal_file (@wal_files)
 +		{
 +			move($wal_file, "$wals_archive_dir/") or die;
 +		}
 +
 +		$restore_command = "cp $wals_archive_dir/\%f \%p";
 +
 +		# Stop the new master and be ready to perform the rewind.
 +		$node_standby->stop;
 +		command_ok(
 +			[
 +'pg_rewind',
 +"--debug",
 +"--source-pgdata=$standby_pgdata",
 +"--target-pgdata=$master_pgdata",
 +"--no-sync",
 +"-R", $restore_command
 +			],
 +			'pg_rewind archive');
  	}
  	else
  	{
diff --combined src/bin/pg_rewind/parsexlog.c
index 11a9c26cd2,40028471bf..00
--- a/src/bin/pg_rewind/parsexlog.c
+++ b/src/bin/pg_rewind/parsexlog.c
@@@ -12,7 -12,6 +12,7 @@@
  #include "postgres_fe.h"
  
  #include 
 +#include 
  
  #include "pg_rewind.h"
  #include "filemap.h"
@@@ -46,10 -45,7 +46,10 @@@ static char xlogfpath[MAXPGPATH]
  typedef struct XLogPageReadPrivate
  {
  	const char *datadir;
 +	const char *restoreCommand;
  	int			tliIndex;
 +	XLogRecPtr  oldrecptr;
 +	TimeLineID  oldtli;
  } XLogPageReadPrivate;
  
  static int SimpleXLogPageRead(XLogReaderState *xlogreader,
@@@ -57,10 -53,6 +57,10 @@@
     int reqLen, XLogRecPtr targetRecPtr, char *readBuf,
     TimeLineID *pageTLI);
  
 +static bool RestoreArchivedWAL(const char *path, const char *xlogfname, 
 +	off_t expectedSize, const char *restoreCommand,
 +	const char *lastRestartPointFname);
 +
  /*
   * Read WAL from the datadir/pg_wal, starting from 'startpoint' on timeline
   * index 'tliIndex' in target timeline history, until 'endpoint'. Make note of
@@@ -68,19 -60,15 +68,19 @@@
   */
  void
  extract

Re: [Patch] pg_rewind: options to use restore_command from recovery.conf or command line

2018-10-22 Thread Alexey Kondratov

Hi Andrey,

Thank you for your reply.


I think it is better to load restore_command from recovery.conf.
Yes, it seems to be the most natural way. That's why I needed this 
rewritten (mostly copy-pasted) frontend-safe version of the parser (guc-file.l).


I didn't actually try patch yet, but the idea seems interesting. Will 
you add it to the commitfest?
I am willing to add it to the November commitfest, but I have some 
concerns regarding the frontend version of the GUC parser. Probably it is 
possible to refactor guc-file.l so it can be used by both frontend and 
backend. However, that requires #ifdef usage and mocking up ereport for 
the frontend, which is a bit ugly.
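
To give an idea of the kind of mock I mean (purely illustrative, not code
from any patch), something along these lines would be needed on top of
the lexer; note that the parenthesized (errcode()/errmsg()) argument is
simply dropped by the macro, so those functions need no stubs, but the
error level constants would still have to be mirrored:

#ifdef FRONTEND
#include <stdio.h>
#include <stdlib.h>

#define ERROR 20				/* would have to mirror the backend's elog levels */
#define ereport(elevel, rest) \
	do { \
		fprintf(stderr, "error while parsing configuration file\n"); \
		if ((elevel) >= ERROR) \
			exit(1); \
	} while (0)
#endif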



--
Alexey Kondratov

Postgres Professional: https://www.postgrespro.com
Russian Postgres Company






Re: [Patch] pg_rewind: options to use restore_command from recovery.conf or command line

2018-10-25 Thread Alexey Kondratov

On 22.10.2018 20:19, Alvaro Herrera wrote:

I didn't actually try patch yet, but the idea seems interesting. Will
you add it to the commitfest?

I am willing to add it to the November commitfest, but I have some concerns
regarding frontend version of GUC parser. Probably, it is possible to
refactor guc-file.l to use it on both front- and backend. However, it
requires usage of IFDEF and mocking up ereport for frontend, which is a bit
ugly.

Hmm, I remember we had a project to have a new postmaster option that
would report the value of some GUC option, so instead of parsing the
file in the frontend, you'd invoke the backend to do the parsing.  But I
don't know what became of that ...


A brief search of the mailing list doesn't turn up anything relevant, 
but the project seems pretty straightforward at first sight.

Of course, recovery.conf options are not GUCs either ... that's another
pending patch.

We do have some backend-mock for frontends, e.g. in pg_waldump; plus
palloc is already implemented in libpgcommon.  I don't know if what you
need to compile the lexer is a project much bigger than finishing the
other two patches I mention.


This topic, in contrast, is a long-lived one; there are several threads 
going back to 2011. I have found Michael's, Simon's and Fujii's patches, 
as well as Greg Smith's proposal (see, e.g. [1, 2]). If I get it right, 
the main point is that if we turn all options in recovery.conf into 
GUCs, then it becomes possible to set them inside postgresql.conf and 
get rid of recovery.conf. However, that breaks backward compatibility and 
brings some other issues, noted by Heikki in 
https://www.postgresql.org/message-id/5152f778.2070...@vmware.com, while 
keeping both options is redundant and ambiguous.


Thus, though everyone agreed that recovery.conf options should be turned 
into GUCs, there is still no consensus on the details. I don't think I 
know the Postgres architecture well enough to restart this discussion, 
but thank you for pointing me in this direction; it was quite interesting 
from a historical perspective.


I will check guc-file.l again; maybe it is not so painful to make it 
frontend-safe too.



[1] 
https://www.postgresql.org/message-id/flat/CAHGQGwHi%3D4GV6neLRXF7rexTBkjhcAEqF9_xq%2BtRvFv2bVd59w%40mail.gmail.com


[2] 
https://www.postgresql.org/message-id/flat/CA%2BU5nMKyuDxr0%3D5PSen1DZJndauNdz8BuSREau%3DScN-7DZ9acA%40mail.gmail.com


--
Alexey Kondratov

Postgres Professional:https://www.postgrespro.com
Russian Postgres Company




Re: [Patch] pg_rewind: options to use restore_command from recovery.conf or command line

2018-10-29 Thread Alexey Kondratov

Hi Andrey,


Will you add this patch to CF?
I'm going to review it.

Best regards, Andrey Borodin


Here it is https://commitfest.postgresql.org/20/1849/


--
Alexey Kondratov

Postgres Professional https://www.postgrespro.com
Russian Postgres Company




Re: [Patch] pg_rewind: options to use restore_command from recovery.conf or command line

2018-10-29 Thread Alexey Kondratov

Something that we could think about is directly to provide a command to
pg_rewind via command line.


In my patch I added this option too. One can pass restore_command via the 
-R option, e.g.:


pg_rewind -P --target-pgdata=/path/to/master/pg_data 
--source-pgdata=/path/to/standby/pg_data -R 'cp /path/to/wals_archive/%f %p'



Another possibility would be to have a
separate tool which scans a data folder and fetches by itself a range of
WAL segments wanted.


Currently in the patch, with the dry-run option (-n), pg_rewind only 
fetches the missing WAL files needed to build the file map and doesn't 
touch any data files. So I guess it behaves exactly as you described, and 
we do not need a separate tool.



--
Alexey Kondratov

Postgres Professional https://www.postgrespro.com
Russian Postgres Company




Re: [Patch] pg_rewind: options to use restore_command from recovery.conf or command line

2018-11-07 Thread Alexey Kondratov

On 30.10.2018 06:01, Michael Paquier wrote:


On Mon, Oct 29, 2018 at 12:09:21PM +0300, Alexey Kondratov wrote:

Currently in the patch, with dry-run option (-n) pg_rewind only fetches
missing WALs to be able to build file map, while doesn't touch any data
files. So I guess it behaves exactly as you described and we do not need a
separate tool.

Makes sense perhaps.  Fetching only WAL segments which are needed for
the file map is critical, as you don't want to spend bandwidth for
nothing.  Now, I look at your patch, and I can see things to complain
about, at least three at short glance:
- The TAP test added will fail on Windows.


Thank you for this. The Windows build was broken as well. I have fixed it 
in the new version of the patch; please find it attached.



- Simply copy-pasting RestoreArchivedWAL() from the backend code to
pg_rewind is not an acceptable option.  You don't care about %r either
in this case.


According to the docs [1], %r is a valid alias and may be used in 
restore_command too, so if we take restore_command from recovery.conf it 
might be there. If we simply drop it, the restore_command may stop 
working. Though I do not know of any real-life examples of restore_command 
using %r, we should treat it in the expected way (as the backend does), at 
least if we want an option to take it from recovery.conf.
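
For reference, the alias expansion itself is the easy part. Below is a
simplified sketch of what the backend's RestoreArchivedFile() does when
building the command; the function name and the lack of overflow
reporting are my simplifications:

/*
 * Simplified sketch of %-alias expansion for restore_command:
 * %p -> xlogpath, %f -> xlogfname, %r -> lastRestartPointFname,
 * %% -> a literal '%'; anything else is passed through as-is.
 */
#include <stdio.h>

static void
expand_restore_command(char *dst, size_t dstlen, const char *cmd,
					   const char *xlogpath, const char *xlogfname,
					   const char *lastRestartPointFname)
{
	size_t		len = 0;
	const char *sp;

	for (sp = cmd; *sp && len < dstlen - 1; sp++)
	{
		if (sp[0] == '%' && sp[1] != '\0')
		{
			const char *repl = NULL;

			if (sp[1] == 'p')
				repl = xlogpath;
			else if (sp[1] == 'f')
				repl = xlogfname;
			else if (sp[1] == 'r')
				repl = lastRestartPointFname;

			if (repl != NULL)
			{
				len += snprintf(dst + len, dstlen - len, "%s", repl);
				if (len > dstlen - 1)
					len = dstlen - 1;
				sp++;			/* consume the alias character */
				continue;
			}
			if (sp[1] == '%')
				sp++;			/* "%%" is a literal '%' */
		}
		dst[len++] = *sp;
	}
	dst[len] = '\0';
}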



- Reusing the GUC parser is something I would avoid as well.  Not worth
the complexity.


Yes, I don't like it either. I will try to make guc-file.l frontend-safe.

[1] https://www.postgresql.org/docs/11/archive-recovery-settings.html

--
Alexey Kondratov

Postgres Professional https://www.postgrespro.com
Russian Postgres Company

diff --git a/src/bin/pg_rewind/Makefile b/src/bin/pg_rewind/Makefile
index 2bcfcc61af2e..a22fef1352b9 100644
--- a/src/bin/pg_rewind/Makefile
+++ b/src/bin/pg_rewind/Makefile
@@ -20,6 +20,7 @@ LDFLAGS_INTERNAL += $(libpq_pgport)
 
 OBJS	= pg_rewind.o parsexlog.o xlogreader.o datapagemap.o timeline.o \
 	fetch.o file_ops.o copy_fetch.o libpq_fetch.o filemap.o logging.o \
+	guc-file-fe.o \
 	$(WIN32RES)
 
 EXTRA_CLEAN = xlogreader.c
diff --git a/src/bin/pg_rewind/RewindTest.pm b/src/bin/pg_rewind/RewindTest.pm
index 1dce56d0352e..a5499c7027b1 100644
--- a/src/bin/pg_rewind/RewindTest.pm
+++ b/src/bin/pg_rewind/RewindTest.pm
@@ -39,7 +39,9 @@ use Carp;
 use Config;
 use Exporter 'import';
 use File::Copy;
-use File::Path qw(rmtree);
+use File::Glob ':bsd_glob';
+use File::Path qw(remove_tree make_path);
+use File::Spec::Functions 'catpath';
 use IPC::Run qw(run);
 use PostgresNode;
 use TestLib;
@@ -249,6 +251,48 @@ sub run_pg_rewind
 			],
 			'pg_rewind remote');
 	}
+	elsif ($test_mode eq "archive")
+	{
+
+		# Do rewind using a local pgdata as source and
+		# specified directory with target WALs archive.
+		my $wals_archive_dir = catpath(${TestLib::tmp_check}, 'master_wals_archive');
+		my $test_master_datadir = $node_master->data_dir;
+		my @wal_files = bsd_glob catpath($test_master_datadir, 'pg_wal', '000*');
+		my $restore_command;
+
+		remove_tree($wals_archive_dir);
+		make_path($wals_archive_dir) or die;
+
+		# Move all old master WAL files to the archive.
+		# Old master should be stopped at this point.
+		foreach my $wal_file (@wal_files)
+		{
+			move($wal_file, "$wals_archive_dir/") or die;
+		}
+
+		if ($windows_os)
+		{
+			$restore_command = "copy $wals_archive_dir\\\%f \%p";
+		}
+		else
+		{
+			$restore_command = "cp $wals_archive_dir/\%f \%p";
+		}
+
+		# Stop the new master and be ready to perform the rewind.
+		$node_standby->stop;
+		command_ok(
+			[
+'pg_rewind',
+"--debug",
+"--source-pgdata=$standby_pgdata",
+"--target-pgdata=$master_pgdata",
+"--no-sync",
+"-R", $restore_command
+			],
+			'pg_rewind archive');
+	}
 	else
 	{
 
diff --git a/src/bin/pg_rewind/guc-file-fe.h b/src/bin/pg_rewind/guc-file-fe.h
new file mode 100644
index ..cf480b806ae5
--- /dev/null
+++ b/src/bin/pg_rewind/guc-file-fe.h
@@ -0,0 +1,40 @@
+#ifndef PG_REWIND_GUC_FILE_FE_H
+#define PG_REWIND_GUC_FILE_FE_H
+
+#include "c.h"
+
+#define RECOVERY_COMMAND_FILE	"recovery.conf"
+
+/*
+ * Parsing the configuration file(s) will return a list of name-value pairs
+ * with source location info.  We also abuse this data structure to carry
+ * error reports about the config files.  An entry reporting an error will
+ * have errmsg != NULL, and might have NULLs for name, value, and/or filename.
+ *
+ * If "ignore" is true, don't attempt to apply the item (it might be an error
+ * report, or an item we determined to be duplicate).  "applied" is set true
+ * if we successfully applied, or could have applied, the setting.
+ */
+typedef struct ConfigVariable
+{
+	char	   *name;
+	char	   *value;
+	char	   

Re: [HACKERS] GSOC'17 project introduction: Parallel COPY execution with errors handling

2017-12-01 Thread Alexey Kondratov
On Fri, Dec 1, 2017 at 1:58 AM, Alvaro Herrera  wrote:
> On a *very* quick look, please use an enum to return from NextCopyFrom
> rather than 'int'.  The chunks that change bool to int are very
> odd-looking.  This would move the comment that explains the value from
> copy.c to copy.h, obviously.  Also, you seem to be using non-ASCII dashes
> in the descriptions of those values; please don't.

I will fix it, thank you.

>
> Or maybe I misunderstood the patch completely.
>

I hope so. Here are my thoughts on how it all works; please correct me
where I am wrong:

1) First, I simply changed the ereport level to WARNING for specific
validations (extra or missing columns, etc.) when the IGNORE_ERRORS option
is used. All these checks are inside NextCopyFrom. Thus, the patch
behaves here pretty much the same as before, except that it is
possible to skip bad lines, and this part should be safe as well.

2) About PG_TRY/CATCH. I use it to catch only one specific
function call inside NextCopyFrom -- InputFunctionCall -- which is
used just to parse a datum of the target type from the input string. I
have no idea how WAL-write or trigger errors could get here.

All of this is done before actually forming a tuple, putting it into
the heap, firing insert-related triggers, etc. I am not trying to
catch all errors during row processing, only input data errors. So
why is it unsafe?
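
In code, the shape of what I do is roughly this (simplified; the variable
names mirror copy.c, and the memory-context and error-context bookkeeping
is elided):

bool		bad_input = false;

PG_TRY();
{
	values[fieldno] = InputFunctionCall(&in_functions[fieldno],
										string,
										typioparams[fieldno],
										att->atttypmod);
}
PG_CATCH();
{
	/* an input data error: drop the error state and skip this line */
	FlushErrorState();
	bad_input = true;
}
PG_END_TRY();

if (bad_input)
	continue;					/* proceed with the next input line */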


Best,

Alexey



Re: [Patch] pg_rewind: options to use restore_command from recovery.conf or command line

2019-07-01 Thread Alexey Kondratov

Hi Thomas,

On 01.07.2019 15:02, Thomas Munro wrote:


Hi Alexey,

This no longer applies.  Since the Commitfest is starting now, could
you please rebase it?


Thank you for the reminder. A rebased version of the patch is attached. 
I've also modified my logging code to comply with the new unified logging 
system for command-line programs committed by Peter (cc8d415117).



Regards

--
Alexey Kondratov

Postgres Professional https://www.postgrespro.com
Russian Postgres Company

>From f5f359274322020c2338b5b494f6327eaa61c0e1 Mon Sep 17 00:00:00 2001
From: Alexey Kondratov 
Date: Tue, 19 Feb 2019 19:14:53 +0300
Subject: [PATCH v7] pg_rewind: options to use restore_command from command
 line or cluster config

Previously, when pg_rewind could not find the required WAL files in the
target data directory, the rewind process would fail. One had to
manually figure out which of the required WAL files had already been moved
to the archival storage and copy them back.

This patch adds the possibility to specify restore_command via a command
line option or to use the one specified inside postgresql.conf. The
specified restore_command will be used for automatic retrieval of missing
WAL files from the archival storage.
---
 doc/src/sgml/ref/pg_rewind.sgml   |  30 -
 src/bin/pg_rewind/parsexlog.c | 164 +-
 src/bin/pg_rewind/pg_rewind.c |  92 ++-
 src/bin/pg_rewind/pg_rewind.h |   6 +-
 src/bin/pg_rewind/t/001_basic.pl  |   4 +-
 src/bin/pg_rewind/t/002_databases.pl  |   4 +-
 src/bin/pg_rewind/t/003_extrafiles.pl |   4 +-
 src/bin/pg_rewind/t/RewindTest.pm |  84 -
 8 files changed, 371 insertions(+), 17 deletions(-)

diff --git a/doc/src/sgml/ref/pg_rewind.sgml b/doc/src/sgml/ref/pg_rewind.sgml
index 4d91eeb0ff..746c07e4df 100644
--- a/doc/src/sgml/ref/pg_rewind.sgml
+++ b/doc/src/sgml/ref/pg_rewind.sgml
@@ -67,8 +67,10 @@ PostgreSQL documentation
ancestor. In the typical failover scenario where the target cluster was
shut down soon after the divergence, this is not a problem, but if the
target cluster ran for a long time after the divergence, the old WAL
-   files might no longer be present. In that case, they can be manually
-   copied from the WAL archive to the pg_wal directory, or
+   files might no longer be present. In that case, they can be automatically
+   copied by pg_rewind from the WAL archive to the 
+   pg_wal directory if either -r or
+   -R option is specified, or
fetched on startup by configuring  or
.  The use of
pg_rewind is not limited to failover, e.g.  a standby
@@ -202,6 +204,30 @@ PostgreSQL documentation
   
  
 
+ 
+  -r
+  --use-postgresql-conf
+  
+   
+Use restore_command in the postgresql.conf to
+retrieve missing in the target pg_wal directory
+WAL files from the WAL archive.
+   
+  
+ 
+
+ 
+  -R restore_command
+  --restore-command=restore_command
+  
+   
+Specifies the restore_command to use for retrieval of the missing
+in the target pg_wal directory WAL files from
+the WAL archive.
+   
+  
+ 
+
  
   --debug
   
diff --git a/src/bin/pg_rewind/parsexlog.c b/src/bin/pg_rewind/parsexlog.c
index 287af60c4e..d1de08320c 100644
--- a/src/bin/pg_rewind/parsexlog.c
+++ b/src/bin/pg_rewind/parsexlog.c
@@ -12,6 +12,7 @@
 #include "postgres_fe.h"
 
 #include 
+#include 
 
 #include "pg_rewind.h"
 #include "filemap.h"
@@ -44,6 +45,7 @@ static char xlogfpath[MAXPGPATH];
 typedef struct XLogPageReadPrivate
 {
 	const char *datadir;
+	const char *restoreCommand;
 	int			tliIndex;
 } XLogPageReadPrivate;
 
@@ -52,6 +54,9 @@ static int	SimpleXLogPageRead(XLogReaderState *xlogreader,
 			   int reqLen, XLogRecPtr targetRecPtr, char *readBuf,
 			   TimeLineID *pageTLI);
 
+static int RestoreArchivedWAL(const char *path, const char *xlogfname,
+   off_t expectedSize, const char *restoreCommand);
+
 /*
  * Read WAL from the datadir/pg_wal, starting from 'startpoint' on timeline
  * index 'tliIndex' in target timeline history, until 'endpoint'. Make note of
@@ -59,7 +64,7 @@ static int	SimpleXLogPageRead(XLogReaderState *xlogreader,
  */
 void
 extractPageMap(const char *datadir, XLogRecPtr startpoint, int tliIndex,
-			   XLogRecPtr endpoint)
+			   XLogRecPtr endpoint, const char *restore_command)
 {
 	XLogRecord *record;
 	XLogReaderState *xlogreader;
@@ -68,6 +73,7 @@ extractPageMap(const char *datadir, XLogRecPtr startpoint, int tliIndex,
 
 	private.datadir = datadir;
 	private.tliIndex = tliIndex;
+	private.restoreCommand = restore_command;
 	xlogreader = XLogReaderAllocate(WalSegSz, &SimpleXLogPageRead,
 	&private);
 	if (xlogreader == NULL)
@@ -155,7 +161,7 @@ readOneRecord(const char *datadir, XLogRecPtr ptr, int tliIndex)
 void
 findLastCheckpoint(const char *datadir, XLogRecPtr forkp

Fix two issues after moving to unified logging system for command-line utils

2019-07-01 Thread Alexey Kondratov

Hi hackers,

I have found two minor issues with the unified logging system for 
command-line programs (committed by Peter, cc8d415117) while rebasing 
my pg_rewind patch:


1) A forgotten newline symbol in a pg_fatal call inside pg_rewind, which 
causes the following Assert in common/logging.c to fire:


Assert(fmt[strlen(fmt) - 1] != '\n');

It does not seem to be a problem for a production Postgres installation 
built without asserts, but it should be removed for sanity.


2) Swapped progname <-> full_path arguments in initdb.c's setup_bin_paths 
call [1], while the logging message remained the same. So the output will 
be rather misleading, since pg_ctl and pg_dumpall use the previous 
order.


Attached is a small patch that fixes these issues.

[1] 
https://github.com/postgres/postgres/commit/cc8d41511721d25d557fc02a46c053c0a602fed0#diff-c4414062a0071ec15df504d39a6df705R2500




Regards

--
Alexey Kondratov

Postgres Professional https://www.postgrespro.com
Russian Postgres Company

>From 2ea4a17ecc8f9bd57bb676f684fb729279339534 Mon Sep 17 00:00:00 2001
From: Alexey Kondratov 
Date: Mon, 1 Jul 2019 18:11:25 +0300
Subject: [PATCH v1] Fix usage of unified logging pg_log_* in pg_rewind and
 initdb

---
 src/bin/initdb/initdb.c   | 2 +-
 src/bin/pg_rewind/pg_rewind.c | 2 +-
 2 files changed, 2 insertions(+), 2 deletions(-)

diff --git a/src/bin/initdb/initdb.c b/src/bin/initdb/initdb.c
index 2ef179165b..70273be783 100644
--- a/src/bin/initdb/initdb.c
+++ b/src/bin/initdb/initdb.c
@@ -2497,7 +2497,7 @@ setup_bin_paths(const char *argv0)
 			pg_log_error("The program \"postgres\" is needed by %s but was not found in the\n"
 		 "same directory as \"%s\".\n"
 		 "Check your installation.",
-		 full_path, progname);
+		 progname, full_path);
 		else
 			pg_log_error("The program \"postgres\" was found by \"%s\"\n"
 		 "but was not the same version as %s.\n"
diff --git a/src/bin/pg_rewind/pg_rewind.c b/src/bin/pg_rewind/pg_rewind.c
index 6e77201be6..d378053de4 100644
--- a/src/bin/pg_rewind/pg_rewind.c
+++ b/src/bin/pg_rewind/pg_rewind.c
@@ -555,7 +555,7 @@ getTimelineHistory(ControlFileData *controlFile, int *nentries)
 		else if (controlFile == &ControlFile_target)
 			histfile = slurpFile(datadir_target, path, NULL);
 		else
-			pg_fatal("invalid control file\n");
+			pg_fatal("invalid control file");
 
 		history = rewind_parseTimeLineHistory(histfile, tli, nentries);
 		pg_free(histfile);

base-commit: 95bbe5d82e428db342fa3ec60b95f1b9873741e5
-- 
2.17.1



Re: Conflict handling for COPY FROM

2019-07-02 Thread Alexey Kondratov

On 28.06.2019 16:12, Alvaro Herrera wrote:

On Wed, Feb 20, 2019 at 7:04 PM Andres Freund  wrote:

Or even just return it as a row. CopyBoth is relatively widely supported
these days.

i think generating warning about it also sufficiently meet its propose of
notifying user about skipped record with existing logging facility
and we use it for similar propose in other place too. The different
i see is the number of warning that can be generated

Warnings seem useless for this purpose.  I'm with Andres: returning rows
would make this a fine feature.  If the user wants the rows in a table
as Andrew suggests, she can use wrap the whole thing in an insert.


I agree with the previous commentators that returning rows would make this 
feature more versatile. Still, having the possibility to simply skip 
conflicting/malformed rows is worth doing from my perspective. 
However, pushing every single skipped row to the client as a separate 
WARNING would be too much for a bulk import. So maybe overall stats 
about the number of skipped rows would be enough?


Also, I would prefer having an option to ignore all errors, e.g. with 
ERROR_LIMIT set to -1, because it is rather difficult to estimate the 
number of future errors when you are playing with badly structured 
data, while always setting the limit to some arbitrarily huge value looks ugly.
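
Something like the following is what I have in mind (cstate->error_limit
is from the patch under review; the -1 handling and the skipped counter
are my illustration):

/* negative ERROR_LIMIT would mean "skip bad rows without bound" */
if (cstate->error_limit != 0)
{
	if (cstate->error_limit > 0)
		cstate->error_limit--;
	skipped++;					/* only count it for the final stats */
}
else
	ereport(ERROR,
			(errcode(ERRCODE_BAD_COPY_FILE_FORMAT),
			 errmsg("missing data for column \"%s\"",
					NameStr(att->attname))));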


Anyway, below are some issues with the existing code, found after a brief 
review of the patch:


1) The calculation of processed rows isn't correct (I've checked). You do 
it in two places, and


-            processed++;
+            if (!cstate->error_limit)
+                processed++;

is never incremented if ERROR_LIMIT is specified and no errors 
occur/no constraints exist, so the result will always be 0. However, 
if a primary column with constraints exists, then processed is calculated 
correctly, since another code path is used:


+                        if (specConflict)
+                        {
+                            ...
+                        }
+                        else
+                            processed++;

I would prefer this calculation in a single place (as it was before the 
patch) for simplicity and in order to avoid such problems.



2) This ExecInsertIndexTuples call is now only executed if ERROR_LIMIT 
is specified and has been exceeded, which doesn't seem correct, does it?


-                        if (resultRelInfo->ri_NumIndices > 0)
+                        if (resultRelInfo->ri_NumIndices > 0 && cstate->error_limit == 0)

                         recheckIndexes = ExecInsertIndexTuples(myslot,


3) Trailing whitespace was added to some error messages and tests for some reason:

+                    ereport(WARNING,
+                            (errcode(ERRCODE_BAD_COPY_FILE_FORMAT),
+                             errmsg("skipping \"%s\" --- missing data 
for column \"%s\" ",


+                    ereport(ERROR,
+                            (errcode(ERRCODE_BAD_COPY_FILE_FORMAT),
+                             errmsg("missing data for column \"%s\" ",

-ERROR:  missing data for column "e"
+ERROR:  missing data for column "e"
 CONTEXT:  COPY x, line 1: "2000    230    23    23"

-ERROR:  missing data for column "e"
+ERROR:  missing data for column "e"
 CONTEXT:  COPY x, line 1: "2001    231    \N    \N"


Otherwise, the patch applies/compiles cleanly and the regression tests pass.



Regards

--
Alexey Kondratov

Postgres Professional https://www.postgrespro.com
Russian Postgres Company





Re: [Patch] pg_rewind: options to use restore_command from recovery.conf or command line

2019-08-01 Thread Alexey Kondratov

On 26.07.2019 20:43, Liudmila Mantrova wrote:


I would like to suggest a couple of changes to docs and comments, 
please see the attachment.
The "...or fetched on startup" part also seems wrong here, but it's 
not a part of your patch, so I'm going to ask about it on psql-docs 
separately.


Agreed, thank you a lot! Yes, "...or fetched on startup" looks a bit 
confusing to me as well, since the whole paragraph is about the target 
server before running pg_rewind, while this statement is about the target 
server being started for the first time after pg_rewind, which is 
discussed in the next paragraph.




It might also be useful to reword the following error messages:
- "using restored from archive version of file \"%s\""
- "could not open restored from archive file \"%s\"
We could probably say something like "could not open file \"%s\" 
restored from WAL archive" instead.


I have reworded these and some similar messages, thanks. A new patch with 
the changed messages is attached.




On a more general note, I wonder if everyone is happy with the 
--using-postgresql-conf option name, or we should continue searching 
for a narrower term. Unfortunately, I don't have any better 
suggestions right now, but I believe it should be clear that its 
purpose is to fetch missing WAL files for target. What do you think?




I don't like it either, but it was my best guess at the time. Maybe 
--restore-target-wal instead of --using-postgresql-conf would be better? 
And --target-restore-command instead of --restore-command, if we want to 
make clear that this is the restore_command of the target server?



Regards

--
Alexey Kondratov

Postgres Professional https://www.postgrespro.com
Russian Postgres Company

>From 328ed78356e2b270ffe4c84baa462eb6b8e6befb Mon Sep 17 00:00:00 2001
From: Alexey Kondratov 
Date: Tue, 19 Feb 2019 19:14:53 +0300
Subject: [PATCH v9] pg_rewind: options to use restore_command from command
 line or cluster config

Previously, when pg_rewind could not find the required WAL files in the
target data directory, the rewind process would fail. One had to
manually figure out which of the required WAL files had already been moved
to the archival storage and copy them back.

This patch adds the possibility to specify restore_command via a command
line option or to use the one specified inside postgresql.conf. The
specified restore_command will be used for automatic retrieval of missing
WAL files from the archival storage.
---
 doc/src/sgml/ref/pg_rewind.sgml   |  49 +++-
 src/bin/pg_rewind/parsexlog.c | 164 +-
 src/bin/pg_rewind/pg_rewind.c |  92 ++-
 src/bin/pg_rewind/pg_rewind.h |   6 +-
 src/bin/pg_rewind/t/001_basic.pl  |   4 +-
 src/bin/pg_rewind/t/002_databases.pl  |   4 +-
 src/bin/pg_rewind/t/003_extrafiles.pl |   4 +-
 src/bin/pg_rewind/t/RewindTest.pm |  84 -
 8 files changed, 386 insertions(+), 21 deletions(-)

diff --git a/doc/src/sgml/ref/pg_rewind.sgml b/doc/src/sgml/ref/pg_rewind.sgml
index 52a1caa246..d5a14a2e08 100644
--- a/doc/src/sgml/ref/pg_rewind.sgml
+++ b/doc/src/sgml/ref/pg_rewind.sgml
@@ -66,11 +66,12 @@ PostgreSQL documentation
can be found either on the target timeline, the source timeline, or their common
ancestor. In the typical failover scenario where the target cluster was
shut down soon after the divergence, this is not a problem, but if the
-   target cluster ran for a long time after the divergence, the old WAL
-   files might no longer be present. In that case, they can be manually
-   copied from the WAL archive to the pg_wal directory, or
-   fetched on startup by configuring  or
-   .  The use of
+   target cluster ran for a long time after the divergence, its old WAL
+   files might no longer be present. In this case, you can manually copy them
+   from the WAL archive to the pg_wal directory, or run
+   pg_rewind with the -r or
+   -R option to automatically retrieve them from the WAL
+   archive. The use of
pg_rewind is not limited to failover, e.g.  a standby
server can be promoted, run some write transactions, and then rewinded
to become a standby again.
@@ -202,6 +203,39 @@ PostgreSQL documentation
   
  
 
+ 
+  -r
+  --use-postgresql-conf
+  
+   
+Use the restore_command defined in
+postgresql.conf to retrieve WAL files from
+the WAL archive if these files are no longer available in the
+pg_wal directory of the target cluster.
+   
+   
+This option cannot be used together with --restore-command.
+   
+  
+ 
+
+ 
+  -R restore_command
+  --restore-command=restore_command
+  
+   
+Specifies the restore_command to use for retrieving
+WAL files from the WAL archive if these files are no longer available
+in the pg_wal directory of the target cluster.
+   
+   
+If restore

Re: PATCH: logical_work_mem and logical streaming of large in-progress transactions

2019-08-28 Thread Alexey Kondratov
Stream + spill
| 1kk | 5.9  | 18   | x3    |
| 3kk | 19.5 | 52.4 | x2.7  |
| 5kk | 33.3 | 86.7 | x2.86 |

Stream + BGW pool
| 1kk | 6    | 12   | x2    |
| 3kk | 18.5 | 30.5 | x1.65 |
| 5kk | 35.6 | 53.9 | x1.51 |

It seems that the overhead added by the synchronous replica is 2-3 times 
lower compared with Postgres master and streaming with spilling. 
Thus, the original patch eliminated the delay before the sender starts 
processing a large transaction, while this additional patch speeds up 
the applier side.


Although the overall speed-up is surely measurable, there is still room 
for improvement:


1) Currently bgworkers are only spawned on demand, without any initial 
pool, and are never stopped. Maybe we should create a small pool on 
replication start and retire some of the idle bgworkers if they exceed 
some limit?


2) Probably we can somehow track whether an incoming change conflicts with 
any of the xacts being processed, so we only have to wait for specific 
bgworkers in that case?


3) Since the communication between the main logical apply worker and each 
bgworker from the pool is a 'single producer --- single consumer' 
problem, it is probably possible to wait and to set/check flags 
without locks, using just atomics (see the sketch below).
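
A minimal sketch of what I mean in 3), assuming exactly one producer (the
main apply worker) and one consumer (a pool bgworker) per slot; all names
are illustrative, and the slot is assumed to be initialized with
pg_atomic_init_u32:

#include <string.h>

#include "port/atomics.h"

typedef struct ApplySlot
{
	pg_atomic_uint32 filled;	/* 0: consumer owns it, 1: data is ready */
	Size		len;
	char		data[8192];
} ApplySlot;

/* producer side (main logical apply worker) */
static void
slot_publish(ApplySlot *slot, const char *change, Size len)
{
	while (pg_atomic_read_u32(&slot->filled) != 0)
		;						/* or WaitLatch() instead of spinning */
	memcpy(slot->data, change, len);
	slot->len = len;
	pg_write_barrier();			/* data must be visible before the flag */
	pg_atomic_write_u32(&slot->filled, 1);
}

/* consumer side (pool bgworker) */
static bool
slot_consume(ApplySlot *slot, char *dst)
{
	if (pg_atomic_read_u32(&slot->filled) == 0)
		return false;			/* nothing to do yet */
	pg_read_barrier();			/* the flag read must precede data reads */
	memcpy(dst, slot->data, slot->len);
	pg_atomic_write_u32(&slot->filled, 0);
	return true;
}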


What do you think about this concept in general? Any concerns and 
criticism are welcome!



Regards

--
Alexey Kondratov

Postgres Professional https://www.postgrespro.com
Russian Postgres Company

P.S. This patch should apply on top of your last patch set. I would rebase it 
against master, but it depends on the 2PC patch, which I don't know well enough.

>From 11c7549d2732f2f983d4548a81cd509dd7e41ec4 Mon Sep 17 00:00:00 2001
From: Alexey Kondratov 
Date: Wed, 28 Aug 2019 15:26:50 +0300
Subject: [PATCH 11/11] BGWorkers pool for streamed transactions apply without
 spilling on disk

---
 src/backend/postmaster/bgworker.c|3 +
 src/backend/postmaster/pgstat.c  |3 +
 src/backend/replication/logical/proto.c  |   17 +-
 src/backend/replication/logical/worker.c | 1780 +++---
 src/include/pgstat.h |1 +
 src/include/replication/logicalproto.h   |4 +-
 src/include/replication/logicalworker.h  |1 +
 7 files changed, 933 insertions(+), 876 deletions(-)

diff --git a/src/backend/postmaster/bgworker.c b/src/backend/postmaster/bgworker.c
index f5db5a8c4a..6860df07ca 100644
--- a/src/backend/postmaster/bgworker.c
+++ b/src/backend/postmaster/bgworker.c
@@ -129,6 +129,9 @@ static const struct
 	},
 	{
 		"ApplyWorkerMain", ApplyWorkerMain
+	},
+	{
+		"LogicalApplyBgwMain", LogicalApplyBgwMain
 	}
 };
 
diff --git a/src/backend/postmaster/pgstat.c b/src/backend/postmaster/pgstat.c
index e5a4d147a7..b32994784f 100644
--- a/src/backend/postmaster/pgstat.c
+++ b/src/backend/postmaster/pgstat.c
@@ -3637,6 +3637,9 @@ pgstat_get_wait_ipc(WaitEventIPC w)
 		case WAIT_EVENT_HASH_GROW_BUCKETS_REINSERTING:
 			event_name = "Hash/GrowBuckets/Reinserting";
 			break;
+		case WAIT_EVENT_LOGICAL_APPLY_WORKER_READY:
+			event_name = "LogicalApplyWorkerReady";
+			break;
 		case WAIT_EVENT_LOGICAL_SYNC_DATA:
 			event_name = "LogicalSyncData";
 			break;
diff --git a/src/backend/replication/logical/proto.c b/src/backend/replication/logical/proto.c
index 4bec9fe8b5..954ce7343a 100644
--- a/src/backend/replication/logical/proto.c
+++ b/src/backend/replication/logical/proto.c
@@ -789,14 +789,11 @@ logicalrep_write_stream_commit(StringInfo out, ReorderBufferTXN *txn,
 	pq_sendint64(out, txn->commit_time);
 }
 
-TransactionId
+void
 logicalrep_read_stream_commit(StringInfo in, LogicalRepCommitData *commit_data)
 {
-	TransactionId	xid;
 	uint8			flags;
 
-	xid = pq_getmsgint(in, 4);
-
 	/* read flags (unused for now) */
 	flags = pq_getmsgbyte(in);
 
@@ -807,8 +804,6 @@ logicalrep_read_stream_commit(StringInfo in, LogicalRepCommitData *commit_data)
 	commit_data->commit_lsn = pq_getmsgint64(in);
 	commit_data->end_lsn = pq_getmsgint64(in);
 	commit_data->committime = pq_getmsgint64(in);
-
-	return xid;
 }
 
 void
@@ -823,13 +818,3 @@ logicalrep_

Re: PATCH: logical_work_mem and logical streaming of large in-progress transactions

2019-08-29 Thread Alexey Kondratov
all xacts have been committed on the master since the streamed one 
started, because we do not start streaming immediately, but only after 
the logical_work_mem limit is hit. I have performed some tests with 
conflicting xacts and it seems that it's not a problem, since the locking 
mechanism in Postgres guarantees that if there were any deadlocks, they 
would have happened earlier on the master. So if some records hit the 
WAL, it is safe to apply them sequentially. Am I wrong?


Anyway, I'm going to double-check the safety of this part later.


Regards

--
Alexey Kondratov

Postgres Professional https://www.postgrespro.com
Russian Postgres Company





Probably misleading comments or lack of tests in autoHeld portals management

2019-02-26 Thread Alexey Kondratov

Hi hackers,

I am trying to figure out the current cursor/portal management and life 
cycle in Postgres. There are two if conditions for autoHeld portals:


- 'if (portal->autoHeld)' inside AtAbort_Portals at portalmem.c:802;
- '|| portal->autoHeld' inside AtCleanup_Portals at portalmem.c:871.

Removing them does not seem to affect anything; make check-world passes. 
I have tried configure --with-perl/--with-python, which should exercise 
autoHeld portals, but nothing changed.


This seems expected to me, since the autoHeld flag is always set 
together with createSubid = InvalidSubTransactionId inside HoldPinnedPortals, 
so the single check 'createSubid == InvalidSubTransactionId' should be 
enough. However, the comments are rather misleading:


(1) portal.h:126 confirms my guess 'If the portal is held over from a 
previous transaction, both subxids are InvalidSubTransactionId';


(2) while portalmem.c:797 states 'This is similar to the case of a 
cursor from a previous transaction, but it could also be that the cursor 
was auto-held in this transaction, so it wants to live on'.


I have tried, but could not construct a valid query for the case 
described in (2), and it is definitely absent from the regression tests.


Am I missing something?

I have added Peter to cc, since he committed 056a5a3, which introduced 
autoHeld, so maybe it will be easier for him to recall the context. 
Anyway, sorry for the noise if this question is actually trivial.



Regards

--
Alexey Kondratov

Postgres Professional https://www.postgrespro.com
Russian Postgres Company

diff --git a/src/backend/utils/mmgr/portalmem.c b/src/backend/utils/mmgr/portalmem.c
index a92b4541bd..841d88df76 100644
--- a/src/backend/utils/mmgr/portalmem.c
+++ b/src/backend/utils/mmgr/portalmem.c
@@ -798,8 +798,6 @@ AtAbort_Portals(void)
 		 * cursor from a previous transaction, but it could also be that the
 		 * cursor was auto-held in this transaction, so it wants to live on.
 		 */
-		if (portal->autoHeld)
-			continue;
 
 		/*
 		 * If it was created in the current transaction, we can't do normal
@@ -868,7 +866,7 @@ AtCleanup_Portals(void)
 		 * Do nothing to cursors held over from a previous transaction or
 		 * auto-held ones.
 		 */
-		if (portal->createSubid == InvalidSubTransactionId || portal->autoHeld)
+		if (portal->createSubid == InvalidSubTransactionId)
 		{
 			Assert(portal->status != PORTAL_ACTIVE);
 			Assert(portal->resowner == NULL);


Re: [Patch] pg_rewind: options to use restore_command from recovery.conf or command line

2019-03-27 Thread Alexey Kondratov

On 26.03.2019 11:19, Michael Paquier wrote:

+ * This is a simplified and adapted to frontend version of
+ * RestoreArchivedFile function from transam/xlogarchive.c
+ */
+static int
+RestoreArchivedWAL(const char *path, const char *xlogfname,
I don't think that we should have duplicates for that, so I would
recommend refactoring the code so as a unique code path is taken by
both, especially since the user can fetch the command from
postgresql.conf.


This comment has been there since the beginning of my work on this patch, 
and by now it is rather misleading.


Even if we do not take into account obvious differences like error 
reporting, different log levels based on many conditions, cleanup 
options, and the check for standby mode, restore_command execution during 
backend recovery and during pg_rewind has one very important difference. 
If it fails in the backend then, as stated in the comment 'Remember, we 
rollforward UNTIL the restore fails so failure here is just part of the 
process' -- it is OK. In contrast, if pg_rewind fails to restore 
some required WAL segment, that definitely means the end of the 
entire process, since we will fail to find the last common checkpoint or 
to extract the page map.
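
In code terms the key difference boils down to this (a sketch, not
literal code from either place):

/* backend, transam/xlogarchive.c: a failed restore is a normal end */
if (rc != 0)
	return false;				/* caller: no more WAL to roll forward */

/* pg_rewind: a failed restore kills the whole rewind */
if (rc != 0)
	pg_fatal("could not restore file \"%s\" from archive\n", xlogfname);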


The only part we can share is constructing the restore_command with alias 
replacement. However, even there the logic is slightly different, 
since we do not need the %r alias for pg_rewind. The only use case of %r 
in restore_command that I know of is pg_standby, which does not seem to 
be a case for pg_rewind. I have tried to move this part into common code, 
but it becomes full of conditions and less concise.


Please correct me if I am wrong, but it seems that there are enough 
differences to keep this function separate, doesn't it?



Why two options?  Wouldn't actually be enough use-postgresql-conf to
do the job?  Note that "postgres" should always be installed if
pg_rewind is present because it is a backend-side utility, so while I
don't like adding a dependency to other binaries in one binary, having
an option to pass out a command directly via the command line of
pg_rewind stresses me more.


I am not familiar enough with the DBA scenarios where the -R option may be 
useful, but I have been asked for it a few times. I can only speculate 
that, for example, someone may want to run a freshly rewound cluster as a 
master, not a replica, so its config may differ from the replica's, where 
restore_command is surely intended to live. In that case it is easier to 
leave the master's config in place and just specify restore_command as a 
command line argument.



Don't we need to worry about signals interrupting the restore command?
It seems to me that some refactoring from the stuff in xlogarchive.c
would be in order.


Thank you for pointing me to this place again. Previously, I thought 
that we should not care about it, since if restore_command was 
unsuccessful for any reason, the rewind has failed, so we will stop and 
exit at the upper levels. However, if it failed due to a signal, some of 
the subsequent messages may be misleading, e.g. if the user manually 
interrupted it for some reason. So I have added a similar check here as well.
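
The added check is essentially what the backend does after running the
command, i.e. inspecting the child status with the <sys/wait.h> macros
(a sketch; the exact message wording and return handling differ in the
patch):

rc = system(restore_command);

if (rc != 0)
{
	/*
	 * If the command died on a signal, somebody most likely interrupted
	 * it on purpose, so report that explicitly instead of emitting a
	 * confusing "could not find WAL" message later.
	 */
	if (WIFSIGNALED(rc))
		pg_fatal("restore_command \"%s\" was terminated by signal %d\n",
				 restore_command, WTERMSIG(rc));

	return -1;					/* plain non-zero exit: no such WAL file */
}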


An updated version of the patch is attached.


--
Alexey Kondratov

Postgres Professional https://www.postgrespro.com
Russian Postgres Company

>From 9e00f7a7696a88f350e1e328a9758ab85631c813 Mon Sep 17 00:00:00 2001
From: Alexey Kondratov 
Date: Tue, 19 Feb 2019 19:14:53 +0300
Subject: [PATCH v6] pg_rewind: options to use restore_command from command
 line or cluster config

Previously, when pg_rewind could not find the required WAL files in the
target data directory, the rewind process would fail. One had to
manually figure out which of the required WAL files had already been moved
to the archival storage and copy them back.

This patch adds the possibility to specify restore_command via a command
line option or to use the one specified inside postgresql.conf. The
specified restore_command will be used for automatic retrieval of missing
WAL files from the archival storage.
---
 doc/src/sgml/ref/pg_rewind.sgml   |  30 -
 src/bin/pg_rewind/parsexlog.c | 167 +-
 src/bin/pg_rewind/pg_rewind.c |  96 ++-
 src/bin/pg_rewind/pg_rewind.h |   7 +-
 src/bin/pg_rewind/t/001_basic.pl  |   4 +-
 src/bin/pg_rewind/t/002_databases.pl  |   4 +-
 src/bin/pg_rewind/t/003_extrafiles.pl |   4 +-
 src/bin/pg_rewind/t/RewindTest.pm |  84 -
 8 files changed, 376 insertions(+), 20 deletions(-)

diff --git a/doc/src/sgml/ref/pg_rewind.sgml b/doc/src/sgml/ref/pg_rewind.sgml
index 53a64ee29e..90e3f22f97 100644
--- a/doc/src/sgml/ref/pg_rewind.sgml
+++ b/doc/src/sgml/ref/pg_rewind.sgml
@@ -67,8 +67,10 @@ PostgreSQL documentation
ancestor. In the typical failover scenario where the target cluster was
shut down soon after the divergence, this is not a problem, but if the
target cluster ran for a long time after the divergence, the old WAL
-   f

Re: [Patch] pg_rewind: options to use restore_command from recovery.conf or command line

2020-02-26 Thread Alexey Kondratov

On 2020-02-26 22:03, Alexander Korotkov wrote:

On Tue, Feb 25, 2020 at 1:48 PM Alexander Korotkov
 wrote:


I think usage of chmod() deserves comment.  As I get default
permissions are sufficient for work, but we need to set them to
satisfy 'check PGDATA permissions' test.



I've added this comment myself.



Thanks for doing it yourself; I was going to answer tonight, but that 
would obviously have been too late.




I've also fixes some indentation.
Patch now looks good to me.  I'm going to push it if no objections.



I think that the docs should be corrected as well. Previously, Michael was 
against the phrase 'restore_command defined in the postgresql.conf', since 
it could also be defined in any config file included there. We corrected 
this in the pg_rewind --help output, but now the docs say:


+Use the restore_command defined in
+postgresql.conf to retrieve WAL files from
+the WAL archive if these files are no longer available in the
+pg_wal directory of the target cluster.

Probably it should be something like:

+Use the restore_command defined in
+the target cluster configuration to retrieve WAL files from
+the WAL archive if these files are no longer available in the
+pg_wal directory.

Here only the text split changed:

-	 * Ignore restore_command when not in archive recovery (meaning
-	 * we are in crash recovery).
+	 * Ignore restore_command when not in archive recovery (meaning we are in
+	 * crash recovery).

Should we do so in this patch?

I think that this extra dot at the end is not necessary here:

+		pg_log_debug("using config variable restore_command=\'%s\'.", restore_command);


If you agree, then attached is a patch with all the corrections above. It 
is made with the default git format-patch format, but yours were in a 
slightly different format, so I was only able to apply them with git am 
--patch-format=stgit.



--
Alexey Kondratov

Postgres Professional https://www.postgrespro.com
The Russian Postgres Company

From fa2fc359dd9852afc608663fa32733e800652ffa Mon Sep 17 00:00:00 2001
From: Alexander Korotkov 
Date: Tue, 25 Feb 2020 02:22:45 +0300
Subject: [PATCH v17] pg_rewind: Add options to restore WAL files from archive

Currently, pg_rewind fails when it cannot find the required WAL files in the
target data directory.  One has to manually figure out which WAL files are
required and copy them back from the archive.

This commit implements new pg_rewind options, which allow pg_rewind to
automatically retrieve missing WAL files from archival storage. The
restore_command option is read from postgresql.conf.

Discussion: https://postgr.es/m/a3acff50-5a0d-9a2c-b3b2-ee36168955c1%40postgrespro.ru
Author: Alexey Kondratov
Reviewed-by: Michael Paquier, Andrey Borodin, Alvaro Herrera
Reviewed-by: Andres Freund, Alexander Korotkov
---
 doc/src/sgml/ref/pg_rewind.sgml  |  28 --
 src/backend/access/transam/xlogarchive.c |  58 +
 src/bin/pg_rewind/parsexlog.c|  33 ++-
 src/bin/pg_rewind/pg_rewind.c|  77 ++--
 src/bin/pg_rewind/pg_rewind.h|   6 +-
 src/bin/pg_rewind/t/001_basic.pl |   3 +-
 src/bin/pg_rewind/t/RewindTest.pm|  67 +-
 src/common/Makefile  |   2 +
 src/common/archive.c |  97 +
 src/common/fe_archive.c  | 106 +++
 src/include/common/archive.h |  21 +
 src/include/common/fe_archive.h  |  18 
 src/tools/msvc/Mkvcbuild.pm  |   8 +-
 13 files changed, 443 insertions(+), 81 deletions(-)
 create mode 100644 src/common/archive.c
 create mode 100644 src/common/fe_archive.c
 create mode 100644 src/include/common/archive.h
 create mode 100644 src/include/common/fe_archive.h

diff --git a/doc/src/sgml/ref/pg_rewind.sgml b/doc/src/sgml/ref/pg_rewind.sgml
index 42d29edd4e..64a6942031 100644
--- a/doc/src/sgml/ref/pg_rewind.sgml
+++ b/doc/src/sgml/ref/pg_rewind.sgml
@@ -66,11 +66,11 @@ PostgreSQL documentation
can be found either on the target timeline, the source timeline, or their common
ancestor. In the typical failover scenario where the target cluster was
shut down soon after the divergence, this is not a problem, but if the
-   target cluster ran for a long time after the divergence, the old WAL
-   files might no longer be present. In that case, they can be manually
-   copied from the WAL archive to the pg_wal directory, or
-   fetched on startup by configuring  or
-   .  The use of
+   target cluster ran for a long time after the divergence, its old WAL
+   files might no longer be present. In this case, you can manually copy them
+   from the WAL archive to the pg_wal directory, or run
+   pg_rewind with the -c option to
+   automatically retrieve them from the WAL archive. The use of
  

Re: [Patch] pg_rewind: options to use restore_command from recovery.conf or command line

2020-02-27 Thread Alexey Kondratov

On 2020-02-27 04:52, Michael Paquier wrote:

On Thu, Feb 27, 2020 at 12:43:55AM +0300, Alexander Korotkov wrote:

Regarding text split change, it was made by pgindent.  I didn't notice
it belongs to unchanged part of code.  Sure, we shouldn't include this
into the patch.


I have read through v17 (not tested, sorry), and spotted a couple of
issues that need to be addressed.

+   "--source-pgdata=$standby_pgdata",
+   "--target-pgdata=$master_pgdata",
+   "--no-sync", "--no-ensure-shutdown",
FWIW, I think that perl indenting would reshape this part.  I would
recommend to run src/tools/pgindent/pgperltidy and
./src/tools/perlcheck/pgperlcritic before commit.



Thanks, I have formatted this part with perltidy. It also modified 
RecursiveCopy's indentation. Pgperlcritic has no complaints about this 
file. BTW, when executed on the whole project, pgperltidy modifies dozens 
of perl files and even pgindent itself.




+ * Copyright (c) 2020, PostgreSQL Global Development Group
Wouldn't it be better to just use the full copyright here?  I mean the
following:
Portions Copyright (c) 1996-2020, PostgreSQL Global Development Group
Portions Copyright (c) 1994, The Regents of the University of 
California




I think so; it contains some older code parts, so it is better to use 
the unified copyright.




+++ b/src/common/archive.c
[...]
+#include "postgres.h"
+
+#include "common/archive.h"
This is incorrect.  All files shared between the backend and the
frontend in src/common/ have to include the following set of headers:
#ifndef FRONTEND
#include "postgres.h"
#else
#include "postgres_fe.h"
#endif

+++ b/src/common/fe_archive.c
[...]
+#include "postgres_fe.h"
This is incomplete.  The following piece should be added:
#ifndef FRONTEND
#error "This file is not expected to be compiled for backend code"
#endif



Fixed both.



+   snprintf(postgres_cmd, sizeof(postgres_cmd), "%s -D %s
-C restore_command",
+postgres_exec_path, datadir_target);
+
I think that this is missing proper quoting.



Yep, added the same quoting as in pg_upgrade/options.
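
For illustration, the quoted command construction then looks roughly like
this (a sketch assuming pg_upgrade-style double quoting; not the literal
patch contents):

    snprintf(postgres_cmd, sizeof(postgres_cmd),
             "\"%s\" -D \"%s\" -C restore_command",
             postgres_exec_path, datadir_target);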



I would rename ConstructRestoreCommand() to BuildRestoreCommand()
while at it.



OK, shorter is better.



I think that it would be saner to check the return status of
ConstructRestoreCommand() in xlogarchive.c as a sanity check, with an
elog(ERROR) if not 0, as that should never happen.



Added.

New version of the patch is attached. Thanks again for your review.


Regards
--
Alexey Kondratov

Postgres Professional https://www.postgrespro.com
The Russian Postgres Company
From c775a2e40e405474f6ecef35843d276d43fb462f Mon Sep 17 00:00:00 2001
From: Alexander Korotkov 
Date: Tue, 25 Feb 2020 02:22:45 +0300
Subject: [PATCH v18] pg_rewind: Add options to restore WAL files from archive

Currently, pg_rewind fails when it cannot find the required WAL files in the
target data directory.  One has to manually figure out which WAL files are
required and copy them back from the archive.

This commit implements new pg_rewind options, which allow pg_rewind to
automatically retrieve missing WAL files from archival storage. The
restore_command option is read from postgresql.conf.

Discussion: https://postgr.es/m/a3acff50-5a0d-9a2c-b3b2-ee36168955c1%40postgrespro.ru
Author: Alexey Kondratov
Reviewed-by: Michael Paquier, Andrey Borodin, Alvaro Herrera
Reviewed-by: Andres Freund, Alexander Korotkov
---
 doc/src/sgml/ref/pg_rewind.sgml  |  28 --
 src/backend/access/transam/xlogarchive.c |  60 ++--
 src/bin/pg_rewind/parsexlog.c|  33 ++-
 src/bin/pg_rewind/pg_rewind.c|  77 +++-
 src/bin/pg_rewind/pg_rewind.h|   6 +-
 src/bin/pg_rewind/t/001_basic.pl |   3 +-
 src/bin/pg_rewind/t/RewindTest.pm|  66 +-
 src/common/Makefile  |   2 +
 src/common/archive.c | 102 +
 src/common/fe_archive.c  | 111 +++
 src/include/common/archive.h |  22 +
 src/include/common/fe_archive.h  |  19 
 src/tools/msvc/Mkvcbuild.pm  |   8 +-
 13 files changed, 457 insertions(+), 80 deletions(-)
 create mode 100644 src/common/archive.c
 create mode 100644 src/common/fe_archive.c
 create mode 100644 src/include/common/archive.h
 create mode 100644 src/include/common/fe_archive.h

diff --git a/doc/src/sgml/ref/pg_rewind.sgml b/doc/src/sgml/ref/pg_rewind.sgml
index 42d29edd4e..64a6942031 100644
--- a/doc/src/sgml/ref/pg_rewind.sgml
+++ b/doc/src/sgml/ref/pg_rewind.sgml
@@ -66,11 +66,11 @@ PostgreSQL documentation
can be found either on the target timeline, the source timeline, or their common
ancestor. In the

Re: [Patch] pg_rewind: options to use restore_command from recovery.conf or command line

2020-02-27 Thread Alexey Kondratov

On 2020-02-27 16:41, Alexey Kondratov wrote:


New version of the patch is attached. Thanks again for your review.



The last patch (v18) got a conflict with one of today's commits (05d8449e73).
A rebased version is attached.


--
Alexey Kondratov

Postgres Professional https://www.postgrespro.com
The Russian Postgres Company
From ea93b52b298d80aac547735c5917386b37667595 Mon Sep 17 00:00:00 2001
From: Alexander Korotkov 
Date: Tue, 25 Feb 2020 02:22:45 +0300
Subject: [PATCH v19] pg_rewind: Add options to restore WAL files from archive

Currently, pg_rewind fails when it cannot find the required WAL files in the
target data directory.  One has to manually figure out which WAL files are
required and copy them back from the archive.

This commit implements new pg_rewind options, which allow pg_rewind to
automatically retrieve missing WAL files from archival storage. The
restore_command option is read from postgresql.conf.

Discussion: https://postgr.es/m/a3acff50-5a0d-9a2c-b3b2-ee36168955c1%40postgrespro.ru
Author: Alexey Kondratov
Reviewed-by: Michael Paquier, Andrey Borodin, Alvaro Herrera
Reviewed-by: Andres Freund, Alexander Korotkov
---
 doc/src/sgml/ref/pg_rewind.sgml  |  28 --
 src/backend/access/transam/xlogarchive.c |  60 ++--
 src/bin/pg_rewind/parsexlog.c|  33 ++-
 src/bin/pg_rewind/pg_rewind.c|  77 +++-
 src/bin/pg_rewind/pg_rewind.h|   6 +-
 src/bin/pg_rewind/t/001_basic.pl |   3 +-
 src/bin/pg_rewind/t/RewindTest.pm|  66 +-
 src/common/Makefile  |   2 +
 src/common/archive.c | 102 +
 src/common/fe_archive.c  | 111 +++
 src/include/common/archive.h |  22 +
 src/include/common/fe_archive.h  |  19 
 src/tools/msvc/Mkvcbuild.pm  |   8 +-
 13 files changed, 457 insertions(+), 80 deletions(-)
 create mode 100644 src/common/archive.c
 create mode 100644 src/common/fe_archive.c
 create mode 100644 src/include/common/archive.h
 create mode 100644 src/include/common/fe_archive.h

diff --git a/doc/src/sgml/ref/pg_rewind.sgml b/doc/src/sgml/ref/pg_rewind.sgml
index 42d29edd4e..64a6942031 100644
--- a/doc/src/sgml/ref/pg_rewind.sgml
+++ b/doc/src/sgml/ref/pg_rewind.sgml
@@ -66,11 +66,11 @@ PostgreSQL documentation
can be found either on the target timeline, the source timeline, or their common
ancestor. In the typical failover scenario where the target cluster was
shut down soon after the divergence, this is not a problem, but if the
-   target cluster ran for a long time after the divergence, the old WAL
-   files might no longer be present. In that case, they can be manually
-   copied from the WAL archive to the pg_wal directory, or
-   fetched on startup by configuring  or
-   .  The use of
+   target cluster ran for a long time after the divergence, its old WAL
+   files might no longer be present. In this case, you can manually copy them
+   from the WAL archive to the pg_wal directory, or run
+   pg_rewind with the -c option to
+   automatically retrieve them from the WAL archive. The use of
pg_rewind is not limited to failover, e.g.  a standby
server can be promoted, run some write transactions, and then rewinded
to become a standby again.
@@ -232,6 +232,19 @@ PostgreSQL documentation
   
  
 
+ 
+  -c
+  --restore-target-wal
+  
+   
+Use the restore_command defined in
+the target cluster configuration to retrieve WAL files from
+the WAL archive if these files are no longer available in the
+pg_wal directory.
+   
+  
+ 
+
  
   --debug
   
@@ -318,7 +331,10 @@ GRANT EXECUTE ON function pg_catalog.pg_read_binary_file(text, bigint, bigint, b
   history forked off from the target cluster. For each WAL record,
   record each data block that was touched. This yields a list of all
   the data blocks that were changed in the target cluster, after the
-  source cluster forked off.
+  source cluster forked off. If some of the WAL files are no longer
+  available, try re-running pg_rewind with
+  the -c option to search for the missing files in
+  the WAL archive.
  
 
 
diff --git a/src/backend/access/transam/xlogarchive.c b/src/backend/access/transam/xlogarchive.c
index 188b73e752..f78a7e8f02 100644
--- a/src/backend/access/transam/xlogarchive.c
+++ b/src/backend/access/transam/xlogarchive.c
@@ -21,6 +21,7 @@
 
 #include "access/xlog.h"
 #include "access/xlog_internal.h"
+#include "common/archive.h"
 #include "miscadmin.h"
 #include "postmaster/startup.h"
 #include "replication/walsender.h"
@@ -55,9 +56,6 @@ RestoreArchivedFile(char *path, const char *xlogfname,
 	char		xlogpath[MAXPGPATH];
 	char		xlogRestoreCmd[MAXPGPATH];
 	c

Re: [Patch] pg_rewind: options to use restore_command from recovery.conf or command line

2020-02-28 Thread Alexey Kondratov

On 2020-02-28 09:43, Michael Paquier wrote:

On Thu, Feb 27, 2020 at 06:29:34PM +0300, Alexey Kondratov wrote:

On 2020-02-27 16:41, Alexey Kondratov wrote:
>
> New version of the patch is attached. Thanks again for your review.
>

The last patch (v18) got a conflict with one of today's commits
(05d8449e73).

A rebased version is attached.


The shape of the patch is getting better.  I have found some issues
when reading through the patch, but nothing huge.

+   printf(_("  -c, --restore-target-wal   use restore_command in
target config\n"));
+   printf(_(" to retrieve WAL files
from archive\n"));
[...]
{"progress", no_argument, NULL, 'P'},
+   {"restore-target-wal", no_argument, NULL, 'c'},
It may be better to reorder that alphabetically.



Sure, I put it in order. However, the recent -R option is out of order 
too.




+   if (rc != 0)
+   /* Sanity check, should never happen. */
+   elog(ERROR, "failed to build restore_command due to missing
parameters");
No point in having this comment IMO.



I would prefer to keep it, since there are plenty of similar comments
near Asserts and elogs all over Postgres. Otherwise it may look like
a valid error state. It may be obvious now, but for someone who is not
aware of the BuildRestoreCommand refactoring it may not be. So from my
perspective there is nothing wrong with this extra one-line comment.




+/* logging support */
+#define pg_fatal(...) do { pg_log_fatal(__VA_ARGS__); exit(1); } 
while(0)

Actually, I don't think that it is a good idea to name this
pg_fatal(), as we have the same thing in pg_rewind, so it could be
confusing.



I have added explicit exit(1) calls, since pg_fatal was used only twice
in archive.c. Probably pg_log_fatal from common/logging should obey
the same logic as the FATAL log level in the backend and exit the
process, but for now including pg_rewind.h inside archive.c, or vice
versa, does not look like a solution.
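
So a former pg_fatal() call site in archive.c is now spelled out
explicitly, along these lines (sketch; the message text is illustrative):

    pg_log_fatal("could not restore file \"%s\" from archive", xlogfname);
    exit(1);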




-   while ((c = getopt_long(argc, argv, "D:nNPR", long_options,
&option_index)) != -1)
+   while ((c = getopt_long(argc, argv, "D:nNPRc", long_options,
&option_index)) != -1)
Alphabetical order here.



Done.



+   rmdir($node_master->archive_dir);
rmtree() is used in all our other tests.



Done. There was some unobvious logic there: rmdir only deletes empty
directories, which happens to be true for archive_dir in that test, but
I have unified it for consistency.




+   pg_log_error("archive file \"%s\" has wrong size: %lu
instead of %lu, %s",
+xlogfname, (unsigned long) 
stat_buf.st_size,
+(unsigned long) expectedSize, 
strerror(errno));

I think that the error message should be reworded: "unexpected WAL
file size for \"%s\": %lu instead of %lu".  Please note that there is
no need for strerror() here at all, as errno should be 0.

+if (xlogfd < 0)
+pg_log_error("could not open file \"%s\" restored from 
archive: %s\n",

+ xlogpath, strerror(errno));
[...]
+pg_log_error("could not stat file \"%s\" restored from archive: 
%s",

+xlogpath, strerror(errno));
No need for strerror() as you can just use %m.  And no need for the
extra newline at the end as pg_log_* routines do that by themselves.

+   pg_log_error("could not restore file \"%s\" from archive\n",
+xlogfname);
No need for a newline here.



Thanks, I have cleaned up these log statements.
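
After the cleanup the statements read roughly as follows (a sketch; %m
expands errno inside the pg_log_* routines, which also supply the newline):

    pg_log_error("could not open file \"%s\" restored from archive: %m",
                 xlogpath);

    pg_log_error("unexpected WAL file size for \"%s\": %lu instead of %lu",
                 xlogfname, (unsigned long) stat_buf.st_size,
                 (unsigned long) expectedSize);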


--
Alexey Kondratov

Postgres Professional https://www.postgrespro.com
The Russian Postgres Company

From ba20808ffddf3fe2eefe96d3385697fb6583ce9a Mon Sep 17 00:00:00 2001
From: Alexander Korotkov 
Date: Tue, 25 Feb 2020 02:22:45 +0300
Subject: [PATCH v20] pg_rewind: Add options to restore WAL files from archive

Currently, pg_rewind fails when it cannot find the required WAL files in the
target data directory.  One has to manually figure out which WAL files are
required and copy them back from the archive.

This commit implements new pg_rewind options, which allow pg_rewind to
automatically retrieve missing WAL files from archival storage. The
restore_command option is read from postgresql.conf.

Discussion: https://postgr.es/m/a3acff50-5a0d-9a2c-b3b2-ee36168955c1%40postgrespro.ru
Author: Alexey Kondratov
Reviewed-by: Michael Paquier, Andrey Borodin, Alvaro Herrera
Reviewed-by: Andres Freund, Alexander Korotkov
---
 doc/src/sgml/ref/pg_rewind.sgml  |  28 --
 src/backend/access/transam/xlogarchive.c |  60 ++--
 src/bin/pg_rewind/parsexlog.c|  33 ++-
 src/bin/pg_rewind/pg_rewind.c|  77 ++-
 src/bin/pg_rewind/pg_rewind.h|   6 +-
 src/bin/pg_rewind

Re: Allow CLUSTER, VACUUM FULL and REINDEX to change tablespace on the fly

2020-02-29 Thread Alexey Kondratov

On 2020-02-11 19:48, Justin Pryzby wrote:
For your v7 patch, which handles REINDEX to a new tablespace, I have a few
minor comments:
minor comments:

+ * the relation will be rebuilt.  If InvalidOid is used, the default

=> should say "currrent", not default ?



Yes, it keeps the current index tablespace in that case, thanks.



+++ b/doc/src/sgml/ref/reindex.sgml
+TABLESPACE
...
+class="parameter">new_tablespace


=> I saw you split the description of TABLESPACE from new_tablespace based
on a comment earlier in the thread, but I suggest that the descriptions for
these should be merged, like:

+   
+TABLESPACEnew_tablespace
+
+ 
+  Allow specification of a tablespace where all rebuilt indexes
will be created.
+  Cannot be used with "mapped" relations. If 
SCHEMA,

+  DATABASE or SYSTEM are
specified, then
+  all unsuitable relations will be skipped and a single
WARNING
+  will be generated.
+ 
+
+   



It sounds good to me, but here I just follow the structure that is used
all around: the documentation of ALTER TABLE/DATABASE, REINDEX, and many
others describes each literal/parameter in a separate entry, e.g.
new_tablespace. So I would prefer to keep it as it is for now.




The existing patch is very natural, especially the parts in the original
patch handling VACUUM FULL and CLUSTER.  Those were removed to concentrate
on REINDEX, and based on comments that it might be nice if ALTER handled
CLUSTER and VACUUM FULL.  On a separate thread (1), I brought up the idea
of ALTER using clustered order.  Tom pointed out some issues with my
implementation, but didn't like the idea, either.

So I suggest re-including the CLUSTER/VACUUM FULL parts as a separate 0002
patch, the same way they were originally implemented.

BTW, I think if "ALTER" were updated to support REINDEX (to allow multiple
operations at once), it might be either:
|ALTER INDEX i SET TABLESPACE , REINDEX -- to reindex a single index
on a given tablespace
or
|ALTER TABLE tbl REINDEX USING INDEX TABLESPACE spc; -- to reindex all
indexes on the table, with the indexes moved to a given tablespace
"USING INDEX TABLESPACE" is already used for ALTER..ADD column/table
CONSTRAINT.




Yes, I also think that allowing REINDEX/CLUSTER/VACUUM FULL to put the
resulting relation in a different tablespace is a very natural operation.
However, I made a couple of attempts to integrate the latter two with
ALTER TABLE and failed, since it is already complex enough. I am still
willing to proceed with it, but I am not sure how soon that will be.


Anyway, a new version is attached. It is rebased in order to resolve
conflicts with a recent fix for REINDEX CONCURRENTLY + temp relations,
and includes this small comment fix.



Regards
--
Alexey Kondratov

Postgres Professional https://www.postgrespro.com
The Russian Postgres Company
From d2b7a5fa2e11601759b47af0c142a7824ef907a2 Mon Sep 17 00:00:00 2001
From: Alexey Kondratov 
Date: Mon, 30 Dec 2019 20:00:37 +0300
Subject: [PATCH v8] Allow REINDEX to change tablespace

REINDEX already does a full relation rewrite; this patch adds the
possibility to specify a new tablespace where the new relfilenode
will be created.
---
 doc/src/sgml/ref/reindex.sgml | 24 +-
 src/backend/catalog/index.c   | 75 --
 src/backend/commands/cluster.c|  2 +-
 src/backend/commands/indexcmds.c  | 96 ---
 src/backend/commands/tablecmds.c  |  2 +-
 src/backend/nodes/copyfuncs.c |  1 +
 src/backend/nodes/equalfuncs.c|  1 +
 src/backend/parser/gram.y | 14 ++--
 src/backend/tcop/utility.c|  6 +-
 src/bin/psql/tab-complete.c   |  6 ++
 src/include/catalog/index.h   |  7 +-
 src/include/commands/defrem.h |  6 +-
 src/include/nodes/parsenodes.h|  1 +
 src/test/regress/input/tablespace.source  | 49 
 src/test/regress/output/tablespace.source | 66 
 15 files changed, 323 insertions(+), 33 deletions(-)

diff --git a/doc/src/sgml/ref/reindex.sgml b/doc/src/sgml/ref/reindex.sgml
index c54a7c420d4..0628c94bb1e 100644
--- a/doc/src/sgml/ref/reindex.sgml
+++ b/doc/src/sgml/ref/reindex.sgml
@@ -21,7 +21,7 @@ PostgreSQL documentation
 
  
 
-REINDEX [ ( option [, ...] ) ] { INDEX | TABLE | SCHEMA | DATABASE | SYSTEM } [ CONCURRENTLY ] name
+REINDEX [ ( option [, ...] ) ] { INDEX | TABLE | SCHEMA | DATABASE | SYSTEM } [ CONCURRENTLY ] name [ TABLESPACE new_tablespace ]
 
 where option can be one of:
 
@@ -174,6 +174,28 @@ REINDEX [ ( option [, ...] ) ] { IN
 

 
+   
+TABLESPACE
+
+ 
+  This specifies a tablespace, where all rebuilt indexes will be created.
+  Cannot be used with "mapped" relations. If SCHEMA,
+  DATABASE or SYSTEM is specified, then
+  all unsuitable relations will be skipped and a single WARNING
+  will be generated.
+ 

Re: [Patch] pg_rewind: options to use restore_command from recovery.conf or command line

2020-03-02 Thread Alexey Kondratov

On 2020-03-02 07:53, Michael Paquier wrote:




+ * For fixed-size files, the caller may pass the expected size as an
+ * additional crosscheck on successful recovery.  If the file size is 
not

+ * known, set expectedSize = 0.
+ */
+int
+RestoreArchivedWALFile(const char *path, const char *xlogfname,
+  off_t expectedSize, const char 
*restoreCommand)


Actually, expectedSize is IMO a bad idea, because any caller of this
routine passing down zero could be trapped with an incorrect file
size.  So let's remove the behavior where it is possible to bypass
this sanity check.  We don't need it in pg_rewind either.



OK, sounds reasonable, but just to be clear: I will remove only the
possibility of bypassing this sanity check (with 0), but leave the
expectedSize argument intact. We still need it, since pg_rewind takes
WalSegSz from the ControlFile and should pass it further, am I right?
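
I.e., the pg_rewind call site would keep passing the segment size along,
something like this sketch (names assumed from the quoted prototype):

    /* WalSegSz was read from the target's pg_control earlier, so it is
     * always a valid expected size here. */
    rc = RestoreArchivedWALFile(datadir_target, xlogfname,
                                WalSegSz, restore_command);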





+   /* Remove trailing newline */
+   if (strchr(cmd_output, '\n') != NULL)
+   *strchr(cmd_output, '\n') = '\0';


It seems to me that what you are looking for here is pg_strip_crlf().
Thinking harder, we have pipe_read_line() in src/common/exec.c, which
does the exact same job...



pg_strip_crlf fits well, but would you mind if I also made
pipe_read_line external in this patch?
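
With pg_strip_crlf() from src/common/string.c the manual strchr() dance
collapses into a single call, e.g.:

    /* Remove trailing newline(s) left over from the command output */
    (void) pg_strip_crlf(cmd_output);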





-   /*
-* construct the command to be executed
-*/


Perhaps you meant "build" here.



Actually, the verb 'construct' has historically been applied to
archive/restore commands (see also xlogarchive.c and pgarch.c), but it
should be 'build' in (fe_)archive.c, since we have BuildRestoreCommand
there now.


All other remarks are clear to me, so I will fix them in the next patch
version, thanks.



Regards
--
Alexey Kondratov

Postgres Professional https://www.postgrespro.com
The Russian Postgres Company




Re: [Patch] pg_rewind: options to use restore_command from recovery.conf or command line

2020-03-04 Thread Alexey Kondratov

On 04.03.2020 10:45, Michael Paquier wrote:

On Mon, Mar 02, 2020 at 08:59:49PM +0300, Alexey Kondratov wrote:

All other remarks are clear to me, so I will fix them in the next patch
version, thanks.

Already done as per the attached, with a new routine named
getRestoreCommand() and more done.


Many thanks for doing that. I went through the diff between v21 and v20. 
Most of the changes look good to me.


- *        Functions for finding and validating executable files
+ *        Functions for finding and validating from executables files

There is probably something missing here. Finding and validating what?
And 'executables files' does not seem correct either.


+        # First, remove all the content in the archive directory,
+        # as RecursiveCopy::copypath does not support copying to
+        # existing directories.

I think that 'remove all the content' is not completely correct in this
case. We are simply removing the archive directory. There is no content
there yet, so 'First, remove the archive directory...' should be fine.



- I did not actually get why you don't check for a missing command
when using wait_result_is_any_signal.  In this case I'd think that it
is better to exit immediately as follow-up calls would just fail.


Believe it or not, I put 'false' there intentionally. The idea was
that if the reason is a signal, then maybe the user got tired of waiting
and killed that restore_command process themselves, or something like
that, so it is better to exit immediately. If it was a missing command,
then there is no hurry, so we can go further and complain that the
attempt to recover the WAL segment has failed.


Actually, I guess there is no big difference whether we include a missing
command here or not. There is no complicated logic afterwards, compared
to the real recovery process in Postgres, where we cannot simply return
false in that case.



- The code was rather careless about error handling in
RestoreArchivedWALFile(), and it seemed to me that it is rather
pointless to report an extra message "could not restore file \"%s\"
from archive" on top of the other error.


Probably you mean the several pg_log_error calls not followed by 'return
-1;'. Yes, I did that to fall through to the end of the function and show
this extra message, but I agree that there is not much sense in doing so.



Regards

--
Alexey Kondratov

Postgres Professional https://www.postgrespro.com
Russian Postgres Company





Re: [Patch] pg_rewind: options to use restore_command from recovery.conf or command line

2020-03-05 Thread Alexey Kondratov

On 05.03.2020 09:24, Michael Paquier wrote:

On Wed, Mar 04, 2020 at 08:14:20PM +0300, Alexey Kondratov wrote:

- I did not actually get why you don't check for a missing command
when using wait_result_is_any_signal.  In this case I'd think that it
is better to exit immediately as follow-up calls would just fail.

Believe it or not, I put 'false' there intentionally. The idea was that if
the reason is a signal, then maybe the user got tired of waiting and killed
that restore_command process themselves, or something like that, so it is
better to exit immediately. If it was a missing command, then there is no
hurry, so we can go further and complain that the attempt to recover the
WAL segment has failed.

Actually, I guess there is no big difference whether we include a missing
command here or not. There is no complicated logic afterwards, compared to
the real recovery process in Postgres, where we cannot simply return false
in that case.

On the contrary, it seems to me that the difference is very important.
Imagine for example a frontend tool which calls RestoreArchivedWALFile
in a loop, and that this one fails because the command called is
missing.  This tool would keep looping for nothing.  So checking for a
missing command and leaving immediately would be more helpful for the
user.  Can you think about scenarios where it would make sense to be
able to loop in this case instead of failing?



OK, I still had pg_rewind in mind as the only user of this routine. Now
it is part of common, and I can imagine a hypothetical tool that polls
the archive, waiting for a specific WAL segment to become available. In
this case 'command not found' is definitely the end of the game, while
the absence of a segment is an expected error, so we can continue looping.
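
So the check ends up spelled roughly like this (sketch; passing 'true'
makes wait_result_is_any_signal() also report a missing command):

    /* Bail out right away if the child was killed by a signal or the
     * restore_command could not be found; retrying cannot succeed. */
    if (wait_result_is_any_signal(rc, true))
        return -1;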



Regards

--
Alexey Kondratov

Postgres Professional https://www.postgrespro.com
Russian Postgres Company





Re: Conflict handling for COPY FROM

2020-03-10 Thread Alexey Kondratov

On 09.03.2020 15:34, Surafel Temesgen wrote:


Okay, attached is a rebased patch with it.



+    Portal        portal = NULL;
...
+        portal = GetPortalByName("");
+        SetRemoteDestReceiverParams(dest, portal);

I think that you do not need this, since you are using a ready
DestReceiver. The whole idea of passing the DestReceiver down to
CopyFrom was to avoid that code. This unnamed portal is created in
exec_simple_query [1] and has already been set on the DestReceiver there
[2].


Maybe I am missing something, but I have just removed this code and 
everything works just fine.


[1] 
https://github.com/postgres/postgres/blob/0a42a2e9/src/backend/tcop/postgres.c#L1178


[2] 
https://github.com/postgres/postgres/blob/0a42a2e9/src/backend/tcop/postgres.c#L1226



Regards

--
Alexey Kondratov

Postgres Professional https://www.postgrespro.com
Russian Postgres Company





Re: [Patch] pg_rewind: options to use restore_command from recovery.conf or command line

2020-03-12 Thread Alexey Kondratov

On 12.03.2020 07:39, Michael Paquier wrote:

I'd like to commit the refactoring piece in 0001 tomorrow, then let's
move on with the rest as of 0002.  If more comments and docs are
needed for archive.c, let's continue discussing that.


I just went through both patches and realized that I cannot grasp the
semantics of splitting frontend code between common and fe_utils. This
applies only to 0002, where we introduce fe_archive.c. Should it be
placed into fe_utils alongside the recent recovery_gen.c, which is also
used by pg_rewind? This is frontend-only code intended to be used by
frontend applications, so fe_utils feels like the right place, doesn't
it? I just tried to do so and everything went fine, so it seems that
there are no obstacles from the build system.


BTW, most of 'common' is really common code, with only four exceptions
like logging.c, which is frontend-only. Are they there for historical
reasons only, or something else?



Regards

--
Alexey Kondratov

Postgres Professional https://www.postgrespro.com
Russian Postgres Company





Re: Allow CLUSTER, VACUUM FULL and REINDEX to change tablespace on the fly

2020-03-12 Thread Alexey Kondratov

Hi Justin,

On 09.03.2020 23:04, Justin Pryzby wrote:

On Sat, Feb 29, 2020 at 08:53:04AM -0600, Justin Pryzby wrote:

On Sat, Feb 29, 2020 at 03:35:27PM +0300, Alexey Kondratov wrote:

Anyway, new version is attached. It is rebased in order to resolve conflicts
with a recent fix of REINDEX CONCURRENTLY + temp relations, and includes
this small comment fix.

Thanks for rebasing - I actually started to do that yesterday.

I extracted the bits from your original 0001 patch which handled CLUSTER and
VACUUM FULL.  I don't think there's any interest in combining that with
ALTER anymore.  On another thread (1), I tried to implement that, and Tom
pointed out problems with the implementation, but also didn't like the idea.

I'm including some proposed fixes, but didn't yet update the docs, errors or
tests for that.  (I'm including your v8 untouched in hopes of not messing up
the cfbot).  My fixes avoid an issue if you try to REINDEX onto pg_default, I
think due to moving system toast indexes.

I was able to avoid this issue by adding a call to GetNewRelFileNode, even
though that's already called by RelationSetNewRelfilenode().  Not sure if
there's a better way, or if it's worth Alexey's v3 patch which added a
tablespace param to RelationSetNewRelfilenode.


Do you have any understanding of what exactly causes this error? I have
tried to debug it a little bit, but still cannot figure out why we need
this extra GetNewRelFileNode() call, or the mechanism by which it helps.


Probably you mean the v4 patch. Yes, interestingly, if we do everything at
once inside RelationSetNewRelfilenode(), then there is no issue at all with:

REINDEX DATABASE template1 TABLESPACE pg_default;

It feels like I am doing monkey coding here, so I want to understand
it better :)
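
For reference, the v3/v4 idea being compared against boils down to a
signature change along these lines (a hypothetical sketch, not the
committed API):

    /* v3/v4 approach: assign the new relfilenode and place it into the
     * requested tablespace in one step. */
    extern void RelationSetNewRelfilenode(Relation relation,
                                          char persistence,
                                          Oid newTablespaceOid);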



The current logic allows moving all the indexes and toast indexes, but I
think we should use IsSystemRelation() unless allow_system_table_mods is
set, like the existing behavior of ALTER.

template1=# ALTER TABLE pg_extension_oid_index SET tablespace pg_default;
ERROR:  permission denied: "pg_extension_oid_index" is a system catalog
template1=# REINDEX INDEX pg_extension_oid_index TABLESPACE pg_default;
REINDEX


Yeah, we definitely should obey the same rules as ALTER TABLE / INDEX in 
my opinion.
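
A sketch of what mirroring ALTER's behavior could look like in the REINDEX
path (illustrative only, reusing existing helpers):

    /* Refuse to move system catalogs unless allow_system_table_mods is
     * set, mirroring ALTER TABLE/INDEX ... SET TABLESPACE. */
    if (IsSystemRelation(iRel) && !allowSystemTableMods)
        ereport(ERROR,
                (errcode(ERRCODE_INSUFFICIENT_PRIVILEGE),
                 errmsg("permission denied: \"%s\" is a system catalog",
                        RelationGetRelationName(iRel))));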



Finally, I think the CLUSTER part is missing permission checks.  It looks
like relation_is_movable was factored out, but I don't see how that helps?


I did this relation_is_movable refactoring in order to share the same
check between REINDEX + TABLESPACE and ALTER INDEX + SET TABLESPACE.
Then I realized that REINDEX already has its own temp-table check and
validates mapped relations in multiple places, so I just added global
tablespace checks instead. Thus, relation_is_movable seems to be
outdated right now. Probably we will have to do another refactoring here
once all the proper validations have accumulated in this patch set.



Alexey, I'm hoping to hear back if you think these changes are ok or if you'll
publish a new version of the patch addressing the crash I reported.
Or if you're too busy, maybe someone else can adopt the patch (I can help).


Sorry for the late response; I was not going to abandon this patch, but
I was a bit busy last month.


Many thanks for your review and fixups! There are some inconsistencies,
like mentions of SET TABLESPACE in error messages and so on. I am going
to refactor and include your fixes 0003-0004 into 0001 and 0002, but
keep 0005 separate for now, since this part requires more understanding
IMO (and comparison with the v4 implementation).


This way, I am going to prepare a clearer patch set by the middle of
next week. I will be glad to receive more feedback from you then.



Regards

--
Alexey Kondratov

Postgres Professional https://www.postgrespro.com
Russian Postgres Company





Re: Allow CLUSTER, VACUUM FULL and REINDEX to change tablespace on the fly

2019-09-18 Thread Alexey Kondratov

Hi Surafel,

Thank you for looking at the patch!

On 17.09.2019 14:04, Surafel Temesgen wrote:
* There is a NOWAIT option in ALTER INDEX; is there a reason not to
have a similar option here?


Currently in Postgres, SET TABLESPACE always comes with a [ NOWAIT ]
option, so I think it is worth adding this option here for convenience.
Added in the new version.



* SET TABLESPACE command is not documented


Actually, the new_tablespace parameter was documented, but I've added a
more detailed section for SET TABLESPACE too.


* There are multiple checks for whether the relation is a temporary
table of another session, one in check_relation_is_movable and another
done independently


Yes, and there is a comment section in the code describing why. There is 
a repeatable bunch of checks for verification whether relation movable 
or not, so I put it into a separated function -- 
check_relation_is_movable. However, if we want to do only REINDEX, then 
some of them are excess, so the only one RELATION_IS_OTHER_TEMP is used. 
Thus, RELATION_IS_OTHER_TEMP is never executed twice, just different 
code paths.



*+ char *tablespacename;

Calling it new_tablespacename would make it consistent with other places.



OK, changed, although I don't think it is important, since this is the
only tablespace variable there.


* The patch didn't apply cleanly:
http://cfbot.cputube.org/patch_24_2269.log




The patch is rebased and attached with all the fixes described above.


Regards

--
Alexey Kondratov

Postgres Professional https://www.postgrespro.com
Russian Postgres Company

>From 7a19b1fd945502ad55f1fa9e61c3014d8715e404 Mon Sep 17 00:00:00 2001
From: Alexey Kondratov 
Date: Wed, 18 Sep 2019 15:22:04 +0300
Subject: [PATCH v2] Allow REINDEX and REINDEX CONCURRENTLY to SET TABLESPACE

---
 doc/src/sgml/ref/reindex.sgml |  25 +
 src/backend/catalog/index.c   | 109 ++
 src/backend/commands/cluster.c|   2 +-
 src/backend/commands/indexcmds.c  |  38 +---
 src/backend/commands/tablecmds.c  |  59 +++-
 src/backend/parser/gram.y |  29 --
 src/backend/tcop/utility.c|  22 -
 src/include/catalog/index.h   |   7 +-
 src/include/commands/defrem.h |   6 +-
 src/include/commands/tablecmds.h  |   2 +
 src/include/nodes/parsenodes.h|   2 +
 src/test/regress/input/tablespace.source  |  32 +++
 src/test/regress/output/tablespace.source |  44 +
 13 files changed, 308 insertions(+), 69 deletions(-)

diff --git a/doc/src/sgml/ref/reindex.sgml b/doc/src/sgml/ref/reindex.sgml
index 10881ab03a..192243e58f 100644
--- a/doc/src/sgml/ref/reindex.sgml
+++ b/doc/src/sgml/ref/reindex.sgml
@@ -22,6 +22,7 @@ PostgreSQL documentation
  
 
 REINDEX [ ( VERBOSE ) ] { INDEX | TABLE | SCHEMA | DATABASE | SYSTEM } [ CONCURRENTLY ] name
+REINDEX [ ( VERBOSE ) ] { INDEX | TABLE } name [ SET TABLESPACE new_tablespace [NOWAIT] ]
 
  
 
@@ -165,6 +166,30 @@ REINDEX [ ( VERBOSE ) ] { INDEX | TABLE | SCHEMA | DATABASE | SYSTEM } [ CONCURR
 

 
+   
+SET TABLESPACE
+
+ 
+  This specifies a tablespace, where all rebuilt indexes will be created.
+  Can be used only with REINDEX INDEX and
+  REINDEX TABLE, since the system indexes are not
+  movable, but SCHEMA, DATABASE or
+  SYSTEM very likely will have one.  If the
+  NOWAIT option is specified then the command will fail
+  if it is unable to acquire all of the locks required immediately.
+ 
+
+   
+
+   
+new_tablespace
+
+ 
+  The name of the specific tablespace to store rebuilt indexes.
+ 
+
+   
+

 VERBOSE
 
diff --git a/src/backend/catalog/index.c b/src/backend/catalog/index.c
index 54288a498c..715abfdf65 100644
--- a/src/backend/catalog/index.c
+++ b/src/backend/catalog/index.c
@@ -1194,7 +1194,8 @@ index_create(Relation heapRelation,
  * on.  This is called during concurrent reindex processing.
  */
 Oid
-index_concurrently_create_copy(Relation heapRelation, Oid oldIndexId, const char *newName)
+index_concurrently_create_copy(Relation heapRelation, Oid oldIndexId,
+			   Oid tablespaceOid, const char *newName)
 {
 	Relation	indexRelation;
 	IndexInfo  *oldInfo,
@@ -1324,7 +1325,7 @@ index_concurrently_create_copy(Relation heapRelation, Oid oldIndexId, const char
 			  newInfo,
 			  indexColNames,
 			  indexRelation->rd_rel->relam,
-			  indexRelation->rd_rel->reltablespace,
+			  tablespaceOid ? tablespaceOid : indexRelation->rd_rel->reltablespace,
 			  indexRelation->rd_indcollation,
 			  indclass->values,
 			  indcoloptions->values,
@@ -3297,16 +3298,22 @@ IndexGetRelation(Oid indexId, bool missing_ok)
  * reindex_index - This routine is used to recreate a single index
  */
 void
-reindex_index(Oid indexId, bool skip_constraint_checks, char persistence,
+reind

Re: Allow CLUSTER, VACUUM FULL and REINDEX to change tablespace on the fly

2019-09-19 Thread Alexey Kondratov

Hi Michael,

Thank you for your comments.

On 19.09.2019 7:43, Michael Paquier wrote:

On Wed, Sep 18, 2019 at 03:46:20PM +0300, Alexey Kondratov wrote:

Currently in Postgres, SET TABLESPACE always comes with a [ NOWAIT ] option,
so I think it is worth adding this option here for convenience. Added in the
new version.

It seems to me that it would be good to keep the patch as simple as
possible for its first version, and split it into two if you would
like to add this new option instead of bundling both together.  This
makes the review of one and the other more simple.


OK, that makes sense. I would also prefer the first patch to be as simple
as possible, but adding this NOWAIT option required only a few dozen
lines, so I just bundled everything together. Anyway, I will split the
patches if we decide to keep the [ SET TABLESPACE ... [NOWAIT] ] grammar.



Anyway, regarding
the grammar, is SET TABLESPACE really our best choice here?  What
about:
- TABLESPACE = foo, in parenthesis only?
- Only using TABLESPACE, without SET at the end of the query?

SET is used in ALTER TABLE per the set of subqueries available there,
but that's not the case of REINDEX.


I like the SET TABLESPACE grammar, because it already exists and is used
in both ALTER TABLE and ALTER INDEX. Thus, if we ever add 'ALTER INDEX
index_name REINDEX SET TABLESPACE' (as was proposed earlier in the
thread), then it will be consistent with 'REINDEX index_name SET
TABLESPACE'. If we use just plain TABLESPACE, then it may be misleading
in the following cases:


- REINDEX TABLE table_name TABLESPACE tablespace_name
- REINDEX (TABLESPACE = tablespace_name) TABLE table_name

since it may mean 'reindex all indexes of table_name that are stored in
tablespace_name', couldn't it?


However, I have rather limited experience with Postgres, so I don't
insist.



+-- check that all relations moved to new tablespace
+SELECT relname FROM pg_class
+WHERE reltablespace=(SELECT oid FROM pg_tablespace WHERE
spcname='regress_tblspace')
+AND relname IN ('regress_tblspace_test_tbl_idx');
+relname
+---
+ regress_tblspace_test_tbl_idx
+(1 row)
Just to check one relation you could use \d with the relation (index
or table) name.


Yes, \d outputs the tablespace name if it differs from pg_default, but it
also shows other information, which is not necessary here. Also, its
output is more likely to change later, which may lead to failing tests.
This query's output is more or less stable, and new relations can easily
be added to the tests if we ever add tablespace change to CLUSTER/VACUUM
FULL. I can change the test to use \d, but I am not sure it would reduce
the test output length or be helpful for future tests.



-   if (RELATION_IS_OTHER_TEMP(iRel))
-   ereport(ERROR,
-   (errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
-errmsg("cannot reindex temporary tables of other
-   sessions")))
I would keep the order of this operation in order with
CheckTableNotInUse().


Sure, I hadn't noticed that I had reordered these operations, thanks.


--
Alexey Kondratov

Postgres Professional https://www.postgrespro.com
Russian Postgres Company





Re: Allow CLUSTER, VACUUM FULL and REINDEX to change tablespace on the fly

2019-09-19 Thread Alexey Kondratov

On 19.09.2019 16:21, Robert Haas wrote:

On Thu, Sep 19, 2019 at 12:43 AM Michael Paquier  wrote:

It seems to me that it would be good to keep the patch as simple as
possible for its first version, and split it into two if you would
like to add this new option instead of bundling both together.  This
makes the review of one and the other more simple.  Anyway, regarding
the grammar, is SET TABLESPACE really our best choice here?  What
about:
- TABLESPACE = foo, in parenthesis only?
- Only using TABLESPACE, without SET at the end of the query?

SET is used in ALTER TABLE per the set of subqueries available there,
but that's not the case of REINDEX.

So, earlier in this thread, I suggested making this part of ALTER
TABLE, and several people seemed to like that idea. Did we have a
reason for dropping that approach?


If we add this option to REINDEX, then in 'ALTER TABLE tb_name action1,
REINDEX SET TABLESPACE tbsp_name, action3' the second action will just be
a direct alias for 'REINDEX TABLE tb_name SET TABLESPACE tbsp_name'. So
it seems practical to do this for REINDEX first.


The only concern I have about adding REINDEX to ALTER TABLE in this
context is that it will allow the user to write such a chimera:


ALTER TABLE tb_name REINDEX SET TABLESPACE tbsp_name, SET TABLESPACE tbsp_name;


when they want to move both the table and all the indexes, because the simple

ALTER TABLE tb_name REINDEX, SET TABLESPACE tbsp_name;

looks ambiguous. Should it change the tablespace of the table, the indexes, or both?


--
Alexey Kondratov

Postgres Professional https://www.postgrespro.com
Russian Postgres Company





Re: Conflict handling for COPY FROM

2019-09-20 Thread Alexey Kondratov

Hi Surafel,

On 16.07.2019 10:08, Surafel Temesgen wrote:

I also added an option to ignore all errors, with ERROR set to -1.


Great!


The patch still applies cleanly (tested on e1c8743e6c), but I've run
into some problems with more elaborate tests.


First of all, there is definitely a problem with the grammar. In the docs,
ERROR is defined as an option, and


COPY test FROM '/path/to/copy-test-simple.csv' ERROR -1;

works just fine, but if the modern 'WITH (...)' syntax is used:

COPY test FROM '/path/to/copy-test-simple.csv' WITH (ERROR -1);
ERROR:  option "error" not recognized

while with 'WITH (error_limit -1)' it works again.

This happens because COPY supports both the modern and the very old syntax:

* In the preferred syntax the options are comma-separated
* and use generic identifiers instead of keywords.  The pre-9.0
* syntax had a hard-wired, space-separated set of options.

So I see several options here:

1) Everything is left as is, but then the docs should be updated to
reflect that error_limit is required for the modern syntax.


2) However, why do we have to support the old syntax here? I guess it
exists for backward compatibility only, but this is a completely new
feature. So maybe just 'WITH (error_limit 42)' would be enough?


3) You may also simply change the internal option name from 'error_limit'
to 'error', or the SQL keyword from 'ERROR' to 'ERROR_LIMIT'.


I would prefer the second option.


Next, you use DestRemoteSimple for returning conflicting tuples back:

+        dest = CreateDestReceiver(DestRemoteSimple);
+        dest->rStartup(dest, (int) CMD_SELECT, tupDesc);

However, printsimple supports a very limited subset of built-in types, so

CREATE TABLE large_test (id integer primary key, num1 bigint, num2 
double precision);

COPY large_test FROM '/path/to/copy-test.tsv';
COPY large_test FROM '/path/to/copy-test.tsv' ERROR 3;

fails with the following error, 'ERROR:  unsupported type OID: 701', which
seems very confusing from the end user's perspective. I've tried to
switch to DestRemote, but couldn't figure it out quickly.
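
For context, the error comes from printsimple(), whose type support is a
hard-wired switch of roughly this shape (paraphrased, not the verbatim
backend source), so float8 (OID 701) falls through to the default branch:

    switch (attr->atttypid)
    {
        case TEXTOID:
        case INT4OID:
        case INT8OID:
            /* ... a few more OIDs: convert to text and send ... */
            break;
        default:
            elog(ERROR, "unsupported type OID: %u", attr->atttypid);
    }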



Finally, I simply cannot make sense of this validation:

+        else if (strcmp(defel->defname, "error_limit") == 0)
+        {
+            if (cstate->ignore_error)
+                ereport(ERROR,
+                        (errcode(ERRCODE_SYNTAX_ERROR),
+                         errmsg("conflicting or redundant options"),
+                         parser_errposition(pstate, defel->location)));
+            cstate->error_limit = defGetInt64(defel);
+            cstate->ignore_error = true;
+            if (cstate->error_limit == -1)
+                cstate->ignore_all_error = true;
+        }

If cstate->ignore_error is set, then we have already processed the
options list, since this is the only place where it is set. So we should
never get into this ereport, should we?



Regards

--
Alexey Kondratov

Postgres Professional https://www.postgrespro.com
Russian Postgres Company





Re: Allow CLUSTER, VACUUM FULL and REINDEX to change tablespace on the fly

2019-09-24 Thread Alexey Kondratov

On 20.09.2019 19:38, Alvaro Herrera wrote:

On 2019-Sep-19, Robert Haas wrote:


So, earlier in this thread, I suggested making this part of ALTER
TABLE, and several people seemed to like that idea. Did we have a
reason for dropping that approach?

Hmm, my own reading of that was to add tablespace changing abilities to
ALTER TABLE *in addition* to this patch, not instead of it.


That was my understanding too.

On 20.09.2019 11:26, Jose Luis Tallon wrote:

On 20/9/19 4:06, Michael Paquier wrote:

Personally, I don't find this idea very attractive as ALTER TABLE is
already complicated enough with all the subqueries we already support
in the command, all the logic we need to maintain to make combinations
of those subqueries in a minimum number of steps, and also the number
of bugs we have seen because of the amount of complication present.


Yes, but please keep the other options: As it is, CLUSTER, VACUUM FULL
and REINDEX already rewrite the table in full; being able to write the
result to a different tablespace than the one the original object was
stored in enables a whole world of very interesting possibilities,
including a quick way out of a "so little disk space available that
vacuum won't work properly" situation --- which I'm sure MANY users
will appreciate, including me.


Yes, sure, that was my main motivation. The first message in the thread
contains a patch which adds SET TABLESPACE support to all of CLUSTER,
VACUUM FULL and REINDEX. However, an idea came up to integrate
CLUSTER/VACUUM FULL with ALTER TABLE and do their work plus all the ALTER
TABLE stuff in a single table rewrite. I've dug into this a little bit
and ended up with some architectural questions and concerns [1]. So I
decided to start with a simple REINDEX patch.


Anyway, I've followed Michael's advice and split the last patch into two:

1) Adds all the main functionality, but with a simplified 'REINDEX INDEX [
CONCURRENTLY ] ... [ TABLESPACE ... ]' grammar;


2) Adds a more sophisticated syntax with '[ SET TABLESPACE ... [ NOWAIT 
] ]'.


Patch 1 contains all the docs and tests and may be applied/committed 
separately or together with 2, which is fully optional.


Recent merge conflicts and the reindex_index validation order are also
fixed in the attached version.


[1] 
https://www.postgresql.org/message-id/6b2a5c4de19f111ef24b63428033bb67%40postgrespro.ru



Regards

--
Alexey Kondratov

Postgres Professional https://www.postgrespro.com
Russian Postgres Company

>From 4f06996f1e86dee389cb0f901cb83dba77c2abd8 Mon Sep 17 00:00:00 2001
From: Alexey Kondratov 
Date: Tue, 24 Sep 2019 12:29:57 +0300
Subject: [PATCH v3 1/2] Allow REINDEX and REINDEX CONCURRENTLY to change
 TABLESPACE

---
 doc/src/sgml/ref/reindex.sgml | 23 ++
 src/backend/catalog/index.c   | 99 ---
 src/backend/commands/cluster.c|  2 +-
 src/backend/commands/indexcmds.c  | 34 +---
 src/backend/commands/tablecmds.c  | 59 --
 src/backend/parser/gram.y | 21 +++--
 src/backend/tcop/utility.c| 16 +++-
 src/include/catalog/index.h   |  7 +-
 src/include/commands/defrem.h |  6 +-
 src/include/commands/tablecmds.h  |  2 +
 src/include/nodes/parsenodes.h|  1 +
 src/test/regress/input/tablespace.source  | 31 +++
 src/test/regress/output/tablespace.source | 41 ++
 13 files changed, 279 insertions(+), 63 deletions(-)

diff --git a/doc/src/sgml/ref/reindex.sgml b/doc/src/sgml/ref/reindex.sgml
index 10881ab03a..96c9363ad9 100644
--- a/doc/src/sgml/ref/reindex.sgml
+++ b/doc/src/sgml/ref/reindex.sgml
@@ -22,6 +22,7 @@ PostgreSQL documentation
  
 
 REINDEX [ ( VERBOSE ) ] { INDEX | TABLE | SCHEMA | DATABASE | SYSTEM } [ CONCURRENTLY ] name
+REINDEX [ ( VERBOSE ) ] { INDEX | TABLE } [ CONCURRENTLY ] name [ TABLESPACE new_tablespace ]
 
  
 
@@ -165,6 +166,28 @@ REINDEX [ ( VERBOSE ) ] { INDEX | TABLE | SCHEMA | DATABASE | SYSTEM } [ CONCURR
 

 
+   
+TABLESPACE
+
+ 
+  This specifies a tablespace, where all rebuilt indexes will be created.
+  Can be used only with REINDEX INDEX and
+  REINDEX TABLE, since the system indexes are not
+  movable, but SCHEMA, DATABASE or
+  SYSTEM very likely will have one. 
+ 
+
+   
+
+   
+new_tablespace
+
+ 
+  The name of the specific tablespace to store rebuilt indexes.
+ 
+
+   
+

 VERBOSE
 
diff --git a/src/backend/catalog/index.c b/src/backend/catalog/index.c
index 098732cc4a..b2fed5dc75 100644
--- a/src/backend/catalog/index.c
+++ b/src/backend/catalog/index.c
@@ -1239,7 +1239,8 @@ index_create(Relation heapRelation,
  * on.  This is called during concurrent reindex processing.
  */
 Oid
-index_concurrently_create_copy(Relation heapRelation, Oid oldIndexId, const char *newName)
+index_concurrently_create_copy(Rela

Re: [Patch] pg_rewind: options to use restore_command from recovery.conf or command line

2019-09-26 Thread Alexey Kondratov

On 01.08.2019 19:53, Alexey Kondratov wrote:

On 26.07.2019 20:43, Liudmila Mantrova wrote:
On a more general note, I wonder if everyone is happy with the 
--using-postgresql-conf option name, or we should continue searching 
for a narrower term. Unfortunately, I don't have any better 
suggestions right now, but I believe it should be clear that its 
purpose is to fetch missing WAL files for target. What do you think?




I don't like it either, but this was my best guess at the time. Maybe
--restore-target-wal instead of --using-postgresql-conf would be
better? And --target-restore-command instead of --restore-command, if
we want to indicate that this is the restore_command for the target server?




As Alvaro correctly pointed out in the nearby thread [1], we've got a
conflict regarding the -R command line argument. I agree that it's a
good idea to reserve -R for writing the recovery configuration, to be
consistent with pg_basebackup, so I've updated my patch to use other
letters:


1. -c/--restore-target-wal --- to use restore_command from postgresql.conf
2. -C/--target-restore-command --- to pass restore_command as a command 
line argument


An updated and rebased patch is attached. However, now I'm wondering: do we
actually need 1. as a separate option rather than having it enabled by
default? I cannot imagine a situation where restore_command is set in
postgresql.conf and someone prefers pg_rewind to fail instead of fetching
the missing WALs automatically, but maybe there are such cases?



[1] 
https://www.postgresql.org/message-id/20190925174812.GA4916%40alvherre.pgsql



--
Alexey Kondratov

Postgres Professional https://www.postgrespro.com
Russian Postgres Company

>From d7e1041c756b79e6e4636be1b0337453db8a7457 Mon Sep 17 00:00:00 2001
From: Alexey Kondratov 
Date: Tue, 19 Feb 2019 19:14:53 +0300
Subject: [PATCH v10] pg_rewind: options to use restore_command from command
 line or cluster config

Previously, when pg_rewind could not find the required WAL files in the
target data directory, the rewind process would fail. One had to
manually figure out which of the required WAL files had already been
moved to archival storage and copy them back.

This patch adds the possibility to specify restore_command via a command
line option, or to use the one specified inside postgresql.conf. The
specified restore_command will be used for automatic retrieval of missing
WAL files from archival storage.
---
 doc/src/sgml/ref/pg_rewind.sgml   |  49 +++-
 src/bin/pg_rewind/parsexlog.c | 164 +-
 src/bin/pg_rewind/pg_rewind.c | 112 +++---
 src/bin/pg_rewind/pg_rewind.h |   6 +-
 src/bin/pg_rewind/t/001_basic.pl  |   4 +-
 src/bin/pg_rewind/t/002_databases.pl  |   4 +-
 src/bin/pg_rewind/t/003_extrafiles.pl |   4 +-
 src/bin/pg_rewind/t/RewindTest.pm |  84 -
 8 files changed, 396 insertions(+), 31 deletions(-)

diff --git a/doc/src/sgml/ref/pg_rewind.sgml b/doc/src/sgml/ref/pg_rewind.sgml
index ac142d22fc..27c662cc83 100644
--- a/doc/src/sgml/ref/pg_rewind.sgml
+++ b/doc/src/sgml/ref/pg_rewind.sgml
@@ -66,11 +66,12 @@ PostgreSQL documentation
can be found either on the target timeline, the source timeline, or their common
ancestor. In the typical failover scenario where the target cluster was
shut down soon after the divergence, this is not a problem, but if the
-   target cluster ran for a long time after the divergence, the old WAL
-   files might no longer be present. In that case, they can be manually
-   copied from the WAL archive to the pg_wal directory, or
-   fetched on startup by configuring  or
-   .  The use of
+   target cluster ran for a long time after the divergence, its old WAL
+   files might no longer be present. In this case, you can manually copy them
+   from the WAL archive to the pg_wal directory, or run
+   pg_rewind with the -c or
+   -C option to automatically retrieve them from the WAL
+   archive. The use of
pg_rewind is not limited to failover, e.g.  a standby
server can be promoted, run some write transactions, and then rewinded
to become a standby again.
@@ -202,6 +203,39 @@ PostgreSQL documentation
   
  
 
+ 
+  -c
+  --restore-target-wal
+  
+   
+Use the restore_command defined in
+postgresql.conf to retrieve WAL files from
+the WAL archive if these files are no longer available in the
+pg_wal directory of the target cluster.
+   
+   
+This option cannot be used together with --target-restore-command.
+   
+  
+ 
+
+ 
+  -C restore_command
+  --target-restore-command=restore_command
+  
+   
+Specifies the restore_command to use for retrieving
+WAL files from the WAL archive if these files are no longer available
+in the pg_wal directory of the target cluster.
+   
+   
+If restore_command is already set in
+postgresql.conf, you c

Re: Two pg_rewind patches (auto generate recovery conf and ensure clean shutdown)

2019-09-27 Thread Alexey Kondratov

On 27.09.2019 6:27, Paul Guo wrote:



Secondarily, I see no reason to test connstr_source rather than just
"conn" in the other patch; doing it the other way is more natural,
since
it's that thing that's tested as an argument.

pg_rewind.c: Please put the new #include line keeping the alphabetical
order.


Agreed to the above suggestions. I attached the v9.



I went through the remaining two patches and they seem to be very clear 
and concise. However, there are two points I could complain about:


1) Maybe I've missed it somewhere in the thread above, but currently
pg_rewind allows itself to be run with -R and --source-pgdata. In that
case the -R option is just swallowed, and neither standby.signal nor
postgresql.auto.conf is written, which is reasonable, though. Should it
be stated somewhere in the docs that the -R option always has to go
together with --source-server? Or should pg_rewind notify the user that
the options are incompatible and no recovery configuration will be written?


2) Are you going to leave the -R option completely without TAP tests?
Attached is a small patch that tests the -R option along with the existing
'remote' case. If needed, it may be split into two separate cases. First,
it tests that pg_rewind is able to succeed with minimal permissions, per
Michael's patch d9f543e [1]. Next, it checks for the presence of
standby.signal and adds the REPLICATION permission to rewind_user to test
that the new standby is able to start with the generated recovery
configuration.


[1] 
https://github.com/postgres/postgres/commit/d9f543e9e9be15f92abdeaf870e57ef289020191



Regards

--
Alexey Kondratov

Postgres Professional https://www.postgrespro.com
Russian Postgres Company

>From 8c607794f259cd4dec0fa6172b69d62e6468bee3 Mon Sep 17 00:00:00 2001
From: Alexey Kondratov 
Date: Fri, 27 Sep 2019 14:30:57 +0300
Subject: [PATCH v9 3/3] Test new standby start with generated config during
 pg_rewind remote

---
 src/bin/pg_rewind/t/001_basic.pl   |  2 +-
 src/bin/pg_rewind/t/002_databases.pl   |  2 +-
 src/bin/pg_rewind/t/003_extrafiles.pl  |  2 +-
 src/bin/pg_rewind/t/004_pg_xlog_symlink.pl |  2 +-
 src/bin/pg_rewind/t/RewindTest.pm  | 11 ++-
 5 files changed, 14 insertions(+), 5 deletions(-)

diff --git a/src/bin/pg_rewind/t/001_basic.pl b/src/bin/pg_rewind/t/001_basic.pl
index 115192170e..c3293e93df 100644
--- a/src/bin/pg_rewind/t/001_basic.pl
+++ b/src/bin/pg_rewind/t/001_basic.pl
@@ -1,7 +1,7 @@
 use strict;
 use warnings;
 use TestLib;
-use Test::More tests => 10;
+use Test::More tests => 11;
 
 use FindBin;
 use lib $FindBin::RealBin;
diff --git a/src/bin/pg_rewind/t/002_databases.pl b/src/bin/pg_rewind/t/002_databases.pl
index f1eb4fe1d2..1db534c0dc 100644
--- a/src/bin/pg_rewind/t/002_databases.pl
+++ b/src/bin/pg_rewind/t/002_databases.pl
@@ -1,7 +1,7 @@
 use strict;
 use warnings;
 use TestLib;
-use Test::More tests => 6;
+use Test::More tests => 7;
 
 use FindBin;
 use lib $FindBin::RealBin;
diff --git a/src/bin/pg_rewind/t/003_extrafiles.pl b/src/bin/pg_rewind/t/003_extrafiles.pl
index c4040bd562..f4710440fc 100644
--- a/src/bin/pg_rewind/t/003_extrafiles.pl
+++ b/src/bin/pg_rewind/t/003_extrafiles.pl
@@ -3,7 +3,7 @@
 use strict;
 use warnings;
 use TestLib;
-use Test::More tests => 4;
+use Test::More tests => 5;
 
 use File::Find;
 
diff --git a/src/bin/pg_rewind/t/004_pg_xlog_symlink.pl b/src/bin/pg_rewind/t/004_pg_xlog_symlink.pl
index ed1ddb6b60..639eeb9c91 100644
--- a/src/bin/pg_rewind/t/004_pg_xlog_symlink.pl
+++ b/src/bin/pg_rewind/t/004_pg_xlog_symlink.pl
@@ -14,7 +14,7 @@ if ($windows_os)
 }
 else
 {
-	plan tests => 4;
+	plan tests => 5;
 }
 
 use FindBin;
diff --git a/src/bin/pg_rewind/t/RewindTest.pm b/src/bin/pg_rewind/t/RewindTest.pm
index 68b6004e94..fcc48cb1d9 100644
--- a/src/bin/pg_rewind/t/RewindTest.pm
+++ b/src/bin/pg_rewind/t/RewindTest.pm
@@ -266,9 +266,18 @@ sub run_pg_rewind
 			[
 'pg_rewind',  "--debug",
 "--source-server",$standby_connstr,
-"--target-pgdata=$master_pgdata", "--no-sync"
+"--target-pgdata=$master_pgdata", "-R", "--no-sync"
 			],
 			'pg_rewind remote');
+
+		# Check that standby.signal has been created.
+		ok(-e "$master_pgdata/standby.signal");
+
+		# Now, when pg_rewind apparently succeeded with minimal permissions,
+		# add REPLICATION privilege.  So we could test that new standby
+		# is able to connect to the new master with generated config.
+		$node_standby->psql(
+			'postgres', "ALTER ROLE rewind_user WITH REPLICATION;");
 	}
 	else
 	{
-- 
2.17.1



Re: Two pg_rewind patches (auto generate recovery conf and ensure clean shutdown)

2019-09-30 Thread Alexey Kondratov

On 27.09.2019 17:28, Alvaro Herrera wrote:



+   # Now, when pg_rewind apparently succeeded with minimal 
permissions,
+   # add REPLICATION privilege.  So we could test that new standby
+   # is able to connect to the new master with generated config.
+   $node_standby->psql(
+   'postgres', "ALTER ROLE rewind_user WITH REPLICATION;");

I think this better use safe_psql.



Yes, indeed.

On 30.09.2019 10:07, Paul Guo wrote:


2) Are you going to leave the -R option completely without TAP tests?
Attached is a small patch, which tests the -R option along with the
existing 'remote' case. If needed, it may be split into two separate
cases. First, it tests that pg_rewind is able to succeed with minimal
permissions according to Michael's patch d9f543e [1]. Next, it checks
the presence of standby.signal and adds the REPLICATION permission to
rewind_user to test that the new standby is able to start with the
generated recovery configuration.

[1]

https://github.com/postgres/postgres/commit/d9f543e9e9be15f92abdeaf870e57ef289020191

It seems that we could further disable the recovery info setting code
for the 'remote' test case?


-   my $port_standby = $node_standby->port;
-   $node_master->append_conf(
-       'postgresql.conf', qq(
-primary_conninfo='port=$port_standby'
-));
+   if ($test_mode ne "remote")
+   {
+       my $port_standby = $node_standby->port;
+       $node_master->append_conf(
+           'postgresql.conf',
+           qq(primary_conninfo='port=$port_standby'));

-   $node_master->set_standby_mode();
+       $node_master->set_standby_mode();
+   }




Yeah, it makes sense. It is excessive for the remote case if we add '-R'
there. I've updated and attached my test-adding patch.




--
Alexey Kondratov

Postgres Professional https://www.postgrespro.com
Russian Postgres Company

>From b38bc7d71f7e7d68d66d3bf9af4e6371445aeab2 Mon Sep 17 00:00:00 2001
From: Alexey Kondratov 
Date: Fri, 27 Sep 2019 14:30:57 +0300
Subject: [PATCH v10 3/3] Test new standby start with generated config during
 pg_rewind remote

---
 src/bin/pg_rewind/t/001_basic.pl   |  2 +-
 src/bin/pg_rewind/t/002_databases.pl   |  2 +-
 src/bin/pg_rewind/t/003_extrafiles.pl  |  2 +-
 src/bin/pg_rewind/t/004_pg_xlog_symlink.pl |  2 +-
 src/bin/pg_rewind/t/RewindTest.pm  | 27 +++---
 5 files changed, 23 insertions(+), 12 deletions(-)

diff --git a/src/bin/pg_rewind/t/001_basic.pl b/src/bin/pg_rewind/t/001_basic.pl
index 115192170e..c3293e93df 100644
--- a/src/bin/pg_rewind/t/001_basic.pl
+++ b/src/bin/pg_rewind/t/001_basic.pl
@@ -1,7 +1,7 @@
 use strict;
 use warnings;
 use TestLib;
-use Test::More tests => 10;
+use Test::More tests => 11;
 
 use FindBin;
 use lib $FindBin::RealBin;
diff --git a/src/bin/pg_rewind/t/002_databases.pl b/src/bin/pg_rewind/t/002_databases.pl
index f1eb4fe1d2..1db534c0dc 100644
--- a/src/bin/pg_rewind/t/002_databases.pl
+++ b/src/bin/pg_rewind/t/002_databases.pl
@@ -1,7 +1,7 @@
 use strict;
 use warnings;
 use TestLib;
-use Test::More tests => 6;
+use Test::More tests => 7;
 
 use FindBin;
 use lib $FindBin::RealBin;
diff --git a/src/bin/pg_rewind/t/003_extrafiles.pl b/src/bin/pg_rewind/t/003_extrafiles.pl
index c4040bd562..f4710440fc 100644
--- a/src/bin/pg_rewind/t/003_extrafiles.pl
+++ b/src/bin/pg_rewind/t/003_extrafiles.pl
@@ -3,7 +3,7 @@
 use strict;
 use warnings;
 use TestLib;
-use Test::More tests => 4;
+use Test::More tests => 5;
 
 use File::Find;
 
diff --git a/src/bin/pg_rewind/t/004_pg_xlog_symlink.pl b/src/bin/pg_rewind/t/004_pg_xlog_symlink.pl
index ed1ddb6b60..639eeb9c91 100644
--- a/src/bin/pg_rewind/t/004_pg_xlog_symlink.pl
+++ b/src/bin/pg_rewind/t/004_pg_xlog_symlink.pl
@@ -14,7 +14,7 @@ if ($windows_os)
 }
 else
 {
-	plan tests => 4;
+	plan tests => 5;
 }
 
 use FindBin;
diff --git a/src/bin/pg_rewind/t/RewindTest.pm b/src/bin/pg_rewind/t/RewindTest.pm
index 68b6004e94..2b45c2789c 100644
--- a/src/bin/pg_rewind/t/RewindTest.pm
+++ b/src/bin/pg_rewind/t/RewindTest.pm
@@ -149,7 +149,7 @@ sub start_master
 
 	# Create custom role which is used to run pg_rewind, and adjust its
 	# permissions to the minimum necessary.
-	$node_master->psql(
+	$node_master->safe_psql(
 		'postgres', "
 		CREATE ROLE rewind_user LOGIN;
 		GRANT EXECUTE ON function pg_catalog.pg_ls_dir(text, boolean, boolean)
@@ -266,9 +266,18 @@ sub run_pg_rewind
 			[
 'pg_rewind',  "--debug",
 "--source-server",$standby_connstr,
-"--target-pgdata=$master_pgdata", "--no-sync"
+"--target-pgdata=$master_pgdata", "-R", "--no-sync"
 			],
 			'pg_rewind remote');
+
+		# Check

Re: Two pg_rewind patches (auto generate recovery conf and ensure clean shutdown)

2019-10-02 Thread Alexey Kondratov

Hi Alvaro,

On 30.09.2019 20:13, Alvaro Herrera wrote:

OK, I pushed this patch as well as Alexey's test patch.  It all works
for me, and the coverage report shows that we're doing the new thing ...
though only in the case that rewind *is* required.  There is no test to
verify the case where rewind is *not* required.  I guess it'd also be
good to test the case when we throw the new error, if only for
completeness ...


I've directly followed your guess and tried to extend the pg_rewind test
cases and... it seems I've caught a few bugs:


1) --dry-run actually wasn't completely 'dry'. It did update the target
control file, which could cause subsequent pg_rewind calls to fail after
dry-run ones.


2) The --no-ensure-shutdown flag was broken: it simply didn't turn off
this new feature.


3) --write-recovery-conf didn't obey the --dry-run flag.

Thus, it was definitely a good idea to add new tests. Two patches are 
attached:


1) The first one fixes all the issues above;

2) The second one slightly increases pg_rewind's overall code coverage
from 74% to 78.6%.


Should I put this fix on the next commitfest?


Regards

--
Alexey Kondratov

Postgres Professional https://www.postgrespro.com
Russian Postgres Company


P.S. My apologies for missing two of these bugs during review.

>From 7286e31ab0ebf50bb4ab460dd81b82f1c5989272 Mon Sep 17 00:00:00 2001
From: Alexey Kondratov 
Date: Wed, 2 Oct 2019 19:24:46 +0300
Subject: [PATCH v1 1/2] Fix functionality of pg_rewind --dry-run and
 --no-ensure-shutdown options

Branch: pg-rewind-fixes
---
 src/bin/pg_rewind/pg_rewind.c | 8 +---
 1 file changed, 5 insertions(+), 3 deletions(-)

diff --git a/src/bin/pg_rewind/pg_rewind.c b/src/bin/pg_rewind/pg_rewind.c
index a7fd9e0cab..1a7fb5242b 100644
--- a/src/bin/pg_rewind/pg_rewind.c
+++ b/src/bin/pg_rewind/pg_rewind.c
@@ -101,7 +101,7 @@ main(int argc, char **argv)
 		{"write-recovery-conf", no_argument, NULL, 'R'},
 		{"source-pgdata", required_argument, NULL, 1},
 		{"source-server", required_argument, NULL, 2},
-		{"no-ensure-shutdown", no_argument, NULL, 44},
+		{"no-ensure-shutdown", no_argument, NULL, 4},
 		{"version", no_argument, NULL, 'V'},
 		{"dry-run", no_argument, NULL, 'n'},
 		{"no-sync", no_argument, NULL, 'N'},
@@ -435,13 +435,15 @@ main(int argc, char **argv)
 	ControlFile_new.minRecoveryPoint = endrec;
 	ControlFile_new.minRecoveryPointTLI = endtli;
 	ControlFile_new.state = DB_IN_ARCHIVE_RECOVERY;
-	update_controlfile(datadir_target, &ControlFile_new, do_sync);
+
+	if (!dry_run)
+		update_controlfile(datadir_target, &ControlFile_new, do_sync);
 
 	if (showprogress)
 		pg_log_info("syncing target data directory");
 	syncTargetDirectory();
 
-	if (writerecoveryconf)
+	if (!dry_run && writerecoveryconf)
 		WriteRecoveryConfig(conn, datadir_target,
 			GenerateRecoveryConfig(conn, NULL));
 

base-commit: df86e52cace2c4134db51de6665682fb985f3195
-- 
2.17.1

>From 28fdd2fa58af718d8a894cb3c3d8f9b2cdf6759e Mon Sep 17 00:00:00 2001
From: Alexey Kondratov 
Date: Wed, 2 Oct 2019 19:25:27 +0300
Subject: [PATCH v1 2/2] Increase pg_rewind code coverage

Branch: pg-rewind-fixes
---
 src/bin/pg_rewind/t/001_basic.pl   |  2 +-
 src/bin/pg_rewind/t/002_databases.pl   |  2 +-
 src/bin/pg_rewind/t/003_extrafiles.pl  |  2 +-
 src/bin/pg_rewind/t/004_pg_xlog_symlink.pl |  2 +-
 src/bin/pg_rewind/t/005_same_timeline.pl   | 27 ++
 src/bin/pg_rewind/t/RewindTest.pm  | 33 --
 6 files changed, 55 insertions(+), 13 deletions(-)

diff --git a/src/bin/pg_rewind/t/001_basic.pl b/src/bin/pg_rewind/t/001_basic.pl
index c3293e93df..a1659460ec 100644
--- a/src/bin/pg_rewind/t/001_basic.pl
+++ b/src/bin/pg_rewind/t/001_basic.pl
@@ -1,7 +1,7 @@
 use strict;
 use warnings;
 use TestLib;
-use Test::More tests => 11;
+use Test::More tests => 14;
 
 use FindBin;
 use lib $FindBin::RealBin;
diff --git a/src/bin/pg_rewind/t/002_databases.pl b/src/bin/pg_rewind/t/002_databases.pl
index 1db534c0dc..921c4434f5 100644
--- a/src/bin/pg_rewind/t/002_databases.pl
+++ b/src/bin/pg_rewind/t/002_databases.pl
@@ -1,7 +1,7 @@
 use strict;
 use warnings;
 use TestLib;
-use Test::More tests => 7;
+use Test::More tests => 10;
 
 use FindBin;
 use lib $FindBin::RealBin;
diff --git a/src/bin/pg_rewind/t/003_extrafiles.pl b/src/bin/pg_rewind/t/003_extrafiles.pl
index f4710440fc..bce5b47148 100644
--- a/src/bin/pg_rewind/t/003_extrafiles.pl
+++ b/src/bin/pg_rewind/t/003_extrafiles.pl
@@ -3,7 +3,7 @@
 use strict;
 use warnings;
 use TestLib;
-use Test::More tests => 5;
+use Test::More tests => 8;
 
 use File::Find;
 
diff --git a/src/bin/pg_rewind/t/004_pg_xlog_symlink.pl b/src/bin/pg_rewind/t/004_pg_xlog_symlink.pl
index 639eeb9c91..a501be8f78 100644
--- a/src/bin/pg_rewind/t/004_pg_xlo

Re: Two pg_rewind patches (auto generate recovery conf and ensure clean shutdown)

2019-10-03 Thread Alexey Kondratov

On 03.10.2019 6:07, Michael Paquier wrote:

On Wed, Oct 02, 2019 at 08:28:09PM +0300, Alexey Kondratov wrote:

I've directly followed your guess and tried to extend the pg_rewind test
cases and... it seems I've caught a few bugs:

1) --dry-run actually wasn't completely 'dry'. It did update the target
control file, which could cause subsequent pg_rewind calls to fail after
dry-run ones.

I have just paid attention to this thread, but this is a bug which
goes down to 12 actually so let's treat it independently of the rest.
The control file was not written thanks to the safeguards in
write_target_range() in past versions, but the recent refactoring
around control file handling broke that promise.  Another thing which
is not completely exact is the progress reporting which should be
reported even if the dry-run mode runs.  That's less critical, but
let's make things consistent.


I also thought about v12, though I didn't check whether it's affected.


Patch 0001 also forgot that recovery.conf should not be written either
when no rewind is needed.


Yes, definitely, I forgot this code path, thanks.


I have reworked your first patch as per the attached.  What do you
think about it?  The part with the control file needs to go down to
v12, and I would likely split that into two commits on HEAD: one for
the control file and a second for the recovery.conf portion with the
fix for --no-ensure-shutdown to keep a cleaner history.


It looks fine to me except for the progress reporting part. It now adds
PG_CONTROL_FILE_SIZE to fetch_done. However, I cannot find where the
control file is either included in the filemap and fetch_size or counted
during calculate_totals(). Maybe I've missed something, but now it looks
like we report something that wasn't planned for progress reporting,
don't we?



+   # Check that incompatible options error out.
+   command_fails(
+   [
+   'pg_rewind', "--debug",
+   "--source-pgdata=$standby_pgdata",
+   "--target-pgdata=$master_pgdata", "-R",
+   "--no-ensure-shutdown"
+   ],
+   'pg_rewind local with -R');
Incompatible options had better be checked within a separate perl
script?  We generally do that for the other binaries.


Yes, it makes sense. I've reworked the patch with tests and added a 
couple of extra cases.



--
Alexey Kondratov

Postgres Professional https://www.postgrespro.com
Russian Postgres Company

>From 9e828e311dc7c216e5bfb1936022be4f7fd3805f Mon Sep 17 00:00:00 2001
From: Alexey Kondratov 
Date: Thu, 3 Oct 2019 12:37:26 +0300
Subject: [PATCH v2 2/2] Increase pg_rewind code coverage

Branch: pg-rewind-fixes
---
 src/bin/pg_rewind/t/001_basic.pl   |  2 +-
 src/bin/pg_rewind/t/002_databases.pl   |  2 +-
 src/bin/pg_rewind/t/003_extrafiles.pl  |  2 +-
 src/bin/pg_rewind/t/004_pg_xlog_symlink.pl |  2 +-
 src/bin/pg_rewind/t/005_same_timeline.pl   | 32 +---
 src/bin/pg_rewind/t/006_actions.pl | 61 ++
 src/bin/pg_rewind/t/RewindTest.pm  | 20 ++-
 7 files changed, 107 insertions(+), 14 deletions(-)
 create mode 100644 src/bin/pg_rewind/t/006_actions.pl

diff --git a/src/bin/pg_rewind/t/001_basic.pl b/src/bin/pg_rewind/t/001_basic.pl
index c3293e93df..1ba1648af6 100644
--- a/src/bin/pg_rewind/t/001_basic.pl
+++ b/src/bin/pg_rewind/t/001_basic.pl
@@ -1,7 +1,7 @@
 use strict;
 use warnings;
 use TestLib;
-use Test::More tests => 11;
+use Test::More tests => 13;
 
 use FindBin;
 use lib $FindBin::RealBin;
diff --git a/src/bin/pg_rewind/t/002_databases.pl b/src/bin/pg_rewind/t/002_databases.pl
index 1db534c0dc..57674ff4b3 100644
--- a/src/bin/pg_rewind/t/002_databases.pl
+++ b/src/bin/pg_rewind/t/002_databases.pl
@@ -1,7 +1,7 @@
 use strict;
 use warnings;
 use TestLib;
-use Test::More tests => 7;
+use Test::More tests => 9;
 
 use FindBin;
 use lib $FindBin::RealBin;
diff --git a/src/bin/pg_rewind/t/003_extrafiles.pl b/src/bin/pg_rewind/t/003_extrafiles.pl
index f4710440fc..16c92cb2d6 100644
--- a/src/bin/pg_rewind/t/003_extrafiles.pl
+++ b/src/bin/pg_rewind/t/003_extrafiles.pl
@@ -3,7 +3,7 @@
 use strict;
 use warnings;
 use TestLib;
-use Test::More tests => 5;
+use Test::More tests => 7;
 
 use File::Find;
 
diff --git a/src/bin/pg_rewind/t/004_pg_xlog_symlink.pl b/src/bin/pg_rewind/t/004_pg_xlog_symlink.pl
index 639eeb9c91..6dabd11db6 100644
--- a/src/bin/pg_rewind/t/004_pg_xlog_symlink.pl
+++ b/src/bin/pg_rewind/t/004_pg_xlog_symlink.pl
@@ -14,7 +14,7 @@ if ($windows_os)
 }
 else
 {
-	plan tests => 5;
+	plan tests => 7;
 }
 
 use FindBin;
diff --git a/src/bin/pg_rewind/t/005_same_timeline.pl b/src/bin/pg_rewind/t/005_same_timeline.pl
index 40dbc44caa..089466a

Re: Two pg_rewind patches (auto generate recovery conf and ensure clean shutdown)

2019-10-04 Thread Alexey Kondratov

On 04.10.2019 11:37, Michael Paquier wrote:

On Thu, Oct 03, 2019 at 12:43:37PM +0300, Alexey Kondratov wrote:

On 03.10.2019 6:07, Michael Paquier wrote:

I have reworked your first patch as per the attached.  What do you
think about it?  The part with the control file needs to go down to
v12, and I would likely split that into two commits on HEAD: one for
the control file and a second for the recovery.conf portion with the
fix for --no-ensure-shutdown to keep a cleaner history.

It looks fine to me except for the progress reporting part. It now adds
PG_CONTROL_FILE_SIZE to fetch_done. However, I cannot find where the
control file is either included in the filemap and fetch_size or counted
during calculate_totals(). Maybe I've missed something, but now it looks
like we report something that wasn't planned for progress reporting,
don't we?

Right.  The pre-12 code actually handles that incorrectly as it assumed
that any files written through file_ops.c should be part of the
progress.  So I went with the simplest solution, and backpatched this
part with 6f3823b.  I have also committed the set of fixes for the new
options so as we have a better base of work than what's on HEAD
currently.


Great, thanks.



Regarding the tests, adding a --dry-run command is a good idea.
However I think that there is more value to automate the use of the
single user mode automatically in the tests as that's more critical
from the point of view of rewind run, and stopping the cluster with
immediate mode causes, as expected, the next --dry-run command to
fail.

Another thing is that I think that we should use -F with --single.
This makes recovery faster, and the target data folder is synced
at the end of pg_rewind anyway.

Using the long option names makes the tests easier to follow in this
case, so I have switched -R to --write-recovery-conf.

Some comments and the docs have been using some confusing wording, so
I have reworked what I found (like many "it" in a single sentence
referring different things).


I agree with all the points. Shutting down the target server using
'immediate' mode is a good way to test ensureCleanShutdown automatically.



Regarding all the set of incompatible options, we have much more of
that after the initial option parsing so I think that we should group
all the cheap ones together.  Let's tackle that as a separate patch.
We can also just check after --no-ensure-shutdown directly in
RewindTest.pm as I have switched the cluster to not be cleanly shut
down anymore to stress the automatic recovery path, and trigger that
before running pg_rewind for the local and remote mode.

Attached is an updated patch with all I found.  What do you think?


I've checked your patch, but it seems that it cannot be applied as is,
since it e.g. adds a comment to 005_same_timeline.pl without actually
changing the test. So I've slightly modified your patch and tried to fit
both dry-run and ensureCleanShutdown testing together. It works just
fine and fails immediately if any of the recent fixes is reverted. I
still think that dry-run testing is worth adding, since it helped to
catch this v12 refactoring issue, but feel free to throw it away if it
isn't committable right now, of course.


As for testing incompatible options and sanity checks, yes, I agree that
it is a matter for a different patch. I attached it as a separate WIP
patch just for the record. Maybe I will try to gather more cases there
later.


--
Alexey Kondratov

Postgres Professional https://www.postgrespro.com
Russian Postgres Company

>From 6e5667edcad6b037004288635a7ae0eda40d4262 Mon Sep 17 00:00:00 2001
From: Alexey Kondratov 
Date: Fri, 4 Oct 2019 17:14:12 +0300
Subject: [PATCH v3 1/2] Improve functionality, docs and tests of -R,
 --no-ensure-shutdown and --dry-run options

Branch: pg-rewind-fixes
---
 doc/src/sgml/ref/pg_rewind.sgml| 10 +--
 src/bin/pg_rewind/pg_rewind.c  | 19 +++---
 src/bin/pg_rewind/t/001_basic.pl   |  2 +-
 src/bin/pg_rewind/t/002_databases.pl   |  2 +-
 src/bin/pg_rewind/t/003_extrafiles.pl  |  2 +-
 src/bin/pg_rewind/t/004_pg_xlog_symlink.pl |  2 +-
 src/bin/pg_rewind/t/005_same_timeline.pl   | 32 +++---
 src/bin/pg_rewind/t/RewindTest.pm  | 71 +-
 8 files changed, 103 insertions(+), 37 deletions(-)

diff --git a/doc/src/sgml/ref/pg_rewind.sgml b/doc/src/sgml/ref/pg_rewind.sgml
index fbf454803b..42d29edd4e 100644
--- a/doc/src/sgml/ref/pg_rewind.sgml
+++ b/doc/src/sgml/ref/pg_rewind.sgml
@@ -169,12 +169,14 @@ PostgreSQL documentation
   --no-ensure-shutdown
   

-pg_rewind verifies that the target server
-is cleanly shutdown before rewinding; by default, if it isn't, it
-starts the server in single-user mode to complete crash recovery.
+pg_rewind requires that the target server
+is cleanly shut down before rewinding. By default,

Re: Two pg_rewind patches (auto generate recovery conf and ensure clean shutdown)

2019-10-07 Thread Alexey Kondratov

On 07.10.2019 4:06, Michael Paquier wrote:

On Fri, Oct 04, 2019 at 05:21:25PM +0300, Alexey Kondratov wrote:

I've checked your patch, but it seems that it cannot be applied as is, since
it e.g. adds a comment to 005_same_timeline.pl without actually changing the
test. So I've slightly modified your patch and tried to fit both dry-run and
ensureCleanShutdown testing together. It works just fine and fails
immediately if any of the recent fixes is reverted. I still think that
dry-run testing is worth adding, since it helped to catch this v12
refactoring issue, but feel free to throw it away if it isn't committable
right now, of course.

I can guarantee the last patch I sent can be applied on top of HEAD:
https://www.postgresql.org/message-id/20191004083721.ga1...@paquier.xyz


Yes, it did, but my comment was about these lines:

diff --git a/src/bin/pg_rewind/t/005_same_timeline.pl 
b/src/bin/pg_rewind/t/005_same_timeline.pl

index 40dbc44caa..df469d3939 100644
--- a/src/bin/pg_rewind/t/005_same_timeline.pl
+++ b/src/bin/pg_rewind/t/005_same_timeline.pl
@@ -1,3 +1,7 @@
+#
+# Test that running pg_rewind with the source and target clusters
+# on the same timeline runs successfully.
+#

You have added this new comment section, but kept the old one, which was 
pretty much the same [1].



Regarding the rest, I have hacked my way through as per the attached.
The previous set of patches did the following, which looked either
overkill or not necessary:
- Why running test 005 with the remote mode?


OK, it was definitely overkill, since the remote control file fetch will
also be tested in any other remote test case.



- --dry-run coverage is basically the same with the local and remote
modes, so it seems like a waste of resource to run it for all the
tests and all the modes.


My point was to test --dry-run + --write-recovery-conf in the remote
mode, since the latter may cause the recovery configuration to be
written without doing any actual work, e.g. due to some wrong
refactoring.



- There is no need for the script checking for options combinations to
initialize a data folder.  It is important to design the tests to be
cheap and meaningful.


Yes, I agree, moving some of those tests into 001_basic seems like a
proper optimization.



Patch v3-0002 also had a test to make sure that the source server is
shut down cleanly before using it.  I have included that part as
well, as the flow feels right.

So, Alexey, what do you think?


It looks good to me. Two minor remarks:

+    # option combinations.  As the code paths taken by those tests
+    # does not change for the "local" and "remote" modes, just run them

I am far from being fluent in English, but should it be 'do not change' 
instead?


+command_fails(
+    [
+        'pg_rewind', '--target-pgdata',
+        $primary_pgdata, '--source-pgdata',
+        $standby_pgdata, 'extra_arg1'
+    ],

Here and below I would prefer the traditional option ordering "'--key',
'value'". It should be easier to recognize from the reader's perspective:


+command_fails(
+    [
+        'pg_rewind',
+        '--target-pgdata', $primary_pgdata,
+        '--source-pgdata', $standby_pgdata,
+    'extra_arg1'
+    ],


[1] 
https://github.com/postgres/postgres/blob/caa078353ecd1f3b3681c0d4fa95ad4bb8c2308a/src/bin/pg_rewind/t/005_same_timeline.pl#L15



--
Alexey Kondratov

Postgres Professional https://www.postgrespro.com
Russian Postgres Company





Re: PATCH: logical_work_mem and logical streaming of large in-progress transactions

2019-10-22 Thread Alexey Kondratov

On 22.10.2019 20:22, Tomas Vondra wrote:

On Tue, Oct 22, 2019 at 11:01:48AM +0530, Dilip Kumar wrote:
On Tue, Oct 22, 2019 at 10:46 AM Amit Kapila 
 wrote:

  In general, yours and Alexey's test results

show that there is merit by having workers applying such transactions.
  OTOH, as noted above [1], we are also worried about the performance
of Rollbacks if we follow that approach.  I am not sure how much we
need to worry about Rollbacks if commits are faster, but can we think
of recording the changes in memory and only write to a file if the
changes are above a certain threshold?  I think that might help saving
I/O in many cases.  I am not very sure if we do that how much
additional workers can help, but they might still help.  I think we
need to do some tests and experiments to figure out what is the best
approach?  What do you think?

I agree with the point.  I think we might need to do some small
changes and test to see what could be the best method to handle the
streamed changes at the subscriber end.



Tomas, Alexey, do you have any thoughts on this matter?  I think it is
important that we figure out the way to proceed in this patch.

[1] - 
https://www.postgresql.org/message-id/b25ce80e-f536-78c8-d5c8-a5df3e230785%40postgrespro.ru






I think the patch should do the simplest thing possible, i.e. what it
does today. Otherwise we'll never get it committed.



I have to agree with Tomas that keeping things as simple as possible
should be the main priority right now. Otherwise, the entire patch set
will go through the next release cycle without being committed even
partially. At the same time, it resolves an important problem from my
perspective: it moves I/O overhead from the primary to the replica by
streaming large transactions, which is a nice-to-have feature I guess.


Later it would be possible to replace the logical apply worker with a
bgworkers pool in a separate patch, if we decide that it is a viable
solution. Anyway, regarding Amit's questions:


- I doubt that maintaining a separate buffer on the apply side before
spilling to disk would help enough. We already have ReorderBuffer with
the logical_work_mem limit, and if we exceeded that limit on the sender
side, then most probably we will exceed it on the applier side as well,
except in the case where this new buffer is significantly larger than
logical_work_mem, in order to keep multiple xacts open.


- I still think that we should optimize the database for commits, not
rollbacks. A BGworkers pool is dramatically slower for a rollbacks-only
load, though at least twice as fast for commits-only. I do not know how
it will perform with a real-life load, but this drawback may be
unacceptable for such a general-purpose database as Postgres.


- Tomas' implementation of streaming with spilling does not have this
bias between commits/aborts. However, it has a noticeable performance
drop (~x5 slower compared with master [1]) for large transactions
consisting of many small rows, although it is not an order of magnitude
slower.


Another thing is that about a year ago I found some problems with
MVCC/visibility and fixed them somehow [1]. If I understand correctly,
Tomas adapted some of those fixes into his patch set, but I think that
this part should be reviewed carefully again. I would be glad to check
it, but right now I am a little confused by all the patch set variants
in the thread. Which is the latest one? Is it still dependent on 2PC
decoding?


[1] 
https://www.postgresql.org/message-id/flat/40c38758-04b5-74f4-c963-cf300f9e5dff%40postgrespro.ru#98d06fefc88122385dacb2f03f7c30f7



Thanks for moving this patch forward!

--
Alexey Kondratov

Postgres Professional https://www.postgrespro.com
Russian Postgres Company





Free port choosing freezes when PostgresNode::use_tcp is used on BSD systems

2021-04-19 Thread Alexey Kondratov

Hi Hackers,

Inside PostgresNode.pm there is a free port choosing routine --- 
get_free_port(). The comment section there says:


# On non-Linux, non-Windows kernels, binding to 127.0.0/24 addresses
# other than 127.0.0.1 might fail with EADDRNOTAVAIL.

And this is absolutely true: on BSD-like systems (macOS and FreeBSD
tested) it hangs, looping through the entire port range over and over
when $PostgresNode::use_tcp = 1 is set, since bind fails with:


# Checking port 52208
# bind: 127.0.0.1 52208
# bind: 127.0.0.2 52208
bind: Can't assign requested address

To reproduce, just apply reproduce.diff and try to run 'make -C
src/bin/pg_ctl check'.


This is not the case with the standard Postgres tests, since TestLib.pm
chooses unix sockets automatically everywhere outside Windows. However,
we ran into this problem when we tried to run a custom TAP test that
required TCP to run stably.


That way, if it really can happen, why not just skip binding to
127.0.0/24 addresses other than 127.0.0.1 outside of Linux/Windows, as
in the attached patch_PostgresNode.diff?



Regards
--
Alexey Kondratov

Postgres Professional https://www.postgrespro.com
Russian Postgres Company

diff --git a/src/test/perl/PostgresNode.pm b/src/test/perl/PostgresNode.pm
index db47a97d196..9add9bde2a4 100644
--- a/src/test/perl/PostgresNode.pm
+++ b/src/test/perl/PostgresNode.pm
@@ -1203,7 +1203,7 @@ sub get_free_port
 		if ($found == 1)
 		{
 			foreach my $addr (qw(127.0.0.1),
-$use_tcp ? qw(127.0.0.2 127.0.0.3 0.0.0.0) : ())
+$use_tcp && ($^O eq "linux" || $TestLib::windows_os) ? qw(127.0.0.2 127.0.0.3 0.0.0.0) : ())
 			{
 if (!can_bind($addr, $port))
 {
diff --git a/src/bin/pg_ctl/t/001_start_stop.pl b/src/bin/pg_ctl/t/001_start_stop.pl
index b1e419f02e9..c25c0793537 100644
--- a/src/bin/pg_ctl/t/001_start_stop.pl
+++ b/src/bin/pg_ctl/t/001_start_stop.pl
@@ -11,6 +11,8 @@ use Test::More tests => 24;
 my $tempdir   = TestLib::tempdir;
 my $tempdir_short = TestLib::tempdir_short;
 
+$PostgresNode::use_tcp = 1;
+
 program_help_ok('pg_ctl');
 program_version_ok('pg_ctl');
 program_options_handling_ok('pg_ctl');


Misuse of TimestampDifference() in the autoprewarm feature of pg_prewarm

2020-11-09 Thread Alexey Kondratov

Hi Hackers,

Today I accidentally noticed that the autoprewarm feature of pg_prewarm
used TimestampDifference()'s results in a wrong way.


First, it used the *seconds* result from it as *milliseconds*, which
caused it to dump the autoprewarm.blocks file ~every second with the
default setting of autoprewarm_interval = 300s.
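
For illustration, here is the wrong arithmetic with round numbers (a
sketch; the variable names follow the patch below):

    /*
     * With autoprewarm_interval = 300 s, TimestampDifference() yields
     * roughly secs = 300 and usecs = 0, so the buggy code computed
     *
     *     delay_in_ms = secs + (usecs / 1000) = 300
     *
     * i.e. a 300 ms latch timeout instead of the intended 300000 ms,
     * so the worker woke up roughly a thousand times too often.
     */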


Here is a log part with debug output in this case:

```
2020-11-09 19:09:00.162 MSK [85328] LOG:  dumping autoprewarm.blocks
2020-11-09 19:09:01.161 MSK [85328] LOG:  dumping autoprewarm.blocks
2020-11-09 19:09:02.160 MSK [85328] LOG:  dumping autoprewarm.blocks
2020-11-09 19:09:03.159 MSK [85328] LOG:  dumping autoprewarm.blocks
```

After fixing this issue I have noticed that it still dumps blocks twice 
at each timeout (here I set autoprewarm_interval to 15s):


```
2020-11-09 19:18:59.692 MSK [85662] LOG:  dumping autoprewarm.blocks
2020-11-09 19:18:59.700 MSK [85662] LOG:  dumping autoprewarm.blocks

2020-11-09 19:19:14.694 MSK [85662] LOG:  dumping autoprewarm.blocks
2020-11-09 19:19:14.704 MSK [85662] LOG:  dumping autoprewarm.blocks
```

This happens because at timeout we were using continue, but actually
we still have to wait the entire autoprewarm_interval after a
successful dump.


I have fixed both issues in the attached patches and also added a
minimalistic TAP test (the first one for this module) to verify that
this automatic dumping still works after refactoring. I put Robert in
CC, since he is the author of this feature.


What do you think?


Regards
--
Alexey Kondratov

Postgres Professional https://www.postgrespro.com
Russian Postgres Company

>From 6d4bab7f21c3661dd4dd5a0de7e097b1de3f642c Mon Sep 17 00:00:00 2001
From: Alexey Kondratov 
Date: Mon, 9 Nov 2020 19:24:55 +0300
Subject: [PATCH v1 3/3] pg_prewarm: refactor autoprewarm waits

Previously it was dumping twice at every timeout.
---
 contrib/pg_prewarm/autoprewarm.c | 4 +++-
 1 file changed, 3 insertions(+), 1 deletion(-)

diff --git a/contrib/pg_prewarm/autoprewarm.c b/contrib/pg_prewarm/autoprewarm.c
index b18a065ed5..f52c83de1e 100644
--- a/contrib/pg_prewarm/autoprewarm.c
+++ b/contrib/pg_prewarm/autoprewarm.c
@@ -238,7 +238,9 @@ autoprewarm_main(Datum main_arg)
 			{
 last_dump_time = GetCurrentTimestamp();
 apw_dump_now(true, false);
-continue;
+
+/* We have to sleep even after a successful dump */
+delay_in_ms = autoprewarm_interval * 1000;
 			}
 
 			/* Sleep until the next dump time. */
-- 
2.19.1

From 8793b8beb6a5c1ae730f1fffb09dff64c83bc631 Mon Sep 17 00:00:00 2001
From: Alexey Kondratov 
Date: Mon, 9 Nov 2020 19:12:00 +0300
Subject: [PATCH v1 2/3] pg_prewarm: fix autoprewarm_interval behaviour.

Previously it misused seconds from TimestampDifference() as
milliseconds, so it was dumping autoprewarm.blocks ~every second
even with the default autoprewarm_interval = 300s.
---
 contrib/pg_prewarm/autoprewarm.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/contrib/pg_prewarm/autoprewarm.c b/contrib/pg_prewarm/autoprewarm.c
index d3dec6e3ec..b18a065ed5 100644
--- a/contrib/pg_prewarm/autoprewarm.c
+++ b/contrib/pg_prewarm/autoprewarm.c
@@ -231,7 +231,7 @@ autoprewarm_main(Datum main_arg)
 			autoprewarm_interval * 1000);
 			TimestampDifference(GetCurrentTimestamp(), next_dump_time,
 &secs, &usecs);
-			delay_in_ms = secs + (usecs / 1000);
+			delay_in_ms = secs * 1000 + (usecs / 1000);
 
 			/* Perform a dump if it's time. */
 			if (delay_in_ms <= 0)
-- 
2.19.1

From 31dc30c97861afae9c34852afc5a5b1c91bbeadc Mon Sep 17 00:00:00 2001
From: Alexey Kondratov 
Date: Mon, 9 Nov 2020 19:04:10 +0300
Subject: [PATCH v1 1/3] pg_prewarm: add tap test for autoprewarm feature

---
 contrib/pg_prewarm/Makefile |  2 +
 contrib/pg_prewarm/t/001_autoprewarm.pl | 51 +
 2 files changed, 53 insertions(+)
 create mode 100644 contrib/pg_prewarm/t/001_autoprewarm.pl

diff --git a/contrib/pg_prewarm/Makefile b/contrib/pg_prewarm/Makefile
index b13ac3c813..9cfde8c4e4 100644
--- a/contrib/pg_prewarm/Makefile
+++ b/contrib/pg_prewarm/Makefile
@@ -10,6 +10,8 @@ EXTENSION = pg_prewarm
 DATA = pg_prewarm--1.1--1.2.sql pg_prewarm--1.1.sql pg_prewarm--1.0--1.1.sql
 PGFILEDESC = "pg_prewarm - preload relation data into system buffer cache"
 
+TAP_TESTS = 1
+
 ifdef USE_PGXS
 PG_CONFIG = pg_config
 PGXS := $(shell $(PG_CONFIG) --pgxs)
diff --git a/contrib/pg_prewarm/t/001_autoprewarm.pl b/contrib/pg_prewarm/t/001_autoprewarm.pl
new file mode 100644
index 00..b564c29931
--- /dev/null
+++ b/contrib/pg_prewarm/t/001_autoprewarm.pl
@@ -0,0 +1,51 @@
+#
+# Check that pg_prewarm can dump blocks from shared buffers
+# to PGDATA/autoprewarm.blocks.
+#
+
+use strict;
+use Test::More;
+use TestLib;
+use Time::HiRes qw(usleep);
+use warnings;
+
+use PostgresNode;
+
+plan tests => 3;
+
+my $node = get_new_node("node");
+$node->init;
+$node->append_conf(
+'postgresql.conf', qq(
+sha

Re: Misuse of TimestampDifference() in the autoprewarm feature of pg_prewarm

2020-11-09 Thread Alexey Kondratov

On 2020-11-09 21:53, Tom Lane wrote:

Alexey Kondratov  writes:
After fixing this issue I have noticed that it still dumps blocks twice
at each timeout (here I set autoprewarm_interval to 15s):
...
This happens because at timeout we were using continue, but actually
we still have to wait the entire autoprewarm_interval after a
successful dump.


I don't think your 0001 is correct.  It would be okay if apw_dump_now()
could be counted on to take negligible time, but we shouldn't assume
that should we?



Yes, it seems so, if I understand you correctly. I had a doubt about the
possibility of pg_ctl exiting earlier than the dumping process. Now I
have added an explicit wait for the dump file to the test.



I agree that the "continue" seems a bit bogus, because it's skipping
the ResetLatch call at the bottom of the loop; it's not quite clear
to me whether that's a good thing or not.  But the general idea of
the existing code seems to be to loop around and make a fresh
calculation of how-long-to-wait, and that doesn't seem wrong.


I have left the last patch intact, since it resolves the 'double dump'
issue, but I agree with your point about the existing logic of the code,
even though it is a bit broken. So I have to think more about how to fix
it in a better way.


0002 seems like a pretty clear bug fix, though I wonder if this is exactly
what we want to do going forward.  It seems like a very large fraction of
the callers of TimestampDifference would like to have the value in msec,
which means we're doing a whole lot of expensive and error-prone
arithmetic to break down the difference to sec/usec and then put it
back together again.  Let's get rid of that by inventing, say
TimestampDifferenceMilliseconds(...).


Yeah, I ran into this problem after a bug in another extension,
pg_wait_sampling. I have attached 0002, which implements
TimestampDifferenceMilliseconds(), so 0003 just uses this new function
to solve the initial issues. If it looks good to you, then we can switch
all similar callers to it.



BTW, I see another bug of a related ilk.  Look what
postgres_fdw/connection.c is doing:

    TimestampDifference(now, endtime, &secs, &microsecs);

    /* To protect against clock skew, limit sleep to one minute. */
    cur_timeout = Min(60000, secs * USECS_PER_SEC + microsecs);

    /* Sleep until there's something to do */
    wc = WaitLatchOrSocket(MyLatch,
                           WL_LATCH_SET | WL_SOCKET_READABLE |
                           WL_TIMEOUT | WL_EXIT_ON_PM_DEATH,
                           PQsocket(conn),
                           cur_timeout, PG_WAIT_EXTENSION);

WaitLatchOrSocket's timeout is measured in msec not usec.  I think the
comment about "clock skew" is complete BS, and the Min() calculation was
put in as a workaround by somebody observing that the sleep waited too
long, but not understanding why.


I wonder how much trouble one can get into with all these unit conversions.
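
For what it's worth, with such a helper the postgres_fdw wait above could
collapse to something like this (just a sketch assuming the
TimestampDifferenceMilliseconds() signature proposed above, not actual
committed code):

    /* Remaining wait, now in the unit WaitLatchOrSocket() expects */
    cur_timeout = TimestampDifferenceMilliseconds(now, endtime);
    cur_timeout = Min(60000, cur_timeout);	/* optional one-minute cap */

    /* Sleep until there's something to do */
    wc = WaitLatchOrSocket(MyLatch,
                           WL_LATCH_SET | WL_SOCKET_READABLE |
                           WL_TIMEOUT | WL_EXIT_ON_PM_DEATH,
                           PQsocket(conn),
                           cur_timeout, PG_WAIT_EXTENSION);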


Regards
--
Alexey Kondratov

Postgres Professional https://www.postgrespro.com
Russian Postgres Company

>From c79de17014753b311858b4570ca475f713328c62 Mon Sep 17 00:00:00 2001
From: Alexey Kondratov 
Date: Mon, 9 Nov 2020 19:24:55 +0300
Subject: [PATCH v2 4/4] pg_prewarm: refactor autoprewarm waits

Previously it was dumping twice at every timeout.
---
 contrib/pg_prewarm/autoprewarm.c | 4 +++-
 1 file changed, 3 insertions(+), 1 deletion(-)

diff --git a/contrib/pg_prewarm/autoprewarm.c b/contrib/pg_prewarm/autoprewarm.c
index e5bd130bc8..872c7d51b1 100644
--- a/contrib/pg_prewarm/autoprewarm.c
+++ b/contrib/pg_prewarm/autoprewarm.c
@@ -236,7 +236,9 @@ autoprewarm_main(Datum main_arg)
 			{
 last_dump_time = GetCurrentTimestamp();
 apw_dump_now(true, false);
-continue;
+
+/* We have to sleep even after a successful dump */
+delay_in_ms = autoprewarm_interval * 1000;
 			}
 
 			/* Sleep until the next dump time. */
-- 
2.19.1

From c38c07708d57d6dec5a8a1697ca9c9810ad4d7ce Mon Sep 17 00:00:00 2001
From: Alexey Kondratov 
Date: Mon, 9 Nov 2020 19:12:00 +0300
Subject: [PATCH v2 3/4] pg_prewarm: fix autoprewarm_interval behaviour.

Previously it misused seconds from TimestampDifference() as
milliseconds, so it was dumping autoprewarm.blocks ~every second
even with the default autoprewarm_interval = 300s.
---
 contrib/pg_prewarm/autoprewarm.c | 8 +++-
 1 file changed, 3 insertions(+), 5 deletions(-)

diff --git a/contrib/pg_prewarm/autoprewarm.c b/contrib/pg_prewarm/autoprewarm.c
index d3dec6e3ec..e5bd130bc8 100644
--- a/contrib/pg_prewarm/autoprewarm.c
+++ b/contrib/pg_prewarm/autoprewarm.c
@@ -222,16 +222,14 @@ autoprewarm_main(Datum main_arg)
 		{
 			long		delay_in_ms = 0;
 			TimestampTz next_dump_time = 0;
-			long		secs = 0;
-			int			usecs = 0;
 
 			/* Compute the nex

Re: Misuse of TimestampDifference() in the autoprewarm feature of pg_prewarm

2020-11-10 Thread Alexey Kondratov

On 2020-11-09 23:25, Tom Lane wrote:

Alexey Kondratov  writes:

On 2020-11-09 21:53, Tom Lane wrote:
0002 seems like a pretty clear bug fix, though I wonder if this is exactly
what we want to do going forward.  It seems like a very large fraction of
the callers of TimestampDifference would like to have the value in msec,
which means we're doing a whole lot of expensive and error-prone
arithmetic to break down the difference to sec/usec and then put it
back together again.  Let's get rid of that by inventing, say
TimestampDifferenceMilliseconds(...).



Yeah, I ran into this problem after a bug in another extension,
pg_wait_sampling. I have attached 0002, which implements
TimestampDifferenceMilliseconds(), so 0003 just uses this new function
to solve the initial issues. If it looks good to you, then we can switch
all similar callers to it.


Yeah, let's move forward with that --- in fact, I'm inclined to
back-patch it.  (Not till the current release cycle is done, though.
I don't find this important enough to justify a last-moment patch.)

BTW, I wonder if we shouldn't make TimestampDifferenceMilliseconds
round any fractional millisecond up rather than down.  Rounding down
seems to create a hazard of uselessly waking just before the delay is
completed.  Better to wake just after.



Yes, it makes sense. I have changed TimestampDifferenceMilliseconds() to
round the result up if there is a remainder.
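
For reference, a minimal sketch of such a helper with the round-up
behaviour (assuming TimestampTz is a microsecond count, as on
integer-timestamp builds; not necessarily the exact committed version):

    long
    TimestampDifferenceMilliseconds(TimestampTz start_time, TimestampTz stop_time)
    {
        TimestampTz diff = stop_time - start_time;

        if (diff <= 0)
            return 0;

        /* integer ceiling division rounds any fractional msec up */
        return (long) ((diff + 999) / 1000);
    }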


After looking at the autoprewarm code more closely, I have realised that
this 'double dump' issue was not an issue at all. I had just misplaced a
debug elog(), so its second output in the log only indicated that we
calculated delay_in_ms one more time. Actually, even with the wrong
calculation of delay_in_ms, the only problem was that we were busy
looping at a ~1 second interval instead of waiting on the latch.


It is still buggy behaviour, but much less harmful than I originally
thought.



Regards
--
Alexey Kondratov

Postgres Professional https://www.postgrespro.com
Russian Postgres Company

>From ce09103d9d58b611728b66366cd24e8a4069f7ac Mon Sep 17 00:00:00 2001
From: Alexey Kondratov 
Date: Mon, 9 Nov 2020 19:04:10 +0300
Subject: [PATCH v3 3/3] pg_prewarm: add tap test for autoprewarm feature

---
 contrib/pg_prewarm/Makefile |  2 +
 contrib/pg_prewarm/t/001_autoprewarm.pl | 59 +
 2 files changed, 61 insertions(+)
 create mode 100644 contrib/pg_prewarm/t/001_autoprewarm.pl

diff --git a/contrib/pg_prewarm/Makefile b/contrib/pg_prewarm/Makefile
index b13ac3c813..9cfde8c4e4 100644
--- a/contrib/pg_prewarm/Makefile
+++ b/contrib/pg_prewarm/Makefile
@@ -10,6 +10,8 @@ EXTENSION = pg_prewarm
 DATA = pg_prewarm--1.1--1.2.sql pg_prewarm--1.1.sql pg_prewarm--1.0--1.1.sql
 PGFILEDESC = "pg_prewarm - preload relation data into system buffer cache"
 
+TAP_TESTS = 1
+
 ifdef USE_PGXS
 PG_CONFIG = pg_config
 PGXS := $(shell $(PG_CONFIG) --pgxs)
diff --git a/contrib/pg_prewarm/t/001_autoprewarm.pl b/contrib/pg_prewarm/t/001_autoprewarm.pl
new file mode 100644
index 00..f55b2a5352
--- /dev/null
+++ b/contrib/pg_prewarm/t/001_autoprewarm.pl
@@ -0,0 +1,59 @@
+#
+# Check that pg_prewarm can dump blocks from shared buffers
+# to PGDATA/autoprewarm.blocks.
+#
+
+use strict;
+use Test::More;
+use TestLib;
+use Time::HiRes qw(usleep);
+use warnings;
+
+use PostgresNode;
+
+plan tests => 3;
+
+# Wait up to 180s for pg_prewarm to dump blocks.
+sub wait_for_dump
+{
+	my $path = shift;
+
+	foreach my $i (0 .. 1800)
+	{
+		last if -e $path;
+		usleep(100_000);
+	}
+}
+
+my $node = get_new_node("node");
+$node->init;
+$node->append_conf(
+	'postgresql.conf', qq(
+shared_preload_libraries = 'pg_prewarm'
+pg_prewarm.autoprewarm = 'on'
+pg_prewarm.autoprewarm_interval = 1
+));
+$node->start;
+
+my $blocks_path = $node->data_dir . '/autoprewarm.blocks';
+
+# Check that we can dump blocks on timeout.
+wait_for_dump($blocks_path);
+ok(-e $blocks_path, 'file autoprewarm.blocks should be present in the PGDATA');
+
+# Check that we can dump blocks on shutdown.
+$node->stop;
+$node->append_conf(
+	'postgresql.conf', qq(
+pg_prewarm.autoprewarm_interval = 0
+));
+
+# Remove autoprewarm.blocks
+unlink($blocks_path) || die "$blocks_path: $!";
+ok(!-e $blocks_path, 'sanity check, dump on timeout is turned off');
+
+$node->start;
+$node->stop;
+
+wait_for_dump($blocks_path);
+ok(-e $blocks_path, 'file autoprewarm.blocks should be present in the PGDATA after clean shutdown');
-- 
2.19.1

From fba212ed765c8c411db1ca19c2ac991662109d99 Mon Sep 17 00:00:00 2001
From: Alexey Kondratov 
Date: Mon, 9 Nov 2020 19:12:00 +0300
Subject: [PATCH v3 2/3] pg_prewarm: fix autoprewarm_interval behaviour

Previously it misused seconds from TimestampDifference() as
milliseconds, so it was busy looping with ~1 second interval
instead of wai

Re: Misuse of TimestampDifference() in the autoprewarm feature of pg_prewarm

2020-11-12 Thread Alexey Kondratov

On 2020-11-11 06:59, Tom Lane wrote:

Alexey Kondratov  writes:
After looking at the autoprewarm code more closely, I have realised that
this 'double dump' issue was not an issue at all. I had just
misplaced a debug elog(), so its second output in the log only
indicated that we calculated delay_in_ms one more time.


Ah --- that explains why I couldn't see a problem.

I've pushed 0001+0002 plus some followup work to fix other places
that could usefully use TimestampDifferenceMilliseconds().  I have
not done anything with 0003 (the TAP test for pg_prewarm), and will
leave that to the judgment of somebody who's worked with pg_prewarm
before.  To me it looks like it's not really testing things very
carefully at all; on the other hand, we have exactly zero test
coverage of that module today, so maybe something is better than
nothing.



Great, thank you for generalising the issue and working on it.


Regards
--
Alexey Kondratov

Postgres Professional https://www.postgrespro.com
Russian Postgres Company




Re: [PATCH] postgres_fdw connection caching - cause remote sessions linger till the local session exit

2020-11-17 Thread Alexey Kondratov

Hi,

On 2020-11-06 18:56, Anastasia Lubennikova wrote:

Status update for a commitfest entry.

This thread was inactive for a while and from the latest messages, I
see that the patch needs some further work.
So I move it to "Waiting on Author".

The new status of this patch is: Waiting on Author


I had a look at the initial patch and discussed options [1] to proceed
with this issue. I agree with Bruce about idle_session_timeout: it would
be a nice-to-have in-core feature on its own. However, this would be a
cluster-wide option, and it would start dropping all idle connections,
not only foreign ones. So it may not be an option for some cases, e.g.
when the same foreign server is also used for another load.


Regarding the initial issue, I prefer point #3, i.e. a foreign server
option. It has a couple of benefits IMO: 1) it may be set separately on
a per-foreign-server basis, 2) it will live only in the postgres_fdw
contrib without any need to touch core. I would only supplement this
postgres_fdw foreign server option with a GUC, e.g.
postgres_fdw.keep_connections, so one could easily define such behavior
for all foreign servers at once or override the server-level option on a
per-session basis.


Attached is a small POC patch, which implements this contrib-level 
postgres_fdw.keep_connections GUC. What do you think?


[1] 
https://www.postgresql.org/message-id/CALj2ACUFNydy0uo0JL9A1isHQ9pFe1Fgqa_HVanfG6F8g21nSQ%40mail.gmail.com



Regards
--
Alexey Kondratov

Postgres Professional https://www.postgrespro.com
Russian Postgres Company

diff --git a/contrib/postgres_fdw/connection.c b/contrib/postgres_fdw/connection.c
index ab3226287d..64f0e96635 100644
--- a/contrib/postgres_fdw/connection.c
+++ b/contrib/postgres_fdw/connection.c
@@ -28,6 +28,8 @@
 #include "utils/memutils.h"
 #include "utils/syscache.h"
 
+#include "postgres_fdw.h"
+
 /*
  * Connection cache hash table entry
  *
@@ -948,6 +950,7 @@ pgfdw_xact_callback(XactEvent event, void *arg)
 		 */
 		if (PQstatus(entry->conn) != CONNECTION_OK ||
 			PQtransactionStatus(entry->conn) != PQTRANS_IDLE ||
+			!keep_connections ||
 			entry->changing_xact_state)
 		{
 			elog(DEBUG3, "discarding connection %p", entry->conn);
diff --git a/contrib/postgres_fdw/postgres_fdw.c b/contrib/postgres_fdw/postgres_fdw.c
index 9c5aaacc51..4cd5f71223 100644
--- a/contrib/postgres_fdw/postgres_fdw.c
+++ b/contrib/postgres_fdw/postgres_fdw.c
@@ -45,6 +45,8 @@
 #include "utils/sampling.h"
 #include "utils/selfuncs.h"
 
+#include "postgres_fdw.h"
+
 PG_MODULE_MAGIC;
 
 /* Default CPU cost to start up a foreign query. */
@@ -301,6 +303,8 @@ typedef struct
 	List	   *already_used;	/* expressions already dealt with */
 } ec_member_foreign_arg;
 
+bool keep_connections = true;
+
 /*
  * SQL functions
  */
@@ -505,6 +509,15 @@ static void merge_fdw_options(PgFdwRelationInfo *fpinfo,
 			  const PgFdwRelationInfo *fpinfo_o,
 			  const PgFdwRelationInfo *fpinfo_i);
 
+void
+_PG_init(void)
+{
+	DefineCustomBoolVariable("postgres_fdw.keep_connections",
+			 "Enables postgres_fdw connection caching.",
+			 "When off postgres_fdw will close connections at the end of transaction.",
+			 &keep_connections, true, PGC_USERSET, 0, NULL,
+			 NULL, NULL);
+}
 
 /*
  * Foreign-data wrapper handler function: return a struct with pointers
diff --git a/contrib/postgres_fdw/postgres_fdw.h b/contrib/postgres_fdw/postgres_fdw.h
index eef410db39..7f1bdb96d6 100644
--- a/contrib/postgres_fdw/postgres_fdw.h
+++ b/contrib/postgres_fdw/postgres_fdw.h
@@ -124,9 +124,12 @@ typedef struct PgFdwRelationInfo
 	int			relation_index;
 } PgFdwRelationInfo;
 
+extern bool keep_connections;
+
 /* in postgres_fdw.c */
 extern int	set_transmission_modes(void);
 extern void reset_transmission_modes(int nestlevel);
+extern void _PG_init(void);
 
 /* in connection.c */
 extern PGconn *GetConnection(UserMapping *user, bool will_prep_stmt);


Re: [PATCH] postgres_fdw connection caching - cause remote sessions linger till the local session exit

2020-11-18 Thread Alexey Kondratov

On 2020-11-18 16:39, Bharath Rupireddy wrote:

Thanks for the interest shown!

On Wed, Nov 18, 2020 at 1:07 AM Alexey Kondratov
 wrote:


Regarding the initial issue, I prefer point #3, i.e. a foreign server
option. It has a couple of benefits IMO: 1) it may be set separately on
a per-foreign-server basis, 2) it will live only in the postgres_fdw
contrib without any need to touch core. I would only supplement this
postgres_fdw foreign server option with a GUC, e.g.
postgres_fdw.keep_connections, so one could easily define such behavior
for all foreign servers at once or override the server-level option on a
per-session basis.



Below is what I have in mind, mostly in line with yours:

a) Have a server-level option (keep_connection true/false, with the
default being true), when set to false the connection that's made with
this foreign server is closed and cached entry from the connection
cache is deleted at the end of txn in pgfdw_xact_callback.
b) Have postgres_fdw level GUC postgres_fdw.keep_connections default
being true. When set to false by the user, the connections, that are
used after this, are closed and removed from the cache at the end of
respective txns. If we don't use a connection that was cached prior to
the user setting the GUC as false,  then we may not be able to clear
it. We can avoid this problem by recommending users either to set the
GUC to false right after the CREATE EXTENSION postgres_fdw; or else
use the function specified in (c).
c) Have a new function that gets defined as part of CREATE EXTENSION
postgres_fdw;, say postgres_fdw_discard_connections(), similar to
dblink's dblink_disconnect(), which discards all the remote
connections and clears connection cache. And we can also have server
name as input to postgres_fdw_discard_connections() to discard
selectively.

Thoughts? If okay with the approach, I will start working on the patch.



This approach looks solid enough from my perspective to give it a try. I
would only split it into three separate patches for ease of further
review.




Attached is a small POC patch, which implements this contrib-level
postgres_fdw.keep_connections GUC. What do you think?



I see two problems with your patch: 1) It just disconnects the remote
connection at the end of txn if the GUC is set to false, but it
doesn't remove the connection cache entry from ConnectionHash.


Yes, and this looks like a valid state for postgres_fdw; it can get
into the same state even without my patch. Next time GetConnection()
will find this cache entry, figure out that entry->conn is NULL, and
establish a fresh connection. It is not clear to me right now what
benefit we would get from also clearing this cache entry, other than
doing it for sanity.



2) What
happens if there are some cached connections, user set the GUC to
false and not run any foreign queries or not use those connections
thereafter, so only the new connections will not be cached? Will the
existing unused connections still remain in the connection cache? See
(b) above for a solution.



Yes, they will. This could be solved with the additional disconnect
function you proposed in c).



Regards
--
Alexey Kondratov

Postgres Professional https://www.postgrespro.com
Russian Postgres Company




Re: [PATCH] postgres_fdw connection caching - cause remote sessions linger till the local session exit

2020-11-19 Thread Alexey Kondratov

On 2020-11-19 07:11, Bharath Rupireddy wrote:

On Wed, Nov 18, 2020 at 10:32 PM Alexey Kondratov
 wrote:

Thanks! I will make separate patches and post them soon.



>> Attached is a small POC patch, which implements this contrib-level
>> postgres_fdw.keep_connections GUC. What do you think?

 >

> I see two problems with your patch: 1) It just disconnects the remote
> connection at the end of txn if the GUC is set to false, but it
> doesn't remove the connection cache entry from ConnectionHash.

Yes, and this looks like a valid state for postgres_fdw; it can get
into the same state even without my patch. Next time GetConnection()
will find this cache entry, figure out that entry->conn is NULL, and
establish a fresh connection. It is not clear to me right now what
benefit we would get from also clearing this cache entry, other than
doing it for sanity.



By clearing the cache entry we will have 2 advantages: 1) we could
save a (small) bit of memory, 2) we could allow new connections to be
cached; currently ConnectionHash can have only 8 entries. IMHO, along
with disconnecting, we can also clear off the cache entry. Thoughts?



IIUC, 8 is not a hard limit, it is just a starting size. ConnectionHash
is not a shared-memory hash table, so dynahash can expand it on the fly,
as the comment before hash_create() explains:


 * Note: for a shared-memory hashtable, nelem needs to be a pretty good
 * estimate, since we can't expand the table on the fly.  But an unshared
 * hashtable can be expanded on-the-fly, so it's better for nelem to be
 * on the small side and let the table grow if it's exceeded.  An overly
 * large nelem will penalize hash_seq_search speed without buying much.
 * large nelem will penalize hash_seq_search speed without buying much.

Also I am not sure that by doing just a HASH_REMOVE you will free any 
memory, since hash table is already allocated (or expanded) to some 
size. So HASH_REMOVE will only add removed entry to the freeList, I 
guess.


Anyway, I can hardly imagine bloating of ConnectionHash being a problem,
even in the case when one has thousands of foreign servers all being
accessed during a single backend's life span.
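
For reference, this is roughly how the cache is set up in
postgres_fdw/connection.c (sketched from memory, so treat the details
as approximate):

    HASHCTL     ctl;

    MemSet(&ctl, 0, sizeof(ctl));
    ctl.keysize = sizeof(ConnCacheKey);
    ctl.entrysize = sizeof(ConnCacheEntry);
    /* nelem = 8 is only an initial size; dynahash grows it on demand */
    ConnectionHash = hash_create("postgres_fdw connections", 8,
                                 &ctl, HASH_ELEM | HASH_BLOBS);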



Regards
--
Alexey Kondratov

Postgres Professional https://www.postgrespro.com
Russian Postgres Company




Re: [PATCH] postgres_fdw connection caching - cause remote sessions linger till the local session exit

2020-11-23 Thread Alexey Kondratov

Hi,

On 2020-11-23 09:48, Bharath Rupireddy wrote:


Here is how I'm making 4 separate patches:

1. new function and it's documentation.
2. GUC and it's documentation.
3. server level option and it's documentation.
4. test cases for all of the above patches.



Hi, I'm attaching the patches here. Note that, though the code changes
for this feature are small, I divided them up as separate patches to
make review easy.

v1-0001-postgres_fdw-function-to-discard-cached-connections.patch



This patch looks pretty straightforward to me, but there are some
things to be addressed IMO:


+   server = GetForeignServerByName(servername, true);
+
+   if (server != NULL)
+   {

Yes, you return false if no server was found, but IMO it is worth
throwing an error in this case, as dblink does in dblink_disconnect(),
for example.


+ result = disconnect_cached_connections(FOREIGNSERVEROID,
+hashvalue,
+false);

+   if (all || (!all && cacheid == FOREIGNSERVEROID &&
+   entry->server_hashvalue == hashvalue))
+   {
+   if (entry->conn != NULL &&
+   !all && cacheid == FOREIGNSERVEROID &&
+   entry->server_hashvalue == hashvalue)

These conditions look bulky to me. First, you pass FOREIGNSERVEROID to
disconnect_cached_connections(), but it actually just duplicates the
'all' flag: when it is FOREIGNSERVEROID, then all == false; when it is
-1, then all == true. And those are the only two calls of
disconnect_cached_connections(). That way, it seems that we should keep
only the 'all' flag, at least for now, shouldn't we?


Second, I think that we should just rewrite this if statement to
simplify it and make it more readable, e.g.:


if ((all || entry->server_hashvalue == hashvalue) &&
entry->conn != NULL)
{
disconnect_pg_server(entry);
result = true;
}

+   if (all)
+   {
+   hash_destroy(ConnectionHash);
+   ConnectionHash = NULL;
+   result = true;
+   }

Also, I am still not sure that it is a good idea to destroy the whole
cache even in the 'all' case, but maybe others will have a different
opinion.




v1-0002-postgres_fdw-add-keep_connections-GUC-to-not-cache-connections.patch



+   entry->changing_xact_state) ||
+   (entry->used_in_current_xact &&
+   !keep_connections))

I am not sure, but I think that instead of adding this additional flag
to the ConnCacheEntry structure we can look at entry->xact_depth and use
a local:


bool used_in_current_xact = entry->xact_depth > 0;

for exactly the same purpose. Since we reset entry->xact_depth to zero
at the end of a transaction, a non-zero value means the entry was used
in the current transaction. It is set to 1 by begin_remote_xact(), which
is called by GetConnection(), so everything seems to be fine.
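
In pgfdw_xact_callback() that could look roughly like this (a sketch of
the idea only, reading xact_depth before it is reset at the end of the
callback):

    /* entry->xact_depth > 0 means the entry was used in this xact */
    bool        used_in_current_xact = entry->xact_depth > 0;

    if (PQstatus(entry->conn) != CONNECTION_OK ||
        PQtransactionStatus(entry->conn) != PQTRANS_IDLE ||
        entry->changing_xact_state ||
        (used_in_current_xact && !keep_connections))
    {
        elog(DEBUG3, "discarding connection %p", entry->conn);
        disconnect_pg_server(entry);
    }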


Otherwise, both patches seem to be working as expected. I am going to
have a look at the last two patches a bit later.


Regards
--
Alexey Kondratov

Postgres Professional https://www.postgrespro.com
Russian Postgres Company




Re: [PATCH] postgres_fdw connection caching - cause remote sessions linger till the local session exit

2020-11-24 Thread Alexey Kondratov

On 2020-11-24 06:52, Bharath Rupireddy wrote:

Thanks for the review comments.

On Mon, Nov 23, 2020 at 9:57 PM Alexey Kondratov
 wrote:


> v1-0001-postgres_fdw-function-to-discard-cached-connections.patch

This patch looks pretty straightforward to me, but there are some
things to be addressed IMO:

+   server = GetForeignServerByName(servername, true);
+
+   if (server != NULL)
+   {

Yes, you return a false if no server was found, but for me it worth
throwing an error in this case as, for example, dblink does in the
dblink_disconnect().



dblink_disconnect() "Returns status, which is always OK (since any
error causes the function to throw an error instead of returning)."
This behaviour doesn't seem okay to me.

Since we return true/false, I would prefer to throw a warning (with a
reason) while returning false over an error.



I thought about something a bit more sophisticated:

1) Return 'true' if there were open connections and we successfully 
closed them.
2) Return 'false' in the no-op case, i.e. there were no open 
connections.
3) Raise an error if something went wrong. And the non-existing server 
case belongs to this last category, IMO.


That looks like semantically correct behavior to me, but let us wait 
for any other opinions.
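
To make 1)-3) concrete, here is a minimal sketch of how the SQL
function could look with these semantics (assuming the patch's
disconnect_cached_connections() with the cacheid argument dropped;
this is not the actual patch code):

Datum
postgres_fdw_disconnect(PG_FUNCTION_ARGS)
{
	char	   *servername = text_to_cstring(PG_GETARG_TEXT_PP(0));
	ForeignServer *server;
	uint32		hashvalue;

	/* missing_ok = false: error out on a non-existing server (case 3) */
	server = GetForeignServerByName(servername, false);

	hashvalue = GetSysCacheHashValue1(FOREIGNSERVEROID,
									  ObjectIdGetDatum(server->serverid));

	/* true if something was closed (case 1), false if no-op (case 2) */
	PG_RETURN_BOOL(disconnect_cached_connections(hashvalue, false));
}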






+ result = disconnect_cached_connections(FOREIGNSERVEROID,
+hashvalue,
+false);

+   if (all || (!all && cacheid == FOREIGNSERVEROID &&
+   entry->server_hashvalue == hashvalue))
+   {
+   if (entry->conn != NULL &&
+   !all && cacheid == FOREIGNSERVEROID &&
+   entry->server_hashvalue == hashvalue)

These conditions look bulky to me. First, you pass FOREIGNSERVEROID to
disconnect_cached_connections(), but it actually just duplicates the
'all' flag: when it is FOREIGNSERVEROID, 'all == false'; when it is
-1, 'all == true'. Those are the only two calls of
disconnect_cached_connections(). That way, it seems we should keep
only the 'all' flag, at least for now, shouldn't we?



I added cacheid as an argument to disconnect_cached_connections() for
reusability. Say, someone wants to use it with a user mapping; then
they can pass cacheid USERMAPPINGOID and the hash value of the user
mapping. The cacheid == USERMAPPINGOID && entry->mapping_hashvalue ==
hashvalue check can then be added to disconnect_cached_connections().



Yeah, I get your point and the motivation to add this argument, but 
how can we use it? To disconnect all connections belonging to some 
specific user mapping? But any user mapping is hard-bound to some 
foreign server, AFAIK, so we can pass a server-based hash in this case.


In the case of pgfdw_inval_callback() this argument makes sense, since 
syscache callbacks work that way, but here I can hardly imagine a case 
where we could use it. Thus, it still looks like a premature 
complication to me, since we do not have plans to use it, do we? 
Anyway, everything seems to be working fine, so it is up to you 
whether to keep this additional argument.




v1-0003-postgres_fdw-server-level-option-keep_connection.patch
This patch adds a new server level option, keep_connection, default
being on, when set to off, the local session doesn't cache the
connections associated with the foreign server.



This patch looks good to me, except for one note:

(entry->used_in_current_xact &&
-   !keep_connections))
+   (!keep_connections || !entry->keep_connection)))
{

Following this logic:

1) If keep_connections == true, then the per-server keep_connection 
has a *higher* priority, so one can disable caching for a single 
foreign server.


2) But if keep_connections == false, then it works like a global 
switch-off regardless of the per-server keep_connection settings, i.e. 
they have a *lower* priority.


It looks fine to me, at least I cannot propose anything better, but 
maybe it should be documented in 0004?
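
Just to spell out the combined rule, a simplified sketch using the
patch's names (keep_connections is the GUC variable; this is not the
actual patch code):

/*
 * A connection is cached only if both the global GUC and the
 * per-server option allow it:
 *
 *   keep_connections (GUC)   entry->keep_connection   cached?
 *   true                     true                     yes
 *   true                     false                    no  (per-server wins)
 *   false                    any                      no  (global switch-off)
 */
static bool
connection_should_be_cached(ConnCacheEntry *entry)
{
	return keep_connections && entry->keep_connection;
}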




v1-0004-postgres_fdw-connection-cache-discard-tests-and-documentation.patch
This patch adds the tests and documentation related to this feature.



I have not read all the text thoroughly, but here is what caught my eye:

+   A GUC, postgres_fdw.keep_connections, default being
+   on, when set to off, the local session


I think that the GUC acronym is widely used only in the source code; 
the Postgres docs tend not to use it at all, except in the acronyms 
list and a couple of occurrences of the 'GUC parameters' collocation. 
And it is never used in the singular form there, so I think it should 
rather be:


A configuration parameter, 
postgres_fdw.keep_connections, default being...


+ 
+  Note that when postgres_fdw.keep_connections is set to
+  

Re: [PATCH] postgres_fdw connection caching - cause remote sessions linger till the local session exit

2020-11-25 Thread Alexey Kondratov

On 2020-11-25 06:17, Bharath Rupireddy wrote:

On Wed, Nov 25, 2020 at 7:24 AM Craig Ringer
 wrote:


A quick thought here.

Would it make sense to add a hook in the DISCARD ALL implementation 
that postgres_fdw can register for?


There's precedent here, since DISCARD ALL already has the same effect 
as SELECT pg_advisory_unlock_all(); amongst other things.




IIUC, then it is like a core (server) function doing some work for the
postgres_fdw module. Earlier in the discussion, one point raised was
that it's better not to have core handle something related to
postgres_fdw. This is the reason we have come up with a postgres_fdw
specific function and a GUC, which get defined when the extension is
created. Similarly, dblink also has its own bunch of functions, one
among them being dblink_disconnect().



If I have understood Craig correctly, he points out that we already 
have a DISCARD ALL statement, which is processed by DiscardAll() and 
releases internal resources known from the core perspective. That way, 
we could introduce a general-purpose hook, DiscardAll_hook(), so 
postgres_fdw could make use of it to clean up its own resources 
(connections, in our context) if needed. In other words, it is not a 
core function doing some work for the postgres_fdw module, but rather 
a callback/hook that postgres_fdw is able to register in order to do 
some additional work.
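
Just to illustrate the shape of it, a hypothetical sketch (no such
hook exists in core today; all names here are made up), following the
usual hook chaining convention:

/* core side, e.g. in commands/discard.c */
typedef void (*DiscardAll_hook_type) (bool isTopLevel);
DiscardAll_hook_type DiscardAll_hook = NULL;

/* ... at the end of DiscardAll(), after the existing resets: */
if (DiscardAll_hook)
	(*DiscardAll_hook) (isTopLevel);

/* postgres_fdw side */
static DiscardAll_hook_type prev_DiscardAll_hook = NULL;

static void
pgfdw_discard_all(bool isTopLevel)
{
	if (prev_DiscardAll_hook)
		prev_DiscardAll_hook(isTopLevel);

	/* drop all cached remote connections */
	(void) disconnect_cached_connections(0, true);
}

void
_PG_init(void)
{
	prev_DiscardAll_hook = DiscardAll_hook;
	DiscardAll_hook = pgfdw_discard_all;
}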


It could be a good replacement for 0001, but wouldn't it be overkill 
to drop all local caches along with the remote connections? I mean, it 
would be a nice-to-have hook from the extensibility perspective, but 
postgres_fdw_disconnect() still makes sense, since it does a very 
narrow and specific job.



Regards
--
Alexey Kondratov

Postgres Professional https://www.postgrespro.com
Russian Postgres Company




Re: Allow CLUSTER, VACUUM FULL and REINDEX to change tablespace on the fly

2021-01-13 Thread Alexey Kondratov

On 2021-01-13 14:34, Michael Paquier wrote:

On Wed, Jan 13, 2021 at 05:22:49PM +0900, Michael Paquier wrote:

Yeah, that makes sense.  I'll send an updated patch based on that.


And here you go as per the attached.  I don't think that there was
anything remaining on my radar.  This version still needs to be
indented properly though.

Thoughts?



Thanks.

+   bits32  options;    /* bitmask of CLUSTEROPT_* */

This should say '/* bitmask of CLUOPT_* */', I guess, since only 
CLUOPT_* flags are defined. Otherwise, everything looks as discussed 
upthread.


By the way, something went wrong with the subject of the last email, 
so I have changed it back to the original in this response. I also 
attached your patch (with only this CLUOPT_* correction) to keep it in 
the thread for sure. Although postgresql.org's web archive is clever 
enough to link your email to the same thread even with a different 
subject.



Regards
--
Alexey Kondratov

Postgres Professional https://www.postgrespro.com
Russian Postgres Company

diff --git a/src/include/catalog/index.h b/src/include/catalog/index.h
index 9904a76387..43cfdeaa6b 100644
--- a/src/include/catalog/index.h
+++ b/src/include/catalog/index.h
@@ -30,13 +30,16 @@ typedef enum
 } IndexStateFlagsAction;
 
 /* options for REINDEX */
-typedef enum ReindexOption
+typedef struct ReindexParams
 {
-	REINDEXOPT_VERBOSE = 1 << 0,	/* print progress info */
-	REINDEXOPT_REPORT_PROGRESS = 1 << 1,	/* report pgstat progress */
-	REINDEXOPT_MISSING_OK = 1 << 2, /* skip missing relations */
-	REINDEXOPT_CONCURRENTLY = 1 << 3	/* concurrent mode */
-} ReindexOption;
+	bits32		options;			/* bitmask of REINDEXOPT_* */
+} ReindexParams;
+
+/* flag bits for ReindexParams->flags */
+#define REINDEXOPT_VERBOSE		0x01	/* print progress info */
+#define REINDEXOPT_REPORT_PROGRESS 0x02 /* report pgstat progress */
+#define REINDEXOPT_MISSING_OK 	0x04	/* skip missing relations */
+#define REINDEXOPT_CONCURRENTLY	0x08	/* concurrent mode */
 
 /* state info for validate_index bulkdelete callback */
 typedef struct ValidateIndexState
@@ -146,7 +149,7 @@ extern void index_set_state_flags(Oid indexId, IndexStateFlagsAction action);
 extern Oid	IndexGetRelation(Oid indexId, bool missing_ok);
 
 extern void reindex_index(Oid indexId, bool skip_constraint_checks,
-		  char relpersistence, int options);
+		  char relpersistence, ReindexParams *params);
 
 /* Flag bits for reindex_relation(): */
 #define REINDEX_REL_PROCESS_TOAST			0x01
@@ -155,7 +158,7 @@ extern void reindex_index(Oid indexId, bool skip_constraint_checks,
 #define REINDEX_REL_FORCE_INDEXES_UNLOGGED	0x08
 #define REINDEX_REL_FORCE_INDEXES_PERMANENT 0x10
 
-extern bool reindex_relation(Oid relid, int flags, int options);
+extern bool reindex_relation(Oid relid, int flags, ReindexParams *params);
 
 extern bool ReindexIsProcessingHeap(Oid heapOid);
 extern bool ReindexIsProcessingIndex(Oid indexOid);
diff --git a/src/include/commands/cluster.h b/src/include/commands/cluster.h
index 401a0827ae..1245d944dc 100644
--- a/src/include/commands/cluster.h
+++ b/src/include/commands/cluster.h
@@ -18,16 +18,17 @@
 #include "storage/lock.h"
 #include "utils/relcache.h"
 
-
 /* options for CLUSTER */
-typedef enum ClusterOption
+#define CLUOPT_RECHECK 0x01		/* recheck relation state */
+#define CLUOPT_VERBOSE 0x02		/* print progress info */
+
+typedef struct ClusterParams
 {
-	CLUOPT_RECHECK = 1 << 0,	/* recheck relation state */
-	CLUOPT_VERBOSE = 1 << 1		/* print progress info */
-} ClusterOption;
+	bits32		options;			/* bitmask of CLUOPT_* */
+} ClusterParams;
 
 extern void cluster(ParseState *pstate, ClusterStmt *stmt, bool isTopLevel);
-extern void cluster_rel(Oid tableOid, Oid indexOid, int options);
+extern void cluster_rel(Oid tableOid, Oid indexOid, ClusterParams *params);
 extern void check_index_is_clusterable(Relation OldHeap, Oid indexOid,
 	   bool recheck, LOCKMODE lockmode);
 extern void mark_index_clustered(Relation rel, Oid indexOid, bool is_internal);
diff --git a/src/include/commands/defrem.h b/src/include/commands/defrem.h
index e2d2a77ca4..91281d6f8e 100644
--- a/src/include/commands/defrem.h
+++ b/src/include/commands/defrem.h
@@ -14,6 +14,7 @@
 #ifndef DEFREM_H
 #define DEFREM_H
 
+#include "catalog/index.h"
 #include "catalog/objectaddress.h"
 #include "nodes/params.h"
 #include "parser/parse_node.h"
@@ -34,11 +35,7 @@ extern ObjectAddress DefineIndex(Oid relationId,
  bool check_not_in_use,
  bool skip_build,
  bool quiet);
-extern int	ReindexParseOptions(ParseState *pstate, ReindexStmt *stmt);
-extern void ReindexIndex(RangeVar *indexRelation, int options, bool isTopLevel);
-extern Oid	ReindexTable(RangeVar *relation, int options, bool isTopLevel);
-extern void ReindexMultipleTables(const char *objectName, ReindexObjectType ob

Re: Allow CLUSTER, VACUUM FULL and REINDEX to change tablespace on the fly

2021-01-20 Thread Alexey Kondratov

On 2021-01-20 18:54, Alvaro Herrera wrote:

On 2021-Jan-20, Alvaro Herrera wrote:


On 2021-Jan-20, Michael Paquier wrote:

> +/*
> + * This is mostly duplicating ATExecSetTableSpaceNoStorage,
> + * which should maybe be factored out to a library function.
> + */
> Wouldn't it be better to do first the refactoring of 0002 and then
> 0001 so as REINDEX can use the new routine, instead of putting that
> into a comment?

I think merging 0001 and 0002 into a single commit is a reasonable
approach.


... except it doesn't make a lot of sense to have set_rel_tablespace in
either indexcmds.c or index.c.  I think tablecmds.c is a better place
for it.  (I would have thought catalog/storage.c, but that one's not the
right abstraction level it seems.)



I did a refactoring of ATExecSetTableSpaceNoStorage() in 0001. The new 
function SetRelTablespace() is placed into tablecmds.c. The following 
0002 makes use of it. Is this close to what you and Michael suggested?




But surely ATExecSetTableSpaceNoStorage should be using this new
routine.  (I first thought 0002 was doing that, since that commit is
calling itself a "refactoring", but now that I look closer, it's not.)



Yeah, this 'refactoring' initially referred to refactoring of what 
Justin added to one of the previous 0001 patches. It was meant to be 
merged with 0001 once agreed upon, but we got distracted by other stuff.


I have not yet addressed Michael's concerns regarding reindexing of 
partitions. I am going to look closer at it tomorrow.



Regards
--
Alexey Kondratov

Postgres Professional https://www.postgrespro.com
Russian Postgres Company




Re: Allow CLUSTER, VACUUM FULL and REINDEX to change tablespace on the fly

2021-01-20 Thread Alexey Kondratov

On 2021-01-20 21:08, Alexey Kondratov wrote:

On 2021-01-20 18:54, Alvaro Herrera wrote:

On 2021-Jan-20, Alvaro Herrera wrote:


On 2021-Jan-20, Michael Paquier wrote:

> +/*
> + * This is mostly duplicating ATExecSetTableSpaceNoStorage,
> + * which should maybe be factored out to a library function.
> + */
> Wouldn't it be better to do first the refactoring of 0002 and then
> 0001 so as REINDEX can use the new routine, instead of putting that
> into a comment?

I think merging 0001 and 0002 into a single commit is a reasonable
approach.


... except it doesn't make a lot of sense to have set_rel_tablespace in
either indexcmds.c or index.c.  I think tablecmds.c is a better place
for it.  (I would have thought catalog/storage.c, but that one's not the
right abstraction level it seems.)



I did a refactoring of ATExecSetTableSpaceNoStorage() in 0001. The new
function SetRelTablespace() is placed into tablecmds.c. The following
0002 makes use of it. Is this close to what you and Michael suggested?



Ugh, forgot to attach the patches. Here they are.

--
Alexey

From 2c3876f99bc8ebdd07c532619992e7ec3093e50a Mon Sep 17 00:00:00 2001
From: Alexey Kondratov 
Date: Mon, 23 Mar 2020 21:10:29 +0300
Subject: [PATCH v2 2/2] Allow REINDEX to change tablespace

REINDEX already does full relation rewrite, this patch adds a
possibility to specify a new tablespace where new relfilenode
will be created.
---
 doc/src/sgml/ref/reindex.sgml |  22 +
 src/backend/catalog/index.c   |  72 ++-
 src/backend/commands/indexcmds.c  |  68 ++-
 src/bin/psql/tab-complete.c   |   4 +-
 src/include/catalog/index.h   |   2 +
 src/test/regress/input/tablespace.source  |  53 +++
 src/test/regress/output/tablespace.source | 102 ++
 7 files changed, 318 insertions(+), 5 deletions(-)

diff --git a/doc/src/sgml/ref/reindex.sgml b/doc/src/sgml/ref/reindex.sgml
index 627b36300c..4f84060c4d 100644
--- a/doc/src/sgml/ref/reindex.sgml
+++ b/doc/src/sgml/ref/reindex.sgml
@@ -27,6 +27,7 @@ REINDEX [ ( option [, ...] ) ] { IN
 
 CONCURRENTLY [ boolean ]
 VERBOSE [ boolean ]
+TABLESPACE new_tablespace
 
  
 
@@ -187,6 +188,19 @@ REINDEX [ ( option [, ...] ) ] { IN
 

 
+   
+TABLESPACE
+
+ 
+  This specifies that indexes will be rebuilt on a new tablespace.
+  Cannot be used with "mapped" relations. If SCHEMA,
+  DATABASE or SYSTEM is specified, then
+  all unsuitable relations will be skipped and a single WARNING
+  will be generated.
+ 
+
+   
+

 VERBOSE
 
@@ -210,6 +224,14 @@ REINDEX [ ( option [, ...] ) ] { IN
 

 
+   
+new_tablespace
+
+ 
+  The tablespace where indexes will be rebuilt.
+ 
+
+   
   
  
 
diff --git a/src/backend/catalog/index.c b/src/backend/catalog/index.c
index b8cd35e995..ed98b17483 100644
--- a/src/backend/catalog/index.c
+++ b/src/backend/catalog/index.c
@@ -57,6 +57,7 @@
 #include "commands/event_trigger.h"
 #include "commands/progress.h"
 #include "commands/tablecmds.h"
+#include "commands/tablespace.h"
 #include "commands/trigger.h"
 #include "executor/executor.h"
 #include "miscadmin.h"
@@ -1394,9 +1395,13 @@ index_update_collation_versions(Oid relid, Oid coll)
  * Create concurrently an index based on the definition of the one provided by
  * caller.  The index is inserted into catalogs and needs to be built later
  * on.  This is called during concurrent reindex processing.
+ *
+ * "tablespaceOid" is the new tablespace to use for this index.  If
+ * InvalidOid, use the tablespace in-use instead.
  */
 Oid
-index_concurrently_create_copy(Relation heapRelation, Oid oldIndexId, const char *newName)
+index_concurrently_create_copy(Relation heapRelation, Oid oldIndexId,
+			   Oid tablespaceOid, const char *newName)
 {
 	Relation	indexRelation;
 	IndexInfo  *oldInfo,
@@ -1526,7 +1531,8 @@ index_concurrently_create_copy(Relation heapRelation, Oid oldIndexId, const char
 			  newInfo,
 			  indexColNames,
 			  indexRelation->rd_rel->relam,
-			  indexRelation->rd_rel->reltablespace,
+			  OidIsValid(tablespaceOid) ?
+tablespaceOid : indexRelation->rd_rel->reltablespace,
 			  indexRelation->rd_indcollation,
 			  indclass->values,
 			  indcoloptions->values,
@@ -3591,6 +3597,8 @@ IndexGetRelation(Oid indexId, bool missing_ok)
 
 /*
  * reindex_index - This routine is used to recreate a single index
+ *
+ * See comments of reindex_relation() for details about "tablespaceOid".
  */
 void
 reindex_index(Oid indexId, bool skip_constraint_checks, char persistence,
@@ -3603,6 +3611,7 @@ reindex_index(Oid indexId, bool skip_constraint_checks, char persistence,
 	volatile bool skipped_constraint = false;
 	PGRUsage

Re: Allow CLUSTER, VACUUM FULL and REINDEX to change tablespace on the fly

2021-01-21 Thread Alexey Kondratov

On 2021-01-21 04:41, Michael Paquier wrote:

On Wed, Jan 20, 2021 at 03:34:39PM -0300, Alvaro Herrera wrote:

On 2021-Jan-20, Alexey Kondratov wrote:

Ugh, forgot to attach the patches. Here they are.


Yeah, looks reasonable.



+
+   if (changed)
+   /* Record dependency on tablespace */
+   changeDependencyOnTablespace(RelationRelationId,
+                                reloid, rd_rel->reltablespace);


Why have a separate "if (changed)" block here instead of merging with
the above?


Yep.



Sure, this is a refactoring artifact.


+   if (SetRelTablespace(reloid, newTableSpace))
+   /* Make sure the reltablespace change is visible */
+   CommandCounterIncrement();
At quick glance, I am wondering why you just don't do a CCI within
SetRelTablespace().



I did it that way for better readability at first, since it looks more 
natural to make a change (SetRelTablespace) and then make it visible 
with a CCI. The second argument was that in the case of 
reindex_index() we also have to call RelationAssumeNewRelfilenode() 
and RelationDropStorage() before doing the CCI and making the new 
tablespace visible. And this part is critical, I guess.
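
In other words, the ordering I have in mind in reindex_index() is
roughly the following (a simplified sketch with names from the patch;
error handling and the storage creation itself omitted):

if (set_tablespace)
{
	/* update reltablespace in the index's pg_class row */
	(void) SetRelTablespace(indexId, tablespaceOid);

	/*
	 * The new storage is created in the new tablespace elsewhere in
	 * reindex_index(); here we only adjust the relcache and schedule
	 * unlinking of the old storage at commit.
	 */
	RelationAssumeNewRelfilenode(iRel);
	RelationDropStorage(iRel);

	/* only now make the changes visible */
	CommandCounterIncrement();
}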




+  This specifies that indexes will be rebuilt on a new tablespace.
+  Cannot be used with "mapped" relations. If SCHEMA,
+  DATABASE or SYSTEM is specified, then
+  all unsuitable relations will be skipped and a single WARNING
+  will be generated.
What is an unsuitable relation?  How can the end user know that?



This was referring to the mapped relations mentioned in the previous 
sentence. I have tried to rewrite this part and make it more specific 
in my current version. I also added Justin's changes to the docs and 
comments.



This is missing ACL checks when moving the index into a new location,
so this requires some pg_tablespace_aclcheck() calls, and the other
patches share the same issue.



I added proper pg_tablespace_aclcheck() calls to reindex_index() and 
ReindexPartitions().



+   else if (partkind == RELKIND_PARTITIONED_TABLE)
+   {
+       Relation rel = table_open(partoid, ShareLock);
+       List    *indexIds = RelationGetIndexList(rel);
+       ListCell *lc;
+
+       table_close(rel, NoLock);
+       foreach (lc, indexIds)
+       {
+           Oid indexid = lfirst_oid(lc);
+           (void) set_rel_tablespace(indexid, params->tablespaceOid);
+       }
+   }
This is really a good question.  ReindexPartitions() would trigger one
transaction per leaf to work on.  Changing the tablespace of the
partitioned table(s) before doing any work has the advantage to tell
any new partition to use the new tablespace.  Now, I see a struggling
point here: what should we do if the processing fails in the middle of
the move, leaving a portion of the leaves in the previous tablespace?
On a follow-up reindex with the same command, should the command force
a reindex even on the partitions that have been moved?  Or could there
be a point in skipping the partitions that are already on the new
tablespace and only process the ones on the previous tablespace?  It
seems to me that the first scenario makes the most sense as currently
a REINDEX works on all the relations defined, though there could be
use cases for the second case.  This should be documented, I think.



I agree that a follow-up REINDEX should also reindex moved partitions, 
since REINDEX (TABLESPACE ...) is still a reindex first of all. I will 
try to put something about this part into the docs. Also, I think that 
we cannot be sure that nothing happened to already reindexed 
partitions between two consecutive REINDEX calls.



There are no tests for partitioned tables, aka we'd want to make sure
that the new partitioned index is on the correct tablespace, as well
as all its leaves.  It may be better to have at least two levels of
partitioned tables, as well as a partitioned table with no leaves in
the cases dealt with.



Yes, sure, it makes sense.


+*
+* Even if a table's indexes were moved to a new tablespace, the index
+* on its toast table is not normally moved.
 */
Still, REINDEX (TABLESPACE) TABLE should move all of them to be
consistent with ALTER TABLE SET TABLESPACE, but that's not the case
with this code, no?  This requires proper test coverage, but there is
nothing of the kind in this patch.


You are right, we do not move TOAST indexes now: since 
IsSystemRelation() is true for TOAST indexes, I thought that we should 
not allow moving them without allow_system_table_mods=true. Now I 
wonder why ALTER TABLE does that.


I am going to attach the new version of the patch set today or tomorrow.


Regards
--
Alexey Kondratov

Postgres Professional https://www.postgrespro.com
Russian Postgres Company




Re: Allow CLUSTER, VACUUM FULL and REINDEX to change tablespace on the fly

2021-01-21 Thread Alexey Kondratov

On 2021-01-21 17:06, Alexey Kondratov wrote:

On 2021-01-21 04:41, Michael Paquier wrote:


There are no tests for partitioned tables, aka we'd want to make sure
that the new partitioned index is on the correct tablespace, as well
as all its leaves.  It may be better to have at least two levels of
partitioned tables, as well as a partitioned table with no leaves in
the cases dealt with.



Yes, sure, it makes sense.


+*
+* Even if a table's indexes were moved to a new tablespace, the index
+* on its toast table is not normally moved.
 */
Still, REINDEX (TABLESPACE) TABLE should move all of them to be
consistent with ALTER TABLE SET TABLESPACE, but that's not the case
with this code, no?  This requires proper test coverage, but there is
nothing of the kind in this patch.


You are right, we do not move TOAST indexes now, since
IsSystemRelation() is true for TOAST indexes, so I thought that we
should not allow moving them without allow_system_table_mods=true. Now
I wonder why ALTER TABLE does that.

I am going to attach the new version of patch set today or tomorrow.



Attached is a new patch set of the first two patches, which should 
resolve all the issues raised before (ACL, docs, tests) except TOAST. 
Double thanks for the suggestion to add more tests with nested 
partitioning. Using the newly added tests, I have found and squashed a 
huge bug related to moving back to the default tablespace.


Regarding TOAST: now we skip moving TOAST indexes, or throw an error 
if someone wants to move a TOAST index directly. I had a look at ALTER 
TABLE SET TABLESPACE and it has somewhat complicated logic:


1) You cannot move a TOAST table directly.
2) But if you move the base relation that the TOAST table belongs to, 
then they are moved together.
3) The same logic as 2) applies if one does ALTER TABLE ALL IN 
TABLESPACE ...


That way, ALTER TABLE allows moving TOAST tables (with their indexes) 
implicitly, but does not allow doing that explicitly. At the same 
time, I found the docs to be vague about this behavior; they only say:


All tables in the current database in a tablespace can be moved
by using the ALL IN TABLESPACE ... Note that system catalogs are
not moved by this command

Changing any part of a system catalog table is not permitted.

So ALTER TABLE actually treats TOAST relations as system relations 
sometimes, but sometimes not.


From the end-user perspective it makes sense to move TOAST together 
with the main table when doing ALTER TABLE SET TABLESPACE. But should 
we touch indexes on the TOAST table with REINDEX? We cannot move the 
TOAST relation itself, since we are only doing a reindex, so we would 
end up in a state where the TOAST table and its index are placed in 
different tablespaces. This state is not reachable with ALTER 
TABLE/INDEX, so it seems we should not allow it with REINDEX either, 
should we?



Regards
--
Alexey Kondratov

Postgres Professional https://www.postgrespro.com
Russian Postgres Company

From bcd690da6bc3db16a96305b45546d3c9e400f769 Mon Sep 17 00:00:00 2001
From: Alexey Kondratov 
Date: Mon, 23 Mar 2020 21:10:29 +0300
Subject: [PATCH v3 2/2] Allow REINDEX to change tablespace

REINDEX already does full relation rewrite, this patch adds a
possibility to specify a new tablespace where new relfilenode
will be created.
---
 doc/src/sgml/ref/reindex.sgml |  29 +++-
 src/backend/catalog/index.c   |  82 +++-
 src/backend/commands/indexcmds.c  |  81 +++-
 src/bin/psql/tab-complete.c   |   4 +-
 src/include/catalog/index.h   |   2 +
 src/test/regress/input/tablespace.source  |  79 +++
 src/test/regress/output/tablespace.source | 154 ++
 7 files changed, 425 insertions(+), 6 deletions(-)

diff --git a/doc/src/sgml/ref/reindex.sgml b/doc/src/sgml/ref/reindex.sgml
index 627b36300c..90fdad0b4c 100644
--- a/doc/src/sgml/ref/reindex.sgml
+++ b/doc/src/sgml/ref/reindex.sgml
@@ -27,6 +27,7 @@ REINDEX [ ( option [, ...] ) ] { IN
 
 CONCURRENTLY [ boolean ]
 VERBOSE [ boolean ]
+TABLESPACE new_tablespace
 
  
 
@@ -187,6 +188,20 @@ REINDEX [ ( option [, ...] ) ] { IN
 

 
+   
+TABLESPACE
+
+ 
+  Specifies that indexes will be rebuilt on a new tablespace.
+  Cannot be used with "mapped" and system (unless allow_system_table_mods
+  is set to TRUE) relations. If SCHEMA,
+  DATABASE or SYSTEM are specified, then
+  all "mapped" and system relations will be skipped and a single
+  WARNING will be generated.
+ 
+
+   
+

 VERBOSE
 
@@ -210,6 +225,14 @@ REINDEX [ ( option [, ...] ) ] { IN
 

 
+   
+new_tablespace
+
+ 
+  The tablespace where indexes will be rebuilt.
+ 
+
+   
   
  
 
@@ -292,7 +315,11 @@ REINDEX [ ( option [, ...] ) ] { IN
with REINDEX INDEX or REINDEX TABLE,
respectively. Each partition of the specified partitioned relation is
reindex

Re: Allow CLUSTER, VACUUM FULL and REINDEX to change tablespace on the fly

2021-01-22 Thread Alexey Kondratov

On 2021-01-22 00:26, Justin Pryzby wrote:

On Thu, Jan 21, 2021 at 11:48:08PM +0300, Alexey Kondratov wrote:
Attached is a new patch set of the first two patches, which should 
resolve all the issues raised before (ACL, docs, tests) except TOAST. 
Double thanks for the suggestion to add more tests with nested 
partitioning. Using the newly added tests, I have found and squashed a 
huge bug related to moving back to the default tablespace.

Regarding TOAST: now we skip moving TOAST indexes, or throw an error 
if someone wants to move a TOAST index directly. I had a look at ALTER 
TABLE SET TABLESPACE and it has somewhat complicated logic:

1) You cannot move a TOAST table directly.
2) But if you move the base relation that the TOAST table belongs to, 
then they are moved together.
3) The same logic as 2) applies if one does ALTER TABLE ALL IN 
TABLESPACE ...

That way, ALTER TABLE allows moving TOAST tables (with their indexes) 
implicitly, but does not allow doing that explicitly. At the same 
time, I found the docs to be vague about this behavior; they only say:

All tables in the current database in a tablespace can be moved
by using the ALL IN TABLESPACE ... Note that system catalogs are
not moved by this command

Changing any part of a system catalog table is not permitted.

So ALTER TABLE actually treats TOAST relations as system relations 
sometimes, but sometimes not.

From the end-user perspective it makes sense to move TOAST together 
with the main table when doing ALTER TABLE SET TABLESPACE. But should 
we touch indexes on the TOAST table with REINDEX? We cannot move the 
TOAST relation itself, since we are only doing a reindex, so we would 
end up in a state where the TOAST table and its index are placed in 
different tablespaces. This state is not reachable with ALTER 
TABLE/INDEX, so it seems we should not allow it with REINDEX either, 
should we?


+		 * Even if a table's indexes were moved to a new tablespace, the index
+		 * on its toast table is not normally moved.
 		 */
 		ReindexParams newparams = *params;
 
 		newparams.options &= ~(REINDEXOPT_MISSING_OK);
+		if (!allowSystemTableMods)
+			newparams.tablespaceOid = InvalidOid;


I think you're right.  So actually TOAST should never move, even if
allowSystemTableMods, right ?



I think so. I would prefer not to move TOAST indexes implicitly at all 
during reindex.




@@ -292,7 +315,11 @@ REINDEX [ ( option [, ...] ) ] { IN
    with REINDEX INDEX or REINDEX TABLE,
    respectively. Each partition of the specified partitioned relation is
    reindexed in a separate transaction. Those commands cannot be used inside
-   a transaction block when working on a partitioned table or index.
+   a transaction block when working on a partitioned table or index. If
+   REINDEX with TABLESPACE executed
+   on partitioned relation fails it may have moved some partitions to the new
+   tablespace. Repeated command will still reindex all partitions even if they
+   are already in the new tablespace.


Minor corrections here:

If a REINDEX command fails when run on a partitioned
relation, and TABLESPACE was specified, then it may have
moved indexes on some partitions to the new tablespace.  Re-running the
command will reindex all partitions and move previously-unprocessed
indexes to the new tablespace.


Sounds good to me.

I have updated the patches accordingly and also simplified the 
tablespaceOid checks and assignment in the newly added 
SetRelTableSpace(). The result is attached as two separate patches for 
ease of review, but I have no objection to merging them and applying 
them at once if everything is fine.



Regards
--
Alexey Kondratov

Postgres Professional https://www.postgrespro.com
Russian Postgres Company

From 87e47e9b5b3d6b49230045e5db8f844b14b34ba0 Mon Sep 17 00:00:00 2001
From: Alexey Kondratov 
Date: Mon, 23 Mar 2020 21:10:29 +0300
Subject: [PATCH v4 2/2] Allow REINDEX to change tablespace

REINDEX already does full relation rewrite, this patch adds a
possibility to specify a new tablespace where new relfilenode
will be created.
---
 doc/src/sgml/ref/reindex.sgml |  30 +++-
 src/backend/catalog/index.c   |  81 ++-
 src/backend/commands/indexcmds.c  |  81 ++-
 src/bin/psql/tab-complete.c   |   4 +-
 src/include/catalog/index.h   |   2 +
 src/test/regress/input/tablespace.source  |  85 
 src/test/regress/output/tablespace.source | 159 ++
 7 files changed, 436 insertions(+), 6 deletions(-)

diff --git a/doc/src/sgml/ref/reindex.sgml b/doc/src/sgml/ref/reindex.sgml
index 627b36300c..a1c7736aec 100644
--- a/doc/src/sgml/ref/reindex.sgml
+++ b/doc/src/sgml/ref/reindex.sgml
@@ -27,6 +27,7 @@ REINDEX [ ( option [, ...] ) ] { IN
 
 CONCURRENTLY [ boolean ]
 VERBOSE [ boolean ]
+TABLESPACE new_tablespace
 
  
 
@@ -187,6 +188,20 @@ REINDEX 

Re: Allow CLUSTER, VACUUM FULL and REINDEX to change tablespace on the fly

2021-01-25 Thread Alexey Kondratov

On 2021-01-25 11:07, Michael Paquier wrote:

On Fri, Jan 22, 2021 at 05:07:02PM +0300, Alexey Kondratov wrote:
I have updated the patches accordingly and also simplified the 
tablespaceOid checks and assignment in the newly added 
SetRelTableSpace(). The result is attached as two separate patches for 
ease of review, but I have no objection to merging them and applying 
them at once if everything is fine.


 extern void SetRelationHasSubclass(Oid relationId, bool relhassubclass);
+extern bool SetRelTableSpace(Oid reloid, Oid tablespaceOid);
Seeing SetRelationHasSubclass(), wouldn't it be more consistent to use
SetRelationTableSpace() as the routine name?

I think that we should document that the caller of this routine had
better do a CCI once done to make the tablespace change visible.
Except for those two nits, the patch needs an indentation run and some
style tweaks but its logic looks fine.  So I'll apply that first
piece.



I updated the comment with the CCI info, did a pgindent run, and 
renamed the new function to SetRelationTableSpace(). A new patch is 
attached.



+INSERT INTO regress_tblspace_test_tbl (num1, num2, t)
+  SELECT round(random()*100), random(), repeat('text', 100)
+  FROM generate_series(1, 10) s(i);
Repeating 1M times a text value is too costly for such a test.  And as
even for empty tables there is one page created for toast indexes,
there is no need for that?



Yes, the TOAST relation is created anyway. I just wanted to put some 
data into the TOAST index, so that REINDEX did some meaningful work 
there, not only a new relfilenode creation. However, you are right: 
this query increases the tablespace test execution time by more than 
2x on my machine. I think it is not really required.




This patch is introducing three new checks for system catalogs:
- don't use tablespace for mapped relations.
- don't use tablespace for system relations, except if
allowSystemTableMods.
- don't move non-shared relation to global tablespace.
For the non-concurrent case, all three checks are in reindex_index().
For the concurrent case, the two first checks are in
ReindexMultipleTables() and the third one is in
ReindexRelationConcurrently().  That's rather tricky to follow because
CONCURRENTLY is not allowed on system relations.  I am wondering if it
would be worth an extra comment effort, or if there is a way to
consolidate that better.



Yeah, all these checks were complicated from the beginning. I will try 
to find a better place tomorrow, or at least put more info into the 
comments.


I am also going to check/fix the remaining points regarding 002 
tomorrow.



Regards
--
Alexey Kondratov

Postgres Professional https://www.postgrespro.com
Russian Postgres Company

From 39880842d7af31dcbfcffe7219250b31102955d5 Mon Sep 17 00:00:00 2001
From: Alexey Kondratov 
Date: Wed, 20 Jan 2021 20:21:12 +0300
Subject: [PATCH v5 1/2] Extract common part from ATExecSetTableSpaceNoStorage
 for a future usage

---
 src/backend/commands/tablecmds.c | 95 +++-
 src/include/commands/tablecmds.h |  2 +
 2 files changed, 58 insertions(+), 39 deletions(-)

diff --git a/src/backend/commands/tablecmds.c b/src/backend/commands/tablecmds.c
index 8687e9a97c..ec9c440e4e 100644
--- a/src/backend/commands/tablecmds.c
+++ b/src/backend/commands/tablecmds.c
@@ -13291,6 +13291,59 @@ ATExecSetTableSpace(Oid tableOid, Oid newTableSpace, LOCKMODE lockmode)
 	list_free(reltoastidxids);
 }
 
+/*
+ * SetRelationTableSpace - modify relation tablespace in the pg_class entry.
+ *
+ * 'reloid' is an Oid of relation to be modified.
+ * 'tablespaceOid' is an Oid of new tablespace.
+ *
+ * Catalog modification is done only if tablespaceOid is different from
+ * the currently set.  Returned bool value is indicating whether any changes
+ * were made or not.  Note that caller is responsible for doing
+ * CommandCounterIncrement() to make tablespace changes visible.
+ */
+bool
+SetRelationTableSpace(Oid reloid, Oid tablespaceOid)
+{
+	Relation	pg_class;
+	HeapTuple	tuple;
+	Form_pg_class rd_rel;
+	bool		changed = false;
+
+	/* Get a modifiable copy of the relation's pg_class row. */
+	pg_class = table_open(RelationRelationId, RowExclusiveLock);
+
+	tuple = SearchSysCacheCopy1(RELOID, ObjectIdGetDatum(reloid));
+	if (!HeapTupleIsValid(tuple))
+		elog(ERROR, "cache lookup failed for relation %u", reloid);
+	rd_rel = (Form_pg_class) GETSTRUCT(tuple);
+
+	/* MyDatabaseTableSpace is stored as InvalidOid. */
+	if (tablespaceOid == MyDatabaseTableSpace)
+		tablespaceOid = InvalidOid;
+
+	/* No work if no change in tablespace. */
+	if (tablespaceOid != rd_rel->reltablespace)
+	{
+		/* Update the pg_class row. */
+		rd_rel->reltablespace = tablespaceOid;
+		CatalogTupleUpdate(pg_class, &tuple->t_self, tuple);
+
+		/* Record dependency on tablespace. */
+		changeDependencyOnTablespace(RelationRelationId,
+	 reloid, rd_rel->reltablespace);
+
+		changed = true;

Re: Allow CLUSTER, VACUUM FULL and REINDEX to change tablespace on the fly

2021-01-26 Thread Alexey Kondratov

On 2021-01-26 09:58, Michael Paquier wrote:

On Mon, Jan 25, 2021 at 11:11:38PM +0300, Alexey Kondratov wrote:
I updated the comment with the CCI info, did a pgindent run, and 
renamed the new function to SetRelationTableSpace(). A new patch is 
attached.

[...]

Yeah, all these checks were complicated from the beginning. I will try 
to find a better place tomorrow, or at least put more info into the 
comments.


I was reviewing that, and I think that we can do a better
consolidation on several points that will also help the features
discussed on this thread for VACUUM, CLUSTER and REINDEX.

If you look closely, ATExecSetTableSpace() uses the same logic as the
code modified here to check if a relation can be moved to a new
tablespace, with extra checks for mapped relations,
GLOBALTABLESPACE_OID or if attempting to manipulate a temp relation
from another session.  There are two differences though:
- Custom actions are taken between the phase where we check if a
relation can be moved to a new tablespace, and the update of
pg_class.
- ATExecSetTableSpace() needs to be able to set a given relation
relfilenode on top of reltablespace, the newly-created one.

So I think that the heart of the problem is made of two things here:
- We should have one common routine for the existing code paths and
the new code paths able to check if a tablespace move can be done or
not.  The case of a cluster, reindex or vacuum on a list of relations
extracted from pg_class would still require a different handling
as incorrect relations have to be skipped, but the case of individual
relations can reuse the refactoring pieces done here
(see CheckRelationTableSpaceMove() in the attached).
- We need to have a second routine able to update reltablespace and
optionally relfilenode for a given relation's pg_class entry, once the
caller has made sure that CheckRelationTableSpaceMove() validates a
tablespace move.



I think that I got your idea. One comment:

+bool
+CheckRelationTableSpaceMove(Relation rel, Oid newTableSpaceId)
+{
+   Oid oldTableSpaceId;
+   Oid reloid = RelationGetRelid(rel);
+
+   /*
+    * No work if no change in tablespace.  Note that MyDatabaseTableSpace
+    * is stored as 0.
+    */
+   oldTableSpaceId = rel->rd_rel->reltablespace;
+   if (newTableSpaceId == oldTableSpaceId ||
+       (newTableSpaceId == MyDatabaseTableSpace && oldTableSpaceId == 0))
+   {
+       InvokeObjectPostAlterHook(RelationRelationId, reloid, 0);
+       return false;
+   }

CheckRelationTableSpaceMove() does not feel like the right place for 
invoking post-alter hooks. It is intended only to check whether a 
tablespace change is possible. Anyway, ATExecSetTableSpace() and 
ATExecSetTableSpaceNoStorage() already do that in the no-op case.
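
What I would expect there is just the check, with the hook invocation
left to the ALTER TABLE code paths. A simplified sketch of your
function without the hook (not the final code):

bool
CheckRelationTableSpaceMove(Relation rel, Oid newTableSpaceId)
{
	Oid			oldTableSpaceId = rel->rd_rel->reltablespace;

	/*
	 * No work if no change in tablespace.  Note that
	 * MyDatabaseTableSpace is stored as 0.
	 */
	if (newTableSpaceId == oldTableSpaceId ||
		(newTableSpaceId == MyDatabaseTableSpace && oldTableSpaceId == 0))
		return false;

	return true;
}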



Please note that was a bug in your previous patch 0002: shared
dependencies need to be registered if reltablespace is updated of
course, but also iff the relation has no physical storage.  So
changeDependencyOnTablespace() requires a check based on
RELKIND_HAS_STORAGE(), or REINDEX would have registered shared
dependencies even for relations with storage, something we don't
want per the recent work done by Alvaro in ebfe2db.



Yes, thanks.

I have removed this InvokeObjectPostAlterHook() from your 0001 and 
made 0002 work on top of it. I think it should now look closer to what 
you described above.


In the new 0002 I moved the ACL check to the upper level, i.e. 
ExecReindex(), and removed the expensive text generation in the test. 
I have not yet touched some of your previously raised concerns. Also, 
you made SetRelationTableSpace() accept a Relation instead of an Oid, 
so now we have to open/close indexes in ReindexPartitions(); I am not 
sure that I use proper locking there, but it works.



Regards
--
Alexey Kondratov

Postgres Professional https://www.postgrespro.com
Russian Postgres Company

From 96a37399a9cf9ae08d62e28496e73b36087e5a19 Mon Sep 17 00:00:00 2001
From: Michael Paquier 
Date: Tue, 26 Jan 2021 15:53:06 +0900
Subject: [PATCH v7 1/2] Refactor code to detect and process tablespace moves

---
 src/backend/commands/tablecmds.c | 218 +--
 src/include/commands/tablecmds.h |   4 +
 2 files changed, 127 insertions(+), 95 deletions(-)

diff --git a/src/backend/commands/tablecmds.c b/src/backend/commands/tablecmds.c
index 8687e9a97c..c08eedf995 100644
--- a/src/backend/commands/tablecmds.c
+++ b/src/backend/commands/tablecmds.c
@@ -3037,6 +3037,116 @@ SetRelationHasSubclass(Oid relationId, bool relhassubclass)
 	table_close(relationRelation, RowExclusiveLock);
 }
 
+/*
+ * CheckRelationTableSpaceMove
+ *		Check if relation can be moved to new tablespace.
+ *
+ * NOTE: caller must be holding an appropriate lock on the relation.
+ * ShareUpdateExclusiveLock is sufficient to prevent concurrent schema
+ * changes.
+ *
+ * Returns true if the relation can be moved to the n

Re: Allow CLUSTER, VACUUM FULL and REINDEX to change tablespace on the fly

2021-01-27 Thread Alexey Kondratov

On 2021-01-27 06:14, Michael Paquier wrote:

On Wed, Jan 27, 2021 at 01:00:50AM +0300, Alexey Kondratov wrote:
In the new 0002 I moved the ACL check to the upper level, i.e. 
ExecReindex(), and removed the expensive text generation in the test. 
I have not yet touched some of your previously raised concerns. Also, 
you made SetRelationTableSpace() accept a Relation instead of an Oid, 
so now we have to open/close indexes in ReindexPartitions(); I am not 
sure that I use proper locking there, but it works.


Passing down Relation to the new routines makes the most sense to me
because we force the callers to think about the level of locking
that's required when doing any tablespace moves.

+   Relation iRel = index_open(partoid, ShareLock);
+
+   if (CheckRelationTableSpaceMove(iRel, params->tablespaceOid))
+       SetRelationTableSpace(iRel,
+                             params->tablespaceOid,
+                             InvalidOid);
Speaking of which, this breaks the locking assumptions of
SetRelationTableSpace().  I feel that we should think harder about
this part for partitioned indexes and tables because this looks rather
unsafe in terms of locking assumptions with partition trees.  If we
cannot come up with a safe solution, I would be fine with disallowing
TABLESPACE in this case, as a first step.  Not all problems have to be
solved at once, and even without this part the feature is still
useful.



I have read more about lock levels, and ShareLock should prevent any 
kind of physical modification of indexes. We already hold ShareLock 
when doing find_all_inheritors(), which is higher than 
ShareUpdateExclusiveLock, so using ShareLock seems to be safe here, 
but I will look at it more closely.




+   /* It's not a shared catalog, so refuse to move it to shared tablespace */
+   if (params->tablespaceOid == GLOBALTABLESPACE_OID)
+       ereport(ERROR,
+               (errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
+                errmsg("cannot move non-shared relation to tablespace \"%s\"",
+                       get_tablespace_name(params->tablespaceOid))));
Why is that needed if CheckRelationTableSpaceMove() is used?



This is from ReindexRelationConcurrently(), where we do not use 
CheckRelationTableSpaceMove(). To me it makes sense to add only this 
GLOBALTABLESPACE_OID check there, since we already check for system 
catalogs before it and for temp relations after it, so adding 
CheckRelationTableSpaceMove() would be a double-check.




- indexRelation->rd_rel->reltablespace,
+ OidIsValid(tablespaceOid) ?
+   tablespaceOid : indexRelation->rd_rel->reltablespace,
Let's remove this logic from index_concurrently_create_copy() and let
the caller directly decide the tablespace to use, without a dependency
on InvalidOid in the inner routine.  A share update exclusive lock is
already held on the old index when creating the concurrent copy, so
there won't be concurrent schema changes.



Changed.

Also added tests for ACL checks and relfilenode changes, added an ACL 
recheck for the multi-transaction case, added info about TOAST index 
reindexing, and changed some comments.
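
Roughly, the per-transaction ACL recheck in ReindexPartitions() looks
like this (a simplified sketch with the patch's names):

if (OidIsValid(params->tablespaceOid))
{
	AclResult	aclresult;

	/* the user may have lost ACL_CREATE since the previous transaction */
	aclresult = pg_tablespace_aclcheck(params->tablespaceOid,
									   GetUserId(), ACL_CREATE);
	if (aclresult != ACLCHECK_OK)
		aclcheck_error(aclresult, OBJECT_TABLESPACE,
					   get_tablespace_name(params->tablespaceOid));
}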



Regards
--
Alexey Kondratov

Postgres Professional https://www.postgrespro.com
Russian Postgres Company

From f176a6e5a81ab133fee849f72e4edb8b287d6062 Mon Sep 17 00:00:00 2001
From: Alexey Kondratov 
Date: Wed, 27 Jan 2021 00:46:17 +0300
Subject: [PATCH v8] Allow REINDEX to change tablespace

REINDEX already does full relation rewrite, this patch adds a
possibility to specify a new tablespace where new relfilenode
will be created.
---
 doc/src/sgml/ref/reindex.sgml |  31 +++-
 src/backend/catalog/index.c   |  50 +-
 src/backend/commands/indexcmds.c  | 112 -
 src/bin/psql/tab-complete.c   |   4 +-
 src/include/catalog/index.h   |   9 +-
 src/test/regress/input/tablespace.source  | 106 +
 src/test/regress/output/tablespace.source | 181 ++
 7 files changed, 481 insertions(+), 12 deletions(-)

diff --git a/doc/src/sgml/ref/reindex.sgml b/doc/src/sgml/ref/reindex.sgml
index 627b36300c..e610a0f52c 100644
--- a/doc/src/sgml/ref/reindex.sgml
+++ b/doc/src/sgml/ref/reindex.sgml
@@ -27,6 +27,7 @@ REINDEX [ ( option [, ...] ) ] { IN
 
 CONCURRENTLY [ boolean ]
 VERBOSE [ boolean ]
+TABLESPACE new_tablespace
 
  
 
@@ -187,6 +188,21 @@ REINDEX [ ( option [, ...] ) ] { IN
 

 
+   
+TABLESPACE
+
+ 
+  Specifies that indexes will be rebuilt on a new tablespace.
+  Cannot be used with "mapped" and system (unless allow_system_table_mods
+  is set to TRUE) relations. If SCHEMA,
+  DATABASE or SYSTEM are specified,
+  then all "mapped" and system relations will be skipped and a single
+  WARNING will be gener

Re: Allow CLUSTER, VACUUM FULL and REINDEX to change tablespace on the fly

2021-01-28 Thread Alexey Kondratov

On 2021-01-28 00:36, Alvaro Herrera wrote:

On 2021-Jan-28, Alexey Kondratov wrote:

I have read more about lock levels, and ShareLock should prevent any 
kind of physical modification of indexes. We already hold ShareLock 
when doing find_all_inheritors(), which is higher than 
ShareUpdateExclusiveLock, so using ShareLock seems to be safe here, 
but I will look at it more closely.


You can look at lock.c where LockConflicts[] is; that would tell you
that ShareLock indeed conflicts with ShareUpdateExclusiveLock ... but it
does not conflict with itself!  So it would be possible to have more
than one process doing this thing at the same time, which surely makes
no sense.



Thanks for the explanation and for pointing me to LockConflicts[]. 
This is a good reference.




I didn't look at the patch closely enough to understand why you're
trying to do something like CLUSTER, VACUUM FULL or REINDEX without
holding full AccessExclusiveLock on the relation.  But do keep in mind
that once you hold a lock on a relation, trying to grab a weaker lock
afterwards is pretty pointless.



No, you are right: we are doing REINDEX with AccessExclusiveLock as 
before. This part is more specific. It only applies to partitioned 
indexes, which do not hold any data, so we do not reindex them 
directly, only their leaves. However, if we are doing a TABLESPACE 
change, we have to record it in their pg_class entry, so that all 
future leaf partitions are created in the proper tablespace.


That way, we open the partitioned index relation only for reference, 
i.e. read-only, but modify its pg_class entry under a proper lock 
(RowExclusiveLock). That's why I thought that ShareLock would be enough.


IIUC, 'ALTER TABLE ... SET TABLESPACE' uses AccessExclusiveLock even 
for relations with no storage, since AlterTableGetLockLevel() chooses 
it if AT_SetTableSpace is encountered. This is very similar to our 
case, so probably we should do the same?
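
For reference, the relevant piece of AlterTableGetLockLevel() looks
roughly like this (paraphrased from memory; see tablecmds.c for the
authoritative version):

case AT_SetTableSpace:	/* must rewrite heap */
	cmd_lockmode = AccessExclusiveLock;
	break;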


Actually, it is not completely clear to me why 
ShareUpdateExclusiveLock is sufficient for the newly added 
SetRelationTableSpace(), as Michael wrote in the comment.



Regards
--
Alexey Kondratov

Postgres Professional https://www.postgrespro.com
Russian Postgres Company




Re: Allow CLUSTER, VACUUM FULL and REINDEX to change tablespace on the fly

2021-01-29 Thread Alexey Kondratov

On 2021-01-28 14:42, Alexey Kondratov wrote:

On 2021-01-28 00:36, Alvaro Herrera wrote:



I didn't look at the patch closely enough to understand why you're
trying to do something like CLUSTER, VACUUM FULL or REINDEX without
holding full AccessExclusiveLock on the relation.  But do keep in mind
that once you hold a lock on a relation, trying to grab a weaker lock
afterwards is pretty pointless.



No, you are right, we are doing REINDEX with AccessExclusiveLock as it
was before. This part is a more specific one. It only applies to
partitioned indexes, which do not hold any data, so we do not reindex
them directly, only their leafs. However, if we are doing a TABLESPACE
change, we have to record it in their pg_class entry, so all future
leaf partitions were created in the proper tablespace.

That way, we open partitioned index relation only for a reference,
i.e. read-only, but modify pg_class entry under a proper lock
(RowExclusiveLock). That's why I thought that ShareLock will be
enough.

IIUC, 'ALTER TABLE ... SET TABLESPACE' uses AccessExclusiveLock even
for relations with no storage, since AlterTableGetLockLevel() chooses
it if AT_SetTableSpace is met. This is very similar to our case, so
probably we should do the same?

Actually it is not completely clear for me why
ShareUpdateExclusiveLock is sufficient for newly added
SetRelationTableSpace() as Michael wrote in the comment.



Changed the patch to use AccessExclusiveLock in this part for now. 
This is what 'ALTER TABLE/INDEX ... SET TABLESPACE' and 'REINDEX' 
usually do. Anyway, all real leaf partitions are processed in 
independent transactions later.


Also changed some doc/comment parts Justin pointed me to.

+  then all "mapped" and system relations will be skipped and a 
single
+  WARNING will be generated. Indexes on TOAST 
tables

+  are reindexed, but not moved the new tablespace.


moved *to* the new tablespace.



Fixed.



I don't know if that needs to be said at all.  We talked about it a lot
to arrive at the current behavior, but I think that's only due to the
difficulty of correcting the initial mistake.



I do not think that it would be a big deal to move indexes on TOAST 
tables as well. I just thought that since 'ALTER TABLE/INDEX ... SET 
TABLESPACE' only moves them together with the host table, we also 
should not do that. Yet, I am ready to change this logic if requested.



Regards
--
Alexey Kondratov

Postgres Professional https://www.postgrespro.com
Russian Postgres Company

From 6e9db8d362e794edf421733bc7cade38c917bff4 Mon Sep 17 00:00:00 2001
From: Alexey Kondratov 
Date: Wed, 27 Jan 2021 00:46:17 +0300
Subject: [PATCH v9] Allow REINDEX to change tablespace

REINDEX already does full relation rewrite, this patch adds a
possibility to specify a new tablespace where new relfilenode
will be created.
---
 doc/src/sgml/ref/reindex.sgml |  31 +++-
 src/backend/catalog/index.c   |  47 +-
 src/backend/commands/indexcmds.c  | 112 -
 src/bin/psql/tab-complete.c   |   4 +-
 src/include/catalog/index.h   |   9 +-
 src/test/regress/input/tablespace.source  | 106 +
 src/test/regress/output/tablespace.source | 181 ++
 7 files changed, 478 insertions(+), 12 deletions(-)

diff --git a/doc/src/sgml/ref/reindex.sgml b/doc/src/sgml/ref/reindex.sgml
index 627b36300c..2b39699d42 100644
--- a/doc/src/sgml/ref/reindex.sgml
+++ b/doc/src/sgml/ref/reindex.sgml
@@ -27,6 +27,7 @@ REINDEX [ ( option [, ...] ) ] { IN
 
 CONCURRENTLY [ boolean ]
 VERBOSE [ boolean ]
+TABLESPACE new_tablespace
 
  
 
@@ -187,6 +188,21 @@ REINDEX [ ( option [, ...] ) ] { IN
 

 
+   
+TABLESPACE
+
+ 
+  Specifies that indexes will be rebuilt on a new tablespace.
+  Cannot be used with "mapped" or (unless allow_system_table_mods)
+  system relations. If SCHEMA,
+  DATABASE or SYSTEM are specified,
+  then all "mapped" and system relations will be skipped and a single
+  WARNING will be generated. Indexes on TOAST tables
+  are reindexed, but not moved to the new tablespace.
+ 
+
+   
+

 VERBOSE
 
@@ -210,6 +226,14 @@ REINDEX [ ( option [, ...] ) ] { IN
 

 
+   
+new_tablespace
+
+ 
+  The tablespace where indexes will be rebuilt.
+ 
+
+   
   
  
 
@@ -292,7 +316,12 @@ REINDEX [ ( option [, ...] ) ] { IN
with REINDEX INDEX or REINDEX TABLE,
respectively. Each partition of the specified partitioned relation is
reindexed in a separate transaction. Those commands cannot be used inside
-   a transaction block when working on a partitioned table or index.
+   a transaction block when working on a partitioned table or index. If
+   a REINDEX command fails when run on a partitioned
+   relation, and TABLESPACE was specified, then it may have
+   moved

Re: Allow CLUSTER, VACUUM FULL and REINDEX to change tablespace on the fly

2021-02-01 Thread Alexey Kondratov

On 2021-01-30 05:23, Michael Paquier wrote:

On Fri, Jan 29, 2021 at 08:56:47PM +0300, Alexey Kondratov wrote:

On 2021-01-28 14:42, Alexey Kondratov wrote:
No, you are right: we are doing REINDEX with AccessExclusiveLock as 
before. This part is more specific. It only applies to partitioned 
indexes, which do not hold any data, so we do not reindex them 
directly, only their leaves. However, if we are doing a TABLESPACE 
change, we have to record it in their pg_class entry, so that all 
future leaf partitions are created in the proper tablespace.

That way, we open the partitioned index relation only for reference, 
i.e. read-only, but modify its pg_class entry under a proper lock 
(RowExclusiveLock). That's why I thought that ShareLock would be 
enough.

IIUC, 'ALTER TABLE ... SET TABLESPACE' uses AccessExclusiveLock even 
for relations with no storage, since AlterTableGetLockLevel() chooses 
it if AT_SetTableSpace is encountered. This is very similar to our 
case, so probably we should do the same?

Actually, it is not completely clear to me why 
ShareUpdateExclusiveLock is sufficient for the newly added 
SetRelationTableSpace(), as Michael wrote in the comment.


Nay, it was not fine.  That's something Alvaro has mentioned, leading
to 2484329.  This also means that the main patch of this thread should
refresh the comments at the top of CheckRelationTableSpaceMove() and
SetRelationTableSpace() to mention that this is used by REINDEX
CONCURRENTLY with a lower lock.



Hm, IIUC, REINDEX CONCURRENTLY doesn't use either of them. It directly 
uses index_create() with a proper tablespaceOid instead of 
SetRelationTableSpace(). And its check structure is more restrictive 
even without a tablespace change, so it doesn't use 
CheckRelationTableSpaceMove().


Changed the patch to use AccessExclusiveLock in this part for now. 
This is what 'ALTER TABLE/INDEX ... SET TABLESPACE' and 'REINDEX' 
usually do. Anyway, all real leaf partitions are processed in 
independent transactions later.


+   if (partkind == RELKIND_PARTITIONED_INDEX)
+   {
+       Relation iRel = index_open(partoid, AccessExclusiveLock);
+
+       if (CheckRelationTableSpaceMove(iRel, params->tablespaceOid))
+           SetRelationTableSpace(iRel,
+                                 params->tablespaceOid,
+                                 InvalidOid);
+       index_close(iRel, NoLock);
Are you sure that this does not represent a risk of deadlocks as EAL
is not taken consistently across all the partitions?  A second issue
here is that this breaks the assumption of REINDEX CONCURRENTLY kicked
on partitioned relations that should use ShareUpdateExclusiveLock for
all its steps.  This would make the first transaction invasive for the
user, but we don't want that.

This makes me really wonder if we would not be better to restrict this
operation for partitioned relation as part of REINDEX as a first step.
Another thing, mentioned upthread, is that we could do this part of
the switch at the last transaction, or we could silently *not* do the
switch for partitioned indexes in the flow of REINDEX, letting users
handle that with an extra ALTER TABLE SET TABLESPACE once REINDEX has
finished on all the partitions, cascading the command only on the
partitioned relation of a tree.  It may be interesting to look as well
at if we could lower the lock used for partitioned relations with
ALTER TABLE SET TABLESPACE from AEL to SUEL, choosing AEL only if at
least one partition with storage is involved in the command,
CheckRelationTableSpaceMove() discarding anything that has no need to
change.



I am not sure right now, so I split the previous patch into two parts:

0001: Adds TABLESPACE into REINDEX with tests, docs and all the stuff we
did before, with the only exception that it doesn't move partitioned
indexes into the new tablespace.

Basically, it implements the option "we could silently *not* do the
switch for partitioned indexes in the flow of REINDEX, letting users
handle that with an extra ALTER TABLE SET TABLESPACE once REINDEX has
finished". It probably makes sense, since we do the tablespace change
together with the index relation rewrite and don't touch relations
without storage. Doing ALTER INDEX ... SET TABLESPACE on them will be
almost cost-free, since they do not hold any data.

0002: Implements the remaining part, where the pg_class entry is also
changed for partitioned indexes. I think that we should think more about
it; maybe it is not so dangerous and a proper locking strategy can be
achieved.



Regards
--
Alexey Kondratov

Postgres Professional https://www.postgrespro.com
Russian Postgres Company

From 6322032b472e6b1a76e0ca9326974e5774371fb9 Mon Sep 17 00:00:00 2001
From: Alexey Kondratov 
Date: Mon, 1 Feb 2021 15:20:29 +0300
Subject: [PATCH v10 2/2] Change tablespace of partitioned indexes during
 REINDEX.

There are some doubts about proper locking

Re: Allow CLUSTER, VACUUM FULL and REINDEX to change tablespace on the fly

2021-02-03 Thread Alexey Kondratov

On 2021-02-03 09:37, Michael Paquier wrote:

On Tue, Feb 02, 2021 at 10:32:19AM +0900, Michael Paquier wrote:

On Mon, Feb 01, 2021 at 06:28:57PM +0300, Alexey Kondratov wrote:
> Hm, IIUC, REINDEX CONCURRENTLY doesn't use either of them. It directly uses
> index_create() with a proper tablespaceOid instead of
> SetRelationTableSpace(). And its checks structure is more restrictive even
> without tablespace change, so it doesn't use CheckRelationTableSpaceMove().

Sure.  I have not checked the patch in details, but even with that it
would be much safer to me if we apply the same sanity checks
everywhere.  That's less potential holes to worry about.


Thanks Alexey for the new patch.  I have been looking at the main
patch in details.

    /*
-    * Don't allow reindex on temp tables of other backends ... their local
-    * buffer manager is not going to cope.
+    * We don't support moving system relations into different tablespaces
+    * unless allow_system_table_mods=1.
     */
If you remove the check on RELATION_IS_OTHER_TEMP() in
reindex_index(), you would allow the reindex of a temp relation owned
by a different session if its tablespace is not changed, so this
cannot be removed.

+                !allowSystemTableMods && IsSystemRelation(iRel))
                 ereport(ERROR,
-                        (errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
-                         errmsg("cannot reindex temporary tables of other sessions")));
+                        (errcode(ERRCODE_INSUFFICIENT_PRIVILEGE),
+                         errmsg("permission denied: \"%s\" is a system catalog",
+                                RelationGetRelationName(iRel))));
Indeed, a system relation with a relfilenode should be allowed to move
under allow_system_table_mods.  I think that we had better move this
check into CheckRelationTableSpaceMove() instead of reindex_index() to
centralize the logic.  ALTER TABLE does this business in
RangeVarCallbackForAlterRelation(), but our code path opening the
relation is different for the non-concurrent case.

+   if (OidIsValid(params->tablespaceOid) &&
+       IsSystemClass(relid, classtuple))
+   {
+       if (!allowSystemTableMods)
+       {
+           /* Skip all system relations, if not allowSystemTableMods */

I don't see the need for having two warnings here to say the same
thing if a relation is mapped or not mapped, so let's keep it simple.



Yeah, I just wanted to separate mapped and system relations, but 
probably it is too complicated.




I have found that the test suite was rather messy in its
organization.  Table creations were done first with a set of tests not
really ordered, so that was really hard to follow.  This has also led
to a set of tests that were duplicated, while other tests have been
missed, mainly some cross checks for the concurrent and non-concurrent
behaviors.  I have reordered the whole so as tests on catalogs, normal
tables and partitions are done separately with relations created and
dropped for each set.  Partitions use a global check for tablespaces
and relfilenodes after one concurrent reindex (didn't see the point in
doubling with the non-concurrent case as the same code path to select
the relations from the partition tree is taken).  An ACL test has been
added at the end.

The case of partitioned indexes was kind of interesting and I thought
about that a couple of days, and I took the decision to ignore
relations that have no storage as you did, documenting that ALTER
TABLE can be used to update the references of the partitioned
relations.  The command is still useful with this behavior, and the
tests I have added track that.

Finally, I have reworked the docs, separating the limitations related
to system catalogs and partitioned relations, to be more consistent
with the notes at the end of the page.



Thanks for working on this.

+   if (tablespacename != NULL)
+   {
+       params.tablespaceOid = get_tablespace_oid(tablespacename, false);
+
+       /* Check permissions except when moving to database's default */
+       if (OidIsValid(params.tablespaceOid) &&

This check for OidIsValid() seems to be excessive, since you moved the
whole ACL check under 'if (tablespacename != NULL)' here.

+           params.tablespaceOid != MyDatabaseTableSpace)
+       {
+           AclResult   aclresult;


+CREATE INDEX regress_tblspace_test_tbl_idx ON regress_tblspace_test_tbl (num1);

+-- move to global tablespace move fails

Maybe 'move to global tablespace, fail', just to match the style of the
previous comments.


+REINDEX (TABLESPACE pg_global) INDEX regress_tblspace_test_tbl_idx;


+SELECT relid, parentrelid, level FROM pg_partition_tree('tbspace_reindex_part_index')
+  ORDER BY relid, level;
+SELECT relid, parentrelid, level FROM pg_partition_tree('tbspace_

Re: Free port choosing freezes when PostgresNode::use_tcp is used on BSD systems

2021-04-20 Thread Alexey Kondratov

On 2021-04-20 18:03, Tom Lane wrote:

Andrew Dunstan  writes:

On 4/19/21 7:22 PM, Tom Lane wrote:

I wonder whether we could get away with just replacing the $use_tcp
test with $TestLib::windows_os.  It's not really apparent to me
why we should care about 127.0.0.not-1 on Unix-oid systems.



Yeah
The comment is a bit strange anyway - Cygwin is actually going to use
Unix sockets, not TCP.
I think I would just change the test to this: $use_tcp &&
$TestLib::windows_os.


Works for me, but we need to revise the comment to match.



Then it could be somewhat like that, I guess.


Regards
--
Alexey Kondratov

Postgres Professional https://www.postgrespro.com
Russian Postgres Company

diff --git a/src/test/perl/PostgresNode.pm b/src/test/perl/PostgresNode.pm
index db47a97d196..f7b488ed464 100644
--- a/src/test/perl/PostgresNode.pm
+++ b/src/test/perl/PostgresNode.pm
@@ -1191,19 +1191,19 @@ sub get_free_port
 		# Check to see if anything else is listening on this TCP port.
 		# Seek a port available for all possible listen_addresses values,
 		# so callers can harness this port for the widest range of purposes.
-		# The 0.0.0.0 test achieves that for post-2006 Cygwin, which
-		# automatically sets SO_EXCLUSIVEADDRUSE.  The same holds for MSYS (a
-		# Cygwin fork).  Testing 0.0.0.0 is insufficient for Windows native
-		# Perl (https://stackoverflow.com/a/14388707), so we also test
-		# individual addresses.
+		# The 0.0.0.0 test achieves that for MSYS, which automatically sets
+		# SO_EXCLUSIVEADDRUSE.  Testing 0.0.0.0 is insufficient for Windows
+		# native Perl (https://stackoverflow.com/a/14388707), so we also
+		# have to test individual addresses.  Doing that for 127.0.0/24
+		# addresses other than 127.0.0.1 might fail with EADDRNOTAVAIL on
+		# non-Linux, non-Windows kernels.
 		#
-		# On non-Linux, non-Windows kernels, binding to 127.0.0/24 addresses
-		# other than 127.0.0.1 might fail with EADDRNOTAVAIL.  Binding to
-		# 0.0.0.0 is unnecessary on non-Windows systems.
+		# That way, 0.0.0.0 and individual 127.0.0/24 addresses are tested
+		# only on Windows when TCP usage is requested.
 		if ($found == 1)
 		{
 			foreach my $addr (qw(127.0.0.1),
-$use_tcp ? qw(127.0.0.2 127.0.0.3 0.0.0.0) : ())
+$use_tcp && $TestLib::windows_os ? qw(127.0.0.2 127.0.0.3 0.0.0.0) : ())
 			{
 if (!can_bind($addr, $port))
 {


Re: PATCH: logical_work_mem and logical streaming of large in-progress transactions

2019-11-12 Thread Alexey Kondratov

On 04.11.2019 13:05, Kuntal Ghosh wrote:

On Mon, Nov 4, 2019 at 3:32 PM Dilip Kumar  wrote:

So your result shows that with "streaming on", performance is
degrading?  By any chance did you try to see where is the bottleneck?


Right. But, as we increase the logical_decoding_work_mem, the
performance improves. I've not analyzed the bottleneck yet. I'm
looking into the same.


My guess is that 64 kB is just too small a value. In the table schema
used for the tests every row takes at least 24 bytes for storing column
values. Thus, with this logical_decoding_work_mem value the limit should
be hit after about 2500+ rows, or about 400 times during a transaction
of ~1M rows.
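
Spelling out the arithmetic (the ~1M-row transaction size is my reading
of the quoted numbers, so treat it as an assumption):

    65536 B / 24 B per row ≈ 2700 rows per streamed chunk
    1,000,000 rows / ~2700 rows per chunk ≈ 370 ReorderBufferStreamTXN calls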


It is just too frequent, while ReorderBufferStreamTXN includes a whole
bunch of logic, e.g. it always starts an internal transaction:


/*
 * Decoding needs access to syscaches et al., which in turn use
 * heavyweight locks and such. Thus we need to have enough state around to
 * keep track of those.  The easiest way is to simply use a transaction
 * internally.  That also allows us to easily enforce that nothing writes
 * to the database by checking for xid assignments. ...
 */

Also it issues separate stream_start/stop messages around each streamed
transaction chunk. So if streaming starts and stops too frequently it
adds extra overhead and may even interfere with the current in-progress
transaction.


If I get it correctly, then this is rather expected with too small
values of logical_decoding_work_mem. It probably could be optimized, but
I am not sure that it is worth doing right now.



Regards

--
Alexey Kondratov

Postgres Professional https://www.postgrespro.com
Russian Postgres Company





Re: Conflict handling for COPY FROM

2019-11-15 Thread Alexey Kondratov

On 11.11.2019 16:00, Surafel Temesgen wrote:



Next, you use DestRemoteSimple for returning conflicting tuples back:

+        dest = CreateDestReceiver(DestRemoteSimple);
+        dest->rStartup(dest, (int) CMD_SELECT, tupDesc);

However, printsimple supports a very limited subset of built-in
types, so

CREATE TABLE large_test (id integer primary key, num1 bigint, num2
double precision);
COPY large_test FROM '/path/to/copy-test.tsv';
COPY large_test FROM '/path/to/copy-test.tsv' ERROR 3;

fails with the following error 'ERROR:  unsupported type OID: 701',
which seems very confusing from the end user perspective. I've tried
to switch to DestRemote, but couldn't figure it out quickly.


fixed


Thanks, now it works with my tests.

1) Maybe it is fine, but now I do not like this part:

+    portal = GetPortalByName("");
+    dest = CreateDestReceiver(DestRemote);
+    SetRemoteDestReceiverParams(dest, portal);
+    dest->rStartup(dest, (int) CMD_SELECT, tupDesc);

Here you implicitly use the fact that a portal with a blank name is
always created in exec_simple_query before we get to this point. Next,
you create a new DestReceiver and set it to this portal, but one is
also already created and set in exec_simple_query.

Would it be better to just explicitly pass the ready DestReceiver to
DoCopy (similarly to how it is done for T_ExecuteStmt / ExecuteQuery),
as it may be required by COPY now? See the sketch below.
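
Something along these lines is what I mean — a rough sketch only, where
the extra DestReceiver argument to DoCopy() is hypothetical and not what
the patch currently has:

    /* in standard_ProcessUtility(), which already receives "dest" */
    case T_CopyStmt:
        {
            uint64      processed;

            /* reuse the receiver prepared by exec_simple_query() */
            DoCopy(pstate, (CopyStmt *) parsetree,
                   pstmt->stmt_location, pstmt->stmt_len,
                   &processed, dest);
        }
        break;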


2) My second concern is that you use three internal flags to track the
errors limit:

+    int            error_limit;    /* total number of errors to ignore */
+    bool        ignore_error;    /* is ignore error specified? */
+    bool        ignore_all_error;    /* is error_limit -1 (ignore all errors)
+                                     * specified? */

Though it seems that we could just leave error_limit as a user-defined
constant and track errors with something like errors_count, as sketched
below. In that case you would not need the auxiliary ignore_all_error
flag. But probably it is a matter of personal choice.
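
For example (field names are illustrative, not taken from the patch):

    int         error_limit;    /* as specified by the user; -1 means all */
    uint64      errors_count;   /* conflicting rows seen so far */

    /* on each conflicting row */
    if (cstate->error_limit != -1 &&
        ++cstate->errors_count > (uint64) cstate->error_limit)
        ereport(ERROR, ...);    /* too many errors, give up */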



Regards

--
Alexey Kondratov

Postgres Professional https://www.postgrespro.com
Russian Postgres Company





Re: Allow CLUSTER, VACUUM FULL and REINDEX to change tablespace on the fly

2019-11-20 Thread Alexey Kondratov

Hi Steve,

Thank you for review.

On 17.11.2019 3:53, Steve Singer wrote:

The following review has been posted through the commitfest application:
make installcheck-world:  tested, passed
Implements feature:   tested, failed
Spec compliant:   not tested
Documentation:tested, failed

* I had to replace heap_open/close with table_open/close to get the
patch to compile against master

In the documentation

+ 
+  This specifies a tablespace, where all rebuilt indexes will be created.
+  Can be used only with REINDEX INDEX and
+  REINDEX TABLE, since the system indexes are not
+  movable, but SCHEMA, DATABASE or
+  SYSTEM very likely will has one.
+ 

I found the "SCHEMA,DATABASE or SYSTEM very likely will has one." portion 
confusing and would be inclined to remove it or somehow reword it.


In the attached new version, REINDEX with TABLESPACE and {SCHEMA,
DATABASE, SYSTEM} now behaves more like it does with CONCURRENTLY, i.e.
it skips unsuitable relations and emits a warning. So this section in
the docs has been updated as well.

Also, the whole patch has been reworked. I noticed that my code in
reindex_index was doing pretty much the same as
RelationSetNewRelfilenode. So I just added a possibility to specify a
new tablespace to RelationSetNewRelfilenode instead. Thus, even with the
addition of new tests the patch becomes less complex.



Consider the following:

reindex=# create index foo_bar_idx on foo(bar) tablespace pg_default;
CREATE INDEX
reindex=# \d foo
                Table "public.foo"
 Column |  Type   | Collation | Nullable | Default
--------+---------+-----------+----------+---------
 id     | integer |           | not null |
 bar    | text    |           |          |
Indexes:
    "foo_pkey" PRIMARY KEY, btree (id)
    "foo_bar_idx" btree (bar)

reindex=# reindex index foo_bar_idx tablespace tst1;
REINDEX
reindex=# reindex index foo_bar_idx tablespace pg_default;
REINDEX
reindex=# \d foo
                Table "public.foo"
 Column |  Type   | Collation | Nullable | Default
--------+---------+-----------+----------+---------
 id     | integer |           | not null |
 bar    | text    |           |          |
Indexes:
    "foo_pkey" PRIMARY KEY, btree (id)
    "foo_bar_idx" btree (bar), tablespace "pg_default"

It is a bit strange that it says "pg_default" as the tablespace. If I do
this with an ALTER TABLE on the table, moving the table back to
pg_default makes it look as it did before.

Otherwise the first patch seems fine.


Yes, I missed the fact that the default tablespace of a database is
stored implicitly as InvalidOid, whereas I was setting it explicitly as
specified. I have changed this behavior to stay consistent with ALTER
TABLE.



With the second patch(for NOWAIT) I did the following

T1: begin;
T1: insert into foo select generate_series(1,1000);
T2: reindex index foo_bar_idx set tablespace tst1 nowait;

T2 is waiting for a lock. This isn't what I would expect.


Indeed, I have added the nowait option to RangeVarGetRelidExtended, so
it should not wait if the index is locked. However, for reindex we also
have to take a share lock on the parent table relation, which is done by
opening it via table_open(heapId, ShareLock).


The only solution I can figure out right now is to wrap all such opens
with ConditionalLockRelationOid(relId, ShareLock) and then do the actual
open with NoLock, as sketched below. This is how something similar is
implemented in VACUUM when VACOPT_SKIP_LOCKED is specified. However,
there are multiple code paths with table_open, so it becomes a bit ugly.
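
The pattern would be roughly this (a sketch only, mirroring what VACUUM
does for skip-locked relations):

    Relation    rel;

    if (!nowait)
        rel = table_open(heapOid, ShareLock);
    else
    {
        if (!ConditionalLockRelationOid(heapOid, ShareLock))
            ereport(ERROR,
                    (errcode(ERRCODE_LOCK_NOT_AVAILABLE),
                     errmsg("could not obtain lock on relation \"%s\"",
                            get_rel_name(heapOid))));

        /* the lock is already held, so do not acquire it again */
        rel = table_open(heapOid, NoLock);
    }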


I will leave the second patch aside for now and experiment with it.
Actually, its main idea was to mimic the ALTER INDEX ... SET TABLESPACE
[NOWAIT] syntax, but probably it is better to stick with the more brief
plain TABLESPACE, as in CREATE INDEX.



Regards

--
Alexey Kondratov

Postgres Professional https://www.postgrespro.com
Russian Postgres Company

P.S. I have also added all previous thread participants to CC in order
not to split the thread. Sorry if it was a bad idea.

From 22990d58fb549536ca33a1b02c5a21a248deee5d Mon Sep 17 00:00:00 2001
From: Alexey Kondratov 
Date: Wed, 20 Nov 2019 20:09:50 +0300
Subject: [PATCH v4] Allow REINDEX and REINDEX CONCURRENTLY to change
 TABLESPACE

---
 doc/src/sgml/ref/reindex.sgml | 24 ++-
 src/backend/catalog/index.c   | 26 +--
 src/backend/commands/cluster.c|  2 +-
 src/backend/commands/indexcmds.c  | 88 +++
 src/backend/commands/sequence.c   |  8 ++-
 src/backend/commands/tablecmds.c  |  9 ++-
 src/backend/parser/gram.y | 21 --
 src/backend/tcop/utility.c|  6 +-
 src/backend/utils/cache/relcache.c| 18 -
 src/include/catalog/index.h   |  7 +-
 src/include/commands/defrem.h  

Re: Conflict handling for COPY FROM

2019-11-21 Thread Alexey Kondratov

On 18.11.2019 9:42, Surafel Temesgen wrote:
On Fri, Nov 15, 2019 at 6:24 PM Alexey Kondratov <a.kondra...@postgrespro.ru> wrote:

1) Maybe it is fine, but now I do not like this part:

+    portal = GetPortalByName("");
+    dest = CreateDestReceiver(DestRemote);
+    SetRemoteDestReceiverParams(dest, portal);
+    dest->rStartup(dest, (int) CMD_SELECT, tupDesc);

Here you implicitly use the fact that a portal with a blank name is
always created in exec_simple_query before we get to this point. Next,
you create a new DestReceiver and set it to this portal, but one is
also already created and set in exec_simple_query.

Would it be better to just explicitly pass the ready DestReceiver to
DoCopy (similarly to how it is done for T_ExecuteStmt / ExecuteQuery)?

Good idea. Thank you.


Now the whole patch works exactly as expected for me and I cannot find
any new technical flaws. However, the doc is rather vague, especially in
these places:

+  specifying it to -1 returns all error record.

Actually, we return only rows with constraint violations, while
malformed rows are ignored with a warning. I guess that we simply cannot
return malformed rows back to the caller in the same way as
constraint-violating ones, since we cannot figure out (in general) which
column corresponds to which type if there are extra or missing columns.

+  and same record formatting error is ignored.

I can get it, but it definitely should be reworded.

What about something like this?

+ ERROR_LIMIT
+
+  Enables ignoring of erroneous rows up to limit_number. If limit_number is set
+  to -1, then all errors will be ignored.
+
+  Currently, only unique or exclusion constraint violations
+  and row formatting errors are ignored. Malformed
+  rows will raise warnings, while constraint-violating rows
+  will be returned back to the caller.

Regards
--
Alexey Kondratov

Postgres Professional https://www.postgrespro.com
Russian Postgres Company





Re: Allow CLUSTER, VACUUM FULL and REINDEX to change tablespace on the fly

2019-11-27 Thread Alexey Kondratov

On 27.11.2019 6:54, Michael Paquier wrote:

On Tue, Nov 26, 2019 at 11:09:55PM +0100, Masahiko Sawada wrote:

I looked at v4 patch. Here are some comments:

+   /* Skip all mapped relations if TABLESPACE is specified */
+   if (OidIsValid(tableSpaceOid) &&
+   classtuple->relfilenode == 0)

I think we can use OidIsValid(classtuple->relfilenode) instead.

Yes, definitely.


Yes, switched to !OidIsValid(classtuple->relfilenode). Also I added a 
comment that it is meant to be equivalent to RelationIsMapped() and 
extended tests.





This change says that temporary relation is not supported but it
actually seems to work. Which is correct?

Yeah, I don't really see a reason why it would not work.


My bad, I was keeping in mind the RELATION_IS_OTHER_TEMP validation, but
it applies to temp tables of other backends only, so it definitely
should not be in the doc. Removed.



Your patch has forgotten to update copyfuncs.c and equalfuncs.c with
the new tablespace string field.


Fixed, thanks.


It would be nice to add tab completion for this new clause in psql.


Added.


There is no need for opt_tablespace_name as new node for the parsing
grammar of gram.y as OptTableSpace is able to do the exact same job.


Sure, it was an artifact from the times when I used an optional SET
TABLESPACE clause. Removed.




@@ -3455,6 +3461,8 @@ RelationSetNewRelfilenode(Relation relation, char persistence)
  */
 newrnode = relation->rd_node;
 newrnode.relNode = newrelfilenode;
+   if (OidIsValid(tablespaceOid))
+       newrnode.spcNode = newTablespaceOid;
The core of the patch is actually here.  It seems to me that this is a
very bad idea because you actually hijack a logic which happens at a
much lower level, based on the state of the tablespace stored in the
relation cache entry of the relation being reindexed; the tablespace
choice actually happens in RelationInitPhysicalAddr(), which runs for
the new relfilenode once the follow-up CCI is done.  So this very
likely needs more thoughts, and bringing to the point: shouldn't you
actually be careful that the relation tablespace is correctly updated
before reindexing it and before creating its new relfilenode?  This
way, RelationSetNewRelfilenode() does not need any additional work,
and I think that this saves from potential bugs in the choice of the
tablespace used with the new relfilenode.


When I did the first version of the patch I was looking at
ATExecSetTableSpace, which implements ALTER ... SET TABLESPACE. And
there is a very similar pipeline there:


1) Find pg_class entry with SearchSysCacheCopy1

2) Create new relfilenode with GetNewRelFileNode

3) Set new tablespace for this relfilenode

4) Do some work with new relfilenode

5) Update pg_class entry with new tablespace

6) Do CommandCounterIncrement

The only difference is that point 3) and the tablespace part of 5) were
missing in RelationSetNewRelfilenode, so I added them, and I do 4) after
6) in REINDEX. Thus, it seems that my implementation of the tablespace
change in REINDEX ensures that "the relation tablespace is correctly
updated before reindexing", since I do the reindex after the CCI
(point 6), no?
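
In code form, the sequence I am describing would look roughly like this
(heavily simplified: the pg_class relation is assumed to be open as
pg_class_rel, and all error handling is omitted):

    /* 1) find the pg_class entry */
    tuple = SearchSysCacheCopy1(RELOID, ObjectIdGetDatum(indexOid));
    rd_rel = (Form_pg_class) GETSTRUCT(tuple);

    /* 2) create a new relfilenode in the target tablespace */
    newrelfilenode = GetNewRelFileNode(tablespaceOid, NULL,
                                       rd_rel->relpersistence);

    /* 3) + 5) set the new tablespace and update the pg_class entry */
    rd_rel->reltablespace = tablespaceOid;
    rd_rel->relfilenode = newrelfilenode;
    CatalogTupleUpdate(pg_class_rel, &tuple->t_self, tuple);

    /* 6) make the change visible */
    CommandCounterIncrement();

    /* 4) only after that, do the actual reindex work */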


So why is it fine for ATExecSetTableSpace to do pretty much the same,
but not for REINDEX? Or is the key point in doing the actual work before
the CCI? To me that seems a bit against what you have written.

Thus, I cannot get your point correctly here. Can you, please, elaborate
your concerns a little bit more?



ISTM the kind of above errors are the same: the given tablespace
exists but moving tablespace to it is not allowed since it's not
supported in PostgreSQL. So I think we can use
ERRCODE_FEATURE_NOT_SUPPORTED instead of
ERRCODE_INVALID_PARAMETER_VALUE (which is used at 3 places) .

Yes, it is also not project style to use full sentences in error
messages, so I would suggest instead (note the missing quotes in the
original patch):
cannot move non-shared relation to tablespace \"%s\"


Same here. I have taken this validation directly from tablecmds.c part 
for ALTER ... SET TABLESPACE. And there is exactly the same message 
"only shared relations can be placed in pg_global tablespace" with 
ERRCODE_INVALID_PARAMETER_VALUE there.


However, I understand your point. Still, would it be better if I stick
to the same ERRCODE/message, or should I introduce a new ERRCODE/message
for the same case?



And I have somewhat missed to notice the timing of the review replies
as you did not have room to reply, so fixed the CF entry to "waiting
on author", and bumped it to next CF instead.


Thank you! Attached is a patch that addresses all the issues above,
except the last two points (the core part and the error messages for
pg_global), which are not clear to me right now.


--
Alexey Kondratov

Postgres Professional https://www.postgrespro.com
Russian Postgres Company


Re: Allow CLUSTER, VACUUM FULL and REINDEX to change tablespace on the fly

2019-12-02 Thread Alexey Kondratov

On 02.12.2019 11:21, Michael Paquier wrote:

On Wed, Nov 27, 2019 at 08:47:06PM +0300, Alexey Kondratov wrote:

The only difference is that point 3) and tablespace part of 5) were missing
in RelationSetNewRelfilenode, so I added them, and I do 4) after 6) in
REINDEX. Thus, it seems that in my implementation of tablespace change in
REINDEX I am more sure that "the relation tablespace is correctly updated
before reindexing", since I do reindex after CCI (point 6), doesn't it?

So why it is fine for ATExecSetTableSpace to do pretty much the same, but
not for REINDEX? Or the key point is in doing actual work before CCI, but
for me it seems a bit against what you have wrote?

Nope, the order is not the same on what you do here, causing a
duplication in the tablespace selection within
RelationSetNewRelfilenode() and when flushing the relation on the new
tablespace for the first time after the CCI happens, please see
below.  And we should avoid that.


Thus, I cannot get your point correctly here. Can you, please, elaborate a
little bit more your concerns?

The case of REINDEX CONCURRENTLY is pretty simple, because a new
relation which is a copy of the old relation is created before doing
the reindex, so you simply need to set the tablespace OID correctly
in index_concurrently_create_copy().  And actually, I think that the
computation is incorrect because we need to check after
MyDatabaseTableSpace as well, no?


No, the same logic already exists in heap_create:

    if (reltablespace == MyDatabaseTableSpace)
        reltablespace = InvalidOid;

Which is called by index_concurrently_create_copy -> index_create -> 
heap_create.



The case of REINDEX is more tricky, because you are working on a
relation that already exists, hence I think that what you need to do a
different thing before the actual REINDEX:
1) Update the existing relation's pg_class tuple to point to the new
tablespace.
2) Do a CommandCounterIncrement.
So I think that the order of the operations you are doing is incorrect,
and that you have a risk of breaking the existing tablespace assignment
logic done when first flushing a new relfilenode.

This actually brings an extra thing: when doing a plain REINDEX you
need to make sure that the past relfilenode of the relation gets away
properly.  The attached POC patch does that before doing the CCI which
is a bit ugly, but that's enough to show my point, and there is no
need to touch RelationSetNewRelfilenode() this way.


Thank you for the detailed answer and the PoC patch. I will recheck
everything, dig deeper into this problem, and come up with something
closer to the next commitfest (01.2020).



Regards

--
Alexey Kondratov

Postgres Professional https://www.postgrespro.com
Russian Postgres Company





Re: [Patch] pg_rewind: options to use restore_command from recovery.conf or command line

2019-12-03 Thread Alexey Kondratov

On 01.12.2019 5:57, Michael Paquier wrote:

On Thu, Sep 26, 2019 at 03:08:22PM +0300, Alexey Kondratov wrote:

As Alvaro correctly pointed in the nearby thread [1], we've got an
interference regarding -R command line argument. I agree that it's a good
idea to reserve -R for recovery configuration write to be consistent with
pg_basebackup, so I've updated my patch to use another letters:

The patch has rotten and does not apply anymore.  Could you please
send a rebased version?  I have moved the patch to next CF, waiting on
author for now.


A rebased and updated patch is attached.

There was a problem with testing the new restore_command options
together with the recent ensureCleanShutdown. My test simply moves all
WAL from pg_wal and generates a restore_command for testing the new
options, but this prevents the startup recovery required by
ensureCleanShutdown. To test both options at the same time we would have
to leave some recent WAL segments in pg_wal and make sure that they are
enough for startup recovery, but not enough for a successful pg_rewind
run. I have manually figured out the required amount of inserted records
(and generated WAL) to achieve this. However, I think that this approach
is not good for a test, since tests may be modified in the future (the
amount of writes to the DB changed) or even the volume of WAL written by
Postgres may change. It would lead to tests that falsely always fail or
pass.

Moreover, testing both ensureCleanShutdown and the new options at the
same time doesn't hit the new code paths, so I decided to test the new
options with --no-ensure-shutdown for simplicity and stability of the
tests.



Regards

--
Alexey Kondratov

Postgres Professional https://www.postgrespro.com
Russian Postgres Company

From a05c3343e0bd6fe339c944f6b0cde64ceb46a0b3 Mon Sep 17 00:00:00 2001
From: Alexey Kondratov 
Date: Tue, 19 Feb 2019 19:14:53 +0300
Subject: [PATCH v11] pg_rewind: options to use restore_command from command
 line or cluster config

Previously, when pg_rewind could not find required WAL files in the
target data directory the rewind process would fail. One had to
manually figure out which of required WAL files have already moved to
the archival storage and copy them back.

This patch adds possibility to specify restore_command via command
line option or use one specified inside postgresql.conf. Specified
restore_command will be used for automatic retrieval of missing WAL
files from archival storage.
---
 doc/src/sgml/ref/pg_rewind.sgml   |  49 +++-
 src/bin/pg_rewind/parsexlog.c | 164 +-
 src/bin/pg_rewind/pg_rewind.c | 118 +++---
 src/bin/pg_rewind/pg_rewind.h |   6 +-
 src/bin/pg_rewind/t/001_basic.pl  |   4 +-
 src/bin/pg_rewind/t/002_databases.pl  |   4 +-
 src/bin/pg_rewind/t/003_extrafiles.pl |   4 +-
 src/bin/pg_rewind/t/RewindTest.pm | 105 -
 8 files changed, 416 insertions(+), 38 deletions(-)

diff --git a/doc/src/sgml/ref/pg_rewind.sgml b/doc/src/sgml/ref/pg_rewind.sgml
index 42d29edd4e..b601a5c7e4 100644
--- a/doc/src/sgml/ref/pg_rewind.sgml
+++ b/doc/src/sgml/ref/pg_rewind.sgml
@@ -66,11 +66,12 @@ PostgreSQL documentation
can be found either on the target timeline, the source timeline, or their common
ancestor. In the typical failover scenario where the target cluster was
shut down soon after the divergence, this is not a problem, but if the
-   target cluster ran for a long time after the divergence, the old WAL
-   files might no longer be present. In that case, they can be manually
-   copied from the WAL archive to the pg_wal directory, or
-   fetched on startup by configuring  or
-   .  The use of
+   target cluster ran for a long time after the divergence, its old WAL
+   files might no longer be present. In this case, you can manually copy them
+   from the WAL archive to the pg_wal directory, or run
+   pg_rewind with the -c or
+   -C option to automatically retrieve them from the WAL
+   archive. The use of
pg_rewind is not limited to failover, e.g.  a standby
server can be promoted, run some write transactions, and then rewinded
to become a standby again.
@@ -232,6 +233,39 @@ PostgreSQL documentation
   
  
 
+ 
+  -c
+  --restore-target-wal
+  
+   
+Use the restore_command defined in
+postgresql.conf to retrieve WAL files from
+the WAL archive if these files are no longer available in the
+pg_wal directory of the target cluster.
+   
+   
+This option cannot be used together with --target-restore-command.
+   
+  
+ 
+
+ 
+  -C restore_command
+  --target-restore-command=restore_command
+  
+   
+Specifies the restore_command to use for retrieving
+WAL files from the WAL archive if these files are no longer available
+in the pg_wal directory of the target cluster.
+   
+   
+If restore_co

Re: [PATCH] Increase the maximum value track_activity_query_size

2019-12-20 Thread Alexey Kondratov

On 19.12.2019 20:52, Robert Haas wrote:

On Thu, Dec 19, 2019 at 10:59 AM Tom Lane  wrote:

Bruce Momjian  writes:

Good question.  I am in favor of allowing a larger value if no one
objects.  I don't think adding the min/max is helpful.


The original poster.



And probably anyone else who debugs stuck queries of yet another crazy
ORM. Yes, one could use log_min_duration_statement, but having the
possibility to get the query directly from pg_stat_activity without
eyeballing the logs is nice. Also, IIRC, log_min_duration_statement
applies only to completed statements.



I think there are pretty obvious performance and memory-consumption
penalties to very large track_activity_query_size values.  Who exactly
are we really helping if we let them set it to huge values?

(wanders away wondering if we have suitable integer-overflow checks
in relevant code paths...)



The value of pgstat_track_activity_query_size is in bytes, so setting it
to any value below INT_MAX seems to be safe from that perspective.
However, since it gets multiplied by NumBackendStatSlots, its reasonable
value should be far below INT_MAX (~2 GB).
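
To illustrate the scale (settings purely hypothetical):

    track_activity_query_size = 1 MB with ~100 backend slots
    => ~100 MB of shared memory spent on query texts alone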


Sincerely, it does not look to me like something badly needed, but
still. We already have hundreds of GUCs and it is easy for a user to
build a sub-optimal configuration anyway, so does this overprotection
make sense?



Regards

--
Alexey Kondratov

Postgres Professional https://www.postgrespro.com
Russian Postgres Company





Physical replication slot advance is not persistent

2019-12-24 Thread Alexey Kondratov

Hi Hackers,

I have accidentally noticed that pg_replication_slot_advance only
changes the in-memory state of the slot when its type is physical. Its
new value does not survive a restart.

Reproduction steps:

1) Create new slot and remember its restart_lsn

SELECT pg_create_physical_replication_slot('slot1', true);
SELECT * from pg_replication_slots;

2) Generate some dummy WAL

CHECKPOINT;
SELECT pg_switch_wal();
CHECKPOINT;
SELECT pg_switch_wal();

3) Advance slot to the value of pg_current_wal_insert_lsn()

SELECT pg_replication_slot_advance('slot1', '0/160001A0');

4) Check that restart_lsn has been updated

SELECT * from pg_replication_slots;

5) Restart the server and check restart_lsn again. It will be the same
as in step 1.



I dug into the code, and it happens because of this if statement:

    /* Update the on disk state when lsn was updated. */
    if (XLogRecPtrIsInvalid(endlsn))
    {
        ReplicationSlotMarkDirty();
        ReplicationSlotsComputeRequiredXmin(false);
        ReplicationSlotsComputeRequiredLSN();
        ReplicationSlotSave();
    }

Actually, endlsn is always a valid LSN after the execution of 
replication slot advance guts. It works for logical slots only by 
chance, since there is an implicit ReplicationSlotMarkDirty() call 
inside LogicalConfirmReceivedLocation.


Attached is a small patch, which fixes this bug. I have tried to
stick to the same logic in this 'if (XLogRecPtrIsInvalid(endlsn))'
and now pg_logical_replication_slot_advance and
pg_physical_replication_slot_advance return InvalidXLogRecPtr if
no-op.

What do you think?


Regards
--
Alexey Kondratov

Postgres Professional https://www.postgrespro.com
Russian Postgres Company

P.S. CCed Simon and Michael as they are the last who seriously touched 
pg_replication_slot_advance code.


From 36d1fa2a89b3fb354a813354496df475ee11b62e Mon Sep 17 00:00:00 2001
From: Alexey Kondratov 
Date: Tue, 24 Dec 2019 18:21:50 +0300
Subject: [PATCH v1] Make physical replslot advance persistent

---
 src/backend/replication/slotfuncs.c | 17 +
 1 file changed, 9 insertions(+), 8 deletions(-)

diff --git a/src/backend/replication/slotfuncs.c b/src/backend/replication/slotfuncs.c
index 46e6dd4d12..826708d3f6 100644
--- a/src/backend/replication/slotfuncs.c
+++ b/src/backend/replication/slotfuncs.c
@@ -358,12 +358,14 @@ pg_get_replication_slots(PG_FUNCTION_ARGS)
  * The LSN position to move to is compared simply to the slot's restart_lsn,
  * knowing that any position older than that would be removed by successive
  * checkpoints.
+ *
+ * Returns InvalidXLogRecPtr if no-op.
  */
 static XLogRecPtr
 pg_physical_replication_slot_advance(XLogRecPtr moveto)
 {
 	XLogRecPtr	startlsn = MyReplicationSlot->data.restart_lsn;
-	XLogRecPtr	retlsn = startlsn;
+	XLogRecPtr	retlsn = InvalidXLogRecPtr;
 
 	if (startlsn < moveto)
 	{
@@ -386,6 +388,8 @@ pg_physical_replication_slot_advance(XLogRecPtr moveto)
  * because we need to digest WAL to advance restart_lsn allowing to recycle
  * WAL and removal of old catalog tuples.  As decoding is done in fast_forward
  * mode, no changes are generated anyway.
+ *
+ * Returns InvalidXLogRecPtr if no-op.
  */
 static XLogRecPtr
 pg_logical_replication_slot_advance(XLogRecPtr moveto)
@@ -393,7 +397,7 @@ pg_logical_replication_slot_advance(XLogRecPtr moveto)
 	LogicalDecodingContext *ctx;
 	ResourceOwner old_resowner = CurrentResourceOwner;
 	XLogRecPtr	startlsn;
-	XLogRecPtr	retlsn;
+	XLogRecPtr	retlsn = InvalidXLogRecPtr;
 
 	PG_TRY();
 	{
@@ -414,9 +418,6 @@ pg_logical_replication_slot_advance(XLogRecPtr moveto)
 		 */
 		startlsn = MyReplicationSlot->data.restart_lsn;
 
-		/* Initialize our return value in case we don't do anything */
-		retlsn = MyReplicationSlot->data.confirmed_flush;
-
 		/* invalidate non-timetravel entries */
 		InvalidateSystemCaches();
 
@@ -480,9 +481,9 @@ pg_logical_replication_slot_advance(XLogRecPtr moveto)
 			 * better than always losing the position even on clean restart.
 			 */
 			ReplicationSlotMarkDirty();
-		}
 
-		retlsn = MyReplicationSlot->data.confirmed_flush;
+			retlsn = MyReplicationSlot->data.confirmed_flush;
+		}
 
 		/* free context, call shutdown callback */
 		FreeDecodingContext(ctx);
@@ -575,7 +576,7 @@ pg_replication_slot_advance(PG_FUNCTION_ARGS)
 	nulls[0] = false;
 
 	/* Update the on disk state when lsn was updated. */
-	if (XLogRecPtrIsInvalid(endlsn))
+	if (!XLogRecPtrIsInvalid(endlsn))
 	{
 		ReplicationSlotMarkDirty();
 		ReplicationSlotsComputeRequiredXmin(false);
-- 
2.17.1



Re: Physical replication slot advance is not persistent

2019-12-25 Thread Alexey Kondratov

On 25.12.2019 07:03, Kyotaro Horiguchi wrote:

At Tue, 24 Dec 2019 20:12:32 +0300, Alexey Kondratov 
 wrote in

I dig into the code and it happens because of this if statement:

     /* Update the on disk state when lsn was updated. */
     if (XLogRecPtrIsInvalid(endlsn))
     {
         ReplicationSlotMarkDirty();
         ReplicationSlotsComputeRequiredXmin(false);
         ReplicationSlotsComputeRequiredLSN();
         ReplicationSlotSave();
     }

Yes, it seems just broken.


Attached is a small patch, which fixes this bug. I have tried to
stick to the same logic in this 'if (XLogRecPtrIsInvalid(endlsn))'
and now pg_logical_replication_slot_advance and
pg_physical_replication_slot_advance return InvalidXLogRecPtr if
no-op.

What do you think?

I think we shouldn't change the definition of
pg_*_replication_slot_advance since the result is user-facing.


Yes, that was my main concern too. OK.


The functions return an invalid value only when the slot had the
invalid value and failed to move the position. I think that happens
only for uninitialized slots.

Anyway, what we should do there is dirty the slot when the operation
can be assumed to have succeeded.

As a result, I think what is needed there is just checking if the
returned lsn is equal to or larger than moveto. Doesn't the following
change work?

-   if (XLogRecPtrIsInvalid(endlsn))
+   if (moveto <= endlsn)


Yep, it helps with physical replication slot persistence after advance,
but the whole validation (moveto <= endlsn) does not make sense to me.
The value of moveto should be >= minlsn == confirmed_flush /
restart_lsn, while endlsn == retlsn is also always initialized with
confirmed_flush / restart_lsn. Thus, your condition seems to be true in
any case, even for a no-op call, which is exactly what we intended to
catch.


Actually, if we do not want to change pg_*_replication_slot_advance, we
can just add a straightforward validation that either confirmed_flush or
restart_lsn has changed after the slot advance guts have run, as
sketched below. It will be a little bit bulky, but much clearer, and it
will never be affected by changes of the pg_*_replication_slot_advance
logic.
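
In a bulky-but-clear form it would be something like this (sketch):

    XLogRecPtr  prior_restart_lsn = MyReplicationSlot->data.restart_lsn;
    XLogRecPtr  prior_confirmed_flush = MyReplicationSlot->data.confirmed_flush;

    /* ... pg_logical/physical_replication_slot_advance() runs here ... */

    /* Flush the slot to disk only if the advance actually moved it. */
    if (MyReplicationSlot->data.restart_lsn != prior_restart_lsn ||
        MyReplicationSlot->data.confirmed_flush != prior_confirmed_flush)
    {
        ReplicationSlotMarkDirty();
        ReplicationSlotsComputeRequiredXmin(false);
        ReplicationSlotsComputeRequiredLSN();
        ReplicationSlotSave();
    }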



Another weird part I have found is this assignment inside 
pg_logical_replication_slot_advance:


/* Initialize our return value in case we don't do anything */
retlsn = MyReplicationSlot->data.confirmed_flush;

It looks redundant, since later we do the same assignment, which should 
be reachable in any case.


I will recheck everything again and try to come up with something during 
this week.



Regards

--
Alexey Kondratov

Postgres Professional https://www.postgrespro.com
Russian Postgres Company





Re: Physical replication slot advance is not persistent

2019-12-25 Thread Alexey Kondratov

On 25.12.2019 16:51, Alexey Kondratov wrote:

On 25.12.2019 07:03, Kyotaro Horiguchi wrote:

As a result, I think what is needed there is just checking if the
returned lsn is equal to or larger than moveto. Doesn't the following
change work?

-    if (XLogRecPtrIsInvalid(endlsn))
+    if (moveto <= endlsn)


Yep, it helps with physical replication slot persistence after
advance, but the whole validation (moveto <= endlsn) does not make
sense to me. The value of moveto should be >= minlsn ==
confirmed_flush / restart_lsn, while endlsn == retlsn is also always
initialized with confirmed_flush / restart_lsn. Thus, your condition
seems to be true in any case, even for a no-op call, which is exactly
what we intended to catch.


I will recheck everything again and try to come up with something 
during this week.


If I get it correctly, then we already keep the previous slot position
in minlsn, so we just have to compare endlsn with minlsn and treat
endlsn <= minlsn as a no-op, without flushing the slot state.

Attached is a patch that does this, so it fixes the bug without
affecting any user-facing behavior. A detailed comment section and DEBUG
output are also added. What do you think now?


I have also forgotten to mention that all versions down to 11.0 should
be affected by this bug.



Regards

--
Alexey Kondratov

Postgres Professional https://www.postgrespro.com
Russian Postgres Company

From e08299ddf92abc3fb4e802e8b475097fa746c458 Mon Sep 17 00:00:00 2001
From: Alexey Kondratov 
Date: Wed, 25 Dec 2019 20:12:42 +0300
Subject: [PATCH v2] Make physical replslot advance persistent

---
 src/backend/replication/slotfuncs.c | 12 ++--
 1 file changed, 10 insertions(+), 2 deletions(-)

diff --git a/src/backend/replication/slotfuncs.c b/src/backend/replication/slotfuncs.c
index 6683fc3f9b..bc5c93b089 100644
--- a/src/backend/replication/slotfuncs.c
+++ b/src/backend/replication/slotfuncs.c
@@ -573,9 +573,17 @@ pg_replication_slot_advance(PG_FUNCTION_ARGS)
 	values[0] = NameGetDatum(&MyReplicationSlot->data.name);
 	nulls[0] = false;
 
-	/* Update the on disk state when lsn was updated. */
-	if (XLogRecPtrIsInvalid(endlsn))
+	/*
+	 * Update the on disk state when LSN was updated.  Here we rely on the facts
+	 * that: 1) minlsn is initialized with restart_lsn and confirmed_flush LSN for
+	 * physical and logical replication slot respectively, and 2) endlsn is set in
+	 * the same way by pg_*_replication_slot_advance, but after advance.  Thus,
+	 * endlsn <= minlsn is treated as a no-op.
+	 */
+	if (endlsn > minlsn)
 	{
+		elog(DEBUG1, "flushing replication slot '%s' state",
+			NameStr(MyReplicationSlot->data.name));
 		ReplicationSlotMarkDirty();
 		ReplicationSlotsComputeRequiredXmin(false);
 		ReplicationSlotsComputeRequiredLSN();

base-commit: 8ce3aa9b5914d1ac45ed3f9bc484f66b3c4850c7
-- 
2.17.1



Re: Physical replication slot advance is not persistent

2019-12-26 Thread Alexey Kondratov

On 26.12.2019 11:33, Kyotaro Horiguchi wrote:

At Wed, 25 Dec 2019 20:28:04 +0300, Alexey Kondratov 
 wrote in

Yep, it helps with physical replication slot persistence after
advance, but the whole validation (moveto <= endlsn) does not make
sense for me. The value of moveto should be >= than minlsn ==
confirmed_flush / restart_lsn, while endlsn == retlsn is also always
initialized with confirmed_flush / restart_lsn. Thus, your condition
seems to be true in any case, even if it was no-op one, which we were
intended to catch.

...

If I get it correctly, then we already keep previous slot position in
the minlsn, so we just have to compare endlsn with minlsn and treat
endlsn <= minlsn as a no-op without slot state flushing.

I think you're right about the condition. (endlsn cannot be less than
minlsn, though) But I came to think that we shouldn't use locations in
that decision.


Attached is a patch that does this, so it fixes the bug without
affecting any user-facing behavior. Detailed comment section and DEBUG
output are also added. What do you think now?

I have also forgotten to mention that all versions down to 11.0 should
be affected with this bug.

pg_replication_slot_advance is the only caller of
pg_logical/physical_replication_slot_advacne so there's no apparent
determinant on who-does-what about dirtying and other housekeeping
calculation like *ComputeRequired*() functions, but the current shape
seems a kind of inconsistent between logical and physical.

I think pg_logical/physical_replication_slot_advance should dirty the
slot if they actually changed anything. And
pg_replication_slot_advance should do the housekeeping if the slots
are dirtied.  (Otherwise both the caller function should dirty the
slot in lieu of the two.)

The attached does that.


Both approaches look fine to me: my last patch with as minimal
intervention as possible, and your refactoring. I think that it is the
right direction to let everyone who modifies slot->data also mark the
slot as dirty.


I found one comment section in your code rather misleading:

+        /*
+         * We don't need to dirty the slot only for the above change, but dirty
+         * this slot for the same reason with
+         * pg_logical_replication_slot_advance.
+         */

We just modified MyReplicationSlot->data, which is "On-Disk data of a 
replication slot, preserved across restarts.", so it definitely should 
be marked as dirty, not because pg_logical_replication_slot_advance does 
the same.


Also I think that using this transient variable in 
ReplicationSlotIsDirty is not necessary. MyReplicationSlot is already a 
pointer to the slot in shared memory.


+    ReplicationSlot *slot = MyReplicationSlot;
+
+    Assert(MyReplicationSlot != NULL);
+
+    SpinLockAcquire(&slot->mutex);

Otherwise it looks fine to me, so attached is the same diff, but with
these proposed corrections.


Another concern is that ReplicationSlotIsDirty is added with only one
user. It also cannot be used by SaveSlotToPath due to the simultaneous
usage of both the dirty and just_dirtied flags there.


That said, I hope that we could call ReplicationSlotSave unconditionally
in pg_replication_slot_advance, so the slot would be saved or not
automatically based on the slot->dirty flag. At the same time,
ReplicationSlotsComputeRequiredXmin and
ReplicationSlotsComputeRequiredLSN should be called by anyone who
modifies the xmin and LSN fields in the slot. Otherwise, we currently
end up with some leaky abstractions.



Regards

--
Alexey Kondratov

Postgres Professional https://www.postgrespro.com
Russian Postgres Company

diff --git a/src/backend/replication/slot.c b/src/backend/replication/slot.c
index 21ae8531b3..edf661521a 100644
--- a/src/backend/replication/slot.c
+++ b/src/backend/replication/slot.c
@@ -672,6 +672,23 @@ ReplicationSlotMarkDirty(void)
 	SpinLockRelease(&slot->mutex);
 }
 
+/*
+ * Verify whether currently acquired slot is dirty.
+ */
+bool
+ReplicationSlotIsDirty(void)
+{
+	bool dirty;
+
+	Assert(MyReplicationSlot != NULL);
+
+	SpinLockAcquire(&MyReplicationSlot->mutex);
+	dirty = MyReplicationSlot->dirty;
+	SpinLockRelease(&MyReplicationSlot->mutex);
+
+	return dirty;
+}
+
 /*
  * Convert a slot that's marked as RS_EPHEMERAL to a RS_PERSISTENT slot,
  * guaranteeing it will be there after an eventual crash.
diff --git a/src/backend/replication/slotfuncs.c b/src/backend/replication/slotfuncs.c
index 6683fc3f9b..d7a16a9071 100644
--- a/src/backend/replication/slotfuncs.c
+++ b/src/backend/replication/slotfuncs.c
@@ -370,6 +370,12 @@ pg_physical_replication_slot_advance(XLogRecPtr moveto)
 		MyReplicationSlot->data.restart_lsn = moveto;
 		SpinLockRelease(&MyReplicationSlot->mutex);
 		retlsn = moveto;
+
+		/*
+		 * Dirty the slot as we updated data that is meant to be
+		 * persistent on disk.
+		 */
+		ReplicationSlotMarkDirty();
 	}
 
 	retur

Re: Supply restore_command to pg_rewind via CLI argument

2022-03-22 Thread Alexey Kondratov
Hi,

On Tue, Mar 22, 2022 at 3:32 AM Andres Freund  wrote:
>
> Doesn't apply once more: http://cfbot.cputube.org/patch_37_3213.log
>

Thanks for the reminder, a rebased version is attached.


Regards
-- 
Alexey Kondratov
From df56b5c7b882e781fdc0b92e7a83331f0baab094 Mon Sep 17 00:00:00 2001
From: Alexey Kondratov 
Date: Tue, 29 Jun 2021 17:17:47 +0300
Subject: [PATCH v4] Allow providing restore_command as a command line option
 to pg_rewind

This could be useful when postgres is usually run with
-c config_file=..., so the actual configuration and restore_command
is not inside $PGDATA/postgresql.conf.
---
 doc/src/sgml/ref/pg_rewind.sgml  | 19 +
 src/bin/pg_rewind/pg_rewind.c| 45 ++-
 src/bin/pg_rewind/t/001_basic.pl |  1 +
 src/bin/pg_rewind/t/RewindTest.pm| 95 ++--
 src/test/perl/PostgreSQL/Test/Cluster.pm |  5 +-
 5 files changed, 106 insertions(+), 59 deletions(-)

diff --git a/doc/src/sgml/ref/pg_rewind.sgml b/doc/src/sgml/ref/pg_rewind.sgml
index 33e6bb64ad..af75f35867 100644
--- a/doc/src/sgml/ref/pg_rewind.sgml
+++ b/doc/src/sgml/ref/pg_rewind.sgml
@@ -241,6 +241,25 @@ PostgreSQL documentation
   
  
 
+ 
+  -C restore_command
+  --target-restore-command=restore_command
+  
+   
+Specifies the restore_command to use for retrieving
+WAL files from the WAL archive if these files are no longer available
+in the pg_wal directory of the target cluster.
+   
+   
+If restore_command is already set in
+postgresql.conf, you can provide the
+--restore-target-wal option instead. If both options
+are provided, then --target-restore-command
+will be used.
+   
+  
+ 
+
  
   --debug
   
diff --git a/src/bin/pg_rewind/pg_rewind.c b/src/bin/pg_rewind/pg_rewind.c
index b39b5c1aac..9aca041425 100644
--- a/src/bin/pg_rewind/pg_rewind.c
+++ b/src/bin/pg_rewind/pg_rewind.c
@@ -85,21 +85,22 @@ usage(const char *progname)
 	printf(_("%s resynchronizes a PostgreSQL cluster with another copy of the cluster.\n\n"), progname);
 	printf(_("Usage:\n  %s [OPTION]...\n\n"), progname);
 	printf(_("Options:\n"));
-	printf(_("  -c, --restore-target-wal   use restore_command in target configuration to\n"
-			 " retrieve WAL files from archives\n"));
-	printf(_("  -D, --target-pgdata=DIRECTORY  existing data directory to modify\n"));
-	printf(_("  --source-pgdata=DIRECTORY  source data directory to synchronize with\n"));
-	printf(_("  --source-server=CONNSTRsource server to synchronize with\n"));
-	printf(_("  -n, --dry-run  stop before modifying anything\n"));
-	printf(_("  -N, --no-sync  do not wait for changes to be written\n"
-			 " safely to disk\n"));
-	printf(_("  -P, --progress write progress messages\n"));
-	printf(_("  -R, --write-recovery-conf  write configuration for replication\n"
-			 " (requires --source-server)\n"));
-	printf(_("  --debugwrite a lot of debug messages\n"));
-	printf(_("  --no-ensure-shutdown   do not automatically fix unclean shutdown\n"));
-	printf(_("  -V, --version  output version information, then exit\n"));
-	printf(_("  -?, --help show this help, then exit\n"));
+	printf(_("  -c, --restore-target-wal  use restore_command in target configuration to\n"
+			 "retrieve WAL files from archives\n"));
+	printf(_("  -C, --target-restore-command=COMMAND  target WAL restore_command\n"));
+	printf(_("  -D, --target-pgdata=DIRECTORY existing data directory to modify\n"));
+	printf(_("  --source-pgdata=DIRECTORY source data directory to synchronize with\n"));
+	printf(_("  --source-server=CONNSTR   source server to synchronize with\n"));
+	printf(_("  -n, --dry-run stop before modifying anything\n"));
+	printf(_("  -N, --no-sync do not wait for changes to be written\n"
+			 "safely to disk\n"));
+	printf(_("  -P, --progresswrite progress messages\n"));
+	printf(_("  -R, --write-recovery-conf write configuration for replication\n"
+			 "(requires --source-server)\n"));
+	printf(_("  --debug   write a lot of debug messages\n"));
+	printf(_("  --no-ensure-shutdown  do not automatically

Re: Printing LSN made easy

2020-11-27 Thread Alexey Kondratov

Hi,

On 2020-11-27 13:40, Ashutosh Bapat wrote:


Off list Peter Eisentraut pointed out that we can not use these macros
in elog/ereport since it creates problems for translations. He
suggested adding functions which return strings and use %s when doing
so.

The patch has two functions pg_lsn_out_internal() which takes an LSN
as input and returns a palloc'ed string containing the string
representation of LSN. This may not be suitable in performance
critical paths and also may leak memory if not freed. So there's
another function pg_lsn_out_buffer() which takes LSN and a char array
as input, fills the char array with the string representation and
returns the pointer to the char array. This allows the function to be
used as an argument in printf/elog etc. Macro MAXPG_LSNLEN has been
extern'elized for this purpose.



If usage of macros in elog/ereport can cause problems for translation,
then even with this patch life does not get significantly simpler. For
example, instead of just doing this:


 elog(WARNING,
-     "xlog min recovery request %X/%X is past current point %X/%X",
-     (uint32) (lsn >> 32), (uint32) lsn,
-     (uint32) (newMinRecoveryPoint >> 32),
-     (uint32) newMinRecoveryPoint);
+     "xlog min recovery request " LSN_FORMAT " is past current point " LSN_FORMAT,
+     LSN_FORMAT_ARG(lsn),
+     LSN_FORMAT_ARG(newMinRecoveryPoint));

we have to either declare two additional local buffers, which is 
verbose; or use pg_lsn_out_internal() and rely on memory contexts (or do 
pfree() manually, which is verbose again) to prevent memory leaks.
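
With the proposed pg_lsn_out_buffer() the same call site would become
something like this (I am assuming the buffer is passed as the second
argument):

    char        buf1[MAXPG_LSNLEN + 1];
    char        buf2[MAXPG_LSNLEN + 1];

    elog(WARNING,
         "xlog min recovery request %s is past current point %s",
         pg_lsn_out_buffer(lsn, buf1),
         pg_lsn_out_buffer(newMinRecoveryPoint, buf2));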




Off list Craig Ringer suggested introducing a new format specifier
similar to %m for LSN but I did not get time to take a look at the
relevant code. AFAIU it's available only to elog/ereport, so may not
be useful generally. But teaching printf variants about the new format
would be the best solution. However, I didn't find any way to do that.



It seems that this topic has been extensively discussed off-list, but
still a strong +1 for the patch. I have always wanted LSN printing to be
more concise.


I have just tried the new printing utilities in a couple of new places
and it looks good to me.


+char *
+pg_lsn_out_internal(XLogRecPtr lsn)
+{
+   char        buf[MAXPG_LSNLEN + 1];
+
+   snprintf(buf, sizeof(buf), LSN_FORMAT, LSN_FORMAT_ARG(lsn));
+
+   return pstrdup(buf);
+}

Would it be a bit more straightforward if we palloc'ed buf initially and
just returned the pointer, instead of doing pstrdup()?



Regards
--
Alexey Kondratov

Postgres Professional https://www.postgrespro.com
Russian Postgres Company

From 698e481f5f55b967b5c60dba4bc577f8baa20ff4 Mon Sep 17 00:00:00 2001
From: Ashutosh Bapat 
Date: Fri, 16 Oct 2020 17:09:29 +0530
Subject: [PATCH] Make it easy to print LSN

The commit introduces following macros and functions to make it easy to
use LSNs in printf variants, elog, ereport and appendStringInfo
variants.

LSN_FORMAT - macro representing the format in which LSN is printed

LSN_FORMAT_ARG - macro to pass LSN as an argument to the above format

pg_lsn_out_internal - a function which returns palloc'ed char array
containing string representation of given LSN.

pg_lsn_out_buffer - similar to above but accepts and returns a char
array of size (MAXPG_LSNLEN + 1)

The commit also has some example usages of these.

Ashutosh Bapat
---
 contrib/pageinspect/rawpage.c|  3 +-
 src/backend/access/rmgrdesc/replorigindesc.c |  5 +-
 src/backend/access/rmgrdesc/xlogdesc.c   |  3 +-
 src/backend/access/transam/xlog.c|  8 ++--
 src/backend/utils/adt/pg_lsn.c   | 49 ++--
 src/include/access/xlogdefs.h|  7 +++
 src/include/utils/pg_lsn.h   |  3 ++
 7 files changed, 55 insertions(+), 23 deletions(-)

diff --git a/contrib/pageinspect/rawpage.c b/contrib/pageinspect/rawpage.c
index c0181506a5..2cd055a5f0 100644
--- a/contrib/pageinspect/rawpage.c
+++ b/contrib/pageinspect/rawpage.c
@@ -261,8 +261,7 @@ page_header(PG_FUNCTION_ARGS)
 	{
 		char		lsnchar[64];
 
-		snprintf(lsnchar, sizeof(lsnchar), "%X/%X",
- (uint32) (lsn >> 32), (uint32) lsn);
+		snprintf(lsnchar, sizeof(lsnchar), LSN_FORMAT, LSN_FORMAT_ARG(lsn));
 		values[0] = CStringGetTextDatum(lsnchar);
 	}
 	else
diff --git a/src/backend/access/rmgrdesc/replorigindesc.c b/src/backend/access/rmgrdesc/replorigindesc.c
index 19e14f910b..a3f49b5750 100644
--- a/src/backend/access/rmgrdesc/replorigindesc.c
+++ b/src/backend/access/rmgrdesc/replorigindesc.c
@@ -29,10 +29,9 @@ replorigin_desc(StringInfo buf, XLogReaderState *record)
 
 xlrec = (xl_replorigin_set *) rec;
 
-appendStringInfo(buf, "set %u; lsn %X/%X; force: %d",
+appendStringInfo(buf, "set %u; lsn " LSN_FORMAT "; force: %d",

Re: Allow CLUSTER, VACUUM FULL and REINDEX to change tablespace on the fly

2020-11-30 Thread Alexey Kondratov

On 2020-11-30 14:33, Michael Paquier wrote:

On Tue, Nov 24, 2020 at 09:31:23AM -0600, Justin Pryzby wrote:

@cfbot: rebased


Catching up with the activity here, I can see four different things in
the patch set attached:
1) Refactoring of the grammar of CLUSTER, VACUUM, ANALYZE and REINDEX
to support values in parameters.
2) Tablespace change for REINDEX.
3) Tablespace change for VACUUM FULL/CLUSTER.
4) Tablespace change for indexes with VACUUM FULL/CLUSTER.

I am not sure yet about the last three points, so let's begin with 1)
that is dealt with in 0001 and 0002.  I have spent some time on 0001,
renaming the rule names to be less generic than "common", and applied
it.  0002 looks to be in rather good shape, still there are a few
things that have caught my eyes.  I'll look at that more closely
tomorrow.



Thanks. I have rebased the remaining patches on top of 873ea9ee to use 
'utility_option_list' instead of 'common_option_list'.



Regards
--
Alexey Kondratov

Postgres Professional https://www.postgrespro.com
Russian Postgres Company

From ac3b77aec26a40016784ada9dab8b9059f424fa4 Mon Sep 17 00:00:00 2001
From: Justin Pryzby 
Date: Tue, 31 Mar 2020 20:35:41 -0500
Subject: [PATCH v31 5/5] Implement vacuum full/cluster (INDEX_TABLESPACE
 )

---
 doc/src/sgml/ref/cluster.sgml | 12 -
 doc/src/sgml/ref/vacuum.sgml  | 12 -
 src/backend/commands/cluster.c| 64 ++-
 src/backend/commands/matview.c|  3 +-
 src/backend/commands/tablecmds.c  |  2 +-
 src/backend/commands/vacuum.c | 46 +++-
 src/backend/postmaster/autovacuum.c   |  1 +
 src/include/commands/cluster.h|  6 ++-
 src/include/commands/vacuum.h |  5 +-
 src/test/regress/input/tablespace.source  | 13 +
 src/test/regress/output/tablespace.source | 20 +++
 11 files changed, 123 insertions(+), 61 deletions(-)

diff --git a/doc/src/sgml/ref/cluster.sgml b/doc/src/sgml/ref/cluster.sgml
index cbfc0582be..6781e3a025 100644
--- a/doc/src/sgml/ref/cluster.sgml
+++ b/doc/src/sgml/ref/cluster.sgml
@@ -28,6 +28,7 @@ CLUSTER [VERBOSE] [ ( option [, ...
 
 VERBOSE [ boolean ]
 TABLESPACE new_tablespace
+INDEX_TABLESPACE new_tablespace
 
 
  
@@ -105,6 +106,15 @@ CLUSTER [VERBOSE] [ ( option [, ...
 

 
+   
+INDEX_TABLESPACE
+
+ 
+  Specifies that the table's indexes will be rebuilt on a new tablespace.
+ 
+
+   
+

 table_name
 
@@ -141,7 +151,7 @@ CLUSTER [VERBOSE] [ ( option [, ...
 new_tablespace
 
  
-  The tablespace where the table will be rebuilt.
+  The tablespace where the table or its indexes will be rebuilt.
  
 

diff --git a/doc/src/sgml/ref/vacuum.sgml b/doc/src/sgml/ref/vacuum.sgml
index 5261a7c727..28cab119b6 100644
--- a/doc/src/sgml/ref/vacuum.sgml
+++ b/doc/src/sgml/ref/vacuum.sgml
@@ -36,6 +36,7 @@ VACUUM [ FULL ] [ FREEZE ] [ VERBOSE ] [ ANALYZE ] [ boolean ]
 PARALLEL integer
 TABLESPACE new_tablespace
+INDEX_TABLESPACE new_tablespace
 
 and table_and_columns is:
 
@@ -265,6 +266,15 @@ VACUUM [ FULL ] [ FREEZE ] [ VERBOSE ] [ ANALYZE ] [ boolean
 
@@ -314,7 +324,7 @@ VACUUM [ FULL ] [ FREEZE ] [ VERBOSE ] [ ANALYZE ] [ new_tablespace
 
  
-  The tablespace where the relation will be rebuilt.
+  The tablespace where the relation or its indexes will be rebuilt.
  
 

diff --git a/src/backend/commands/cluster.c b/src/backend/commands/cluster.c
index b289a76d58..0f9f09a15a 100644
--- a/src/backend/commands/cluster.c
+++ b/src/backend/commands/cluster.c
@@ -71,7 +71,7 @@ typedef struct
 
 
 static void rebuild_relation(Relation OldHeap, Oid indexOid, bool verbose,
-			 Oid NewTableSpaceOid);
+			 Oid NewTableSpaceOid, Oid NewIdxTableSpaceOid);
 static void copy_table_data(Oid OIDNewHeap, Oid OIDOldHeap, Oid OIDOldIndex,
 			bool verbose, bool *pSwapToastByContent,
 			TransactionId *pFreezeXid, MultiXactId *pCutoffMulti);
@@ -107,9 +107,11 @@ cluster(ParseState *pstate, ClusterStmt *stmt, bool isTopLevel)
 {
 	ListCell	*lc;
 	int			options = 0;
-	/* Name and Oid of tablespace to use for clustered relation. */
-	char		*tablespaceName = NULL;
-	Oid			tablespaceOid = InvalidOid;
+	/* Name and Oid of tablespaces to use for clustered relations. */
+	char		*tablespaceName = NULL,
+*idxtablespaceName = NULL;
+	Oid			tablespaceOid,
+idxtablespaceOid;
 
 	/* Parse list of generic parameters not handled by the parser */
 	foreach(lc, stmt->params)
@@ -123,6 +125,8 @@ cluster(ParseState *pstate, ClusterStmt *stmt, bool isTopLevel)
 options &= ~CLUOPT_VERBOSE;
 		else if (strcmp(opt->defname, "tablespace") == 0)
 			tablespaceName = defGetString(opt);
+		else if (strcmp(opt->defname, "index_tablespace") == 0)
+			idxtablespaceName = defGetString(opt);
 		else
 			ereport(ERROR,
 	(e

Re: Notes on physical replica failover with logical publisher or subscriber

2020-11-30 Thread Alexey Kondratov

Hi Craig,

On 2020-11-30 06:59, Craig Ringer wrote:


https://wiki.postgresql.org/wiki/Logical_replication_and_physical_standby_failover



Thank you for sharing these notes. I have not dealt much with 
physical/logical replication interoperability, so most of these 
problems were new to me.


One point from the wiki page, which seems clear enough to me:

```
Logical slots can fill pg_wal and can't benefit from archiving. Teach 
the logical decoding page read callback how to use the restore_command 
to retrieve WAL segs temporarily if they're not found in pg_wal...

```

It does not look like a big deal to teach the logical decoding process 
to use restore_command, but I have some doubts about how everything will 
perform once we start getting WAL from the archive for decoding 
purposes. If we have started using restore_command, then the subscriber 
has already lagged long enough to exceed max_slot_wal_keep_size. Taking 
into account that fetching WAL files from the archive has an additional 
overhead, and that the primary keeps generating (and archiving) new 
segments, there is a possibility that the primary ends up doing this 
double duty forever: archive a WAL file first, then fetch it back for 
decoding when requested.
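
Mechanically, I would expect the fallback to live in the logical
decoding read_page path, along these lines (a rough sketch only; it
assumes the backend context of the callback, with tli, segno and
wal_segment_size at hand, and error handling abbreviated):

	char		xlogfname[MAXFNAMELEN];
	char		xlogpath[MAXPGPATH];

	XLogFileName(xlogfname, tli, segno, wal_segment_size);

	/*
	 * Segment no longer in pg_wal (e.g. trimmed away past
	 * max_slot_wal_keep_size): ask restore_command for it.
	 */
	if (!RestoreArchivedFile(xlogpath, xlogfname, "RECOVERYXLOG",
							 wal_segment_size, false))
		ereport(ERROR,
				(errmsg("could not restore file \"%s\" from archive",
						xlogfname)));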


Another problem is that there may be several active decoders, IIRC, so 
they would have to coordinate in order to avoid fetching the same 
segment twice.




I tried to address many of these issues with failover slots, but I am
not trying to beat that dead horse now. I know that at least some
people here are of the opinion that effort shouldn't go into
logical/physical replication interoperation anyway - that we should
instead address the remaining limitations in logical replication so
that it can provide complete HA capabilities without use of physical
replication. So for now I'm just trying to save others who go looking
into these issues some time and warn them about some of the less
obvious booby-traps.



Another point to add regarding the logical replication capabilities 
needed to build a logical-only HA system: a logical equivalent of 
pg_rewind. At least I have not noticed anything about it after a brief 
reading of the wiki page. IIUC, currently there is no way to quickly 
return an ex-primary (ex-logical publisher) into the HA cluster without 
doing a pg_basebackup. It seems that we have the same problem here as 
with physical replication: the ex-primary may accept some xacts after 
the promotion of the new primary, so their histories diverge, and the 
old primary has to be rewound before being returned as a standby 
(subscriber).



Regards
--
Alexey Kondratov

Postgres Professional https://www.postgrespro.com
Russian Postgres Company




Re: Allow CLUSTER, VACUUM FULL and REINDEX to change tablespace on the fly

2020-12-04 Thread Alexey Kondratov

On 2020-12-04 04:25, Justin Pryzby wrote:

On Thu, Dec 03, 2020 at 04:12:53PM +0900, Michael Paquier wrote:

> +typedef struct ReindexParams {
> +  bool concurrently;
> +  bool verbose;
> +  bool missingok;
> +
> +  int options;    /* bitmask of lowlevel REINDEXOPT_* */
> +} ReindexParams;
> +

By moving everything into indexcmds.c, keeping ReindexParams within it
makes sense to me.  Now, there is no need for the three booleans
because options stores the same information, no?


 I liked the bools, but dropped them so the patch is smaller.



I had a look at 0001 and it looks mostly fine to me, except for a 
strange mixture of tabs/spaces in ExecReindex(). There are also a 
couple of meaningful comments:


-   options =
-   (verbose ? REINDEXOPT_VERBOSE : 0) |
-   (concurrently ? REINDEXOPT_CONCURRENTLY : 0);
+   if (verbose)
+   params.options |= REINDEXOPT_VERBOSE;

Why do we need this intermediate 'verbose' variable here? We only use it 
once to set a bitmask. Maybe we can do it like this:


params.options |= defGetBoolean(opt) ?
REINDEXOPT_VERBOSE : 0;

See also the attached txt file with a diff (I wonder whether I can trick 
cfbot this way, so that it does not try to apply the diff).


+   int options;    /* bitmask of lowlevel REINDEXOPT_* */

I would prefer it if the comment said '/* bitmask of ReindexOption */', 
as in VacuumOptions, since citing the exact enum type makes it easier 
to navigate the source code.




Regarding the REINDEX patch, I think this comment is misleading:

|+* Even if table was moved to new tablespace,
normally toast cannot move.
| */
|+   Oid toasttablespaceOid = allowSystemTableMods ?
tablespaceOid : InvalidOid;
|result |= reindex_relation(toast_relid, flags,

I think it ought to say "Even if a table's indexes were moved to a new
tablespace, its toast table's index is not normally moved"
Right ?



Yes, I think so; we are dealing only with changing the index tablespace 
here. Thanks for noticing.




Also, I don't know whether we should check for GLOBALTABLESPACE_OID after
calling get_tablespace_oid(), or in the lowlevel routines.  Note that
reindex_relation is called during cluster/vacuum, and in the later
patches, I moved the test from cluster() and ExecVacuum() to
rebuild_relation().




IIRC, I wanted to do the GLOBALTABLESPACE_OID check as early as possible 
(just after getting the Oid), since it does not make sense to proceed 
further if the tablespace is set to that value. So initially there were 
a lot of duplicated GLOBALTABLESPACE_OID checks, since there were a lot 
of reindex entry points (index, relation, concurrently, etc.). Now we 
are going to have ExecReindex(), so there are far fewer entry points, 
and in my opinion it is fine to keep this validation just after 
get_tablespace_oid().
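
Concretely, the early check I have in mind would sit right in
ExecReindex(), roughly like this (a sketch only; the exact error
wording is illustrative):

	params.tablespaceOid = tablespace ?
		get_tablespace_oid(tablespace, false) : InvalidOid;

	/* Reject the global tablespace before doing any real work */
	if (params.tablespaceOid == GLOBALTABLESPACE_OID)
		ereport(ERROR,
				(errcode(ERRCODE_INVALID_PARAMETER_VALUE),
				 errmsg("cannot move non-shared relation to tablespace \"%s\"",
						get_tablespace_name(params.tablespaceOid))));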


However, this is mostly a sanity check. I can hardly imagine many users 
trying to constantly move indexes to the global tablespace, so it is 
also OK to put this check deeper in the guts.



Regards
--
Alexey Kondratov

Postgres Professional https://www.postgrespro.com
Russian Postgres Companydiff --git a/src/backend/commands/indexcmds.c b/src/backend/commands/indexcmds.c
index a27f8f9d83..0b1884815c 100644
--- a/src/backend/commands/indexcmds.c
+++ b/src/backend/commands/indexcmds.c
@@ -2472,8 +2472,6 @@ void
 ExecReindex(ParseState *pstate, ReindexStmt *stmt, bool isTopLevel)
 {
 	ReindexParams		params = {0};
-	bool		verbose = false,
-concurrently = false;
 	ListCell   	*lc;
 	char	*tablespace = NULL;
 
@@ -2483,9 +2481,11 @@ ExecReindex(ParseState *pstate, ReindexStmt *stmt, bool isTopLevel)
 		DefElem*opt = (DefElem *) lfirst(lc);
 
 		if (strcmp(opt->defname, "verbose") == 0)
-			verbose = defGetBoolean(opt);
+			params.options |= defGetBoolean(opt) ?
+REINDEXOPT_VERBOSE : 0;
 		else if (strcmp(opt->defname, "concurrently") == 0)
-			concurrently = defGetBoolean(opt);
+			params.options |= defGetBoolean(opt) ?
+REINDEXOPT_CONCURRENTLY : 0;
 		else if (strcmp(opt->defname, "tablespace") == 0)
 			tablespace = defGetString(opt);
 		else
@@ -2496,18 +2496,12 @@ ExecReindex(ParseState *pstate, ReindexStmt *stmt, bool isTopLevel)
 	 parser_errposition(pstate, opt->location)));
 	}
 
-	if (verbose)
-		params.options |= REINDEXOPT_VERBOSE;
+	params.tablespaceOid = tablespace ?
+		get_tablespace_oid(tablespace, false) : InvalidOid;
 
-	if (concurrently)
-	{
-		params.options |= REINDEXOPT_CONCURRENTLY;
+	if (params.options & REINDEXOPT_CONCURRENTLY)
 		PreventInTransactionBlock(isTopLevel,
   "REINDEX CONCURRENTLY");
-	}
-
-	params.tablespaceOid = tablespace ?
-		get_tablespace_oid(tablespace, false) : InvalidOid;
 
 	switch (stmt->kind)
 	{

