Re: pg_upgrade: Error out on too many command-line arguments

2019-08-30 Thread Fabien COELHO


Bonjour Michaël,


I don't see why not.  Perhaps this could be done for pgbench and
oid2name as well?


This is for pgbench.

I did not find a TAP test in pg_upgrade, and I do not think that it is worth 
creating one for that purpose. The "test.sh" script does not seem 
appropriate for this kind of coverage error test.


The TAP test for oid2name only includes basic checks; options and 
arguments are not even tested, and there is no infrastructure for actual 
testing, eg starting a server… Improving that would be a much more significant 
effort than the one line I added to pgbench's existing TAP test.


--
Fabien.

Re: refactoring - share str2*int64 functions

2019-08-30 Thread Michael Paquier
On Thu, Aug 29, 2019 at 08:14:54AM +0200, Fabien COELHO wrote:
> Attached v7 does create uint64 overflow inline functions. The stuff is not
> yet moved to "common/int.c", a file which does not exist, even if
> "common/int.h" does.

Thanks for the new version.  I have begun reviewing your patch, and
attached is a first cut that I would like to commit separately which
adds all the compatibility overflow routines to int.h for uint16,
uint32 and uint64 with all the fallback implementations (int128-based
method added as well if available).  I have also grouped at the top of
the file the comments about each routine's return policy to avoid
duplication.  For the fallback implementations of uint64 using int128,
I think that it is cleaner to use uint128 so that there is no need to
check for negative results after multiplications, and I have made the
various expressions consistent for each size.
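
To make that concrete, here is a minimal sketch (not the attached patch
itself) of what the uint128-based multiplication check could look like; it
assumes the uint128 typedef from c.h is usable whenever HAVE_INT128 is set,
and otherwise falls back to a division-based check against PG_UINT64_MAX:

static inline bool
pg_mul_u64_overflow(uint64 a, uint64 b, uint64 *result)
{
#if defined(HAVE__BUILTIN_OP_OVERFLOW)
	return __builtin_mul_overflow(a, b, result);
#elif defined(HAVE_INT128)
	uint128		res = (uint128) a * (uint128) b;

	if (res > PG_UINT64_MAX)
	{
		*result = 0x5EED;		/* to avoid spurious warnings */
		return true;
	}
	*result = (uint64) res;
	return false;
#else
	/* division-based fallback: a == 0 can never overflow */
	if (a != 0 && b > PG_UINT64_MAX / a)
	{
		*result = 0x5EED;		/* to avoid spurious warnings */
		return true;
	}
	*result = a * b;
	return false;
#endif
}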

Attached is a small module called "overflow" with various regression
tests that I used to check each implementation.  I don't propose that
for commit as I am not sure if that's worth the extra CPU, so let's
consider it as a toy for now.

What do you think?
--
Michael


overflow.tar.gz
Description: application/gzip
diff --git a/src/include/common/int.h b/src/include/common/int.h
index d754798543..ccef1b4cc9 100644
--- a/src/include/common/int.h
+++ b/src/include/common/int.h
@@ -20,11 +20,26 @@
 #ifndef COMMON_INT_H
 #define COMMON_INT_H
 
-/*
- * If a + b overflows, return true, otherwise store the result of a + b into
- * *result. The content of *result is implementation defined in case of
+
+/*-
+ * The following guidelines apply to all the routines:
+ * - If a + b overflows, return true, otherwise store the result of a + b
+ * into *result. The content of *result is implementation defined in case of
  * overflow.
+ * - If a - b overflows, return true, otherwise store the result of a - b
+ * into *result. The content of *result is implementation defined in case of
+ * overflow.
+ * - If a * b overflows, return true, otherwise store the result of a * b
+ * into *result. The content of *result is implementation defined in case of
+ * overflow.
+ *-
  */
+
+/*
+ * Overflow routines for signed integers
+ *
+ */
+
 static inline bool
 pg_add_s16_overflow(int16 a, int16 b, int16 *result)
 {
@@ -43,11 +58,6 @@ pg_add_s16_overflow(int16 a, int16 b, int16 *result)
 #endif
 }
 
-/*
- * If a - b overflows, return true, otherwise store the result of a - b into
- * *result. The content of *result is implementation defined in case of
- * overflow.
- */
 static inline bool
 pg_sub_s16_overflow(int16 a, int16 b, int16 *result)
 {
@@ -66,11 +76,6 @@ pg_sub_s16_overflow(int16 a, int16 b, int16 *result)
 #endif
 }
 
-/*
- * If a * b overflows, return true, otherwise store the result of a * b into
- * *result. The content of *result is implementation defined in case of
- * overflow.
- */
 static inline bool
 pg_mul_s16_overflow(int16 a, int16 b, int16 *result)
 {
@@ -89,11 +94,6 @@ pg_mul_s16_overflow(int16 a, int16 b, int16 *result)
 #endif
 }
 
-/*
- * If a + b overflows, return true, otherwise store the result of a + b into
- * *result. The content of *result is implementation defined in case of
- * overflow.
- */
 static inline bool
 pg_add_s32_overflow(int32 a, int32 b, int32 *result)
 {
@@ -112,11 +112,6 @@ pg_add_s32_overflow(int32 a, int32 b, int32 *result)
 #endif
 }
 
-/*
- * If a - b overflows, return true, otherwise store the result of a - b into
- * *result. The content of *result is implementation defined in case of
- * overflow.
- */
 static inline bool
 pg_sub_s32_overflow(int32 a, int32 b, int32 *result)
 {
@@ -135,11 +130,6 @@ pg_sub_s32_overflow(int32 a, int32 b, int32 *result)
 #endif
 }
 
-/*
- * If a * b overflows, return true, otherwise store the result of a * b into
- * *result. The content of *result is implementation defined in case of
- * overflow.
- */
 static inline bool
 pg_mul_s32_overflow(int32 a, int32 b, int32 *result)
 {
@@ -158,11 +148,6 @@ pg_mul_s32_overflow(int32 a, int32 b, int32 *result)
 #endif
 }
 
-/*
- * If a + b overflows, return true, otherwise store the result of a + b into
- * *result. The content of *result is implementation defined in case of
- * overflow.
- */
 static inline bool
 pg_add_s64_overflow(int64 a, int64 b, int64 *result)
 {
@@ -190,11 +175,6 @@ pg_add_s64_overflow(int64 a, int64 b, int64 *result)
 #endif
 }
 
-/*
- * If a - b overflows, return true, otherwise store the result of a - b into
- * *result. The content of *result is implementation defined in case of
- * overflow.
- */
 static inline bool
 pg_sub_s64_overflow(int64 a, int64 b, int64 *result)
 {
@@ -222,11 +202,6 @@ pg_sub_s64_overflow(int64 a, int64 b, int64 *result)
 #endif
 }
 
-/*
- * If a * b overflows, return true, otherwise store the result of a * b into
-

Re: refactoring - share str2*int64 functions

2019-08-30 Thread Fabien COELHO


Michaël,

attached is a first cut that I would like to commit separately which 
adds all the compatibility overflow routines to int.h for uint16, uint32 
and uint64 with all the fallback implementations (int128-based method 
added as well if available).  I have also grouped at the top of the file 
the comments about each routine's return policy to avoid duplication. 
For the fallback implementations of uint64 using int128, I think that it 
is cleaner to use uint128 so that there is no need to check for negative 
results after multiplications, and I have made the various expressions 
consistent for each size.


Patch applies cleanly, compiles, "make check" ok, but the added functions 
are not used (yet).


I think that factoring out comments is a good move.

For symmetry and efficiency, ISTM that uint16 mul overflow could use 
uint32 and uint32 could use uint64, and the division-based method be 
dropped in these cases.
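
As a purely hypothetical illustration of that symmetry (not part of the
patch), the uint16 fallback could simply widen to uint32 and compare, with no
division at all:

static inline bool
pg_mul_u16_overflow(uint16 a, uint16 b, uint16 *result)
{
#if defined(HAVE__BUILTIN_OP_OVERFLOW)
	return __builtin_mul_overflow(a, b, result);
#else
	uint32		res = (uint32) a * (uint32) b;	/* cannot overflow uint32 */

	if (res > PG_UINT16_MAX)
	{
		*result = 0x5EED;		/* to avoid spurious warnings */
		return true;
	}
	*result = (uint16) res;
	return false;
#endif
}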


Maybe I would add a comment before each new section telling about the 
type, eg:


 /*
  * UINT16
  */
 add/sub/mul uint16 functions…

I would tend to commit working solutions per type rather than by 
installment, so that at least all added functions are actually used 
somewhere, but it does not matter much, really.


I was unsure that having int128 implies uint128 availability, so I did not 
use it.


The checking extension is funny, but ISTM that these checks should be (are 
already?) included in some standard sql test, which will test the macros 
from direct SQL operations:


  fabien=# SELECT INT2 '1234512434343';
  ERROR:  value "1234512434343" is out of range for type smallint

Well, a quick look at "src/test/regress/sql/int2.sql" suggests that
there the existing tests should be extended… :-(

--
Fabien.

Re: POC: Cleaning up orphaned files using undo logs

2019-08-30 Thread Kuntal Ghosh
Hello Thomas,

I was doing some testing for the scenario where the undo written by a
transaction overflows to multiple undo logs. For that I've modified
the following macro:
#define UndoLogMaxSize (1024 * 1024) /* 1MB undo log size */
(I should have used the provided pg_force_switch_undo though..)

I'm getting the following assert failure while performing the recovery
with the same.
"TRAP: FailedAssertion("slot->meta.status == UNDO_LOG_STATUS_FULL",
File: "undolog.c", Line: 997)"

I found that we don't emit a WAL record when we update
slot->meta.status to UNDO_LOG_STATUS_FULL. If we don't do that, after
crash recovery, some new transaction may use that undo log, which is
wrong, IMHO. Am I missing something?

-- 
Thanks & Regards,
Kuntal Ghosh
EnterpriseDB: http://www.enterprisedb.com




Re: basebackup.c's sendFile() ignores read errors

2019-08-30 Thread Jeevan Chalke
On Thu, Aug 29, 2019 at 3:17 PM Jeevan Ladhe 
wrote:

> Hi Jeevan
>
> I had a look at the patch and this seems correct to me.
>

Thanks, Jeevan Ladhe.


>
> Few minor comments:
>
> + /* Check fread() error. */
> + CHECK_FREAD_ERROR(fp, pathbuf);
> +
>
> The comments above the macro call at both the places are not necessary as
> your macro name itself is self-explanatory.
>
> --
> + /*
> + * If file is truncated, then we will hit
> + * end-of-file error in which case we don't
> + * want to error out, instead just pad it with
> + * zeros.
> + */
> + if (feof(fp))
>
> The if block does not do the truncation right away, so I think the comment
> above can be reworded to explain why we reset cnt?
>

Fixed both comments in the attached patch.

-- 
Jeevan Chalke
Technical Architect, Product Development
EnterpriseDB Corporation
The Enterprise PostgreSQL Company
diff --git a/src/backend/replication/basebackup.c b/src/backend/replication/basebackup.c
index c91f66d..3aea470 100644
--- a/src/backend/replication/basebackup.c
+++ b/src/backend/replication/basebackup.c
@@ -90,6 +90,18 @@ static char *statrelpath = NULL;
  */
 #define THROTTLING_FREQUENCY	8
 
+/*
+ * Checks whether we encountered any error in fread().  fread() doesn't give
+ * any clue what has happened, so we check with ferror().  Also, neither
+ * fread() nor ferror() set errno, so we just throw a generic error.
+ */
+#define CHECK_FREAD_ERROR(fp, filename) \
+do { \
+	if (ferror(fp)) \
+		ereport(ERROR, \
+(errmsg("could not read from file \"%s\"", filename))); \
+} while (0)
+
 /* The actual number of bytes, transfer of which may cause sleep. */
 static uint64 throttling_sample;
 
@@ -543,6 +555,8 @@ perform_base_backup(basebackup_options *opt)
 	break;
 			}
 
+			CHECK_FREAD_ERROR(fp, pathbuf);
+
 			if (len != wal_segment_size)
 			{
 CheckXLogRemoved(segno, tli);
@@ -1478,6 +1492,20 @@ sendFile(const char *readfilename, const char *tarfilename, struct stat *statbuf
 
 			if (fread(buf + BLCKSZ * i, 1, BLCKSZ, fp) != BLCKSZ)
 			{
+/*
+ * If the file is truncated, then we will hit
+ * an end-of-file error in which case we don't
+ * want to error out, instead just pad the rest
+ * of the file with zeros. However, we want to
+ * send all the bytes read so far, and thus set
+ * the cnt accordingly.
+ */
+if (feof(fp))
+{
+	cnt = BLCKSZ * i;
+	break;
+}
+
 ereport(ERROR,
 		(errcode_for_file_access(),
 		 errmsg("could not reread block %d of file \"%s\": %m",
@@ -1529,7 +1557,7 @@ sendFile(const char *readfilename, const char *tarfilename, struct stat *statbuf
 		len += cnt;
 		throttle(cnt);
 
-		if (len >= statbuf->st_size)
+		if (feof(fp) || len >= statbuf->st_size)
 		{
 			/*
 			 * Reached end of file. The file could be longer, if it was
@@ -1540,6 +1568,8 @@ sendFile(const char *readfilename, const char *tarfilename, struct stat *statbuf
 		}
 	}
 
+	CHECK_FREAD_ERROR(fp, readfilename);
+
 	/* If the file was truncated while we were sending it, pad it with zeros */
 	if (len < statbuf->st_size)
 	{


Re: [HACKERS] CLUSTER command progress monitor

2019-08-30 Thread Masahiko Sawada
On Thu, Aug 15, 2019 at 12:48 PM Tatsuro Yamada
 wrote:
>
> Hi Michael, Alvaro and Robert!
>
> On 2019/08/14 11:52, Michael Paquier wrote:
> > On Wed, Aug 14, 2019 at 11:38:01AM +0900, Tatsuro Yamada wrote:
> >> On 2019/08/13 14:40, Tatsuro Yamada wrote:
> >>> On 2019/08/02 3:43, Alvaro Herrera wrote:
>  Hmm, I'm trying this out now and I don't see the index_rebuild_count
>  ever go up.  I think it's because the indexes are built using parallel
>  index build ... or maybe it was the table AM changes that moved things
>  around, not sure.  There's a period at the end when the CLUSTER command
>  keeps working, but it's gone from pg_stat_progress_cluster.
> >>>
> >>> Thanks for your report.
> >>> I'll investigate it. :)
> >>
> >> I did "git bisect" and found the commit:
> >>
> >>   03f9e5cba0ee1633af4abe734504df50af46fbd8
> >>   Report progress of REINDEX operations
> >
> > I am adding an open item for this one.
> > --
> > Michael
>
>
> Okay, I checked it on the wiki.
>
>https://wiki.postgresql.org/wiki/PostgreSQL_12_Open_Items
>- index_rebuild_count in CLUSTER reporting never increments
>
> To be clear, 03f9e5cb broke CLUSTER progress reporting, but
> I investigated a little more and share my ideas to fix the problem.
>
> * Call stack
> 
> cluster_rel
>pgstat_progress_start_command(CLUSTER) *A1
>rebuild_relation
>  finish_heap_swap
>reindex_relation
>  reindex_index
>pgstat_progress_start_command(CREATE_INDEX) *B1
>pgstat_progress_end_command() *B2
>  pgstat_progress_update_param(INDEX_REBUILD_COUNT, i) <- failed :(
>pgstat_progress_end_command() *A2
>
>Note
>  These are sets:
>A1 and A2,
>B1 and B2
> 
>
>
> * Ideas to fix
>There are Three options, I guess.
> 
>1. Call pgstat_progress_start_command(CLUSTER) again
>   before pgstat_progress_update_param(INDEX_REBUILD_COUNT, i).
>
>2. Add "save and restore" functions for the following two
>   variables of MyBeentry in pgstat.c.
>  - st_progress_command
>  - st_progress_command_target
>
>3. Use Hash or List to store multiple values for the two
>   variables in pgstat.c.
> 
>
>
> I tried 1. and it showed index_rebuild_count, but it also showed the
> "initializing" phase again and other columns were empty. So, it is
> not suitable to fix the problem. :(
> I'm going to try 2. and 3., but, unfortunately, I can't get enough
> time to do that after PGConf.Asia 2019.
> If we selected 3., it affects the following progress reporting
> features: VACUUM, CLUSTER, CREATE_INDEX and ANALYZE. But it's okay,
> I suppose. Any comments welcome! :)

I looked at this open item. I prefer #3, but I think it would be enough
to have a small stack using a fixed-length array to track nested
progress information and the currently executing command (maybe 4 or 8
would be enough as the maximum nesting level for now?). That way, we don't
need to change the interfaces. AFAICS there is no case where we call
only one of pgstat_progress_start_command or
pgstat_progress_end_command without the other (although
pgstat_progress_update_param is called without a start). So I think that
having a small stack for tracking multiple reports would work.
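
To restate the idea as code, a minimal sketch of such a fixed-length stack
could look like this (the names and the size are placeholders, not necessarily
what the attached draft uses; only ProgressCommandType, Oid and
PGSTAT_NUM_PROGRESS_PARAM are existing definitions):

#define PGSTAT_MAX_PROGRESS_NESTED	8	/* maximum nesting level tracked */

typedef struct PgBackendProgressEntry
{
	ProgressCommandType command;	/* e.g. PROGRESS_COMMAND_CLUSTER */
	Oid			command_target;		/* relation the command works on */
	int64		params[PGSTAT_NUM_PROGRESS_PARAM];
} PgBackendProgressEntry;

/* Per-backend stack: start_command pushes, end_command pops. */
static PgBackendProgressEntry progress_stack[PGSTAT_MAX_PROGRESS_NESTED];
static int	progress_depth = 0;		/* 0 means no command in progress */

/* pgstat_progress_update_param() then only ever touches the top entry. */
static inline PgBackendProgressEntry *
progress_top(void)
{
	return progress_depth > 0 ? &progress_stack[progress_depth - 1] : NULL;
}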

Attached is a draft patch that fixes this issue. Please review it.

Regards,

--
Masahiko Sawada
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center
diff --git a/src/backend/postmaster/pgstat.c b/src/backend/postmaster/pgstat.c
index d362e7f7d7..99e4844e6c 100644
--- a/src/backend/postmaster/pgstat.c
+++ b/src/backend/postmaster/pgstat.c
@@ -3016,8 +3016,10 @@ pgstat_bestart(void)
 #endif
 
 	lbeentry.st_state = STATE_UNDEFINED;
-	lbeentry.st_progress_command = PROGRESS_COMMAND_INVALID;
-	lbeentry.st_progress_command_target = InvalidOid;
+	MemSet(&(lbeentry.st_progress_cmds), 0,
+		   sizeof(PgBackendProgressInfo) * PGSTAT_MAX_PROGRESS_INFO);
+	/* Set invalid command index */
+	lbeentry.st_current_cmd = -1;
 
 	/*
 	 * we don't zero st_progress_param here to save cycles; nobody should
@@ -3203,10 +3205,22 @@ pgstat_progress_start_command(ProgressCommandType cmdtype, Oid relid)
 	if (!beentry || !pgstat_track_activities)
 		return;
 
+	Assert(beentry->st_current_cmd >= -1);
+
+	/* The given command is already started */
+	if (beentry->st_progress_cmds[beentry->st_current_cmd].command == cmdtype)
+		return;
+
+	/* The progress information queue is full */
+	if (beentry->st_current_cmd >= PGSTAT_MAX_PROGRESS_INFO - 1)
+		elog(ERROR, "progress information per backends is full");
+
 	PGSTAT_BEGIN_WRITE_ACTIVITY(beentry);
-	beentry->st_progress_command = cmdtype;
-	beentry->st_progress_command_target = relid;
-	MemSet(&beentry->st_progress_param, 0, sizeof(beentry->st_progress_param));
+	beentry->st_current_cmd++;
+	MemSet(&(beentry->st_pr

Re: Yet another fast GiST build

2019-08-30 Thread Andrey Borodin


> On 30 Aug 2019, at 3:47, Alexander Korotkov wrote:
> 
> 1) Binary search in non-leaf pages instead of probing each key is much faster.

That's a neat idea, but key union breaks ordering, even for z-order:
for two sets of tuples X and Y, having Xi < Yo for all i, o
does not guarantee union(X) < union(Y).

For example consider this z-ordered keyspace (picture attached)


union(5, 9) is z-order-smaller than union(4,4)

I'm not even sure we can use sorted search for choosing subtree for insertion.
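
For readers unfamiliar with the encoding, a z-order (Morton) key is just bit
interleaving, as in the toy function below; it only illustrates what the key
is, not a fix, since the union of two keys (their bounding rectangle) lands
on the curve at a position unrelated to the positions of its members:

#include <stdint.h>

/* Toy 2-D z-order (Morton) key: interleave the bits of x and y. */
static uint64_t
morton2(uint32_t x, uint32_t y)
{
	uint64_t	key = 0;

	for (int bit = 0; bit < 32; bit++)
	{
		key |= ((uint64_t) ((x >> bit) & 1)) << (2 * bit);		/* even bits: x */
		key |= ((uint64_t) ((y >> bit) & 1)) << (2 * bit + 1);	/* odd bits: y */
	}
	return key;
}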

What do you think, should I supply the GiST-build patch with docs and tests
and add it to the CF? Or do we need more design discussion first?


Best regards, Andrey Borodin.

Re: WIP: Generic functions for Node types using generated metadata

2019-08-30 Thread Fabien COELHO


Hello Andres,

Just my 0.02 €:


There's been a lot of complaints over the years about how annoying it is
to keep the out/read/copy/equalfuncs.c functions in sync with the actual
underlying structs.

There've been various calls for automating their generation, but no
actual patches that I am aware of.


I started something a while back, AFAICR after spending stupid time 
looking for a stupid missing field copy or whatever. I wrote a (simple) 
perl script deriving all (most) node utility functions from the header 
files.


I gave up as the idea did not gather much momentum from committers, so I 
assumed the effort would be rejected in the end. AFAICR the spirit of the 
feedback was something like "node definitions do not change often, we can 
manage them by hand".



There also recently has been discussion about generating more efficient
memory layout for node trees that we know are read only (e.g. plan trees
inside the plancache), and about copying such trees more efficiently
(e.g. by having one big allocation, and then just adjusting pointers).


If pointers are relative to the start, they could be just indexes/offsets 
that do not need much adjusting.
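
A toy illustration of offset-based links, assuming nothing about the actual
node representation being discussed: links stored as byte offsets from the
start of a single flat allocation stay valid after the whole chunk is
memcpy'd, so no pointer fix-up pass is needed.

typedef struct FlatNode
{
	int			tag;			/* node type */
	uint32		left_off;		/* byte offset from chunk start, 0 = none */
	uint32		right_off;
} FlatNode;

static inline FlatNode *
flat_node(char *chunk, uint32 off)
{
	return off == 0 ? NULL : (FlatNode *) (chunk + off);
}

/*
 * Copying the whole tree is then just
 *     memcpy(new_chunk, chunk, chunk_size);
 * and every left_off/right_off is still valid in the copy.
 */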


One way to approach this problem would be to parse the type 
definitions, and directly generate code for the various functions. But 
that does mean that such a code-generator needs to be expanded for each 
such function.


No big deal for the effort I made. The issue was more dealing with 
exceptions (eg "we do not serialize this field because it is not used for 
some reason") and understanding some implicit assumptions in the struct 
declarations.



An alternative approach is to have a parser of the node definitions that
doesn't generate code directly, but instead generates metadata. And then
use that metadata to write node aware functions.  This seems more
promising to me.


Hmmm. The approach we had in an (old) research project was to write the 
metadata, and derive all struct & utility functions from it. It is 
simpler this way because you save parsing some C, and it can be made 
language agnostic (i.e. serializing the data structure from one language and 
reading its value from another).



I'm fairly sure this metadata can also be used to write the other
currently existing node functions.


Beware of strange exceptions…


With regards to using libclang for the parsing: I chose that because it
seemed the easiest to experiment with, compared to annotating all the
structs with enough metadata to be able to easily parse them from a perl
script.


I did not find this an issue when I tried, because the annotation needed 
is basically the type name of the field.



The node definitions are after all distributed over quite a few headers.


Yep.


I think it might even be the correct way forward, over inventing our own
mini-languages and writing ad-hoc parsers for those. It sure is easier
to understand plain C code, compared to having to understand various
embedded mini-languages consisting of macros.


Dunno.

The obvious drawback is that it'd require more people to install 
libclang - a significant imposition.


Indeed. A perl-only dependence would be much simpler than relying on a 
particular library from a particular compiler to compile postgres, 
possibly with an unrelated compiler.



Alternatively we could annotate the code enough to be able to write our
own parser, or use some other C parser.


If you can dictate some conventions, eg one line/one field, simple perl 
regexps would work well I think; you would not need a parser per se.



I don't really want to invest significantly more time into this without
first debating the general idea.


That's what I did, and I quit quickly :-)

On the general idea, I'm 100% convinced that stupid utility functions 
should be either generic or generated, not maintained by hand.


--
Fabien.

Re: basebackup.c's sendFile() ignores read errors

2019-08-30 Thread Jeevan Ladhe
>
> Fixed both comments in the attached patch.
>

Thanks, the patch looks good to me.

Regards,
Jeevan Ladhe


Re: WIP: Generic functions for Node types using generated metadata

2019-08-30 Thread Fabien COELHO


Hallo Andres,


There've been various calls for automating their generation, but no
actual patches that I am aware of.


I started something a while back


I have found this thread:

https://www.postgresql.org/message-id/flat/E1cq93r-0004ey-Mp%40gemulon.postgresql.org

It seems that comments from committers discouraged me from going on… :-) For 
instance Robert wanted a "checker", which is basically harder than a 
generator because you have to parse both sides and then compare.


Basically there was no consensus at the time (March 2017), so I gave up. 
It seems that I even destroyed the branch I was working on, so the no 
doubt magnificent perl script I wrote is also lost.


--
Fabien.

Re: block-level incremental backup

2019-08-30 Thread Rajkumar Raghuwanshi
Hi,

I am doing some testing on the pg_basebackup and pg_combinebackup patches. I
have also tried to create a TAP test for pg_combinebackup by taking the
pg_basebackup TAP cases as a reference.
Attaching a first draft test patch.

I have done some testing with compression options; both -z and -Z level
work with incremental backup.

A minor comment: it is mentioned in the pg_combinebackup help that a maximum
of 10 incremental backups can be given with the -i option, but I found that a
maximum of 9 incremental backup directories can be given at a time.

Thanks & Regards,
Rajkumar Raghuwanshi
QMG, EnterpriseDB Corporation


On Thu, Aug 29, 2019 at 10:06 PM Jeevan Ladhe 
wrote:

> Due to the inherent nature of pg_basebackup, the incremental backup also
> allows taking backup in tar and compressed format. But, pg_combinebackup
> does not understand how to restore this. I think we should either make
> pg_combinebackup support restoration of tar incremental backup or restrict
> taking the incremental backup in tar format until pg_combinebackup
> supports the restoration by making option '--lsn' and '-Ft' exclusive.
>
> It is arguable that one can take the incremental backup in tar format,
> extract
> that manually and then give the resultant directory as input to the
> pg_combinebackup, but I think that kills the purpose of having
> pg_combinebackup utility.
>
> Thoughts?
>
> Regards,
> Jeevan Ladhe
>
diff --git a/src/bin/pg_combinebackup/t/pg_combinebackup.pl b/src/bin/pg_combinebackup/t/pg_combinebackup.pl
new file mode 100644
index 000..e0f834a
--- /dev/null
+++ b/src/bin/pg_combinebackup/t/pg_combinebackup.pl
@@ -0,0 +1,79 @@
+use strict;
+use warnings;
+use Cwd;
+use Config;
+use File::Basename qw(basename dirname);
+use File::Path qw(rmtree);
+use PostgresNode;
+use TestLib;
+use Test::More tests => 23;
+
+program_help_ok('pg_combinebackup');
+program_version_ok('pg_combinebackup');
+program_options_handling_ok('pg_combinebackup');
+
+my $tempdir = TestLib::tempdir;
+
+my $node = get_new_node('main');
+
+# Initialize node
+$node->init();
+my $pgdata = $node->data_dir;
+
+# Change wal related setting for pg_basebackup to run
+open my $conf, '>>', "$pgdata/postgresql.conf";
+print $conf "max_replication_slots = 10\n";
+print $conf "max_wal_senders = 10\n";
+print $conf "wal_level = replica\n";
+close $conf;
+$node->start;
+
+$node->command_fails(['pg_combinebackup'],
+	'pg_combinebackup needs full and incremental directory specified');
+
+# Create an unlogged table to test that forks other than init are not copied.
+$node->safe_psql('postgres', 'CREATE UNLOGGED TABLE base_unlogged (id int)');
+
+my $baseUnloggedPath = $node->safe_psql('postgres',
+	q{select pg_relation_filepath('base_unlogged')});
+
+# Make sure main and init forks exist
+ok(-f "$pgdata/${baseUnloggedPath}_init", 'unlogged init fork in base');
+ok(-f "$pgdata/$baseUnloggedPath",'unlogged main fork in base');
+
+# Run full base backup.
+$node->command_ok([ 'pg_basebackup', '-D', "$tempdir/backup"],
+	'pg_basebackup runs for full backup');
+ok(-f "$tempdir/backup/PG_VERSION", 'full backup was created');
+
+# Unlogged relation forks other than init should not be copied
+ok(-f "$tempdir/backup/${baseUnloggedPath}_init",
+	'unlogged init fork in backup');
+ok( !-f "$tempdir/backup/$baseUnloggedPath",
+	'unlogged main fork not in backup');
+
+# Get LSN of last backup to use for incremental backup
+my @extract_lsn = split (" ", scalar TestLib::slurp_file("$tempdir/backup/backup_label"));
+my $LSN = $extract_lsn[3];
+
+# Run incr base backup.
+$node->command_ok([ 'pg_basebackup', '-D', "$tempdir/backup1",'--lsn', "$LSN"],
+	'pg_basebackup runs for incremental backup');
+ok(-f "$tempdir/backup1/PG_VERSION", 'incremental backup was created');
+
+# Unlogged relation forks other than init should not be copied
+ok(-f "$tempdir/backup1/${baseUnloggedPath}_init",
+	'unlogged init fork in backup');
+ok( !-f "$tempdir/backup1/$baseUnloggedPath",
+	'unlogged main fork not in backup');
+
+# Run pg_combinebackup.
+$node->command_ok([ 'pg_combinebackup', '-f', "$tempdir/backup", '-i', "$tempdir/backup1", '-o', "$tempdir/backup2"],
+	'pg_combinebackup runs');
+ok(-f "$tempdir/backup2/PG_VERSION", 'combined backup was created');
+
+# Unlogged relation forks other than init should not be copied
+ok(-f "$tempdir/backup2/${baseUnloggedPath}_init",
+	'unlogged init fork in backup');
+ok( !-f "$tempdir/backup2/$baseUnloggedPath",
+	'unlogged main fork not in backup');


Re: block-level incremental backup

2019-08-30 Thread Jeevan Ladhe
Here are some comments:


+/* The reference XLOG position for the incremental backup. */

+static XLogRecPtr refptr;

As Robert already pointed out, we may want to pass this around as a parameter
instead of using a global variable. Also, it can be renamed to something like
incr_backup_refptr. I see that in your earlier version of the patch this was
named startincrptr, which I think was more meaningful.

-

/*
 * If incremental backup, see whether the filename is a relation filename
 * or not.
 */

Can be reworded to something like:
"If incremental backup, check if it is a relation file and can be sent
partially."

-

+   if (verify_checksum)
+   {
+   ereport(WARNING,
+   (errmsg("cannot verify checksum in file \"%s\",
block "
+   "%d: read buffer size %d and page size %d "
+   "differ",
+   readfilename, blkno, (int) cnt, BLCKSZ)));
+   verify_checksum = false;
+   }

For do_incremental_backup() it does not make sense to show the block number
in the warning, as it is always going to be 0 when we throw this warning.
Further, I think this can be rephrased as:
"cannot verify checksum in file \"%s\", read file size %d is not multiple of
page size %d".

Or maybe we can just say
"cannot verify checksum in file \"%s\"" if a checksum was requested, disable
the checksum verification, and leave the rest to the following message:

+   ereport(WARNING,
+   (errmsg("file size (%d) not in multiple of page size
(%d), sending whole file",
+   (int) cnt, BLCKSZ)));

-

If you agree on the above comment for blkno, then we can shift the declaration
of blkno inside the condition "if (!sendwholefile)" in do_incremental_backup(),
or avoid it altogether, and just pass "i" as blkindex as well as blkno to
verify_page_checksum(). Maybe add a comment explaining why they are the same in
the case of incremental backup.

-

I think we should give the user a hint about where to read the input LSN for
an incremental backup from, in the --help output as well as the documentation.
Something like: "To take an incremental backup, please provide the value of
"--lsn" as the "START WAL LOCATION" of a previously taken full or incremental
backup, as found in its backup_label file."

-

pg_combinebackup:

+static bool made_new_outputdata = false;
+static bool found_existing_outputdata = false;

Both of these are global; I understand that we need them global so that they
are accessible in cleanup_directories_atexit(). But they are also passed to
verify_dir_is_empty_or_create() as parameters, which I think is not needed.
Instead, verify_dir_is_empty_or_create() can directly change the globals.

-

I see that checksum_failure is never set and always remains false. Maybe it
is something that you wanted to set in combine_partial_files() when a
corrupted partial file is detected?

-

I think the logic for verifying the backup chain should be moved out of the
main() function into a separate function.

-

+ /*
+ * Verify the backup chain.  INCREMENTAL BACKUP REFERENCE WAL LOCATION of
+ * the incremental backup must match with the START WAL LOCATION of the
+ * previous backup, until we reach a full backup in which there is no
+ * INCREMENTAL BACKUP REFERENCE WAL LOCATION.
+ */

The current logic assumes the incremental backup directories are to be
provided as input in the serial order the backups were taken. This is a bit
confusing unless clarified in the pg_combinebackup help output or
documentation. I think we should clarify it in both places.

-

I think scan_directory() should rather be renamed to do_combinebackup().

Regards,
Jeevan Ladhe

On Thu, Aug 29, 2019 at 8:11 PM Jeevan Ladhe 
wrote:

> Due to the inherent nature of pg_basebackup, the incremental backup also
> allows taking backup in tar and compressed format. But, pg_combinebackup
> does not understand how to restore this. I think we should either make
> pg_combinebackup support restoration of tar incremental backup or restrict
> taking the incremental backup in tar format until pg_combinebackup
> supports the restoration by making option '--lsn' and '-Ft' exclusive.
>
> It is arguable that one can take the incremental backup in tar format,
> extract
> that manually and then give the resultant directory as input to the
> pg_combinebackup, but I think that kills the purpose of having
> pg_combinebackup utility.
>
> Thoughts?
>
> Regards,
> Jeevan Ladhe
>


Re: [HACKERS] PATCH: Batch/pipelining support for libpq

2019-08-30 Thread Nikhil Sontakke
Hi,

> > This patch has been around for some time now, the last version fails to
> > apply cleanly and in-depth reviews have happened.  I am moving that to
> > the next CF, waiting on its author.
>
> Unfortunately, nothing was changed since then, so there is already some amount
> of unaddressed review feedback. I'll move this patch to "Returned with
> feedback".
>

Craig Ringer mentioned this thread to me recently.

This effort has seen decent reviews from Craig, Andres and Michael
already. So, I thought of refreshing it to work against the latest master
HEAD.

PFA the main patch as well as the test patch (I named the test patch v17
to be consistent with the main patch). The major grouse with the test
patch AFAICS was the use of the non-Windows-compliant timersub() function.
I have now used the INSTR_TIME_SET_CURRENT/INSTR_TIME_SUBTRACT family of
portable macros instead.
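
For reference, the portable pattern looks roughly like this (run_batch() is a
hypothetical placeholder for the code being timed; the macros come from
src/include/portability/instr_time.h):

#include "postgres_fe.h"
#include "portability/instr_time.h"

extern void run_batch(void);	/* hypothetical: the work being measured */

static void
time_batch(void)
{
	instr_time	start,
				duration;

	INSTR_TIME_SET_CURRENT(start);
	run_batch();
	INSTR_TIME_SET_CURRENT(duration);
	INSTR_TIME_SUBTRACT(duration, start);	/* duration -= start */

	printf("batch took %.3f ms\n", INSTR_TIME_GET_MILLISEC(duration));
}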

Please let me know what you think of the above.

Regards,
Nikhil
-- 
Nikhil Sontakke
2ndQuadrant - PostgreSQL Solutions for the Enterprise
https://www.2ndQuadrant.com/


0002-libpq_batch_tests_community_master.v17.patch
Description: Binary data


0001-libpq_batch_support_commmunity_master.v17.patch
Description: Binary data


Re: refactoring - share str2*int64 functions

2019-08-30 Thread Michael Paquier
On Fri, Aug 30, 2019 at 10:06:11AM +0200, Fabien COELHO wrote:
> Patch applies cleanly, compiles, "make check" ok, but the added functions
> are not used (yet).

Thanks.

> I think that factoring out comments is a good move.
> 
> For symmetry and efficiency, ISTM that uint16 mul overflow could use uint32
> and uint32 could use uint64, and the division-based method be dropped in
> these cases.

Yes, the division would be worse than the other.  What do you think
about using the previous module I sent and measuring how long it takes
to evaluate the overflows in some specific cases N times in loops?

> Maybe I would add a comment before each new section telling about the type,
> eg:
> 
>  /*
>   * UINT16
>   */
>  add/sub/mul uint16 functions.

Let's do as you suggest here.

> I would tend to commit working solutions per type rather than by
> installment, so that at least all added functions are actually used
> somewhere, but it does not matter much, really.

I prefer by section, with testing dedicated to each part properly
done, so that we can move to the next parts.

> I was unsure that having int128 implies uint128 availability, so I did not
> use it.

The recent Ryu-floating point work has begun using them (see f2s.c and
d2s.c).

> The checking extension is funny, but ISTM that these checks should be (are
> already?) included in some standard sql test, which will test the macros
> from direct SQL operations:

Sure.  But not for the unsigned part, as there are no equivalent
in-core data types; still, it is possible to trick things with signed
integer arguments.  I found my toy useful to test all
implementations consistently.

>   fabien=# SELECT INT2 '1234512434343';
>   ERROR:  value "1234512434343" is out of range for type smallint
> 
> Well, a quick look at "src/test/regress/sql/int2.sql" suggests that
> there the existing tests should be extended… :-(

We can tackle that separately.  -32768 is perfectly legal for
smallint, and the test is wrong here.
--
Michael


signature.asc
Description: PGP signature


Re: pg_upgrade: Error out on too many command-line arguments

2019-08-30 Thread Tom Lane
Fabien COELHO  writes:
> Could we maintain coverage by adding a TAP test? See 1 liner attached.

Is this issue *really* worth expending test cycles on forevermore?

Test cycles are not free, and I see zero reason to think that a
check of this sort would ever catch any bugs.  Now, if you had a
way to detect that somebody had forgotten the case in some new
program, that would be interesting.

regards, tom lane




Re: pg_upgrade: Error out on too many command-line arguments

2019-08-30 Thread Fabien COELHO


Hello Tom,


Could we maintain coverage by adding a TAP test? See 1 liner attached.


Is this issue *really* worth expending test cycles on forevermore?


With this argument consistently applied, postgres code coverage is 
consistently weak, with 25% of the code never executed, and 15% of 
functions never called. "psql" is abysmal, "libpq" is really weak.



Test cycles are not free, and I see zero reason to think that a
check of this sort would ever catch any bugs.  Now, if you had a
way to detect that somebody had forgotten the case in some new
program, that would be interesting.


It could get broken somehow, and the test would catch it?

That would be the only command which tests this feature?

This is a TAP test, not a test run on basic "make check". The cost is not 
measurable: pgbench's 533 TAP tests run in 5 wallclock seconds, and this 
added test does not change that much.


Now, if you say you are against it, then it is rejected…

--
Fabien.

Re: pg_upgrade: Error out on too many command-line arguments

2019-08-30 Thread Tom Lane
Fabien COELHO  writes:
>> Is this issue *really* worth expending test cycles on forevermore?

> With this argument consistently applied, postgres code coverage is 
> consistently weak, with 25% of the code never executed, and 15% of 
> functions never called. "psql" is abysmal, "libpq" is really weak.

It's all a question of balance.  If you go too far in the other
direction, you end up with test suites that take an hour and a half
to run so nobody ever runs them (case in point, mysql).  I'm all for
improving coverage in meaningful ways --- but cases like this seem
unlikely to be worth ongoing expenditures of testing effort.

regards, tom lane




Re: refactoring - share str2*int64 functions

2019-08-30 Thread Fabien COELHO


Michaël,


For symmetry and efficiency, ISTM that uint16 mul overflow could use uint32
and uint32 could use uint64, and the division-based method be dropped in
these cases.


Yes, the division would be worse than the other.  What do you think
about using the previous module I sent and measure how long it takes
to evaluate the overflows in some specific cases N times in loops?


Given the overheads of the SQL interpreter, I'm unsure about what you 
would measure at the SQL level. You could just write a small standalone C 
program to test perf and accuracy. Maybe this is what you have in mind.
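
For illustration, such a program could be as small as the sketch below;
mul_div() and mul_wide() are hypothetical stand-ins for the two fallback
strategies, not the functions from common/int.h:

#include <stdbool.h>
#include <stdint.h>
#include <stdio.h>
#include <time.h>

static inline bool
mul_div(uint32_t a, uint32_t b, uint32_t *r)
{
	*r = a * b;
	return a != 0 && *r / a != b;		/* division-based overflow check */
}

static inline bool
mul_wide(uint32_t a, uint32_t b, uint32_t *r)
{
	uint64_t	wide = (uint64_t) a * b;	/* widening-based check */

	*r = (uint32_t) wide;
	return wide > UINT32_MAX;
}

int
main(void)
{
	const int	N = 100 * 1000 * 1000;
	uint32_t	r, overflows = 0;
	clock_t		t0 = clock();

	for (int i = 0; i < N; i++)
		overflows += mul_div((uint32_t) i, 48271, &r);
	printf("div:  %.3fs (%u overflows)\n",
		   (double) (clock() - t0) / CLOCKS_PER_SEC, overflows);

	overflows = 0;
	t0 = clock();
	for (int i = 0; i < N; i++)
		overflows += mul_wide((uint32_t) i, 48271, &r);
	printf("wide: %.3fs (%u overflows)\n",
		   (double) (clock() - t0) / CLOCKS_PER_SEC, overflows);
	return 0;
}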



I would tend to commit working solutions per type rather than by
installment, so that at least all added functions are actually used
somewhere, but it does not matter much, really.


I prefer by section, with testing dedicated to each part properly
done so as we can move to the next parts.


This suggests that you will test twice: once when adding the inlined 
functions, once when calling from SQL.



The checking extension is funny, but ISTM that these checks should be (are
already?) included in some standard sql test, which will test the macros
from direct SQL operations:


Sure.  But not for the unsigned part as there are no equivalent
in-core data types,


Yep, it bothered me sometimes, but not enough to propose to add them.


still it is possible to trick things with signed integer arguments.


Is it?


I found my toy useful to check test all implementations consistently.


Ok.


  fabien=# SELECT INT2 '1234512434343';
  ERROR:  value "1234512434343" is out of range for type smallint

Well, a quick look at "src/test/regress/sql/int2.sql" suggests that
there the existing tests should be extended… :-(


We can tackle that separately.  -32768 is perfectly legal for
smallint, and the test is wrong here.


Do you mean:

 sql> SELECT -32768::INT2;
 ERROR:  smallint out of range

This is not a negative constant but the negation of a positive one, which is 
indeed out of range, although the error message could help more.


 sql> SELECT (-32768)::INT2;
 -32768 # ok

 sql> SELECT INT2 '-32768';
 -32768 # ok

--
Fabien.

Re: pg_upgrade: Error out on too many command-line arguments

2019-08-30 Thread Fabien COELHO




Is this issue *really* worth expending test cycles on forevermore?



With this argument consistently applied, postgres code coverage is
consistently weak, with 25% of the code never executed, and 15% of
functions never called. "psql" is abysmal, "libpq" is really weak.


It's all a question of balance.  If you go too far in the other
direction, you end up with test suites that take an hour and a half
to run so nobody ever runs them (case in point, mysql).  I'm all for
improving coverage in meaningful ways --- but cases like this seem
unlikely to be worth ongoing expenditures of testing effort.


Sure.

I think there is room for several classes of tests, important ones always 
run and others run, say, by the buildfarm, and I thought that is what TAP 
tests were for, given they are quite expensive anyway (eg most TAP tests 
create their own postgres instance).


So for me the suggestion was appropriate for a pgbench-specific TAP test.

--
Fabien.




Re: PATCH: logical_work_mem and logical streaming of large in-progress transactions

2019-08-30 Thread Konstantin Knizhnik





FWIW my understanding is that the speedup comes mostly from elimination of
the serialization to a file. That however requires savepoints to handle
aborts of subtransactions - I'm pretty sure it'd be trivial to create a
workload where this will be much slower (with many aborts of large
subtransactions).




I think that instead of defining savepoints it is simpler and more efficient to use

BeginInternalSubTransaction + ReleaseCurrentSubTransaction/RollbackAndReleaseCurrentSubTransaction

as it is done in PL/pgSQL (pl_exec.c).
Not sure if it can pr
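
For reference, the pl_exec.c-style pattern looks roughly like this
(apply_change() is a hypothetical placeholder for the work being wrapped; a
real implementation would need more care than this sketch, e.g. around
snapshots and memory contexts):

static void
apply_change_in_subxact(void)
{
	MemoryContext oldcontext = CurrentMemoryContext;
	ResourceOwner oldowner = CurrentResourceOwner;

	BeginInternalSubTransaction(NULL);
	PG_TRY();
	{
		apply_change();			/* hypothetical work that may error out */
		ReleaseCurrentSubTransaction();
		MemoryContextSwitchTo(oldcontext);
		CurrentResourceOwner = oldowner;
	}
	PG_CATCH();
	{
		/* roll back only the subtransaction; the outer one stays alive */
		RollbackAndReleaseCurrentSubTransaction();
		MemoryContextSwitchTo(oldcontext);
		CurrentResourceOwner = oldowner;
		FlushErrorState();
	}
	PG_END_TRY();
}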

--
Konstantin Knizhnik
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company





Re: base backup client as auxiliary backend process

2019-08-30 Thread Peter Eisentraut
> Attached is a very hackish patch to implement this.  It works like this:
> 
> # (assuming you have a primary already running somewhere)
> initdb -D data2 --replica
> $EDITOR data2/postgresql.conf  # set primary_conninfo
> pg_ctl -D data2 start

Attached is an updated patch for this.  I have changed the initdb option
name per suggestion.  The WAL receiver is now started concurrently with
the base backup.  There is progress reporting (ps display), fsyncing.
Configuration files are not copied anymore.  There is a simple test
suite.  Tablespace support is still missing, but it would be
straightforward.

It's still all to be considered experimental, but it's taking shape and
working pretty well.

-- 
Peter Eisentraut  http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
From aae4640acbb2a1ae4ff5d2e80abce0798799fe73 Mon Sep 17 00:00:00 2001
From: Peter Eisentraut 
Date: Fri, 30 Aug 2019 20:42:51 +0200
Subject: [PATCH v2] Base backup client as auxiliary backend process

Discussion: 
https://www.postgresql.org/message-id/flat/61b8d18d-c922-ac99-b990-a31ba63cd...@2ndquadrant.com
---
 doc/src/sgml/protocol.sgml|  12 +-
 doc/src/sgml/ref/initdb.sgml  |  17 +
 src/backend/access/transam/xlog.c |  84 ++--
 src/backend/bootstrap/bootstrap.c |   9 +
 src/backend/postmaster/pgstat.c   |   6 +
 src/backend/postmaster/postmaster.c   | 114 -
 src/backend/replication/basebackup.c  |  68 +++
 .../libpqwalreceiver/libpqwalreceiver.c   | 419 ++
 src/backend/replication/repl_gram.y   |   9 +-
 src/backend/replication/repl_scanner.l|   1 +
 src/bin/initdb/initdb.c   |  39 +-
 src/include/access/xlog.h |   6 +
 src/include/miscadmin.h   |   2 +
 src/include/pgstat.h  |   1 +
 src/include/replication/basebackup.h  |   2 +
 src/include/replication/walreceiver.h |   4 +
 src/include/utils/guc.h   |   2 +-
 src/test/recovery/t/018_basebackup.pl |  29 ++
 18 files changed, 768 insertions(+), 56 deletions(-)
 create mode 100644 src/test/recovery/t/018_basebackup.pl

diff --git a/doc/src/sgml/protocol.sgml b/doc/src/sgml/protocol.sgml
index b20f1690a7..81f43b5c00 100644
--- a/doc/src/sgml/protocol.sgml
+++ b/doc/src/sgml/protocol.sgml
@@ -2466,7 +2466,7 @@ Streaming Replication Protocol
   
 
   
-BASE_BACKUP [ LABEL 
'label' ] [ PROGRESS ] [ 
FAST ] [ WAL ] [ 
NOWAIT ] [ MAX_RATE 
rate ] [ TABLESPACE_MAP ] [ 
NOVERIFY_CHECKSUMS ]
+BASE_BACKUP [ LABEL 
'label' ] [ PROGRESS ] [ 
FAST ] [ WAL ] [ 
NOWAIT ] [ MAX_RATE 
rate ] [ TABLESPACE_MAP ] [ 
NOVERIFY_CHECKSUMS ] [ EXCLUDE_CONF ]
  BASE_BACKUP
 
 
@@ -2576,6 +2576,16 @@ Streaming Replication Protocol
  
 

+
+   
+EXCLUDE_CONF
+
+ 
+  Do not copy configuration files, that is, files that end in
+  .conf.
+ 
+
+   
   
  
  
diff --git a/doc/src/sgml/ref/initdb.sgml b/doc/src/sgml/ref/initdb.sgml
index da5c8f5307..1261e02d59 100644
--- a/doc/src/sgml/ref/initdb.sgml
+++ b/doc/src/sgml/ref/initdb.sgml
@@ -286,6 +286,23 @@ Options
   
  
 
+ 
+  -r
+  --replica
+  
+   
+Initialize a data directory for a physical replication replica.  The
+data directory will not be initialized with a full database system,
+but will instead only contain a minimal set of files.  A server that
+is started on this data directory will first fetch a base backup and
+then switch to standby mode.  The connection information for the base
+backup has to be configured by setting , and other parameters as desired,
+before the server is started.
+   
+  
+ 
+
  
   -S
   --sync-only
diff --git a/src/backend/access/transam/xlog.c 
b/src/backend/access/transam/xlog.c
index e651a841bb..7ab8ab45f5 100644
--- a/src/backend/access/transam/xlog.c
+++ b/src/backend/access/transam/xlog.c
@@ -905,8 +905,6 @@ static void CheckRecoveryConsistency(void);
 static XLogRecord *ReadCheckpointRecord(XLogReaderState *xlogreader,

XLogRecPtr RecPtr, int whichChkpt, bool report);
 static bool rescanLatestTimeLine(void);
-static void WriteControlFile(void);
-static void ReadControlFile(void);
 static char *str_time(pg_time_t tnow);
 static bool CheckForStandbyTrigger(void);
 
@@ -4481,7 +4479,7 @@ rescanLatestTimeLine(void)
  * ReadControlFile() verifies they are correct.  We could split out the
  * I/O and compatibility-check functions, but there seems no need currently.
  */
-static void
+void
 WriteControlFile(void)
 {
int fd;
@@ -4573,7 +4571,7 @@ WriteControlFile(void)

Re: block-level incremental backup

2019-08-30 Thread Robert Haas
On Thu, Aug 29, 2019 at 10:41 AM Jeevan Ladhe
 wrote:
> Due to the inherent nature of pg_basebackup, the incremental backup also
> allows taking backup in tar and compressed format. But, pg_combinebackup
> does not understand how to restore this. I think we should either make
> pg_combinebackup support restoration of tar incremental backup or restrict
> taking the incremental backup in tar format until pg_combinebackup
> supports the restoration by making option '--lsn' and '-Ft' exclusive.
>
> It is arguable that one can take the incremental backup in tar format, extract
> that manually and then give the resultant directory as input to the
> pg_combinebackup, but I think that kills the purpose of having
> pg_combinebackup utility.

I don't agree. You're right that you would have to untar (and
uncompress) the backup to run pg_combinebackup, but you would also
have to do that to restore a non-incremental backup, so it doesn't
seem much different.  It's true that for an incremental backup you
might need to untar and uncompress multiple prior backups rather than
just one, but that's just the nature of an incremental backup.  And,
on a practical level, if you want compression, which is pretty likely
if you're thinking about incremental backups, the way to get that is
to use tar format with -z or -Z.

It might be interesting to teach pg_combinebackup to be able to read
tar-format backups, but I think that there are several variants of the
tar format, and I suspect it would need to read them all.  If someone
un-tars and re-tars a backup with a different tar tool, we don't want
it to become unreadable.  So we'd either have to write our own
de-tarring library or add an external dependency on one.  I don't
think it's worth doing that at this point; I definitely don't think it
needs to be part of the first patch.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




>  - st_progress_command_target
>
>3. Use Hash or List to store multiple values for the two
>   variables in pgstat.c.
> 
>
>
> I tried 1. and it showed index_rebuild_count, but it also showed the
> "initializing" phase again and other columns were empty. So, it is
> not suitable for fixing the problem. :(
> I'm going to try 2. and 3., but, unfortunately, I can't get enough
> time to do that after PGConf.Asia 2019.
> If we selected 3., it affects following these progress reporting
> features: VACUUM, CLUSTER, CREATE_INDEX and ANALYZE. But it's okay,
> I suppose. Any comments welcome! :)

I looked at this open item. I prefer #3, but I think it would be enough
to have a small stack using a fixed-length array to track nested
progress information and the currently executing command (maybe 4 or 8
would be enough as the maximum nesting level for now?). That way, we
don't need to change the interfaces. AFAICS there is no case where we
call pgstat_progress_start_command without a matching
pgstat_progress_end_command, or vice versa (although
pgstat_progress_update_param is sometimes called without a start). So I
think that having a small stack for tracking multiple reports would work.
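
To illustrate the shape of that idea, here is a rough standalone sketch of a
fixed-length progress stack (the names, the depth of 4 and the 20-parameter
slot are illustrative only; the attached patch is the actual proposal):

#include <stdbool.h>
#include <string.h>

#define PROGRESS_STACK_DEPTH	4	/* illustrative maximum nesting level */

typedef struct ProgressEntry
{
	int			command;		/* e.g. a ProgressCommandType value */
	unsigned int target;		/* e.g. the OID of the relation processed */
	long		params[20];		/* counters reported for this command */
} ProgressEntry;

typedef struct ProgressStack
{
	ProgressEntry entries[PROGRESS_STACK_DEPTH];
	int			depth;			/* number of commands currently in progress */
} ProgressStack;

/* Push a new command; the enclosing one is preserved underneath. */
static bool
progress_start(ProgressStack *st, int command, unsigned int target)
{
	if (st->depth >= PROGRESS_STACK_DEPTH)
		return false;			/* nesting too deep */
	st->entries[st->depth].command = command;
	st->entries[st->depth].target = target;
	memset(st->entries[st->depth].params, 0,
		   sizeof(st->entries[st->depth].params));
	st->depth++;
	return true;
}

/* Pop the innermost command (e.g. REINDEX), restoring the outer one. */
static void
progress_end(ProgressStack *st)
{
	if (st->depth > 0)
		st->depth--;
}

With something like that, the REINDEX start/end pair nested inside CLUSTER no
longer clobbers the CLUSTER entry, so the later INDEX_REBUILD_COUNT update
still has a live CLUSTER slot to write into.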

Attached is the draft patch that fixes this issue. Please review it.

Regards,

--
Masahiko Sawada
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center
diff --git a/src/backend/postmaster/pgstat.c b/src/backend/postmaster/pgstat.c
index d362e7f7d7..99e4844e6c 100644
--- a/src/backend/postmaster/pgstat.c
+++ b/src/backend/postmaster/pgstat.c
@@ -3016,8 +3016,10 @@ pgstat_bestart(void)
 #endif
 
 	lbeentry.st_state = STATE_UNDEFINED;
-	lbeentry.st_progress_command = PROGRESS_COMMAND_INVALID;
-	lbeentry.st_progress_command_target = InvalidOid;
+	MemSet(&(lbeentry.st_progress_cmds), 0,
+		   sizeof(PgBackendProgressInfo) * PGSTAT_MAX_PROGRESS_INFO);
+	/* Set invalid command index */
+	lbeentry.st_current_cmd = -1;
 
 	/*
 	 * we don't zero st_progress_param here to save cycles; nobody should
@@ -3203,10 +3205,22 @@ pgstat_progress_start_command(ProgressCommandType cmdtype, Oid relid)
 	if (!beentry || !pgstat_track_activities)
 		return;
 
+	Assert(beentry->st_current_cmd >= -1);
+
+	/* The given command is already started */
+	if (beentry->st_progress_cmds[beentry->st_current_cmd].command == cmdtype)
+		return;
+
+	/* The progress information queue is full */
+	if (beentry->st_current_cmd >= PGSTAT_MAX_PROGRESS_INFO - 1)
+		elog(ERROR, "progress information per backends is full");
+
 	PGSTAT_BEGIN_WRITE_ACTIVITY(beentry);
-	beentry->st_progress_command = cmdtype;
-	beentry->st_progress_command_target = relid;
-	MemSet(&beentry->st_progress_param, 0, sizeof(beentry->st_progress_param));
+	beentry->st_current_cmd++;
+	MemSet(&(beentry->st_pr

Re: Yet another fast GiST build

2019-08-30 Thread Andrey Borodin


> On 30 Aug 2019, at 3:47, Alexander Korotkov
> wrote:
> 
> 1) Binary search in non-leaf pages instead of probing each key is much faster.

That's a neat idea, but key union breaks ordering, even for z-order:
for two sets of tuples X and Y, having Xi < Yj for all i and j
does not guarantee union(X) < union(Y).

For example consider this z-ordered keyspace (picture attached)


union(5, 9) is z-order-smaller than union(4,4)

I'm not even sure we can use a sorted search for choosing the subtree for insertion.
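
For readers who have not looked at the patch: the z-order key being compared
here is a Morton code obtained by interleaving the bits of the coordinates,
roughly as in the sketch below (the 16-bit coordinate width is illustrative).
The union of two such keys is a bounding box, and, as the example above shows,
the code of the union need not respect the ordering of the codes of the
individual points.

#include <stdint.h>

/* Spread the low 16 bits of x so there is a zero bit between each of them. */
static uint32_t
part1by1(uint32_t x)
{
	x &= 0x0000FFFF;
	x = (x | (x << 8)) & 0x00FF00FF;
	x = (x | (x << 4)) & 0x0F0F0F0F;
	x = (x | (x << 2)) & 0x33333333;
	x = (x | (x << 1)) & 0x55555555;
	return x;
}

/* 2D z-order (Morton) key: interleave the bits of x and y. */
static uint32_t
zorder_key(uint16_t x, uint16_t y)
{
	return (part1by1(y) << 1) | part1by1(x);
}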

What do you think: should I supply the GiST build patch with docs and tests and add 
it to the CF, or do we need more design discussion first?


Best regards, Andrey Borodin.

Re: WIP: Generic functions for Node types using generated metadata

2019-08-30 Thread Fabien COELHO


Hello Andres,

Just my 0.02 €:


There's been a lot of complaints over the years about how annoying it is
to keep the out/read/copy/equalfuncs.c functions in sync with the actual
underlying structs.

There've been various calls for automating their generation, but no
actual patches that I am aware of.


I started something a while back, AFAICR after spending stupid time 
looking for a stupid missing field copy or whatever. I wrote a (simple) 
perl script deriving all (most) node utility functions from the header 
files.


I gave up as the idea did not gather much momentum from committers, so I 
assumed the effort would be rejected in the end. AFAICR the spirit of the 
feedback was something like "node definitions do not change often, we can 
manage them by hand".



There also recently has been discussion about generating more efficient
memory layout for node trees that we know are read only (e.g. plan trees
inside the plancache), and about copying such trees more efficiently
(e.g. by having one big allocation, and then just adjusting pointers).


If pointers are stored relative to the start of the allocation, they are 
really just offsets (indexes) that do not need much adjusting.
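
A minimal sketch of that idea (purely illustrative, not an existing Postgres
structure): child links are byte offsets from the start of the single
allocation, so a flat copy of the block needs no pointer fixup at all.

#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* A node whose "pointers" are byte offsets from the start of the block. */
typedef struct FlatNode
{
	int			tag;
	uint32_t	left_off;		/* 0 means "no child" */
	uint32_t	right_off;
} FlatNode;

/* Resolve an offset against the block's base address. */
static FlatNode *
flat_child(char *base, uint32_t off)
{
	return off ? (FlatNode *) (base + off) : NULL;
}

/*
 * Copying the whole tree is then a single memcpy of the block; the offsets
 * remain valid because they are relative, not absolute.
 */
static char *
flat_copy(const char *base, size_t size)
{
	char	   *copy = malloc(size);

	if (copy != NULL)
		memcpy(copy, base, size);
	return copy;
}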


One way to approach this problem would be to parse the type 
definitions, and directly generate code for the various functions. But 
that does mean that such a code-generator needs to be expanded for each 
such function.


No big deal for the effort I made. The issue was more dealing with 
exceptions (eg "we do not serialize this field because it is not used for 
some reason") and understanding some implicit assumptions in the struct 
declarations.



An alternative approach is to have a parser of the node definitions that
doesn't generate code directly, but instead generates metadata. And then
use that metadata to write node aware functions.  This seems more
promising to me.


Hmmm. The approach we had in an (old) research project was to write the 
metadata, and derive all struct & utility functions from it. It is 
simpler this way because you save parsing some C, and it can be made 
language agnostic (i.e. serializing the data structure from one language and 
reading its value from another).
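
As a toy illustration of that approach, the metadata could be as simple as a
per-field table of {name, kind, offset}, from which one generic routine
implements equality (and likewise copy/out/read). The DemoNode type and its
fields below are made up, not actual Postgres nodes:

#include <stdbool.h>
#include <stddef.h>
#include <string.h>

typedef enum FieldKind {FIELD_INT, FIELD_BOOL, FIELD_STRING} FieldKind;

/* One (generated) descriptor per struct field. */
typedef struct FieldInfo
{
	const char *name;
	FieldKind	kind;
	size_t		offset;
} FieldInfo;

/* A made-up node type and its would-be generated metadata. */
typedef struct DemoNode
{
	int			count;
	bool		flag;
	char	   *label;			/* assumed non-NULL for brevity */
} DemoNode;

static const FieldInfo demo_fields[] = {
	{"count", FIELD_INT, offsetof(DemoNode, count)},
	{"flag", FIELD_BOOL, offsetof(DemoNode, flag)},
	{"label", FIELD_STRING, offsetof(DemoNode, label)},
};

/* A single generic equality routine driven entirely by the metadata. */
static bool
demo_equal(const DemoNode *a, const DemoNode *b)
{
	for (size_t i = 0; i < sizeof(demo_fields) / sizeof(demo_fields[0]); i++)
	{
		const char *pa = (const char *) a + demo_fields[i].offset;
		const char *pb = (const char *) b + demo_fields[i].offset;

		switch (demo_fields[i].kind)
		{
			case FIELD_INT:
				if (*(const int *) pa != *(const int *) pb)
					return false;
				break;
			case FIELD_BOOL:
				if (*(const bool *) pa != *(const bool *) pb)
					return false;
				break;
			case FIELD_STRING:
				if (strcmp(*(char *const *) pa, *(char *const *) pb) != 0)
					return false;
				break;
		}
	}
	return true;
}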



I'm fairly sure this metadata can also be used to write the other
currently existing node functions.


Beware of strange exceptions…


With regards to using libclang for the parsing: I chose that because it
seemed the easiest to experiment with, compared to annotating all the
structs with enough metadata to be able to easily parse them from a perl
script.


I did not find this an issue when I tried, because the annotation needed 
is basically the type name of the field.



The node definitions are after all distributed over quite a few headers.


Yep.


I think it might even be the correct way forward, over inventing our own
mini-languages and writing ad-hoc parsers for those. It sure is easier
to understand plain C code, compared to having to understand various
embedded mini-languages consisting of macros.


Dunno.

The obvious drawback is that it'd require more people to install 
libclang - a significant imposition.


Indeed. A perl-only dependency would be much simpler than relying on a 
particular library from a particular compiler to build postgres, 
possibly while compiling with an unrelated compiler.



Alternatively we could annotate the code enough to be able to write our
own parser, or use some other C parser.


If you can dictate some conventions, eg one line per field, simple perl 
regexps would work well I think; you would not need a parser per se.



I don't really want to invest significantly more time into this without
first debating the general idea.


That is what I did, and I quit quickly :-)

On the general idea, I'm 100% convinced that stupid utility functions 
should be either generic or generated, not maintained by hand.


--
Fabien.

Re: basebackup.c's sendFile() ignores read errors

2019-08-30 Thread Jeevan Ladhe
>
> Fixed both comments in the attached patch.
>

Thanks, the patch looks good to me.

Regards,
Jeevan Ladhe


Re: WIP: Generic functions for Node types using generated metadata

2019-08-30 Thread Fabien COELHO


Hallo Andres,


There've been various calls for automating their generation, but no
actual patches that I am aware of.


I started something a while back


I have found this thread:

https://www.postgresql.org/message-id/flat/E1cq93r-0004ey-Mp%40gemulon.postgresql.org

It seems that comments from committers discouraged me from going on… :-) For 
instance Robert wanted a "checker", which is basically harder than a 
generator because you have to parse both sides and then compare.


Basically there was no consensus at the time (March 2017), so I gave up. 
It seems that I even destroyed the branch I was working on, so the no 
doubt magnificent perl script I wrote is also lost.


--
Fabien.

Re: block-level incremental backup

2019-08-30 Thread Rajkumar Raghuwanshi
Hi,

I am doing some testing on the pg_basebackup and pg_combinebackup patches. I
have also tried to create a TAP test for pg_combinebackup, using the
pg_basebackup TAP cases as a reference.
Attaching a first draft of the test patch.

I have done some testing with compression options; both -z and -Z <level>
work with incremental backup.

A minor comment: the pg_combinebackup help says that a maximum of 10
incremental backups can be given with the -i option, but I found that at most
9 incremental backup directories can be given at a time.

Thanks & Regards,
Rajkumar Raghuwanshi
QMG, EnterpriseDB Corporation


On Thu, Aug 29, 2019 at 10:06 PM Jeevan Ladhe 
wrote:

> Due to the inherent nature of pg_basebackup, the incremental backup also
> allows taking backup in tar and compressed format. But, pg_combinebackup
> does not understand how to restore this. I think we should either make
> pg_combinebackup support restoration of tar incremental backup or restrict
> taking the incremental backup in tar format until pg_combinebackup
> supports the restoration by making option '--lsn' and '-Ft' exclusive.
>
> It is arguable that one can take the incremental backup in tar format,
> extract
> that manually and then give the resultant directory as input to the
> pg_combinebackup, but I think that kills the purpose of having
> pg_combinebackup utility.
>
> Thoughts?
>
> Regards,
> Jeevan Ladhe
>
diff --git a/src/bin/pg_combinebackup/t/pg_combinebackup.pl b/src/bin/pg_combinebackup/t/pg_combinebackup.pl
new file mode 100644
index 000..e0f834a
--- /dev/null
+++ b/src/bin/pg_combinebackup/t/pg_combinebackup.pl
@@ -0,0 +1,79 @@
+use strict;
+use warnings;
+use Cwd;
+use Config;
+use File::Basename qw(basename dirname);
+use File::Path qw(rmtree);
+use PostgresNode;
+use TestLib;
+use Test::More tests => 23;
+
+program_help_ok('pg_combinebackup');
+program_version_ok('pg_combinebackup');
+program_options_handling_ok('pg_combinebackup');
+
+my $tempdir = TestLib::tempdir;
+
+my $node = get_new_node('main');
+
+# Initialize node
+$node->init();
+my $pgdata = $node->data_dir;
+
+# Change wal related setting for pg_basebackup to run
+open my $conf, '>>', "$pgdata/postgresql.conf";
+print $conf "max_replication_slots = 10\n";
+print $conf "max_wal_senders = 10\n";
+print $conf "wal_level = replica\n";
+close $conf;
+$node->start;
+
+$node->command_fails(['pg_combinebackup'],
+	'pg_combinebackup needs full and incremental directory specified');
+
+# Create an unlogged table to test that forks other than init are not copied.
+$node->safe_psql('postgres', 'CREATE UNLOGGED TABLE base_unlogged (id int)');
+
+my $baseUnloggedPath = $node->safe_psql('postgres',
+	q{select pg_relation_filepath('base_unlogged')});
+
+# Make sure main and init forks exist
+ok(-f "$pgdata/${baseUnloggedPath}_init", 'unlogged init fork in base');
+ok(-f "$pgdata/$baseUnloggedPath",'unlogged main fork in base');
+
+# Run full base backup.
+$node->command_ok([ 'pg_basebackup', '-D', "$tempdir/backup"],
+	'pg_basebackup runs for full backup');
+ok(-f "$tempdir/backup/PG_VERSION", 'full backup was created');
+
+# Unlogged relation forks other than init should not be copied
+ok(-f "$tempdir/backup/${baseUnloggedPath}_init",
+	'unlogged init fork in backup');
+ok( !-f "$tempdir/backup/$baseUnloggedPath",
+	'unlogged main fork not in backup');
+
+# Get LSN of last backup to use for incremental backup
+my @extract_lsn = split (" ", scalar TestLib::slurp_file("$tempdir/backup/backup_label"));
+my $LSN = $extract_lsn[3];
+
+# Run incr base backup.
+$node->command_ok([ 'pg_basebackup', '-D', "$tempdir/backup1",'--lsn', "$LSN"],
+	'pg_basebackup runs for incremental backup');
+ok(-f "$tempdir/backup1/PG_VERSION", 'incremental backup was created');
+
+# Unlogged relation forks other than init should not be copied
+ok(-f "$tempdir/backup1/${baseUnloggedPath}_init",
+	'unlogged init fork in backup');
+ok( !-f "$tempdir/backup1/$baseUnloggedPath",
+	'unlogged main fork not in backup');
+
+# Run pg_combinebackup.
+$node->command_ok([ 'pg_combinebackup', '-f', "$tempdir/backup", '-i', "$tempdir/backup1", '-o', "$tempdir/backup2"],
+	'pg_combinebackup runs');
+ok(-f "$tempdir/backup2/PG_VERSION", 'combined backup was created');
+
+# Unlogged relation forks other than init should not be copied
+ok(-f "$tempdir/backup2/${baseUnloggedPath}_init",
+	'unlogged init fork in backup');
+ok( !-f "$tempdir/backup2/$baseUnloggedPath",
+	'unlogged main fork not in backup');


Re: block-level incremental backup

2019-08-30 Thread Jeevan Ladhe
Here are some comments:


+/* The reference XLOG position for the incremental backup. */

+static XLogRecPtr refptr;

As Robert already pointed out, we may want to pass this around as a parameter
instead of using a global variable. Also, it can be renamed to something like
incr_backup_refptr. I see that in your earlier version of the patch this was
named startincrptr, which I think was more meaningful.

-

/*

 * If incremental backup, see whether the filename is a relation
filename
 * or not.

 */

Can be reworded something like:
"If incremental backup, check if it is relation file and can be sent
partially."

-

+   if (verify_checksum)
+   {
+   ereport(WARNING,
+   (errmsg("cannot verify checksum in file \"%s\",
block "
+   "%d: read buffer size %d and page size %d "
+   "differ",
+   readfilename, blkno, (int) cnt, BLCKSZ)));
+   verify_checksum = false;
+   }

For do_incremental_backup() it does not make sense to show the block number
in the warning, as it is always going to be 0 when we throw this warning.
Further, I think this can be rephrased as:
"cannot verify checksum in file \"%s\", read file size %d is not a multiple of
page size %d".

Or maybe, if a checksum was requested, we can just say
"cannot verify checksum in file \"%s\"", disable checksum verification, and
leave the rest to the following message:

+   ereport(WARNING,
+   (errmsg("file size (%d) not in multiple of page size
(%d), sending whole file",
+   (int) cnt, BLCKSZ)));

-

If you agree with the above comment about blkno, then we can move the
declaration of blkno inside the condition "if (!sendwholefile)" in
do_incremental_backup(), or avoid it altogether and just pass "i" as both
blkindex and blkno to verify_page_checksum(). Maybe add a comment on why they
are the same in the case of an incremental backup.

-

I think we should give the user a hint, in the --help output as well as in the
documentation, about where to read the input LSN for an incremental backup
from. Something like: "To take an incremental backup, please provide the value
of "--lsn" as the "START WAL LOCATION" from the backup_label file of a
previously taken full or incremental backup."

-

pg_combinebackup:

+static bool made_new_outputdata = false;
+static bool found_existing_outputdata = false;

Both of these are global, I understand that we need them global so that
they are
accessible in cleanup_directories_atexit(). But they are passed to
verify_dir_is_empty_or_create() as parameters, which I think is not needed.
Instead verify_dir_is_empty_or_create() can directly change the globals.

-

I see that checksum_failure is never set and always remains false. Maybe
it is something that you wanted to set in combine_partial_files() when a
corrupted partial file is detected?

-

I think the logic for verifying the backup chain should be moved out of the
main() function into a separate function.

-

+ /*
+ * Verify the backup chain.  INCREMENTAL BACKUP REFERENCE WAL LOCATION of
+ * the incremental backup must match with the START WAL LOCATION of the
+ * previous backup, until we reach a full backup in which there is no
+ * INCREMENTAL BACKUP REFERENCE WAL LOCATION.
+ */

The current logic assumes the incremental backup directories are to be
provided as input in the order the backups were taken. This is a bit confusing
unless it is clarified in the pg_combinebackup help output or documentation. I
think we should clarify it in both places.

-

I think scan_directory() should rather be renamed to do_combinebackup().

Regards,
Jeevan Ladhe

On Thu, Aug 29, 2019 at 8:11 PM Jeevan Ladhe 
wrote:

> Due to the inherent nature of pg_basebackup, the incremental backup also
> allows taking backup in tar and compressed format. But, pg_combinebackup
> does not understand how to restore this. I think we should either make
> pg_combinebackup support restoration of tar incremental backup or restrict
> taking the incremental backup in tar format until pg_combinebackup
> supports the restoration by making option '--lsn' and '-Ft' exclusive.
>
> It is arguable that one can take the incremental backup in tar format,
> extract
> that manually and then give the resultant directory as input to the
> pg_combinebackup, but I think that kills the purpose of having
> pg_combinebackup utility.
>
> Thoughts?
>
> Regards,
> Jeevan Ladhe
>


Re: [HACKERS] PATCH: Batch/pipelining support for libpq

2019-08-30 Thread Nikhil Sontakke
Hi,

> > This patch has been around for some time now, the last version fails to
> > apply cleanly and in-depth reviews have happened.  I am moving that to
> > the next CF, waiting on its author.
>
> Unfortunately, nothing was changed since then, so there is already some amount
> of unaddressed review feedback. I'll move this patch to "Returned with
> feedback".
>

Craig Ringer mentioned this thread to me recently.

This effort has already seen decent reviews from Craig, Andres and Michael.
So, I thought of refreshing it to work against the latest master HEAD.

PFA the main patch as well as the test patch (I named the test patch v17
to be consistent with the main patch). The major grouse with the test
patch AFAICS was the use of the non-Windows-compliant timersub() function.
I have now used the INSTR_TIME_SET_CURRENT/INSTR_TIME_SUBTRACT family of
portable macros instead.

Please let me know what you think of the above.

Regards,
Nikhil
-- 
Nikhil Sontakke
2ndQuadrant - PostgreSQL Solutions for the Enterprise
https://www.2ndQuadrant.com/


0002-libpq_batch_tests_community_master.v17.patch
Description: Binary data


0001-libpq_batch_support_commmunity_master.v17.patch
Description: Binary data


Re: refactoring - share str2*int64 functions

2019-08-30 Thread Michael Paquier
On Fri, Aug 30, 2019 at 10:06:11AM +0200, Fabien COELHO wrote:
> Patch applies cleanly, compiles, "make check" ok, but the added functions
> are not used (yet).

Thanks.

> I think that factoring out comments is a good move.
> 
> For symmetry and efficiency, ISTM that uint16 mul overflow could use uint32
> and uint32 could use uint64, and the division-based method be dropped in
> these cases.

Yes, the division would be worse than the other approach.  What do you think
about using the previous module I sent and measuring how long it takes
to evaluate the overflows in some specific cases N times in a loop?

> Maybe I would add a comment before each new section telling about the type,
> eg:
> 
>  /*
>   * UINT16
>   */
>  add/sub/mul uint16 functions.

Let's do as you suggest here.

> I would tend to commit working solutions per type rather than by
> installment, so that at least all added functions are actually used
> somewhere, but it does not matter much, really.

I prefer going by section, with the testing dedicated to each part properly
done, so that we can move on to the next parts.

> I was unsure that having int128 implies uint128 availability, so I did not
> use it.

The recent Ryu-floating point work has begun using them (see f2s.c and
d2s.c).

> The checking extension is funny, but ISTM that these checks should be (are
> already?) included in some standard sql test, which will test the macros
> from direct SQL operations:

Sure.  But not for the unsigned part, as there are no equivalent
in-core data types; still it is possible to trick things with signed
integer arguments.  I found my toy useful to test all
implementations consistently.

>   fabien=# SELECT INT2 '1234512434343';
>   ERROR:  value "1234512434343" is out of range for type smallint
> 
> Well, a quick look at "src/test/regress/sql/int2.sql" suggests that
> there the existing tests should be extended… :-(

We can tackle that separately.  -32768 is perfectly legal for
smallint, and the test is wrong here.
--
Michael


signature.asc
Description: PGP signature


Re: pg_upgrade: Error out on too many command-line arguments

2019-08-30 Thread Tom Lane
Fabien COELHO  writes:
> Could we maintain coverage by adding a TAP test? See 1 liner attached.

Is this issue *really* worth expending test cycles on forevermore?

Test cycles are not free, and I see zero reason to think that a
check of this sort would ever catch any bugs.  Now, if you had a
way to detect that somebody had forgotten the case in some new
program, that would be interesting.

regards, tom lane




Re: pg_upgrade: Error out on too many command-line arguments

2019-08-30 Thread Fabien COELHO


Hello Tom,


Could we maintain coverage by adding a TAP test? See 1 liner attached.


Is this issue *really* worth expending test cycles on forevermore?


With this argument consistently applied, postgres code coverage is 
consistently weak, with 25% of the code never executed, and 15% of 
functions never called. "psql" is abysmal, "libpq" is really weak.



Test cycles are not free, and I see zero reason to think that a
check of this sort would ever catch any bugs.  Now, if you had a
way to detect that somebody had forgotten the case in some new
program, that would be interesting.


It could get broken somehow, and the test would catch it?

That would be the only command which tests this feature?

This is a TAP test, not a test run on a basic "make check". The cost is not 
measurable: pgbench's 533 TAP tests run in 5 wallclock seconds, and this 
added test does not change that measurably.


Now, if you say you are against it, then it is rejected…

--
Fabien.

Re: pg_upgrade: Error out on too many command-line arguments

2019-08-30 Thread Tom Lane
Fabien COELHO  writes:
>> Is this issue *really* worth expending test cycles on forevermore?

> With this argument consistently applied, postgres code coverage is 
> consistently weak, with 25% of the code never executed, and 15% of 
> functions never called. "psql" is abysmal, "libpq" is really weak.

It's all a question of balance.  If you go too far in the other
direction, you end up with test suites that take an hour and a half
to run so nobody ever runs them (case in point, mysql).  I'm all for
improving coverage in meaningful ways --- but cases like this seem
unlikely to be worth ongoing expenditures of testing effort.

regards, tom lane




Re: refactoring - share str2*int64 functions

2019-08-30 Thread Fabien COELHO


Michaël,


For symmetry and efficiency, ISTM that uint16 mul overflow could use uint32
and uint32 could use uint64, and the division-based method be dropped in
these cases.


Yes, the division would be worse than the other.  What do you think
about using the previous module I sent and measure how long it takes
to evaluate the overflows in some specific cases N times in loops?


Given the overheads of the SQL interpreter, I'm unsure about what you 
would measure at the SQL level. You could just write a small standalone C 
program to test perf and accuracy. Maybe this is what you have in mind.
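
For instance, a crude standalone benchmark along these lines (gcc/clang only
because of __builtin_mul_overflow; the loop count and constants are arbitrary):

#include <stdbool.h>
#include <stdint.h>
#include <stdio.h>
#include <time.h>

/* Division-based uint32 multiplication overflow check. */
static inline bool
mul_u32_div(uint32_t a, uint32_t b, uint32_t *r)
{
	if (b != 0 && a > UINT32_MAX / b)
		return true;
	*r = a * b;
	return false;
}

/* Compiler-builtin check (gcc/clang). */
static inline bool
mul_u32_builtin(uint32_t a, uint32_t b, uint32_t *r)
{
	return __builtin_mul_overflow(a, b, r);
}

int
main(void)
{
	const uint32_t loops = 100 * 1000 * 1000;
	uint32_t	sink = 0;
	uint32_t	r;
	clock_t		t0;

	t0 = clock();
	for (uint32_t i = 0; i < loops; i++)
		if (!mul_u32_div(i, i ^ 2654435761u, &r))
			sink ^= r;
	printf("division: %.3fs (sink=%u)\n",
		   (double) (clock() - t0) / CLOCKS_PER_SEC, sink);

	t0 = clock();
	for (uint32_t i = 0; i < loops; i++)
		if (!mul_u32_builtin(i, i ^ 2654435761u, &r))
			sink ^= r;
	printf("builtin:  %.3fs (sink=%u)\n",
		   (double) (clock() - t0) / CLOCKS_PER_SEC, sink);

	return 0;
}

The "sink" accumulator is only there to keep the compiler from optimizing the
loops away.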



I would tend to commit working solutions per type rather than by
installment, so that at least all added functions are actually used
somewhere, but it does not matter much, really.


I prefer by section, with testing dedicated to each part properly
done so as we can move to the next parts.


This suggests that you will test twice: once when adding the inlined 
functions, once when calling from SQL.



The checking extension is funny, but ISTM that these checks should be (are
already?) included in some standard sql test, which will test the macros
from direct SQL operations:


Sure.  But not for the unsigned part as there are no equivalent
in-core data types,


Yep, it bothered me sometimes, but not enough to propose to add them.


still it is possible to trick things with signed integer arguments.


Is it?


I found my toy useful to check test all implementations consistently.


Ok.


  fabien=# SELECT INT2 '1234512434343';
  ERROR:  value "1234512434343" is out of range for type smallint

Well, a quick look at "src/test/regress/sql/int2.sql" suggests that
there the existing tests should be extended… :-(


We can tackle that separately.  -32768 is perfectly legal for
smallint, and the test is wrong here.


Do you mean:

 sql> SELECT -32768::INT2;
 ERROR:  smallint out of range

This is not a negative constant but the negation of a positive one (the cast 
binds more tightly than the unary minus), so 32768 is indeed out of range for 
int2, although the error message could be more helpful.


 sql> SELECT (-32768)::INT2;
 -32768 # ok

 sql> SELECT INT2 '-32768';
 -32768 # ok

--
Fabien.

Re: pg_upgrade: Error out on too many command-line arguments

2019-08-30 Thread Fabien COELHO




Is this issue *really* worth expending test cycles on forevermore?



With this argument consistently applied, postgres code coverage is
consistently weak, with 25% of the code never executed, and 15% of
functions never called. "psql" is abysmal, "libpq" is really weak.


It's all a question of balance.  If you go too far in the other
direction, you end up with test suites that take an hour and a half
to run so nobody ever runs them (case in point, mysql).  I'm all for
improving coverage in meaningful ways --- but cases like this seem
unlikely to be worth ongoing expenditures of testing effort.


Sure.

I think there is room for several classes of tests: important ones always 
run, and others run, say, by the buildfarm. I thought that is what TAP tests 
were for, given they are quite expensive anyway (eg most TAP tests create 
their own postgres instance).


So for me the suggestion was appropriate for a pgbench-specific TAP test.

--
Fabien.




Re: PATCH: logical_work_mem and logical streaming of large in-progress transactions

2019-08-30 Thread Konstantin Knizhnik





FWIW my understanding is that the speedup comes mostly from elimination of
the serialization to a file. That however requires savepoints to handle
aborts of subtransactions - I'm pretty sure it'd be trivial to create a
workload where this would be much slower (with many aborts of large
subtransactions).




I think that instead of defining savepoints it is simpler and more 
efficient to use


BeginInternalSubTransaction + 
ReleaseCurrentSubTransaction/RollbackAndReleaseCurrentSubTransaction


as it is done in PL/pgSQL (pl_exec.c).
Not sure if it can pr
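
For reference, a rough sketch of the pl_exec.c pattern being alluded to
(backend code fragment, assuming the usual postgres.h/access/xact.h includes;
resource-owner bookkeeping and error-data handling trimmed for brevity):

MemoryContext oldcontext = CurrentMemoryContext;

BeginInternalSubTransaction(NULL);
/* BeginInternalSubTransaction switches contexts, so switch back to ours. */
MemoryContextSwitchTo(oldcontext);

PG_TRY();
{
	/* ... apply the changes of the streamed subtransaction ... */

	ReleaseCurrentSubTransaction();
	MemoryContextSwitchTo(oldcontext);
}
PG_CATCH();
{
	/* Abort only the subtransaction; the surrounding transaction survives. */
	RollbackAndReleaseCurrentSubTransaction();
	MemoryContextSwitchTo(oldcontext);
	FlushErrorState();		/* or CopyErrorData() first if the error is needed */
}
PG_END_TRY();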

--
Konstantin Knizhnik
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company





Re: base backup client as auxiliary backend process

2019-08-30 Thread Peter Eisentraut
> Attached is a very hackish patch to implement this.  It works like this:
> 
> # (assuming you have a primary already running somewhere)
> initdb -D data2 --replica
> $EDITOR data2/postgresql.conf  # set primary_conninfo
> pg_ctl -D data2 start

Attached is an updated patch for this.  I have changed the initdb option
name per suggestion.  The WAL receiver is now started concurrently with
the base backup.  There is progress reporting (ps display), fsyncing.
Configuration files are not copied anymore.  There is a simple test
suite.  Tablespace support is still missing, but it would be
straightforward.

It's still all to be considered experimental, but it's taking shape and
working pretty well.

-- 
Peter Eisentraut  http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
From aae4640acbb2a1ae4ff5d2e80abce0798799fe73 Mon Sep 17 00:00:00 2001
From: Peter Eisentraut 
Date: Fri, 30 Aug 2019 20:42:51 +0200
Subject: [PATCH v2] Base backup client as auxiliary backend process

Discussion: 
https://www.postgresql.org/message-id/flat/61b8d18d-c922-ac99-b990-a31ba63cd...@2ndquadrant.com
---
 doc/src/sgml/protocol.sgml|  12 +-
 doc/src/sgml/ref/initdb.sgml  |  17 +
 src/backend/access/transam/xlog.c |  84 ++--
 src/backend/bootstrap/bootstrap.c |   9 +
 src/backend/postmaster/pgstat.c   |   6 +
 src/backend/postmaster/postmaster.c   | 114 -
 src/backend/replication/basebackup.c  |  68 +++
 .../libpqwalreceiver/libpqwalreceiver.c   | 419 ++
 src/backend/replication/repl_gram.y   |   9 +-
 src/backend/replication/repl_scanner.l|   1 +
 src/bin/initdb/initdb.c   |  39 +-
 src/include/access/xlog.h |   6 +
 src/include/miscadmin.h   |   2 +
 src/include/pgstat.h  |   1 +
 src/include/replication/basebackup.h  |   2 +
 src/include/replication/walreceiver.h |   4 +
 src/include/utils/guc.h   |   2 +-
 src/test/recovery/t/018_basebackup.pl |  29 ++
 18 files changed, 768 insertions(+), 56 deletions(-)
 create mode 100644 src/test/recovery/t/018_basebackup.pl

diff --git a/doc/src/sgml/protocol.sgml b/doc/src/sgml/protocol.sgml
index b20f1690a7..81f43b5c00 100644
--- a/doc/src/sgml/protocol.sgml
+++ b/doc/src/sgml/protocol.sgml
@@ -2466,7 +2466,7 @@ Streaming Replication Protocol
   
 
   
-BASE_BACKUP [ LABEL 
'label' ] [ PROGRESS ] [ 
FAST ] [ WAL ] [ 
NOWAIT ] [ MAX_RATE 
rate ] [ TABLESPACE_MAP ] [ 
NOVERIFY_CHECKSUMS ]
+BASE_BACKUP [ LABEL 
'label' ] [ PROGRESS ] [ 
FAST ] [ WAL ] [ 
NOWAIT ] [ MAX_RATE 
rate ] [ TABLESPACE_MAP ] [ 
NOVERIFY_CHECKSUMS ] [ EXCLUDE_CONF ]
  BASE_BACKUP
 
 
@@ -2576,6 +2576,16 @@ Streaming Replication Protocol
  
 

+
+   
+EXCLUDE_CONF
+
+ 
+  Do not copy configuration files, that is, files that end in
+  .conf.
+ 
+
+   
   
  
  
diff --git a/doc/src/sgml/ref/initdb.sgml b/doc/src/sgml/ref/initdb.sgml
index da5c8f5307..1261e02d59 100644
--- a/doc/src/sgml/ref/initdb.sgml
+++ b/doc/src/sgml/ref/initdb.sgml
@@ -286,6 +286,23 @@ Options
   
  
 
+ 
+  -r
+  --replica
+  
+   
+Initialize a data directory for a physical replication replica.  The
+data directory will not be initialized with a full database system,
+but will instead only contain a minimal set of files.  A server that
+is started on this data directory will first fetch a base backup and
+then switch to standby mode.  The connection information for the base
+backup has to be configured by setting , and other parameters as desired,
+before the server is started.
+   
+  
+ 
+
  
   -S
   --sync-only
diff --git a/src/backend/access/transam/xlog.c 
b/src/backend/access/transam/xlog.c
index e651a841bb..7ab8ab45f5 100644
--- a/src/backend/access/transam/xlog.c
+++ b/src/backend/access/transam/xlog.c
@@ -905,8 +905,6 @@ static void CheckRecoveryConsistency(void);
 static XLogRecord *ReadCheckpointRecord(XLogReaderState *xlogreader,

XLogRecPtr RecPtr, int whichChkpt, bool report);
 static bool rescanLatestTimeLine(void);
-static void WriteControlFile(void);
-static void ReadControlFile(void);
 static char *str_time(pg_time_t tnow);
 static bool CheckForStandbyTrigger(void);
 
@@ -4481,7 +4479,7 @@ rescanLatestTimeLine(void)
  * ReadControlFile() verifies they are correct.  We could split out the
  * I/O and compatibility-check functions, but there seems no need currently.
  */
-static void
+void
 WriteControlFile(void)
 {
int fd;
@@ -4573,7 +4571,7 @@ WriteControlFile(void)

Re: block-level incremental backup

2019-08-30 Thread Robert Haas
On Thu, Aug 29, 2019 at 10:41 AM Jeevan Ladhe
 wrote:
> Due to the inherent nature of pg_basebackup, the incremental backup also
> allows taking backup in tar and compressed format. But, pg_combinebackup
> does not understand how to restore this. I think we should either make
> pg_combinebackup support restoration of tar incremental backup or restrict
> taking the incremental backup in tar format until pg_combinebackup
> supports the restoration by making option '--lsn' and '-Ft' exclusive.
>
> It is arguable that one can take the incremental backup in tar format, extract
> that manually and then give the resultant directory as input to the
> pg_combinebackup, but I think that kills the purpose of having
> pg_combinebackup utility.

I don't agree. You're right that you would have to untar (and
uncompress) the backup to run pg_combinebackup, but you would also
have to do that to restore a non-incremental backup, so it doesn't
seem much different.  It's true that for an incremental backup you
might need to untar and uncompress multiple prior backups rather than
just one, but that's just the nature of an incremental backup.  And,
on a practical level, if you want compression, which is pretty likely
if you're thinking about incremental backups, the way to get that is
to use tar format with -z or -Z.

It might be interesting to teach pg_combinebackup to be able to read
tar-format backups, but I think that there are several variants of the
tar format, and I suspect it would need to read them all.  If someone
un-tars and re-tars a backup with a different tar tool, we don't want
it to become unreadable.  So we'd either have to write our own
de-tarring library or add an external dependency on one.  I don't
think it's worth doing that at this point; I definitely don't think it
needs to be part of the first patch.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company