Re: [HACKERS] Exclude pg_internal.init from base backup

2017-11-07 Thread David Steele
On 11/7/17 11:03 AM, Simon Riggs wrote:
> On 5 November 2017 at 11:55, Magnus Hagander  wrote:
>>
>> So +1 for documenting the difference in how these are handled, as this is
>> important to know for somebody writing an external tool for it.
> 
> Changes made, moving to commit the attached patch.

Thanks, Simon!  This was on my to do list today -- glad I checked my
email first.

-- 
-David
da...@pgmasters.net


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] WIP: Restricting pg_rewind to data/wal dirs

2017-10-31 Thread David Steele
On 10/30/17 6:36 AM, Michael Paquier wrote:
> On Mon, Oct 30, 2017 at 10:15 AM, Chris Travers
>>
>> How does rep mgr or other programs using pg_rewind know what to exclude?
> 
> Good question. Answers could come from folks such as David Steele
> (pgBackRest) or Marco (barman) whom I am attaching in CC.

pgBackRest does not use pg_rewind.  However, pgBackRest does use the
same exclusion list for backups as pg_basebackup.

-- 
-David
da...@pgmasters.net


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Timeline ID in backup_label file

2017-10-26 Thread David Steele
On 10/25/17 10:10 PM, Craig Ringer wrote:
> On 26 October 2017 at 02:50, Michael Paquier  
> wrote:
>>
>> I would find interesting to add at the bottom of the backup_label file
>> a new field called "START TIMELINE: %d" to put this information in a
>> more understandable shape. Any opinions?
> 
> Strong "yes" from me.

+1

-- 
-David
da...@pgmasters.net


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] OpenFile() Permissions Refactor

2017-09-24 Thread David Steele
On 9/23/17 10:22 AM, Peter Eisentraut wrote:
> On 9/13/17 10:26, David Steele wrote:
>> Here's a new patch based on your review.  Where I had a question I made
>> a choice as described below:
> 
> I have committed the changes to the file APIs and a fix for the umask
> save/restore issue.

Thank you!

> The mkdir changes didn't really inspire me.  Replacing mkdir() with
> MakeDirectoryPerm() without any change in functionality doesn't really
> improve clarity.  

OK.  I had hoped removing the need to specify the mode at every call
site was functionality enough.  Even so, I'm a little surprised you
didn't keep PG_DIR_MODE_DEFAULT.

> Maybe we'll revisit that when your next patches arrive.

The next patch set was be this same refactor on the tools (initdb,
pg_rewind, etc), but if you think the mkdir refactor did not add enough
value then I'll rethink my plans.

I may need to present all the patches in one CF so it's clear where all
this is going: allowing group read on $PGDATA.

Thanks,
-- 
-David
da...@pgmasters.net


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Show backtrace when tap tests fail

2017-09-20 Thread David Steele
On 9/20/17 6:26 AM, Dagfinn Ilmari Mannsåker wrote:
> Craig Ringer <cr...@2ndquadrant.com> writes:
> 
>> On 20 September 2017 at 06:36, David Steele <da...@pgmasters.net> wrote:
>>
>>>
>>> I just use:
>>>
>>> $SIG{__DIE__} = sub {Carp::confess @_};
> 
> That is the basic idea behind both Carp::Always and Devel::Confess, but
> they also avoid breaking non-string exceptions (references and objects)
> being thrown and caught.  Devel::Confess jumps through even more hoops
> to add backtraces to these without breaking code catching them.

I see.  My object exceptions are always confessed so this code is just
to catch random die's from parts of the code I can't control.  I have
never seen one of them throw a non-string exception before.

-- 
-David
da...@pgmasters.net


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Show backtrace when tap tests fail

2017-09-19 Thread David Steele

On 9/19/17 5:25 PM, Tom Lane wrote:

Andres Freund  writes:

On 2017-09-19 17:15:21 -0400, Tom Lane wrote:

Meh --- Carp::Always isn't standard either, so I think this is just extra
complication with little value-add.  Let's just do the Devel::Confess
incantation as Dagfinn has it.



Has ~25 times the installation base on debian tho...



https://qa.debian.org/popcon.php?package=libdevel-confess-perl (13)
vs
https://qa.debian.org/popcon.php?package=libcarp-always-perl (300)


Both of those read like "lost in the noise" to me.  I think with
either of these, we're more or less asking PG developers to install
a dependency they probably didn't have before.  We might as well
ask them to install the more useful one.


I just use:

$SIG{__DIE__} = sub {Carp::confess @_};

It also includes the stack for the confess, but that's only a single 
line and I don't care since the important information is at the top.


I have used this in production code and it doesn't seem to have any 
nasty side effects, though this is only a last resort for when a defined 
exception is not raised.  For test code it should be more than sufficient.


I have not tried this on Perls < 5.10, though.

--
-David
da...@pgmasters.net


--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Creating backup history files for backups taken from standbys

2017-09-18 Thread David Steele
On 9/18/17 7:26 PM, Michael Paquier wrote:
> On Tue, Sep 19, 2017 at 8:14 AM, David Steele <da...@pgmasters.net> wrote:
>> On 8/31/17 11:56 PM, Michael Paquier wrote:
>>> Here is an updated patch with refreshed documentation, as a result of
>>> 449338c which was discussed in thread
>>> https://www.postgresql.org/message-id/d4d951b9-89c0-6bc1-b6ff-d0b2dd5a8...@pgmasters.net.
>>> I am just outlining the fact that a backup history file gets created
>>> on standbys and that it is archived.
>>
>> The patch looks good overall.  It applies cleanly and passes all tests.
>>
>> One question.  Do we want to create this file all the time (as written),
>> or only when archive_mode = always?
>>
>> It appears that CleanupBackupHistory() will immediately remove the
>> history file when archive_mode != always so it seems useless to write
>> it.  On the other hand the code is a bit simpler this way.  Thoughts?
> 
> With archive_mode = off, the bytes of the backup history file are
> still written to disk, so my take on the matter is to keep the code
> simple.

I'm OK with that.

I'll give Masahiko some time to respond before marking it RFC.

Regards,
-- 
-David
da...@pgmasters.net


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Creating backup history files for backups taken from standbys

2017-09-18 Thread David Steele
Hi Michael,

On 8/31/17 11:56 PM, Michael Paquier wrote:

> Here is an updated patch with refreshed documentation, as a result of
> 449338c which was discussed in thread
> https://www.postgresql.org/message-id/d4d951b9-89c0-6bc1-b6ff-d0b2dd5a8...@pgmasters.net.
> I am just outlining the fact that a backup history file gets created
> on standbys and that it is archived.

The patch looks good overall.  It applies cleanly and passes all tests.

One question.  Do we want to create this file all the time (as written),
or only when archive_mode = always?

It appears that CleanupBackupHistory() will immediately remove the
history file when archive_mode != always so it seems useless to write
it.  On the other hand the code is a bit simpler this way.  Thoughts?

Regards,
-- 
-David
da...@pgmasters.net


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] additional contrib test suites

2017-09-16 Thread David Steele
On 9/15/17 6:52 PM, Michael Paquier wrote:
> On Sat, Sep 16, 2017 at 5:15 AM, Tom Lane  wrote:
>>
>> Noting that mandrill is showing yet a different failure, one that I think
>> is inherent to chkpass:
>>
>>   CREATE TABLE test (i int, p chkpass);
>>   INSERT INTO test VALUES (1, 'hello'), (2, 'goodbye');
>> + WARNING:  type chkpass has unstable input conversion for "hello"
>> + LINE 1: INSERT INTO test VALUES (1, 'hello'), (2, 'goodbye');
>> + ^
>> + WARNING:  type chkpass has unstable input conversion for "goodbye"
>> + LINE 1: INSERT INTO test VALUES (1, 'hello'), (2, 'goodbye');
>> +   ^
>>
>> I'm starting to think that (4) might be the best avenue.  Or we could
>> consider
>>
>> (5) drop contrib/chkpass altogether, on the grounds that it's too badly
>> designed, and too obsolete crypto-wise, to be useful or supportable.
> 
> crypt() uses the 7 lowest characters, which makes for 7.2e16 values,
> so I would be fine with (5), then (4) as the test suite is not
> portable.

I'd prefer 5, but can go with 4.

I get that users need to store their own passwords, but we have support
for SHA1 via the crypto module which seems by far the better choice.

-- 
-David
da...@pgmasters.net


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] additional contrib test suites

2017-09-14 Thread David Steele
On 9/8/17 1:32 PM, Peter Eisentraut wrote:
> 
> Yes, some of the error messages had changed.  Fixed patches attached.

Patches apply and all tests pass.  A few comments:

* [PATCH v2 1/7] adminpack: Add test suite

There are no regular tests for pg_logdir_ls().  It looks like TAP tests
would be required but I'm not sure it's worth it.  The fact that the
"default" log name format is hard-coded in is, um, interesting.

Maybe add:

+ SELECT pg_logdir_ls();
+ ERROR:  could not read directory "log": No such file or directory

to get a little more coverage?  It would be good to at least have a note
on why it is not tested.

* [PATCH v2 4/7] chkpass: Add test suite

Well, this is kind of scary:

+CREATE EXTENSION chkpass;
+WARNING:  type input function chkpass_in should not be volatile

I guess the only side effect is that this column cannot be indexed?  The
docs say that, so OK, but is there anything else a user should be
worried about?

The rest looks good.  I'll mark this "Ready for Committer" since I'm the
only reviewer.  I don't think anything you might add based on my
comments above requires a re-review.

As for testing on more platforms, send it to the build farm?

-- 
-David
da...@pgmasters.net


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] OpenFile() Permissions Refactor

2017-09-13 Thread David Steele
Hi Peter,

Here's a new patch based on your review.  Where I had a question I made
a choice as described below:

On 9/1/17 1:58 PM, David Steele wrote:
> On 9/1/17 1:15 PM, Peter Eisentraut wrote:
>> On 8/29/17 12:15, David Steele wrote:
>>
>> I wonder whether we even need that much flexibility.  We already set a
>> global umask, so we could just open all files with 0666 and let the
>> umask sort it out.  Then we don't need all the *Perm() variants.
> 
> Well, there's one instance where the *Perm is used:
> 
> diff --git a/src/backend/libpq/be-fsstubs.c b/src/backend/libpq/be-fsstubs.c
> - fd = OpenTransientFile(fnamebuf, O_CREAT | O_WRONLY | O_TRUNC | 
> PG_BINARY,
> -S_IRUSR | S_IWUSR | S_IRGRP 
> | S_IROTH);
> + fd = OpenTransientFilePerm(fnamebuf, O_CREAT | O_WRONLY | O_TRUNC |
> PG_BINARY,
> +S_IRUSR | S_IWUSR | 
> S_IRGRP | S_IROTH);
> 
> I also think it's worth leaving the variants for extensions to use.
> Even though there are no calls in the core extensions it's hard to say
> what might be out there in the field.

These have been left in.

>> I don't like the function-like macros in fd.h.  We can use proper functions.
> 
> I had them as functions originally but thought macros might be
> friendlier with compilers that don't inline.  I'm happy to change them back.

Macros have been converted to functions.

>> I also wonder whether the umask save-and-restore code in copy.c and
>> be-fsstubs.c is sound.  If the AllocateFile() call in between errors
>> out, then there is nothing that restores the original umask.  This might
>> need a TRY/CATCH block, or maybe just a chmod() afterwards.
> 
> Unless I'm mistaken this is a preexisting issue.  Would you prefer I
> submit a different patch for that or combine it into this patch?
> 
> The chmod() implementation seems the safer option to me and produces
> fewer code paths.  It also prevents partially-written files from being
> accessible to any user but postgres.

I went with chmod().  The fix is incorporated in this patch but if you
want it broken out let me know.

>> The mkdir() calls could perhaps use some further refactoring so you
>> don't have to specify the mode everywhere.
> 
> I thought about that but feared it would be considered an overreach.
> Does fd.c seem like a good place for the new function?

New functions MakeDirectory() and MakeDirectoryPerm() have been added to
fd.c.

MakeDirectoryPerm() is used in ipc.c.

>> This kind of code:
>>
>> -   if (stat_buf.st_mode & (S_IRWXG | S_IRWXO))
>> +   if (stat_buf.st_mode & PG_MODE_MASK_DEFAULT)
>> ereport(FATAL,
>> (errcode(ERRCODE_OBJECT_NOT_IN_PREREQUISITE_STATE),
>>  errmsg("data directory \"%s\" has group or world access",
>> DataDir),
>> -errdetail("Permissions should be u=rwx (0700).")));
>> +errdetail("Permissions should be (%04o).",
>> PG_DIR_MODE_DEFAULT)));
>>
>> can be problematic, because we are hoping that PG_MODE_MASK_DEFAULT,
>> PG_DIR_MODE_DEFAULT, and the wording in the error message can stay
>> consistent.
> 
> Well, the eventual goal is to make the mask/mode configurable - at least
> to the extent that group access is allowed.  However, I'm happy to leave
> that discussion for another day.

Changes to postmaster.c have been reverted (except to rename mkdir to
MakeDirectory).

Patch v2 is attached.

-- 
-David
da...@pgmasters.net
diff --git a/contrib/pg_stat_statements/pg_stat_statements.c 
b/contrib/pg_stat_statements/pg_stat_statements.c
index fa409d72b7..3ab1fd2db4 100644
--- a/contrib/pg_stat_statements/pg_stat_statements.c
+++ b/contrib/pg_stat_statements/pg_stat_statements.c
@@ -1869,8 +1869,7 @@ qtext_store(const char *query, int query_len,
*query_offset = off;
 
/* Now write the data into the successfully-reserved part of the file */
-   fd = OpenTransientFile(PGSS_TEXT_FILE, O_RDWR | O_CREAT | PG_BINARY,
-  S_IRUSR | S_IWUSR);
+   fd = OpenTransientFile(PGSS_TEXT_FILE, O_RDWR | O_CREAT | PG_BINARY);
if (fd < 0)
goto error;
 
@@ -1934,7 +1933,7 @@ qtext_load_file(Size *buffer_size)
int fd;
struct stat stat;
 
-   fd = OpenTransientFile(PGSS_TEXT_FILE, O_RDONLY | PG_BINARY, 0);
+   fd = OpenTransientFile(PGSS_TEXT_FILE, O_RDONLY | PG_BINARY);
if (fd < 0)
{
if (errno != ENOENT)
diff --git a/src/backend/access/heap/rewriteheap.c 
b/src/backend/access/heap/rewriteheap.c
index bd560e47e1..f93c19

[HACKERS] Exclude pg_internal.init from base backup

2017-09-02 Thread David Steele
Hackers,

The cache in pg_internal.init was reused in days of yore but has been
rebuilt on postmaster startup since v8.1.  It appears there is no reason
for this file to be backed up.

I also moved the RELCACHE_INIT_FILENAME constant to relcache.h to avoid
duplicating the string.

I'll add this to the 2017-11 CF.

Thanks,
-- 
-David
da...@pgmasters.net
diff --git a/doc/src/sgml/backup.sgml b/doc/src/sgml/backup.sgml
index 95aeb35507..c3e6c30eba 100644
--- a/doc/src/sgml/backup.sgml
+++ b/doc/src/sgml/backup.sgml
@@ -1130,6 +1130,12 @@ SELECT pg_stop_backup();

 

+The pg_internal.init file can be omitted from the
+backup no matter what directory it appears in.  This file contains a
+relation cache that is always rebuilt on startup.
+   
+
+   
 The backup label
 file includes the label string you gave to pg_start_backup,
 as well as the time at which pg_start_backup was run, and
diff --git a/doc/src/sgml/protocol.sgml b/doc/src/sgml/protocol.sgml
index 2bb4e38a9d..46748bfad0 100644
--- a/doc/src/sgml/protocol.sgml
+++ b/doc/src/sgml/protocol.sgml
@@ -2384,6 +2384,11 @@ The commands accepted in walsender mode are:


 
+ pg_internal.init
+
+   
+   
+
  Various temporary files and directories created during the operation
  of the PostgreSQL server, such as any file or directory beginning
  with pgsql_tmp.
diff --git a/src/backend/replication/basebackup.c 
b/src/backend/replication/basebackup.c
index 9776858f03..2518774fb7 100644
--- a/src/backend/replication/basebackup.c
+++ b/src/backend/replication/basebackup.c
@@ -36,6 +36,7 @@
 #include "utils/builtins.h"
 #include "utils/elog.h"
 #include "utils/ps_status.h"
+#include "utils/relcache.h"
 #include "utils/timestamp.h"
 
 
@@ -151,6 +152,9 @@ static const char *excludeFiles[] =
/* Skip current log file temporary file */
LOG_METAINFO_DATAFILE_TMP,
 
+   /* Skip relation cache because it is rebuilt on startup */
+   RELCACHE_INIT_FILENAME,
+
/*
 * If there's a backup_label or tablespace_map file, it belongs to a
 * backup started by the user with pg_start_backup().  It is *not* 
correct
diff --git a/src/backend/utils/cache/relcache.c 
b/src/backend/utils/cache/relcache.c
index b8e37809b0..5015719915 100644
--- a/src/backend/utils/cache/relcache.c
+++ b/src/backend/utils/cache/relcache.c
@@ -87,11 +87,6 @@
 #include "utils/tqual.h"
 
 
-/*
- * name of relcache init file(s), used to speed up backend startup
- */
-#define RELCACHE_INIT_FILENAME "pg_internal.init"
-
 #define RELCACHE_INIT_FILEMAGIC0x573266/* version ID 
value */
 
 /*
diff --git a/src/bin/pg_basebackup/t/010_pg_basebackup.pl 
b/src/bin/pg_basebackup/t/010_pg_basebackup.pl
index a00f7b0e1a..d95ea3e0d5 100644
--- a/src/bin/pg_basebackup/t/010_pg_basebackup.pl
+++ b/src/bin/pg_basebackup/t/010_pg_basebackup.pl
@@ -4,7 +4,7 @@ use Cwd;
 use Config;
 use PostgresNode;
 use TestLib;
-use Test::More tests => 72;
+use Test::More tests => 73;
 
 program_help_ok('pg_basebackup');
 program_version_ok('pg_basebackup');
@@ -61,6 +61,11 @@ foreach my $filename (
close $file;
 }
 
+# Connect to a database to create global/pg_internal.init.  If this is removed
+# the test to ensure global/pg_internal.init is not copied will return a false
+# positive.
+$node->safe_psql('postgres', 'SELECT 1;');
+
 $node->command_ok([ 'pg_basebackup', '-D', "$tempdir/backup", '-X', 'none' ],
'pg_basebackup runs');
 ok(-f "$tempdir/backup/PG_VERSION", 'backup was created');
@@ -84,7 +89,8 @@ foreach my $dirname (
 
 # These files should not be copied.
 foreach my $filename (
-   qw(postgresql.auto.conf.tmp postmaster.opts postmaster.pid 
tablespace_map current_logfiles.tmp)
+   qw(postgresql.auto.conf.tmp postmaster.opts postmaster.pid 
tablespace_map current_logfiles.tmp
+   global/pg_internal.init)
   )
 {
ok(!-f "$tempdir/backup/$filename", "$filename not copied");
diff --git a/src/include/utils/relcache.h b/src/include/utils/relcache.h
index 3c53cefe4b..29c6d9bae3 100644
--- a/src/include/utils/relcache.h
+++ b/src/include/utils/relcache.h
@@ -18,6 +18,11 @@
 #include "nodes/bitmapset.h"
 
 
+/*
+ * Name of relcache init file(s), used to speed up backend startup
+ */
+#define RELCACHE_INIT_FILENAME "pg_internal.init"
+
 typedef struct RelationData *Relation;
 
 /* 

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Rename RECOVERYXLOG to RECOVERYWAL?

2017-09-02 Thread David Steele
On 9/1/17 7:53 PM, Michael Paquier wrote:
> On Sat, Sep 2, 2017 at 3:06 AM, Robert Haas  wrote:
>> I don't think this really buys us anything.  If we'd applied it to v10
>> maybe, but what do we get out of whacking it around now?
>>
>> "Consistency", I hear you cry!  Fair point.  But we never had a goal
>> of eliminating all internal references to "xlog", just the user-facing
>> ones.  And since RECOVERYXLOG is not documented, I think there's a
>> good argument that it's not user-facing.  You could argue that since
>> it shows up in the file system it's implicitly user-facing, and maybe
>> you're right; if some other committer really wants to make this
>> change, I won't grouse much.  But personally I'd favor leaving it
>> alone to avoid having the behavior change a little bit in every new
>> release.
> 
> I may be wrong, but I would suspect that some backup tools doing
> FS-level backup are checking on the existence of this file and skip
> it. From the point of view of operations, it does not matter much as
> any existing RECOVERYXLOG is unlinked before being replaced by a new
> one, but that would not be nice to add silently 16MB in each backup.

Yes, pgBackRest does have an "offline" mode that can be used (when the
database is shutdown) to do an FS-level backup.

It never occurred to me to exclude RECOVERYXLOG but with 1GB WAL
segments coming in v11 it might be a good idea.

-- 
-David
da...@pgmasters.net


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Rename RECOVERYXLOG to RECOVERYWAL?

2017-09-01 Thread David Steele
On 9/1/17 2:06 PM, Robert Haas wrote:
> On Fri, Sep 1, 2017 at 12:57 PM, David Steele <da...@pgmasters.net> wrote:
>> I searched the various threads on the xlog -> wal rename and I couldn't
>> find any specific mention of why this was not renamed.
>>
>> I have attached a patch in case it was an oversight rather than left
>> as-is on purpose.
> 
> I don't think this really buys us anything.  If we'd applied it to v10
> maybe, but what do we get out of whacking it around now?

I was thinking it would be applied to v10.

> "Consistency", I hear you cry!  Fair point.  But we never had a goal
> of eliminating all internal references to "xlog", just the user-facing
> ones.  And since RECOVERYXLOG is not documented, I think there's a
> good argument that it's not user-facing.  You could argue that since
> it shows up in the file system it's implicitly user-facing, and maybe
> you're right; 

That's exactly my argument, in fact!

> if some other committer really wants to make this
> change, I won't grouse much.  But personally I'd favor leaving it
> alone to avoid having the behavior change a little bit in every new
> release.

Seems like since v10 is still beta and this is not really documented it
wouldn't be that big a deal to make the change.  If nothing else it
might keep the question from coming up in the future.

I'm not going to make a big fuss about it, though.  I noticed it while
testing the v10 support in pgbackRest and thought it was worth bringing up.

Thanks,
-- 
-David
da...@pgmasters.net


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] OpenFile() Permissions Refactor

2017-09-01 Thread David Steele
On 9/1/17 1:15 PM, Peter Eisentraut wrote:
> On 8/29/17 12:15, David Steele wrote:
> 
> I wonder whether we even need that much flexibility.  We already set a
> global umask, so we could just open all files with 0666 and let the
> umask sort it out.  Then we don't need all the *Perm() variants.

Well, there's one instance where the *Perm is used:

diff --git a/src/backend/libpq/be-fsstubs.c b/src/backend/libpq/be-fsstubs.c
-   fd = OpenTransientFile(fnamebuf, O_CREAT | O_WRONLY | O_TRUNC | 
PG_BINARY,
-  S_IRUSR | S_IWUSR | S_IRGRP 
| S_IROTH);
+   fd = OpenTransientFilePerm(fnamebuf, O_CREAT | O_WRONLY | O_TRUNC |
PG_BINARY,
+  S_IRUSR | S_IWUSR | 
S_IRGRP | S_IROTH);

I also think it's worth leaving the variants for extensions to use.
Even though there are no calls in the core extensions it's hard to say
what might be out there in the field.

> I don't like the function-like macros in fd.h.  We can use proper functions.

I had them as functions originally but thought macros might be
friendlier with compilers that don't inline.  I'm happy to change them back.

> I also wonder whether the umask save-and-restore code in copy.c and
> be-fsstubs.c is sound.  If the AllocateFile() call in between errors
> out, then there is nothing that restores the original umask.  This might
> need a TRY/CATCH block, or maybe just a chmod() afterwards.

Unless I'm mistaken this is a preexisting issue.  Would you prefer I
submit a different patch for that or combine it into this patch?

The chmod() implementation seems the safer option to me and produces
fewer code paths.  It also prevents partially-written files from being
accessible to any user but postgres.

> The mkdir() calls could perhaps use some further refactoring so you
> don't have to specify the mode everywhere.

I thought about that but feared it would be considered an overreach.
Does fd.c seem like a good place for the new function?

> This kind of code:
> 
> -   if (stat_buf.st_mode & (S_IRWXG | S_IRWXO))
> +   if (stat_buf.st_mode & PG_MODE_MASK_DEFAULT)
> ereport(FATAL,
> (errcode(ERRCODE_OBJECT_NOT_IN_PREREQUISITE_STATE),
>  errmsg("data directory \"%s\" has group or world access",
> DataDir),
> -errdetail("Permissions should be u=rwx (0700).")));
> +errdetail("Permissions should be (%04o).",
> PG_DIR_MODE_DEFAULT)));
> 
> can be problematic, because we are hoping that PG_MODE_MASK_DEFAULT,
> PG_DIR_MODE_DEFAULT, and the wording in the error message can stay
> consistent.

Well, the eventual goal is to make the mask/mode configurable - at least
to the extent that group access is allowed.  However, I'm happy to leave
that discussion for another day.

Thanks,
-- 
-David
da...@pgmasters.net


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


[HACKERS] Rename RECOVERYXLOG to RECOVERYWAL?

2017-09-01 Thread David Steele
I searched the various threads on the xlog -> wal rename and I couldn't
find any specific mention of why this was not renamed.

I have attached a patch in case it was an oversight rather than left
as-is on purpose.

Thanks,
-- 
-David
da...@pgmasters.net
diff --git a/src/backend/access/transam/xlog.c 
b/src/backend/access/transam/xlog.c
index df4843f409..fd7bfa11d1 100644
--- a/src/backend/access/transam/xlog.c
+++ b/src/backend/access/transam/xlog.c
@@ -3562,7 +3562,7 @@ XLogFileRead(XLogSegNo segno, int emode, TimeLineID tli,
set_ps_display(activitymsg, false);
 
restoredFromArchive = RestoreArchivedFile(path, 
xlogfname,
-   
  "RECOVERYXLOG",
+   
  "RECOVERYWAL",

  XLogSegSize,

  InRedo);
if (!restoredFromArchive)
@@ -5550,10 +5550,10 @@ exitArchiveRecovery(TimeLineID endTLI, XLogRecPtr 
endOfLog)
XLogArchiveCleanup(xlogfname);
 
/*
-* Since there might be a partial WAL segment named RECOVERYXLOG, get 
rid
+* Since there might be a partial WAL segment named RECOVERYWAL, get rid
 * of it.
 */
-   snprintf(recoveryPath, MAXPGPATH, XLOGDIR "/RECOVERYXLOG");
+   snprintf(recoveryPath, MAXPGPATH, XLOGDIR "/RECOVERYWAL");
unlink(recoveryPath);   /* ignore any error */
 
/* Get rid of any remaining recovered timeline-history file, too */

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Update low-level backup documentation to match actual behavior

2017-08-31 Thread David Steele
On 8/31/17 4:04 PM, Robert Haas wrote:
> On Wed, Aug 30, 2017 at 8:37 PM, Michael Paquier
>  wrote:
>> Thanks for the new version. This looks fine to me.
> 
> Committed to REL9_6_STABLE with minor wordsmithing.

The edits look good to me. Thanks, Robert!

-- 
-David
da...@pgmasters.net


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Update low-level backup documentation to match actual behavior

2017-08-30 Thread David Steele
Hi Michael,

Thanks for reviewing!

On 8/29/17 9:44 PM, Michael Paquier wrote:
> On Tue, Aug 29, 2017 at 10:59 PM, David Steele <da...@pgmasters.net> wrote:
>>
>> Attached is the 9.6 patch.  It required a bit more work in func.sgml
>> than I was expecting so have a close look at that.  The rest was mostly
>> removing irrelevant hunks.
> 
> + switch to the next WAL segment.  On a standby, it is not possible to
> + automatically switch WAL segments, so you may wish to run
> + pg_switch_wal on the primary to perform a manual
> + switch. The reason for the switch is to arrange for
> [...]
> +WAL segments have been archived. If write activity on the primary
> is low, it
> +may be useful to run pg_switch_wal on the primary in order 
> to
> +trigger an immediate segment switch of the last required WAL
> It seems to me that both portions are wrong. There is no archiving
> wait on standbys for 9.6, and 
I think its clearly stated here that pg_stop_backup() does not wait for
WAL to archive on a standby.  Even, it is very important for the backup
routine to make sure that all the WAL *is* archived.

> pg_stop_backup triggers by itself the
> segment switch, so saying that enforcing pg_switch_wal on the primary
> is moot. 

pg_stop_backup() does not perform a WAL switch on the standby which is
what this sentence is referring to.  I have separated this section out
into a new paragraph to (hopefully) make it clearer.

> pg_switch_xlog has been renamed to pg_switch_wal in PG10, so
> the former name applies.

Whoops!

New patch is attached.

Thanks,
-- 
-David
da...@pgmasters.net
diff --git a/doc/src/sgml/backup.sgml b/doc/src/sgml/backup.sgml
index 03c0dbf1cd..fd696c38db 100644
--- a/doc/src/sgml/backup.sgml
+++ b/doc/src/sgml/backup.sgml
@@ -889,8 +889,11 @@ SELECT pg_start_backup('label', false, false);
 
 SELECT * FROM pg_stop_backup(false);
 
- This terminates the backup mode and performs an automatic switch to
- the next WAL segment.  The reason for the switch is to arrange for
+ This terminates backup mode. On a primary, it also performs an automatic
+ switch to the next WAL segment.  On a standby, it is not possible to
+ automatically switch WAL segments, so you may wish to run
+ pg_switch_xlog on the primary to perform a manual
+ switch. The reason for the switch is to arrange for
  the last WAL segment file written during the backup interval to be
  ready to archive.
 
@@ -908,7 +911,7 @@ SELECT * FROM pg_stop_backup(false);
  Once the WAL segment files active during the backup are archived, you are
  done.  The file identified by pg_stop_backup's first return
  value is the last segment that is required to form a complete set of
- backup files.  If archive_mode is enabled,
+ backup files.  On a primary, if archive_mode is enabled,
  pg_stop_backup does not return until the last segment has
  been archived.
  Archiving of these files happens automatically since you have
@@ -924,6 +927,13 @@ SELECT * FROM pg_stop_backup(false);
  pg_stop_backup terminates because of this your backup
  may not be valid.
 
+
+
+ Note that on a standby pg_stop_backup does not wait for
+ WAL segments to be archived so the backup process must ensure that all WAL
+ segments required for the backup are successfully archived.
+
+

   
 
@@ -932,9 +942,9 @@ SELECT * FROM pg_stop_backup(false);
 Making an exclusive low level backup
 
  The process for an exclusive backup is mostly the same as for a
- non-exclusive one, but it differs in a few key steps. It does not allow
- more than one concurrent backup to run, and there can be some issues on
- the server if it crashes during the backup. Prior to PostgreSQL 9.6, this
+ non-exclusive one, but it differs in a few key steps. This type of backup
+ can only be taken on a primary and does not allow concurrent backups.
+ Prior to PostgreSQL 9.6, this
  was the only low-level method available, but it is now recommended that
  all users upgrade their scripts to use non-exclusive backups if possible.
 
@@ -992,6 +1002,11 @@ SELECT pg_start_backup('label', true);
   for things to
  consider during this backup.
 
+
+  Note that if the server crashes during the backup it may not be
+  possible to restart until the backup_label file has been
+  manually deleted from the PGDATA directory.
+


 
diff --git a/doc/src/sgml/func.sgml b/doc/src/sgml/func.sgml
index 8482601294..01ebfb8d90 100644
--- a/doc/src/sgml/func.sgml
+++ b/doc/src/sgml/func.sgml
@@ -18070,11 +18070,22 @@ postgres=# select pg_start_backup('label_goes_here');
 pg_start_backup. In a non-exclusive backup, the contents of
 the backup_label and tablespace_map are returned
 in the result of the function, and sho

[HACKERS] OpenFile() Permissions Refactor

2017-08-29 Thread David Steele
Hackers,

While working on the patch to allow group reads of $PGDATA I refactored
the various OpenFile() functions to use default/global permissions
rather than permissions defined in each call.

I think the patch stands on its own regardless of whether we accept the
patch to allow group permissions (which won't make this CF).  We had a
couple different ways of defining permissions (e.g. 0600 vs S_IRUSR |
S_IWUSR) and in any case it represented quite a bit of duplication.
This way seems simpler and less error-prone.

I have intentionally not touched the open/fopen/mkdir calls in these
files since they are specialized and require per-instance consideration
(if they are changed at all):

backend/postmaster/fork_process.c
backend/postmaster/postmaster.c
backend/utils/error/elog.c
backend/utils/init/miscinit.c

All tests pass but it's possible that I've missed something or changed
something that shouldn't be changed.

I'll submit the patch to the 2017-09 CF.

Thanks,
-- 
-David
da...@pgmasters.net
diff --git a/contrib/pg_stat_statements/pg_stat_statements.c 
b/contrib/pg_stat_statements/pg_stat_statements.c
index fa409d72b7..3ab1fd2db4 100644
--- a/contrib/pg_stat_statements/pg_stat_statements.c
+++ b/contrib/pg_stat_statements/pg_stat_statements.c
@@ -1869,8 +1869,7 @@ qtext_store(const char *query, int query_len,
*query_offset = off;
 
/* Now write the data into the successfully-reserved part of the file */
-   fd = OpenTransientFile(PGSS_TEXT_FILE, O_RDWR | O_CREAT | PG_BINARY,
-  S_IRUSR | S_IWUSR);
+   fd = OpenTransientFile(PGSS_TEXT_FILE, O_RDWR | O_CREAT | PG_BINARY);
if (fd < 0)
goto error;
 
@@ -1934,7 +1933,7 @@ qtext_load_file(Size *buffer_size)
int fd;
struct stat stat;
 
-   fd = OpenTransientFile(PGSS_TEXT_FILE, O_RDONLY | PG_BINARY, 0);
+   fd = OpenTransientFile(PGSS_TEXT_FILE, O_RDONLY | PG_BINARY);
if (fd < 0)
{
if (errno != ENOENT)
diff --git a/src/backend/access/heap/rewriteheap.c 
b/src/backend/access/heap/rewriteheap.c
index bd560e47e1..f93c194e18 100644
--- a/src/backend/access/heap/rewriteheap.c
+++ b/src/backend/access/heap/rewriteheap.c
@@ -1013,8 +1013,7 @@ logical_rewrite_log_mapping(RewriteState state, 
TransactionId xid,
src->off = 0;
memcpy(src->path, path, sizeof(path));
src->vfd = PathNameOpenFile(path,
-   O_CREAT 
| O_EXCL | O_WRONLY | PG_BINARY,
-   S_IRUSR 
| S_IWUSR);
+   O_CREAT 
| O_EXCL | O_WRONLY | PG_BINARY);
if (src->vfd < 0)
ereport(ERROR,
(errcode_for_file_access(),
@@ -1133,8 +1132,7 @@ heap_xlog_logical_rewrite(XLogReaderState *r)
 xlrec->mapped_xid, XLogRecGetXid(r));
 
fd = OpenTransientFile(path,
-  O_CREAT | O_WRONLY | 
PG_BINARY,
-  S_IRUSR | S_IWUSR);
+  O_CREAT | O_WRONLY | 
PG_BINARY);
if (fd < 0)
ereport(ERROR,
(errcode_for_file_access(),
@@ -1258,7 +1256,7 @@ CheckPointLogicalRewriteHeap(void)
}
else
{
-   int fd = OpenTransientFile(path, 
O_RDONLY | PG_BINARY, 0);
+   int fd = OpenTransientFile(path, 
O_RDONLY | PG_BINARY);
 
/*
 * The file cannot vanish due to concurrency since this 
function
diff --git a/src/backend/access/transam/slru.c 
b/src/backend/access/transam/slru.c
index 77edc51e1c..9dd77190ec 100644
--- a/src/backend/access/transam/slru.c
+++ b/src/backend/access/transam/slru.c
@@ -599,7 +599,7 @@ SimpleLruDoesPhysicalPageExist(SlruCtl ctl, int pageno)
 
SlruFileName(ctl, path, segno);
 
-   fd = OpenTransientFile(path, O_RDWR | PG_BINARY, S_IRUSR | S_IWUSR);
+   fd = OpenTransientFile(path, O_RDWR | PG_BINARY);
if (fd < 0)
{
/* expected: file doesn't exist */
@@ -654,7 +654,7 @@ SlruPhysicalReadPage(SlruCtl ctl, int pageno, int slotno)
 * SlruPhysicalWritePage).  Hence, if we are InRecovery, allow the case
 * where the file doesn't exist, and return zeroes instead.
 */
-   fd = OpenTransientFile(path, O_RDWR | PG_BINARY, S_IRUSR | S_IWUSR);
+   fd = OpenTransientFile(path, O_RDWR | PG_BINARY);
if (fd < 0)
{
if (errno != ENOENT || !InRecovery)
@@ -804,8 +804,7 @@ SlruPhysicalWritePage(SlruCtl ctl, int pageno, int slotno, 
SlruFlush 

Re: [HACKERS] Update low-level backup documentation to match actual behavior

2017-08-29 Thread David Steele
Hi Robert,

On 8/25/17 4:03 PM, David Steele wrote:
> On 8/25/17 3:26 PM, Robert Haas wrote:
>> On Fri, Aug 25, 2017 at 3:21 PM, David Steele <da...@pgmasters.net>
>> wrote:
>>> No problem.  I'll base it on your commit to capture any changes you
>>> made.
>>
>> Thanks, but you incorporated everything I wanted in response to my
>> first review -- so I didn't tweak it any further.
> 
> Thank you for committing that.  I'll get the 9.6 patch to you early next
> week.

Attached is the 9.6 patch.  It required a bit more work in func.sgml
than I was expecting so have a close look at that.  The rest was mostly
removing irrelevant hunks.

Thanks,
-- 
-David
da...@pgmasters.net
diff --git a/doc/src/sgml/backup.sgml b/doc/src/sgml/backup.sgml
index 03c0dbf1cd..456f6f2c98 100644
--- a/doc/src/sgml/backup.sgml
+++ b/doc/src/sgml/backup.sgml
@@ -889,8 +889,11 @@ SELECT pg_start_backup('label', false, false);
 
 SELECT * FROM pg_stop_backup(false);
 
- This terminates the backup mode and performs an automatic switch to
- the next WAL segment.  The reason for the switch is to arrange for
+ This terminates backup mode. On a primary, it also performs an automatic
+ switch to the next WAL segment.  On a standby, it is not possible to
+ automatically switch WAL segments, so you may wish to run
+ pg_switch_wal on the primary to perform a manual
+ switch. The reason for the switch is to arrange for
  the last WAL segment file written during the backup interval to be
  ready to archive.
 
@@ -908,7 +911,7 @@ SELECT * FROM pg_stop_backup(false);
  Once the WAL segment files active during the backup are archived, you are
  done.  The file identified by pg_stop_backup's first return
  value is the last segment that is required to form a complete set of
- backup files.  If archive_mode is enabled,
+ backup files.  On a primary, if archive_mode is enabled,
  pg_stop_backup does not return until the last segment has
  been archived.
  Archiving of these files happens automatically since you have
@@ -924,6 +927,13 @@ SELECT * FROM pg_stop_backup(false);
  pg_stop_backup terminates because of this your backup
  may not be valid.
 
+
+
+ Note that on a standby pg_stop_backup does not wait for
+ WAL segments to be archived so the backup process must ensure that all WAL
+ segments required for the backup are successfully archived.
+
+

   
 
@@ -932,9 +942,9 @@ SELECT * FROM pg_stop_backup(false);
 Making an exclusive low level backup
 
  The process for an exclusive backup is mostly the same as for a
- non-exclusive one, but it differs in a few key steps. It does not allow
- more than one concurrent backup to run, and there can be some issues on
- the server if it crashes during the backup. Prior to PostgreSQL 9.6, this
+ non-exclusive one, but it differs in a few key steps. This type of backup
+ can only be taken on a primary and does not allow concurrent backups.
+ Prior to PostgreSQL 9.6, this
  was the only low-level method available, but it is now recommended that
  all users upgrade their scripts to use non-exclusive backups if possible.
 
@@ -992,6 +1002,11 @@ SELECT pg_start_backup('label', true);
   for things to
  consider during this backup.
 
+
+  Note that if the server crashes during the backup it may not be
+  possible to restart until the backup_label file has been
+  manually deleted from the PGDATA directory.
+


 
diff --git a/doc/src/sgml/func.sgml b/doc/src/sgml/func.sgml
index 8482601294..a803e747b2 100644
--- a/doc/src/sgml/func.sgml
+++ b/doc/src/sgml/func.sgml
@@ -18070,11 +18070,18 @@ postgres=# select pg_start_backup('label_goes_here');
 pg_start_backup. In a non-exclusive backup, the contents of
 the backup_label and tablespace_map are returned
 in the result of the function, and should be written to files in the
-backup (and not in the data directory).
+backup (and not in the data directory).  When executed on a primary
+pg_stop_backup will wait for WAL to be archived when archiving
+is enabled.  On a standby pg_stop_backup will return
+immediately without waiting so it's important to verify that all required
+WAL segments have been archived. If write activity on the primary is low, 
it
+may be useful to run pg_switch_wal on the primary in order to
+trigger an immediate segment switch of the last required WAL.

 

-The function also creates a backup history file in the transaction log
+When executed on a primary, the function also creates a backup history file
+in the write-ahead log
 archive area. The history file includes the label given to
 pg_start_backup, the starting and ending transaction log 
locations for
 the backup, and the starting and ending times of the b

Re: [HACKERS] Update low-level backup documentation to match actual behavior

2017-08-25 Thread David Steele

On 8/25/17 3:26 PM, Robert Haas wrote:

On Fri, Aug 25, 2017 at 3:21 PM, David Steele <da...@pgmasters.net> wrote:

No problem.  I'll base it on your commit to capture any changes you made.


Thanks, but you incorporated everything I wanted in response to my
first review -- so I didn't tweak it any further.


Thank you for committing that.  I'll get the 9.6 patch to you early next 
week.


--
-David
da...@pgmasters.net


--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Update low-level backup documentation to match actual behavior

2017-08-25 Thread David Steele
On 8/25/17 3:13 PM, Robert Haas wrote:
> On Fri, Aug 25, 2017 at 3:10 PM, David Steele <da...@pgmasters.net> wrote:
>>
>> Robert said he would commit this so I expect he'll do that if he doesn't
>> have any objections to the changes.
>>
>> Robert, if you would prefer me to submit this to the CF I'm happy to do so.
> 
> Ha, this note arrived just as I was working on getting this committed.
> I'll commit this to 11 and 10 presently; can you produce a version for
> 9.6?

No problem.  I'll base it on your commit to capture any changes you made.

-- 
-David
da...@pgmasters.net


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Update low-level backup documentation to match actual behavior

2017-08-25 Thread David Steele
On 8/24/17 7:36 PM, Michael Paquier wrote:
> 
> True as well. The patch looks good to me. If a committer does not show
> up soon, it may be better to register that in the CF and wait. I am
> not sure that adding an open item is suited, as docs have the same
> problem on 9.6.

Robert said he would commit this so I expect he'll do that if he doesn't
have any objections to the changes.

Robert, if you would prefer me to submit this to the CF I'm happy to do so.

Thanks,
-- 
-David
da...@pgmasters.net


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Update low-level backup documentation to match actual behavior

2017-08-24 Thread David Steele
Hi Michael,

Thanks for reviewing!  Sorry for the late response, those eclipses don't
just chase themselves...

On 8/20/17 10:22 PM, Michael Paquier wrote:
> On Fri, Aug 18, 2017 at 3:35 AM, David Steele <da...@pgmasters.net> wrote:
> 
> + Prior to PostgreSQL 9.6, this
> Markup ?

Fixed.

> +  Note well that if the server crashes during the backup it may not be
> +  possible to restart until the backup_label file has been
> +  manually deleted from the PGDATA directory.
> Missing a markup  here for PGDATA.

Fixed.

> s/Note well/Note as well/, no?

This was a literal translation of nota bene but I've changed it to
simply "Note" as that seems common in the docs.

> - This function, when called on a primary, terminates the backup mode and
> + This function terminates backup mode and
>   performs an automatic switch to the next WAL segment. The reason for the
>   switch is to arrange for the last WAL segment written during the backup
> - interval to be ready to archive.  When called on a standby, this 
> function
> - only terminates backup mode.  A subsequent WAL segment switch will be
> - needed in order to ensure that all WAL files needed to restore the 
> backup
> - can be archived; if the primary does not have sufficient write activity
> - to trigger one, pg_switch_wal should be executed on
> - the primary.
> + interval to be ready to archive.
> pg_stop_backup() still waits for the last WAL segment to be archived
> on the primary. Perhaps we'd want to mention that?

That's detailed in the next paragraph, ISTM.

> Documentation does not state yet that the use of low-level APIs for
> exclusive backups are not supported on standbys.

The first paragraph of the exclusive section states, "this type of
backup can only be taken on a primary".

> Now in the docs:
>  If the backup process monitors and ensures that all WAL segment files
>  required for the backup are successfully archived then the second
>  parameter (which defaults to true) can be set to false to have
> I would recommend adding some details here and mention
> "wait_for_archive" instead of "second parameter". 

Done.

> I am wondering as
> well if this paragraph should be put in red with a warning or
> something like that. This is really, really important to ensure
> consistent backups!

Maybe, but this logic could easily apply to a lot of sections in the
backup docs.  I'm not sure where it would end.

Thanks!
-- 
-David
da...@pgmasters.net
diff --git a/doc/src/sgml/backup.sgml b/doc/src/sgml/backup.sgml
index 0e7c6e2051..95aeb35507 100644
--- a/doc/src/sgml/backup.sgml
+++ b/doc/src/sgml/backup.sgml
@@ -889,8 +889,11 @@ SELECT pg_start_backup('label', false, false);
 
 SELECT * FROM pg_stop_backup(false, true);
 
- This terminates the backup mode and performs an automatic switch to
- the next WAL segment.  The reason for the switch is to arrange for
+ This terminates backup mode. On a primary, it also performs an automatic
+ switch to the next WAL segment.  On a standby, it is not possible to
+ automatically switch WAL segments, so you may wish to run
+ pg_switch_wal on the primary to perform a manual
+ switch. The reason for the switch is to arrange for
  the last WAL segment file written during the backup interval to be
  ready to archive.
 
@@ -908,9 +911,12 @@ SELECT * FROM pg_stop_backup(false, true);
  Once the WAL segment files active during the backup are archived, you are
  done.  The file identified by pg_stop_backup's first return
  value is the last segment that is required to form a complete set of
- backup files.  If archive_mode is enabled,
+ backup files.  On a primary, if archive_mode is enabled and 
the
+ wait_for_archive parameter is true,
  pg_stop_backup does not return until the last segment has
  been archived.
+ On a standby, archive_mode must be always in order
+ for pg_stop_backup to wait.
  Archiving of these files happens automatically since you have
  already configured archive_command. In most cases this
  happens quickly, but you are advised to monitor your archive
@@ -926,8 +932,9 @@ SELECT * FROM pg_stop_backup(false, true);
 
 
  If the backup process monitors and ensures that all WAL segment files
- required for the backup are successfully archived then the second
- parameter (which defaults to true) can be set to false to have
+ required for the backup are successfully archived then the
+ wait_for_archive parameter (which defaults to true) can be set
+ to false to have
  pg_stop_backup return as soon as the stop backup record is
  written to the WAL.  By default, pg_stop_backup will wait
  until all WAL has been archived, which can take some time.  Thi

Re: [HACKERS] Update low-level backup documentation to match actual behavior

2017-08-18 Thread David Steele
On 8/18/17 3:00 PM, Robert Haas wrote:
> 
> If you update the patch I'll apply it to 11 and 10.

Attached is the updated patch.

I didn't like the vague "there can be some issues on the server if it
crashes during the backup" so I added a new paragraph at the appropriate
step to give a more detailed explanation of the problem.

Thanks,
-- 
-David
da...@pgmasters.net
diff --git a/doc/src/sgml/backup.sgml b/doc/src/sgml/backup.sgml
index 0e7c6e2051..76f81975f1 100644
--- a/doc/src/sgml/backup.sgml
+++ b/doc/src/sgml/backup.sgml
@@ -889,8 +889,11 @@ SELECT pg_start_backup('label', false, false);
 
 SELECT * FROM pg_stop_backup(false, true);
 
- This terminates the backup mode and performs an automatic switch to
- the next WAL segment.  The reason for the switch is to arrange for
+ This terminates backup mode. On a primary, it also performs an automatic
+ switch to the next WAL segment.  On a standby, it is not possible to
+ automatically switch WAL segments, so you may wish to run
+ pg_switch_wal on the primary to perform a manual
+ switch. The reason for the switch is to arrange for
  the last WAL segment file written during the backup interval to be
  ready to archive.
 
@@ -908,9 +911,12 @@ SELECT * FROM pg_stop_backup(false, true);
  Once the WAL segment files active during the backup are archived, you are
  done.  The file identified by pg_stop_backup's first return
  value is the last segment that is required to form a complete set of
- backup files.  If archive_mode is enabled,
+ backup files.  On a primary, if archive_mode is enabled and 
the
+ wait_for_archive parameter is true,
  pg_stop_backup does not return until the last segment has
  been archived.
+ On a standby, archive_mode must be always in order
+ for pg_stop_backup to wait.
  Archiving of these files happens automatically since you have
  already configured archive_command. In most cases this
  happens quickly, but you are advised to monitor your archive
@@ -943,9 +949,9 @@ SELECT * FROM pg_stop_backup(false, true);
 Making an exclusive low level backup
 
  The process for an exclusive backup is mostly the same as for a
- non-exclusive one, but it differs in a few key steps. It does not allow
- more than one concurrent backup to run, and there can be some issues on
- the server if it crashes during the backup. Prior to PostgreSQL 9.6, this
+ non-exclusive one, but it differs in a few key steps. This type of backup
+ can only be taken on a primary and does not allow concurrent backups.
+ Prior to PostgreSQL 9.6, this
  was the only low-level method available, but it is now recommended that
  all users upgrade their scripts to use non-exclusive backups if possible.
 
@@ -1003,6 +1009,11 @@ SELECT pg_start_backup('label', true);
   for things to
  consider during this backup.
 
+
+  Note well that if the server crashes during the backup it may not be
+  possible to restart until the backup_label file has been
+  manually deleted from the PGDATA directory.
+


 
@@ -1012,15 +1023,10 @@ SELECT pg_start_backup('label', true);
 
 SELECT pg_stop_backup();
 
- This function, when called on a primary, terminates the backup mode and
+ This function terminates backup mode and
  performs an automatic switch to the next WAL segment. The reason for the
  switch is to arrange for the last WAL segment written during the backup
- interval to be ready to archive.  When called on a standby, this function
- only terminates backup mode.  A subsequent WAL segment switch will be
- needed in order to ensure that all WAL files needed to restore the backup
- can be archived; if the primary does not have sufficient write activity
- to trigger one, pg_switch_wal should be executed on
- the primary.
+ interval to be ready to archive.
 


diff --git a/doc/src/sgml/func.sgml b/doc/src/sgml/func.sgml
index b43ec30a4e..28eda97273 100644
--- a/doc/src/sgml/func.sgml
+++ b/doc/src/sgml/func.sgml
@@ -18606,7 +18606,8 @@ postgres=# select pg_start_backup('label_goes_here');

 

-The function also creates a backup history file in the write-ahead log
+When executed on a primary, the function also creates a backup history file
+in the write-ahead log
 archive area. The history file includes the label given to
 pg_start_backup, the starting and ending write-ahead log 
locations for
 the backup, and the starting and ending times of the backup.  The return

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Update low-level backup documentation to match actual behavior

2017-08-18 Thread David Steele
Robert,

Thanks for reviewing!

On 8/18/17 2:45 PM, Robert Haas wrote:
> - the next WAL segment.  The reason for the switch is to arrange for
> + the next WAL segment when run on a primary.  On a standby you can call
> + pg_switch_wal on the primary to perform a manual
> + switch.
> + The reason for the switch is to arrange for
> 
> Tacking on "when run on a primary" onto the end of the existing
> sentence is a little ambiguous: does that clause apply only to the
> last part, or to the whole sentence?  I suggest something like: This
> terminates the backup mode.  On a primary, it also performs an
> automatic switch to the next WAL segment.  On a standby, it is not
> possible to automatically switch WAL segments, so you may wish to
> consider running pg_switch_wal on the primary to
> perform a manual switch.

Looks good.

> 
> -Making an exclusive low level backup
> +Making an exclusive low level backup on a primary
> 
> I'd omit this hunk.

OK, but I was trying to make it very clear that this backup method only
works on a primary.  If you think the text is in the first paragraph is
enough then I'm willing to go with that, though.

> - more than one concurrent backup to run, and there can be some issues on
> + more than one concurrent backup to run, must be run on a
> primary, and there
> + can be some issues on
> 
> Maybe this would be clearer: This type of backup can only be taken on
> a primary, does not allow more than one ...

Looks good.

> - This function, when called on a primary, terminates the backup mode and
> + This function terminates the backup mode and
>   performs an automatic switch to the next WAL segment. The reason for the
>   switch is to arrange for the last WAL segment written during the backup
> - interval to be ready to archive.  When called on a standby, this 
> function
> - only terminates backup mode.  A subsequent WAL segment switch will be
> - needed in order to ensure that all WAL files needed to restore the 
> backup
> - can be archived; if the primary does not have sufficient write activity
> - to trigger one, pg_switch_wal should be executed on
> - the primary.
> + interval to be ready to archive.
> 
> Why do you want to delete all that text?  It seems like good text to me.

Since the exclusive method only works on a primary...

-- 
-David
da...@pgmasters.net


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


[HACKERS] Update low-level backup documentation to match actual behavior

2017-08-17 Thread David Steele
As discussed in [1] our low-level backup documentation does not quite
match the actual behavior of the functions on primary vs. standby.
Since it appears we have decided that the remaining behavioral
differences after 52f8a59dd953c68 are bugs in the documentation, the
attached is a first pass at bringing the documentation up to date.

The biggest change is to recognize that exclusive backups can only be
run on a primary and to adjust the text accordingly.  Also, I did not
mention the wait_for_archive param in the exclusive instructions since
they are deprecated.

This patch should be sufficient for 10/11 but will need some minor
changes for 9.6 to remove the reference to wait_for_archive.  Note that
this patch ignores Michael's patch [2] to create WAL history files on a
standby as this will likely only be applied to master.

In addition, I have formatted the text to produce minimal diffs for
review, but it could be tightened up before commit.

-- 
-David
da...@pgmasters.net

[1]
https://www.postgresql.org/message-id/20170814152816.GF4628%40tamriel.snowman.net
[2]
https://www.postgresql.org/message-id/CAB7nPqQvVpMsqJExSVXHUwpXFRwojsb-jb4BYnxEQbjJLfw-yQ%40mail.gmail.com
diff --git a/doc/src/sgml/backup.sgml b/doc/src/sgml/backup.sgml
index 0e7c6e2051..cfffea3919 100644
--- a/doc/src/sgml/backup.sgml
+++ b/doc/src/sgml/backup.sgml
@@ -890,7 +890,10 @@ SELECT pg_start_backup('label', false, false);
 SELECT * FROM pg_stop_backup(false, true);
 
  This terminates the backup mode and performs an automatic switch to
- the next WAL segment.  The reason for the switch is to arrange for
+ the next WAL segment when run on a primary.  On a standby you can call
+ pg_switch_wal on the primary to perform a manual
+ switch.
+ The reason for the switch is to arrange for
  the last WAL segment file written during the backup interval to be
  ready to archive.
 
@@ -908,9 +911,12 @@ SELECT * FROM pg_stop_backup(false, true);
  Once the WAL segment files active during the backup are archived, you are
  done.  The file identified by pg_stop_backup's first return
  value is the last segment that is required to form a complete set of
- backup files.  If archive_mode is enabled,
+ backup files.  On a primary, if archive_mode is enabled and 
the
+ wait_for_archive parameter is true,
  pg_stop_backup does not return until the last segment has
  been archived.
+ On a standby, archive_mode must be always in order
+ for pg_stop_backup to wait.
  Archiving of these files happens automatically since you have
  already configured archive_command. In most cases this
  happens quickly, but you are advised to monitor your archive
@@ -940,11 +946,12 @@ SELECT * FROM pg_stop_backup(false, true);
 


-Making an exclusive low level backup
+Making an exclusive low level backup on a primary
 
  The process for an exclusive backup is mostly the same as for a
  non-exclusive one, but it differs in a few key steps. It does not allow
- more than one concurrent backup to run, and there can be some issues on
+ more than one concurrent backup to run, must be run on a primary, and 
there
+ can be some issues on
  the server if it crashes during the backup. Prior to PostgreSQL 9.6, this
  was the only low-level method available, but it is now recommended that
  all users upgrade their scripts to use non-exclusive backups if possible.
@@ -1012,15 +1019,10 @@ SELECT pg_start_backup('label', true);
 
 SELECT pg_stop_backup();
 
- This function, when called on a primary, terminates the backup mode and
+ This function terminates the backup mode and
  performs an automatic switch to the next WAL segment. The reason for the
  switch is to arrange for the last WAL segment written during the backup
- interval to be ready to archive.  When called on a standby, this function
- only terminates backup mode.  A subsequent WAL segment switch will be
- needed in order to ensure that all WAL files needed to restore the backup
- can be archived; if the primary does not have sufficient write activity
- to trigger one, pg_switch_wal should be executed on
- the primary.
+ interval to be ready to archive.
 


diff --git a/doc/src/sgml/func.sgml b/doc/src/sgml/func.sgml
index b43ec30a4e..28eda97273 100644
--- a/doc/src/sgml/func.sgml
+++ b/doc/src/sgml/func.sgml
@@ -18606,7 +18606,8 @@ postgres=# select pg_start_backup('label_goes_here');

 

-The function also creates a backup history file in the write-ahead log
+When executed on a primary, the function also creates a backup history file
+in the write-ahead log
 archive area. The history file includes the label given to
 pg_start_backup, the starting and ending write-ahead log 
locations for
 the backup, and the starting and ending times of the backup.  The return

-- 
Sent via pgsql-hackers mailing 

Re: [HACKERS] pg_stop_backup(wait_for_archive := true) on standby server

2017-07-24 Thread David Steele

On 7/24/17 3:28 PM, David Steele wrote:


Yes, and this is another behavioral change to consider -- one that is 
more likely to impact users than the change to pg_stop_backup().  If 
pg_basebackup is not currently waiting for WAL on a standby (but does on 
a primary) then that's pretty serious.


My bad, before PG10 pg_basebackup did not check that WAL was archived on 
a primary *or* a standby unless the -x option was specified, as documented.


--
-David
da...@pgmasters.net


--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] pg_stop_backup(wait_for_archive := true) on standby server

2017-07-24 Thread David Steele

On 7/24/17 12:28 PM, Stephen Frost wrote:


* David Steele (da...@pgmasters.net) wrote:


While this patch brings pg_stop_backup() in line with the
documentation, it also introduces a behavioral change compared to
9.6.  Currently, the default 9.6 behavior on a standby is to return
immediately from pg_stop_backup(), but this patch will cause it to
block by default instead.  Since action on the master may be
required to unblock the process, I see this as a pretty significant
change.  I still think we should fix and backpatch, but I wanted to
point this out.


This will need to be highlighted in the release notes for 9.6.4 also,
assuming there is agreement to back-patch this, and we'll need to update
the documentation in 9.6 also to be clearer about what happens on a
standby.


Agreed.


"If the WAL segments required for this backup have not been archived
then it might be necessary to force a segment switch on the
primary."


I'm a bit on the fence regarding the distinction here for the
backup-from-standby and this errhint().  The issue is that the WAL for
the backup hasn't been archived yet and that may be because of either:

archive_command is failing
OR
No WAL is getting generated

In either case, both of these apply to the primary and the standby.  As
such, I'm inclined to try to mention both in the original errhint()
instead of making two different ones.  I'm open to suggestions on this,
of course, but my thinking would be more like:

-
Either archive_command is failing or not enough WAL has been generated
to require a segment switch.  Run pg_switch_wal() to request a WAL
switch and monitor your logs to check that your archive_command is
executing properly.
-


Yes, that seems more concise.  I like the idea of not having to maintain 
two separate hints.



And then change pg_switch_wal()'s errhint for when it's run on a replica
to be:


HINT: WAL control functions cannont be executed during recovery; they
should be executed on the primary instead.



Looks good to me.  This explanation is useful in general.


2) At backup.sgml L1015 it says that pg_stop_backup() will
automatically switch the WAL segment.  There should be a caveat here
for standby backups, like:

When called on a master it terminates the backup mode and performs
an automatic switch to the next WAL segment.  The reason for the
switch is to arrange for the last WAL segment written during the
backup interval to be ready to archive.  When called on a standby
the WAL segment switch must be performed manually on the master if
it does not happen due to normal write activity.


s/master/primary/g


Yes.


3) The fact that this fix requires "archive_mode = always" seems
like a pretty big caveat, thought I don't have any ideas on how to
make it better without big code changes.  Maybe it would be help to
change:

+ the backup is taken on a standby, pg_stop_backup waits
+ for WAL to be archived when archive_mode

To:

+ the backup is taken on a standby, pg_stop_backup waits
+ for WAL to be archived *only* when archive_mode


I'm thinking of rewording that a bit to say "When archive_mode = always"
instead, as I think that might be a little clearer.


Sure.


Perhaps this should be noted in the pg_basebackup docs as well?


That seems like it's probably a good idea too, as people might wonder
why pg_basebackup hasn't finished yet if it's waiting for WAL to be
archived.


Yes, and this is another behavioral change to consider -- one that is 
more likely to impact users than the change to pg_stop_backup().  If 
pg_basebackup is not currently waiting for WAL on a standby (but does on 
a primary) then that's pretty serious.


Any thoughts on this, Magnus?

--
-David
da...@pgmasters.net


--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] pg_stop_backup(wait_for_archive := true) on standby server

2017-07-24 Thread David Steele

On 7/23/17 11:48 PM, Masahiko Sawada wrote:

On Sat, Jul 22, 2017 at 8:04 AM, Stephen Frost  wrote:


I started discussing this with David off-list and he'd like a chance to
review it in a bit more detail (he's just returning from being gone for
a few weeks).  That review will be posted to this thread on Monday, and
I'll wait until then to move forward with the patch.


Before reviewing the patch, I would note that it looks like this issue 
was introduced in b6a323a8c before the 9.6 release.  The documentation 
states that the pg_stop_backup() function will wait for all required WAL 
to be archived, which corresponds to the default in the new patch of 
waitforarchive = true.  The commit notes that the documentation needs 
updating, but since that didn't happen I think we should consider this a 
bug in 9.6 and back patch.


While this patch brings pg_stop_backup() in line with the documentation, 
it also introduces a behavioral change compared to 9.6.  Currently, the 
default 9.6 behavior on a standby is to return immediately from 
pg_stop_backup(), but this patch will cause it to block by default 
instead.  Since action on the master may be required to unblock the 
process, I see this as a pretty significant change.  I still think we 
should fix and backpatch, but I wanted to point this out.


The patch looks sensible to me.  A few comments:

1) I would change:

"Check if the WAL segment needed for this backup have been completed, in 
which case a forced segment switch may be needed on the primary."


To something like:

"If the WAL segments required for this backup have not been archived 
then it might be necessary to force a segment switch on the primary."


2) At backup.sgml L1015 it says that pg_stop_backup() will automatically 
switch the WAL segment.  There should be a caveat here for standby 
backups, like:


When called on a master it terminates the backup mode and performs an 
automatic switch to the next WAL segment.  The reason for the switch is 
to arrange for the last WAL segment written during the backup interval 
to be ready to archive.  When called on a standby the WAL segment switch 
must be performed manually on the master if it does not happen due to 
normal write activity.


3) The fact that this fix requires "archive_mode = always" seems like a 
pretty big caveat, thought I don't have any ideas on how to make it 
better without big code changes.  Maybe it would be help to change:


+ the backup is taken on a standby, pg_stop_backup waits
+ for WAL to be archived when archive_mode

To:

+ the backup is taken on a standby, pg_stop_backup waits
+ for WAL to be archived *only* when archive_mode

Perhaps this should be noted in the pg_basebackup docs as well?

Regards,
--
-David
da...@pgmasters.net


--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Should pg_current_wal_location() become pg_current_wal_lsn()

2017-05-09 Thread David Steele

On 5/9/17 10:00 AM, Stephen Frost wrote:

* Tom Lane (t...@sss.pgh.pa.us) wrote:


#2: Rename all these functions and columns to "lsn", as in this patch.


+1


<...>


#2 strikes me as the best option, though that's probably not much of a
surprise to anyone whose been following my comments on this thread.


+1.  If anything this analysis make me more convinced it is the right 
thing to do.


If I read this correctly, we won't change the names of any functions 
that we haven't *already* changed the names of, and only one view would 
be changed to bring it into line with the rest.


--
-David
da...@pgmasters.net


--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] renaming "transaction log"

2017-05-03 Thread David Steele
On 5/2/17 9:09 PM, Peter Eisentraut wrote:
> Most documentation and error messages still uses the term "transaction
> log" to refer to the write-ahead log.  Here is a patch to rename that,
> which I think should be done, to match the xlog -> wal renaming in APIs.

+1 for the idea.

The documentation changes look good to me, but the error message changes
are a random mix of "WAL" and "write-ahead log", even when referring to
the same thing.  For example:

- errmsg("could not open transaction log directory \"%s\": %m",
+ errmsg("could not open write-ahead log directory \"%s\": %m",

and:

- fprintf(stderr, _("%s: failed to remove transaction log directory\n"),
+ fprintf(stderr, _("%s: failed to remove WAL directory\n"),

It seems like they should be one or the other.  I think "write-ahead
log" is good in the documentation since it is more explanatory, but the
error messages should either be all "write-ahead log" or all "WAL".

Personally I favor "WAL" for brevity in the error messages, but I would
be OK with "write-ahead log", too.

-- 
-David
da...@pgmasters.net


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Should pg_current_wal_location() become pg_current_wal_lsn()

2017-04-18 Thread David Steele
On 4/15/17 12:56 PM, Robert Haas wrote:
> On Fri, Apr 14, 2017 at 6:52 PM, Tom Lane  wrote:
>> Peter Eisentraut  writes:
>>> If we're talking about making things easier to understand, wouldn't a
>>> random user rather know what a WAL "location" is instead of a WAL "LSN"?
>> I wouldn't object to standardizing on "location" instead of "lsn" in the
>> related function and column names.  What I don't like is using different
>> words for the same thing.
> The case mentioned in the subject of this thread has been using the
> word "location" since time immemorial.  It's true that we've already
> renamed it (xlog -> wal) in this release, so if we want to standardize
> on lsn, now's certainly the time to do it.  I'm worried that
> pg_current_wal_lsn() is an identifier composed almost entirely of
> abbreviations and therefore possibly just as impenetrable as
> qx_current_pfq_dnr(), but maybe we should assume that LSN is a term of
> art with which knowledgeable users are required to be familiar, much
> as we are doing for "WAL".

+1, and I'm in favor of using "lsn" wherever applicable.  It seems to me
that if a user calls a function (or queries a table) that returns an lsn
then they should be aware of what an lsn is.

-- 
-David
da...@pgmasters.net



-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] snapbuild woes

2017-04-08 Thread David Steele
On 4/8/17 10:29 AM, Erik Rijkers wrote:
> On 2017-04-08 15:56, Andres Freund wrote:
>> On 2017-04-08 09:51:39 -0400, David Steele wrote:
>>> On 3/2/17 7:54 PM, Petr Jelinek wrote:
>>> >
>>> > Yes the copy patch needs rebase as well. But these ones are fine.
>>>
>>> This bug has been moved to CF 2017-07.
>>
>> FWIW, as these are bug-fixes that need to be backpatched, I do plan to
>> work on them soon.
>>
> 
> CF 2017-07 pertains to postgres 11, is that right?

In general, yes, but bugs will always be fixed as needed.  It doesn't
matter what CF they are in.

-- 
-David
da...@pgmasters.net


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


[HACKERS] 2017-03 CF Closed

2017-04-08 Thread David Steele
The commitfest has officially ended. I have pushed the remaining patches
to the next commitfest or returned then with feedback as appropriate:

Features moved to next CF:
- new plpgsql extra_checks
- Better estimate merging for duplicate vars in clausesel.c
- initdb configurable wal_segment_size
- Vacuum: allow usage of more than 1GB of work mem
- Bug in to_timestamp() [not exactly bug, should be renamed?]
- Push down more UPDATEs/DELETEs in postgres_fdw
- postgres_fdw: support parameterized foreign joins

BUGS moved to next CF:
- snapshot builder fixes
- Fix the optimization to skip WAL-logging on table created in same
transaction
- replace broken GetExistingLocalJoinPath with CreateLocalJoinPath
- Failure at replay for corrupted 2PC files + reduce window between
end-of-recovery record and history file write

Returned with Feedback:
- Indexes with Included Columns

Final stats for the commitfest:

Committed: 116
Moved to next CF: 50
Rejected: 2
Returned with Feedback: 41
Total: 209.

Thank you to all the authors, reviewers, and committers who participated!

-- 
-David
da...@pgmasters.net


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] postgres_fdw: support parameterized foreign joins

2017-04-08 Thread David Steele
On 4/7/17 9:54 AM, Arthur Zakirov wrote:
> 
> Marked the patch as "Ready for Commiter". But the patch should be
> commited only after the patch [1].

This submission has been moved to CF 2017-07.

-- 
-David
da...@pgmasters.net


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Push down more UPDATEs/DELETEs in postgres_fdw

2017-04-08 Thread David Steele
On 3/22/17 6:20 AM, Etsuro Fujita wrote:
> On 2017/02/22 19:57, Rushabh Lathia wrote:
>> Marked this as Ready for Committer.
> 
> I noticed that this item in the CF app was incorrectly marked as
> Committed.  This patch isn't committed, so I returned it to the previous
> status.  I also rebased the patch.  Attached is a new version of the patch.

This submission has been moved to CF 2017-07.

-- 
-David
da...@pgmasters.net


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Bug in to_timestamp().

2017-04-08 Thread David Steele
On 2/27/17 4:19 AM, Artur Zakirov wrote:
> On 15.02.2017 15:26, Amul Sul wrote:
>>
>> The new status of this patch is: Ready for Committer
>>
> 
> Thank you for your review!

This submission has been moved to CF 2017-07.

-- 
-David
da...@pgmasters.net


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Vacuum: allow usage of more than 1GB of work mem

2017-04-08 Thread David Steele
On 4/7/17 10:19 PM, Claudio Freire wrote:
> 
> I rebased the early free patch (patch 3) to apply on top of the v9
> patch 2 (it needed some changes). I recognize the early free patch
> didn't get nearly as much scrutiny, so I'm fine with commiting only 2
> if that one's ready to go but 3 isn't.
> 
> If it's decided to go for fixed 128M segments and a binary search of
> segments, I don't think I can get that ready and tested before the
> commitfest ends.

This submission has been moved to CF 2017-07.

-- 
-David
da...@pgmasters.net


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] increasing the default WAL segment size

2017-04-08 Thread David Steele
On 4/7/17 2:59 AM, Beena Emerson wrote:
> I ran tests and following are the details:
> 
> Machine details:
> Architecture:  ppc64le
> Byte Order:Little Endian
> CPU(s):192
> On-line CPU(s) list:   0-191
> Thread(s) per core:8
> Core(s) per socket:1
> Socket(s): 24
> NUMA node(s):  4
> Model: IBM,8286-42A
> 
> clients>  16  32   64  
>  128
> size
> 16MB  18895.63486 28799.48759 37855.39521 27968.88309
> 32MB  18313.1461  29201.44954 40733.80051 32458.74147
> 64 MB18055.73141 30875.28687 42713.54447 38009.60542
> 128MB   18234.31424 33208.65419 48604.5593  45498.27689
> 256MB19524.36498 35740.19032 54686.16898 54060.11168
> 512MB 20351.90719 37426.72174 55045.60719 56194.99349
> 1024MB   19667.67062 35696.19194 53666.60373 54353.0614
> 
> I did not get any degradation, in fact, higher values showed performance
> improvement for higher client count.

This submission has been moved to CF 2017-07.

-- 
-David
da...@pgmasters.net


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Crash on promotion when recovery.conf is renamed

2017-04-08 Thread David Steele
On 3/28/17 1:21 AM, Tsunakawa, Takayuki wrote:
> From: Michael Paquier [mailto:michael.paqu...@gmail.com]
>> Okay. I got the message, and I agree with what you say here. You are right
>> by the way, the error messages just use "two-phase file" and not "two-phase
>> STATE file", missed that previously.
>> --
> Thanks, marked as ready for committer, as the code is good and Alexander 
> reported the test success.

This bug has been moved to CF 2017-07.

-- 
-David
da...@pgmasters.net


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] WIP: Covering + unique indexes.

2017-04-08 Thread David Steele
On 4/4/17 2:47 PM, Peter Geoghegan wrote:
> 
> At the very least, you should change comments to note the issue. I
> think it's highly unlikely that this could ever result in a failure to
> find a split point, which there are many defenses against already, but
> I think I would find that difficult to prove. The intent of the code
> is almost as important as the code, at least in my opinion.

This submission as been Returned with Feedback.  Please feel free to
resubmit to a future commitfest.

-- 
-David
da...@pgmasters.net


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] postgres_fdw bug in 9.6

2017-04-08 Thread David Steele
On 4/7/17 3:24 PM, Robert Haas wrote:
> 
> There's probably more to think about here, but those are my question
> on an initial read-through.

This bug has been moved to CF 2017-07.

-- 
-David
da...@pgmasters.net


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] snapbuild woes

2017-04-08 Thread David Steele
On 3/2/17 7:54 PM, Petr Jelinek wrote:
> 
> Yes the copy patch needs rebase as well. But these ones are fine.

This bug has been moved to CF 2017-07.

-- 
-David
da...@pgmasters.net


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Making clausesel.c Smarter

2017-04-08 Thread David Steele
On 4/7/17 5:33 PM, Claudio Freire wrote:
> 
> Also, can you add a test case? Cost values could make the test
> fragile, so if that gives you trouble I'll understand, but it'd be
> best to try and test this if possible.

This submission has been moved to CF 2017-07.

-- 
-David
da...@pgmasters.net


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] plpgsql - additional extra checks

2017-04-08 Thread David Steele
> On 1/13/17 6:55 AM, Marko Tiikkaja wrote:
>> On Fri, Jan 13, 2017 at 2:46 AM, Jim Nasby > > wrote:
>>
>> On 1/11/17 5:54 AM, Pavel Stehule wrote:
>>
>> +too_many_rows
>> +
>> + 
>> +  When result is assigned to a variable by
>> INTO clause,
>> +  checks if query returns more than one row. In this case
>> the assignment
>> +  is not deterministic usually - and it can be signal some
>> issues in design.
>>
>>
>> Shouldn't this also apply to
>>
>> var := blah FROM some_table WHERE ...;
>>
>> ?
>>
>> AIUI that's one of the beefs the plpgsql2 project has.
>>
>>
>> No, not at all.  That syntax is undocumented and only works because
>> PL/PgSQL is a hack internally.  We don't use it, and frankly I don't
>> think anyone should.

This submission has been moved to CF 2017-07.

-- 
-David
da...@pgmasters.net


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] [PATCH] Warn users about duplicate configuration parameters

2017-04-07 Thread David Steele
On 4/7/17 11:22 AM, Tatsuo Ishii wrote:
>>> Recently I've discovered that if there are multiple values of the same
>>> parameter in postgresql.conf PostgreSQL will silently use the last one.
>>> It looks like not the best approach to me. For instance, user can find
>>> the first value in the config file and expect that it will be used, etc.
>>>
>>> I suggest to warn users about duplicated parameters. Here is a
>>> corresponding patch.
>>>
>>> Thoughts?
>>
>> -1 - I frequently just override earlier parameters by adding an include
>> at the end of the file.  Also, with postgresql.auto.conf it's even more
>> common to override parameters.
> 
> -1 from me too by the same reason Andres said.

-1 from me for the same reason.

-- 
-David
da...@pgmasters.net


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] increasing the default WAL segment size

2017-04-06 Thread David Steele
On 4/6/17 6:52 PM, Tomas Vondra wrote:
> On 04/06/2017 11:45 PM, David Steele wrote:
>>
>> How many people in the field are running custom builds of
>> Postgres?  And of those, how many have changed the WAL segment size?
>> I've never encountered a non-standard segment size or talked to anyone
>> who has.  I'm not saying it has *never* happened but I would venture to
>> say it's rare.
> 
> I agree it's rare, but I don't think that means we can just consider the
> option as 'unsupported'. We're even mentioning it in the docs as a valid
> way to customize granularity of the WAL archival.
> 
> I certainly know people who run custom builds, and some of them run with
> custom WAL segment size. Some of them are our customers, some are not.
> And yes, some of them actually patched the code to allow 256MB WAL
> segments.

I feel a little better knowing that *somebody* is doing it in the field.

>> Just because we don't change the default doesn't mean that others won't.
>>  I still think testing for sizes other than 16MB is severely lacking and
>> I don't believe caveat emptor is the way to go.
> 
> Aren't you mixing regression and performance testing? I agree we need to
> be sure all segment sizes are handled correctly, no argument here.

Not intentionally.  Our standard test suite is only regression as far as
I can see.  All the performance testing I've seen has been done ad hoc.

>> I don't have plans to add animals.  I think we'd need a way to tell
>> 'make check' to use a different segment size for tests and then
>> hopefully reconfigure some of the existing animals.
> 
> OK. My point was that we don't have that capability now, and the latest
> patch is not adding it either.

Agreed.

-- 
-David
da...@pgmasters.net


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] increasing the default WAL segment size

2017-04-06 Thread David Steele
On 4/6/17 5:05 PM, Tomas Vondra wrote:
> On 04/06/2017 08:33 PM, David Steele wrote:
>> On 4/5/17 7:29 AM, Simon Riggs wrote:
>>
>>> 2. It's not clear to me the advantage of being able to pick varying
>>> filesizes. I see great disadvantage in having too many options, which
>>> greatly increases the chance of incompatibility, annoyance and
>>> breakage. I favour a small number of values that have been shown by
>>> testing to be sweet spots in performance and usability. (1GB has been
>>> suggested)
>>
>> I'm in favor of 16,64,256,1024.
>>
> 
> I don't see a particular reason for this, TBH. The sweet spots will be
> likely dependent hardware / OS configuration etc. Assuming there
> actually are sweet spots - no one demonstrated that yet.

Fair enough, but my feeling is that this patch has never been about
server performance, per se.  Rather, is is about archive management and
trying to stem the tide of WAL as servers get bigger and busier.
Generally, archive commands have to make a remote connection to offload
WAL and that has a cost per segment.

> Also, I don't see how supporting additional WAL sizes increases chance
> of incompatibility. We already allow that, so either the tools (e.g.
> backup solutions) assume WAL segments are always 16MB (in which case are
> essentially broken) or support valid file sizes (in which case they
> should have no issues with the new ones).

I don't see how a compile-time option counts as "supporting that" in
practice.  How many people in the field are running custom builds of
Postgres?  And of those, how many have changed the WAL segment size?
I've never encountered a non-standard segment size or talked to anyone
who has.  I'm not saying it has *never* happened but I would venture to
say it's rare.

> If we're going to do this, I'm in favor of deciding some reasonable
> upper limit (say, 1GB or 2GB sounds good), and allowing all 2^n values
> up to that limit.

I'm OK with that.  I'm also OK with providing a few reasonable choices.
I guess that means I'll just go with the majority opinion.

>>> 3. New file allocation has been a problem raised with this patch for
>>> some months now.
>>
>> I've been playing around with this and I don't think short tests show
>> larger sizes off to advantage.  Larger segments will definitely perform
>> more poorly until Postgres starts recycling WAL.  Once that happens I
>> think performance differences should be negligible, though of course
>> this needs to be verified with longer-running tests.
>>
> I'm willing to do some extensive performance testing on the patch. I
> don't see how that could happen in the next few day (before the feature
> freeze), particularly considering we're interested in long tests.

Cool.  I've been thinking about how to do some meaningful tests for this
(mostly pgbench related).  I'd like to hear what you are thinking.

> The question however is whether we need to do this testing when we don't
> actually change the default (at least the patch submitted on 3/27 does
> seem to keep the 16MB). I assume people specifying a custom value when
> calling initdb are expected to know what they are doing (and I don't see
> how we can prevent distros from choosing a bad value in their packages -
> they could already do that with configure-time option).

Just because we don't change the default doesn't mean that others won't.
 I still think testing for sizes other than 16MB is severely lacking and
I don't believe caveat emptor is the way to go.

> Do we actually have any infrastructure for that? Or do you plan to add
> some new animals with different WAL segment sizes?

I don't have plans to add animals.  I think we'd need a way to tell
'make check' to use a different segment size for tests and then
hopefully reconfigure some of the existing animals.

-- 
-David
da...@pgmasters.net


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] increasing the default WAL segment size

2017-04-06 Thread David Steele
On 4/5/17 7:29 AM, Simon Riggs wrote:
> On 5 April 2017 at 06:04, Beena Emerson  wrote:
>>
>> The  WALfilename - LSN mapping disruption for higher values you mean? Is
>> there anything else I have missed?
> 
> I see various issues raised but not properly addressed
> 
> 1. we would need to drop support for segment sizes < 16MB unless we
> adopt a new incompatible filename format.
> I think at 16MB the naming should be the same as now and that
> WALfilename -> LSN is very important.
> For this release, I think we should just allow >= 16MB and avoid the
> issue thru lack of time.

+1.

> 2. It's not clear to me the advantage of being able to pick varying
> filesizes. I see great disadvantage in having too many options, which
> greatly increases the chance of incompatibility, annoyance and
> breakage. I favour a small number of values that have been shown by
> testing to be sweet spots in performance and usability. (1GB has been
> suggested)

I'm in favor of 16,64,256,1024.

> 3. New file allocation has been a problem raised with this patch for
> some months now.

I've been playing around with this and I don't think short tests show
larger sizes off to advantage.  Larger segments will definitely perform
more poorly until Postgres starts recycling WAL.  Once that happens I
think performance differences should be negligible, though of course
this needs to be verified with longer-running tests.

If archive_timeout is set then there will also be additional time taken
to zero out potentially larger unused parts of the segment.  I don't see
this as an issue, however, because if archive_timeout is being triggered
then the system is very likely under lower than usual load.

> Lack of clarity and/or movement on these issues is very, very close to
> getting the patch rejected now. Let's not approach this with the
> viewpoint that I or others want it to be rejected, lets work forwards
> and get some solid changes that will improve the world without causing
> problems.

I would definitely like to see this go in, though I agree with Peter
that we need a lot more testing.  I'd like to see some build farm
animals testing with different sizes at the very least, even if there's
no time to add new tests.

-- 
-David
da...@pgmasters.net


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Re: PATCH: pageinspect / add page_checksum and bt_page_items(bytea)

2017-04-04 Thread David Steele
On 4/4/17 12:55 PM, Ashutosh Sharma wrote:
> 
> As I am not seeing any response from Tomas for last 2-3 days and since
> the commit-fest is coming towards end, I have planned to work on the
> review comments that I had given few days back and submit the updated
> patch. PFA new version of patch that takes care of all the review
> comments given by me. I have also ran pgindent on btreefuncs.c file
> and have run some basic test cases. All looks fine to me now!

Awesome, Ashutosh!

-- 
-David
da...@pgmasters.net


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] PATCH: Make pg_stop_backup() archive wait optional

2017-04-04 Thread David Steele
On 4/4/17 11:42 AM, Stephen Frost wrote:
> * David Steele (da...@pgmasters.net) wrote:
>> On 3/22/17 4:42 PM, Peter Eisentraut wrote:
>>> On 3/22/17 15:14, Stephen Frost wrote:
>>>>> -SELECT * FROM pg_stop_backup(false);
>>>>> +SELECT * FROM pg_stop_backup(false [, true ]);
>>>>>
>>>>> I think that it's better to get rid of "[" and "]" from the above because
>>>>> IMO this should be the command example that users actually can run.
>>>> Using the '[' and ']' are how all of the optional arguments are
>>>> specified in the documentation, see things like current_setting() in our
>>>> existing documentation:
>>>
>>> In the synopsis, but not in concrete examples.
>>
>> Wasn't quite sure what the protocol was in this case, and figured it
>> was easier to subtract than to add.
> 
> Forgot to close this out sooner, apologies.
> 
> This has been committed now.

Thank you, Stephen!

-- 
-David
da...@pgmasters.net



signature.asc
Description: OpenPGP digital signature


Re: [HACKERS] tuplesort_gettuple_common() and *should_free argument

2017-04-04 Thread David Steele
Hi Anastasia,

On 3/13/17 9:14 PM, Peter Geoghegan wrote:
> On Wed, Jan 25, 2017 at 3:11 PM, Peter Geoghegan  wrote:
>> On Wed, Jan 25, 2017 at 3:11 PM, Tom Lane  wrote:
>>> Please.  You might want to hit the existing ones with a separate patch,
>>> but it doesn't much matter; I'd be just as happy with a patch that did
>>> both things.
>>
>> Got it.
> 
> Attached is a patch that does both things at once. The internal
> function tuplesort_gettuple_common() still mentions repeated fetches,
> since that context matters to its internal callers. The various
> external-facing functions have a simpler, stricter contract, as
> discussed.
> 
> I didn't end up adding a line like "copy=FALSE is recommended only
> when the next tuplesort manipulation will be another
> tuplesort_gettupleslot fetch into the same slot", which you suggested.
> When the tuplesort state machine is in any state following
> "performing" a sort, there are very few remaining sane manipulations.
> Just things like skipping or seeking around for tuples, and actually
> ending the tuplesort and releasing resources.

You are signed up to review this patch.  Do you know when you'll have a
chance to do that?

Thanks,
-- 
-David
da...@pgmasters.net


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Re: PATCH: pageinspect / add page_checksum and bt_page_items(bytea)

2017-04-04 Thread David Steele
On 4/4/17 9:43 AM, Robert Haas wrote:
> On Tue, Apr 4, 2017 at 9:32 AM, David Steele <da...@pgmasters.net> wrote:
>> My goal is to help people focus on patches that have a chance.  At this
>> point I think that includes poking authors who are not being responsive
>> using the limited means at my disposal.
> 
> +1.  Pings on specific threads can help clear things up when, for
> example, the author and reviewer are each waiting for the other.  And,
> as you say, they also help avoid the situation where a patch just
> falls off the radar and misses the release for no especially good
> reason, which naturally causes people frustration.
> 
> I think your pings have been quite helpful, although I think it would
> have been better in some cases if you'd done them sooner.  Pinging
> after a week, with a 3 day deadline, when there are only a few days
> left in the CommitFest isn't really leaving a lot of room to maneuver.

Thanks for the feedback!  My thinking is that I don't want to bug people
too soon, but there's a maximum number of days a thread should be idle.
Over the course of the CF that has gone from 10 days, to 7, 5, and 3.

I don't look at all patches every day so it can be a bit uneven, i.e.,
all patches are allowed certain amount of idle time, but some may get a
bit more depending on when I check up on them.

Thanks,
-- 
-David
da...@pgmasters.net


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Re: PATCH: pageinspect / add page_checksum and bt_page_items(bytea)

2017-04-04 Thread David Steele
On 4/4/17 9:11 AM, Simon Riggs wrote:
> On 4 April 2017 at 09:05, David Steele <da...@pgmasters.net> wrote:
>> Hi Tomas,
>>
>> On 4/1/17 5:40 AM, Ashutosh Sharma wrote:
>>>
>>> Apart from above comments, your patch looks good to me. I have also
>>> marked this patch as 'Waiting for Author' in the commitfest. Thanks.
>>
>> The CF has been extended until April 7 but time is still growing short
>> and this thread has been idle for three days.  Please respond with a new
>> patch by 2017-04-05 00:00 AoE (UTC-12) or this submission will be marked
>> "Returned with Feedback".
> 
> We know time is growing short. Is it necessary to say that on every
> single thread? What benefit do we get from that?

Hackers has become such a fire hose that I don't take it for granted
that all messages are read by everyone.  This may be news to some
authors, especially those that have not responded on a thread since
March 31.

> We voted to extend the time until the deadline. In some cases we may
> well take it all the way to the deadline, so fighting you or others
> trying to reject things *before* the deadline *because* of the
> deadline seems weird.

My goal is to help people focus on patches that have a chance.  At this
point I think that includes poking authors who are not being responsive
using the limited means at my disposal.

> How about we just leave everything until the deadline, then apply the
> sword swiftly to anything that remains?

I didn't take the extension to mean that I should stop CF management.
However, If you think I have exceeded my mandate over the course of the
CF, then I'm happy to discuss that.

Regards,
-- 
-David
da...@pgmasters.net


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] New CORRESPONDING clause design

2017-04-04 Thread David Steele
On 4/1/17 1:54 PM, Tom Lane wrote:
> 
> I'll set this back to Waiting on Author, but I think the chances of
> getting to a committable patch before the end of the commitfest are
> about nil.

I think this is especially true now that another three days have passed.

This submission has been marked "Returned with Feedback".  Please feel
free to resubmit to a future commitfest.

Regards,
-- 
-David
da...@pgmasters.net


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


[HACKERS] Re: PATCH: pageinspect / add page_checksum and bt_page_items(bytea)

2017-04-04 Thread David Steele
Hi Tomas,

On 4/1/17 5:40 AM, Ashutosh Sharma wrote:
> 
> Apart from above comments, your patch looks good to me. I have also
> marked this patch as 'Waiting for Author' in the commitfest. Thanks.

The CF has been extended until April 7 but time is still growing short
and this thread has been idle for three days.  Please respond with a new
patch by 2017-04-05 00:00 AoE (UTC-12) or this submission will be marked
"Returned with Feedback".

Thank,
-- 
-David
da...@pgmasters.net


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Should we cacheline align PGXACT?

2017-04-04 Thread David Steele
On 4/4/17 8:55 AM, Alexander Korotkov wrote:
> On Mon, Apr 3, 2017 at 9:58 PM, Andres Freund  
> I'm inclined to push this to the next CF, it seems we need a lot more
> benchmarking here.
> 
> 
> No objections.

This submission has been moved to CF 2017-07.

Thanks,
-- 
-David
da...@pgmasters.net


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Adding support for Default partition in partitioning

2017-04-04 Thread David Steele
On 3/31/17 10:45 AM, David Steele wrote:
> On 3/29/17 8:13 AM, Rahila Syed wrote:
> 
>> Thanks for reporting. I have identified the problem and have a fix.
>> Currently working on allowing
>> adding a partition after default partition if the default partition does
>> not have any conflicting rows.
>> Will update the patch with both of these.
> 
> The CF has been extended but until April 7 but time is still growing
> short.  Please respond with a new patch by 2017-04-04 00:00 AoE (UTC-12)
> or this submission will be marked "Returned with Feedback".

This submission has been marked "Returned with Feedback".  Please feel
free to resubmit to a future commitfest.

Regards,
-- 
-David
da...@pgmasters.net


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] postgres_fdw: support parameterized foreign joins

2017-03-31 Thread David Steele

Hi Arthur.

On 3/23/17 8:45 AM, Etsuro Fujita wrote:

On 2017/03/21 18:38, Etsuro Fujita wrote:

On 2017/03/16 22:23, Arthur Zakirov wrote:

Can you rebase the patch? It is not applied now.



Ok, will do.  Thanks for the report!


Done.  Also, I added regression tests and revised code and comments a
bit.  (As for create_foreignscan_path(), I haven't done anything about
that yet.)  Please find attached a new version created on top of [1].


Are you planning to review this patch?

Thanks,
--
-David
da...@pgmasters.net


--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] [PATCH] Transaction traceability - txid_status(bigint)

2017-03-31 Thread David Steele

On 3/31/17 10:46 AM, Craig Ringer wrote:


Patches 1 and 2 were the key parts and thanks to Robert's helpful
review, advice and edits they're committed now.

Committed, done. Yay.


Excellent.  I have marked this a "Committed" by Robert.

One down...

--
-David
da...@pgmasters.net


--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] PATCH: Batch/pipelining support for libpq

2017-03-31 Thread David Steele

Hi,

On 3/30/17 2:12 PM, Daniel Verite wrote:

Vaishnavi Prabakaran wrote:


Hmm, With batch mode, after sending COPY command to server(and server
started processing the query and goes into COPY state) , client does not
immediately read the result , but it keeps sending other queries to the
server. By this time, server already encountered the error
scenario(Receiving different message during COPY state) and sent error
messages


IOW, the test intentionally violates the protocol and then all goes wonky
because of that.
That's why I was wondering upthread what's it's supposed to test.
I mean, regression tests are meant to warn against a desirable behavior
being unknowingly changed by new code into an undesirable behavior.
Here we have the undesirable behavior to start with.
What kind of regression could we fear from that?


The CF has been extended until April 7 but time is still growing short. 
Please respond with a new patch by 2017-04-04 00:00 AoE (UTC-12) or this 
submission will be marked "Returned with Feedback".


Thanks,
--
-David
da...@pgmasters.net


--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


[HACKERS] Re: PATCH: pageinspect / add page_checksum and bt_page_items(bytea)

2017-03-31 Thread David Steele

On 3/29/17 11:08 AM, Tomas Vondra wrote:


The attached patch is the best I came up with - it essentially shares
just the tuple-forming part, which is exactly the same in both cases.


I have marked this submission "Needs Review".

--
-David
da...@pgmasters.net


--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Adding support for Default partition in partitioning

2017-03-31 Thread David Steele

On 3/29/17 8:13 AM, Rahila Syed wrote:


Thanks for reporting. I have identified the problem and have a fix.
Currently working on allowing
adding a partition after default partition if the default partition does
not have any conflicting rows.
Will update the patch with both of these.


The CF has been extended but until April 7 but time is still growing 
short.  Please respond with a new patch by 2017-04-04 00:00 AoE (UTC-12) 
or this submission will be marked "Returned with Feedback".


Thanks,
--
-David
da...@pgmasters.net


--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] BRIN cost estimate

2017-03-31 Thread David Steele

On 3/26/17 7:44 AM, Emre Hasegeli wrote:

If we want to have a variable which stores the number of ranges, then
I think numRanges is better than numBlocks. I can't imagine many
people would disagree there.


I renamed it with other two.


At the very least please write a comment to explain this in the code.
Right now it looks broken. If I noticed this then one day in the
future someone else will. If you write a comment then person of the
future will likely read it, and then not raise any questions about the
otherwise questionable code.


I added a sentence about it.


I do strongly agree that the estimates need improved here. I've
personally had issues with bad brin estimates before, and I'd like to
see it improved. I think the patch is not quite complete without it
also considering stats on expression indexes. If you have time to go
do that I'd suggest you go ahead with that.


I copy-pasted expression index support from btcostestimate() as well,
and extended the regression test.

I think this function can use more polishing before committing, but I
don't know where to begin.  There are single functions for every
access method in here.  I don't like them being in there to begin
with.  selfuncs.s is quite long with all kinds of dependencies and
dependents.  I think it would be better to have the access method
selectivity estimation functions under src/access.  Maybe we should
start doing so by moving this to src/access/brin/brin_selfuncs.c.
What do you think?


I have marked this patch "Needs Review".

--
-David
da...@pgmasters.net


--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] [PATCH] Transaction traceability - txid_status(bigint)

2017-03-31 Thread David Steele

On 3/25/17 12:12 AM, Peter Eisentraut wrote:

I'm wondering if this is a perl version/platform issue around

$tx->pump until $stdout =~ /[[:digit:]]+[\r\n]$/;

where we're not recognising the required output from psql when we get it.

What's in src/test/recovery/tmp_check/log/regress_log_011* ?

I couldn't use PostgresNode::psql or PostgresNode::safe_psql here
because the session must stay open.


The problem was that psql needs to be called with -X.  Fix committed.


It's not clear to me what remains to be done on this patch.  I feel, at 
the least, that patches 3 and 4 need to be rebased and the status set 
back to "Needs Review".


This thread has been idle for six days.  Please respond with a new patch 
by 2017-04-04 00:00 AoE (UTC-12) or this submission will be marked 
"Returned with Feedback".


--
-David
da...@pgmasters.net


--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] GUC for cleanup indexes threshold.

2017-03-31 Thread David Steele

On 3/29/17 2:23 AM, Masahiko Sawada wrote:

On Wed, Mar 29, 2017 at 12:23 AM, David Steele <da...@pgmasters.net> wrote:

On 3/23/17 1:54 AM, Masahiko Sawada wrote:


On Wed, Mar 15, 2017 at 7:51 AM, Peter Geoghegan <p...@bowt.ie> wrote:


On Tue, Mar 14, 2017 at 3:10 PM, Peter Geoghegan <p...@bowt.ie> wrote:


We already have BTPageOpaqueData.btpo, a union whose contained type
varies based on the page being dead. We could just do the same with
some other field in that struct, and then store epoch there. Clearly
nobody really cares about most data that remains on the page. Index
scans just need to be able to land on it to determine that it's dead,
and VACUUM needs to be able to determine whether or not there could
possibly be such an index scan at the time it considers recycling..



ISTM that we need all of the fields within BTPageOpaqueData even for
dead pages, actually. The left links and right links still need to be
sane, and the flag bits are needed. Plus, the field that stores an XID
already is clearly necessary. Even if they weren't needed, it would
probably still be a good idea to keep them around for forensic
purposes. However, the page header field pd_prune_xid is currently
unused for indexes, and is the same width as CheckPoint.nextXidEpoch
(the extra thing we might want to store -- the epoch).

Maybe you could store the epoch within that field when B-Tree VACUUM
deletes a page, and then compare that within _bt_page_recyclable(). It
would come before the existing XID comparison in that function. One
nice thing about this idea is that pd_prune_xid will be all-zero for
index pages from the current format, so there is no need to take
special care to make sure that databases that have undergone
pg_upgrade don't break.



Thank you for the suggestion!
If we store the poch within union field, I think we will not be able
to use BTPageOpaqueData.btpo.xact at the same time. Since comparing
btpo.xact is still necessary to determine if that page is recyclable
we cannot store the epoch into the same union field. And if we store
it into BTPageOpaqueData, it would break disk compatibility.



I have marked this patch "Waiting for Author".

This thread has been idle for five days.  Please respond with a new patch by
2017-03-30 00:00 AoE (UTC-12) or this submission will be marked "Returned
with Feedback".



I was thinking that the status of this patch is still "Needs review"
because I sent latest version patch[1]. We're still under discussion
how safely we can skip to the cleanup vacuum on btree index. That
patch has some restrictions but it would be good for first step.


I've marked this patch as "Needs Review".  Feel free to do that on your 
own if you think I've made a mistake in classifying a patch.


My view is that the patch is stalled and something might be required on 
your part to get it moving again, perhaps trying another approach.


--
-David
da...@pgmasters.net


--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Supporting huge pages on Windows

2017-03-28 Thread David Steele

On 3/24/17 12:51 PM, Ashutosh Sharma wrote:

Hi,

On Fri, Mar 24, 2017 at 9:00 PM, David Steele <da...@pgmasters.net> wrote:

Hi Ashutosh,

On 3/22/17 8:52 AM, Amit Kapila wrote:


On Wed, Mar 22, 2017 at 12:07 AM, David Steele <da...@pgmasters.net>
wrote:



Amit, Magnus, you are signed up as reviewers for this patch.  Do you know
when you'll have a chance to take a look?



I have provided my feedback and I could not test it on my machine.  I
think Ashutosh Sharma has tested it.  I can give it another look, but
not sure if it helps.



I know you are not signed up as a reviewer, but have you take a look at this
patch?


Well, I had a quick look into the patch just because I wanted the test
it as I am having windows setup. But, I never looked into the patch
from reviewer's perspective. The intention was to reverify the test
results shared by Tsunawaka.


Thanks, Ashutosh.

It's not clear to me what state this patch should be in.  Is there more 
review that needs to be done or is it ready for a committer?


--
-David
da...@pgmasters.net


--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] WIP: [[Parallel] Shared] Hash

2017-03-28 Thread David Steele

Hi Thomas,

On 3/28/17 1:41 AM, Rafia Sabih wrote:

On Mon, Mar 27, 2017 at 12:20 PM, Thomas Munro


I thought this last point about Windows might be fatal to my design,
but it seems that Windows since at least version 2000 has support for
Unixoid unlinkability via the special flag FILE_SHARE_DELETE.


<...>



Please find the attached file for the explain analyse output of these
queries on head as well as patch.
Would be working on analysing the performance of this patch on 300 scale factor.


I have marked this submission "Waiting for Author".  A new patch is 
needed to address Andres' comments and you should have a look at Rafia's 
results.


Thanks,
--
-David
da...@pgmasters.net


--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Patch: Write Amplification Reduction Method (WARM)

2017-03-28 Thread David Steele

Hi Pavan,

On 3/28/17 11:04 AM, Robert Haas wrote:

On Mon, Mar 27, 2017 at 10:25 PM, Pavan Deolasee
 wrote:

On Tue, Mar 28, 2017 at 1:59 AM, Robert Haas  wrote:

On Thu, Mar 23, 2017 at 2:47 PM, Pavan Deolasee
 wrote:

It's quite hard to say that until we see many more benchmarks. As author
of
the patch, I might have got repetitive with my benchmarks. But I've seen
over 50% improvement in TPS even without chain conversion (6 indexes on
a 12
column table test).


This seems quite mystifying.  What can account for such a large
performance difference in such a pessimal scenario?  It seems to me
that without chain conversion, WARM can only apply to each row once
and therefore no sustained performance improvement should be possible
-- unless rows are regularly being moved to new blocks, in which case
those updates would "reset" the ability to again perform an update.
However, one would hope that most updates get done within a single
block, so that the row-moves-to-new-block case wouldn't happen very
often.


I think you're confusing between update chains that stay within a block vs
HOT/WARM chains. Even when the entire update chain stays within a block, it
can be made up of multiple HOT/WARM chains and each of these chains offer
ability to do one WARM update. So even without chain conversion, every
alternate update will be a WARM update. So the gains are perpetual.


You're right, I had overlooked that.  But then I'm confused: how does
the chain conversion stuff help as much as it does?  You said that you
got a 50% improvement from WARM, because we got to skip half the index
updates.  But then you said with chain conversion you got an
improvement of more like 100%.  However, I would think that on this
workload, chain conversion shouldn't save much.  If you're sweeping
through the database constantly performing updates, the updates ought
to be a lot more frequent than the vacuums.

No?


It appears that a patch is required to address Amit's review.  I have 
marked this as "Waiting for Author".


Thanks,
--
-David
da...@pgmasters.net


--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] GUC for cleanup indexes threshold.

2017-03-28 Thread David Steele

On 3/23/17 1:54 AM, Masahiko Sawada wrote:

On Wed, Mar 15, 2017 at 7:51 AM, Peter Geoghegan  wrote:

On Tue, Mar 14, 2017 at 3:10 PM, Peter Geoghegan  wrote:

We already have BTPageOpaqueData.btpo, a union whose contained type
varies based on the page being dead. We could just do the same with
some other field in that struct, and then store epoch there. Clearly
nobody really cares about most data that remains on the page. Index
scans just need to be able to land on it to determine that it's dead,
and VACUUM needs to be able to determine whether or not there could
possibly be such an index scan at the time it considers recycling..


ISTM that we need all of the fields within BTPageOpaqueData even for
dead pages, actually. The left links and right links still need to be
sane, and the flag bits are needed. Plus, the field that stores an XID
already is clearly necessary. Even if they weren't needed, it would
probably still be a good idea to keep them around for forensic
purposes. However, the page header field pd_prune_xid is currently
unused for indexes, and is the same width as CheckPoint.nextXidEpoch
(the extra thing we might want to store -- the epoch).

Maybe you could store the epoch within that field when B-Tree VACUUM
deletes a page, and then compare that within _bt_page_recyclable(). It
would come before the existing XID comparison in that function. One
nice thing about this idea is that pd_prune_xid will be all-zero for
index pages from the current format, so there is no need to take
special care to make sure that databases that have undergone
pg_upgrade don't break.



Thank you for the suggestion!
If we store the poch within union field, I think we will not be able
to use BTPageOpaqueData.btpo.xact at the same time. Since comparing
btpo.xact is still necessary to determine if that page is recyclable
we cannot store the epoch into the same union field. And if we store
it into BTPageOpaqueData, it would break disk compatibility.


I have marked this patch "Waiting for Author".

This thread has been idle for five days.  Please respond with a new 
patch by 2017-03-30 00:00 AoE (UTC-12) or this submission will be marked 
"Returned with Feedback".


Thanks,
--
-David
da...@pgmasters.net


--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


[HACKERS] Re: PATCH: pageinspect / add page_checksum and bt_page_items(bytea)

2017-03-28 Thread David Steele

On 3/23/17 11:27 PM, Peter Eisentraut wrote:

On 3/17/17 18:35, Tomas Vondra wrote:

On 03/17/2017 05:23 PM, Peter Eisentraut wrote:

I'm struggling to find a good way to share code between
bt_page_items(text, int4) and bt_page_items(bytea).

If we do it via the SQL route, as I had suggested, it makes the
extension non-relocatable, and it will also create a bit of a mess
during upgrades.

If doing it in C, it will be a bit tricky to pass the SRF context
around.  There is no "DirectFunctionCall within SRF context", AFAICT.


Not sure what it has to do with DirectFunctionCall? You want to call the
bytea variant from the existing one? Wouldn't it be easier to simply
define a static function with the shared parts, and pass around the
fctx/fcinfo? Not quite pretty, but should work.


Perhaps what was added in

would actually work here.


This thread has been idle for five days.  Please respond with a new 
patch by 2017-03-30 00:00 AoE (UTC-12) or this submission will be marked 
"Returned with Feedback".


--
-David
da...@pgmasters.net


--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] [WIP] RE: DECLARE STATEMENT setting up a connection in ECPG

2017-03-28 Thread David Steele

Hi Takeshi,

On 3/23/17 1:33 AM, Haribabu Kommi wrote:


The test patch looks good to me.


This thread has been idle for five days.  Please respond with a new 
patch by 2017-03-30 00:00 AoE (UTC-12) or this submission will be marked 
"Returned with Feedback".


--
-David
da...@pgmasters.net


--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] [PATCH] Incremental sort

2017-03-28 Thread David Steele

Hi Alexander,

On 3/20/17 10:19 AM, Heikki Linnakangas wrote:

On 03/20/2017 11:33 AM, Alexander Korotkov wrote:

Please, find rebased patch in the attachment.


I had a quick look at this.


<...>


According to 'perf', 85% of the CPU time is spent in ExecCopySlot(). To
alleviate that, it might be worthwhile to add a special case for when
the group contains exactly one group, and not put the tuple to the
tuplesort in that case. Or if we cannot ensure that the Incremental Sort
is actually faster, the cost model should probably be smarter, to avoid
picking an incremental sort when it's not a win.


This thread has been idle for over a week.  Please respond with a new 
patch by 2017-03-30 00:00 AoE (UTC-12) or this submission will be marked 
"Returned with Feedback".


--
-David
da...@pgmasters.net


--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Index usage for elem-contained-by-const-range clauses

2017-03-28 Thread David Steele

On 3/24/17 10:50 AM, David Steele wrote:

Hi Pritam,

On 3/17/17 5:41 PM, Pritam Baral wrote:


So sorry. I'm attaching the correct version of the original with this,
in case you want to test the limited implementation, because I still
have to go through Tom's list of suggestions.

BTW, the patch is for applying on top of REL9_6_2, and while I
suspect it may work on master too, I haven't tested it since the
original submission (Feb 23).


Also, I noticed that patch haven't regression tests.  Some mention of
this optimization in docs is also nice to have.  > > -- >
Alexander Korotkov > Postgres Professional: http://www.postgrespro.com
> The Russian Postgres Company


This thread has been idle for a week.  Please respond and/or post a new
patch by 2017-03-28 00:00 AoE (UTC-12) or this submission will be marked
"Returned with Feedback".


This submission has been marked "Returned with Feedback".  Please feel 
free to resubmit to a future commitfest.


Regards,
--
-David
da...@pgmasters.net


--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] patch proposal

2017-03-27 Thread David Steele
On 3/26/17 7:34 PM, Venkata B Nagothi wrote:
> Hi David, 
> 
> On Thu, Mar 23, 2017 at 4:21 AM, David Steele <da...@pgmasters.net
> <mailto:da...@pgmasters.net>> wrote:
> 
> On 3/21/17 8:45 PM, Venkata B Nagothi wrote:
> 
>     On Tue, Mar 21, 2017 at 8:46 AM, David Steele
> <da...@pgmasters.net <mailto:da...@pgmasters.net>
> 
> Unfortunately, I don't think the first patch
> (recoveryStartPoint)
> will work as currently implemented.  The problem I see is
> that the
> new function recoveryStartsHere() depends on pg_control
> containing a
> checkpoint right at the end of the backup.  There's no guarantee
> that this is true, even if pg_control is copied last.  That
> means a
> time, lsn, or xid that occurs somewhere in the middle of the
> backup
> can be selected without complaint from this code depending
> on timing.
> 
> 
> Yes, that is true.  The latest best position, checkpoint
> position, xid
> and timestamp of the restored backup of the backup is shown up
> in the
> pg_controldata, which means, that is the position from which the
> recovery would start.
> 
> 
> Backup recovery starts from the checkpoint in the backup_label, not
> from the checkpoint in pg_control.  The original checkpoint that
> started the backup is generally overwritten in pg_control by the end
> of the backup.
> 
> 
> Yes, I totally agree. My initial intention was to compare the recovery
> target position(s) with the contents in the backup_label, but, then, the
> checks would fails if the recovery is performed without the backup_label
> file. Then, i decided to compare the recovery target positions with the
> contents in the pg_control file.
> 
> 
> Which in-turn means, WALs start getting replayed
> from that position towards --> minimum recovery position (which
> is the
> end backup, which again means applying WALs generated between
> start and
> the end backup) all the way through to  --> recovery target
> position.
> 
> 
> minRecoveryPoint is only used when recovering a backup that was made
> from a standby.  For backups taken on the master, the backup end WAL
> record is used.
> 
> The best start position to check with would the position shown
> up in the
> pg_control file, which is way much better compared to the current
> postgresql behaviour.
> 
> 
> I don't agree, for the reasons given previously.
> 
> 
> As explained above, my intention was to ensure that the recovery start
> positions checks are successfully performed irrespective of the presence
> of the backup_label file.
> 
> I did some analysis before deciding to use pg_controldata's output
> instead of backup_label file contents.
> 
> Comparing the output of the pg_controldata with the contents of
> backup_label contents.
> 
> *Recovery Target LSN*
> 
> START WAL LOCATION (which is 0/9C28) in the backup_label =
> pg_controldata's latest checkpoint's REDO location (Latest
> checkpoint's REDO location:  0/9C28)
> 
> *Recovery Target TIME*
> 
> backup start time in the backup_label (START TIME: 2017-03-26
> 11:55:46 AEDT) = pg_controldata's latest checkpoint time (Time of
> latest checkpoint :  Sun 26 Mar 2017 11:55:46 AM AEDT)
> 
> *Recovery Target XID*
> 
> To begin with backup_label does contain any start XID. So, the only
> option is to depend on pg_controldata's output. 
> After a few quick tests and thorough observation, i do notice that,
> the pg_control file information is copied as it is to the backup
> location at the pg_start_backup. I performed some quick tests by
> running few transactions between pg_start_backup and pg_stop_backup.
> So, i believe, this is ideal start point for WAL replay.
> 
> Am i missing anything here ?

You are making assumptions about the contents of pg_control vs.
backup_label based on trivial tests.  With PG defaults, the backup must
run about five minutes before the values in pg_control and backup_label
will diverge.  Even if pg_control and backup_label do match, those are
the wrong values to use, and will get more incorrect the longer the
backup runs.

I believe there is a correct way to go about this, at least for time and
LSN, and I don't think your very approximate solution will pass muster
with a committer.

Since we are nearly at the end of the CF, I have marked this submission
"Returned with Feedback".

-- 
-David
da...@pgmasters.net


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] increasing the default WAL segment size

2017-03-24 Thread David Steele

Hi Robert,

On 3/24/17 3:00 PM, Robert Haas wrote:

On Wed, Mar 22, 2017 at 6:05 PM, David Steele <da...@pgmasters.net> wrote:

Wait, really?  I thought you abandoned this approach because there's
then no principled way to handle WAL segments of less than the default
size.


I did say that, but I thought I had hit on a compromise.

But, as I originally pointed out the hex characters in the filename are not
aligned correctly for > 8 bits (< 16MB segments) and using different
alignments just made it less consistent.


I don't think I understand what the compromise is.  Are you saying we
should have one rule for segments < 16MB and another rule for segments

16MB?  I think using two different rules for file naming depending

on the segment size will be a negative for both tool authors and
ordinary users.


Sorry for the confusion, I meant to say that if we want to keep LSNs in 
the filenames and not change alignment for the current default, then we 
would need to drop support for segment sizes < 16MB (more or less what I 
said below).  Bad editing on my part.



It would be OK if we were willing to drop the 1,2,4,8 segment sizes because
then the alignment would make sense and not change the current 16MB
sequence.


Well, that is true.  But the thing I'm trying to do here is to keep
this patch down to what actually needs to be changed in order to
accomplish the original purchase, not squeeze more and more changes
into it.


Attached is a patch to be applied on top of Beena's v8 patch that 
preserves LSNs in the file naming for all segment sizes.  It's not quite 
complete because it doesn't modify the lower size limit everywhere, but 
I think it's enough so you can see what I'm getting at.  This passes 
check-world and I've poked at it in other segment sizes as well manually.


Behavior for the current default of 16MB is unchanged, and all other 
sizes go through a logical progression.


1GB:
00010040
00010080
000100C0
00010001

256GB:

00010010
00010020
00010030
...
000100E0
000100F0
00010001

64GB:

000100010004
000100010008
00010001000C
...
0001000100F8
0001000100FC
00010001

I believe that maintaining an easy correspondence between LSN and 
filename is important.  The cluster will not always be up to help with 
these calculations and tools that do the job may not be present or may 
have issues.


I'm happy to merge this with Beena's patch (and tidy my patch up) if 
this looks like an improvement to everyone.


--
-David
da...@pgmasters.net
diff --git a/src/include/access/xlog_internal.h 
b/src/include/access/xlog_internal.h
index e3f616a..08d6494 100644
--- a/src/include/access/xlog_internal.h
+++ b/src/include/access/xlog_internal.h
@@ -93,10 +93,12 @@ typedef XLogLongPageHeaderData *XLogLongPageHeader;
 extern uint32 XLogSegSize;
 #define XLOG_SEG_SIZE XLogSegSize
 
-/* XLogSegSize can range from 1MB to 1GB */
-#define XLogSegMinSize 1024 * 1024
+/* XLogSegSize can range from 16MB to 1GB */
+#define XLogSegMinSize 1024 * 1024 * 16
 #define XLogSegMaxSize 1024 * 1024 * 1024
 
+#define XLogSegMinSizeBits 24
+
 /* Default number of min and max wal segments */
 #define DEFAULT_MIN_WAL_SEGS 5
 #define DEFAULT_MAX_WAL_SEGS 64
@@ -158,10 +160,14 @@ extern uint32 XLogSegSize;
 /* Length of XLog file name */
 #define XLOG_FNAME_LEN24
 
+#define XLogSegNoLower(logSegNo) \
+   (((logSegNo) % XLogSegmentsPerXLogId) * \
+   (XLogSegSize >> XLogSegMinSizeBits))
+
 #define XLogFileName(fname, tli, logSegNo) \
snprintf(fname, MAXFNAMELEN, "%08X%08X%08X", tli,   \
 (uint32) ((logSegNo) / XLogSegmentsPerXLogId), \
-(uint32) ((logSegNo) % XLogSegmentsPerXLogId))
+(uint32) XLogSegNoLower(logSegNo))
 
 #define XLogFileNameById(fname, tli, log, seg) \
snprintf(fname, MAXFNAMELEN, "%08X%08X%08X", tli, log, seg)
@@ -181,17 +187,18 @@ extern uint32 XLogSegSize;
 strcmp((fname) + XLOG_FNAME_LEN, ".partial") == 0)
 
 #define XLogFromFileName(fname, tli, logSegNo) \
-   do {
\
-   uint32 log; 
\
-   uint32 seg; 
\
-   sscanf(fname, "%08X%08X%08X", tli, , ); \
-   *logSegNo = (uint64) log * XLogSegmentsPerXLogId + seg; \
+   do {
\
+   uint32 log;  

Re: [HACKERS] Supporting huge pages on Windows

2017-03-24 Thread David Steele

Hi Ashutosh,

On 3/22/17 8:52 AM, Amit Kapila wrote:

On Wed, Mar 22, 2017 at 12:07 AM, David Steele <da...@pgmasters.net> wrote:


Amit, Magnus, you are signed up as reviewers for this patch.  Do you know
when you'll have a chance to take a look?



I have provided my feedback and I could not test it on my machine.  I
think Ashutosh Sharma has tested it.  I can give it another look, but
not sure if it helps.


I know you are not signed up as a reviewer, but have you take a look at 
this patch?


Thanks,
--
-David
da...@pgmasters.net


--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] cast result of copyNode()

2017-03-24 Thread David Steele

On 3/21/17 6:52 PM, Mark Dilger wrote:



On Mar 21, 2017, at 2:13 PM, David Steele <da...@pgmasters.net> wrote:

Hi Mark,

On 3/9/17 3:34 PM, Peter Eisentraut wrote:

On 3/7/17 18:27, Mark Dilger wrote:

You appear to be using a #define macro to wrap a function of the same name
with the code:

#define copyObject(obj) ((typeof(obj)) copyObject(obj))


Yeah, that's a bit silly.  Here is an updated version that changes that.


Do you know when you'll have a chance to take a look at the updated patch?


The patch applies cleanly, compiles, and passes all the regression tests
for me on my laptop.  Peter appears to have renamed the function copyObject
as copyObjectImpl, which struct me as odd when I first saw it, but I don't have
a better name in mind, so that seems ok.

If the purpose of this patch is to avoid casting so many things down to (Node 
*),
perhaps some additional work along the lines of the patch I'm attaching are
appropriate.  (This patch applies on top Peter's v2 patch).  The idea being to
keep objects as (Expr *) where appropriate, rather than casting through (Node *)
quite so much.

I'm not certain that this is (a) merely a bad idea, (b) a different idea than 
what
Peter is proposing, and as such should be submitted independently, or
(c) something that aught to be included in Peter's patch prior to commit.
I only applied this idea to one file, and maybe not completely in that file, 
because
I'd like feedback before going any further along these lines.


I have marked this "Waiting on Author" pending Peter's input.

--
-David
da...@pgmasters.net


--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] PATCH: Batch/pipelining support for libpq

2017-03-24 Thread David Steele

Hi Vaishnavi,

On 3/19/17 9:32 PM, Vaishnavi Prabakaran wrote:

On Fri, Mar 17, 2017 at 12:37 AM, Daniel Verite  I would also like to hear Craig's opinion on it before applying this fix
> to the original patch, just to make sure am not missing anything here.


Craig?



Am going to include the test which you shared in the test patch. Please
let me know if you want to cover anymore specific cases to gain
confidence.


I have marked this submission "Waiting for Author" since it appears a 
new patch is required based on input and adding a new test.


--
-David
da...@pgmasters.net


--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] LWLock optimization for multicore Power machines

2017-03-24 Thread David Steele

Hi Alexander,

On 3/16/17 1:35 PM, David Steele wrote:

On 2/21/17 9:54 AM, Bernd Helmle wrote:

Am Dienstag, den 14.02.2017, 15:53 +0300 schrieb Alexander Korotkov:

+1
And you could try to use pg_wait_sampling
<https://github.com/postgrespro/pg_wait_sampling> to sampling of wait
events.


I've tried this with your example from your blog post[1] and got this:

(pgbench scale 1000)

pgbench -Mprepared -S -n -c 100 -j 100 -T 300 -P2 pgbench2

SELECT-only:

SELECT * FROM profile_log ;
 ts |  event_type   | event | count
+---+---+---
 2017-02-21 15:21:52.45719  | LWLockNamed   | ProcArrayLock | 8
 2017-02-21 15:22:11.19594  | LWLockTranche | lock_manager  | 1
 2017-02-21 15:22:11.19594  | LWLockNamed   | ProcArrayLock |24
 2017-02-21 15:22:31.220803 | LWLockNamed   | ProcArrayLock | 1
 2017-02-21 15:23:01.255969 | LWLockNamed   | ProcArrayLock | 1
 2017-02-21 15:23:11.272254 | LWLockNamed   | ProcArrayLock | 2
 2017-02-21 15:23:41.313069 | LWLockNamed   | ProcArrayLock | 1
 2017-02-21 15:24:31.37512  | LWLockNamed   | ProcArrayLock |19
 2017-02-21 15:24:41.386974 | LWLockNamed   | ProcArrayLock | 1
 2017-02-21 15:26:41.530399 | LWLockNamed   | ProcArrayLock | 1
(10 rows)

writes pgbench runs have far more events logged, see the attached text
file. Maybe this is of interest...


[1] http://akorotkov.github.io/blog/2016/03/25/wait_monitoring_9_6/


This patch applies cleanly at cccbdde.  It doesn't break compilation on
amd64 but I can't test on a Power-based machine

Alexander, have you had a chance to look at Bernd's results?


I'm marking this submission "Waiting for Author" as your input seems to 
be required.


This thread has been idle for a week.  Please respond and/or post a new 
patch by 2017-03-28 00:00 AoE (UTC-12) or this submission will be marked 
"Returned with Feedback".


--
-David
da...@pgmasters.net


--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] pg_stat_wal_write statistics view

2017-03-24 Thread David Steele

On 3/16/17 1:54 AM, vinayak wrote:


On 2017/03/16 14:46, Haribabu Kommi wrote:


As the view name already contains WAL, I am not sure
whether is it required to include WAL in every column?
I am fine to change if others have the same opinion of
adding WAL to column names.


Ok.


So what is the status of this patch now?  Should it be marked "Ready for 
Committer"?


--
-David
da...@pgmasters.net


--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Index usage for elem-contained-by-const-range clauses

2017-03-24 Thread David Steele

Hi Pritam,

On 3/17/17 5:41 PM, Pritam Baral wrote:


So sorry. I'm attaching the correct version of the original with this,
in case you want to test the limited implementation, because I still
have to go through Tom's list of suggestions.

BTW, the patch is for applying on top of REL9_6_2, and while I
suspect it may work on master too, I haven't tested it since the
original submission (Feb 23).


Also, I noticed that patch haven't regression tests.  Some mention of
this optimization in docs is also nice to have.  > > -- >
Alexander Korotkov > Postgres Professional: http://www.postgrespro.com
> The Russian Postgres Company


This thread has been idle for a week.  Please respond and/or post a new 
patch by 2017-03-28 00:00 AoE (UTC-12) or this submission will be marked 
"Returned with Feedback".


Thanks,
--
-David
da...@pgmasters.net


--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] [PATCH] Generic type subscripting

2017-03-24 Thread David Steele

Hi Dmitry,

On 3/21/17 4:42 PM, Dmitry Dolgov wrote:

On 21 March 2017 at 18:16, David Steele <da...@pgmasters.net

<mailto:da...@pgmasters.net>> wrote:


This thread has been idle for over a week.


Yes, sorry for the late reply. I'm still trying to find a better
solution for
some of the problems, that arose in this patch.


On 15 March 2017 at 00:10, Tom Lane <t...@sss.pgh.pa.us

<mailto:t...@sss.pgh.pa.us>> wrote:

Dmitry Dolgov <9erthali...@gmail.com <mailto:9erthali...@gmail.com>>

writes:


I looked through this a bit.


Do you have an idea when you will have a patch ready?  We are now into 
the last week of the commitfest.  I see one question for Tom, but it's 
not clear that this would prevent you from producing a new patch.


Please post a new patch by 2017-03-28 00:00 AoE (UTC-12) or this 
submission will be marked "Returned with Feedback".


Thanks,
--
-David
da...@pgmasters.net


--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] PassDownLimitBound for ForeignScan/CustomScan [take-2]

2017-03-24 Thread David Steele

Hi Kaigai,

On 3/21/17 1:11 PM, David Steele wrote:

On 3/13/17 3:25 AM, Jeevan Chalke wrote:


I have reviewed this patch further and here are my comments:


This thread has been idle for over a week.  Please respond and/or post a
new patch by 2017-03-24 00:00 AoE (UTC-12) or this submission will be
marked "Returned with Feedback".


This submission has been marked "Returned with Feedback".  Please feel 
free to resubmit to a future commitfest.


Regards,
--
-David
da...@pgmasters.net


--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] make async slave to wait for lsn to be replayed

2017-03-24 Thread David Steele

Hi Ivan,

On 3/21/17 1:06 PM, David Steele wrote:

Hi Ivan,

On 3/12/17 10:20 PM, Thomas Munro wrote:

On Fri, Mar 10, 2017 at 1:49 AM, Ivan Kartyshov
<i.kartys...@postgrespro.ru> wrote:

Here I attached rebased patch waitlsn_10dev_v3 (core feature)
I will leave the choice of implementation (core/contrib) to the
discretion
of the community.

Will be glad to hear your suggestion about syntax, code and patch.


Hi Ivan,

Here is some feedback based on a first read-through of the v4 patch.
I haven't tested it yet.


This thread has been idle for over a week.  Please respond and/or post a
new patch by 2017-03-24 00:00 AoE (UTC-12) or this submission will be
marked "Returned with Feedback".


This submission has been marked "Returned with Feedback".  Please feel 
free to resubmit to a future commitfest.


Regards,
--
-David
da...@pgmasters.net


--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] [PATCH] kNN for SP-GiST

2017-03-24 Thread David Steele

On 3/21/17 11:45 AM, David Steele wrote:

Hi Nikita,

On 3/9/17 8:52 AM, Alexander Korotkov wrote:


I take a look to this patchset.  My first notes are following.


This thread has been idle for quite a while.  Please respond and/or post
a new patch by 2017-03-24 00:00 AoE (UTC-12) or this submission will be
marked "Returned with Feedback".


This submission has been marked "Returned with Feedback".  Please feel 
free to resubmit to a future commitfest.


Regards,
--
-David
da...@pgmasters.net


--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Adding the optional clause 'AS' in CREATE TRIGGER

2017-03-24 Thread David Steele

Hi Naoki,

On 3/17/17 11:58 AM, Surafel Temesgen wrote:


I am sending the review of this patch


This thread has been idle for a week.  Please respond and/or post a new 
patch by 2017-03-28 00:00 AoE (UTC-12) or this submission will be marked 
"Returned with Feedback".


--
-David
da...@pgmasters.net


--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] increasing the default WAL segment size

2017-03-24 Thread David Steele

On 3/23/17 4:45 PM, Peter Eisentraut wrote:

On 3/22/17 17:33, David Steele wrote:

I think if we don't change the default size it's very unlikely I would
use alternate WAL segment sizes or recommend that anyone else does, at
least in v10.

I simply don't think it would get the level of testing required to be
production worthy


I think we could tweak the test harnesses to run all the tests with
different segment sizes.  That would get pretty good coverage.


I would want to see 1,16,64 at a minimum.  More might be nice but that 
gets a bit ridiculous at some point.  I would be fine with different 
critters having different defaults.  I don't think that each critter 
needs to test each value.



More generally, the methodology that we should not add an option unless
we also change the default because the option would otherwise not get
enough testing is a bit dubious.


Generally, I would agree, but I think this is a special case.  This 
option has been around for a long time and we are just now exposing it 
in a way that's useful to end users.  It's easy to see how various 
assumptions may have arisen around the default and led to code that is 
not quite right when using different values.  Even if that's not true in 
the backend code, it might affect bin, and certainly affects third party 
tools.


--
-David
da...@pgmasters.net


--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] increasing the default WAL segment size

2017-03-24 Thread David Steele

On 3/24/17 12:27 AM, Peter Eisentraut wrote:

On 3/23/17 16:58, Stephen Frost wrote:

The backup tools need to also get the LSN from the pg_stop_backup and
verify that they have the WAL file associated with that LSN.


There is a function for that.


They also
need to make sure that they have all of the WAL files between the
starting LSN and the ending LSN.  Doing that requires understanding how
the files are named to make sure there aren't any missing.


There is not a function for that, but there could be one.


A function would be nice, but tools often cannot depend on the database 
being operational so it's still necessary to re-implement them.  Having 
a sane sequence in the WAL makes that easier.


--
-David
da...@pgmasters.net


--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] increasing the default WAL segment size

2017-03-22 Thread David Steele

Hi Robert,

On 3/22/17 3:45 PM, Robert Haas wrote:

On Wed, Mar 22, 2017 at 3:24 PM, David Steele <da...@pgmasters.net> wrote:

One of the reasons to go with the LSN is that we would actually be
maintaining what happens when the WAL files are 16MB in size.

David's initial expectation was this for 64MB WAL files:

00010040
00010080
000100CO
00010001



This is the 1GB sequence, actually, but idea would be the same for 64MB
files.


Wait, really?  I thought you abandoned this approach because there's
then no principled way to handle WAL segments of less than the default
size.


I did say that, but I thought I had hit on a compromise.

But, as I originally pointed out the hex characters in the filename are 
not aligned correctly for > 8 bits (< 16MB segments) and using different 
alignments just made it less consistent.


It would be OK if we were willing to drop the 1,2,4,8 segment sizes 
because then the alignment would make sense and not change the current 
16MB sequence.


Even then, there are some interesting side effects.  For 1GB segments 
the "0001000100C0" segment would include LSNs 1/C000 
through 1/.  This is correct but is not an obvious filename to 
LSN mapping, at least for LSNs that appear later in the segment.


--
-David
da...@pgmasters.net


--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] PATCH: Make pg_stop_backup() archive wait optional

2017-03-22 Thread David Steele

On 3/22/17 4:42 PM, Peter Eisentraut wrote:

On 3/22/17 15:14, Stephen Frost wrote:

-SELECT * FROM pg_stop_backup(false);
+SELECT * FROM pg_stop_backup(false [, true ]);

I think that it's better to get rid of "[" and "]" from the above because
IMO this should be the command example that users actually can run.

Using the '[' and ']' are how all of the optional arguments are
specified in the documentation, see things like current_setting() in our
existing documentation:


In the synopsis, but not in concrete examples.


Wasn't quite sure what the protocol was in this case, and figured it was 
easier to subtract than to add.


--
-David
da...@pgmasters.net


--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] increasing the default WAL segment size

2017-03-22 Thread David Steele

On 3/22/17 3:39 PM, Peter Eisentraut wrote:

On 3/22/17 15:37, Peter Eisentraut wrote:

If changing WAL sizes catches on, I do think we should keep thinking
about a new format for a future release,


I think that means that I'm skeptical about changing the default size
right now.


I think if we don't change the default size it's very unlikely I would 
use alternate WAL segment sizes or recommend that anyone else does, at 
least in v10.


I simply don't think it would get the level of testing required to be 
production worthy and I doubt that most tool writers would be quick to 
add support for a feature that very few people (if any) use.


--
-David
da...@pgmasters.net


--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] increasing the default WAL segment size

2017-03-22 Thread David Steele

On 3/22/17 3:09 PM, Stephen Frost wrote:

* Robert Haas (robertmh...@gmail.com) wrote:

On Wed, Mar 22, 2017 at 1:49 PM, Stephen Frost  wrote:

Then perhaps we do need to be thinking of moving this to PG11 instead of
exposing an option that users will start to use which will result in WAL
naming that'll be confusing and inconsistent.  I certainly don't think
it's a good idea to move forward exposing an option with a naming scheme
that's agreed to be bad.




One of the reasons to go with the LSN is that we would actually be
maintaining what happens when the WAL files are 16MB in size.

David's initial expectation was this for 64MB WAL files:

00010040
00010080
000100CO
00010001


This is the 1GB sequence, actually, but idea would be the same for 64MB 
files.


--
-David
da...@pgmasters.net


--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] patch proposal

2017-03-22 Thread David Steele

On 3/21/17 8:45 PM, Venkata B Nagothi wrote:

On Tue, Mar 21, 2017 at 8:46 AM, David Steele <da...@pgmasters.net

Unfortunately, I don't think the first patch (recoveryStartPoint)
will work as currently implemented.  The problem I see is that the
new function recoveryStartsHere() depends on pg_control containing a
checkpoint right at the end of the backup.  There's no guarantee
that this is true, even if pg_control is copied last.  That means a
time, lsn, or xid that occurs somewhere in the middle of the backup
can be selected without complaint from this code depending on timing.


Yes, that is true.  The latest best position, checkpoint position, xid
and timestamp of the restored backup of the backup is shown up in the
pg_controldata, which means, that is the position from which the
recovery would start.


Backup recovery starts from the checkpoint in the backup_label, not from 
the checkpoint in pg_control.  The original checkpoint that started the 
backup is generally overwritten in pg_control by the end of the backup.



Which in-turn means, WALs start getting replayed
from that position towards --> minimum recovery position (which is the
end backup, which again means applying WALs generated between start and
the end backup) all the way through to  --> recovery target position.


minRecoveryPoint is only used when recovering a backup that was made 
from a standby.  For backups taken on the master, the backup end WAL 
record is used.



The best start position to check with would the position shown up in the
pg_control file, which is way much better compared to the current
postgresql behaviour.


I don't agree, for the reasons given previously.


The tests pass (or rather fail as expected) because they are written
using values before the start of the backup.  It's actually the end
of the backup (where the database becomes consistent on recovery)
that defines where PITR can start and this distinction definitely
matters for very long backups.  A better test would be to start the
backup, get a time/lsn/xid after writing some data, and then make
sure that time/lsn/xid is flagged as invalid on recovery.

The current behaviour is that, if the XID shown-up by the pg_controldata
is say 1400 and the specified recovery_target_xid is say 200, then,
postgresql just completes the recovery and promotes at the immediate
consistent recovery target, which means, the DBA has to all the way
start the restoration and recovery process again to promote the database
at the intended position.


I'm not saying that the current behavior is good, only that the proposed 
behavior is incorrect as far as I can tell.



It is also problematic to assume that transaction IDs commit in
order. If checkPoint.latestCompletedXid contains 5 then a recovery
to xid 4 may or may not be successful depending on the commit
order.  However, it appears in this case the patch would disallow
recovery to 4.

If the txid 4 is the recovery target xid, then, the appropriate backup
taken previous to txid 4 must be used or as an alternative recovery
target like recovery_target_timestamp must be used to proceed to the
respective recovery target xid.


I'm not sure I follow you here, but I do know that using the last 
committed xid says little about which xids precede or follow it.



I think this logic would need to be baked into recoveryStopsAfter()
and/or recoveryStopsBefore() in order to work.  It's not clear to me
what that would look like, however, or if the xid check is even
practical.


The above two functions would determine if the recovery process has to
stop at a particular position or not and the patch has nothing to do
with it.

To summarise, this patch only determines if the WAL replay has to start
at all. In other words, if the specified recovery target falls prior to
the restored database position, then, the WAL replay should not start,
which prevent the database from getting promoted at an unintended
recovery target.


I understand what you are trying to do.  My point is that the 
information you are working from (whatever checkpoint happens to be in 
pg_control when recovery starts) is not sufficient for the task.  This 
method is inaccurate and would disallow valid recovery scenarios.


I suggest doing a thorough read-through of StartupXLOG(), particularly 
the section where the backup_label is loaded.


Thanks,
--
-David
da...@pgmasters.net


--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] PATCH: Make pg_stop_backup() archive wait optional

2017-03-21 Thread David Steele

On 3/21/17 2:34 PM, Fujii Masao wrote:

On Tue, Mar 21, 2017 at 11:03 AM, Tsunakawa, Takayuki
<tsunakawa.ta...@jp.fujitsu.com> wrote:

From: David Steele [mailto:da...@pgmasters.net]

Well, that's embarrassing.  When I recreated the function to add defaults
I messed up the AS clause and did not pay attention to the results of the
regression tests, apparently.

Attached is a new version rebased on 88e66d1.  Catalog version bump has
also been omitted.


I was late to reply because yesterday was a national holiday in Japan.  I 
marked this as ready for committer.  The patch applied cleanly and worked as 
expected.


The patch basically looks good to me, but one comment is;
backup.sgml (at least the description for "Making a non-exclusive
low level backup) seems to need to be updated.


Agreed.  Added in the attached patch and rebased on 8027556.

Thanks!
--
-David
da...@pgmasters.net
diff --git a/doc/src/sgml/backup.sgml b/doc/src/sgml/backup.sgml
index 2d67521..c04eb46 100644
--- a/doc/src/sgml/backup.sgml
+++ b/doc/src/sgml/backup.sgml
@@ -887,7 +887,7 @@ SELECT pg_start_backup('label', false, false);
 
  In the same connection as before, issue the command:
 
-SELECT * FROM pg_stop_backup(false);
+SELECT * FROM pg_stop_backup(false [, true ]);
 
  This terminates the backup mode and performs an automatic switch to
  the next WAL segment.  The reason for the switch is to arrange for
@@ -895,6 +895,15 @@ SELECT * FROM pg_stop_backup(false);
  ready to archive.
 
 
+ If the backup process monitors the WAL archiving process independently,
+ the second parameter (which defaults to true) can be set to false to
+ prevent pg_stop_backup from blocking until all WAL is
+ archived.  Instead, the function will return as soon as the stop backup
+ record is written to the WAL.  This option must be used with caution:
+ if WAL archiving is not monitored correctly then the result might be a
+ useless backup.
+
+
  The pg_stop_backup will return one row with three
  values. The second of these fields should be written to a file named
  backup_label in the root directory of the backup. The
diff --git a/doc/src/sgml/func.sgml b/doc/src/sgml/func.sgml
index 9408a25..4dc30ca 100644
--- a/doc/src/sgml/func.sgml
+++ b/doc/src/sgml/func.sgml
@@ -18355,7 +18355,7 @@ SELECT set_config('log_statement_stats', 'off', false);
   
   

-pg_stop_backup(exclusive 
boolean)
+pg_stop_backup(exclusive 
boolean , wait_for_archive boolean 
)
 
setof record
Finish performing exclusive or non-exclusive on-line backup 
(restricted to superusers by default, but other users can be granted EXECUTE to 
run the function)
@@ -18439,7 +18439,13 @@ postgres=# select pg_start_backup('label_goes_here');
 pg_start_backup. In a non-exclusive backup, the contents of
 the backup_label and tablespace_map are returned
 in the result of the function, and should be written to files in the
-backup (and not in the data directory).
+backup (and not in the data directory).  There is an optional second
+parameter of type boolean.  If false, the pg_stop_backup
+will return immediately after the backup is completed without waiting for
+WAL to be archived.  This behavior is only useful for backup
+software which independently monitors WAL archiving. Otherwise, WAL
+required to make the backup consistent might be missing and make the backup
+useless.

 

diff --git a/src/backend/access/transam/xlogfuncs.c 
b/src/backend/access/transam/xlogfuncs.c
index 96aa15e..1ef9f2b 100644
--- a/src/backend/access/transam/xlogfuncs.c
+++ b/src/backend/access/transam/xlogfuncs.c
@@ -190,6 +190,7 @@ pg_stop_backup_v2(PG_FUNCTION_ARGS)
boolnulls[3];
 
boolexclusive = PG_GETARG_BOOL(0);
+   boolwait_for_archive = PG_GETARG_BOOL(1);
XLogRecPtr  stoppoint;
 
/* check to see if caller supports us returning a tuplestore */
@@ -232,7 +233,7 @@ pg_stop_backup_v2(PG_FUNCTION_ARGS)
 * Stop the exclusive backup, and since we're in an exclusive 
backup
 * return NULL for both backup_label and tablespace_map.
 */
-   stoppoint = do_pg_stop_backup(NULL, true, NULL);
+   stoppoint = do_pg_stop_backup(NULL, wait_for_archive, NULL);
exclusive_backup_running = false;
 
nulls[1] = true;
@@ -250,7 +251,7 @@ pg_stop_backup_v2(PG_FUNCTION_ARGS)
 * Stop the non-exclusive backup. Return a copy of the backup 
label
 * and tablespace map so they can be written to disk by the 
caller.
 */
-   stoppoint = do_pg_stop_backup(label_file->data, true, NULL);
+   stoppoint = do_pg_stop_backup(label_file->data, 
wait_for_archive, NULL);
nonexclusive_backu

[HACKERS] Re: proposal - psql: possibility to specify sort for describe commands, when size is printed

2017-03-21 Thread David Steele

Hi Alexander,

On 3/11/17 7:06 AM, Pavel Stehule wrote:


I am sending a updated version with separated sort direction in special
variable

There is a question. Has desc direction sense for columns like schema or
table name?

Using desc, asc for size is natural. But for tablename?


Do you know when you'll have a chance to review the updated patch?

Thanks,
--
-David
da...@pgmasters.net


--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] increasing the default WAL segment size

2017-03-21 Thread David Steele

On 3/21/17 3:22 PM, Robert Haas wrote:

On Tue, Mar 21, 2017 at 9:04 AM, Stephen Frost  wrote:

In short, I'm also concerned about this change to make WAL file names no
longer match up with LSNs and also about the odd stepping that you get
as a result of this change when it comes to WAL file names.


OK, that's a bit surprising to me, but what do you want to do about
it?  If you take the approach that Beena did, then you lose the
correspondence with LSNs, which is admittedly not great but there are
already helper functions available to deal with LSN -> filename
mappings and I assume those will continue to work. If you take the
opposite approach, then WAL filenames stop being consecutive, which
seems to me to be far worse in terms of user and tool confusion.


They are already non-consecutive.  Does 00010002 really 
logically follow 0001000100FF?  Yeah, sort of, if you know 
the rules.



Also
note that, both currently and with the patch, you can also reduce the
WAL segment size.  David's proposed naming scheme doesn't handle that
case, I think, and I think it would be all kinds of a bad idea to use
one file-naming approach for segments < 16MB and a separate approach
for segments > 16MB.  That's not making anything easier for users or
tool authors.


I believe it does handle that case, actually.  The minimum WAL segment 
size is 1MB so they would increase like:


00010001
000100010010
000100010020
...
00010001FFF0

You could always calculate the next WAL file by adding 
(wal_seg_size_in_mb << 20) to the previous WAL file's LSN.  This would 
even work for WAL segments > 4GB.  Overall, I think this would make 
calculating WAL ranges simpler than it is now.


The biggest downside I can see is that this would change the naming 
scheme for the default of 16MB compared to previous versions of 
Postgres.  However, for all other wal-seg-size values changes would need 
to be made anyway.


--
-David
da...@pgmasters.net


--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] PATCH: recursive json_populate_record()

2017-03-21 Thread David Steele

On 3/21/17 2:31 PM, Andrew Dunstan wrote:

On 03/21/2017 01:37 PM, David Steele wrote:

>>

This thread has been idle for months since Tom's review.

The submission has been marked "Returned with Feedback".  Please feel
free to resubmit to a future commitfest.


Please revive. I am planning to look at this later this week.


Revived in "Waiting on Author" state.

--
-David
da...@pgmasters.net


--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Should we cacheline align PGXACT?

2017-03-21 Thread David Steele

Hi Alexander

On 3/10/17 8:08 AM, Alexander Korotkov wrote:


Results look good for me.  Idea of committing both of patches looks
attractive.
We have pretty much acceleration for read-only case and small
acceleration for read-write case.
I'll run benchmark on 72-cores machine as well.


Have you had a chance to run those tests yet?

Thanks,
--
-David
da...@pgmasters.net


--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


  1   2   3   4   5   6   >